Description Usage Arguments Details Value Warning Note Author(s) References See Also Examples

View source: R/family.categorical.R

Fits a cumulative link regression model to a (preferably ordered) factor response.

1 2 |

`link` |
Link function applied to the |

`parallel` |
A logical or formula specifying which terms have
equal/unequal coefficients.
See below for more information about the parallelism assumption.
The default results in what some people call the
The |

`reverse` |
Logical.
By default, the cumulative probabilities used are
This should be set to |

`multiple.responses` |
Logical.
Multiple responses? If |

`whitespace` |
See |

In this help file the response *Y* is assumed to be a factor
with ordered values *1,2,…,J+1*.
Hence *M* is the number of linear/additive predictors
*eta_j*;
for `cumulative()`

one has *M=J*.

This VGAM family function fits the class of
*cumulative link models* to (hopefully) an ordinal response.
By default, the *non-parallel* cumulative logit model is fitted, i.e.,

*
eta_j = logit(P[Y<=j])*

where *j=1,2,…,M* and
the *eta_j* are not constrained to be parallel.
This is also known as the *non-proportional odds model*.
If the logit link is replaced by a complementary log-log link
(`clogloglink`

) then
this is known as the *proportional-hazards model*.

In almost all the literature, the constraint matrices associated
with this family of models are known. For example, setting
`parallel = TRUE`

will make all constraint matrices
(except for the intercept) equal to a vector of *M* 1's.
If the constraint matrices are equal, unknown and to be estimated, then
this can be achieved by fitting the model as a
reduced-rank vector generalized
linear model (RR-VGLM; see `rrvglm`

).
Currently, reduced-rank vector generalized additive models
(RR-VGAMs) have not been implemented here.

An object of class `"vglmff"`

(see `vglmff-class`

).
The object is used by modelling functions such as `vglm`

,
and `vgam`

.

No check is made to verify that the response is ordinal if the
response is a matrix;
see `ordered`

.

The response should be either a matrix of counts (with row sums that
are all positive), or a factor. In both cases, the `y`

slot
returned by `vglm`

/`vgam`

/`rrvglm`

is the matrix
of counts.
The formula must contain an intercept term.
Other VGAM family functions for an ordinal response include
`acat`

,
`cratio`

,
`sratio`

.
For a nominal (unordered) factor response, the multinomial
logit model (`multinomial`

) is more appropriate.

With the logit link, setting `parallel = TRUE`

will fit a
proportional odds model. Note that the `TRUE`

here does
not apply to the intercept term.
In practice, the validity of the proportional odds assumption
needs to be checked, e.g., by a likelihood ratio test (LRT).
If acceptable on the data,
then numerical problems are less likely to occur during the fitting,
and there are less parameters. Numerical problems occur when
the linear/additive predictors cross, which results in probabilities
outside of *(0,1)*; setting `parallel = TRUE`

will help avoid
this problem.

Here is an example of the usage of the `parallel`

argument.
If there are covariates `x2`

, `x3`

and `x4`

, then
`parallel = TRUE ~ x2 + x3 -1`

and
`parallel = FALSE ~ x4`

are equivalent. This would constrain
the regression coefficients for `x2`

and `x3`

to be
equal; those of the intercepts and `x4`

would be different.

If the data is inputted in *long* format
(not *wide* format, as in `pneumo`

below)
and the self-starting initial values are not good enough then
try using
`mustart`

,
`coefstart`

and/or
`etatstart`

.
See the example below.

To fit the proportional odds model one can use the
VGAM family function `propodds`

.
Note that `propodds(reverse)`

is equivalent to
`cumulative(parallel = TRUE, reverse = reverse)`

(which is
equivalent to
`cumulative(parallel = TRUE, reverse = reverse, link = "logitlink")`

).
It is for convenience only. A call to
`cumulative()`

is preferred since it reminds the user
that a parallelism assumption is made, as well as being a lot
more flexible.

Thomas W. Yee

Agresti, A. (2013).
*Categorical Data Analysis*,
3rd ed. Hoboken, NJ, USA: Wiley.

Agresti, A. (2010).
*Analysis of Ordinal Categorical Data*,
2nd ed. Hoboken, NJ, USA: Wiley.

Dobson, A. J. and Barnett, A. (2008).
*An Introduction to Generalized Linear Models*,
3rd ed. Boca Raton, FL, USA: Chapman & Hall/CRC Press.

McCullagh, P. and Nelder, J. A. (1989).
*Generalized Linear Models*, 2nd ed. London: Chapman & Hall.

Simonoff, J. S. (2003).
*Analyzing Categorical Data*,
New York: Springer-Verlag.

Yee, T. W. (2010).
The VGAM package for categorical data analysis.
*Journal of Statistical Software*,
**32**, 1–34.
https://www.jstatsoft.org/v32/i10/.

Yee, T. W. and Wild, C. J. (1996).
Vector generalized additive models.
*Journal of the Royal Statistical Society, Series B, Methodological*,
**58**, 481–493.

`propodds`

,
`R2latvar`

,
`ordsup`

,
`prplot`

,
`margeff`

,
`acat`

,
`cratio`

,
`sratio`

,
`multinomial`

,
`pneumo`

,
`Links`

,
`hdeff.vglm`

,
`logitlink`

,
`probitlink`

,
`clogloglink`

,
`cauchitlink`

,
`gordlink`

,
`pordlink`

,
`nbordlink`

,
`logistic1`

.

1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20 21 22 23 24 25 26 27 28 29 30 31 32 33 34 35 36 37 38 39 40 41 42 43 44 45 46 47 48 49 50 | ```
# Fit the proportional odds model, p.179, in McCullagh and Nelder (1989)
pneumo <- transform(pneumo, let = log(exposure.time))
(fit <- vglm(cbind(normal, mild, severe) ~ let,
cumulative(parallel = TRUE, reverse = TRUE), data = pneumo))
depvar(fit) # Sample proportions (good technique)
fit@y # Sample proportions (bad technique)
weights(fit, type = "prior") # Number of observations
coef(fit, matrix = TRUE)
constraints(fit) # Constraint matrices
apply(fitted(fit), 1, which.max) # Classification
apply(predict(fit, newdata = pneumo, type = "response"),
1, which.max) # Classification
R2latvar(fit)
# Check that the model is linear in let ----------------------
fit2 <- vgam(cbind(normal, mild, severe) ~ s(let, df = 2),
cumulative(reverse = TRUE), data = pneumo)
## Not run: plot(fit2, se = TRUE, overlay = TRUE, lcol = 1:2, scol = 1:2)
# Check the proportional odds assumption with a LRT ----------
(fit3 <- vglm(cbind(normal, mild, severe) ~ let,
cumulative(parallel = FALSE, reverse = TRUE), data = pneumo))
pchisq(2 * (logLik(fit3) - logLik(fit)),
df = length(coef(fit3)) - length(coef(fit)), lower.tail = FALSE)
lrtest(fit3, fit) # More elegant
# A factor() version of fit ----------------------------------
# This is in long format (cf. wide format above)
Nobs <- round(depvar(fit) * c(weights(fit, type = "prior")))
sumNobs <- colSums(Nobs) # apply(Nobs, 2, sum)
pneumo.long <-
data.frame(symptoms = ordered(rep(rep(colnames(Nobs), nrow(Nobs)),
times = c(t(Nobs))),
levels = colnames(Nobs)),
let = rep(rep(with(pneumo, let), each = ncol(Nobs)),
times = c(t(Nobs))))
with(pneumo.long, table(let, symptoms)) # Should be same as pneumo
(fit.long1 <- vglm(symptoms ~ let, data = pneumo.long, trace = TRUE,
cumulative(parallel = TRUE, reverse = TRUE)))
coef(fit.long1, matrix = TRUE) # Should be as coef(fit, matrix = TRUE)
# Could try using mustart if fit.long1 failed to converge.
mymustart <- matrix(sumNobs / sum(sumNobs),
nrow(pneumo.long), ncol(Nobs), byrow = TRUE)
fit.long2 <- vglm(symptoms ~ let, mustart = mymustart,
cumulative(parallel = TRUE, reverse = TRUE),
data = pneumo.long, trace = TRUE)
coef(fit.long2, matrix = TRUE) # Should be as coef(fit, matrix = TRUE)
``` |

```
Loading required package: stats4
Loading required package: splines
Call:
vglm(formula = cbind(normal, mild, severe) ~ let, family = cumulative(parallel = TRUE,
reverse = TRUE), data = pneumo)
Coefficients:
(Intercept):1 (Intercept):2 let
-9.676093 -10.581725 2.596807
Degrees of Freedom: 16 Total; 13 Residual
Residual deviance: 5.026826
Log-likelihood: -25.09026
normal mild severe
1 1.0000000 0.00000000 0.00000000
2 0.9444444 0.03703704 0.01851852
3 0.7906977 0.13953488 0.06976744
4 0.7291667 0.10416667 0.16666667
5 0.6274510 0.19607843 0.17647059
6 0.6052632 0.18421053 0.21052632
7 0.4285714 0.21428571 0.35714286
8 0.3636364 0.18181818 0.45454545
normal mild severe
1 1.0000000 0.00000000 0.00000000
2 0.9444444 0.03703704 0.01851852
3 0.7906977 0.13953488 0.06976744
4 0.7291667 0.10416667 0.16666667
5 0.6274510 0.19607843 0.17647059
6 0.6052632 0.18421053 0.21052632
7 0.4285714 0.21428571 0.35714286
8 0.3636364 0.18181818 0.45454545
[,1]
1 98
2 54
3 43
4 48
5 51
6 38
7 28
8 11
logit(P[Y>=2]) logit(P[Y>=3])
(Intercept) -9.676093 -10.581725
let 2.596807 2.596807
$`(Intercept)`
[,1] [,2]
[1,] 1 0
[2,] 0 1
$let
[,1]
[1,] 1
[2,] 1
1 2 3 4 5 6 7 8
1 1 1 1 1 1 1 3
1 2 3 4 5 6 7 8
1 1 1 1 1 1 1 3
[1] 0.5142885
Call:
vglm(formula = cbind(normal, mild, severe) ~ let, family = cumulative(parallel = FALSE,
reverse = TRUE), data = pneumo)
Coefficients:
(Intercept):1 (Intercept):2 let:1 let:2
-9.593308 -11.104791 2.571300 2.743550
Degrees of Freedom: 16 Total; 12 Residual
Residual deviance: 4.884404
Log-likelihood: -25.01905
[1] 0.7058849
Likelihood ratio test
Model 1: cbind(normal, mild, severe) ~ let
Model 2: cbind(normal, mild, severe) ~ let
#Df LogLik Df Chisq Pr(>Chisq)
1 12 -25.019
2 13 -25.090 1 0.1424 0.7059
symptoms
let normal mild severe
1.75785791755237 98 0 0
2.70805020110221 51 2 1
3.06805293513362 34 6 3
3.31418600467253 35 5 8
3.51154543883102 32 10 9
3.67630067190708 23 7 8
3.8286413964891 12 6 10
3.94158180766969 4 2 5
VGLM linear loop 1 : deviance = 428.98474
VGLM linear loop 2 : deviance = 411.73439
VGLM linear loop 3 : deviance = 408.69065
VGLM linear loop 4 : deviance = 408.54867
VGLM linear loop 5 : deviance = 408.54833
VGLM linear loop 6 : deviance = 408.54833
Call:
vglm(formula = symptoms ~ let, family = cumulative(parallel = TRUE,
reverse = TRUE), data = pneumo.long, trace = TRUE)
Coefficients:
(Intercept):1 (Intercept):2 let
-9.676092 -10.581724 2.596806
Degrees of Freedom: 742 Total; 739 Residual
Residual deviance: 408.5483
Log-likelihood: -204.2742
logit(P[Y>=2]) logit(P[Y>=3])
(Intercept) -9.676092 -10.581724
let 2.596806 2.596806
VGLM linear loop 1 : deviance = 428.98474
VGLM linear loop 2 : deviance = 411.73439
VGLM linear loop 3 : deviance = 408.69065
VGLM linear loop 4 : deviance = 408.54867
VGLM linear loop 5 : deviance = 408.54833
VGLM linear loop 6 : deviance = 408.54833
logit(P[Y>=2]) logit(P[Y>=3])
(Intercept) -9.676092 -10.581724
let 2.596806 2.596806
```

Embedding an R snippet on your website

Add the following code to your website.

For more information on customizing the embed code, read Embedding Snippets.