Categorical Data Analysis (Wiley Series in Probability and Statistics)

Publication series: Wiley Series in Probability and Statistics

Author: Alan Agresti  

Publisher: John Wiley & Sons Inc.

Publication year: 2014

E-ISBN: 9781118710852

P-ISBN (Hardback): 9780470463635

Subject: O212.4 Multivariate Analyses

Language: ENG

Description

Praise for the Second Edition

"A must-have book for anyone expecting to do research and/or applications in categorical data analysis."
Statistics in Medicine

"It is a total delight reading this book."
Pharmaceutical Research

"If you do any analysis of categorical data, this is an essential desktop reference."
Technometrics

The use of statistical methods for analyzing categorical data has increased dramatically, particularly in the biomedical and social sciences and in the financial industry. Responding to new developments, this book offers a comprehensive treatment of the most important methods for categorical data analysis.

Categorical Data Analysis, Third Edition summarizes the latest methods for univariate and correlated multivariate categorical responses. Readers will find a unified generalized linear models approach that connects logistic regression and Poisson and negative binomial loglinear models for discrete data with normal regression for continuous data. This edition also features:

  • An emphasis on logistic and probit regression methods for binary, ordinal, and nominal responses for independent observations and for clustered data with marginal models and random effects models
  • Two new chapters on alternative methods for binary response data, including smoothing and regularization methods, classification methods such as linear discriminant analysis and classification trees, and cluster analysis
  • New sections that introduce the Bayesian approach for the methods of the chapter in which they appear
  • More than 100 analyses of data sets and over 600 exercises
  • Notes at the end of each chapter that provide references to recent research and topics not covered in the text, linked to a bibliography of more than 1,200 sources
  • A supplementary website showing how to use R and SAS for all examples in the text, with information about SPSS and Stata and with exercise solutions

Categorical Data Analysis, Third Edition is an invaluable tool for statisticians and methodologists, such as biostatisticians and researchers in the social and behavioral sciences, medicine and public health, marketing, education, finance, biological and agricultural sciences, and industrial quality control.

Chapter

1.4 Statistical Inference for Binomial Parameters

1.4.1 Tests About a Binomial Parameter

1.4.2 Confidence Intervals for a Binomial Parameter

1.4.3 Example: Estimating the Proportion of Vegetarians

1.4.4 Exact Small-Sample Inference and the Mid P-Value

1.5 Statistical Inference for Multinomial Parameters

1.5.1 Estimation of Multinomial Parameters

1.5.2 Pearson Chi-Squared Test of a Specified Multinomial

1.5.3 Likelihood-Ratio Chi-Squared Test of a Specified Multinomial

1.5.4 Example: Testing Mendel's Theories

1.5.5 Testing with Estimated Expected Frequencies

1.5.6 Example: Pneumonia Infections in Calves

1.5.7 Chi-Squared Theoretical Justification

1.6 Bayesian Inference for Binomial and Multinomial Parameters

1.6.1 The Bayesian Approach to Statistical Inference

1.6.2 Binomial Estimation: Beta and Logit-Normal Prior Distributions

1.6.3 Multinomial Estimation: Dirichlet Prior Distributions

1.6.4 Example: Estimating Vegetarianism Revisited

1.6.5 Binomial and Multinomial Estimation: Improper Priors

Notes

Exercises

2 Describing Contingency Tables

2.1 Probability Structure for Contingency Tables

2.1.1 Contingency Tables

2.1.2 Joint/Marginal/Conditional Distributions for Contingency Tables

2.1.3 Example: Sensitivity and Specificity for Medical Diagnoses

2.1.4 Independence of Categorical Variables

2.1.5 Poisson, Binomial, and Multinomial Sampling

2.1.6 Example: Seat Belts and Auto Accident Injuries

2.1.7 Example: Case–Control Study of Cancer and Smoking

2.1.8 Types of Studies: Observational Versus Experimental

2.2 Comparing Two Proportions

2.2.1 Difference of Proportions

2.2.2 Relative Risk

2.2.3 Odds Ratio

2.2.4 Properties of the Odds Ratio

2.2.5 Example: Association Between Heart Attacks and Aspirin Use

2.2.6 Case–Control Studies and the Odds Ratio

2.2.7 Relationship Between Odds Ratio and Relative Risk

2.3 Conditional Association in Stratified 2 × 2 Tables

2.3.1 Partial Tables

2.3.2 Example: Racial Characteristics and the Death Penalty

2.3.3 Conditional and Marginal Odds Ratios

2.3.4 Marginal Independence Versus Conditional Independence

2.3.5 Homogeneous Association

2.3.6 Collapsibility: Identical Conditional and Marginal Associations

2.4 Measuring Association in I × J Tables

2.4.1 Odds Ratios in I × J Tables

2.4.2 Association Factors

2.4.3 Summary Measures of Association

2.4.4 Ordinal Trends: Concordant and Discordant Pairs

2.4.5 Ordinal Measure of Association: Gamma

2.4.6 Probabilistic Comparisons of Two Ordinal Distributions

2.4.7 Example: Comparing Pain Ratings After Surgery

2.4.8 Correlation for Underlying Normality

Exercises

Notes

3 Inference for Two-Way Contingency Tables

3.1 Confidence Intervals for Association Parameters

3.1.1 Interval Estimation of the Odds Ratio

3.1.2 Example: Seat-Belt Use and Traffic Deaths

3.1.3 Interval Estimation of Difference of Proportions and Relative Risk

3.1.4 Example: Aspirin and Heart Attacks Revisited

3.1.5 Deriving Standard Errors with the Delta Method

3.1.6 Delta Method Applied to the Sample Logit

3.1.7 Delta Method for the Log Odds Ratio

3.1.8 Simultaneous Confidence Intervals for Multiple Comparisons

3.2 Testing Independence in Two-way Contingency Tables

3.2.1 Pearson and Likelihood-Ratio Chi-Squared Tests

3.2.2 Example: Education and Belief in God

3.2.3 Adequacy of Chi-Squared Approximations

3.2.4 Chi-Squared and Comparing Proportions in 2 × 2 Tables

3.2.5 Score Confidence Intervals Comparing Proportions

3.2.6 Profile Likelihood Confidence Intervals

3.3 Following-up Chi-Squared Tests

3.3.1 Pearson Residuals and Standardized Residuals

3.3.2 Example: Education and Belief in God Revisited

3.3.3 Partitioning Chi-Squared

3.3.4 Example: Origin of Schizophrenia

3.3.5 Rules for Partitioning

3.3.6 Summarizing the Association

3.3.7 Limitations of Chi-Squared Tests

3.3.8 Why Consider Independence If It's Unlikely to Be True?

3.4 Two-Way Tables with Ordered Classifications

3.4.1 Linear Trend Alternative to Independence

3.4.2 Example: Is Happiness Associated with Political Ideology?

3.4.3 Monotone Trend Alternatives to Independence

3.4.4 Extra Power with Ordinal Tests

3.4.5 Sensitivity to Choice of Scores

3.4.6 Example: Infant Birth Defects by Maternal Alcohol Consumption

3.4.7 Trend Tests for I × 2 and 2 × J Tables

3.4.8 Nominal-Ordinal Tables

3.5 Small-Sample Inference for Contingency Tables

3.5.1 Fisher's Exact Test for 2 × 2 Tables

3.5.2 Example: Fisher's Tea Drinker

3.5.3 Two-Sided P-Values for Fisher's Exact Test

3.5.4 Confidence Intervals Based on Conditional Likelihood

3.5.5 Discreteness and Conservatism Issues

3.5.6 Small-Sample Unconditional Tests of Independence

3.5.7 Conditional Versus Unconditional Tests

3.6 Bayesian Inference for Two-way Contingency Tables

3.6.1 Prior Distributions for Comparing Proportions in 2 × 2 Tables

3.6.2 Posterior Probabilities Comparing Proportions

3.6.3 Posterior Intervals for Association Parameters

3.6.4 Example: Urn Sampling Gives Highly Unbalanced Treatment Allocation

3.6.5 Highest Posterior Density Intervals

3.6.6 Testing Independence

3.6.7 Empirical Bayes and Hierarchical Bayesian Approaches

3.7 Extensions for Multiway Tables and Nontabulated Responses

3.7.1 Categorical Data Need Not Be Contingency Tables

Notes

Exercises

4 Introduction to Generalized Linear Models

4.1 The Generalized Linear Model

4.1.1 Components of Generalized Linear Models

4.1.2 Binomial Logit Models for Binary Data

4.1.3 Poisson Loglinear Models for Count Data

4.1.4 Generalized Linear Models for Continuous Responses

4.1.5 Deviance of a GLM

4.1.6 Advantages of GLMs Versus Transforming the Data

4.2 Generalized Linear Models for Binary Data

4.2.1 Linear Probability Model

4.2.2 Example: Snoring and Heart Disease

4.2.3 Logistic Regression Model

4.2.4 Binomial GLM for 2 × 2 Contingency Tables

4.2.5 Probit and Inverse cdf Link Functions

4.2.6 Latent Tolerance Motivation for Binary Response Models

4.3 Generalized Linear Models for Counts and Rates

4.3.1 Poisson Loglinear Models

4.3.2 Example: Horseshoe Crab Mating

4.3.3 Overdispersion for Poisson GLMs

4.3.4 Negative Binomial GLMs

4.3.5 Poisson Regression for Rates Using Offsets

4.3.6 Example: Modeling Death Rates for Heart Valve Operations

4.3.7 Poisson GLM of Independence in Two-Way Contingency Tables

4.4 Moments and Likelihood for Generalized Linear Models

4.4.1 The Exponential Dispersion Family

4.4.2 Mean and Variance Functions for the Random Component

4.4.3 Mean and Variance Functions for Poisson and Binomial GLMs

4.4.4 Systematic Component and Link Function of a GLM

4.4.5 Likelihood Equations for a GLM

4.4.6 The Key Role of the Mean–Variance Relationship

4.4.7 Likelihood Equations for Binomial GLMs

4.4.8 Asymptotic Covariance Matrix of Model Parameter Estimators

4.4.9 Likelihood Equations and cov(β) for Poisson Loglinear Model

4.5 Inference and Model Checking for Generalized Linear Models

4.5.1 Deviance and Goodness of Fit

4.5.2 Deviance for Poisson GLMs

4.5.3 Deviance for Binomial GLMs: Grouped Versus Ungrouped Data

4.5.4 Likelihood-Ratio Model Comparison Using the Deviances

4.5.5 Score Tests for Goodness of Fit and for Model Comparison

4.5.6 Residuals for GLMs

4.5.7 Covariance Matrices for Fitted Values and Residuals

4.5.8 The Bayesian Approach for GLMs

4.6 Fitting Generalized Linear Models

4.6.1 Newton–Raphson Method

4.6.2 Fisher Scoring Method

4.6.3 Newton–Raphson and Fisher Scoring for Binary Data

4.6.4 ML as Iterative Reweighted Least Squares

4.6.5 Simplifications for Canonical Link Functions

4.7 Quasi-Likelihood and Generalized Linear Models

4.7.1 Mean–Variance Relationship Determines Quasi-likelihood Estimates

4.7.2 Overdispersion for Poisson GLMs and Quasi-likelihood

4.7.3 Overdispersion for Binomial GLMs and Quasi-likelihood

4.7.4 Example: Teratology Overdispersion

Notes

Exercises

5 Logistic Regression

5.1 Interpreting Parameters in Logistic Regression

5.1.1 Interpreting β: Odds, Probabilities, and Linear Approximations

5.1.2 Looking at the Data

5.1.3 Example: Horseshoe Crab Mating Revisited

5.1.4 Logistic Regression with Retrospective Studies

5.1.5 Logistic Regression Is Implied by Normal Explanatory Variables

5.2 Inference for Logistic Regression

5.2.1 Inference About Model Parameters and Probabilities

5.2.2 Example: Inference for Horseshoe Crab Mating Data

5.2.3 Checking Goodness of Fit: Grouped and Ungrouped Data

5.2.4 Example: Model Goodness of Fit for Horseshoe Crab Data

5.2.5 Checking Goodness of Fit with Ungrouped Data by Grouping

5.2.6 Wald Inference Can Be Suboptimal

5.3 Logistic Models with Categorical Predictors

5.3.1 ANOVA-Type Representation of Factors

5.3.2 Indicator Variables Represent a Factor

5.3.3 Example: Alcohol and Infant Malformation Revisited

5.3.4 Linear Logit Model for I × 2 Contingency Tables

5.3.5 Cochran–Armitage Trend Test

5.3.6 Example: Alcohol and Infant Malformation Revisited

5.3.7 Using Directed Models Can Improve Inferential Power

5.3.8 Noncentral Chi-Squared Distribution and Power for Narrower Alternatives

5.3.9 Example: Skin Damage and Leprosy

5.3.10 Model Smoothing Improves Precision of Estimation

5.4 Multiple Logistic Regression

5.4.1 Logistic Models for Multiway Contingency Tables

5.4.2 Example: AIDS and AZT Use

5.4.3 Goodness of Fit as a Likelihood-Ratio Test

5.4.4 Model Comparison by Comparing Deviances

5.4.5 Example: Horseshoe Crab Satellites Revisited

5.4.6 Quantitative Treatment of Ordinal Predictor

5.4.7 Probability-Based and Standardized Interpretations

5.4.8 Estimating an Average Causal Effect

5.5 Fitting Logistic Regression Models

5.5.1 Likelihood Equations for Logistic Regression

5.5.2 Asymptotic Covariance Matrix of Parameter Estimators

5.5.3 Distribution of Probability Estimators

5.5.4 Newton–Raphson Method Applied to Logistic Regression

Notes

Exercises

6 Building, Checking, and Applying Logistic Regression Models

6.1 Strategies in Model Selection

6.1.1 How Many Explanatory Variables Can Be in the Model?

6.1.2 Example: Horseshoe Crab Mating Data Revisited

6.1.3 Stepwise Procedures: Forward Selection and Backward Elimination

6.1.4 Example: Backward Elimination for Horseshoe Crab Data

6.1.5 Model Selection and the "Correct" Model

6.1.6 AIC: Minimizing Distance of the Fit from the Truth

6.1.7 Example: Using Causal Hypotheses to Guide Model Building

6.1.8 Alternative Strategies, Including Model Averaging

6.2 Logistic Regression Diagnostics

6.2.1 Residuals: Pearson, Deviance, and Standardized

6.2.2 Example: Heart Disease and Blood Pressure

6.2.3 Example: Admissions to Graduate School at Florida

6.2.4 Influence Diagnostics for Logistic Regression

6.3 Summarizing the Predictive Power of a Model

6.3.1 Summarizing Predictive Power: R and R-Squared Measures

6.3.2 Summarizing Predictive Power: Likelihood and Deviance Measures

6.3.3 Summarizing Predictive Power: Classification Tables

6.3.4 Summarizing Predictive Power: ROC Curves

6.3.5 Example: Evaluating Predictive Power for Horseshoe Crab Data

6.4 Mantel–Haenszel and Related Methods for Multiple 2 × 2 Tables

6.4.1 Using Logistic Models to Test Conditional Independence

6.4.2 Cochran–Mantel–Haenszel Test of Conditional Independence

6.4.3 Example: Multicenter Clinical Trial Revisited

6.4.4 CMH Test Is Advantageous for Sparse Data

6.4.5 Estimation of Common Odds Ratio

6.4.6 Meta-analyses for Summarizing Multiple 2 × 2 Tables

6.4.7 Meta-analyses for Multiple 2 × 2 Tables: Difference of Proportions

6.4.8 Collapsibility and Logistic Models for Contingency Tables

6.4.9 Testing Homogeneity of Odds Ratios

6.4.10 Summarizing Heterogeneity in Odds Ratios

6.4.11 Propensity Scores in Observational Studies

6.5 Detecting and Dealing with Infinite Estimates

6.5.1 Complete or Quasi-complete Separation

6.5.2 Example: Multicenter Clinical Trial with Few Successes

6.5.3 Remedies When at Least One ML Estimate Is Infinite

6.6 Sample Size and Power Considerations

6.6.1 Sample Size and Power for Comparing Two Proportions

6.6.2 Sample Size Determination in Logistic Regression

6.6.3 Sample Size in Multiple Logistic Regression

6.6.4 Power for Chi-Squared Tests in Contingency Tables

6.6.5 Power for Testing Conditional Independence

6.6.6 Effects of Sample Size on Model Selection and Inference

Notes

Exercises

7 Alternative Modeling of Binary Response Data

7.1 Probit and Complementary Log-log Models

7.1.1 Probit Models: Three Latent Variable Motivations

7.1.2 Probit Models: Interpreting Effects

7.1.3 Probit Model Fitting

7.1.4 Example: Modeling Flour Beetle Mortality

7.1.5 Complementary Log-Log Link Models

7.1.6 Example: Beetle Mortality Revisited

7.2 Bayesian Inference for Binary Regression

7.2.1 Prior Specifications for Binary Regression Models

7.2.2 Example: Risk Factors for Endometrial Cancer Grade

7.2.3 Bayesian Logistic Regression for Retrospective Studies

7.2.4 Probability-Based Prior Specifications for Binary Regression Models

7.2.5 Example: Modeling the Probability a Trauma Patient Survives

7.2.6 Bayesian Fitting for Probit Models

7.2.7 Bayesian Model Checking for Binary Regression

7.3 Conditional Logistic Regression

7.3.1 Conditional Likelihood

7.3.2 Small-Sample Inference for a Logistic Regression Parameter

7.3.3 Small-Sample Conditional Inference for 2 × 2 Contingency Tables

7.3.4 Small-Sample Conditional Inference for Linear Logit Model

7.3.5 Small-Sample Tests of Conditional Independence in 2 × 2 × K Tables

7.3.6 Example: Promotion Discrimination

7.3.7 Discreteness Complications of Using Exact Conditional Inference

7.4 Smoothing: Kernels, Penalized Likelihood, Generalized Additive Models

7.4.1 How Much Smoothing? The Variance/Bias Trade-off

7.4.2 Kernel Smoothing

7.4.3 Example: Smoothing to Portray Probability of Kyphosis

7.4.4 Nearest Neighbors Smoothing

7.4.5 Smoothing Using Penalized Likelihood Estimation

7.4.6 Why Shrink Estimates Toward 0?

7.4.7 Firth's Penalized Likelihood for Logistic Regression

7.4.8 Example: Complete Separation but Finite Logistic Estimates

7.4.9 Generalized Additive Models

7.4.10 Example: GAMs for Horseshoe Crab Mating Data

7.4.11 Advantages/Disadvantages of Various Smoothing Methods

7.5 Issues in Analyzing High-Dimensional Categorical Data

7.5.1 Issues in Selecting Explanatory Variables

7.5.2 Adjusting for Multiplicity: The Bonferroni Method

7.5.3 Adjusting for Multiplicity: The False Discovery Rate

7.5.4 Other Variable Selection Methods with High-Dimensional Data

7.5.5 Examples: High-Dimensional Applications in Genomics

7.5.6 Example: Motif Discovery for Protein Sequences

7.5.7 Example: The Netflix Prize

7.5.8 Example: Credit Scoring

Notes

Exercises

8 Models for Multinomial Responses

8.1 Nominal Responses: Baseline-Category Logit Models

8.1.1 Baseline-Category Logits

8.1.2 Example: Alligator Food Choice

8.1.3 Estimating Response Probabilities

8.1.4 Fitting Baseline-Category Logistic Models

8.1.5 Multicategory Logit Model as a Multivariate GLM

8.1.6 Multinomial Probit Models

8.1.7 Example: Effect of Menu Pricing

8.2 Ordinal Responses: Cumulative Logit Models

8.2.1 Cumulative Logits

8.2.2 Proportional Odds Form of Cumulative Logit Model

8.2.3 Latent Variable Motivation for Proportional Odds Structure

8.2.4 Example: Happiness and Traumatic Events

8.2.5 Checking the Proportional Odds Assumption

8.3 Ordinal Responses: Alternative Models

8.3.1 Cumulative Link Models

8.3.2 Cumulative Probit and Log-Log Models

8.3.3 Example: Happiness Revisited with Cumulative Probits

8.3.4 Adjacent-Categories Logit Models

8.3.5 Example: Happiness Revisited

8.3.6 Continuation-Ratio Logit Models

8.3.7 Example: Developmental Toxicity Study with Pregnant Mice

8.3.8 Stochastic Ordering Location Effects Versus Dispersion Effects

8.3.9 Summarizing Predictive Power of Explanatory Variables

8.4 Testing Conditional Independence in I × J × K Tables

8.4.1 Testing Conditional Independence Using Multinomial Models

8.4.2 Example: Homosexual Marriage and Religious Fundamentalism

8.4.3 Generalized Cochran–Mantel–Haenszel Tests for I × J × K Tables

8.4.4 Example: Homosexual Marriage Revisited

8.4.5 Related Score Tests for Multinomial Logit Models

8.5 Discrete-Choice Models

8.5.1 Conditional Logits for Characteristics of the Choices

8.5.2 Multinomial Logit Model Expressed as Discrete-Choice Model

8.5.3 Example: Shopping Destination Choice

8.5.4 Multinomial Probit Discrete-Choice Models

8.5.5 Extensions: Nested Logit and Mixed Logit Models

8.5.6 Extensions: Discrete Choice with Ordered Categories

8.6 Bayesian Modeling of Multinomial Responses

8.6.1 Bayesian Fitting of Cumulative Link Models

8.6.2 Example: Cannabis Use and Mother's Age

8.6.3 Bayesian Fitting of Multinomial Logit and Probit Models

8.6.4 Example: Alligator Food Choice Revisited

Notes

Exercises

9 Loglinear Models for Contingency Tables

9.1 Loglinear Models for Two-way Tables

9.1.1 Independence Model for a Two-Way Table

9.1.2 Interpretation of Loglinear Model Parameters

9.1.3 Saturated Model for a Two-Way Table

9.1.4 Alternative Parameter Constraints

9.1.5 Hierarchical Versus Nonhierarchical Models

9.1.6 Multinomial Models for Cell Probabilities

9.2 Loglinear Models for Independence and Interaction in Three-way Tables

9.2.1 Types of Independence

9.2.2 Homogeneous Association and Three-Factor Interaction

9.2.3 Interpretation of Loglinear Model Parameters

9.2.4 Example: Alcohol, Cigarette, and Marijuana Use

9.3 Inference for Loglinear Models

9.3.1 Chi-Squared Goodness-of-Fit Tests

9.3.2 Inference about Conditional Associations

9.4 Loglinear Models for Higher Dimensions

9.4.1 Models for Four-Way Contingency Tables

9.4.2 Example: Automobile Accidents and Seat-Belt Use

9.4.3 Large Samples and Statistical Versus Practical Significance

9.4.4 Dissimilarity Index

9.5 Loglinear–Logistic Model Connection

9.5.1 Using Logistic Models to Interpret Loglinear Models

9.5.2 Example: Auto Accidents and Seat-Belts Revisited

9.5.3 Equivalent Loglinear and Logistic Models

9.5.4 Example: Detecting Gene–Environment Interactions in Case–Control Studies

9.6 Loglinear Model Fitting: Likelihood Equations and Asymptotic Distributions

9.6.1 Minimal Sufficient Statistics

9.6.2 Likelihood Equations for Loglinear Models

9.6.3 Unique ML Estimates Match Data in Sufficient Marginal Tables

9.6.4 Direct Versus Iterative Calculation of Fitted Values

9.6.5 Decomposable Models

9.6.6 Chi-Squared Goodness-of-Fit Tests

9.6.7 Covariance Matrix of ML Parameter Estimators

9.6.8 Connection Between Multinomial and Poisson Loglinear Models

9.6.9 Distribution of Probability Estimators

9.6.10 Proof of Uniqueness of ML Estimates

9.6.11 Pseudo ML for Complex Sampling Designs

9.7 Loglinear Model Fitting: Iterative Methods and Their Application

9.7.1 Newton–Raphson Method

9.7.2 Iterative Proportional Fitting

9.7.3 Comparison of IPF and Newton–Raphson Iterative Methods

9.7.4 Raking a Table: Contingency Table Standardization

Notes

Exercises

10 Building and Extending Loglinear Models

10.1 Conditional Independence Graphs and Collapsibility

10.1.1 Conditional Independence Graphs

10.1.2 Graphical Loglinear Models

10.1.3 Collapsibility in Three-Way Contingency Tables

10.1.4 Collapsibility for Multiway Tables

10.2 Model Selection and Comparison

10.2.1 Considerations in Model Selection

10.2.2 Example: Model Building for Student Survey

10.2.3 Loglinear Model Comparison Statistics

10.2.4 Partitioning Chi-Squared with Model Comparisons

10.2.5 Identical Marginal and Conditional Tests of Independence

10.3 Residuals for Detecting Cell-Specific Lack of Fit

10.3.1 Residuals for Loglinear Models

10.3.2 Example: Student Survey Revisited

10.3.3 Identical Loglinear and Logistic Standardized Residuals

10.4 Modeling Ordinal Associations

10.4.1 Linear-by-Linear Association Model for Two-Way Tables

10.4.2 Corresponding Logistic Model for Adjacent Responses

10.4.3 Likelihood Equations and Model Fitting

10.4.4 Example: Sex and Birth Control Opinions Revisited

10.4.5 Directed Ordinal Test of Independence

10.4.6 Row Effects and Column Effects Association Models

10.4.7 Example: Estimating Category Scores for Premarital Sex

10.4.8 Ordinal Variables in Models for Multiway Tables

10.5 Generalized Loglinear and Association Models, Correlation Models, and Correspondence Analysis

10.5.1 Generalized Loglinear Model

10.5.2 Multiplicative Row and Column Effects Model

10.5.3 Example: Mental Health and Parents' SES

10.5.4 Correlation Models

10.5.5 Correspondence Analysis

10.5.6 Model Selection and Score Choice for Ordinal Variables

10.6 Empty Cells and Sparseness in Modeling Contingency Tables

10.6.1 Empty Cells: Sampling Versus Structural Zeros

10.6.2 Existence of Estimates in Loglinear Models

10.6.3 Effects of Sparseness on X², G², and Model-Based Tests

10.6.4 Alternative Sparse Data Asymptotics

10.6.5 Adding Constants to Cells of a Contingency Table

10.7 Bayesian Loglinear Modeling

10.7.1 Estimating Loglinear Model Parameters in Two-Way Tables

10.7.2 Example: Polarized Opinions by Political Party

10.7.3 Bayesian Loglinear Modeling of Multidimensional Tables

10.7.4 Graphical Conditional Independence Models

Notes

Exercises

11 Models for Matched Pairs

11.1 Comparing Dependent Proportions

11.1.1 Confidence Intervals Comparing Dependent Proportions

11.1.2 McNemar Test Comparing Dependent Proportions

11.1.3 Example: Changes in Presidential Election Voting

11.1.4 Increased Precision with Dependent Samples

11.1.5 Small-Sample Test Comparing Dependent Proportions

11.1.6 Connection Between McNemar and Cochran–Mantel–Haenszel Tests

11.1.7 Subject-Specific and Population-Averaged (Marginal) Tables

11.2 Conditional Logistic Regression for Binary Matched Pairs

11.2.1 Subject-Specific Versus Marginal Models for Matched Pairs

11.2.2 Logistic Models with Subject-Specific Probabilities

11.2.3 Conditional ML Inference for Binary Matched Pairs

11.2.4 Random Effects in Binary Matched-Pairs Model

11.2.5 Conditional Logistic Regression for Matched Case–Control Studies

11.2.6 Conditional Logistic Regression for Matched Pairs with Multiple Predictors

11.2.7 Marginal Models and Subject-Specific Models: Extensions

11.3 Marginal Models for Square Contingency Tables

11.3.1 Marginal Models for Nominal Classifications

11.3.2 Example: Regional Migration

11.3.3 Marginal Models for Ordinal Classifications

11.3.4 Example: Opinions on Premarital and Extramarital Sex

11.4 Symmetry, Quasi-Symmetry, and Quasi-Independence

11.4.1 Symmetry as Logistic and Loglinear Models

11.4.2 Quasi-symmetry

11.4.3 Marginal Homogeneity and Quasi-symmetry

11.4.4 Quasi-independence

11.4.5 Example: Migration Revisited

11.4.6 Ordinal Quasi-symmetry

11.4.7 Example: Premarital and Extramarital Sex Revisited

11.5 Measuring Agreement Between Observers

11.5.1 Agreement: Departures from Independence

11.5.2 Using Quasi-independence to Analyze Agreement

11.5.3 Quasi-symmetry and Agreement Modeling

11.5.4 Kappa: A Summary Measure of Agreement

11.5.5 Weighted Kappa: Quantifying Disagreement

11.5.6 Extensions to Multiple Observers

11.6 Bradley-Terry Model for Paired Preferences

11.6.1 Bradley-Terry Model

11.6.2 Example: Major League Baseball Rankings

11.6.3 Example: Home Team Advantage in Baseball

11.6.4 Bradley-Terry Model and Quasi-symmetry

11.6.5 Extensions to Ties and Ordinal Pairwise Evaluations

11.7 Marginal Models and Quasi-Symmetry Models for Matched Sets

11.7.1 Marginal Homogeneity, Complete Symmetry, and Quasi-symmetry

11.7.2 Types of Marginal Symmetry

11.7.3 Comparing Binary Marginal Distributions in Multiway Tables

11.7.4 Example: Attitudes Toward Legalized Abortion

11.7.5 Marginal Homogeneity for a Multicategory Response

11.7.6 Wald and Generalized CMH Score Tests of Marginal Homogeneity

Notes

Exercises

12 Clustered Categorical Data: Marginal and Transitional Models

12.1 Marginal Modeling: Maximum Likelihood Approach

12.1.1 Example: Longitudinal Study of Mental Depression

12.1.2 Modeling a Repeated Multinomial Response

12.1.3 Example: Insomnia Clinical Trial

12.1.4 ML Fitting of Marginal Logistic Models: Constraints on Cell Probabilities

12.1.5 ML Fitting of Marginal Logistic Models: Other Methods

12.2 Marginal Modeling: Generalized Estimating Equations (GEEs) Approach

12.2.1 Generalized Estimating Equations Methodology: Basic Ideas

12.2.2 Example: Longitudinal Mental Depression Revisited

12.2.3 Example: Multinomial GEE Approach for Insomnia Trial

12.3 Quasi-Likelihood and Its GEE Multivariate Extension: Details

12.3.1 The Univariate Quasi-likelihood Method

12.3.2 Properties of Quasi-likelihood Estimators

12.3.3 Sandwich Covariance Adjustment for Variance Misspecification

12.3.4 GEE Multivariate Methodology: Technical Details

12.3.5 Working Associations Characterized by Odds Ratios

12.3.6 GEE Approach: Multinomial Responses

12.3.7 Dealing with Missing Data

12.4 Transitional Models: Markov Chain and Time Series Models

12.4.1 Markov Chains

12.4.2 Example: Changes in Evapotranspiration Rates

12.4.3 Transitional Models with Explanatory Variables

12.4.4 Example: Child's Respiratory Illness and Maternal Smoking

12.4.5 Example: Initial Response in Matched Pair as a Covariate

12.4.6 Transitional Models and Loglinear Conditional Models

Notes

Exercises

13 Clustered Categorical Data: Random Effects Models

13.1 Random Effects Modeling of Clustered Categorical Data

13.1.1 Generalized Linear Mixed Model

13.1.2 Logistic GLMM with Random Intercept for Binary Matched Pairs

13.1.3 Example: Changes in Presidential Voting Revisited

13.1.4 Extension: Rasch Model and Item Response Models

13.1.5 Random Effects Versus Conditional ML Approaches

13.2 Binary Responses: Logistic-Normal Model

13.2.1 Shared Random Effect Implies Nonnegative Marginal Correlations

13.2.2 Interpreting Heterogeneity in Logistic-Normal Models

13.2.3 Connections Between Random Effects Models and Marginal Models

13.2.4 Comments About GLMMs Versus Marginal Models

13.3 Examples of Random Effects Models for Binary Data

13.3.1 Example: Small-Area Estimation of Binomial Proportions

13.3.2 Modeling Repeated Binary Responses: Attitudes About Abortion

13.3.3 Example: Longitudinal Mental Depression Study Revisited

13.3.4 Example: Capture–Recapture Prediction of Population Size

13.3.5 Example: Heterogeneity Among Multicenter Clinical Trials

13.3.6 Meta-analysis Using a Random Effects Approach

13.3.7 Alternative Formulations of Random Effects Models

13.3.8 Example: Matched Pairs with a Bivariate Binary Response

13.3.9 Time Series Models Using Autocorrelated Random Effects

13.3.10 Example: Oxford and Cambridge Annual Boat Race

13.4 Random Effects Models for Multinomial Data

13.4.1 Cumulative Logit Model with Random Intercept

13.4.2 Example: Insomnia Study Revisited

13.4.3 Example: Combining Measures on Ordinal Items

13.4.4 Example: Cluster Sampling

13.4.5 Baseline-Category Logit Models with Random Effects

13.4.6 Example: Effectiveness of Housing Program

13.5 Multilevel Modeling

13.5.1 Hierarchical Random Terms: Partitioning Variability

13.5.2 Example: Children's Care for an Unmarried Mother

13.6 GLMM Fitting, Inference, and Prediction

13.6.1 Marginal Likelihood and Maximum Likelihood Fitting

13.6.2 Gauss–Hermite Quadrature Methods for ML Fitting

13.6.3 Monte Carlo and EM Methods for ML Fitting

13.6.4 Laplace and Penalized Quasi-likelihood Approximations to ML

13.6.5 Inference for GLMM Parameters

13.6.6 Prediction Using Random Effects

13.7 Bayesian Multivariate Categorical Modeling

13.7.1 Marginal Homogeneity Analyses for Matched Pairs

13.7.2 Bayesian Approaches to Meta-analysis and Multicenter Trials

13.7.3 Example: Bayesian Analyses for a Multicenter Trial

13.7.4 Bayesian GLMMs and Marginal Models

Notes

Exercises

14 Other Mixture Models for Discrete Data

14.1 Latent Class Models

14.1.1 Independence Given a Latent Categorical Variable

14.1.2 Fitting Latent Class Models

14.1.3 Example: Latent Class Model for Rater Agreement

14.1.4 Example: Latent Class Models for Capture-Recapture

14.1.5 Example: Latent Class Transitional Models

14.2 Nonparametric Random Effects Models

14.2.1 Logistic Models with Unspecified Random Effects Distribution

14.2.2 Example: Attitudes About Legalized Abortion

14.2.3 Example: Nonparametric Mixing of Logistic Regressions

14.2.4 Is Misspecification of Random Effects a Serious Problem?

14.2.5 Rasch Mixture Model

14.2.6 Example: Modeling Rater Agreement Revisited

14.2.7 Nonparametric Mixtures and Quasi-symmetry

14.2.8 Example: Attitudes About Legalized Abortion Revisited

14.3 Beta-Binomial Models

14.3.1 Beta-Binomial Distribution

14.3.2 Models Using the Beta-Binomial Distribution

14.3.3 Quasi-likelihood with Beta-Binomial Type Variance

14.3.4 Example: Teratology Overdispersion Revisited

14.3.5 Conjugate Mixture Models

14.4 Negative Binomial Regression

14.4.1 Gamma Mixture of Poissons Is Negative Binomial

14.4.2 Negative Binomial Regression Modeling

14.4.3 Example: Frequency of Knowing Homicide Victims

14.5 Poisson Regression with Random Effects

14.5.1 A Poisson GLMM

14.5.2 Marginal Model Implied by Poisson GLMM

14.5.3 Example: Homicide Victim Frequency Revisited

14.5.4 Negative Binomial Models versus Poisson GLMMs

Notes

Exercises

15 Non-Model-Based Classification and Clustering

15.1 Classification: Linear Discriminant Analysis

15.1.1 Classification with Normally Distributed Predictors

15.1.2 Example: Horseshoe Crab Satellites Revisited

15.1.3 Multicategory Classification and Other Versions of Discriminant Analysis

15.1.4 Classification Methods for High Dimensions

15.1.5 Discriminant Analysis Versus Logistic Regression

15.2 Classification: Tree-Structured Prediction

15.2.1 Classification Trees

15.2.2 Example: Classification Tree for a Health Care Application

15.2.3 How Does the Classification Tree Grow?

15.2.4 Pruning a Tree and Checking Prediction Accuracy

15.2.5 Classification Trees Versus Logistic Regression

15.2.6 Support Vector Machines for Classification

15.3 Cluster Analysis for Categorical Data

15.3.1 Supervised Versus Unsupervised Learning

15.3.2 Measuring Dissimilarity Between Observations

15.3.3 Clustering Algorithms: Partitions and Hierarchies

15.3.4 Example: Clustering States on Election Results

Notes

Exercises

16 Large- and Small-Sample Theory for Multinomial Models

16.1 Delta Method

16.1.1 O, o Rates of Convergence

16.1.2 Delta Method for a Function of a Random Variable

16.1.3 Delta Method for a Function of a Random Vector

16.1.4 Asymptotic Normality of Functions of Multinomial Counts

16.1.5 Delta Method for a Vector Function of a Random Vector

16.1.6 Joint Asymptotic Normality of Log Odds Ratios

16.2 Asymptotic Distributions of Estimators of Model Parameters and Cell Probabilities

16.2.1 Asymptotic Distribution of Model Parameter Estimator

16.2.2 Asymptotic Distribution of Cell Probability Estimators

16.2.3 Model Smoothing Is Beneficial

16.3 Asymptotic Distributions of Residuals and Goodness-of-fit Statistics

16.3.1 Joint Asymptotic Normality of p and π

16.3.2 Asymptotic Distribution of Pearson and Standardized Residuals

16.3.3 Asymptotic Distribution of Pearson X² Statistic

16.3.4 Asymptotic Distribution of Likelihood-Ratio Statistic

16.3.5 Asymptotic Noncentral Distributions

16.4 Asymptotic Distributions for Logit/Loglinear Models

16.4.1 Asymptotic Covariance Matrices

16.4.2 Connection with Poisson Loglinear Models

16.5 Small-Sample Significance Tests for Contingency Tables

16.5.1 Exact Conditional Distribution for I × J Tables Under Independence

16.5.2 Exact Tests of Independence for I × J Tables

16.5.3 Example: Sexual Orientation and Party ID

16.6 Small-Sample Confidence Intervals for Categorical Data

16.6.1 Small-Sample CIs for a Binomial Parameter

16.6.2 CIs Based on Tests Using the Mid P-Value

16.6.3 Example: Proportion of Vegetarians Revisited

16.6.4 Small-Sample CIs for Odds Ratios

16.6.5 Example: Fisher's Tea Taster Revisited

16.6.6 Small-Sample CIs for Logistic Regression Parameters

16.6.7 Example: Diarrhea and an Antibiotic

16.6.8 Unconditional Small-Sample CIs for Difference of Proportions

16.7 Alternative Estimation Theory for Parametric Models

16.7.1 Weighted Least Squares for Categorical Data

16.7.2 Inference Using the WLS Approach to Model Fitting

16.7.3 Scope of WLS Versus ML Estimation

16.7.4 Minimum Chi-Squared Estimators

16.7.5 Minimum Discrimination Information

Notes

Exercises

17 Historical Tour of Categorical Data Analysis

17.1 Pearson-Yule Association Controversy

17.2 R. A. Fisher's Contributions

17.3 Logistic Regression

17.4 Multiway Contingency Tables and Loglinear Models

17.5 Bayesian Methods for Categorical Data

17.6 A Look Forward, and Backward

Appendix A Statistical Software for Categorical Data Analysis

Appendix B Chi-Squared Distribution Values

References

Author Index

Example Index

Subject Index
