University of Limerick Institutional Repository

Multiple imputation for life-course sequence data

Halpin, Brendan 2014-01-28T13:51:48Z 2014-01-28T13:51:48Z 2012
dc.description non-peer-reviewed en_US
dc.description.abstract As holistic analysis of life-course sequences becomes more common, using optimal matching (OM) and other approaches, the problem of missing data becomes more serious. Longitudinal data is prone to missingness in ways that cross-sectional data is not. Existing solutions (e.g., coding for gaps) are not satisfactory, and deletion of gappy sequences causes bias. Multiple imputation seems promising, but standard implementations are not adapted for sequence data. I propose and demonstrate a Stata implementation of a chained multiple imputation procedure that “heals” gaps from both ends, taking account of the longitudinal nature of the measured information, and also constraining the imputations to respect this longitudinality. Using the sequence data alone, without auxiliary individual-level information, stable imputations with good characteristics are generated. Using additional information about the structure of data collection (which relates to mechanisms of missingness) gives better prediction models, but imputations that differ only subtly. Many sequence analysts proceed by cluster analysis of the matrix of pairwise OM distances between sequences. As a non-inferential procedure, this does not benefit from “Rubin’s Rules” for multiple imputation in averaging across estimations. I explore ways of clustering with multiply imputed sequences that allow us to assess the variability due to imputation. I compare the results with an existing approach that codes gaps with a special missing value that is maximally different from all other states, and show that imputation performs better. In an example data set drawn from BHPS work-life histories, imputation of short internal gaps (≤ 12 months) increases the available sample size by approximately 25 percent. Moreover, the gappy sequences have a distinctly different distribution, with higher numbers of transitions, so deletion of gappy sequences distorts the sample badly. For typical longitudinal data sets, we can expect missingness to be related to the amount of instability in the career, and proceeding without imputation will cause serious bias. en_US
dc.language.iso eng en_US
dc.publisher Department of Sociology, University of Limerick en_US
dc.relation.ispartofseries University of Limerick Department of Sociology Working Paper Series;WP2012-01
dc.subject longitudinal data en_US
dc.subject sequence analysis en_US
dc.subject optimal matching en_US
dc.title Multiple imputation for life-course sequence data en_US
dc.type info:eu-repo/semantics/workingPaper en_US
dc.type.supercollection all_ul_research en_US
dc.rights.accessrights info:eu-repo/semantics/openAccess en_US
