Latent Theme Dictionary Model for Finding Co-occurrent Patterns in Process Data


Guanhua Fang and Zhiliang Ying, Columbia University

Process data, which are temporally ordered sequences of categorical observations, have attracted recent interest because of their increasing abundance and the desire to extract useful information from them. A process is a collection of time-stamped events of different types, recording how an individual behaves in a given time period. Process data are too complex in terms of size and irregularity for classical psychometric models to be directly applicable; consequently, new approaches to modeling and analysis are needed. We introduce herein a latent theme dictionary model for processes that identifies co-occurrent event patterns and individuals with similar behavioral patterns. Theoretical properties of the likelihood-based estimation and inference are established under certain regularity conditions. A nonparametric Bayes algorithm using the Markov chain Monte Carlo method is proposed for computation. Simulation studies show that the proposed approach performs well in a range of situations. The proposed method is applied to an item in the 2012 Programme for International Student Assessment, yielding interpretable findings.

Key words: latent theme dictionary model, process data, co-occurrent pattern, identifiability.
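To make the definition of a process concrete, the following is a minimal sketch, assuming each event is recorded as a categorical action label plus a timestamp in seconds. The `Event` class, the helpers `make_process` and `event_types`, and the action labels are hypothetical illustrations, not part of the paper; the sketch only shows the data structure (temporally ordered, irregularly spaced, variable-length sequences), not the latent theme dictionary model itself.

```python
from dataclasses import dataclass
from typing import List, Set


@dataclass(frozen=True)
class Event:
    """One time-stamped event: a categorical action type and its time in seconds (assumed units)."""
    action: str
    time: float


# A process is one individual's temporally ordered sequence of events.
Process = List[Event]


def make_process(events: List[Event]) -> Process:
    """Return the events sorted by timestamp so that the process is temporally ordered."""
    return sorted(events, key=lambda e: e.time)


def event_types(process: Process) -> Set[str]:
    """Collect the distinct event types that occur in a process."""
    return {e.action for e in process}


# Hypothetical log-file records for two examinees; the action names are illustrative only.
examinee_1 = make_process([
    Event("START", 0.0), Event("click_B", 7.9), Event("click_A", 3.2), Event("SUBMIT", 12.4),
])
examinee_2 = make_process([Event("START", 0.0), Event("SUBMIT", 5.1)])

# Sequence lengths differ across individuals.
print(len(examinee_1), len(examinee_2))
# Time gaps between consecutive events are irregular.
print([round(b.time - a.time, 1) for a, b in zip(examinee_1, examinee_1[1:])])
# Event types shared by both processes.
print(sorted(event_types(examinee_1) & event_types(examinee_2)))
```

Representing a process as a plain list of immutable events keeps variable sequence lengths and irregular timing explicit, which are exactly the features that make such data awkward for classical psychometric models.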

1. Introduction

Process data are temporally ordered data with categorical observations. Such data are ubiquitous in e-commerce (online purchases), social networking services, and computer-based educational assessments. In large-scale computer-based tests, the analysis of process data has gained much attention and has become a core task in the next generation of assessment; see, for example, the 2012 and 2015 Programme for International Student Assessment (PISA; OECD 2014b, 2016), the 2012 Programme for the International Assessment of Adult Competencies (PIAAC; Goodman et al. 2013), and the Assessment and Teaching of 21st Century Skills (ATC21S; Griffin et al. 2012). In such technology-rich tests, there are problem-solving items that require the examinee to perform a number of actions before submitting a final answer. These actions and their corresponding times are recorded sequentially and saved in a log file. Such log-file data can provide information about the examinee's latent structure that is not available from traditional paper-based tests, in which only final responses (correct/incorrect) are collected. As with item response theory (IRT; Lord 1980) models and diagnostic classification models (DCMs; Templin et al. 2010), it is important in the analysis of process data to characterize items and examinees through the calibration of item and person parameters. However, process data are much more complicated in the sense that events occur at irregular time points and the length of the event sequence varies from one examinee to another. Different examinees may have different reaction speeds in addition to varied action patterns for completing the task. In addition, different examinees may have diff