Tang: Subtask Analysis of Process Data Through a Predictive Model

In classic tests, item responses are often expressed as univariate categorical variables. Computer-based tests allow us to track participants’ entire problem-solving processes through log files. Such response process data contain rich information about respondents’ behavioral patterns and cognitive processes. However, standard statistical tools are not directly applicable because of the irregular data format and high noise level. In this talk, I will first give a general picture of process data analysis. Then I will introduce a method for effectively exploring respondents’ problem-solving strategies exhibited in their process data. This method segments a lengthy and noisy process into a sequence of subtasks to achieve complexity reduction, easy visualization, and meaningful interpretation. The segmentation is based on sequential action predictability measured by the Shannon entropy. The performance of the new method is examined through simulation studies and a case study of process data from the 2012 Programme for the International Assessment of Adult Competencies.
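
To make the entropy-based segmentation idea concrete, the following is a minimal sketch and not the method presented in the talk. It swaps in a simple bigram frequency model for the predictive model, and it assumes that subtask boundaries fall where the next action is hard to predict, that is, where the Shannon entropy H = -sum_a p(a) log2 p(a) of the predicted next-action distribution is high. The action names, the toy logs, the threshold, and the function names (fit_bigram, entropy, segment) are all hypothetical.

```python
import math
from collections import Counter, defaultdict


def fit_bigram(sequences):
    """Count next-action frequencies conditioned on the current action.

    Assumption: a bigram model stands in for the predictive model.
    """
    counts = defaultdict(Counter)
    for seq in sequences:
        for prev, nxt in zip(seq, seq[1:]):
            counts[prev][nxt] += 1
    return counts


def entropy(counter):
    """Shannon entropy (in bits) of the empirical distribution in a Counter."""
    total = sum(counter.values())
    return -sum((c / total) * math.log2(c / total) for c in counter.values())


def segment(seq, counts, threshold):
    """Split seq after each non-final action whose next-action entropy
    exceeds threshold (an illustrative rule, chosen for this sketch)."""
    segments, current = [], []
    for i, action in enumerate(seq):
        current.append(action)
        is_last = i == len(seq) - 1
        if not is_last and action in counts and entropy(counts[action]) > threshold:
            segments.append(current)
            current = []
    if current:
        segments.append(current)
    return segments


# Hypothetical toy logs: every respondent opens a menu, then the paths diverge.
sequences = [
    ["START", "A", "B", "MENU", "C", "D", "END"],
    ["START", "A", "B", "MENU", "E", "F", "END"],
    ["START", "A", "B", "MENU", "C", "D", "END"],
    ["START", "A", "B", "MENU", "E", "F", "END"],
]
counts = fit_bigram(sequences)
print(segment(sequences[0], counts, threshold=0.5))
# [['START', 'A', 'B', 'MENU'], ['C', 'D', 'END']]
```

In this toy example the only unpredictable step is the action following MENU (entropy of 1 bit), so each process is cut into two subtasks at that point. A fixed threshold is used here only for simplicity; other boundary rules, such as cutting at local entropy maxima, would fit the same description.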