Research Topics

The work of the department is divided into seven topics:

Methodology of large-scale assessment studies

Large-scale assessment studies such as PISA, PIRLS, or TIMSS focus on determining the distribution of academic competencies in different content domains (e.g., mathematics, reading) and the relationship of these competencies to key background variables (e.g., socioeconomic status). The analyses rely on item response theory (IRT) procedures that first scale the test items and then determine the distribution of the competencies and their relationships to covariates. The department mainly works on the following topics in this area:

  1. development and evaluation of IRT models in large-scale assessment studies [1],
  2. evaluation of linking procedures and linking errors in estimating group comparisons or trends, especially in the presence of Differential Item Functioning (DIF) [2,3,4],
  3. methods for estimating the background model for generating plausible values [5] (see the sketch after this list).
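
To illustrate a typical scaling workflow, the following minimal sketch uses the R package TAM: it fits a 1PL model to simulated responses, conditions the latent distribution on a background variable, and draws plausible values. The covariate ses is hypothetical, and the code sketches the general approach rather than the procedure used in any of the cited studies.

    # Minimal sketch of an IRT scaling workflow with the R package TAM;
    # the background covariate 'ses' is hypothetical, the data simulated.
    library(TAM)

    data(data.sim.rasch)              # simulated dichotomous responses (ships with TAM)
    resp <- data.sim.rasch
    set.seed(42)
    ses <- rnorm(nrow(resp))          # hypothetical socioeconomic covariate

    # Scale the items with a 1PL model and estimate the latent regression
    # of the competence on the background variable
    mod <- tam.mml(resp = resp, Y = cbind(ses))

    # Draw plausible values from the resulting posterior distribution
    pv <- tam.pv(mod, nplausible = 5)
    head(pv$pv)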

[1] Robitzsch, A., & Lüdtke, O. (2022). Some thoughts on analytical choices in the scaling model for test scores in international large-scale assessments. Measurement Instruments for the Social Sciences, 4, 9. https://doi.org/10.1186/s42409-022-00039-w

[2] Robitzsch, A., & Lüdtke, O. (2022). Mean comparisons of many groups in the presence of DIF: An evaluation of linking and scaling approaches. Journal of Educational and Behavioral Statistics, 47, 36–68. https://doi.org/10.3102/10769986211017479

[3] Robitzsch, A., & Lüdtke, O. (2019). Linking errors in international large-scale assessments: Calculation of standard errors for trend estimation. Assessment in Education: Principles, Policy & Practice, 26, 444–465. https://doi.org/10.1080/0969594X.2018.1433633

[4] Robitzsch, A., & Lüdtke, O. (2023). Comparing different trend estimation approaches in country means and standard deviations in international large-scale assessment studies. Large-scale Assessments in Education, 11, 26. https://doi.org/10.1186/s40536-023-00176-6

[5] Grund, S., Lüdtke, O., & Robitzsch, A. (2021). On the treatment of missing data in background questionnaires in educational large-scale assessments: An evaluation of different procedures. Journal of Educational and Behavioral Statistics, 46, 430–465. https://doi.org/10.3102/1076998620959058

Estimation of latent variable models

The department evaluates and develops estimation approaches for latent variable models that allow psychological constructs, such as academic competencies and personality, to be analyzed flexibly over time and across different groups. Emphasis is placed on approaches that stabilize parameter estimates in latent variable models, especially with small sample sizes, using Markov chain Monte Carlo (MCMC) and penalized maximum likelihood methods [1,2,3]. Further work in this area focuses on developing robust estimation procedures for latent variable models [4] and on adequately modeling vocational interests using circumplex models [5,6].
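
The following minimal sketch illustrates the general idea with the blavaan package (used here as a stand-in, not the department's own implementation): a one-factor model is fitted to a deliberately small subsample, and a weakly informative prior on the loadings stabilizes the estimates.

    # Minimal sketch: small-sample CFA stabilized by weakly informative
    # priors and MCMC estimation, using blavaan as a stand-in for [1,2,3].
    library(lavaan)     # ships the HolzingerSwineford1939 example data
    library(blavaan)

    model <- ' visual =~ x1 + x2 + x3 '
    dat <- HolzingerSwineford1939[1:50, ]   # deliberately small subsample

    # Weakly informative normal prior on the factor loadings
    fit <- bcfa(model, data = dat, dp = dpriors(lambda = "normal(0, 1)"))
    summary(fit)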

[1] Lüdtke, O., Robitzsch, A., & Wagner, J. (2018). More stable estimation of the STARTS model: A Bayesian approach using Markov Chain Monte Carlo techniques. Psychological Methods, 23(3), 570–593. https://doi.org/10.1037/met0000155

[2] Lüdtke, O., Ulitzsch, E., & Robitzsch, A. (2021). A comparison of penalized maximum likelihood estimation and Markov chain Monte Carlo techniques for estimating confirmatory factor analysis models with small sample sizes. Frontiers in Psychology, 12, 615162. https://doi.org/10.3389/fpsyg.2021.615162

[3] Ulitzsch, E., Lüdtke, O., & Robitzsch, A. (2023). Alleviating estimation problems in small sample structural equation modeling – A comparison of constrained maximum likelihood, Bayesian, and fixed reliability single indicators approaches. Psychological Methods, 28(3), 527–557. https://doi.org/10.1037/met0000435

[4] Robitzsch, A. (2023). Model-robust estimation of multiple-group structural equation models. Algorithms, 16(4), 210. https://doi.org/10.3390/a16040210

[5] Nagy, G., Etzel, J., & Lüdtke, O. (2019). Integrating covariates into circumplex structures: An extension procedure for Browne’s circular stochastic process model. Multivariate Behavioral Research, 54(3), 404–428. https://doi.org/10.1080/00273171.2018.1534678

[6] Nagy, G., Brunner, M., Lüdtke, O., & Greiff, S. (2017). Extension procedures for confirmatory factor analysis. The Journal of Experimental Education, 85(4), 574–596. https://doi.org/10.1080/00220973.2016.1260524

Statistical modeling of test-taking behavior

Part of the research conducted in this area is devoted to developing and testing statistical procedures for identifying responses with low diagnostic value (careless and non-engaged responses). Position-effect-based IRT models have been presented for identifying careless responses in questionnaires [1] and achievement tests [2]. Procedures have also been developed that use response times in computer-administered questionnaires [3,4] and achievement tests [5] to identify careless and non-engaged responses, also in combination with item position effects [6].
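
As a simple point of reference for these model-based procedures, the following base R sketch flags rapid responses with an item-specific response-time threshold (a hypothetical 10% of each item's median response time); the models cited above replace such heuristics with a formal measurement model.

    # Minimal sketch (base R): flag potentially non-engaged responses with an
    # item-specific response-time threshold; data and the 10% rule are illustrative.
    set.seed(1)
    rt <- matrix(rlnorm(200 * 10, meanlog = 3, sdlog = 0.6),
                 nrow = 200, ncol = 10)          # response times in seconds

    threshold <- 0.10 * apply(rt, 2, median)     # 10% of each item's median time
    rapid <- sweep(rt, 2, threshold, FUN = "<")  # TRUE = flagged as rapid response
    colMeans(rapid)                              # proportion flagged per item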

Further research is devoted to developing and applying exploratory sequence pattern analyses to study problem-solving processes in simulated environments (for example, simulated web environments). Clustering procedures have been proposed that group action sequences from interactive tasks into clusters of typical sequences, allowing different processing strategies to be identified [7,8]. Similarly, machine learning techniques have shown that the success of an applied strategy can be predicted from the actions performed at the beginning of the solution process [9].
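
A toy version of such an analysis can be sketched in base R by comparing action sequences with an edit distance; the cited work [7,8] uses considerably richer graph-based and sequence-mining machinery.

    # Minimal sketch (base R): cluster hypothetical action sequences from an
    # interactive task by edit distance, a stand-in for the methods in [7,8].
    seqs <- c("SACCR", "SACR", "SBBCR", "SBCR", "SACCCR")  # hypothetical action strings

    d <- adist(seqs)                    # pairwise Levenshtein distances (utils)
    dimnames(d) <- list(seqs, seqs)

    cl <- hclust(as.dist(d), method = "average")
    cutree(cl, k = 2)                   # two candidate strategy groups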

[1] Ulitzsch, E., Yildirim-Erbasli, S. N., Gorgun, G., & Bulut, O. (2022). An explanatory mixture IRT model for careless and insufficient effort responding in self-report measures. British Journal of Mathematical and Statistical Psychology, 75(3), 668–698. https://doi.org/10.1111/bmsp.12272

[2] Nagy, G., Nagengast, B., Frey, A., Becker, M., & Rose, N. (2019). A multilevel study of position effects in PISA achievement tests: Student- and school-level predictors in the German tracked school system. Assessment in Education: Principles, Policy & Practice, 26(4), 422–443. https://doi.org/10.1080/0969594X.2018.1449100

[3] Ulitzsch, E., Pohl, S., Khorramdel, L., Kroehne, U., & von Davier, M. (2022). A response-time-based latent response mixture model for identifying and modeling careless and insufficient effort responding in survey data. Psychometrika, 87(2), 593–619. https://doi.org/10.1007/s11336-022-09846-w

[4] Ulitzsch, E., Shin, H. J., & Lüdtke, O. (2023). Accounting for careless and insufficient effort responding in large-scale survey data—Development, evaluation, and application of a screen-time-based weighting procedure. Behavior Research Methods, 1–22. https://doi.org/10.3758/s13428-022-02053-6

[5] Nagy, G., & Ulitzsch, E. (2022). A multilevel mixture IRT framework for modeling response times as predictors or indicators of response engagement in IRT models. Educational and Psychological Measurement, 82(5), 845–879. https://doi.org/10.1177/00131644211045351

[6] Nagy, G., Ulitzsch, E., & Lindner, M. A. (2023). The role of rapid guessing and test-taking persistence in modelling test-taking engagement. Journal of Computer Assisted Learning, 39(3), 751–766. https://doi.org/10.1111/jcal.12719

[7] Ulitzsch, E., He, Q., Ulitzsch, V., Molter, H., Nichterlein, A., Niedermeier, R., & Pohl, S. (2021). Combining clickstream analyses and graph-modeled data clustering for identifying common response processes. Psychometrika, 86, 190–214. https://doi.org/10.1007/s11336-020-09743-0

[8] Ulitzsch, E., He, Q., & Pohl, S. (2022). Using sequence mining techniques for understanding incorrect behavioral patterns on interactive tasks. Journal of Educational and Behavioral Statistics, 47(1), 3–35. https://doi.org/10.3102/10769986211010467

[9] Ulitzsch, E., Ulitzsch, V., He, Q., & Lüdtke, O. (2023). A machine learning-based procedure for leveraging clickstream data to investigate early predictability of failure on interactive tasks. Behavior Research Methods, 55(3), 1392–1412. https://doi.org/10.3758/s13428-022-01844-1

Multilevel models

Social context features, such as instruction or the social composition of a school, are important determinants of school learning outcomes. Multilevel structural equation models allow context effects to be modeled and can correct for both measurement and sampling error ("doubly latent" models) [1,2]. Moreover, it has been shown that Bayesian methods [3] can improve the estimation of multilevel structural equation models in problematic data constellations (e.g., a small number of classes, low reliability). Additional focus has been placed on the analysis of more complex multilevel structures, such as those encountered when collecting network data (e.g., round-robin designs in which students assess each other) or when assessing instruction from multiple perspectives (e.g., students, teachers, external observers) [4]. A general approach to estimating the Social Relations Model (SRM) was developed by integrating multilevel models with cross-classified random effects and structural equation models [5,6]. The approach is implemented in the R package srm.
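
The model class can be sketched in lavaan's two-level syntax, using lavaan's built-in Demo.twolevel example data; this illustrates a latent variable measured at both levels, not the specific doubly latent specifications evaluated in [1,2].

    # Minimal sketch: a two-level SEM in lavaan with a factor measured at the
    # within and the between level ("doubly latent"), using built-in example data.
    library(lavaan)

    model <- '
      level: 1
        fw =~ y1 + y2 + y3      # within-cluster latent factor
        fw ~ x1 + x2 + x3
      level: 2
        fb =~ y1 + y2 + y3      # between-cluster (contextual) latent factor
        fb ~ w1 + w2
    '
    fit <- sem(model, data = Demo.twolevel, cluster = "cluster")
    summary(fit)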

[1] Lüdtke, O., Marsh, H. W., Robitzsch, A., Trautwein, U., Asparouhov, T., & Muthén, B. (2008). The multilevel latent covariate model: A new, more reliable approach to group-level effects in contextual studies. Psychological Methods, 13, 203–229. https://psycnet.apa.org/doi/10.1037/a0012869

[2] Lüdtke, O., Marsh, H. W., Robitzsch, A., & Trautwein, U. (2011). A 2x2 taxonomy of multilevel latent contextual models: Accuracy-bias trade-offs in full and partial error correction models. Psychological Methods, 16, 444–467. https://psycnet.apa.org/doi/10.1037/a0024376

[3] Zitzmann, S., Lüdtke, O., Robitzsch, A., & Marsh, H. W. (2016). A Bayesian approach to estimating latent contextual models. Structural Equation Modeling, 23, 661–679. https://doi.org/10.1080/10705511.2016.1207179

[4] Lüdtke, O., Robitzsch, A., Kenny, D. A., & Trautwein, U. (2013). A general and flexible approach to estimating the social relations model using Bayesian methods. Psychological Methods, 18, 101–119. https://psycnet.apa.org/doi/10.1037/a0029252

[5] Nestler, S., Lüdtke, O., & Robitzsch, A. (2022). Analyzing longitudinal social relations model data using the social relations structural equation model. Journal of Educational and Behavioral Statistics, 47, 231–260. https://doi.org/10.3102/10769986211056541

[6] Nestler, S., Lüdtke, O., & Robitzsch, A. (2020). Maximum likelihood estimation of a social relations structural equation model. Psychometrika, 85, 870–889. https://doi.org/10.1007/s11336-020-09728-z

Missing data methods

Statistical analyses in educational research are often complicated by missing data, that is, data that are not available for every person selected for a study because some people omit individual questions or do not participate at all. Missing values in a data set can lead to less efficient and biased parameter estimates. Multiple imputation (MI) uses an imputation model to generate several replacement values for each missing observation, thereby accounting for the uncertainty associated with the replacement. The department focuses on the following research topics:

  1. multiple imputation of data with hierarchical [1,2], cross-classified [3], or multiple-membership multilevel structure,
  2. imputation of data when analysis models with nonlinear effects are of interest [4,5],
  3. statistical inference for multiply imputed data sets [6,7] (see the sketch after this list),
  4. imputation of data with a large number of variables,
  5. methods for generating synthetic data [8].
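
The basic MI workflow can be sketched with the mice package and its built-in nhanes example data; the department's R package mdmb [5] implements the sequential modeling approach for models with nonlinear effects.

    # Minimal sketch: multiple imputation and pooling by Rubin's rules with
    # the mice package, using its built-in nhanes example data.
    library(mice)

    imp <- mice(nhanes, m = 20, method = "pmm",
                seed = 1, printFlag = FALSE)   # 20 imputed data sets
    fit <- with(imp, lm(bmi ~ age + hyp))      # fit the analysis model in each
    pool(fit)                                  # combine estimates (Rubin's rules)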

[1] Grund, S., Lüdtke, O., & Robitzsch, A. (2018). Multiple imputation of multilevel data in organizational research. Organizational Research Methods, 21(1), 111–149. https://doi.org/10.1177/1094428117703686

[2] Grund, S., Lüdtke, O., & Robitzsch, A. (2018). Multiple imputation of missing data at level 2: A comparison of fully conditional and joint modeling in multilevel designs. Journal of Educational and Behavioral Statistics, 43(3), 316–353. https://doi.org/10.3102/1076998617738087

[3] Grund, S., Lüdtke, O., & Robitzsch, A. (2023). Handling missing data in cross-classified multilevel analyses: An evaluation of different multiple imputation approaches. Journal of Educational and Behavioral Statistics, 48, 454–489. https://doi.org/10.3102/10769986231151224

[4] Lüdtke, O., Robitzsch, A., & West, S. G. (2020). Regression models involving nonlinear effects with missing data: A sequential modeling approach using Bayesian estimation. Psychological Methods, 25, 157–181. http://dx.doi.org/10.1037/met0000233

[5] Grund, S., Lüdtke, O., & Robitzsch, A. (2021). Multiple imputation of missing data in multilevel models with the R package mdmb: A flexible sequential modeling approach. Behavior Research Methods, 53, 2631–2649. https://doi.org/10.3758/s13428-020-01530-0

[6] Grund, S., Lüdtke, O., & Robitzsch, A. (2016). Pooling ANOVA results from multiply imputed datasets: A simulation study. Methodology, 12, 75–88.

[7] Grund, S., Lüdtke, O., & Robitzsch, A. (2023). Pooling methods for likelihood ratio tests in multiply imputed data. Psychological Methods. https://doi.org/10.1037/met0000556

[8] Grund, S., Lüdtke, O., & Robitzsch, A. (2022). Using synthetic data to improve the reproducibility of statistical results in psychological research. Psychological Methods.

Estimation of causal effects

From the perspective of evidence-based education research, robust causal inferences about the effectiveness of targeted changes in the education system are of particular interest. The research focuses on statistical methods that allow at least a tentative causal interpretation of patterns of association even in the absence of randomization. Areas of interest include the evaluation of different weighting approaches (e.g., propensity score weights) for estimating causal effects when the data have a multilevel structure and the treatment is at Level 1 (e.g., students receive tutoring vs. no tutoring) [1]. Further work has focused on the potential of longitudinal data for estimating causal effects [2]. In empirical educational research, cross-lagged panel designs are often implemented in which (at least) two variables, X and Y, are measured repeatedly over time. Conditions that must be met for a causal interpretation of cross-lagged effects have been highlighted [3]. A further focus is the estimation of longitudinal treatment effects [4].
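
The single-level core of the weighting idea can be sketched in base R as follows; the data and variable names are hypothetical, and the multilevel weighting approaches evaluated in [1] extend this basic scheme.

    # Minimal sketch (base R): inverse-probability-of-treatment weighting with
    # a logistic propensity score model; data and variable names are hypothetical.
    set.seed(1)
    n <- 500
    dat <- data.frame(ses = rnorm(n), pretest = rnorm(n))
    dat$tutoring <- rbinom(n, 1, plogis(0.5 * dat$ses + 0.5 * dat$pretest))
    dat$posttest <- 0.3 * dat$tutoring + 0.5 * dat$pretest + rnorm(n)

    # Estimate propensity scores and form ATE weights
    ps <- fitted(glm(tutoring ~ ses + pretest, data = dat, family = binomial))
    w  <- ifelse(dat$tutoring == 1, 1 / ps, 1 / (1 - ps))

    # Weighted difference in outcome means estimates the average treatment effect
    with(dat, weighted.mean(posttest[tutoring == 1], w[tutoring == 1]) -
              weighted.mean(posttest[tutoring == 0], w[tutoring == 0]))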

[1] Fuentes, A., Lüdtke, O., & Robitzsch, A. (2022). Causal inference with multilevel data: A comparison of different propensity score weighting approaches. Multivariate Behavioral Research, 57, 916–939. https://doi.org/10.1080/00273171.2021.1925521

[2] Lüdtke, O., & Robitzsch, A. (2023). ANCOVA vs. change score for the analysis of two-wave data. The Journal of Experimental Education. https://doi.org/10.1080/00220973.2023.2246187

[3] Lüdtke, O., & Robitzsch, A. (2022). A comparison of different approaches for estimating cross-lagged effects from a causal inference perspective. Structural Equation Modeling, 29, 888–907. https://doi.org/10.1080/10705511.2022.2065278

[4] Lüdtke, O., & Robitzsch, A. (2020). Commentary regarding the section 'Modeling the effectiveness of teaching quality': Methodological challenges in assessing the causal effects of teaching. Zeitschrift für Pädagogik, 66, 210–222. https://psyarxiv.com/bpk4a/

Applications of statistical methods

The final topic area covers the application of statistical methods to answer substantive questions. The focus is on the following areas:

  1. development and structure of vocational interests [1,2],
  2. methodological case studies in large-scale assessment studies [3,4],
  3. assessment of school context and instructional effects [5,6],
  4. modeling individual personality and motivation [7,8,9].

[1] Etzel, J. M., Krey, L., & Nagy, G. (2023). We’ve come full circle: The universality of people-things and data-ideas as core dimensions of vocational interests. Journal of Vocational Behavior. https://doi.org/10.1016/j.jvb.2023.103897

[2] Etzel, J. M., & Nagy, G. (2021). Stability and change in vocational interest profiles and interest congruence over the course of vocational education and training. European Journal of Personality, 35, 534–556. https://doi.org/10.1177/08902070211014015

[3] Robitzsch, A., Lüdtke, O., Goldhammer, F., Kröhne, U., & Köller, O. (2020). Reanalysis of the German PISA data: A comparison of different approaches for trend estimation with a particular emphasis on mode effects. Frontiers in Psychology, 11, 884. https://doi.org/10.3389/fpsyg.2020.00884

[4] Ulitzsch, E., Lüdtke, O., & Robitzsch, A. (2023). The role of response style adjustments in cross-country comparisons – A case study using data from the PISA 2015 questionnaire. Educational Measurement: Issues and Practice. https://doi.org/10.1111/emip.12552

[5] Ruzek, E., Aldrup, K., & Lüdtke, O. (2022). Assessing the effects of student perceptions of instructional quality: A cross-subject within-student design. Contemporary Educational Psychology, 70, 102085. https://doi.org/10.1016/j.cedpsych.2022.102085

[6] Becker, M., Kocaj, A., Jansen, M., Dumont, H., & Lüdtke, O. (2022). Class-average achievement and individual achievement development: Testing achievement composition and peer spillover effects using five German longitudinal studies. Journal of Educational Psychology, 114, 177–197. https://doi.org/10.1037/edu0000519

[7] Jansen, M., Lüdtke, O., & Robitzsch, A. (2020). Disentangling different sources of stability and change in students’ academic self-concepts: An integrative data analysis using the STARTS model. Journal of Educational Psychology, 112, 1614–1631. https://doi.org/10.1037/edu0000448

[8] Meyer, J., Jansen, T., Hübner, N., & Lüdtke, O. (2023). Disentangling the association between big five personality traits and student achievement: Meta-analytic evidence on the role of domain specificity and achievement measures. Educational Psychology Review, 35, 12. https://doi.org/10.1007/s10648-023-09736-2

[9] Wagner, J., Lüdtke, O., & Robitzsch, A. (2019). Does personality become more stable with age? Disentangling state and trait effects for the Big Five across the life span using local structural equation modeling. Journal of Personality and Social Psychology, 116, 666–680. https://doi.org/10.1037/pspp0000203