Statistical Evaluation of AI-Enabled Training Micro-Agents: A Longitudinal Analysis of Adoption and Learning Efficiency


Dr. Smrite Goudhaman (Corresponding Author)
Published: 24/03/2026
Keywords: AI agents; Human-in-the-loop governance; Learning analytics; Longitudinal study; Training effectiveness; Technology acceptance; Trust calibration; Frontline workforce; Hospitality operations; Responsible AI

This study presents a longitudinal evaluation of the effectiveness of AI-enabled training in a frontline hospitality setting, extending a preceding pilot conducted as part of a doctoral research project. The pilot, designed as a controlled study, recruited 100 frontline employees between September and November 2024 and evaluated learning outcomes under supervised conditions. Following completion of the doctoral research, the AI-enabled training intervention was rolled out into live operations during 2025, enabling a longitudinal evaluation of learning outcomes in a real-world setting.

The independent variable is the mode and maturation of AI-enabled training, operationalized as the transition from a supervised pilot to an operational deployment that combines AI micro-agents with human supervision. The dependent variables are learning adoption and learning efficiency, operationalized through objective learning-platform trace data: course completion, assessment performance, and time-on-task. Because the study context is characterized by high workforce turnover, learning efficiency is a particularly important dependent variable; an exposure-adjusted active-employee framework is therefore employed to minimize attrition bias and identify patterns among employees who were actually active during each period.
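The exposure-adjusted outcome measures described above can be sketched as follows. This is a minimal illustration only: the record schema, field names, and values are hypothetical, not the study's actual data model.

```python
from statistics import mean

# Hypothetical learning-platform trace records; the "active" flag marks
# employees still employed (exposed) during the evaluation window.
records = [
    {"emp": "A", "active": True,  "completed": True,  "score": 84,   "minutes": 6.2},
    {"emp": "B", "active": True,  "completed": False, "score": None, "minutes": 3.1},
    {"emp": "C", "active": False, "completed": True,  "score": 91,   "minutes": 9.8},
]

# Exposure adjustment: restrict to employees active during the window,
# so departed staff do not bias completion or efficiency estimates.
active = [r for r in records if r["active"]]

completion_rate = mean(1.0 if r["completed"] else 0.0 for r in active)
mean_score = mean(r["score"] for r in active if r["score"] is not None)
mean_time = mean(r["minutes"] for r in active)
```

The key design choice is that the denominator for every outcome is the active cohort, not all historical records, which is what makes the adoption and efficiency figures comparable across periods with different turnover.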

Descriptive statistics show a completion rate of 100% in the pilot phase, where all active employees (656 records) completed the training, compared with 86.82% (8,505 of 9,796 records) in the 2025 scaled deployment. A two-proportion z-test indicated that this 13.18-percentage-point difference was significant (z = 11.52, p < 0.001, 95% CI [12.51, 13.85]).
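For illustration, a two-proportion test on the reported counts can be sketched as below, assuming a pooled-variance z statistic and a Wald 95% interval for the difference. The Wald interval reproduces the reported CI; the z statistic depends on the variance estimator used, so the value here may differ from the published one.

```python
from math import sqrt
from scipy.stats import norm

# Reported counts: pilot 656/656 completed; scaled deployment 8,505/9,796
x1, n1 = 656, 656
x2, n2 = 8505, 9796

p1, p2 = x1 / n1, x2 / n2
diff = p1 - p2                      # ~13.18 percentage points

# Pooled two-proportion z-test (pooled proportion under H0: p1 == p2)
p_pool = (x1 + x2) / (n1 + n2)
se_pooled = sqrt(p_pool * (1 - p_pool) * (1 / n1 + 1 / n2))
z = diff / se_pooled
p_value = 2 * norm.sf(abs(z))       # two-sided p-value

# Wald 95% CI for the difference (unpooled SE)
se_unpooled = sqrt(p1 * (1 - p1) / n1 + p2 * (1 - p2) / n2)
lo, hi = diff - 1.96 * se_unpooled, diff + 1.96 * se_unpooled
```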

The most notable effect of the scaled deployment was the improvement in learning efficiency: the mean assessment score increased from 80.10 (SD = 21.55) in the pilot phase to 83.38 (SD = 23.14) in the 2025 deployment, t(1108.70) = 4.29, p < 0.001, a small effect size (d = 0.14), 95% CI [1.78, 4.79]. Mean time-on-task decreased from 10.30 minutes (SD = 11.53) in the pilot phase to 5.98 minutes (SD = 7.82) in the 2025 deployment, t(971.64) = -10.88, p < 0.001, a medium effect size (d = -0.52), 95% CI [-5.09, -3.53].
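A Welch's t-test and Cohen's d can be computed from summary statistics alone. The sketch below uses the reported assessment-score means and standard deviations with hypothetical group sizes (656 and 8,505, borrowed from the completion counts for illustration), so the resulting t and df will not exactly match the published values.

```python
from math import sqrt
from scipy.stats import ttest_ind_from_stats

# Reported summary statistics (assessment scores)
m1, s1 = 80.10, 21.55   # pilot phase
m2, s2 = 83.38, 23.14   # 2025 scaled deployment
# Hypothetical group sizes, assumed for illustration only
n1, n2 = 656, 8505

# Welch's t-test from summary statistics (unequal variances)
t_stat, p_value = ttest_ind_from_stats(m1, s1, n1, m2, s2, n2,
                                       equal_var=False)

# Cohen's d using a pooled standard deviation
sd_pooled = sqrt(((n1 - 1) * s1**2 + (n2 - 1) * s2**2) / (n1 + n2 - 2))
d = (m2 - m1) / sd_pooled
```

Under these assumed group sizes the effect size comes out at d ≈ 0.14, consistent with the small effect reported in the abstract.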

This study contributes to the sparse literature on the performance of AI micro-agents by linking a doctoral pilot to its longitudinal, post-dissertation effects, and it provides a measurement framework for evaluating the effectiveness of AI-enabled learning systems.

