AI-assisted text analysis for coaching evaluation

By Katelyn McCoy
Center for Creative Leadership

Tanner Landolt
NuView Analytics

Ezekiel Welsh and Caroline DeStefano
Center for Creative Leadership

Summary

This empirical study examines the efficacy of OpenAI’s GPT-4 large language model in thematic coding of qualitative leadership development data. The researchers analyzed 1,500 participant responses across three leadership coaching contexts (open enrollment, custom, and extended custom programs), comparing GPT-4’s performance against that of expert human coders in identifying and applying thematic codes to participant feedback. The investigation employed a three-phase methodology: single-theme tagging, multiple-theme tagging, and multiple-theme tagging with human intervention. Results demonstrate that GPT-4 achieved 55-65% agreement with human consensus coding in single-theme assignment, slightly below the 60-70% agreement rate among human coders. In multiple-theme assignment, GPT-4 identified at least one matching theme 85% of the time but exercised less discretion than human coders, tagging more than twice as many themes per response. The hybrid human-AI approach proved most effective: human refinement of GPT-4’s initial themes reduced the number of tagged themes by nearly half while maintaining 75% accuracy. This methodological investigation contributes to the emerging literature on AI-assisted qualitative analysis, suggesting that while large language models significantly reduce coding time, they currently function best as supplementary tools within human-led research processes rather than as autonomous replacements for expert qualitative analysis.
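The report does not publish its prompts or scoring code. As a rough illustration only, the sketch below shows what the single-theme tagging phase and the percent-agreement comparison described above might look like using the OpenAI Python SDK; the theme list, prompt wording, and helper names are hypothetical assumptions, not the authors' instruments.

    # Hypothetical sketch of single-theme tagging and percent agreement.
    # Theme codebook and prompt wording are illustrative, not the study's own.
    from openai import OpenAI

    client = OpenAI()  # reads OPENAI_API_KEY from the environment

    THEMES = ["self-awareness", "communication", "feedback", "influence"]  # placeholder codes

    def tag_single_theme(response_text: str) -> str:
        """Ask GPT-4 to assign exactly one theme from a fixed codebook."""
        prompt = (
            "You are coding qualitative feedback from a leadership coaching program. "
            f"Choose exactly one theme from this list: {', '.join(THEMES)}. "
            "Reply with the theme only.\n\n"
            f"Response: {response_text}"
        )
        completion = client.chat.completions.create(
            model="gpt-4",
            messages=[{"role": "user", "content": prompt}],
            temperature=0,  # minimize sampling variation for coding consistency
        )
        return completion.choices[0].message.content.strip().lower()

    def percent_agreement(model_tags: list[str], human_tags: list[str]) -> float:
        """Share of responses where the model tag matches the human consensus tag."""
        matches = sum(m == h for m, h in zip(model_tags, human_tags))
        return matches / len(human_tags)

A comparison of this exact-match kind is what a 55-65% model-human agreement figure would summarize; for the multiple-theme phase, a set-overlap check (for example, whether at least one model theme matches a human theme) would replace the exact-match test.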

Citation

McCoy, K., Landolt, T., Welsh, E., & DeStefano, C. (2025). AI-assisted text analysis for coaching evaluation. Center for Creative Leadership. https://doi.org/10.35613/ccl.2025.2066
