By Pengda Wang
Rice University
Andrew C. Loignon
Center for Creative Leadership
Sirish Shrestha
Center for Creative Leadership
George C. Banks
University of North Carolina at Charlotte
Frederick L. Oswald
Rice University
Summary
The importance of data sharing in organizational science is well acknowledged, yet the field faces hurdles that impede it, including concerns around privacy, proprietary information, and data integrity. We propose that synthetic data generated using machine learning (ML) could offer one promising way to surmount at least some of these hurdles. Although this technology has been widely researched in computer science, most organizational scientists are not familiar with it. To address this lack of accessible information, we propose a systematic framework for generating and evaluating synthetic data. The framework is designed to guide researchers and practitioners through the intricacies of applying ML technologies to create robust, privacy-preserving synthetic data. Additionally, we present two empirical demonstrations using Generative Adversarial Networks (GANs), an ML method, to illustrate the practical application and potential of synthetic data in organizational science. Through this exploration, we aim to give the community a foundational understanding of synthetic data generation and to encourage further investigation and adoption of these methodologies. By doing so, we hope to foster scientific advancement by enhancing data-sharing initiatives within the field.
Citation
Wang, P., Loignon, A. C., Shrestha, S., Banks, G. C., & Oswald, F. L. (in press). Advancing Organizational Science Through Synthetic Data: A Path to Enhanced Data Sharing and Collaboration. Journal of Business and Psychology, 1-27. https://doi.org/10.1007/s10869-024-09997-w