Lecture: Can AI-Generated Participants Replace Humans in Linguistic Research?

Version 1.0

A Psycholinguistic Case Study on Perspective-Taking

Collecting data from human participants is slow and costly. Recruitment is difficult, participants are often un- or underpaid, and results are affected by biases such as the Observer’s Paradox and the Hawthorne Effect. In addition, many sample groups are WEIRD (from Western, educated, industrialized, rich, and democratic societies). So can Artificial Intelligence (AI) simulate human participants? To test this, I generated a sample of 600 AI participants using the LLaMA 3.1 model (8B), approximating Germany’s population based on 2024 federal statistics. Their questionnaire responses were compared to data from human participants. No prior computational knowledge is required; everyone interested in discussing the implications of AI in linguistic research is warmly invited!

Collecting data from human participants is a demanding and time-consuming process. Recruiting participants can be difficult, participants are often un- or underpaid, and results are not always reliable due to effects such as the Observer’s Paradox and the Hawthorne Effect. Moreover, participant pools are often WEIRD (coming from Western, educated, industrialized, rich, and democratic societies). So why not use Artificial Intelligence (AI) to simulate human participants? AI-generated data would be faster to obtain and significantly more cost-efficient than traditional data collection with humans.

This case study addresses precisely this question. To investigate it, I generated a sample of 600 AI participants (N = 600) using the LLaMA 3.1 model (8B parameters). The sample approximated the demographic distribution of Germany’s population, based on 2024 data from the Federal Statistical Office (Statistisches Bundesamt). These simulated participants completed a set of questionnaires, and their responses were then compared to data collected from human participants (N = 37).
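As a rough illustration of how a sample of 600 simulated participants could be made to approximate a population distribution, the sketch below allocates participants to demographic strata in proportion to their population shares and turns each into a persona prompt. It is not the procedure used in the study: the age groups, the share values, and the prompt wording are invented placeholders (they are not figures from the Federal Statistical Office), and the function names `allocate`, `build_personas`, and `persona_prompt` are hypothetical.

```python
import random

# PLACEHOLDER shares, invented for illustration only.
# These are NOT figures from the Federal Statistical Office.
AGE_SHARES = {"18-29": 0.17, "30-49": 0.31, "50-64": 0.27, "65+": 0.25}


def allocate(shares, n):
    """Split n participants across strata in proportion to their shares
    (largest-remainder method, so the counts always sum to exactly n)."""
    total = sum(shares.values())
    quotas = {k: n * v / total for k, v in shares.items()}
    counts = {k: int(q) for k, q in quotas.items()}
    leftover = n - sum(counts.values())
    # Give the remaining slots to the strata with the largest fractional parts.
    by_remainder = sorted(quotas, key=lambda k: quotas[k] - counts[k], reverse=True)
    for k in by_remainder[:leftover]:
        counts[k] += 1
    return counts


def build_personas(shares, n, seed=0):
    """Create one persona record per simulated participant, in shuffled order."""
    rng = random.Random(seed)
    counts = allocate(shares, n)
    personas = [{"age_group": g} for g, c in counts.items() for _ in range(c)]
    rng.shuffle(personas)
    return personas


def persona_prompt(persona):
    """Hypothetical prompt wording; the study's actual prompts are not given."""
    return (
        f"You are a study participant living in Germany, aged {persona['age_group']}. "
        "Answer the following questionnaire as this person."
    )


personas = build_personas(AGE_SHARES, 600)
```

Each prompt would then be sent to the language model together with the questionnaire items. A realistic setup would stratify on several attributes jointly (for example age, gender, and education) rather than on age alone.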

The responses generated by LLaMA 3.1 tended to be stereotypical and did not align with the data collected from human participants, particularly in the questionnaire measuring empathy.

Info

Day: 2026-05-16
Start time: 15:20
Duration: 00:30
Room: DOR 24 1.501
Track: Psycholinguistics
Language:

Schedule 79. StuTS