A recent study from Columbia University has uncovered critical flaws in the use of AI-generated personas for user experience (UX) and social simulations. While synthetic personas promise scalable, cost-effective alternatives to traditional research methods, the findings challenge the core assumptions behind these tools. The implications are relevant not just for UX practitioners, but for any domain relying on user modelling.
Personas are one of the first concepts you come across when you start looking into UX. Traditional persona development typically involves extensive user research: interviews, surveys, data analysis, and synthesis — a process that's resource-intensive but grounded in actual user insights. With the rise of large language models (LLMs), there's growing interest in generating these profiles algorithmically. LLMs promise scalability, narrative richness, and speed. However, the question remains: do they produce personas that reflect real-world people and behaviours?
While LLMs can generate coherent, plausible-sounding personas, recent research suggests that this coherence can be misleading. A growing body of work has investigated how well LLM-generated personas and simulations align with actual human responses. Some studies (e.g., Argyle et al., 2023; Park et al., 2023) have shown that, under specific conditions, LLMs can approximate public opinion distributions or simulate social interaction between agents. However, these early findings largely demonstrated feasibility, not accuracy or reliability at scale.
More recent work has shifted focus toward the limitations and biases inherent in these approaches. Gupta et al. (2023), for example, found that assigning personas to LLMs can introduce reasoning distortions, while Hu and Collier (2024) documented how small variations in persona prompts can lead to significant divergence in responses. Salewski et al. (2023) similarly highlight that LLM-generated outputs are sensitive to how personas are framed, raising concerns about internal consistency and reproducibility. Across this growing literature, one recurring theme stands out: these personas may be well-written and internally coherent, but they are not necessarily representative of real users or populations.
The Study
In the study in question, Li and colleagues developed a systematic framework to evaluate persona generation methods. They categorised personas into four types along a spectrum of AI involvement:
Meta Personas: Demographically accurate profiles based solely on census data with no LLM involvement
Objective Tabular Personas: Building on the census-based profiles by adding factual attributes like occupation and income using LLMs
Subjective Tabular Personas: Further incorporating personality traits and subjective attributes through LLMs
Descriptive Personas: Fully narrative descriptions generated entirely by LLMs
They generated approximately one million personas (!) across six different language models, then tested how these personas "behaved" when simulating opinions on various topics, including political elections and hundreds of questions from the OpinionQA dataset covering issues from climate policy to entertainment preferences.
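To make the tiering concrete, here is a minimal Python sketch of how such layered personas might be assembled. This is not the authors' actual pipeline: the census figures are invented for illustration, and call_llm is a hypothetical placeholder for whichever model client you actually use.

```python
import random

# Illustrative marginals only; real work would use actual census tables.
CENSUS_MARGINALS = {
    "age_bracket": {"18-29": 0.21, "30-44": 0.25, "45-64": 0.33, "65+": 0.21},
    "education": {"high_school": 0.40, "bachelor": 0.35, "graduate": 0.25},
    "region": {"northeast": 0.17, "midwest": 0.21, "south": 0.38, "west": 0.24},
}

def sample_meta_persona(rng: random.Random) -> dict:
    """Tier 1 (Meta): demographic profile drawn from census marginals, no LLM involved."""
    return {
        attr: rng.choices(list(dist), weights=list(dist.values()))[0]
        for attr, dist in CENSUS_MARGINALS.items()
    }

def call_llm(prompt: str) -> str:
    """Hypothetical placeholder; wire up your own LLM client here."""
    raise NotImplementedError

def add_objective_attributes(persona: dict) -> dict:
    """Tier 2 (Objective Tabular): add factual attributes such as occupation and income."""
    prompt = f"Given this demographic profile {persona}, suggest a plausible occupation and income."
    return {**persona, "objective": call_llm(prompt)}

def add_subjective_attributes(persona: dict) -> dict:
    """Tier 3 (Subjective Tabular): layer on personality traits and values."""
    prompt = f"Given {persona}, describe this person's personality traits and values."
    return {**persona, "subjective": call_llm(prompt)}

def make_descriptive_persona(persona: dict) -> str:
    """Tier 4 (Descriptive): a fully narrative description generated entirely by the LLM."""
    return call_llm(f"Write a rich narrative persona consistent with: {persona}")

if __name__ == "__main__":
    rng = random.Random(42)
    print(sample_meta_persona(rng))  # tiers 2-4 require a real LLM client
```

Each additional tier layers more LLM-generated content on top of the census-grounded base, which is exactly the dimension along which the study observed increasing divergence.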
The study revealed a consistent and concerning pattern: the more LLM-generated content was incorporated into personas, the more their simulated opinions diverged from real-world data.
For example, when simulating the 2024 U.S. presidential election, the most basic personas (with minimal LLM influence) produced results reasonably aligned with actual electoral outcomes. The fully LLM-generated personas, however, predicted Democratic victories across all states, a clear (and unfortunate) divergence from reality.
Similar patterns emerged across most domains. The researchers found that LLM-generated personas consistently favoured:
Environmental considerations over economic factors
Liberal arts education over STEM fields
Artistic entertainment over mainstream options
A particularly telling discovery came through sentiment analysis of the persona descriptions themselves. LLM-generated personas exhibited increasingly positive sentiment and higher subjectivity as more details were added, often portraying idealised individuals with strong community values and minimal life challenges, not the complex, sometimes contradictory people we encounter in actual user research.
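The paper's exact sentiment tooling isn't reproduced here, but you can run a similar sanity check on your own persona descriptions by scoring them for polarity and subjectivity. The sketch below uses TextBlob as one possible (assumed) choice, with made-up example texts.

```python
# pip install textblob
from textblob import TextBlob

def sentiment_profile(persona_text: str) -> dict:
    """Score a persona description for polarity (-1 to 1) and subjectivity (0 to 1)."""
    sentiment = TextBlob(persona_text).sentiment
    return {"polarity": sentiment.polarity, "subjectivity": sentiment.subjectivity}

examples = {
    "objective": "A 42-year-old accountant from Ohio with a bachelor's degree.",
    "descriptive": (
        "A passionate, community-minded volunteer who thrives on helping others "
        "and finds deep joy in sustainable living and the arts."
    ),
}

for tier, text in examples.items():
    print(tier, sentiment_profile(text))
# If the study's pattern holds, the descriptive text scores higher on both measures.
```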
Why This Happens
The researchers identified several mechanisms behind these findings:
First, LLMs are trained on content that likely overrepresents certain demographic groups and perspectives. Despite efforts to diversify training data, these models still reflect existing imbalances in who creates and publishes content.
Second, the safety alignment techniques used in LLM development may inadvertently introduce ideological skews by steering model outputs toward what are deemed more acceptable or "safe" responses.
Third, there appears to be a strong "positivity bias" in how LLMs generate persona descriptions, creating profiles that are more successful, better adjusted, and more socially conscious than realistic population distributions would suggest.
The Path Forward
Recognising these issues, the researchers advocate for the development of a more rigorous, methodologically grounded “science of persona generation”. Their work outlines several key directions for improving how synthetic personas are built and validated.
One foundational challenge lies in identifying the essential information needed to create effective personas. It is not enough to list attributes — researchers need to determine which kinds of data actually shape realistic simulation outcomes. This includes demographic characteristics, but also psychographic traits like values and lifestyle, behavioural history, and contextual information such as social environment or current events. Current research offers mixed results on which variables matter most, and under what conditions.
Another issue is calibration. Existing datasets, like the U.S. Census, only offer marginal distributions for attributes like income or education, making it difficult to generate realistic combinations across multiple dimensions. The researchers emphasise the need for better sampling and calibration methods that can reconcile these gaps and ensure synthetic personas reflect actual population-level joint distributions.
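One standard way to tackle this is iterative proportional fitting (raking), which rescales a seed joint table until its marginals match published totals; persona attribute combinations can then be sampled from the fitted joint distribution. The sketch below is only an illustration of the idea, with invented marginals, and is not the authors' calibration method.

```python
import numpy as np

def ipf(seed, row_targets, col_targets, iters=100, tol=1e-9):
    """Iterative proportional fitting: rescale a seed joint table until its
    row and column marginals match the published (e.g., census) totals."""
    table = np.asarray(seed, dtype=float).copy()
    for _ in range(iters):
        table *= (row_targets / table.sum(axis=1))[:, None]  # match row marginals
        table *= (col_targets / table.sum(axis=0))[None, :]  # match column marginals
        if (np.abs(table.sum(axis=1) - row_targets).max() < tol
                and np.abs(table.sum(axis=0) - col_targets).max() < tol):
            break
    return table

# Invented marginals: income bracket (rows) x education level (columns).
income = np.array([0.30, 0.50, 0.20])      # low / mid / high
education = np.array([0.40, 0.35, 0.25])   # high school / bachelor / graduate
seed = np.ones((3, 3))                     # uninformative prior over the joint

joint = ipf(seed, income, education)
print(joint.round(3))  # sample persona attribute combinations from this joint table
```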
To support research and evaluation, the authors propose building a large-scale, open-source benchmark dataset for persona generation. Such a resource would enable consistent comparisons across models, serve as training data for new techniques, and offer a reference library of high-quality, demographically grounded personas. They note that such an effort would require careful attention to privacy, as well as substantial resources, but the long-term benefits to both research and practice would be considerable.
Finally, they call for interdisciplinary collaboration. Persona-driven simulation has potential across many fields, from UX and behavioural design to economics, political science, and public health. Developing reliable, ethically sound persona systems will require input from both AI researchers and domain experts. Understanding how these synthetic personas perform in real-world applications, and where they fall short, is essential for guiding their responsible use.
Implications for UX Practice
For UX professionals, these findings are a prompt to think critically about how we integrate AI-generated personas (and insights) into our work.
The representational gap: LLM-generated personas may systematically underrepresent certain perspectives and user groups, particularly those that diverge from mainstream or idealised narratives. This creates a representational gap that could lead to products and services that fail to meet the needs of significant user segments. For example, if personas consistently present users as technologically savvy, environmentally conscious, and oriented toward artistic experiences, we might miss designing for users with different priorities and constraints.
Challenge of validating them: How do we know when an AI-generated persona is accurate? The research suggests that traditional validation methods may be insufficient, as these personas can appear internally consistent and plausible while still diverging significantly from real-world behaviours. This requires the development of more sophisticated validation approaches that compare synthetic persona perspectives against empirical data from actual user populations.
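As a minimal illustration of what such a check could look like, the sketch below compares a synthetic persona panel's answer shares with a real survey's shares for one question using total variation distance. All figures and the flagging threshold are invented for the example; they are not from the paper.

```python
import numpy as np

def total_variation(p, q):
    """Total variation distance between two answer distributions (0 = identical, 1 = disjoint)."""
    p = np.asarray(p, dtype=float); p /= p.sum()
    q = np.asarray(q, dtype=float); q /= q.sum()
    return 0.5 * np.abs(p - q).sum()

# Hypothetical answer shares for one OpinionQA-style question (options A-D).
real_survey = [0.22, 0.41, 0.25, 0.12]       # from an actual human sample
synthetic_panel = [0.05, 0.30, 0.45, 0.20]   # aggregated synthetic persona responses

distance = total_variation(real_survey, synthetic_panel)
print(f"TV distance: {distance:.2f}")
if distance > 0.2:  # illustrative threshold, tune to your own tolerance for error
    print("Flag this persona set for review before letting it drive decisions.")
```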
Domain-specific considerations: The study reveals that certain domains are particularly susceptible to divergence. When working on products related to political choices, environmental decisions, educational content, or cultural preferences, extra scrutiny of AI-generated personas is warranted. For example, a financial application designed based on AI personas might overemphasise sustainability features while undervaluing cost-saving functions that real users might prioritise.
Some Suggestions
Our initial reaction might be to reject AI-generated personas outright, but functions outside UX will probably start, or keep, using them regardless. As a result, we have to develop a more nuanced approach.
Always start with real research. There is no substitute for direct observation and real data. Understanding users requires witnessing their actual behaviours, frustrations, and workarounds, not idealised projections.
Consider a calibrated approach where demographically representative samples form the foundation, with LLM-generated content carefully added in layers that can be validated independently. The research suggests that minimising LLM contribution while focusing on structured attributes produces more accurate results.
When creating AI personas, it's more important than ever to test and validate them properly. We need systematic approaches to validating them against multiple sources (see the sketch after this list), including:
Behavioural data from analytics
Small-sample qualitative research
Existing research
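As a toy example of triangulating against behavioural data, the sketch below compares hypothetical persona-predicted feature adoption with what analytics actually recorded and flags large gaps. Every number and the tolerance are illustrative assumptions, not real product data.

```python
# Hypothetical figures: persona-predicted vs. analytics-observed feature adoption rates.
predicted_by_personas = {"sustainability_report": 0.55, "cost_saver": 0.20, "dark_mode": 0.35}
observed_in_analytics = {"sustainability_report": 0.12, "cost_saver": 0.48, "dark_mode": 0.31}

TOLERANCE = 0.15  # illustrative cut-off for an acceptable prediction gap

for feature, predicted in predicted_by_personas.items():
    observed = observed_in_analytics[feature]
    gap = abs(predicted - observed)
    verdict = "OK" if gap <= TOLERANCE else "DIVERGES - revisit the persona assumptions"
    print(f"{feature:22s} predicted={predicted:.2f} observed={observed:.2f} -> {verdict}")
```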
If you really have to use synthetic personas, it is important to explicitly document their limitations and potential areas of divergence. This methodological transparency is crucial for maintaining research integrity and ensuring stakeholders understand the appropriate weight to give these insights. As researchers, it is our duty to educate stakeholders on these limitations.
Conclusion
The skepticism many UX professionals have expressed about AI-generated personas wasn't just resistance to change; it was rooted in a deep understanding of what genuine user research requires. This study confirms that creating accurate representations of users remains a fundamentally human activity that requires empathy, observation, and methodological rigour. As pressure to adopt AI tools continues to mount, UX professionals can now point to concrete evidence supporting a more measured approach.