Amongst other things, the role of a product team now involves knowing where AI can enhance the traditional product process, and where it can’t.
One place this is showing up is in how teams approach early-stage research. Synthetic interviews are beginning to appear in design and research workflows, with vendors positioning them as a way to move faster, test ideas earlier, and reduce the cost of qualitative research.
It’s an appealing idea. But it’s also one that many practitioners are rightly uncomfortable with.
Product work is rooted in understanding user needs, so the idea of removing people from the research process understandably raises concerns.
Alongside this discomfort, there’s a lack of evidence. There are few documented examples showing where synthetic research has clearly helped or harmed product decisions, and little tracking of whether AI-generated insights lead to better outcomes over time.
On a recent financial services project at Planes, Design Lead Rob Boyett decided to explore this properly. Alongside ongoing qualitative research with users, he ran a parallel set of synthetic interviews using the same discussion guide and personas grounded in existing research.
The aim wasn’t to see whether synthetic research should replace human research. It shouldn’t. Instead, the question was whether it could play a useful role in shaping early hypotheses and making real research more effective.
Qualitative research takes time. Recruiting participants takes time. Interviews take time. Analysis takes even longer. That investment is always worth it, but in the early stages of a project, teams are often working with incomplete information and half-formed assumptions.
If synthetic interviews can reliably reflect human perspectives at that stage, they could help teams explore themes earlier, pressure-test assumptions, and arrive at real interviews with sharper questions.
Equally, if they introduce false confidence or miss important signals, they could do more harm than good.
The best way to understand the trade-offs was to run the two approaches side by side.
Rob ran two parallel studies using the same discussion guide.
One set consisted of ten real user interviews from an existing qualitative study on financial management. The other consisted of ten synthetic interviews, generated using AI personas grounded in prior research.
Rather than using an off-the-shelf platform, Rob built a simple, flexible setup so he could better understand the trade-offs and limitations firsthand.
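Rob’s exact setup isn’t documented in this piece, but a minimal version of the general approach — prompting a language model with a research-grounded persona and walking it through the same discussion guide — might look something like the sketch below. The model choice, persona details, and questions are illustrative assumptions, not his actual implementation.

```python
# Minimal sketch of a synthetic interview loop (illustrative only).
# Assumes the OpenAI Python SDK; the model name, persona, and questions
# are placeholders, not the setup used in Rob's study.
from openai import OpenAI

client = OpenAI()

persona = (
    "You are 'Sam', a 34-year-old renter with an irregular freelance income. "
    "You check your banking app most days and worry about timing bills around "
    "invoices being paid. Answer as this person, in the first person, and keep "
    "answers conversational rather than polished."
)

discussion_guide = [
    "Talk me through how you keep track of your money in a typical month.",
    "Tell me about the last time money felt tight. What happened?",
    "What tools or apps do you use, and what do they not do for you?",
]

messages = [{"role": "system", "content": persona}]

for question in discussion_guide:
    messages.append({"role": "user", "content": question})
    response = client.chat.completions.create(
        model="gpt-4o",      # placeholder model choice
        messages=messages,
        temperature=0.9,     # higher temperature for more varied, less polished answers
    )
    answer = response.choices[0].message.content
    messages.append({"role": "assistant", "content": answer})
    print(f"Q: {question}\nA: {answer}\n")
```

The point of rolling your own loop rather than using a platform is that every design decision — how the persona is written, how much conversation history is carried forward, how the guide is sequenced — stays visible, which is exactly what you need if the goal is to understand the method’s limitations.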
If you’d like to dig into the test setup and see the raw results, you can explore Rob’s full research evaluation here.
At the end of the process, Rob had two sets of transcripts: ten human, ten synthetic, all analysed using the same set of themes.
The obvious question was: how closely do these two sets actually align?
To explore that, he calculated a single score that looked at four things: whether the same themes showed up, whether the emotional tone was similar, whether responses included concrete detail, and whether the language resembled real conversation.
The overall score came out at 85%.
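The write-up doesn’t spell out how those four dimensions were weighted, but conceptually the figure is a composite of four sub-scores. A minimal sketch, assuming each dimension is scored from 0 to 1 and weighted equally (the individual numbers below are illustrative, not Rob’s actual measurements):

```python
# Rough sketch of combining four alignment sub-scores into one figure.
# Equal weighting and the example numbers are assumptions for illustration;
# the write-up doesn't specify how the 85% was actually calculated.

def composite_alignment(theme_overlap, tone_similarity, concreteness, naturalness):
    """Each input is a 0-1 score comparing synthetic transcripts to human ones."""
    scores = [theme_overlap, tone_similarity, concreteness, naturalness]
    return sum(scores) / len(scores)

score = composite_alignment(
    theme_overlap=0.95,     # did the same themes show up?
    tone_similarity=0.85,   # was the emotional tone similar?
    concreteness=0.80,      # did responses include concrete detail?
    naturalness=0.80,       # did the language resemble real conversation?
)
print(f"Overall alignment: {score:.0%}")  # e.g. 85% with these illustrative inputs
```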
That number sits in an uncomfortable place. It’s too high to ignore, but too imperfect to trust outright. On its own, it doesn’t tell you what to do. It simply tells you that synthetic interviews are close enough to deserve attention, and different enough to deserve caution.
Looking at where that score comes from is where things get interesting.
The strongest result was thematic alignment. The synthetic interviews surfaced almost all of the core themes that emerged in the human interviews.
For early-stage work, this is meaningful. If you’re trying to understand what kinds of topics, motivations, or behaviours are likely to matter, synthetic interviews can give you a credible first map of the territory.
A common criticism of AI-generated research is that it’s overly agreeable or artificially positive.
That didn’t show up here. On average, the emotional tone of the synthetic interviews closely matched the human ones. They weren’t noticeably more flattering or more neutral.
They don’t feel human in the way real conversations do, but at an aggregate level, they weren’t emotionally misleading.
Synthetic interviews were also more concrete than expected. They referenced specific examples, behaviours, and products, rather than relying on vague generalisations.
That makes them usable for early hypothesis-building and internal discussion, particularly when teams need something more tangible than gut feel but aren’t ready for full fieldwork.
One clear gap was generational perspective.
Human participants naturally talked about how their parents managed money, how their own behaviour had changed over time, or how different age groups approach financial decisions.
That kind of comparison didn’t appear in the synthetic interviews. The personas were detailed, but they existed in a kind of perpetual present. They lacked observations that come from family history and lived experience.
Real interviews carried a sense of stress, particularly when it came to timing and cash flow pressure.
Synthetic responses touched on these themes, but with less intensity. The result is a calmer, more composed version of reality, which can be useful structurally but risky if taken at face value.
Synthetic interviews reliably surfaced the core themes from the human interviews. Where they fell down was proportion.
They tended to over-cover themes, repeating and elaborating on them far more than people naturally would in conversation.
In practice, this means synthetic research is good at mapping what exists in a problem space, but much weaker at showing how much weight each theme really carries, or how deeply it’s felt. Used on its own, it can give a false sense of how “big” or important a theme actually is.
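One practical way to keep an eye on this is to compare how much of the conversation each theme takes up in the two sets of transcripts, rather than just whether a theme appears at all. A small sketch, assuming transcripts have already been coded into theme-tagged segments (the theme labels and counts are made up for illustration):

```python
# Sketch: compare the *share* of conversation each theme takes up in human
# vs synthetic transcripts, not just whether the theme appears at all.
# Theme labels and counts are invented for illustration.
from collections import Counter

def theme_shares(coded_segments):
    """coded_segments: list of theme labels, one per coded transcript segment."""
    counts = Counter(coded_segments)
    total = sum(counts.values())
    return {theme: count / total for theme, count in counts.items()}

human = ["cash flow", "cash flow", "budgeting apps", "family habits", "bills timing"]
synthetic = ["cash flow", "budgeting apps", "budgeting apps", "budgeting apps", "bills timing"]

human_shares = theme_shares(human)
synthetic_shares = theme_shares(synthetic)

for theme in sorted(set(human_shares) | set(synthetic_shares)):
    h = human_shares.get(theme, 0.0)
    s = synthetic_shares.get(theme, 0.0)
    flag = "  <- over-covered synthetically" if s > h * 1.5 else ""
    print(f"{theme:15} human {h:.0%}  synthetic {s:.0%}{flag}")
```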
Taken together, these gaps point to something important about what synthetic research can and can’t do.
It’s not that synthetic interviews are wrong. The themes are real. The structure of the problem space is broadly accurate. What’s missing is texture. The human-ness of humans.
Human research gives you lived experience. It shows you where things are uncomfortable, unresolved, or emotionally loaded. Synthetic research gives you coverage, but it smooths over those edges.
Synthetic interviews can be useful for:
Exploring a problem space before fieldwork begins
Surfacing likely themes and blind spots
Pressure-testing hypotheses built from existing knowledge
They’re much less suited to:
Capturing emotional intensity or stress
Understanding generational or cultural shifts
Replacing synthesis and sense-making done by a team
Ultimately, synthetic interviews don’t generate new human insight on their own, and they should never be seen as a replacement for talking to people.
What they can do is improve the quality of the early stage, so that when teams do speak to real participants, they’re asking better questions and listening more closely.