AI-powered UX research analysis toolkit that turns raw user test transcripts into structured, evidence-based insights. It applies 14 proven UX methodologies systematically — from verbatim coding to Kano Model classification — so you spend minutes instead of hours on each interview. Built as Claude Code agents, battle-tested on 50+ real user interviews.
Read the agents to get started.
- Claude Code CLI installed
- Interview transcripts from any STT tool (Otter.ai, Whisper, Clova Note, etc.)
git clone https://github.com/Lee-Soyeon/ux-research-agents.git
# Copy agents to your Claude Code config
cp ux-research-agents/agents/*.md ~/.claude/agents/

Deep analysis (14-stage):
@ut-research-analyzer Analyze /path/to/transcript.txt
Hypotheses: "Users will complete onboarding without help"
Sprint analysis (quick):
@ut-transcript-analyzer Analyze /path/to/transcript.txt
Hypothesis: "The new search flow increases task completion rate"
The standalone prompts in prompts/ work with any LLM — copy-paste into ChatGPT, Claude, or Gemini with your transcript:
- prompts/ut-auto-summary.md — Quick sprint retrospective summary
- prompts/ut-deep-analysis-prompt.md — Full 14-stage deep analysis (no agent required)
Tip
Start with the Sprint Transcript Analyzer for quick results, then use the Deep Research Analyzer when you need comprehensive insights.
agents/ut-research-analyzer.md — 1,200+ lines
A 14-stage analysis pipeline that applies established UX research frameworks to raw interview transcripts:
| Stage | Method | Framework |
|---|---|---|
| 1 | Transcript preprocessing & speaker identification | — |
| 2 | Verbatim extraction & semantic coding | Qualitative coding (14 rules) |
| 3 | Behavioral sequence analysis | Timeline mapping |
| 4 | Emotional journey mapping | Peak-end analysis (Kahneman) |
| 5 | Empathy Map | NNG Says/Thinks/Does/Feels |
| 6 | Thematic analysis | Braun & Clarke 6-phase |
| 7 | Affinity mapping | Cluster by type |
| 8 | Jobs-to-be-Done analysis | JTBD (Christensen) |
| 9 | Proto-Persona sketch | NNG methodology |
| 10 | Mental model gap analysis | Don Norman |
| 11 | 7 Stages of Action | Don Norman |
| 12 | 3 Levels of Processing | Don Norman |
| 13 | Hypothesis validation | Evidence-based mapping |
| 14 | Usability issues & Pain/Gain | Nielsen's 10 heuristics |
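Stage 2's coded verbatims are the raw material every later stage builds on. A minimal sketch of what one coded record might look like as a data structure; the field names and the `CodedVerbatim` type are illustrative, not the agent's actual output schema:

```python
from dataclasses import dataclass

@dataclass
class CodedVerbatim:
    """One coded utterance from Stage 2 (verbatim extraction & semantic coding)."""
    speaker: str    # e.g. "P1", assigned during Stage 1 speaker identification
    timestamp: str  # mm:ss position in the recording
    quote: str      # verbatim text, unedited
    code: str       # semantic code assigned under the coding rules

v = CodedVerbatim("P1", "03:42",
                  "I don't really get what this button does",
                  "navigation-confusion")
```

Keeping the timestamp on every record is what lets later stages (behavioral sequencing, emotional journey mapping) order evidence chronologically.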
After analyzing multiple users, run Cross-User Analysis (8 additional stages) to consolidate findings:
| Stage | Method | Framework |
|---|---|---|
| C1 | Verbatim cross-comparison | Pattern matching |
| C2 | Hypothesis cross-validation | Evidence aggregation |
| C3 | Theme cross-mapping | Universal / Major / Segment / Unique |
| C4 | Persona consolidation | 2-3 representative personas |
| C5 | Importance-Satisfaction Gap | Lean Product Playbook (Dan Olsen) |
| C6 | PMF Pyramid mapping | 5-layer product-market fit |
| C7 | Kano Model classification | Must-be / Performance / Delighter |
| C8 | Actionable recommendations | Problem Space vs Solution Space |
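The C5 stage follows Dan Olsen's Importance-Satisfaction framework: needs that users rate as highly important but poorly satisfied are underserved and represent the best opportunities. A minimal sketch of the arithmetic, with illustrative need names and ratings (1-5 scale):

```python
# Illustrative data: aggregated ratings across interviewed users.
needs = {
    "fast onboarding":   {"importance": 4.6, "satisfaction": 4.1},
    "dashboard clarity": {"importance": 4.8, "satisfaction": 2.3},
    "export to CSV":     {"importance": 2.1, "satisfaction": 3.0},
}

def gap(name: str) -> float:
    """Importance-Satisfaction gap: large positive values = underserved needs."""
    return needs[name]["importance"] - needs[name]["satisfaction"]

ranked = sorted(needs, key=gap, reverse=True)
# "dashboard clarity" ranks first: high importance, low satisfaction.
```

A need that is already well satisfied scores low even if important, which is why "fast onboarding" drops below "dashboard clarity" here despite similar importance.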
agents/ut-transcript-analyzer.md
Fast, hypothesis-driven analysis for sprint retrospectives. Tags each user utterance with one of six semantic labels:
| Tag | Meaning | Example |
|---|---|---|
| [PAIN] | Frustration, complaint | "The options are too limited" |
| [AHA] | Positive surprise, delight | "I didn't expect to get so into this" |
| [WTP] | Willingness to pay/reuse | "At $3, I'd consider it" |
| [BEHAV] | Observable behavior | Hesitation at 03:42, repeated exploration |
| [NEED] | Feature request | "It would be nice if it had..." |
| [COMP] | Competitor comparison | "Notion does X, but this..." |
Generates a hypothesis validation verdict (Validated / Partially Validated / Rejected / Insufficient Data), classifies each issue as UX vs. Functional Architecture, and proposes up to three next-sprint actions.
prompts/ut-auto-summary.md — Lightweight sprint retrospective summary. No agent setup required — paste into any LLM.
prompts/ut-deep-analysis-prompt.md — Full 14-stage deep analysis as a standalone prompt. Split into Part 1 (Stages 1-7) and Part 2 (Stages 8-14) for LLMs with limited context windows.
For a complete 14-stage deep analysis output, see examples/sample-deep-analysis.md.
Sprint analysis output (quick)
# Sprint 2 - UT Sprint Summary: User #8
> Testing scope: Onboarding flow + task creation + dashboard comprehension
## User Info
- 24F, college student, no prior experience with this product category
- Segment: New User
## 0. One-line Key Finding
- Onboarding successfully built initial understanding,
but dashboard complexity caused confusion and reduced task completion.
## 1. Tagged Key Utterances
### [PAIN]
> "I don't really get what this button does" (03:42)
> "There are too many things on this screen" (11:20)
### [AHA]
> "Oh wait, this actually makes sense now" (08:15)
### [WTP]
> "If it saved me this much time every week... maybe $5/month?" (22:30)
## 2. Hypothesis Validation
**H1: Users complete onboarding without assistance**
**Verdict: Partially Validated**
| Axis | Verdict | Evidence |
|-----------------|---------|----------------------------------------|
| Task completion | Present | Completed 4/5 steps independently |
| Comprehension | Weak | "What does this icon mean?" (05:12) |
| Satisfaction | Present | "That was pretty straightforward" (07:45) |
## 3. Usability Issues
| Screen | Issue | Heuristic | Severity |
|-----------|-------------------------|--------------------------|----------|
| Dashboard | Icon meaning unclear | Recognition > Recall | 3/4 |
| Settings  | No confirmation on save | System Status Visibility | 2/4 |

This toolkit uses LLMs to automate qualitative analysis. Be aware of inherent limitations:
- No non-verbal cues: LLMs analyze text only. Facial expressions, tone of voice, body language, sighs, and pauses are invisible unless explicitly noted in the transcript.
- Cultural nuance: Sarcasm, politeness norms, and culturally specific expressions may be misinterpreted or missed entirely.
- Hallucination risk: LLMs may generate inferences that sound plausible but are not grounded in the transcript. Always verify codes and themes against the source text.
- Not a replacement: This tool assists researchers; it does not replace them. A qualified UX researcher should review all AI-generated analysis before sharing with stakeholders.
- Single-interview scope: One interview cannot be generalized to a population. Use cross-user analysis with multiple participants before drawing product conclusions.
- Prompt sensitivity: Results may vary across LLM providers, model versions, and even between runs. Treat outputs as a strong first draft, not a final report.
For details on why these specific methodologies were chosen and ordered this way, see Design Rationale.
Built on established, peer-reviewed UX research frameworks:
- Nielsen Norman Group — Empathy Mapping, Persona Development, Usability Heuristics
- Don Norman — Mental Model Gap Analysis, 7 Stages of Action, 3 Levels of Emotional Design
- Braun & Clarke — 6-Phase Thematic Analysis
- Dan Olsen — Lean Product Playbook, Importance-Satisfaction Gap, PMF Pyramid
- Clayton Christensen — Jobs-to-be-Done
- Noriaki Kano — Kano Model
- Daniel Kahneman — Peak-End Rule
ux-research-agents/
├── agents/
│ ├── ut-research-analyzer.md # 14-stage deep analysis (1,200+ lines)
│ └── ut-transcript-analyzer.md # Sprint-level quick analysis
├── prompts/
│ ├── ut-auto-summary.md # Standalone prompt for any LLM
│ └── ut-deep-analysis-prompt.md # Full 14-stage prompt (no agent needed)
├── examples/
│ ├── sample-transcript.md # Fictional sample interview
│ └── sample-deep-analysis.md # Complete 14-stage analysis output
├── docs/
│ └── design-rationale.md # Why these methodologies and this order
├── templates/
│ ├── ut-interview-guide.md # Interview guide template
│ └── hypothesis-template.md # Sprint hypothesis template
├── CONTRIBUTING.md
├── LICENSE
└── README.md
- Core analysis agents (14-stage + sprint-level)
- Cross-user analysis (8-stage, Lean Product Playbook)
- Standalone prompt for any LLM
- Interview guide & hypothesis templates
- PostHog session replay AI analysis
- Playwright-based automated UX testing
- Video/screen recording analysis (mp4 to UX insights)
- Multi-language transcript support
- Integration with Notion/Linear for issue tracking
Contributions welcome. See CONTRIBUTING.md for guidelines.
Built by Soyeon Lee out of the real pain of analyzing 50+ user interviews across multiple product discovery sprints.