Summary: AI can produce polished survey drafts quickly, but experienced human review is still needed to catch subtle survey-design flaws that weaken data quality.
Generative AI chatbots are an appealing tool for UX researchers who are under pressure to quickly deploy surveys. When given a clear research objective and prompted to use survey-design best practices, genAI tools like ChatGPT and Claude can generate a solid first draft in a matter of seconds.
That said, a solid first draft is still only a starting point. Although these tools can handle some aspects of survey design surprisingly well, they also show weaknesses in other important areas. Understanding where AI performs well — and where it falls short — is essential. Without that judgment, AI can speed up the writing process while introducing problems that weaken the quality of your survey and, ultimately, your research insights.
When guided with clear instructions, generative AI tools can perform reasonably well on several foundational aspects of survey writing. In many cases, they can:
• Generate relevant questions that cover multiple dimensions of the research topic
• Write questions in clear, neutral, and straightforward language
• Avoid certain common issues such as double-barreled wording
• Group questions together based on related topics
• Suggest a mix of both closed-ended and open-ended questions
• Suggest multiple ways to phrase questions and construct response options
These are real strengths. For researchers who need a starting point, especially when time is limited, AI-generated drafts can meaningfully accelerate survey development by providing a solid initial structure.
Despite these strengths, AI-generated surveys often miss important aspects of good survey design. These shortcomings are not always dramatic. In fact, that is what makes them risky: a survey can look polished at first glance while still containing subtle issues that weaken data quality or the respondent experience.
To examine how these issues may show up in practice, I tested ChatGPT 5.4 (Thinking mode), Claude Sonnet 4.6 (Extended Thinking mode), and Claude Opus 4.6 (Extended Thinking mode) using the same survey-writing prompt. I ran the prompt at least twice in each tool to see how the outputs varied across attempts. (Because these tools are evolving quickly, future models may resolve some of the issues described in this article, while possibly introducing new ones of their own.)
Here’s the prompt I used.
Prompt: I'm creating a survey for the discovery stage of developing a new telehealth platform, so my goal is to learn what barriers people face when trying to access care online and what would make the experience feel more useful, trustworthy, and convenient. Results from this survey will be used to inform how to create a telehealth platform that is easy and intuitive for users to use.
Draft survey questions for this survey. Incorporate survey design best practices. Make the survey take no longer than 10 minutes to complete. Give suggestions for page order and question order in a way that will reduce priming. Export the survey draft into a Microsoft Word document.
Several issues emerged across the resulting survey drafts.
### GenAI May Underestimate Respondent Burden
A major limitation is that genAI tends to underestimate how effortful the survey will feel to respondents. Because genAI does not experience the survey as a respondent would, it can miss forms of friction that make a survey feel tiring or frustrating in practice.
For example, genAI tools may:
• Underestimate survey length: A genAI tool may confidently claim that a survey fits within a 10-minute completion time despite containing far too many questions. In other cases, it may generate an overly long draft and suggest which questions to trim. Either way, human review is needed to assess actual burden and cut unnecessary content.
• Recommend grid questions: GenAI may suggest grid questions for presenting rating-scale items, but this approach is generally best avoided. Grid questions increase respondent burden and encourage straightlining, in which participants repeatedly select the same answer across multiple items without fully evaluating each one.
• Generate too many options for multiselect questions: GenAI may produce long lists of answer options (more than 10 choices, in some drafts) for multiselect items. Even when the options are plausible, long lists make questions harder to scan and evaluate, especially when several long multiselect questions appear consecutively.
• Place demographic questions too early: GenAI tools may recommend asking demographic questions at the beginning of the survey. Unless they serve as screeners, demographic questions are usually better placed at the end, where they feel less intrusive and are less likely to discourage participation early on.
• Use inconsistent instructions for multiselect questions: A survey draft may require respondents to Select exactly 3 for one multiselect question and Select up to 2 for another, without a clear reason for the difference. These shifts make the survey feel less predictable, increase cognitive effort, and invite selection errors.
### GenAI May Generate Flawed Response Options
Another issue is that genAI can produce response options that seem reasonable at first glance but do not support high-quality measurement. Writing good response options requires more than coming up with plausible answers. The options need to be complete, mutually exclusive when appropriate, balanced, easy to interpret, and aligned with how respondents naturally think about the topic. These are the kinds of issues an experienced survey designer is more likely to catch, even when the draft sounds polished.
For example, genAI may:
• Omit an “Other” option when one is needed: GenAI may generate a list of plausible answer choices without including Other when the list is not exhaustive. This makes it harder for respondents to answer accurately if none of the listed options fits their situation.
• Generate semantically unbalanced rating scales: In one draft, Claude Opus 4.6 (Extended Thinking) recommended a grid question whose rating scale ran from Very poor to Excellent. These endpoint labels are not parallel, which can make the scale feel uneven and distort how respondents interpret the response options; combined with the grid format, that is two issues that reduce survey quality.
• Create its own rating-scale options: GenAI may invent custom rating-scale options instead of using well-established formats such as Likert-type or semantic differential scales. While these custom labels may sound polished, they can be harder for respondents to interpret and make the results less reliable.
• Use incorrect categories for Likert-type formats: GenAI may produce response options that look like a standard Likert-type format but omit expected categories. For example, Claude Opus 4.6 (Extended Thinking) generated Strongly disagree, Somewhat disagree, Neither agree nor disagree, Somewhat agree, and Strongly agree, leaving out Disagree and Agree. This misalignment with standard practice makes the scale harder to interpret.
### GenAI May Overlook Response Formats Worth Considering
GenAI may not always consider the full range of response formats that could work well for a survey. As a result, it can overlook methodologically strong formats that are well suited to the research goal.
For example, genAI may overlook:
• Semantic differential scales: None of the genAI tools I tested considered a semantic differential scale, even in scenarios where one would be a strong fit (for example, asking respondents to rate a telehealth visit on a 7-point scale anchored by opposing adjectives such as Confusing and Clear). Semantic differential scales are an important format to consider, given their strengths in reducing acquiescence bias and social desirability bias.
• Simple rank order: GenAI tools may not always consider simple rank-order questions (e.g., Rank the following items from ‘Most Important’ to ‘Least Important’). This response format is useful when the goal is to understand participants’ relative priorities.
Here are 7 tips for using genAI chatbots in survey design. The key principle is to use genAI as a drafting tool, but always rely on human expertise for oversight.
### 1\. Start with a Clear Research Objective
GenAI performs best when you give it a specific goal, clear context, and enough background on what you are trying to learn. The more precise you are about the context, the more likely the tool is to generate survey questions that are relevant and actionable.
Your prompt can include aspects such as:
• Your research questions
• The business or product decisions the survey will inform
• Your target audience
• Any parts of the research goal that should not be made too obvious to participants
• Your analysis plan (for instance, noting that you do not intend to gather qualitative data, so the tool doesn’t suggest open-ended questions)
• The ideal survey length
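For instance, a prompt that weaves several of these aspects together might look like the sketch below. The product, audience, and constraints are hypothetical, invented purely for illustration.

```
I'm running a discovery survey for a hypothetical grocery-delivery app.
Research questions: What makes users abandon checkout, and what would make
the payment step feel more trustworthy? Results will inform a checkout
redesign, but don't make that goal obvious to participants. Audience:
adults who have ordered groceries online in the past 6 months. I plan a
quantitative-only analysis, so do not suggest open-ended questions. Target
length: 5 minutes or less. Incorporate survey-design best practices.
```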
### 2\. Prompt the Tool to Follow Survey-Design Best Practices
Do not assume the model will naturally produce good survey questions on its own. Tell it to incorporate survey-design best practices.
You can even explicitly tell it to avoid specific common problems such as double-barreled questions, leading wording, vague terms, overly long answer lists, and unnecessarily burdensome question formats. If you have a paid Claude subscription, consider creating a Claude Skill that documents these best practices.
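If you do create such a Skill, a minimal sketch of what its SKILL.md file might contain is below. The YAML frontmatter fields follow Anthropic’s published Skills format, but the skill name and the instructions themselves are illustrative choices distilled from the best practices discussed in this article, not an official template.

```markdown
---
name: survey-design-best-practices
description: Apply survey-design best practices when drafting or reviewing surveys, questionnaires, or individual survey questions.
---

# Survey-Design Best Practices

When drafting or reviewing survey questions:

- Ask about one thing at a time; avoid double-barreled questions.
- Use clear, neutral, non-leading wording, and define vague terms.
- Keep multiselect option lists short; include an "Other" option whenever the list is not exhaustive.
- Use standard, semantically balanced rating scales (e.g., Likert-type) instead of inventing custom labels.
- Avoid grid questions; they add respondent burden and encourage straightlining.
- Place demographic questions at the end unless they are used as screeners.
- Keep the survey within the stated time budget, and flag questions that could be cut.
```

With a Skill like this enabled, the model can apply these rules whenever a conversation involves survey writing, rather than you restating them in every prompt.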
### 3\. Use AI to Generate Alternatives, Not Final Questions
One of the most useful applications of genAI is producing multiple alternative versions of a question or response set. You can do this with a simple prompt such as “Provide 5 alternatives for this survey question.” This approach makes it easier to compare wordings, identify subtle issues, and refine the draft more efficiently.
In many cases, the best final question may come from combining the strongest parts of several alternatives.
### 4\. Scrutinize Question Wording and Response Options Carefully
Even when a survey item sounds strong at first, both the question wording and the answer choices may still be flawed. Double-check whether the question is clear, specific, and neutral, and whether the response options are exhaustive, mutually exclusive when appropriate, balanced, semantically parallel, and aligned with how respondents naturally think about the topic.
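As an illustration of what to look for, here is an invented example of a flawed item alongside one possible revision; the question is hypothetical, not taken from the tested drafts.

```
Flawed:  How easy and trustworthy was scheduling your last telehealth visit?
         ( Very poor / Fair / Good / Excellent )

Revised: How easy was it to schedule your last telehealth visit?
         ( Very difficult / Somewhat difficult / Neither easy nor difficult /
           Somewhat easy / Very easy )
```

The flawed version is double-barreled (easy and trustworthy), and its response options measure quality rather than ease, with endpoints that are not semantically parallel. The revision asks about one attribute and uses a balanced, fully labeled scale.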
### 5\. Review the Survey for Respondent Burden
Check whether the survey truly fits your ideal length and is as easy to complete as the tool claims. Look for signs of unnecessary burden, such as too many questions, long multiselect lists, inconsistent instructions, intrusive questions placed too early, or burdensome formats like grids that are likely to frustrate respondents.
### 6\. Consider Different Response Formats
Do not limit yourself to the formats genAI happens to suggest. In some cases, another format, such as a semantic differential scale or a simple rank-order question, may match the research goal better than the default structure the tool produces.
### 7\. Pilot with Real People
No matter how polished the draft looks, you still need to see how actual participants respond to it. Running a pilot test with real people can reveal confusion or friction that AI simply cannot detect. In fact, pilot testing is essential to the survey-design process regardless of whether AI was involved, because teams cannot know in advance how real participants will interpret questions.
### Conclusion
Generative AI can produce genuinely strong survey drafts and, with the right prompting, can be a valuable accelerator in the survey-design process. It can help researchers generate useful questions quickly, explore alternative phrasings, and build an initial structure that might otherwise take much longer to create.
But those strong outputs are reliable only when an experienced human actively reviews and refines them. AI can support the work, but it still takes an experienced survey designer to tell the difference between a draft that merely looks good and one that will actually yield useful, trustworthy data. The most effective way to use genAI is not as a replacement for survey expertise, but as a powerful drafting partner that performs best under skilled human supervision.