Mobile UI Quality-Control Checklist for AI-Generated Code
AI coding agents don't tell you what they silently add — and asking them to review their own work doesn't help. Here's the 8-point checklist that catches what the agent won't.
AI coding agents — Cursor, Claude Code, Codex — produce mobile UIs that break in consistent, predictable ways: viewport-snapping breakpoints, modals that trap background scroll, touch targets that are visually present but physically untappable, and features that appear in the diff without appearing in the prompt. Asking the agent to self-review before you merge is largely ineffective. This agent-agnostic, 8-point checklist gives you a QA layer to run before every mobile PR, catching the regressions your agent introduced silently.
TL;DR: Run this checklist on every mobile PR that a coding agent touched. The eight checks cover viewport breakpoints, modal behavior, touch target sizing, silent feature additions, navigation regressions, text overflow, keyboard handling, and cross-device smoke testing. Total time: under 15 minutes per PR if you work from the diff.
Why does asking the agent to review its own work fail?
The honest framing first: agent self-review is a trap. As one developer described in a thread on r/Frontend about AI-generated mobile slop, "Asking the agent to review its own work — mostly useless as it hallucinates with its own work." The agent that wrote the broken component evaluates the same code as correct, because its confidence is calibrated to produce output, not audit it.
The silent-addition problem compounds this. A developer who upgraded to Cursor Pro described the experience bluntly in r/cursor: "It tries to be overly helpful and adds a bunch of extra stuff. The worst part is that it doesn't even tell me what it's adding!" You cannot ask the agent to review an addition you don't know exists.
This failure is widespread enough that it spawned a company. Daemons, a Show HN entry, pivoted entirely to cleaning up after coding agents — a product that exists precisely because agents leave a consistent enough mess to build a business around. The problem is especially acute for unattended agent workflows, where the agent runs for hours without oversight and unrequested additions accumulate invisibly until someone opens the diff.
What actually works is a human-authored checklist run against the agent's diff before merge. That is what follows.
What do you need before running this checklist?
Prerequisites:
- Access to the PR diff (GitHub, GitLab, or git diff main...HEAD locally)
- A mobile device or browser DevTools emulator (Chrome → Toggle Device Toolbar covers most checks)
- Your project running locally or on a preview URL
- 15 minutes
No specialized tooling is required. The checklist is designed to be executable during a code review.
The 8-point mobile UI QA checklist
1. Viewport breakpoint audit
AI agents default to breakpoints that look reasonable in a desktop preview but snap incorrectly on real device widths. The typical failure: a breakpoint at 768px for "tablet" and 480px for "mobile" that never accounts for the actual distribution of production traffic — 320px (small and older devices), 375px (iPhone SE and the mini models), 390px (iPhone 14/15), and 414px (iPhone XR and the Plus models).
What to check:
- Open Chrome DevTools → Toggle Device Toolbar
- Test at exactly: 320px, 375px, 390px, 414px, 768px
- Look for layout collapse, element overflow, or overlapping components at any width
# Find breakpoints the agent added in this PR
git diff main...HEAD -- '*.css' '*.scss' '*.tsx' '*.jsx' \
| grep -E '@media|breakpoint|min-width|max-width'
Flag any breakpoint value that did not exist in the codebase before this PR. Any value above 480px that is supposed to target mobile is almost certainly wrong.
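To see which of the PR's new media queries actually fire at each test width, a small console sketch helps (the names and example query values here are ours, not part of any framework; edit QUERIES to match what the diff introduced):

```javascript
// Paste into the DevTools console at each emulated width (320, 375, 390, 414, 768).
const QUERIES = ['(max-width: 480px)', '(min-width: 768px)']; // example values only

function activeQueries(queries, matcher) {
  // matcher is injected so the filtering logic can also run outside a browser
  return queries.filter(q => matcher(q).matches);
}

if (typeof window !== 'undefined') {
  console.log(activeQueries(QUERIES, q => window.matchMedia(q)));
}
```

If two queries both match at 390px, or none match at 375px, the breakpoint set has a gap or an overlap worth flagging in review.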
2. Modal and overlay behavior audit
Modals are the single most consistent failure surface in AI-generated mobile UI. The agent produces a modal that looks correct in a static preview but exhibits one or more of: background scroll not locked, backdrop tap not dismissing, z-index conflicts with native navigation bars, or safe area insets not respected on notched devices (iPhone 14 Pro and newer).
What to check:
- Open the modal → try scrolling the content behind it. If the background scrolls, scroll-lock is broken.
- Tap outside the modal. Does it dismiss? If not, is that intentional or an omission?
- Test on an iPhone with a home indicator — does modal content overlap the bottom safe area?
- Test at 375px — does the modal overflow or clip content at the edges?
// What correct safe area handling looks like in React Native
// (insets come from useSafeAreaInsets() in react-native-safe-area-context)
<View style={{ paddingBottom: insets.bottom }}>
  {/* modal content */}
</View>
A modal without safe area handling renders correctly on Android and visually broken on iPhone. Agents omit this reliably.
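On the web side, the missing piece is usually the body scroll-lock. A minimal sketch of the standard position-fixed lock technique follows; the helper names are ours, and the element is passed in rather than hardcoded so the logic can be exercised outside a browser:

```javascript
// Call lockScroll(document.body, window.scrollY) in the modal's open handler.
function lockScroll(body, scrollY) {
  // Freezing the body at its current offset also stops iOS Safari rubber-banding.
  body.style.position = 'fixed';
  body.style.top = `-${scrollY}px`;
  body.style.width = '100%';
}

// Call on close; caller restores the position with window.scrollTo(0, returnedY).
function unlockScroll(body) {
  const y = -parseInt(body.style.top || '0', 10);
  body.style.position = '';
  body.style.top = '';
  body.style.width = '';
  return y;
}
```

If the agent's modal has no equivalent of this pair, the background-scroll check above will fail on a device even when the desktop preview looks fine.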
3. Touch target size verification
Apple's Human Interface Guidelines set the minimum tap target at 44×44 points; Google's Material Design specification asks for 48×48 dp. AI agents consistently generate icon buttons, close icons, and inline action links at 24×24 or smaller — visually correct, physically untappable on a real device.
What to check:
- Inspect every new icon button, close control, or inline action that appears in the diff
- In Chrome DevTools mobile mode, hover over the element and verify the rendered hit area is at least 44×44px
# Find small interactive elements the agent may have added
git diff main...HEAD \
| grep -A5 'IconButton\|TouchableOpacity\|Pressable\|<button' \
| grep -E 'size=|width:|height:'
A 20px icon inside a 20px container fails this check. A 20px icon inside a 44px container with alignItems: center passes. Agents almost always generate the former.
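Rather than inspecting elements one by one, a console sketch can list every undersized hit area at once (the selector list and helper names are ours; extend the selector to cover your component library):

```javascript
// Run in DevTools with mobile emulation on.
const MIN_TARGET = 44; // Apple HIG minimum; Material Design asks for 48

const tooSmall = rect => rect.width < MIN_TARGET || rect.height < MIN_TARGET;

if (typeof document !== 'undefined') {
  const rows = [...document.querySelectorAll('button, a, [role="button"]')]
    .map(el => ({ el, rect: el.getBoundingClientRect() }))
    .filter(({ rect }) => tooSmall(rect))
    .map(({ el, rect }) => ({ tag: el.tagName, w: rect.width, h: rect.height }));
  console.table(rows);
}
```

Note that this measures the rendered hit area, which is what matters: a 20px icon passes if its container or hitSlop brings the tappable box to 44px.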
4. Unrequested feature inventory
This is the check that prevents the surprises developers are finding months after launch. The community thread in r/SaaS on what breaks in production AI-built apps repeatedly surfaces agent-added logic as the top post-launch pain — entire feature paths that shipped because nobody audited the diff carefully before merging.
The agent writes code that was not in the prompt. Sometimes a "helpful" enhancement. Sometimes a new route. It will not announce any of it.
What to check:
# Every line the agent added (strip deletions for clarity)
git diff main...HEAD | grep '^+' | grep -v '^+++' | less
# New function and component definitions
git diff main...HEAD \
| grep -E '^\+(export default|export const [A-Z]|function [A-Z][a-zA-Z]+)' \
| grep -v '^+++'
# New route or navigation entries
git diff main...HEAD \
| grep -E '^\+.*(Route|Screen|Tab|Stack|router\.)' \
| grep -v '^+++'
Read every addition. For each line you did not explicitly request: understand it, test it, or remove it. "I didn't ask for this" is sufficient justification to revert.
5. Navigation regression check
Agents editing routing or navigation code break back-button behavior, deep link resolution, and tab state persistence in ways that are invisible in a desktop browser and surface only on a physical device.
What to check:
- Navigate to the modified screen → press the hardware back button (Android) or swipe-back gesture (iOS)
- Does the expected previous screen appear?
- If the PR touches routing, test every deep link your app registers
- Navigate away from a modified tab and return — is scroll position preserved?
# Check whether the agent touched navigation-related files
git diff main...HEAD --name-only \
| grep -iE 'navigation|router|routes|stack|tab'
Any navigation file appearing in the diff adds 5 minutes to your review for this check. Budget accordingly — do not skip it.
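The deep-link pass can be semi-scripted. A sketch, assuming a React Native app (the link list is hypothetical; replace it with the schemes your app actually registers, and pass Linking.openURL from react-native as `open`):

```javascript
// Example links only; substitute your app's registered deep links.
const DEEP_LINKS = ['myapp://home', 'myapp://settings', 'myapp://item/42'];

// Fires each link at the connected device/simulator; watch the screen between
// links and press back after each one to verify the navigation stack.
function walkDeepLinks(open, links = DEEP_LINKS) {
  links.forEach(url => open(url));
  return links.length; // how many links were exercised
}
```

In the app you would call walkDeepLinks(Linking.openURL); in a test, any stub function works, which keeps the link inventory itself under version control and reviewable.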
6. Typography and text truncation audit
AI agents set font sizes, line heights, and container widths that look correct in the reference context but overflow or get silently clipped on small device widths. Card components, notification banners, and list items are the highest-frequency failure points.
What to check:
- Find the component in the diff that will render the longest expected text (user names, product descriptions, error messages from your API)
- Test it at 320px
- Look for text that overflows its container, clips without an ellipsis, or wraps in a way that breaks the layout
# Hardcoded font sizes the agent introduced
git diff main...HEAD | grep -E '^\+.*(fontSize|font-size):' | grep -v '^+++'
# Truncation props that may be silently cutting content
git diff main...HEAD | grep -E '^\+.*(numberOfLines|ellipsizeMode|text-overflow)' | grep -v '^+++'
numberOfLines={1} silently truncates any text longer than a single line, including content that is valid, expected, and meaningful to the user. Agents add this as a layout "fix" and it ships invisibly.
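For web surfaces, clipped text can be found mechanically: an element whose scroll size exceeds its client size has content that does not fit its box. A console sketch (names and selector list are ours; expect some false positives, it is a triage aid, not a verdict):

```javascript
// Run in DevTools at 320px emulation to surface clipping candidates.
const overflows = el =>
  el.scrollWidth > el.clientWidth || el.scrollHeight > el.clientHeight;

if (typeof document !== 'undefined') {
  const rows = [...document.querySelectorAll('p, span, h1, h2, h3, li, a')]
    .filter(overflows)
    .map(el => ({ tag: el.tagName, text: el.textContent.trim().slice(0, 40) }));
  console.table(rows);
}
```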
7. Keyboard and input field behavior
On mobile, the virtual keyboard reduces the available viewport height. Components positioned at the bottom of the screen with position: absolute; bottom: 0 are hidden behind the keyboard unless the layout explicitly handles it. Agents generate these without KeyboardAvoidingView or equivalent handling at a reliable rate.
What to check:
- Open any screen with a text input → focus the input → verify no meaningful UI element is hidden behind the keyboard
- Check that submit buttons and form actions remain accessible with the keyboard open
- Test on both iOS (the keyboard overlays the view, so the layout must move content up itself) and Android (with adjustResize, the keyboard shrinks the viewport)
// React Native — correct keyboard handling for any form screen
import { KeyboardAvoidingView, Platform } from 'react-native';

<KeyboardAvoidingView
  behavior={Platform.OS === 'ios' ? 'padding' : 'height'}
  style={{ flex: 1 }}
>
  {/* form content */}
</KeyboardAvoidingView>
Any input UI the agent added without KeyboardAvoidingView (React Native) or android:windowSoftInputMode="adjustResize" in the manifest (native Android) will fail on a physical device.
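The web counterpart uses the VisualViewport API. A sketch, assuming a bottom-anchored action bar (the element id is hypothetical): shift the bar up by however many pixels the keyboard currently occupies.

```javascript
// Pure helper: how much vertical space the keyboard takes right now.
function keyboardOffset(viewportHeight, windowHeight) {
  return Math.max(0, windowHeight - viewportHeight); // 0 when the keyboard is closed
}

if (typeof window !== 'undefined' && window.visualViewport) {
  window.visualViewport.addEventListener('resize', () => {
    const bar = document.getElementById('form-actions'); // your bottom action bar
    const px = keyboardOffset(window.visualViewport.height, window.innerHeight);
    if (bar) bar.style.transform = `translateY(-${px}px)`;
  });
}
```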
8. Cross-device smoke test
After the seven targeted checks, run a 3-minute end-to-end smoke test through every modified screen. The targeted checks catch specific failure modes; the smoke test catches interaction effects between them and regressions the earlier checks didn't anticipate.
What to run:
- Start from app launch or the deepest entry point touched by the PR
- Navigate to every modified screen
- Perform the primary action on each screen
- Navigate back to the starting point
Test on at least one iOS and one Android device. For high-risk PRs or PRs touching core navigation, mobile test automation tooling can run this on a device farm with consistent coverage. For production SaaS where visual regressions routinely slip past unit tests, adding a baseline screenshot comparison step here pays off after the first incident it catches.
How do you automate the discovery phase?
Checks 1, 4, 5, 6, and 7 involve scanning the diff for mechanical patterns — these can be partially automated. The judgment calls (is this addition intentional? does this modal interaction feel right?) remain human work.
#!/bin/bash
# mobile-qa-scan.sh — run at the start of every mobile PR review
echo "=== Breakpoints introduced ==="
git diff main...HEAD -- '*.css' '*.scss' '*.tsx' '*.jsx' \
| grep -E '(@media|breakpoint)'
echo "=== New exports and components ==="
git diff main...HEAD \
| grep -E '^\+(export default|export const [A-Z]|function [A-Z])' \
| grep -v '^+++'
echo "=== Navigation files touched ==="
git diff main...HEAD --name-only \
| grep -iE 'navigation|router|routes|stack|tab'
echo "=== Inputs without keyboard handling ==="
git diff main...HEAD \
| grep -E '^\+.*(TextInput|<input)' \
| grep -v '^+++'
echo "=== Truncation props added ==="
git diff main...HEAD \
| grep -E '^\+.*(numberOfLines|ellipsizeMode)' \
| grep -v '^+++'
Run this script, review the flagged output, then proceed to the manual checks. It is a triage layer, not a replacement for a proper end-to-end regression baseline. Post the output as a comment in the PR before you start reviewing — you can validate the scope and follow up on any flagged item from wherever you are, including reviewing your agent's code changes from your phone.
What should you do when a check fails?
- Document it specifically in the PR — note which check failed and the exact symptom ("Check 2: modal background scrolls on iOS; scroll-lock missing")
- Give the agent a precise fix prompt — "The modal is missing overflow: hidden on the body when it opens. Add it to the modal open handler." Specific beats vague every time.
- Re-run checks 1, 2, and 4 after the fix — agents fixing one issue will break adjacent things. Breakpoints, modal behavior, and the feature inventory are the most likely to regress during a targeted fix pass.
- If the fixup commit adds new lines, re-run the full inventory — a fixup can introduce as much unrequested code as the original change.
FAQ
How do I catch mobile UI regressions introduced by AI coding agents?
Run a structured 8-point checklist before merging any AI-generated mobile PR. The highest-leverage checks: viewport breakpoints at 320px, 375px, and 390px; modal scroll-lock and safe-area inset handling; touch targets minimum 44×44px; and a line-by-line diff scan for additions outside the original prompt. Each check takes 1–3 minutes and catches failure modes that agent self-review reliably misses.
Why does my AI coding agent add features I didn't ask for?
Large language models are optimized to produce complete, polished output — not to scope strictly to the prompt. An agent asked to "fix the modal" may adjust button styles, add an animation, or refactor a nearby component without announcing any of it. The only reliable defense is a diff audit before merge that specifically scans for additions outside the original task using git diff main...HEAD | grep '^+' | grep -v '^+++'.
Is asking an AI coding agent to review its own code effective for mobile UI work?
No. Agents evaluate their own output with the same confidence they generated it. A broken viewport breakpoint or a missing KeyboardAvoidingView looks correct to the model that wrote it. Human review against a structured checklist consistently catches what agent self-review misses, particularly for layout and interaction issues that require a real device to surface.
What mobile UI problems appear most often in production AI-generated code?
The five highest-frequency failures based on community reports: (1) breakpoints that don't account for real device widths in the 320–414px range, (2) modals without background scroll-lock, (3) touch targets below 44px, (4) text inputs obscured by the virtual keyboard, and (5) unrequested additions to routing or navigation logic. These appear in roughly that order of frequency.
How long does this mobile QA checklist take to run?
The full 8-point checklist takes approximately 15 minutes on a PR of typical scope. Running mobile-qa-scan.sh first narrows the focus — if no navigation files appear in the diff, Check 5 takes under a minute. Check 4 (feature inventory) and Check 8 (smoke test) scale with PR size and are the most time-variable. On a large PR, budget 25–30 minutes.
This post is published by Grass — a VM-first compute platform that gives your coding agent a dedicated virtual machine, accessible and controllable from your phone. Works with Claude Code and OpenCode.