Research · 7 April 2026 · 10 min read

How AI Search Engines Choose Which Businesses to Recommend

When someone asks ChatGPT “best accountant in Leeds,” the model follows a process with identifiable signals you can influence. Here’s how the retrieval pipeline actually works.

Awais M.

Founder of GeoRankLocal

When someone asks ChatGPT “best accountant in Leeds” or “reliable plumber near me,” the model doesn’t randomly pick a business. It follows a process. That process has identifiable signals you can influence, measurable biases you can exploit, and known failure modes you can avoid.

Most content about AI search optimisation focuses on what to do. This article focuses on why it works — the actual mechanics behind how AI search engines select which businesses to cite. Once you understand the retrieval pipeline, every GEO tactic makes intuitive sense rather than feeling like a checklist someone pulled off a blog post.

This is the methodology piece. How the machine actually thinks — or more accurately, how it retrieves, ranks, and synthesises.

The retrieval pipeline: what actually happens when someone asks ChatGPT a question

When a user asks ChatGPT a question, the system doesn’t search a static database. It runs a live process called Retrieval-Augmented Generation (RAG). Understanding RAG at a practical level is the foundation of everything in GEO.

Here’s the pipeline, broken into its real stages:

Stage 1 — Query parsing and intent classification. The model reads the user’s query and determines what kind of answer is needed. “Best accountant in Leeds” is a local recommendation query. “How does corporation tax work in the UK” is an informational query. “Should I use Xero or QuickBooks” is a comparison query. The intent classification determines which retrieval strategy the model uses — and crucially, which types of sources it prioritises.
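
To make that branching concrete, here is a toy intent classifier, a minimal sketch assuming simple keyword rules. Production systems use learned classifiers over far richer features; the cue lists below are illustrative, not anything OpenAI has published.

```python
# Toy intent classifier for the three query types named above. Keyword
# rules are enough to show the branching that picks a retrieval strategy;
# the cues themselves are illustrative assumptions.

RULES = [
    ("comparison", ("vs", " or ", "should i use", "better than")),
    ("local_recommendation", ("best", "near me", "reliable", "recommended")),
    ("informational", ("how does", "what is", "how to", "why")),
]

def classify(query: str) -> str:
    q = query.lower()
    for intent, cues in RULES:          # first matching rule wins
        if any(cue in q for cue in cues):
            return intent
    return "informational"              # safe default when nothing matches

for q in ["best accountant in Leeds",
          "how does corporation tax work in the UK",
          "should I use Xero or QuickBooks"]:
    print(f"{q!r} -> {classify(q)}")
```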

Stage 2 — Search fan-out. The model doesn’t run one search. It runs multiple. Research on ChatGPT’s current production model (GPT-5.4) shows it now runs 10+ different “fan-out” queries per user prompt (Chris Long and RESONEO, April 2026). For a query like “best accountant in Leeds,” it might simultaneously search for “accountant Leeds reviews,” “Leeds accounting firms,” “best rated accountant Leeds 2026,” and several variations. This means your content needs to be findable across multiple query phrasings, not just your primary keyword.
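
As a rough sketch of what fan-out expansion might look like, here is a minimal query expander. The templates are guesses at the kinds of reformulations involved; the real fan-out queries are generated by the model and are not public.

```python
# Minimal fan-out sketch: expand one local-intent prompt into several
# search queries. The template choices are illustrative assumptions.

def fan_out(subject: str, location: str, year: int = 2026) -> list[str]:
    templates = [
        f"best {subject} in {location}",            # the user's original phrasing
        f"{subject} {location} reviews",            # review-seeking variant
        f"{location} {subject} firms",              # category-listing variant
        f"best rated {subject} {location} {year}",  # recency-biased variant
        f"{subject} recommendations near {location}",
    ]
    seen: set[str] = set()                          # dedupe, preserving order
    return [q for q in templates if not (q in seen or seen.add(q))]

for query in fan_out("accountant", "Leeds"):
    print(query)
```

The practical takeaway lives in the template list: your page needs to surface for review-style, list-style, and year-qualified phrasings, not just the head term.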

Stage 3 — Candidate retrieval. From all those searches, the model retrieves a set of candidate pages — usually between 10 and 30 across all fan-out queries. These candidates are the raw material the model will work with. If your page isn’t in this candidate set, you cannot be cited. Full stop.

This is where traditional SEO matters for GEO. The candidates mostly come from search engine results. Writesonic’s analysis of over a million AI Overviews found that 40.58% of citations come from Google’s top 10 organic results, rising to 71% for the top 20 (Wowbix, 2026). For Google’s AI Overviews, strong organic rankings are the primary path into the candidate set.

But ChatGPT is different. ChatGPT Search primarily cites lower-ranking pages — position 21 and below — about 90% of the time (Position Digital, 2026). This means a page that doesn’t rank on Google’s first page can still be cited by ChatGPT if it’s findable in the broader search index. Google AI Overviews reward existing SEO authority. ChatGPT rewards content quality and structure regardless of traditional rank position.
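
Here is a toy model of the candidate-set mechanics, assuming a generic `search` callable in place of a real search API. It shows why per-query findability is everything at this stage: a page outside every top slice never enters the set.

```python
# Stage 3 in miniature: union the top results of each fan-out query into
# one deduplicated candidate set. The cap of 30 mirrors the 10-30
# candidate range described above; `search` is a stand-in, not a real API.

def candidate_set(queries, search, per_query=5, cap=30):
    candidates, seen = [], set()
    for q in queries:
        for url in search(q)[:per_query]:   # top slice for this variant
            if url not in seen:
                seen.add(url)
                candidates.append(url)
            if len(candidates) >= cap:
                return candidates
    return candidates

# gamma.co.uk ranks third for one query only, outside the per_query=2
# slice, so it never becomes a candidate and can never be cited.
fake_index = {
    "best accountant in Leeds": ["alpha.co.uk", "beta.co.uk", "gamma.co.uk"],
    "accountant Leeds reviews": ["beta.co.uk", "delta.co.uk"],
}
print(candidate_set(list(fake_index), lambda q: fake_index[q], per_query=2))
```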

Stage 4 — Candidate scoring. This is where GEO lives. The model reads each candidate page and scores it on several dimensions (a toy scoring sketch follows the list):

  • Relevance: Does the page content actually answer the user’s question? Not “is it about the right topic” but “does it contain a direct, extractable answer to this specific question?”
  • Structure: Can the model easily extract a clean quote or fact from the page? Pages with clear headings, FAQ blocks, summary paragraphs, and structured data are dramatically easier to cite.
  • Authority: Is this source trustworthy? The model evaluates domain authority, backlink profile, brand mentions across the web, and presence on trusted third-party platforms.
  • Freshness: When was this page last updated? The model checks dateModified in Article schema, visible “last updated” dates, and content recency signals.
  • Entity clarity: Does the model know exactly what entity this page represents? Schema markup, consistent NAP across platforms, and clear author attribution all contribute.
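
Here is the toy scoring sketch promised above, a weighted sum over those five dimensions. Both the sub-scores and the weights are invented for illustration; no platform publishes its actual formula.

```python
# Stage 4 in miniature: score each candidate on the five dimensions and
# rank. The weights below are illustrative assumptions only.

WEIGHTS = {
    "relevance": 0.30,
    "structure": 0.20,
    "authority": 0.25,
    "freshness": 0.15,
    "entity_clarity": 0.10,
}

def score(page: dict[str, float]) -> float:
    """Weighted sum of per-dimension sub-scores, each in [0, 1]."""
    return sum(WEIGHTS[dim] * page.get(dim, 0.0) for dim in WEIGHTS)

candidates = {
    "brochure-site.co.uk": {"relevance": 0.6, "structure": 0.2, "authority": 0.3,
                            "freshness": 0.1, "entity_clarity": 0.4},
    "structured-guide.co.uk": {"relevance": 0.8, "structure": 0.9, "authority": 0.5,
                               "freshness": 0.9, "entity_clarity": 0.8},
}

for url, dims in sorted(candidates.items(), key=lambda kv: -score(kv[1])):
    print(f"{score(dims):.2f}  {url}")
```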

Stage 5 — Answer synthesis. The model constructs its answer using the top-scored candidates. It paraphrases, synthesises across multiple sources, and structures the answer to match the user’s intent. During synthesis, the model decides which specific sources to name and cite.

Stage 6 — Citation attachment. The model attaches citations. Different platforms handle this differently: Perplexity always cites sources inline. ChatGPT sometimes names businesses without links. Google AI Overviews embed source links. Claude names sources when using web search but not always.

You cannot control Stage 6, but you can heavily influence Stages 3-5.

The five signals that actually determine citation (ranked by impact)

Based on the research and our audit observations at GeoRankLocal, here are the five signals that most strongly predict citation.

Signal 1: Domain authority and brand recognition (strongest signal)

SE Ranking’s study of 2.3 million pages found that sites with over 1.16 million monthly visitors earn an average of 6.4 citations per query, compared with just 2.4 for sites with fewer than 2,700 visitors — nearly three times as many. Branded web mentions have a correlation of 0.664 with AI Overview appearances, far higher than any other single factor.

What this means for SMBs: You can’t manufacture domain authority overnight. But you can build brand mentions through industry directory listings, trade press citations, client testimonials on third-party platforms, and active participation in relevant online communities. Sites with Trustpilot, G2, or industry directory profiles have 3x higher chances of ChatGPT citation (SE Ranking, November 2025).

Signal 2: Content depth and fact density

Content depth and fact density correlate strongly with citation rates across all major AI platforms.

From our GeoRankLocal rubric testing, we’ve found that the “fact density” metric — calculated as the count of digits, percentages, currency symbols, and year-pattern occurrences per 1000 words — cleanly separates citable content from generic marketing copy. Marketing brochure sites typically score under 3.0. Well-researched editorial content scores 8-20.
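
For reference, here is a minimal Python version of that fact-density calculation. The regex patterns are our reconstruction of the rubric as described, and very short samples will exaggerate the per-1,000-word rate.

```python
import re

# Fact density: digits, percentages, currency amounts, and year patterns
# per 1,000 words. Patterns are matched most-specific first and removed
# so the generic number pattern doesn't double-count them.

FACT_PATTERNS = [
    r"\d+(?:\.\d+)?%",      # percentages, e.g. "40.58%"
    r"[£$€]\s?\d[\d,]*",    # currency amounts, e.g. "£1,200"
    r"\b(?:19|20)\d{2}\b",  # four-digit years, e.g. "2026"
    r"\b\d[\d,]*\b",        # any remaining standalone numbers
]

def fact_density(text: str) -> float:
    words = len(text.split())
    if not words:
        return 0.0
    hits, remaining = 0, text
    for pattern in FACT_PATTERNS:
        hits += len(re.findall(pattern, remaining))
        remaining = re.sub(pattern, " ", remaining)
    return hits / words * 1000

sample = "Fees start at £95 per month. 71% of citations came from the top 20 in 2026."
print(f"{fact_density(sample):.1f} fact tokens per 1,000 words")
```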

AirOps found that early-discovery content with 5-7 statistics earns 20% higher citation likelihood. ChatGPT specifically prefers definite language, high entity density, and a balanced mix of facts and opinions (Growth Memo, February 2026).

Signal 3: Structured data (schema markup)

Schema markup acts as a cheat sheet for AI engines. It eliminates the need for the model to infer information from unstructured text. FAQPage schema is the most impactful type because it maps directly to conversational query formats.

From our audit data, sites with GEO-relevant schema score 15-25 points higher on our rubric than comparable sites without it, controlling for content quality.
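
As a concrete example, here is a sketch that emits FAQPage markup as JSON-LD. The field names follow the public schema.org FAQPage specification; the question, answer, and figure are placeholders, not real pricing.

```python
import json

# Build FAQPage structured data as a plain dict, then serialise it into
# the <script type="application/ld+json"> tag that belongs in the page head.

faq = {
    "@context": "https://schema.org",
    "@type": "FAQPage",
    "mainEntity": [
        {
            "@type": "Question",
            "name": "How much does an accountant in Leeds cost?",
            "acceptedAnswer": {
                "@type": "Answer",
                "text": "Monthly packages for small limited companies "
                        "typically start around £95 + VAT.",  # placeholder figure
            },
        },
    ],
}

print(f'<script type="application/ld+json">\n{json.dumps(faq, indent=2)}\n</script>')
```

Note that the tag is emitted as static HTML. Structured data injected client-side runs into the same JavaScript-rendering problem covered in the findings below.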

Signal 4: Answer formatting and extractability

Pages structured with clear headings, FAQ blocks, summary-first paragraphs, and bulleted lists are cited more often because they’re easier for the model to extract clean quotes from. AirOps measured this: comparison pages with 3 tables earn 25.7% more citations. Validation pages with 8 list sections earn 26.9% more. Shortlist pages with fewer than 10 words per sentence earn 18.8% more.
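
A quick way to audit your own pages against those thresholds is to count the relevant structures directly. The heuristics below are simplified assumptions, not AirOps’s methodology.

```python
import re

# Rough extractability audit: tables, list blocks, and average sentence
# length in a page's HTML. A crude tag strip is fine for a quick check.

def extractability_stats(html: str) -> dict[str, float]:
    text = re.sub(r"<[^>]+>", " ", html)   # strip tags, keep visible text
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    avg_len = (sum(len(s.split()) for s in sentences) / len(sentences)
               if sentences else 0.0)
    return {
        "tables": len(re.findall(r"<table\b", html, re.I)),
        "list_blocks": len(re.findall(r"<[uo]l\b", html, re.I)),
        "avg_words_per_sentence": round(avg_len, 1),
    }

page = "<h2>Fees</h2><table></table><ul><li>From £95</li></ul><p>Fees start at £95. Ask us.</p>"
print(extractability_stats(page))
```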

Signal 5: Freshness

Content freshness is a hard requirement. 76.4% of ChatGPT’s most-cited pages were updated within the last 30 days. The model explicitly checks dateModified signals in Article schema and visible timestamps. Static “set and forget” websites are at a fundamental disadvantage.
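
Checking your own freshness signal takes a few lines. The sketch below pulls dateModified out of any JSON-LD block in a page and reports its age; a production crawler would use a proper HTML parser rather than a regex.

```python
import json
import re
from datetime import datetime, timezone

# Find JSON-LD blocks, read dateModified, and report the age in days.
LDJSON = re.compile(
    r'<script[^>]*type="application/ld\+json"[^>]*>(.*?)</script>',
    re.DOTALL | re.IGNORECASE,
)

def days_since_modified(html: str) -> int | None:
    for block in LDJSON.findall(html):
        try:
            data = json.loads(block)
        except json.JSONDecodeError:
            continue
        stamp = data.get("dateModified") if isinstance(data, dict) else None
        if stamp:
            modified = datetime.fromisoformat(stamp)
            if modified.tzinfo is None:   # treat naive timestamps as UTC
                modified = modified.replace(tzinfo=timezone.utc)
            return (datetime.now(timezone.utc) - modified).days
    return None  # no dateModified at all: a freshness signal is missing

page = ('<script type="application/ld+json">'
        '{"@type": "Article", "dateModified": "2026-03-28T09:00:00+00:00"}'
        '</script>')
print(days_since_modified(page), "days since last update")
```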

What we observed in the audit data that surprised us

Running the GeoRankLocal audit tool against fifty UK sites surfaced findings that don’t appear in the standard GEO literature.

Finding 1: JavaScript rendering is a bigger problem than most people realise. Several otherwise legitimate sites returned fewer than 15 words of parseable content because their entire content layer loaded via client-side JavaScript. The sites looked fine to human visitors but were functionally invisible to AI crawlers. This affected roughly 20% of the sites we audited.
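
You can reproduce this check yourself: fetch the page without executing any JavaScript and count the words left over, which approximates what a non-rendering crawler sees. Everything below is Python standard library; the user-agent string is arbitrary.

```python
from html.parser import HTMLParser
from urllib.request import Request, urlopen

# Extract visible text the way a non-rendering crawler would: parse the
# raw HTML, skipping script/style/noscript content entirely.

class TextExtractor(HTMLParser):
    SKIP = {"script", "style", "noscript"}

    def __init__(self):
        super().__init__()
        self.words = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.words.extend(data.split())

def parseable_words(url: str) -> int:
    raw = urlopen(Request(url, headers={"User-Agent": "geo-audit/0.1"})).read()
    parser = TextExtractor()
    parser.feed(raw.decode("utf-8", errors="replace"))
    return len(parser.words)

count = parseable_words("https://example.com")
print(count, "parseable words" + (": likely invisible to AI crawlers" if count < 15 else ""))
```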

Finding 2: Entity inconsistency is rampant. Many businesses had slightly different names across their website, Google Business Profile, Trustpilot, and Companies House registration. AI engines cross-reference these signals to verify entity identity. Inconsistency reduces confidence and citation probability.
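
A crude version of that cross-referencing check, with illustrative normalisation rules and sample records:

```python
import re

# Normalise the business name from each platform, then flag mismatches.
# The suffix list and the sample records are illustrative.

SUFFIXES = r"\b(?:ltd|limited|llp|plc)\b\.?"

def normalise(name: str) -> str:
    name = re.sub(SUFFIXES, "", name.lower())   # drop legal-form suffixes
    name = re.sub(r"[^a-z0-9 ]", " ", name)     # drop punctuation
    return " ".join(name.split())               # collapse whitespace

records = {
    "Website": "Smith & Co Accountants Ltd",
    "Google Business Profile": "Smith and Co Accountants",
    "Trustpilot": "Smith & Co. Accountants Limited",
    "Companies House": "SMITH & CO ACCOUNTANTS LTD",
}

forms = {platform: normalise(name) for platform, name in records.items()}
if len(set(forms.values())) > 1:
    for platform, form in forms.items():
        print(f"{platform:24} -> {form}")
    print("Inconsistent entity name: pick one canonical form everywhere.")
```

Here even “and” versus “&” survives normalisation and trips the check, which is exactly the kind of variation AI engines have to resolve when verifying an entity.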

Finding 3: The gap between average and good is huge, but the gap between good and great is small. Moving a typical UK SMB site from 15/100 to 50/100 requires implementing basics that should take a developer a few days. Moving from 50/100 to 75/100 requires sustained content production and authority building over months. The initial lift is massive and cheap; the marginal gains above 50 are harder.

Finding 4: Pricing information dramatically helps citation. Sites that included clear pricing were more likely to appear in AI recommendations for commercial queries. AI engines prefer to give complete answers. If a user asks “how much does a plumber in Manchester cost?” the model wants to cite a source that actually states the price, not one that says “contact us for a quote.” Transparent pricing is a GEO signal, not just a sales tactic.

What this means for your business

Understanding the retrieval pipeline changes how you think about every GEO decision.

Why FAQ schema matters: because it maps directly to Stage 4 (candidate scoring for relevance and extractability). You’re pre-formatting your content into the exact shape the model needs to cite you.

Why content depth matters: because it influences Stage 4 (authority scoring) and Stage 5 (synthesis). The model needs enough substance to paraphrase from.

Why directory listings matter: because they influence Stage 3 (candidate retrieval through brand mention fan-out queries) and Stage 4 (authority scoring through cross-platform entity verification).

Why freshness matters: because the model explicitly checks it at Stage 4 and downgrades stale content.

Why consistency matters: because the model cross-references at Stage 4, and inconsistency creates ambiguity that reduces citation confidence.

Every GEO tactic maps to a specific stage in the retrieval pipeline. Once you see the pipeline, the tactics stop feeling arbitrary and start feeling inevitable.

The honest summary

AI search engines don’t pick businesses randomly. They run a structured retrieval pipeline with identifiable signals at every stage. The five signals that matter most — domain authority, content depth, structured data, answer formatting, and freshness — are all measurable and all improvable.

From the audits we ran this month, the vast majority of UK SMBs aren’t optimising for any of these signals. The few that are — even partially — are dramatically over-represented in AI citations relative to their competitors. The pipeline rewards the prepared. It ignores everyone else.

The mechanics aren’t mysterious. They’re mechanical. And that’s the best possible news for any business willing to do the work.

Sources

  1. GeoRankLocal internal audit data, March-April 2026 (50 UK service business sites)
  2. Position Digital, “100+ AI SEO Statistics for 2026.” https://www.position.digital/blog/ai-seo-statistics/
  3. SE Ranking, “AI Citation Factors Study,” November 2025
  4. The SEO Works, “75 AI SEO Statistics for 2026.” https://www.seoworks.co.uk/downloads/ai-seo-statistics/
  5. AirOps, “AI Citation Research Report,” April 2026
  6. Growth Memo, “AI Mode User Behaviour Analysis,” April 2026
  7. Wowbix, “GEO vs SEO in 2026.” https://wowbix.com/geo-vs-seo/
  8. Chris Long and RESONEO, “GPT-5.4 Search Behaviour Analysis,” April 2026
  9. Schema.org specification, https://schema.org

Awais M.

Founder of GeoRankLocal

Awais M. is the founder of GeoRankLocal, a UK-wide agency that builds AI-citable websites and manages ongoing GEO and SEO for businesses across the United Kingdom. He’s a Chartered Certified Accountant by background and writes about generative engine optimisation, the shift from search to AI discovery, and what UK SMBs need to do to stay visible in the AI search era.

Want to get your business cited by AI?

Get a free AI visibility audit from GeoRankLocal.