– Abhishek Iyer (@distantgradient), o3, Opus, ACME.BOT
TLDR takeaways:
I analyzed 50 real ChatGPT conversations by intercepting network traffic to uncover the patterns behind when and how ChatGPT searches the web.
- ChatGPT has an internal understanding of when to look up information using web search. It will look up information from the internet only when unsure, or when the user has explicitly asked it to.
- Adding keywords like “lookup X” or “latest Y” or “Z near me” will force a web search lookup.
- Ranking boosters like “best”, “2025” (the current year), “reviews” appear in ~30% of ChatGPT’s generated web search queries.
- For very well established verticals (like “wordpress hosting”) ChatGPT provides an answer off the cuff from what it remembers.
- When generating web search queries, ChatGPT likes to add:
  - keywords like “top” and “reviews”
  - the user’s location and the current year
  - authoritative sources (like “who”, “cdc” for medical queries)
With that TLDR out of the way, let’s take a step back and try to understand what is happening under the hood.
The Three Ways ChatGPT Answers Questions
When you ask ChatGPT* (or Gemini or Claude) something, it does one of the following things:
- Instant recall – provides answers immediately from training data (like “how many fingers on each hand”)
- Reasoning – thinks through a problem step-by-step (like “how many fingers do 7 people have total”)
- Web search – looks up current information online (like “who is the prime minister of Namibia”)
This third option – what we call using the “web search tool” – is where things get interesting. Just as you might Google “namibia prime minister“, ChatGPT does something similar. But unlike humans who search frequently, ChatGPT has vast knowledge built-in and only searches when specific conditions are met.
But when exactly does ChatGPT decide to search the web versus relying on its training? Let’s dig deeper into this decision-making process.
Understanding when ChatGPT chooses to use the search tool
The difference between ChatGPT and us humans is that ChatGPT does not need to use the web search tool as often as we do. In fact, it holds a staggering amount of information and can answer many questions without thinking much.
For example, most people will need to look one up when asked for an “enchilada recipe”. But not ChatGPT. It can just spit out the answer, in a way similar to “how many fingers in a hand”.
There seem to be clear patterns when ChatGPT chooses to use the web search tool:
- Explicitly asked to “lookup” something (or using the search button)
- Query with a temporal portion (eg. “best seo agent tools 2025”)
- Query with a local / location bent (eg. “mexican food near indira nagar bangalore”)
And not so clear patterns:
- when asked for businesses or services, sometimes it chooses to use the web search tool, sometimes not:
  - “cheapest wordpress hosting” does not use the search tool
  - but, “cheapest magento hosting” does use the search tool
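The trigger patterns above can be sketched as a toy heuristic. This is purely illustrative: the keyword lists and regexes are my assumptions drawn from the observed patterns, not the actual logic ChatGPT uses.

```python
import re

# Toy stand-in for the search-vs-recall decision. The keyword lists and
# regexes below are assumptions inferred from the collected conversations,
# not ChatGPT's real trigger logic.
EXPLICIT_KEYWORDS = ("lookup", "look up", "search for")
TEMPORAL = re.compile(r"\b(20\d{2}|latest|today|current)\b")
LOCAL = re.compile(r"\bnear\b")

def should_search(user_query: str) -> bool:
    q = user_query.lower()
    if any(kw in q for kw in EXPLICIT_KEYWORDS):
        return True  # explicitly asked to look something up
    if TEMPORAL.search(q):
        return True  # temporal portion, eg. "best seo agent tools 2025"
    if LOCAL.search(q):
        return True  # location bent, eg. "mexican food near indira nagar"
    return False     # otherwise answer from training data
```

Note that a heuristic like this would classify “cheapest wordpress hosting” as answerable from memory, matching the behaviour observed above, but it cannot explain why “cheapest magento hosting” triggers a search — which is presumably where a trained classifier earns its keep.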
To better understand these patterns, I collected real data from ChatGPT conversations.
It seems like there is a classifier (dubbed “sonic_classifier_ev3”) that does only one thing: decide when to invoke the search engine and when not to.
This classifier is likely trained to identify which queries can be answered from ChatGPT’s training data and which cannot. (Interestingly, training a separate classifier differs from the somewhat standard practice of letting the LLM itself decide when to use the search tool.)
Also, looking at the data from the queries, it became apparent that ChatGPT doesn’t just pass your query straight to the search engine. It transforms it first.
Query Translation Process

We refer to the process of converting a ChatGPT query to the search engine query as a “Query Translation Process”. Digging into this provides us with insights on what type of queries to target when specifically looking for ChatGPT traffic.
Let’s see this translation process in action with some real examples from the data.
A few real-world examples
| Raw user request | Engine queries fired (1-2 each) |
| --- | --- |
| build me a macro friendly meal plan 1800 kcal | “macro friendly meal plan 1800 kcal sample” · “best 1800 kcal meal prep ideas” |
| who regulates infant formula marketing in india | “india infant formula marketing regulation 2025” · “fssai infant formula advertising rules” |
| explain drm free pc games statistics | “drm free pc games market share 2025” |
| top rated pikler triangle india | “pikler triangle best reviews india” · “pikler climber buy india” |
The Query Translation Process seems to work as follows:
Inject context heuristically
- Year (2025) if the user hinted at recency or the topic is cyclical (finance, tech).
- Locale (eg. california) when costs or availability are implied. It can sometimes automatically do this via the ChatGPT memory (ie. what ChatGPT remembers about the user), even if the user has not explicitly specified any location.
- Authority prefixes (“who”, “cdc”) for health queries.
“manage snakebite in rural clinics” → “who snakebite management protocol rural clinic”
Strip imperatives & filler words
- Removes “lookup”, “find”, “show me”, “the”, “for”, etc.
“lookup sectors ETFs with the highest dividend yield 2025” → “sector ETFs highest dividend yield 2025”
Preserve hard nouns & adjectives (the “content spine”)
- Keeps domain nouns (ETF, telemedicine) and critical qualifiers (cheap, neonatal).
“affordable telemedicine platforms compliant with HIPAA” → “affordable telemedicine platforms HIPAA compliant”
Generate a query with a “ranking booster”
- Adds best, top, review, pricing, guidelines, case study to cast a wider net.
Query 1: “equal weight vs market cap S&P 500 annual returns”
Query 2: “equal weight vs market cap S&P 500 performance review”
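The four steps above can be strung together in a short sketch. The stop-word and booster lists here are guesses on my part; the real pipeline is not public.

```python
from typing import List, Optional

# Illustrative only: assumed stop-word and booster lists, inferred from the
# observed query pairs rather than any documented ChatGPT behaviour.
STOPWORDS = {"lookup", "find", "show", "me", "the", "for", "with", "a", "an"}
BOOSTER = "best"
CURRENT_YEAR = "2025"

def translate(user_query: str, locale: Optional[str] = None) -> List[str]:
    # Strip imperatives & filler words, preserving the "content spine".
    spine = [w for w in user_query.lower().split() if w not in STOPWORDS]
    base = " ".join(spine)
    # Inject context heuristically: current year if absent, locale if known.
    if CURRENT_YEAR not in spine:
        base += f" {CURRENT_YEAR}"
    if locale:
        base += f" {locale}"
    # Fire the base query plus one widened with a ranking booster.
    return [base, f"{base} {BOOSTER}"]
```

Run on the earlier example, `translate("lookup sector ETFs with the highest dividend yield 2025")` drops “lookup”, “with”, and “the” and keeps the year, reproducing the strip-imperatives transformation shown above.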
There are recognizable patterns in the “ranking booster” terms ChatGPT adds, but these patterns aren’t uniform across all topics. Different verticals have their own unique query transformation quirks.
Vertical-specific quirks you should know
- Health & Medicine – Prepends “who”, “cdc”, “nih”, often appends “guidelines” or “pdf”.
- Finance & Investing – Loves adding the current year and words like “returns” or “performance”.
- E-commerce / Pricing – Adds “best”, “cheap”, “reviews”, sometimes shoves the currency (“usd”, “inr”) into the query.
- Travel – If no region is mentioned it guesses from user profile (“india”) or from the attraction itself (“amalfi coast italy”).
- Shopping – Retains names verbatim (Pikler, Montessori) and throws in “buy” + locale.
Conclusion
Hopefully this helps you better understand the inner workings of LLM-based answer engines like ChatGPT. We’re just scratching the surface of what is available to be explored here.
Appendix: How was this data collected?
Conversation data was collected from ChatGPT’s network traffic. o3 then parsed the raw network traffic to extract pairs of user request -> search query.
Additional context for the nerds:
Chrome DevTools ▶ Network ▶ filter for /backend-api/…/search_query calls.
Each call contains a JSON payload with two arrays: the raw human message and the list of search queries dispatched.
We saved fifty such turns to a .jsonl file, kept no personal identifiers, and analysed only the text fields.
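The extraction step over those saved turns can be sketched like this. The field names "message" and "search_queries" are my guesses at the payload shape; the actual backend-api schema may differ.

```python
import json
from typing import List, Tuple

def extract_pairs(jsonl_path: str) -> List[Tuple[str, List[str]]]:
    """Read saved turns and return (raw user message, search queries) pairs."""
    pairs = []
    with open(jsonl_path) as f:
        for line in f:
            line = line.strip()
            if not line:
                continue
            record = json.loads(line)
            # Assumed field names; adjust to match the captured payloads.
            pairs.append((record["message"], record["search_queries"]))
    return pairs
```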
*For the purpose of this article ChatGPT is ChatGPT 4o, the default model as of this writing.
