Investigating OpenAI’s “Deep Research” Implementation
1. System Architecture and Technology Stack
Agent-Based Architecture: Deep Research is implemented as a specialized AI agent within the ChatGPT platform (OpenAI Launches Deep Research: Advancing AI-Assisted Investigation - InfoQ). Unlike a single-turn chatbot response, this agent runs independently for an extended duration (5–30 minutes), performing multi-step tasks autonomously (OpenAI Launches Deep Research: Advancing AI-Assisted Investigation - InfoQ). At its core is a customized large language model (LLM), a version of OpenAI’s “o3” reasoning model, optimized for long-form reasoning, web browsing, and data analysis (OpenAI Launches Deep Research: Advancing AI-Assisted Investigation - InfoQ). This model is the brain of the system, orchestrating the research process and integrating with various tools.
Microservices and Tools: The Deep Research system is modular, with distinct components (microservices or tools) handling different functions. Key pieces likely include:
- Web Search Service: Allows the agent to query the internet in real-time for relevant information. OpenAI provides a built-in web search tool (based on the same model used in ChatGPT’s search feature) to retrieve up-to-date results with citations (OpenAI will let other apps deploy its computer-operating AI | The Verge). This is presumably backed by a search API or engine (OpenAI hasn’t disclosed the provider, but it “allows developers to get real-time information and citations from the web” (OpenAI will let other apps deploy its computer-operating AI | The Verge)).
- Browser/Content Fetcher: Enables the agent to navigate to web pages, retrieve their content (HTML, text, PDFs, images), and pass that content back to the LLM for analysis (Deep Research FAQ | OpenAI Help Center). This component handles live crawling of pages returned by the search results. It likely includes parsers (for HTML, PDF, etc.) and possibly OCR or vision models for images so the LLM can interpret non-text data.
- Python Execution Environment: A sandboxed code runner (similar to OpenAI’s Code Interpreter) that the agent can use for data analysis, computations, or generating charts/graphs (OpenAI Launches Deep Research (New AI Feature for ChatGPT): What it Do and How to Use It - GeeksforGeeks) (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). The Deep Research model can decide to invoke this tool when needed – for example, to crunch numbers from a dataset or create a visualization. This environment is isolated for security and managed by OpenAI’s infrastructure (allowing only safe libraries and limited runtime).
- Orchestration Layer: An internal orchestration system coordinates the LLM and tools. OpenAI’s recently announced Agents SDK is likely the framework underlying this coordination. The Agents SDK lets the developer (or system) define a set of available tools and then allows the AI agent to autonomously decide which tools to use, and in what sequence, to accomplish the goal (Mastering OpenAI’s new Agents SDK & Responses API [Part 1] - DEV Community) (Mastering OpenAI’s new Agents SDK & Responses API [Part 1] - DEV Community). Each tool (search, browse, code, etc.) is exposed to the model via function calls. The Responses API serves as the interface for this, enabling the model to call functions dynamically and receive structured results (Mastering OpenAI’s new Agents SDK & Responses API [Part 1] - DEV Community). In practice, the Deep Research agent’s LLM “thinks” about the user query, emits a function call (e.g. `web_search(query)`), the orchestrator executes that via the search service, then feeds the results back into the model’s context. This loop continues with the agent using tools and gathering information until it decides to stop and produce the final report (see the sketch after this list).
- Intermediate Storage & Memory: During a 30-minute research session, the agent might accumulate a lot of data. The system likely maintains an ephemeral memory or storage for intermediate results – e.g. caching fetched web pages or storing summaries – so that the model can reference information discovered earlier without re-fetching it repeatedly. This could be an in-memory cache or a short-term database that holds the texts of visited pages, parsed data from files, etc. (OpenAI hasn’t detailed this, but such a store would be essential to handle “hundreds of sources” of information efficiently). The model itself has a large context window (GPT-4’s context can be tens of thousands of tokens) to hold many pieces of information at once, but for very long sessions the system might use summarization or compression of earlier findings to keep the context size manageable.
- AI Model Integration Pipeline: The “o3” model is integrated via OpenAI’s model-serving stack (likely running on clusters of GPUs for inference). This model was “trained through reinforcement learning on real-world tasks requiring browser and Python tool use”, meaning the pipeline includes not just the base model, but also policy logic refined by feedback (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). The inference pipeline uses the function-calling capabilities of OpenAI’s Chat Completions API to let the model output tool actions in a structured format. The Agents SDK/Orchestrator monitors the model’s outputs; when a tool call is requested, it pauses the model, invokes the tool, and then resumes the model with the tool’s result injected into the prompt for the next step. This cycle repeats iteratively. The entire process is managed as a job – when the user invokes Deep Research, a task is queued and the agent works asynchronously, streaming partial progress to the UI “Activity” sidebar and finally delivering the complete result.
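The tool loop described in the orchestration bullet above can be approximated with the public Chat Completions function-calling interface. This is a minimal sketch under stated assumptions, not OpenAI’s internal implementation: the tool schema, the `web_search` stub, the model name, and the stopping condition are all illustrative.

```python
import json
from openai import OpenAI

client = OpenAI()

def web_search(query: str) -> str:
    """Illustrative stand-in for the internal search service."""
    return f"(stub) top results for: {query}"

TOOL_IMPLS = {"web_search": web_search}

TOOLS = [{
    "type": "function",
    "function": {
        "name": "web_search",
        "description": "Search the web; returns result snippets with URLs.",
        "parameters": {
            "type": "object",
            "properties": {"query": {"type": "string"}},
            "required": ["query"],
        },
    },
}]

messages = [
    {"role": "system", "content": "You are a research agent. Use tools, then write a cited report."},
    {"role": "user", "content": "Compare the top electric cars on safety and price."},
]

while True:
    # Model name is illustrative; the real agent uses a fine-tuned o3 variant.
    resp = client.chat.completions.create(model="gpt-4o", messages=messages, tools=TOOLS)
    msg = resp.choices[0].message
    if not msg.tool_calls:            # no tool call -> the model wrote its final answer
        print(msg.content)
        break
    messages.append(msg)              # keep the model's tool request in context
    for call in msg.tool_calls:
        args = json.loads(call.function.arguments)
        result = TOOL_IMPLS[call.function.name](**args)
        messages.append({"role": "tool", "tool_call_id": call.id, "content": result})
```

The same thought-act-observe cycle, with more tools (browse, Python sandbox) and a learned policy, is what the production orchestrator presumably runs at scale.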
In terms of technology stack, OpenAI likely utilizes common cloud-native components: the services (search, browsing, Python sandbox) run on scalable infrastructure (Kubernetes or similar), and the LLM inference runs on optimized GPU servers. The Agents SDK itself is provided in Python (Mastering OpenAI’s new Agents SDK & Responses API [Part 1] - DEV Community) (Mastering OpenAI’s new Agents SDK & Responses API [Part 1] - DEV Community), suggesting the orchestrator is written in Python and integrates with OpenAI’s API (for model calls and tools). The web browsing might leverage headless browser frameworks or custom HTTP clients. Overall, the architecture is a microservices + orchestrated AI model design: each tool is a microservice, and the LLM agent is the “brain” coordinating them via an orchestration layer (OpenAI will let other apps deploy its computer-operating AI | The Verge). This modular approach makes the system extensible – OpenAI noted that future versions will connect to “more specialized data sources, including subscription-based and internal resources” via this tool interface (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). In essence, Deep Research is built like a research assistant agent: it has access to a suite of services (search engine, browser, code executor, etc.) and a powerful reasoning model that drives the whole process.
2. Web Research Execution
Live Internet Searching: Deep Research performs live web research by issuing search queries and crawling online sources in real time. It does not rely solely on a pre-indexed corpus; instead, it actively queries the web when a task is run. According to OpenAI, the feature “conducts multi-step research on the internet for complex tasks” (Introducing deep research - OpenAI). Under the hood, the agent uses a web search API/tool to retrieve current information. For example, it might query news sites, academic databases, or general search engines depending on the prompt. (OpenAI’s partnership with Microsoft suggests it may use Bing’s search API, though official sources just describe it generally as a web search tool with real-time info (OpenAI will let other apps deploy its computer-operating AI | The Verge).)
Once the search results are obtained, the agent iteratively visits relevant links. It can click through multiple pages and even follow links within those pages if needed. As it “brows[es] the web [and] interpret[s] content” (OpenAI Launches Deep Research: Advancing AI-Assisted Investigation - InfoQ), the system parses the page text (and possibly metadata) and feeds those contents into the o3 model for analysis. This is akin to a human opening search hits and reading them to gather facts. The agent’s design emphasizes finding authoritative sources: OpenAI has indicated that Deep Research tries to use verified, reputable sources (e.g. scientific publications or official statistics) for information (OpenAI Launches Deep Research (New AI Feature for ChatGPT): What it Do and How to Use It - GeeksforGeeks). In practice, that means the agent’s search queries and link choices skew towards high-quality domains (for example, if researching a medical question, it might specifically seek pages on PubMed or WHO). The model likely learned during training to prefer sources that humans would consider trustworthy. However, it isn’t infallible – OpenAI cautions that the agent can sometimes fail to distinguish authoritative information from rumors or less reliable content (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). To mitigate this, the agent doesn’t just grab the first answer it sees; it aggregates from multiple sources and provides citations so the user can verify claims (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch).
Source Evaluation and Selection: The Deep Research agent uses its LLM reasoning to evaluate search results and page content. It will typically perform multiple search queries throughout a session, refining them as it learns more about the topic. The decision of which link to click or which source to trust is part of the agent’s learned policy. In OpenAI’s earlier WebGPT research, the model was trained to quote from sources and was rewarded for citing trustworthy sites ([2112.09332] WebGPT: Browser-assisted question-answering with human feedback). That research showed improved factual accuracy when the model had to back up answers with references. Deep Research inherits this philosophy: it actively “collects references while browsing in support of [its] answers” ([2112.09332] WebGPT: Browser-assisted question-answering with human feedback). The agent likely looks at cues like domain reputation (e.g., `.edu` and `.gov` sites), recency of information (to get up-to-date data), and consistency across multiple sources. If three different reputable sources converge on the same fact, the agent gains confidence in that information. Conversely, if sources conflict, the agent might note the discrepancy or seek additional references. This behavior aligns with one of Deep Research’s stated capabilities: “evaluating arguments, identifying biases, and suggesting counterpoints” (OpenAI Launches Deep Research (New AI Feature for ChatGPT): What it Do and How to Use It - GeeksforGeeks) – essentially fact-checking and critical analysis. The model’s training via reinforcement learning would have included scenarios requiring it to decide which pieces of information to trust and include in the final report (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch).
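The selection policy is learned rather than hand-coded, but the cues just described (domain reputation, recency, cross-source agreement) can be made concrete in a toy scoring function. Everything here, from the domain lists to the weights, is a hypothetical illustration:

```python
from datetime import date
from urllib.parse import urlparse

# Hypothetical heuristic -- the real agent's preferences are learned via RL,
# not hard-coded; these are merely the kinds of cues the text describes.
REPUTABLE_SUFFIXES = (".edu", ".gov")
REPUTABLE_DOMAINS = {"pubmed.ncbi.nlm.nih.gov", "who.int"}

def score_source(url: str, published: date, corroborations: int) -> float:
    host = urlparse(url).netloc.lower()
    score = 0.0
    if host.endswith(REPUTABLE_SUFFIXES) or host in REPUTABLE_DOMAINS:
        score += 2.0                               # domain reputation
    age_years = (date.today() - published).days / 365
    score += max(0.0, 1.0 - 0.2 * age_years)       # recency of the information
    score += 0.5 * corroborations                  # agreement across other sources
    return score
```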
Multi-Modal and Deep Content Handling: Beyond basic web pages, Deep Research can handle various content formats encountered during research. It can “read through text, PDFs, images, and more” as it gathers information (Deep Research FAQ | OpenAI Help Center). For PDFs, the system likely has a PDF-to-text converter (extracting text from the document for the LLM to ingest). If an image contains useful information (say, an infographic or a chart), the agent can utilize GPT-4’s vision analysis or an OCR tool to interpret it. OpenAI’s GPT-4 model has multi-modal capabilities (vision), so Deep Research might leverage that when it “sees” an image – for example, understanding a diagram or reading the text in a screenshot. All of this is orchestrated through tool use: an “image reading” function or document parser function would feed the processed content back to the model. By supporting these formats, the agent is not limited to HTML web articles; it can digest academic papers (PDFs), datasets or tables (via CSV/Excel and the Python tool), and even video transcripts if needed (though video handling isn’t explicitly mentioned, a text transcript would be handled like any other document).
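As a rough illustration of such a dispatch-by-format fetcher, the sketch below uses `requests` and `pypdf` for the PDF path; the `html_to_text` and `describe_image` helpers are hypothetical stand-ins for OpenAI’s internal parsers and vision pipeline.

```python
import io
import requests
from pypdf import PdfReader

def html_to_text(html: str) -> str:
    return html  # stub: a real parser would strip markup and boilerplate

def describe_image(data: bytes) -> str:
    return "(stub) image description"  # real system: OCR or a vision-model call

def fetch_and_extract(url: str) -> str:
    """Fetch a URL and return text the LLM can ingest, whatever the format."""
    resp = requests.get(url, timeout=30)
    ctype = resp.headers.get("Content-Type", "")
    if "pdf" in ctype:
        reader = PdfReader(io.BytesIO(resp.content))
        return "\n".join(page.extract_text() or "" for page in reader.pages)
    if ctype.startswith("image/"):
        return describe_image(resp.content)
    return html_to_text(resp.text)
```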
Summarization and Synthesis: As the agent collects information from potentially hundreds of pages, it must condense and synthesize that data. The o3 model performs on-the-fly summarization of sources to keep only the relevant points in focus. For instance, if one source provides a definition, another gives a statistic, and another an expert opinion, the model will distill each and integrate them into its working notes. The “Activity” sidebar in the UI shows a running summary of the model’s thought process and the websites visited (Deep Research FAQ | OpenAI Help Center) – this indicates that the agent is maintaining an internal summary as it goes. The thought summary might say, e.g., “Searched for latest EV car models, reading review on CarAndDriver… gathering specs for top 3 electric SUVs”. This not only keeps the user informed, but also serves as the model’s scratchpad to avoid forgetting what it’s found. The model uses these notes to decide next steps (what to search for next, which angle to explore) and ultimately as raw material for the final report.
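A running scratchpad of this kind could be as simple as summarizing each visited page against the research question and appending the result to a notes list. The prompt, model choice, and truncation limit below are purely illustrative:

```python
from openai import OpenAI

client = OpenAI()
notes: list[str] = []  # the agent's accumulated "working notes"

def add_note(page_text: str, question: str) -> None:
    """Compress one fetched page into a question-relevant note."""
    resp = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative: any cheap summarizer would do
        messages=[{
            "role": "user",
            "content": f"Extract only the points relevant to: {question}\n\n{page_text[:8000]}",
        }],
    )
    notes.append(resp.choices[0].message.content)
```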
Fact-Checking and Citations: Before presenting the final output, Deep Research cross-checks important facts and ensures each claim can be traced to a source. OpenAI explicitly requires that “every ChatGPT deep research output will be fully documented, with clear citations… making it easy to reference and verify the information” (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). In practice, the agent inserts footnote-style citations in the report, linking specific statements to the source URLs or documents. It can even “cite specific sentences or passages from its sources” when quoting or closely paraphrasing (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). This citation mechanism was likely baked into the prompt or learned during training (the model knows to output not just an answer, but append a reference marker for each fact). By design, this encourages the model to base its answers on found information (grounding the output) rather than purely on its parametric memory or imagination. It’s a safeguard against hallucination: if the model can’t find a source for something, it’s less likely to assert it as fact. Additionally, the presence of multiple citations allows a form of post-hoc fact-checking – the user can click the cited sources to verify correctness or read more detail. The system doesn’t perform a separate automated fact-check beyond what the agent itself does, but the multi-source synthesis inherently means the content has been vetted through several references during the research process.
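One plausible way to keep claims tied to sources is to track (claim, source) pairs and render footnote-style markers at write-up time. This sketch is illustrative (including the example claim and URL), not OpenAI’s actual citation format:

```python
sources: list[str] = []  # URLs in the order they were first cited

def cite(claim: str, url: str) -> str:
    """Append a footnote marker to a claim and register its source URL."""
    if url not in sources:
        sources.append(url)
    return f"{claim} [{sources.index(url) + 1}]"

# Hypothetical claim and URL, for illustration only.
line = cite("The 2024 Model Y earned a 5-star overall safety rating.",
            "https://www.nhtsa.gov/vehicle/2024/TESLA/MODEL-Y")
footnotes = "\n".join(f"[{i + 1}] {url}" for i, url in enumerate(sources))
```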
In summary, the web research execution involves live crawling of the internet via search queries, iterative filtering of relevant information, and thorough documentation. Deep Research essentially operates like a digital research analyst: it queries, reads, notes, cross-checks, and finally produces a well-sourced summary. OpenAI’s infrastructure ensures this happens within a safe sandbox – for example, the agent has read-only browsing (it can’t perform unauthorized actions on websites beyond fetching information), and any code it runs (via the Python tool) is in a controlled environment. The result is a balance between wide-ranging information access and controlled, reliable output. Early evaluations suggest this approach improves accuracy: on a challenging benchmark of expert-level questions (Humanity’s Last Exam), the Deep Research model scored twice as high as previous models, indicating its ability to understand context and retrieve the right info is superior (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). Of course, if the web contains false or biased information, the agent might pick it up – hence the emphasis on user verification and critical review of the cited sources (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). But overall, the system’s design (especially the citation requirement and use of multiple sources) is aimed at delivering well-substantiated answers rather than quick, unsupported replies.
3. Natural Language Query Understanding and Response Generation
Understanding User Queries: When a user submits a query to Deep Research, the system first interprets what exactly the user is asking for and what type of output is needed. This goes beyond basic language understanding – the agent tries to form a research plan from the query. The user might ask a broad question (“Compare the top electric cars in terms of safety and price”) or a very specific one (“Find any published clinical trial results for drug XYZ in the last 5 years”). In either case, the o3 model will parse the request and internally break it down into sub-tasks. This is aligned with the chain-of-thought (CoT) reasoning approach, where instead of jumping directly to an answer, the model “generate[s] a sequence of intermediate reasoning steps – essentially, to ‘think out loud.’” (OpenAI Deep Research Explains Itself - by Brian Chau). By reasoning step-by-step, the model can figure out: What are the key pieces of this question? What information do I need first? Is the query ambiguous or does it require clarification? OpenAI has even built a mechanism for clarification: if the query is complex or underspecified, Deep Research may generate a short form or follow-up questions to capture specific parameters before it starts (Deep Research FAQ | OpenAI Help Center). For example, if a user asks for “the best intermediate snowboard,” the agent might prompt the user (via a form in the UI) to specify budget, height, skill level, etc., to tailor the research (Deep Research FAQ | OpenAI Help Center). This indicates that the system uses the LLM to not only answer but also to query the user for missing details – a form of interactive query understanding.
Task Decomposition: After clarifying the request, the agent decomposes the problem internally. Using the chain-of-thought technique, the model might outline steps such as: 1) search for background on the topic, 2) find data or specific facts required, 3) analyze or compare the data, 4) formulate conclusions/recommendations. This process happens in the model’s “mind” (its hidden scratchpad) and is guided by prompt instructions and the model’s training. In fact, OpenAI’s Deep Research agent was likely trained with a prompting paradigm similar to ReAct (Reason+Act) or other agentic frameworks, where the model’s output alternates between reasoning statements and actions (tool calls). The OpenAI Responses API makes this possible by allowing the model to emit a structured action (like `{"tool": "web_search", "input": "latest EV car safety ratings"}`) as part of its completion (Mastering OpenAI’s new Agents SDK & Responses API [Part 1] - DEV Community). The orchestrator then executes that action and returns the result to the model, which continues its chain-of-thought with new information. This loop continues, so the model is effectively translating the user’s natural language question into a sequence of research steps. It might not explicitly output the plan to the user, but the “Activity” sidebar shows a summary of its thinking and steps (Deep Research FAQ | OpenAI Help Center), confirming that it is indeed following a multi-step game plan internally. Each iteration, the model reads the latest tool result and decides the next move (“Do I need another source? Should I run analysis on data I found? Have I gathered enough to answer the question?”).
LLM Orchestration & Prompt Engineering: The Deep Research feature relies on heavy prompt engineering behind the scenes. There is a specialized system prompt (the “deep research global prompt”) that primes the model at the start of the session (Deep Research FAQ | OpenAI Help Center). This prompt likely outlines the agent’s role (“You are ChatGPT’s Deep Research agent, an AI researcher that can use tools to find and verify information…”), the tools available (with instructions for how to format tool calls), and the requirements for the final output (e.g., “provide a comprehensive report with sources cited for each major claim, and include a summary of your reasoning”). This system prompt acts as the blueprint for the agent’s behavior. Additional prompt instructions might be injected during the process, for example: after each tool use, the orchestrator could add a brief summary of what was found into the prompt (so the model can keep a running memory), or a reminder of the task goals to keep it on track.
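OpenAI has not published the actual “deep research global prompt,” but based on the behaviors described above, an illustrative guess at its shape might read:

```python
# Illustrative reconstruction only; the real system prompt is not public,
# and the tool names here are invented for the example.
DEEP_RESEARCH_SYSTEM_PROMPT = """\
You are ChatGPT's Deep Research agent, an AI researcher with tool access.
Available tools: web_search(query), open_url(url), run_python(code).
Plan step by step. Prefer authoritative, recent sources and cross-check
facts across multiple sources. When finished, write a comprehensive,
structured report; cite a visited source for every major claim, and
include a brief summary of how you approached the problem.
"""
```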
The Agents SDK essentially automates much of this orchestration, so developers (and OpenAI internally) don’t have to hand-craft every prompt per step. Instead, the model was fine-tuned to follow a certain format for reasoning and tool use. One common pattern from research is to have the model output something like: “Thought: I need to find X. Action: web_search[‘query about X’].” The Responses API likely captures anything in the model’s output that looks like an action and executes it, then appends the result as observation: “Observation: (content from the web page)”. The model then continues: “Thought: From this content, it seems Y… Next, I should …” and so on. By structuring the interaction this way (often called thought-action-observation loop), the agent can dynamically respond to what it finds. OpenAI’s Olivier Godement described it as chaining “atomic units” of work (model + tool) to achieve complex tasks, which the Agents SDK manages (OpenAI will let other apps deploy its computer-operating AI | The Verge). This is a form of LLM orchestration where the agent’s long session is broken into many smaller LLM calls, each guided by the previous step’s outcome.
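The older, text-based variant of this pattern can be scraped with a regular expression. The sketch below parses a hypothetical “Action:” line; the Responses API instead returns tool calls as structured fields, which makes this kind of scraping unnecessary:

```python
import re

# Matches e.g.  Action: web_search['latest EV car safety ratings']
ACTION_RE = re.compile(r"""Action:\s*(\w+)\[['"]?(.+?)['"]?\]""")

step = ("Thought: I need current safety ratings. "
        "Action: web_search['latest EV car safety ratings']")

match = ACTION_RE.search(step)
if match:
    tool_name, tool_input = match.groups()
    # dispatch tool_name(tool_input), then append "Observation: <result>"
    # to the prompt so the model can continue its chain of thought
    print(tool_name, "->", tool_input)
```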
Reinforcement Learning and Model Tuning: A major reason the Deep Research agent can parse queries and execute on them so effectively is that it has been specially fine-tuned (and trained with reinforcement learning) for these research tasks. OpenAI took their base GPT-4-level model and further trained it on “real-world tasks requiring browser and Python tool use,” giving it feedback on how well it performed those tasks (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). Likely, the training involved simulating many research sessions: the model was asked a question, it generated a sequence of tool uses and a final answer, and it got rewarded for correct, well-sourced answers. Over time, this training teaches the model how to decide on sub-tasks and when to invoke which tool. For example, the model learns that if the question is about data or numbers, it should use the Python tool to compute or graph rather than trying to do math in its head. Or if the question is recent (“latest developments on X”), it learns to use the search tool to get current information. The outcome is an internal policy that is optimized for multi-step reasoning. In other words, the model not only understands natural language, but also has a kind of meta-cognitive ability to figure out “what steps will lead me to the answer?”. This makes a huge difference in complex queries: instead of the user guiding it step by step, the agent self-directs its research. The model’s architecture (GPT-4) was already well-suited for following instructions and reasoning, but this fine-tuning with tools and feedback supercharges its effectiveness at these tasks (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch).
Response Generation (Report Writing): After gathering sufficient information, the agent enters the final phase: composing the answer. Here, the LLM consolidates everything it has learned into a coherent natural language report. The style of the report is detailed and structured – often with an introduction, body sections (which may cover different facets of the query), and a conclusion or summary. Throughout the text, it interweaves the citations corresponding to facts or quotes. The generation is handled entirely by the LLM (no templates are strictly imposed, but the system prompt likely provides a general expected format). Since the agent has been tracking its “thought process” and key points from sources, it uses those as the basis for the write-up. The user’s original question and any follow-up specifications are also part of the context, to ensure the answer is on target. Notably, OpenAI has the model also produce “a summary of its thinking” as part of the output or alongside it (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). In the UI, the user can toggle an “Activity” view that shows the research steps (sources visited, etc.) and a “Citations” view that lists all sources (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). The final answer the user sees is the polished report in the chat thread, while these other views provide transparency.
Example of a Deep Research final report and the accompanying citations (shown on the right). In this case, the user asked for recommendations on intermediate freestyle snowboards, and the agent produced a detailed report (“Snowboards for Intermediate Freestyle Riders”) citing 15 different sources. The interface lets the user inspect which references back up each part of the answer, underscoring the system’s focus on verifiable information. (Image: OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch)
Under the hood, generating the final answer might involve another prompt to the model like: “Now compile all the gathered information into a comprehensive answer. Be sure to cite sources for each claim. If applicable, include an overview of how you approached the problem.” The model then outputs the answer in a single long completion (streamed to the user once ready). Currently, outputs are text-only, but OpenAI plans to augment them with visuals: “embedded images, data visualizations, and other ‘analytic’ outputs soon” are on the roadmap (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). This means the agent will be able to include charts it generated or pertinent images from the web in-line with the text. The model is already capable of creating plots via the Python tool and can insert those (the system can handle images as part of the chat response). In future, if a question is, say, data-heavy (“Analyze sales trends and show a graph”), the Deep Research answer could contain a graph image along with the explanation.
Prompt Engineering & Guardrails: The prompts used ensure the model remains on task and handles the query responsibly. For instance, the system likely includes instructions like “If you encounter conflicting information, note it and seek clarification” or “Do not include information that cannot be substantiated by sources.” The model also has safety filters from OpenAI’s side: queries that would lead to disallowed content are handled by the broader ChatGPT safety system. Moreover, because the agent can browse, OpenAI must have filters on the browsing tool (e.g., to avoid going to illicit sites or accessing private data). These aspects are part of the logic handling (some are implemented in the tools layer, some in the model’s prompt). On the model generation side, to avoid things like plagiarism, it’s instructed to either paraphrase or quote with attribution rather than just copying large text from sources. All these are prompt-engineered behaviors that OpenAI has tuned during development.
In summary, the natural language query is translated into a series of actions by the LLM, using advanced prompting and a fine-tuned reasoning policy. The model effectively understands not just language but the intent and requirements behind the question, which allows it to produce a high-quality, targeted answer. By orchestrating multiple steps (search, read, analyze, write) it behaves much like a human expert researcher might. Techniques like chain-of-thought prompting significantly enhance its ability to tackle complex problems by breaking them down (OpenAI Deep Research Explains Itself - by Brian Chau), while the integration of tool use gives it the “hands” to act on its thoughts (searching the web, executing code, etc.). The final response is then generated in fluent, structured natural language by the same powerful model, ensuring the answer is not only factually grounded but also well-explained and contextually relevant.
4. User Request Routing and Logic Handling
Detecting When to Use Deep Research: OpenAI’s system distinguishes between a normal ChatGPT query and one that warrants the Deep Research agent primarily through explicit mode selection by the user. In the ChatGPT interface, the user must click the “Deep research” option before submitting their query (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). This toggles the back-end to route the request to the deep research pipeline instead of the standard GPT-4 model. Essentially, the user’s choice is the trigger – e.g., if you simply ask a question in the regular mode, ChatGPT will answer with a quick turn-around (possibly using a brief search if needed), but if you switch to Deep Research mode, the system knows to invoke the heavy-duty agent. This design gives users control over when to spend the extra time (and their monthly quota of deep-research queries) on a question. It aligns with OpenAI’s guidance: “If you just need a short response or a casual conversation, deep research is likely not necessary… Deep research shines in complicated, multi-layered inquiries requiring data from across the web.” (Deep Research FAQ | OpenAI Help Center) (Deep Research FAQ | OpenAI Help Center). In other words, straightforward questions are handled by the standard LLM, whereas complex or open-ended questions can be escalated to the Deep Research agent by user intent.
Behind the scenes, OpenAI could also implement some automatic logic to suggest Deep Research for certain queries. For example, if a Plus user asks, “Can you write a detailed market analysis on X with references,” the system might prompt the user: “This looks complex. Would you like to use Deep Research?” (This kind of suggestion isn’t confirmed in documentation, but it would be a sensible UX feature.) Regardless, as of now the routing is mostly manual via the UI selection. On the developer side, using the API, one could programmatically decide to route a query to a Deep Research agent. For instance, an application could have a classifier that detects that a user’s question is broad or requires current data and then call the Responses API with the deep-research agent enabled. OpenAI’s platform supports this by marking the request accordingly – the Compliance API even allows enterprise users to identify deep research queries by a flag or special prompt string (Deep Research FAQ | OpenAI Help Center). This implies that under the hood, when Deep Research mode is engaged, the system includes a `"tool_name":"deep_research"` marker in the conversation metadata or prompt, which signals the backend to use the Deep Research agent and workflow (Deep Research FAQ | OpenAI Help Center).
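On the application side, that routing decision reduces to a simple branch. The structure below is a hypothetical shim; the field names are invented, apart from the `tool_name` marker quoted above:

```python
def route(query: str, deep_research_enabled: bool) -> dict:
    """Route a query to the heavy agent pipeline or the standard chat model."""
    if deep_research_enabled:
        return {
            "pipeline": "deep_research",
            "metadata": {"tool_name": "deep_research"},  # marker noted above
            "query": query,
        }
    return {"pipeline": "standard_chat", "query": query}
```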
Simple vs Complex Query Handling: The difference between a normal query and a “deep research” query in practice comes down to depth and interactivity. A simple query (e.g., “What’s the capital of France?” or “Give me a quick summary of the latest iPhone features”) does not require multi-step reasoning; ChatGPT can answer it in one turn, perhaps with a quick search if it’s current. ChatGPT’s built-in Search feature (formerly “Browsing”) is optimized for these “quick, interactive real-time answers”, pulling a few web snippets and returning a brief summary with source links (Deep Research FAQ | OpenAI Help Center). Deep Research, on the other hand, is intended for when the user’s request explicitly or implicitly demands comprehensive analysis – for example, comparing many options, compiling a report, or researching an unfamiliar topic in detail. In those cases, the user opts into a longer wait in exchange for a much more thorough answer. OpenAI describes the difference succinctly: “Search is great for quick… answers… It pulls info from the web and gives a brief summary with links. Deep research… searches through hundreds of sources, analyzes the information, and puts together a detailed report with citations and data.” (Deep Research FAQ | OpenAI Help Center). So, the system differentiates by scope: few sources vs many sources, seconds of work vs minutes of work, short answer vs in-depth report.
Routing to the Appropriate Module: Once the user (or system logic) decides on deep research, the request is handed off to the Deep Research backend. Technically, this might involve routing the query to a different API endpoint or service dedicated to the agent. OpenAI likely maintains separate model endpoints for the o3-based deep research model and the standard GPT-4o chat model. The Orchestrator sees the flag and spawns a Deep Research agent session, whereas a normal query would just prompt the chat model directly. It’s akin to selecting a different “skill” for the AI. Notably, ChatGPT also has other agent modes (OpenAI also introduced an “Operator” agent for executing tasks on a computer, for example) (OpenAI will let other apps deploy its computer-operating AI | The Verge) (OpenAI will let other apps deploy its computer-operating AI | The Verge). Each of these agent modes is invoked by a user choice and then routed to its specialized model+tool stack. This modular architecture ensures that using Deep Research is a conscious decision – it sandboxes the heavy agent so it doesn’t inadvertently run for every query.
The system also incorporates usage limits and user tier logic as part of routing. Deep Research was first rolled out to Pro subscribers, with Plus/Team to follow, each with a fixed number of deep research queries allowed per month (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch) (Deep Research FAQ | OpenAI Help Center). The backend checks the user’s plan and remaining quota when routing the request. If the user has exceeded their allowance, it might refuse or ask to wait until it resets. This mechanism indirectly encourages using Deep Research only when necessary (since it’s a limited resource), thereby filtering out trivial queries. A Plus user, for example, only gets 10 deep research runs per month initially (Deep Research FAQ | OpenAI Help Center), so they’ll save it for truly complex tasks. In terms of logic handling, this means the system must manage a queue or scheduling for deep research jobs – since each can run up to 30 minutes, the system might not run them all concurrently if resources are limited. There may be an orchestration service that manages these long-running sessions, ensuring each gets the required model and tool time, and that results are returned to the correct user session upon completion (with a notification as noted in the UI: “you’ll get a notification when the research completes” (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch)).
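A minimal version of that quota-and-queue logic might look like the following, with an in-memory store and illustrative limits (only the Plus figure of 10 runs/month is documented above; the Pro number is a placeholder):

```python
import queue

MONTHLY_QUOTA = {"plus": 10, "pro": 100}   # Pro limit is a placeholder
usage: dict[str, int] = {}                 # user_id -> runs used this month
jobs: queue.Queue = queue.Queue()          # long-running sessions are queued

def submit_deep_research(user_id: str, plan: str, query: str) -> bool:
    """Enqueue a deep research job if the user's plan still has quota."""
    if usage.get(user_id, 0) >= MONTHLY_QUOTA.get(plan, 0):
        return False                       # quota exhausted; wait for reset
    usage[user_id] = usage.get(user_id, 0) + 1
    jobs.put({"user": user_id, "query": query})   # a worker notifies on completion
    return True
```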
Mixing Modes in a Conversation: Another aspect of routing is what happens after a deep research response is delivered. The Deep Research output appears in the chat like a message from ChatGPT. The user can follow up with further questions. At this point, the user might continue in normal mode (using the info from the report as context for GPT-4 to answer quickly) or trigger another deep research run. The system likely treats the deep research result as part of the chat history, so the next prompt could either be answered by the standard model (if the user doesn’t explicitly invoke deep research again) or by launching a new deep research sequence. Mode selection happens per query in the chat interface; the system does not switch modes automatically mid-conversation. This is important for logic handling: the system doesn’t permanently switch the user to an agent – it’s a per-query decision. So routing is determined each time the user enters a query. The state (chat history) is preserved across modes, which suggests the backend can feed the previous conversation (including perhaps a summary of the deep research findings) into the prompt for a normal ChatGPT answer if needed. This seamless hand-off is an area of ongoing UX improvement. As of the latest info, mobile and desktop integrations of deep research are planned, indicating that the routing logic will extend to those clients as well, not just the web UI (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch).
Example of Routing Logic: Imagine a user asks: “Should I buy or lease a car? Provide a detailed analysis with recent data on cost differences.” If they are in normal mode, ChatGPT might give a generic but reasonably informed answer from its training knowledge. It might optionally use the Search feature to grab a statistic or two, but it won’t deeply scour sources. If the user specifically turns on Deep Research for this query, the system will route it to the agent. The agent will then likely search for the latest car market trends, maybe pull data from financial sites or calculators, possibly run a quick computation (via Python) comparing long-term costs, and then produce a multi-page report with citations from, say, consumer reports, financial blogs, etc. The difference in output quality and depth would be significant. Internally, the routing decision was simply the user clicking that “Deep Research” button – everything after follows the deep research code path.
From a developer perspective, OpenAI’s design is to keep these flows separate to avoid confusion and performance issues. The standard ChatGPT (GPT-4) path is tuned for speed and conversational coherence, while Deep Research’s path is tuned for thoroughness and tool use. By explicitly routing queries, the system doesn’t have to dynamically decide “shall I do a deep dive?” for every input – that would add unnecessary overhead and potential errors. Instead, the user (or an application’s logic) makes that call. That said, the capability to decide could be built in: the model could have a trigger phrase or detection (“when the user asks for a comprehensive report, automatically switch modes”). OpenAI might explore more seamless integration in the future, but initial rollout keeps it user-driven.
In conclusion, user request routing for Deep Research is a gated choice that directs the query to a specialized research agent instead of the standard chat responder. Simple queries go down the lightweight path (fast response, minimal tool usage), whereas complex research questions go down the heavyweight path (invoking the full web-mining, multi-step agent). This separation ensures efficiency and clarity – users get to choose depth vs. speed. OpenAI’s documentation emphasizes using the right tool for the job: “For shorter, real-time conversations or simpler queries, you might prefer using GPT-4o (with or without Search), which responds almost instantly. Deep research, on the other hand, is built for more complex tasks that require greater depth and thoroughness.” (Deep Research FAQ | OpenAI Help Center) (Deep Research FAQ | OpenAI Help Center). The system architecture supports this by maintaining distinct modules and pipelines for each, and by providing the interface and APIs to route queries appropriately. By prioritizing an explicit routing mechanism, OpenAI ensures that the powerful but resource-intensive Deep Research feature is applied only when it truly adds value – delivering exhaustive, source-backed answers for complex questions – while the everyday Q&A is handled with the usual speed and conversational finesse of ChatGPT.
Sources: The implementation details above are drawn from OpenAI’s official announcements and documentation, as well as analyses by credible tech outlets and researchers. OpenAI’s announcement of Deep Research describes it as an agent leveraging the GPT-4 (o3) model with tool integration for web browsing and data analysis (OpenAI Launches Deep Research: Advancing AI-Assisted Investigation - InfoQ) (OpenAI Launches Deep Research: Advancing AI-Assisted Investigation - InfoQ). InfoQ and TechCrunch provided insight into the specialized model and its training (optimized for reasoning and equipped with browsing/Python tools) (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch) (OpenAI unveils a new ChatGPT agent for ‘deep research’ | TechCrunch). The OpenAI Help Center FAQ offers guidance on when to use Deep Research vs. normal search, and how it operates with citations and a thinking log (Deep Research FAQ | OpenAI Help Center) (Deep Research FAQ | OpenAI Help Center). Additionally, OpenAI’s release of the Agents SDK and Responses API gives a peek into the architecture of such agent systems, highlighting how tools and LLMs are orchestrated in a modular fashion (OpenAI will let other apps deploy its computer-operating AI | The Verge) (OpenAI will let other apps deploy its computer-operating AI | The Verge). Finally, earlier research like OpenAI’s WebGPT paper illuminates the benefits of a browser-equipped model collecting references to improve factual accuracy ([2112.09332] WebGPT: Browser-assisted question-answering with human feedback) – principles clearly embodied in Deep Research’s design. Together, these sources paint a picture of Deep Research as a cutting-edge AI research assistant: one that combines a powerful language model with search, browsing, and analytical tools in a carefully structured pipeline to deliver trustworthy, in-depth answers.