Florian Brand

Using OpenAI
Florian Brand · 2025-03-09 · via Florian Brand

OpenAI's Deep Research (ODR), which was released a bit over a month ago, has become a valuable tool for me. Initially, I wanted to compare it against the countless competitors, but none of them[1] come close to the quality of OpenAI's offering; most of them are either shallow listicle-fests or contain countless mistakes.

The versatility of ODR

My most common use case is to use it as a research assistant to get into a new topic quickly. This is the most obvious and most advertised use case for Deep Research, which will only increase in quality and usage over time.

At least right now, the outputs cannot replace a full literature review, but they are good enough for a solid introduction to a new topic, usually spanning around 3–6 papers, which saves around a day of finding and reading relevant papers. Aside from summarizing those papers, ODR can also compare their approaches and findings and transfer them to other domains.

Apart from that, ODR is also good for finding fixes for obscure bugs and niche libraries when tasked to scout GitHub issues and obscure StackOverflow posts for the solution. The underlying model behind ODR is trained to generate reports, so it will not fix the code for you, but instead give you a detailed rundown of the possible solutions.

A totally different use case, which I've not seen mentioned elsewhere, is using it as a shopping assistant for extremely niche products. As an example, I needed a heavy-duty rack with very specific and non-standard measurements to fit in a recess of my apartment. I have to admit, I was unable to find anything myself after some hours of searching, whereas ODR was able to find the single option satisfying the given constraints.

ODR is also capable of finding the cheapest option for a given product, which not even Idealo, the biggest price comparison website in Germany, could do. It is also capable of finding valid replacement parts for a given product when supplied with the product's name, something I expected competitors (like Perplexity) to accomplish, but they could not.

Getting the most out of ODR

That said, ODR is not perfect and not as straightforward to use as other LLM-based tools, despite the seeming simplicity of clicking the "Deep Research" button in ChatGPT, typing in your prompt and answering the follow-up questions. One thing that isn't obvious: the model chosen when clicking the button doesn't matter. Deep Research will always use o3 behind the scenes, even when the selected model is GPT-4o mini.

The most important aspect of using ODR is that prompting matters (again). Whereas other products and LLMs these days are good enough for the majority of prompts, the difference in the outputs of ODR is night and day depending on the prompt.

Good prompts are highly specific and detailed, describing the goal, possible constraints and the desired output format. A viable approach is to use an LLM to generate the prompt for ODR: I've been using this prompt template from this tweet with o1-pro to do so. In my tests, using this template instead of prompting directly resulted in >40% longer reports, which go into more detail and are more structured.

Good prompts also specify the (sub)set of websites to use for the research. The default set of websites ODR chooses is a bit better than the usual, SEO-sloptimized first page of Google results, but it is far from perfect.

Some examples: adding to the prompt that the research should only use arXiv (and similar sites) leads to better results for literature reviews; asking it to only use primary sources from NVIDIA leads to a correct comparison of GPU specs; asking it to use only Chinese-language sites like Weixin gives you better insights into the Chinese community than any third-party English-language site could give.
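
To make the ingredients above concrete, here is a minimal Python sketch that assembles a prompt from a goal, constraints, an output format and a source restriction. It is not the template from the tweet; the `ResearchPrompt` class, its field names and the wording of the rendered prompt are all assumptions made for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class ResearchPrompt:
    """Hypothetical container for the parts of a Deep Research prompt."""

    goal: str
    constraints: list[str] = field(default_factory=list)
    output_format: str = ""
    allowed_sources: list[str] = field(default_factory=list)

    def render(self) -> str:
        # Each non-empty part becomes its own paragraph in the final prompt.
        parts = [f"Goal: {self.goal}"]
        if self.constraints:
            bullets = "\n".join(f"- {c}" for c in self.constraints)
            parts.append(f"Constraints:\n{bullets}")
        if self.allowed_sources:
            sources = ", ".join(self.allowed_sources)
            parts.append(f"Only use these sources: {sources}.")
        if self.output_format:
            parts.append(f"Output format: {self.output_format}")
        return "\n\n".join(parts)


# Illustrative values only; the topic and constraints are made up.
prompt = ResearchPrompt(
    goal="Summarize recent approaches to a topic I am new to",
    constraints=["Cover 3 to 6 papers", "Compare approaches and findings"],
    output_format="Structured report with one section per paper",
    allowed_sources=["arXiv"],
)
print(prompt.render())
```

The rendered text is what you would paste into ChatGPT after clicking the "Deep Research" button, or hand to another LLM to expand into a more detailed prompt.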

Limitations

When using ODR for comparisons, e.g. to compare different papers or models, I found that the limit seems to be 2–3 items per comparison: comparing two models (like Qwen against GPT-4o) is fine, but adding more models to the comparison leads to numerous mistakes. I found this to also be true for other tasks: when you prompt it for too many things at once, the output degrades quickly.
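
One possible workaround, which I am only sketching here and which is an assumption rather than something tested systematically, is to split a larger comparison into separate two-way prompts and run each as its own query. The `pairwise_prompts` helper and the prompt wording are hypothetical, and "Model C" is a placeholder name.

```python
from itertools import combinations


def pairwise_prompts(items: list[str], aspect: str) -> list[str]:
    """Build one two-way comparison prompt per unordered pair of items."""
    return [
        f"Compare {a} against {b} with respect to {aspect}."
        for a, b in combinations(items, 2)
    ]


# "Model C" is a placeholder, not a real model.
for p in pairwise_prompts(["Qwen", "GPT-4o", "Model C"], "coding benchmarks"):
    print(p)
```

Note that the number of pairs grows quickly (three items give three prompts, four give six), so this only makes sense for small sets.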

Aside from that, ODR has some technical limitations. It cannot access gated content, which will become an increasing part of the internet in the near future. It also cannot access YouTube or the transcripts of YouTube videos, and I was unable to prompt it to use a transcription service to get the content of videos. It can, however, read PDFs, access images and execute Python code, although I haven't seen it run code for my queries yet.


  1. And I've tried quite a few, including some open-source projects, Gemini Deep Research, Perplexity Deep Research, Grok Deep Search, You ARI, even the recently released and hyped Manus. ↩︎