
What makes o1 different

4 min read · Sep 20, 2024


The release of OpenAI’s o1 model has confused many. Unlike previous models, which were always about more (more parameters! more modalities!), this one went … sideways. Let’s talk about the technical differences first, then work through a real-world anecdote from a discussion I had. Finally, we’ll conclude with some tips about which model to use and when.

Technically, the difference is clear: o1 is an “agentic wrapper” around gpt-4 (or possibly a similar model OpenAI created). What this means is that there’s a little bit of metacognition (thinking about thinking) going on before the model starts to answer the question. Rather than diving into the question immediately, o1 first analyzes the best approach to answering it in terms of subtasks.

It then starts executing each of the subtasks. Sometimes, based on the answer it gets, it will reconsider the plan. This is most similar to what is generally called the “tree of thought” approach. You can see it happening while o1 works: it literally gives you a brief explanation of the subtask it’s currently on. For a deeper understanding of such agentic approaches, I highly recommend Andrew Ng’s series of letters.
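
To make this concrete, here’s a minimal sketch of what such an agentic wrapper might look like. Everything in it is hypothetical: `complete()` stands in for a single call to an underlying gpt-4-class model, and none of this reflects o1’s actual (unpublished) internals.

```python
def complete(prompt: str) -> str:
    """Placeholder for a single call to the underlying LLM."""
    raise NotImplementedError

def answer_with_metacognition(question: str) -> str:
    # Step 1: think about thinking. Ask for a plan before answering.
    plan = complete(
        "Break this question into a short list of subtasks, one per line:\n"
        + question
    )
    queue = [line.strip() for line in plan.splitlines() if line.strip()]

    findings = []
    while queue:
        subtask = queue.pop(0)
        # Step 2: execute the next subtask, carrying earlier findings along.
        result = complete(
            f"Question: {question}\nFindings so far: {findings}\n"
            f"Work on this subtask: {subtask}"
        )
        findings.append(result)

        # Step 3: based on the result, decide whether to revise the plan.
        verdict = complete(
            f"Result: {result}\nRemaining subtasks: {queue}\n"
            "Is the remaining plan still sensible? Reply REVISE or CONTINUE."
        )
        if verdict.strip().startswith("REVISE"):
            revised = complete(
                f"Given the findings {findings}, list revised subtasks "
                f"for: {question}"
            )
            queue = [line.strip() for line in revised.splitlines() if line.strip()]

    # Step 4: synthesize everything into one final answer.
    return complete(f"Question: {question}\nFindings: {findings}\nFinal answer:")
```

The plan-execute-reconsider loop is where the extra model calls (and hence the extra cost and latency) come from.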

This approach isn’t cheap: it literally costs six times as much (and is around six times slower, too). And there’s no guarantee that the metacognitive approach always produces better answers. For a straightforward factual question, or a generative task such as coming up with trivia questions, it may actually make things worse.

But it makes new types of conversations possible. Here’s a practical example.

Recently, I wanted to understand a trend in multimodal LLMs: variational autoencoders (VAEs). I had a superficial understanding of what they were, but I had questions, like why they work better than traditional autoencoders and how training them differs. This isn’t something you can easily find with a search; it’s the kind of thing you’d go to a domain expert and ask.

To improve my understanding, I used both gpt-4 and o1. What I quickly learned was that o1’s answers were more considered, and it actually engaged in a back and forth. GPT-4 basically kept repackaging the same information, as if it had reached the extent of its depth and just kept repeating itself in different ways (as an aside, I’ve seen the same behavior in people sometimes, too).
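
For anyone who wants to reproduce this kind of comparison, here is a minimal sketch using the OpenAI Python SDK. The model names are the ones available at the time of writing, and at launch o1-preview accepted only a bare user message (no system prompt, no custom temperature), so the request is kept deliberately minimal.

```python
# Minimal sketch: run the same question past both models and compare.
# Requires `pip install openai` and an OPENAI_API_KEY in the environment.
from openai import OpenAI

client = OpenAI()

# Fill in your own understanding here; the "..." is a placeholder.
QUESTION = (
    "Here is my understanding of why VAEs train better than plain "
    "autoencoders: ... Is this correct? What am I missing?"
)

for model in ("gpt-4", "o1-preview"):
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": QUESTION}],
    )
    print(f"--- {model} ---")
    print(response.choices[0].message.content)
```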

The best example of this was when I tried a replay to check my understanding: restating what I thought I’d learned in my own words and asking the model to critique it.

*Talking to o1, part 1 (screenshot): checking that my understanding is correct. Note that it spent 27 seconds thinking about it, then took my paragraph sentence by sentence and analyzed it.*

*Talking to o1, part 2 (screenshot): the conclusion of the above question. Note how it clearly says what I got right, and addresses a key incorrect nuance in my understanding: it’s the differentiability that matters, not the multiplication.*
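
For readers curious about that nuance: it corresponds to what’s called the reparameterization trick. Sampling z directly from N(μ, σ²) isn’t differentiable, but rewriting the sample as z = μ + σ·ε with ε ~ N(0, 1) lets gradients flow through μ and σ. Here’s a minimal PyTorch illustration of my own (not taken from the conversation):

```python
# The reparameterization trick: express a sample from N(mu, sigma^2) as a
# differentiable function of mu and sigma, so gradients can flow through
# the sampling step during training.
import torch

mu = torch.tensor([0.5], requires_grad=True)
log_sigma = torch.tensor([-1.0], requires_grad=True)
sigma = log_sigma.exp()

eps = torch.randn_like(mu)   # noise drawn outside the computation graph
z = mu + sigma * eps         # differentiable w.r.t. mu and log_sigma

loss = (z ** 2).sum()        # stand-in for a decoder/reconstruction loss
loss.backward()

print(mu.grad, log_sigma.grad)  # both populated: gradients flow through z
```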

Now compare that to when I had the same conversation with gpt-4. It basically repeated the previous content about what the VAE was.

*Talking to gpt-4 with the same question (screenshot). It doesn’t so much respond to what I say as repeat the definition of a VAE.*

It kind of gets to the same conclusion, but you can see that o1 reads much more like a work colleague explaining the answer, specifically attending to the points you were making. In short, o1 gave a more considered answer. And that’s the key point.

To summarize, if we were to anthropomorphize: gpt-4 is like your super know-it-all friend who, when you ask them a question, starts talking stream-of-consciousness, forcing you to sift through what they’re saying for the gems.

o1 is more like the friend who listens carefully to what you have to say, scratches their chin for a few moments, and then shares a couple of sentences that hit the nail on the head.

How can we use this experience to inform our choices? Here are some applications where o1 is likely stronger than gpt-4 and worth the considerable cost (a toy routing sketch follows the list):

  • As a tool for you as a person. Examples: improving your understanding of an issue, or working through a problem with a partner that responds thoughtfully to what you have to say.
  • Tasks where logical reasoning rather than factual knowledge is the key challenge. Examples: maths olympiad questions, a virtual partner in an escape room game.
  • Tasks where it makes sense to think about how to solve the problem before diving in. Examples: puzzles, multi-stage problems.
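
If you wanted to hard-code those heuristics into an application, a toy router might look like this. The task labels and the routing itself are my own illustration, not an official API; the ~6x cost figure is from earlier in this post.

```python
# Toy sketch: route requests to o1 only where planning and reasoning
# dominate the task, since that's where its cost and latency pay off.

REASONING_HEAVY = {
    "puzzle",
    "multi_stage_problem",
    "maths_olympiad",
    "socratic_tutoring",   # the "tool for you as a person" use cases
}

def pick_model(task_type: str) -> str:
    """Return a model name for a hand-labelled task type."""
    if task_type in REASONING_HEAVY:
        return "o1-preview"  # ~6x the cost and latency, but more considered
    return "gpt-4"           # fine for factual lookups and generative tasks

print(pick_model("maths_olympiad"))     # -> o1-preview
print(pick_model("trivia_generation"))  # -> gpt-4
```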



Written by Waleed Kadous

Co-founder of CVKey, ex Head Engineer Office of the CTO @ Uber, ex Principal Engineer @ Google.
