We recently explored how generative AI can be used to solve complex business problems.
Later that day, OpenAI released a new model intended to push the boundaries of AI even further: Enter o1-preview, a model designed to elevate AI's capacity for advanced reasoning and complex thinking.
So this edition takes a closer look at o1-preview and what it means for AI-powered strategy and problem-solving.
The next stage of AI problem-solving: What OpenAI’s new model means for your business
As the name suggests, what we have access to today is still just a preview of the full model. OpenAI writes:
As an early model, [OpenAI o1-preview] doesn't yet have many of the features that make ChatGPT useful, like browsing the web for information and uploading files and images. For many common cases GPT-4o will be more capable in the near term.
But for complex reasoning tasks this is a significant advancement and represents a new level of AI capability. Given this, we are resetting the counter back to 1 and naming this series OpenAI o1.
(OpenAI also announced the release of OpenAI o1-mini, mainly for developers.)
AI + advanced reasoning
To grasp the significance of o1-preview, it's helpful to understand System 1 and System 2 thinking:
- System 1 thinking is fast, automatic, and intuitive, ideal for routine decisions but prone to biases.
- System 2 thinking is slow, analytical, and deliberate, useful for complex decisions but more mentally demanding.
This terminology was coined by psychologist Daniel Kahneman, author of Thinking, Fast and Slow, and is not used by OpenAI. But it helps distinguish between the two “modes” of thinking we all use.
OpenAI aims to enhance AI's “System 2 thinking” capabilities, in part by allowing o1-preview to take longer pauses before delivering a response. (The crux of the argument is: more time to “think” = better results. More on this later.)
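One intuition for why extra “thinking” time helps can be captured with a toy probability model. To be clear, this is an illustration of the general idea, not how o1 actually works internally: if each independent reasoning attempt succeeds with probability p, allowing more attempts raises the chance that at least one succeeds.

```python
# Toy illustration (not OpenAI's actual method): if a single reasoning
# attempt succeeds with probability p, then n independent attempts succeed
# at least once with probability 1 - (1 - p)**n.

def chance_of_success(p: float, attempts: int) -> float:
    """Probability that at least one of `attempts` independent tries succeeds."""
    return 1 - (1 - p) ** attempts

for n in (1, 5, 20):
    print(f"attempts={n:2d} -> success probability={chance_of_success(0.3, n):.3f}")
    # With p = 0.3: 1 attempt -> 0.300, 5 attempts -> 0.832, 20 attempts -> 0.999
```

Even a modest per-attempt success rate compounds quickly, which is the rough intuition behind spending more compute at answer time.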
o1-preview demo: Watch me build a business plan
To test o1-preview's capabilities, I challenged it to create a comprehensive business plan. I worked with it sequentially, first identifying the kinds of tasks o1-preview might be uniquely capable of solving. I even asked it to write the prompt for me, after outlining the details and constraints of the hypothetical business we were building:
The result? The business plan was comprehensive and touched on all the considerations for starting up a new business, from legal structure to operations to a go-to-market plan. But so far, not dramatically different from GPT-4o.
Next I wanted to see if it could run and evaluate various scenarios, not just provide high-level business advice:
Watch it build a financial forecast in 7 seconds.
And then run a sensitivity analysis in 15 seconds.
For comparison’s sake, I fed identical prompts into GPT-4o. Then I asked two separate models, OpenAI's ChatGPT (GPT-4o) and Google's Gemini Advanced (Gemini 1.5 Pro), to review both plans.
Here's what they said:
Both models agreed that the plans are strong, but expressed a slight preference for GPT-4o’s plan.
The future of AI "thinking" and reasoning
In the wake of o1-preview’s launch, many are asking, “Which model should I use?”
OpenAI shared these evaluations demonstrating that o1-preview is at or above GPT-4o in a number of capabilities:
Chatbot Arena (which offers a more subjective human comparison of various models) currently has o1-preview at the top of its leaderboard, followed closely by GPT-4o.
As I shared above, my own comparison of o1-preview and 4o found them to be largely equal, with 4o performing slightly better on this particular task.
I continue to recommend a multi-model approach, as AI companies leapfrog each other with new model releases, launch new capabilities, and now offer trade-offs between reasoning and speed. (Ask yourself: is your particular task a System 1 or System 2 kind of problem? What kind of thinking does it require?)
But there’s an even bigger question looming: What does OpenAI o1 tell us about AI’s potential?
It suggests that LLM scaling laws are holding true.
In simple terms, AI researchers argue that the two main variables driving an LLM’s accuracy are data and compute. Given more of either (or ideally, both), models become smarter and more capable.
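Scaling laws are typically described as power laws: model error falls smoothly, if slowly, as resources grow. Here is a minimal sketch of that shape; the constant and exponent are hypothetical values chosen only for illustration, not empirical fits from any published study.

```python
# Illustrative power-law scaling curve: loss falls as a power of compute.
# The constant and exponent are made-up values that only demonstrate the
# shape of the curve; real scaling-law research fits these empirically.

def estimated_loss(compute: float, constant: float = 10.0, exponent: float = 0.05) -> float:
    """Toy scaling law: loss ≈ constant / compute**exponent."""
    return constant / (compute ** exponent)

for compute in (1e3, 1e6, 1e9):
    print(f"compute={compute:.0e} -> loss={estimated_loss(compute):.2f}")
```

The takeaway from the shape, not the numbers: each additional order of magnitude of resources keeps buying improvement, which is why labs keep betting on scale.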
Both o1-preview and GPT-4o are powerful (along with other frontier models like Claude 3.5 Sonnet and Gemini 1.5 Pro). As AI reasoning capabilities advance, we can expect significant changes in how businesses interact with and use AI, including more nuanced problem-solving: AI could tackle increasingly complex business challenges, provide alternative perspectives, and identify potential blind spots in human reasoning.
And, big-big picture, it is perhaps another step forward on the path of artificial intelligence.
In other words, let’s play it forward. I’m paying less attention to debates over this new model vs. an existing one, or whether o1-preview is over- or under-hyped, and more to what role these models might play in our work and in our companies.
What does a machine with human-level problem-solving mean for our product roadmaps, our service offerings, our tech stacks, and our org charts? That is the question worth solving.
You're reading a preview. Want more?
Ever wonder how other growing companies make the tough decisions — like how much to spend on marketing, when it’s time to expand your org chart, or when and where to use AI?
Us, too. So we decided to create growthcurve: A newsletter dedicated to practical advice for growth without the growing pains.
Get frameworks, templates, and advice to smooth out the bumps along the way.