First impressions of OpenAI o1: An AI designed to overthink it
OpenAI released its new o1 models on Thursday, giving ChatGPT users their first chance to try AI models that pause to “think” before they answer. There’s been a lot of hype building up to these models, codenamed “Strawberry” inside OpenAI. But does Strawberry live up to the hype?
Sort of.
Compared to GPT-4o, the o1 models feel like one step forward and two steps back. OpenAI o1 excels at reasoning and answering complex questions, but the model is roughly four times more expensive to use than GPT-4o. OpenAI’s latest model lacks the tools, multimodal capabilities, and speed that made GPT-4o so impressive. In fact, OpenAI even admits that “GPT-4o is still the best option for most prompts” on its help page, and notes elsewhere that o1 struggles at simpler tasks.
“It’s impressive, but I think the improvement is not very significant,” said Ravid Shwartz Ziv, an NYU professor who studies AI models. “It’s better at certain problems, but you don’t have this across-the-board improvement.”
For all of these reasons, it’s important to use o1 only for the questions it’s truly designed to help with: big ones. To be clear, most people are not using generative AI to answer these kinds of questions today, largely because today’s AI models are not very good at it. However, o1 is a tentative step in that direction.
OpenAI o1 is unique because it “thinks” before answering, breaking down big problems into small steps and attempting to identify when it gets one of those steps right or wrong. This “multi-step reasoning” isn’t entirely new (researchers have proposed it for years, and You.com uses it for complex queries), but it hasn’t been practical until recently.
“There’s a lot of excitement in the AI community,” said Workera CEO and Stanford adjunct lecturer Kian Katanforoosh, who teaches classes on machine learning, in an interview. “If you can train a reinforcement learning algorithm paired with some of the language model techniques that OpenAI has, you can technically create step-by-step thinking and allow the AI model to walk backwards from big ideas you’re trying to work through.”
OpenAI o1 is also uniquely pricey. With most models, you pay for input tokens and output tokens. However, o1 adds a hidden process (the small steps the model breaks big problems into), which adds a large amount of compute you never fully see. OpenAI is hiding some details of this process to maintain its competitive advantage. That said, you still get charged for these steps in the form of “reasoning tokens.” This further emphasizes why you need to be careful about using OpenAI o1, so you don’t get charged for a ton of tokens when asking what the capital of Nevada is.
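To make the billing point concrete, here is a minimal sketch of how hidden reasoning tokens can inflate the bill for a trivial question. Everything specific in it is an assumption for illustration: the `Pricing` class, the `estimate_cost` function, the per-million-token rates, and the token counts are hypothetical placeholders, not OpenAI's actual prices or API. It also assumes reasoning tokens are charged at the same rate as visible output tokens.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pricing:
    """Hypothetical per-million-token rates in dollars (placeholders, not real prices)."""
    input_per_million: float
    output_per_million: float


def estimate_cost(pricing: Pricing, input_tokens: int,
                  visible_output_tokens: int, reasoning_tokens: int = 0) -> float:
    """Estimate the cost of one request in dollars.

    Assumption: hidden reasoning tokens are billed at the output rate,
    on top of the output tokens the user actually sees.
    """
    billed_output = visible_output_tokens + reasoning_tokens
    total = (input_tokens * pricing.input_per_million
             + billed_output * pricing.output_per_million)
    return total / 1_000_000


# Placeholder rates: the "reasoning" model is set to four times the baseline,
# mirroring the rough ratio mentioned in the article.
BASELINE = Pricing(input_per_million=1.0, output_per_million=2.0)
REASONING = Pricing(input_per_million=4.0, output_per_million=8.0)

if __name__ == "__main__":
    # A short factual question: few input tokens, a one-line answer.
    # The 500 reasoning tokens are an invented figure for illustration.
    simple = estimate_cost(BASELINE, input_tokens=20, visible_output_tokens=10)
    overthought = estimate_cost(REASONING, input_tokens=20,
                                visible_output_tokens=10, reasoning_tokens=500)
    print(f"baseline model:  ${simple:.6f}")
    print(f"reasoning model: ${overthought:.6f}")
```

Under these made-up numbers, most of the reasoning model's cost comes from tokens the user never sees, which is why a one-line factual question is a poor fit for it.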
The idea of an AI model that helps you “walk backwards from big ideas” is powerful, though. In practice, the model is pretty good at that.
In one example, I asked ChatGPT o1 preview to help my family plan Thanksgiving, a task that could benefit from a little unbiased logic and reasoning. Specifically, I wanted help figuring out if two ovens would be sufficient to cook a Thanksgiving dinner for 11 people and wanted to talk through whether we should consider renting an Airbnb to get access to a third oven.
After 12 seconds of “thinking,” ChatGPT wrote out a 750+ word response, ultimately telling me that two ovens should be sufficient with some careful strategizing and would allow my family to save on costs and spend more time together. It also broke down its thinking at each step and explained how it weighed external factors, including costs, family time, and oven management.
ChatGPT o1 preview told me how to prioritize oven space at the house that is hosting the event, which was smart. Oddly, it suggested I consider renting a portable oven for the day. That said, the model performed much better than GPT-4o, which required multiple follow-up questions about what exact dishes I was bringing, and then gave me bare-bones advice I found less useful.
Asking about Thanksgiving dinner may seem silly, but you could see how this tool would be helpful for breaking down complicated tasks.
I also asked o1 to help me plan out a busy day at work, where I needed to travel between the airport, multiple in-person meetings in various locations, and my office. It gave me a very detailed plan, but it was maybe a little much. Sometimes, all the added steps can be overwhelming.
For a simpler question, o1 does way too much; it doesn’t know when to stop overthinking. I asked where you can find cedar trees in America, and it delivered an 800+ word response outlining every variety of cedar tree in the country, including their scientific names. It even had to consult OpenAI’s policies at some point, for some reason. GPT-4o did a much better job answering this question, delivering about three sentences explaining that you can find the trees all over the country.
In some ways, Strawberry was never going to live up to the hype. Reports about OpenAI’s reasoning models date back to November 2023, right around the time everyone was looking for an answer about why OpenAI’s board ousted Sam Altman. That spun up the rumor mill in the AI world, leaving some to speculate that Strawberry was a form of AGI, the enlightened version of AI that OpenAI aspires to ultimately create.
Altman confirmed o1 is not AGI to clear up any doubts, not that you’d be confused after using the thing. The CEO also trimmed expectations around this launch, tweeting that “o1 is still flawed, still limited, and it still seems more impressive on first use than it does after you spend more time with it.”
The rest of the AI world is coming to terms with a less exciting launch than expected.
“The hype sort of grew out of OpenAI’s control,” said Rohan Pandey, a research engineer with the AI startup ReWorkd, which builds web scrapers with OpenAI’s models.
He’s hoping that o1’s reasoning ability is good enough to solve a niche set of complicated problems where GPT-4 falls short. That’s likely how most people in the industry view o1: as a solver for that niche, not quite the revolutionary step forward that GPT-4 represented.
“Everybody is waiting for a step function change for capabilities, and it is unclear that this represents that. I think it’s that simple,” said Brightwave CEO Mike Conover, who previously co-created Databricks’ AI model Dolly, in an interview.
The underlying principles used to create o1 go back years. Google used similar techniques in 2016 to create AlphaGo, the first AI system to defeat a world champion of the board game Go, points out Andy Harrison, a former Googler and CEO of the venture firm S32. AlphaGo trained by playing against itself countless times, essentially self-teaching until it reached superhuman capability.
He notes that this brings up an age-old debate in the AI world.
“Camp one thinks that you can automate workflows through this agentic process. Camp two thinks that if you had generalized intelligence and reasoning, you wouldn’t need the workflow and, like a human, the AI would just make a judgment,” said Harrison in an interview.
Harrison says he’s in camp one and that camp two requires you to trust AI to make the right decision. He doesn’t think we’re there yet.
However, others think of o1 as less of a decision-maker and more of a tool to question your thinking on big decisions.
Katanforoosh, the Workera CEO, described an example where he was going to interview a data scientist to work at his company. He tells OpenAI o1 that he only has 30 minutes and wants to assess a certain number of skills. He can then work backward with the AI model to check whether he’s thinking about this correctly, and o1 will account for constraints such as time.
The question is whether this helpful tool is worth the hefty price tag. As AI models continue to get cheaper, o1 is one of the first AI models in a long time that we’ve seen get more expensive.