On December 20, OpenAI introduced its latest artificial intelligence (AI) reasoning model, o3, along with a lightweight version, o3-mini. The company claims that o3 has more advanced, human-like reasoning capabilities and that it surpasses its predecessor, o1, in coding, competition mathematics, and mastery of PhD-level scientific knowledge.
However, in a report on December 22, the British website New Scientist pointed out that although o3 has made a remarkable leap in performance, it has not reached the level of what the industry calls artificial general intelligence (AGI).
Outstanding performance in many aspects
OpenAI said that when solving more complex, multi-step problems, the o3 model spends more time working out the answer before it responds. This improvement in reasoning ability has allowed o3 to perform well on many tests.
Large language models are routinely scored on mathematical benchmarks, and o3 is no exception. On the 2024 American Invitational Mathematics Examination (AIME), o3 reached an accuracy of 96.7%, answering only one question incorrectly. On FrontierMath, which OpenAI researchers consider one of the most demanding benchmarks, o3 solved 25.2% of the problems. Although this score may not seem high, every other large language model had stumbled here, and none had exceeded 2% accuracy.
The FrontierMath test is extremely difficult. Terence Tao, the Fields Medal-winning mathematician of Chinese descent, judged that its problems would likely resist AI for several years. Yet o3 needs only a few minutes of thinking to answer one of the questions, while human mathematicians spend anywhere from a few hours to several days.
In its mastery of scientific knowledge, o3 also exceeds the level of a typical PhD holder. On GPQA Diamond, which measures a model's performance on PhD-level science questions covering chemistry, physics, and biology, o3 reached an accuracy of 87.7%. That exceeds the roughly 70% scored by human PhD-level experts and is nearly 10 percentage points higher than o1.
In addition, o3's coding ability is better than that of the earlier o1 series. On the SWE-bench Verified benchmark, which measures an AI model's ability to solve real-world software problems, o3's accuracy is about 71.7%, more than 20 percentage points higher than o1. On the Codeforces competitive programming platform, o3 earned a rating of 2727, roughly equivalent to the 175th-ranked human programmer on the leaderboard, while o1's rating was only 1891.
After presenting these impressive results, OpenAI CEO Sam Altman emphasized that the arrival of o3 marks AI's entry into its next stage of development, in which models can handle complex tasks that require a great deal of reasoning.
Still different from human intelligence
New Scientist also reported that o3 set a new record on the Abstraction and Reasoning Corpus for AGI (ARC-AGI) challenge, which is regarded as an important yardstick for AGI: in its low-compute configuration, it scored 75.7%, placing it at the top of the public leaderboard. The test that determines the prize winner, though, imposes stricter compute limits, and under those limits o3's attempt ended in failure.
However, in a high-compute configuration that used 172 times the official compute limit, o3 relied on brute force to reach 87.5%, clearing the 85% threshold that represents human-level performance.
Commenting on o3's performance, François Chollet, a former Google engineer and the principal creator of ARC-AGI, wrote in a blog post that this is a surprising and important leap in AI capabilities. Still, he said, o3 has not achieved AGI, because it cannot solve some very simple problems in the ARC-AGI challenge, which shows that it differs fundamentally from human intelligence.
AGI is a hypothetical future system that could imitate human thinking, decision-making, and self-awareness, and act on its own initiative. For now, AGI exists only in science fiction and has not yet become reality.
