First reported by TechCrunch, OpenAI's system card detailed the PersonQA evaluation results, designed to test for hallucinations. From the results of this evaluation, o3's hallucination rate is 33 ...
OpenAI’s recently launched o3 and o4-mini AI models are state-of-the-art in many respects. However, the new models still hallucinate, or make things up — in fact, they hallucinate more than several of ...
Anthropic and OpenAI ran their own tests on each other's models. The two labs published findings in separate reports. The goal was to identify gaps in order to build better and safer models. The AI ...