AI benchmarking group criticized for waiting to disclose funding from OpenAI

January 20, 2025


An organization developing math benchmarks for AI didn’t disclose that it had received funding from OpenAI until relatively recently, drawing allegations of impropriety from some in the AI community.

Epoch AI, a nonprofit primarily funded by Open Philanthropy, a research and grantmaking foundation, revealed on December 20 that OpenAI had supported the creation of FrontierMath. FrontierMath, a test with expert-level problems designed to measure an AI’s mathematical skills, was one of the benchmarks OpenAI used to demo its upcoming flagship AI, o3.

In a post on the forum LessWrong, a contractor for Epoch AI going by the username “Meemi” says that many contributors to the FrontierMath benchmark weren’t informed of OpenAI’s involvement until it was made public.

“The communication about this has been non-transparent,” Meemi wrote. “In my view Epoch AI should have disclosed OpenAI funding, and contractors should have transparent information about the potential of their work being used for capabilities, when choosing whether to work on a benchmark.”

On social media, some users raised concerns that the secrecy could erode FrontierMath’s reputation as an objective benchmark. In addition to backing FrontierMath, OpenAI had visibility into many of the problems and solutions in the benchmark, a fact that Epoch AI didn’t disclose prior to December 20, when o3 was announced.

In a post on X, Stanford mathematics PhD student Carina Hong also alleged that OpenAI has privileged access to FrontierMath thanks to its arrangement with Epoch AI, and that this isn’t sitting well with some contributors.

“Six mathematicians who significantly contributed to the FrontierMath benchmark confirmed [to me] … that they are unaware that OpenAI will have exclusive access to this benchmark (and others won’t),” Hong said. “Most express they are not sure they would have contributed had they known.”

In a reply to Meemi’s post, Tamay Besiroglu, associate director of Epoch AI and one of the organization’s co-founders, asserted that the integrity of FrontierMath hadn’t been compromised, but admitted that Epoch AI “made a mistake” in not being more transparent.

“We were restricted from disclosing the partnership until around the time o3 launched, and in hindsight we should have negotiated harder for the ability to be transparent to the benchmark contributors as soon as possible,” Besiroglu wrote. “Our mathematicians deserved to know who might have access to their work. Even though we were contractually limited in what we could say, we should have made transparency with our contributors a non-negotiable part of our agreement with OpenAI.”

Besiroglu added that while OpenAI has access to FrontierMath, it has a “verbal agreement” with Epoch AI not to use FrontierMath’s problem set to train its AI. (Training an AI on FrontierMath would be akin to teaching to the test.) Epoch AI also has a “separate holdout set” that serves as an additional safeguard for independent verification of FrontierMath benchmark results, Besiroglu said.

“OpenAI has … been fully supportive of our decision to maintain a separate, unseen holdout set,” Besiroglu wrote.

However, muddying the waters, Epoch AI lead mathematician Elliot Glazer noted in a post on Reddit that Epoch AI hasn’t been able to independently verify OpenAI’s FrontierMath o3 results.

“My personal opinion is that [OpenAI’s] score is legit (i.e., they didn’t train on the dataset), and that they have no incentive to lie about internal benchmarking performances,” Glazer said. “However, we can’t vouch for them until our independent evaluation is complete.”

The saga is yet another example of the challenge of developing empirical benchmarks to evaluate AI, and of securing the necessary resources for benchmark development without creating the perception of conflicts of interest.
