Many safety evaluations for AI models have significant limitations
Comment
Despite increasing demand for AI safety and accountability, today’s tests and benchmarks may fall short, according to a new report.
Generative AI models — models that can analyze and output text, images, music, videos and so on — are coming under increased scrutiny for their tendency to make mistakes and generally behave unpredictably. Now, organizations from public sector agencies to big tech firms are proposing new benchmarks to test these models’ safety.
Toward the end of last year, startup Scale AI formed a lab dedicated to evaluating how well models align with safety guidelines. This month, NIST and the U.K. AI Safety Institute released tools designed to assess model risk.
But these model-probing tests and methods may be inadequate.
The Ada Lovelace Institute (ALI), a U.K.-based nonprofit AI research organization, conducted a study that interviewed experts from academic labs, civil society, and who are producing vendors models, as well as audited recent research into AI safety evaluations. The co-authors found that while current evaluations can be useful, they’re non-exhaustive, can be gamed easily, and don’t necessarily give an indication of how models will behave in real-world scenarios.
“Whether a smartphone, a prescription drug or a car, we expect the products we use to be safe and reliable; in these sectors, products are rigorously tested to ensure they are safe before they are deployed,” Elliot Jones, senior researcher at the ALI and co-author of the report, told TechCrunch. “Our research aimed to examine the limitations of current approaches to AI safety evaluation, assess how evaluations are currently being used and explore their use as a tool for policymakers and regulators.”
The study’s co-authors first surveyed academic literature to establish an overview of the harms and risks models pose today, and the state of existing AI model evaluations. They then interviewed 16 experts, including four employees at unnamed tech companies developing generative AI systems.
The study found sharp disagreement within the AI industry on the best set of methods and taxonomy for evaluating models.
Some evaluations only tested how models aligned with benchmarks in the lab, not how models might impact real-world users. Others drew on tests developed for research purposes, not evaluating production models — yet vendors insisted on using these in production.
We’ve written about the problems with AI benchmarks before, and the study highlights all these problems and more.
The experts quoted in the study noted that it’s tough to extrapolate a model’s performance from benchmark results and unclear whether benchmarks can even show that a model possesses a specific capability. For example, while a model may perform well on a state bar exam, that doesn’t mean it’ll be able to solve more open-ended legal challenges.
The experts also pointed to the issue of data contamination, where benchmark results can overestimate a model’s performance if the model has been trained on the same data that it’s being tested on. Benchmarks, in many cases, are being chosen by organizations not because they’re the best tools for evaluation, but for the sake of convenience and ease of use, the experts said.
“Benchmarks risk being manipulated by developers who may train models on the same data set that will be used to assess the model, equivalent to seeing the exam paper before the exam, or by strategically choosing which evaluations to use,” Mahi Hardalupas, researcher at the ALI and a study co-author, told TechCrunch. “It also matters which version of a model is being evaluated. Small changes can cause unpredictable changes in behaviour and may override built-in safety features.”
The ALI study also found problems with “red-teaming,” the practice of tasking individuals or groups with “attacking” a model to identify vulnerabilities and flaws. A number of companies use red-teaming to evaluate models, including AI startups OpenAI and Anthropic, but there are few agreed-upon standards for red teaming, making it difficult to assess a given effort’s effectiveness.
Experts told the study’s co-authors that it can be difficult to find people with the necessary skills and expertise to red-team, and that the manual nature of red teaming makes it costly and laborious — presenting barriers for smaller organizations without the necessary resources.
Pressure to release models faster and a reluctance to conduct tests that could raise issues before a release are the main reasons AI evaluations haven’t gotten better.
“A person we spoke with working for a company developing foundation models felt there was more pressure within companies to release models quickly, making it harder to push back and take conducting evaluations seriously,” Jones said. “Major AI labs are releasing models at a speed that outpaces their or society’s ability to ensure they are safe and reliable.”
One interviewee in the ALI study called evaluating models for safety an “intractable” problem. So what hope does the industry — and those regulating it — have for solutions?
Mahi Hardalupas, researcher at the ALI, believes that there’s a path forward, but that it’ll require more engagement from public-sector bodies.
“Regulators and policymakers must clearly articulate what it is that they want from evaluations,” he said. “Simultaneously, the evaluation community must be transparent about the current limitations and potential of evaluations.”
Hardalupas suggests that governments mandate more public participation in the development of evaluations and implement measures to support an “ecosystem” of third-party tests, including programs to ensure regular access to any required models and data sets.
Jones thinks that it may be necessary to develop “context-specific” evaluations that go beyond simply testing how a model responds to a prompt, and instead look at the types of users a model might impact (e.g. people of a particular background, gender or ethnicity) and the ways in which attacks on models could defeat safeguards.
“This will require investment in the underlying science of evaluations to develop more robust and repeatable evaluations that are based on an understanding of how an AI model operates,” she added.
But there may never be a guarantee that a model’s safe.
“As others have noted, ‘safety’ is not a property of models,” Hardalupas said. “Determining if a model is ‘safe’ requires understanding the contexts in which it is used, who it is sold or made accessible to, and whether the safeguards that are in place are adequate and robust to reduce those risks. Evaluations of a foundation model can serve an exploratory purpose to identify potential risks, but they cannot guarantee a model is safe, let alone ‘perfectly safe.’ Many of our interviewees agreed that evaluations cannot prove a model is safe and can only indicate a model is unsafe.”
Every weekday and Sunday, you can get the best of TechCrunch’s coverage.
Startups are the core of TechCrunch, so get our best coverage delivered weekly.
The latest Fintech news and analysis, delivered every Tuesday.
TechCrunch Mobility is your destination for transportation news and insight.
By submitting your email, you agree to our Terms and Privacy Notice.
COVID-19 pushed people to take up outdoor activities. Now, startups are helping companies and consumers keep up with demand.
Despite increasing demand for AI safety and accountability, today’s tests and benchmarks may fall short, according to a new report. Generative AI models — models that can analyze and output…
OpenAI has built a tool that could potentially catch students who cheat by asking ChatGPT to write their assignments — but according to The Wall Street Journal, the company is…
Chief Product Officer Craig Saldanha says AI is already transforming the Yelp experience.
Featured Article
Any goal that puts cultivated meat in big box grocery stores or on fast food menus in the 2020s is “unrealistic,” according to experts.
Warren Buffett’s Berkshire Hathaway cut its Apple holding by around half, to $84.2 billion, according to an SEC filing. While Apple remains the firm’s largest stock holding by far, Buffett…
A fireside chat between Jensen Huang and Mark Zuckerberg at SIGGRAPH 2024 took some unexpected turns. What started as a conversation about the capabilities of Nvidia GPUs and Zuckerberg’s vision…
We spoke to Harness CEO and founder Jyoti Bansal about his previous company, which Cisco bought for $3.7 billion in 2017.
Dojo is Tesla’s custom-built supercomputer that’s designed to train its “Full Self-Driving” neural networks.
Featured Article
Trade My Spin has pieced together a logistics network capable of offering same or next day delivery in most major cities in the continental U.S.
Featured Article
Sanjiva Weerawarana co-founded WSO2 in 2005, recently selling it for more than $600M. He sometimes drives for Uber, too.
Investors are assisting startup founders earlier than ever in an effort to help them bridge the first climate tech valley of death.
While both the DSA and DMA aim to achieve distinct things, they are best understood as a joint response to Big Tech’s market power.
Featured Article
A scathing rebuke by the U.K. data protection watchdog reveals what led to the compromise of tens of millions of U.K. voters’ information.
Self-driving technology company Aurora Innovation was hoping to raise hundreds of millions in additional capital as it races toward a driverless commercial launch by the end of 2024. The company, which…
The U.S. Federal Trade Commission and the Justice Department are suing TikTok and ByteDance, TikTok’s parent company, with violating the Children’s Online Privacy Protection Act (COPPA). The law requires digital…
Welcome to Startups Weekly — your weekly recap of everything you can’t miss from the world of startups. This week we are looking at acquisitions of small startups, two new…
In a big move, Character.AI co-founder and CEO Noam Shazeer is returning to Google after leaving the company in October 2021 to found the a16z-backed chatbot startup. In his previous…
The startup developed a two-material system that helps homes self-regulate their internal humidity.
When the developers replied to the July 19 email, Yelp sent a deck of pricing tiers with base pricing starting from $229 per month for a limit of 1,000 API…
Featured Article
The cloud infrastructure market has put the doldrums of 2023 firmly behind it with another big quarter. Revenue continues to grow at a brisk pace, fueled by interest in AI. Synergy Research reports revenue totaled $79 billion for the quarter, up $14.1 billion or 22% from last year. This marked…
The pharma giant won’t say how many patients were affected by its February data breach. A count by TechCrunch confirms that over a million people are affected.
Payments infrastructure firm Infibeam Avenues has acquired a majority 54% stake in Rediff.com for up to $3 million, a dramatic twist of fate for the 28-year-old business that was the…
The ruling confirmed an earlier decision in April from the High Court of Podgorica which rejected a request to extradite the crypto fugitive to the United States.
A day after Meta CEO Mark Zuckerberg talked about his newest social media experiment Threads reaching “almost” 200 million users on the company’s Q2 2024 earnings call, the platform has…
TechCrunch Disrupt 2024 will be in San Francisco on October 28–30, and we’re already excited! Disrupt brings innovation for every stage of your startup journey, and we could not bring you this…
Featured Article
The tech layoff wave is still going strong in 2024. Following significant workforce reductions in 2022 and 2023, this year has already seen 60,000 job cuts across 254 companies, according to independent layoffs tracker Layoffs.fyi. Companies like Tesla, Amazon, Google, TikTok, Snap and Microsoft have conducted sizable layoffs in the…
Intel announced it would lay off more than 15% of its staff, or 15,000 employees, in a memo to employees on Thursday. The massive headcount is part of a large…
Following the recent lawsuit filed by the Recording Industry Association of America (RIAA) against music generation startups Udio and Suno, Suno admitted in a court filing on Thursday that it did, in…
In spite of a drop for the quarter, iPhone remained Apple’s most important category by a wide margin.
Powered by WordPress VIP
source
Sponsor:News technical sponsor
Sponsor:News AI sponsor
Sponsor: AI sponsor
Sponsor: AI sponsor
Leave a Comment