How OpenAI stress-tests its large language models

OpenAI is once again lifting the lid (just a crack) on its safety-testing processes. Last month the company shared the results of an investigation that looked at how often ChatGPT produced a harmful gender or racial stereotype based on a user’s name. Now it has put out two papers describing how it stress-tests its powerful large language models to try to identify potentially harmful or otherwise unwanted behavior, an approach known as red-teaming. Large language models are now being used by millions of people for many different things. But as…
