Gandalf AI game reveals how anyone can trick ChatGPT into performing evil acts

concertina226 · on May 23, 2023

If you could command some of the world’s most sophisticated AI software to bend to your will, no matter how evil the intent, what would you get it to do first?

To prove this point (and have a little fun), a Swiss AI security firm called Lakera recently launched a free online game called Gandalf AI.

The premise is simple: an AI chatbot powered by ChatGPT called Gandalf — yes, it’s named after the wizard from Lord of the Rings — knows a password that it has been instructed not to reveal. If you can get the bot to reveal this password seven times merely by asking it, you win.

According to Lakera, 300,000 people around the world have revelled in persuading Gandalf to cough up these passwords.

csense · on May 23, 2023

From now on, I'm going to call any HN discussion a "Y! Cominator Hacker News forum post thread", just like this article does :)