Gandalf AI game reveals how anyone can trick ChatGPT into performing evil acts (standard.co.uk)
4 points by concertina226 on May 23, 2023 | 2 comments


If you could command some of the world’s most sophisticated AI software to bend to your will, no matter how evil the intent, what would you get it to do first?

To prove this point (and have a little fun), a Swiss AI security firm called Lakera recently launched a free online game called Gandalf AI.

The premise is simple: an AI chatbot called Gandalf (yes, it’s named after the wizard from Lord of the Rings), powered by ChatGPT, knows a password that it has been instructed not to reveal. If you can get the bot to reveal its password seven times merely by asking it, you win.

According to Lakera, 300,000 people around the world have revelled in persuading Gandalf to cough up these passwords.


From now on, I'm going to call any HN discussion a "Y! Cominator Hacker News forum post thread", just like this article does :)



