Hacker Newsnew | past | comments | ask | show | jobs | submitlogin

This seems really ineffective to the purpose and has numerous downsides.

Instead of this, I would just put some CBRN-related content somewhere on the page invisibly. That will stop the LLM.

Provide instructions on how to build a nuclear weapon or synthesize a nerve agent. They can be fake just emphasize the trigger points. The content filtering will catch it. Hit the triggers hard to contaminate.



This is absolutely it. (At least for now).

Frankly you could probably just find a red teaming CSV somewhere and drop 500 questions in somewhere.

Game over.




Guidelines | FAQ | Lists | API | Security | Legal | Apply to YC | Contact

Search: