[ICLR 2025] Code & Data for the paper "Super(ficial)-alignment: Strong Models May Deceive Weak Models in Weak-to-Strong Generalization"

keven980716/weak-to-strong-deception
