AI safety papers accepted at NeurIPS 2019
The list is crowdsourced from the email group and currently underrepresents the number of AI safety-related papers at NeurIPS. Please email the thread with your suggestions.
- Using Self-Supervised Learning Can Improve Model Robustness and Uncertainty (Dan Hendrycks, Mantas Mazeika, Saurav Kadavath, Dawn Song)
- Benchmarking Safe Exploration in Deep Reinforcement Learning (Alex Ray, Joshua Achiam, Dario Amodei)
- Functional Adversarial Attacks (Cassidy Laidlaw, Soheil Feizi)
- Adversarial Policies: Attacking Deep Reinforcement Learning (Adam Gleave, Michael Dennis, Neel Kant, Cody Wild, Sergey Levine, Stuart Russell)