Questions.

And Answers.

If you have a question that is not answered below, please open an issue on GitHub and we will answer it as soon as possible. You can also send us an email, but we will usually answer faster on GitHub.

How is the score for a model-attack-pair calculated?
For each test image we run the attack against the model and record the L2-norm of the smallest perturbation that leads to misclassification. The score is given by the median of the L2-norms across images.
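As a rough sketch of this computation (the array and its values are purely illustrative, not the benchmark's actual code):

```python
import numpy as np

# Hypothetical per-image results for one model-attack pair:
# l2_norms[i] is the L2-norm of the smallest misclassifying
# perturbation the attack found for test image i.
l2_norms = np.array([0.8, 1.3, 0.2, 2.1, 0.9])

# The model-attack-pair score is the median across test images.
pair_score = np.median(l2_norms)  # 0.9
```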
How is the total score of a model calculated?
For each test image we run all attacks against the model and record the smallest L2-norm, over all attacks, of a perturbation that leads to misclassification. The score is given by the median of these L2-norms across images.
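In the same illustrative sketch as above, this amounts to taking a per-image minimum over attacks before the median (again with made-up numbers):

```python
import numpy as np

# Hypothetical results: l2_norms[a, i] is the L2-norm of the smallest
# misclassifying perturbation that attack a found for test image i.
l2_norms = np.array([
    [0.8, 1.3, 0.2, 2.1, 0.9],   # attack 1
    [1.0, 0.7, 0.4, 1.8, 1.2],   # attack 2
])

# Per image, take the smallest perturbation found by any attack,
# then take the median across images to get the model's total score.
per_image_best = l2_norms.min(axis=0)    # [0.8, 0.7, 0.2, 1.8, 0.9]
model_score = np.median(per_image_best)  # 0.8
```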
How is the total score of an attack calculated?
The total score of an attack is simply the average of its model-attack-pair scores across all models.
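Continuing the illustrative sketch (model names and scores are made up):

```python
import numpy as np

# Hypothetical model-attack-pair scores for one attack, one per model
# (each computed as described above, i.e. a median over test images).
pair_scores = {"model_a": 0.9, "model_b": 1.4, "model_c": 0.6}

# The attack's total score is the mean of its pair scores.
attack_score = np.mean(list(pair_scores.values()))  # ~0.97
```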
Will the winner receive prize money?
There is no prize money. The Robust Vision Benchmark is a continuously running benchmark that tracks the state of the art. If you would like to sponsor a prize, please contact us.
Why should I submit my attack?
It's the easiest way to show the effectiveness of your attack.
Why should I submit my model?
It's the best way to demonstrate your model’s robustness.
What about other adversarial criteria such as targeted misclassification?
We plan to add more challenges in the future. Please let us know which challenges you would like to see.
Why are you doing this?
The susceptibility of deep neural networks to adversarial examples exposes one of the most striking differences between the sensory decision making of humans and machines. While there have been efforts to increase the robustness of ML models, the current state of the field and the actual progress being made are unclear. We believe that the best way to measure the robustness of an ML model is to apply all possible adversarial attacks and to measure the smallest perturbation to which the model is susceptible. To make this practical we recently open-sourced Foolbox. This benchmark takes the idea even further: to prove its robustness, a model should not only withstand existing attacks (which can easily be made to fail) but also the scrutiny of novel attacks that might be specifically designed against it. In addition, the benchmark allows us to quantitatively compare attacks across a wide variety of models.