Welcome to Perceptron Robustness Benchmark’s page!
Perceptron is a benchmark to test safety and security properties of neural networks for perceptual tasks.
It comes with support for many frameworks to build models, including:
- Cloud API
- PaddlePaddle (In progress)
See currently supported evaluation metrics, models, adversarial criteria, and verification methods in Summary.
See the current Leaderboard.
The Perceptron benchmark improves upon existing adversarial toolboxes such as advbox in three important aspects:
- Consistent API design that enables easy evaluation of models across different deep learning frameworks, computer vision tasks, and adversarial criteria.
- Standardized metric design that enables DNN models’ robustness to be compared on a large collection of security and safety properties.
- Verifiable robustness bounds for security and safety properties.
More specifically, we compare Perceptron with existing DNN benchmarks in the following table:
| Consistent API design | ✓ | · | ✓ | · |
| Custom adversarial criteria | ✓ | · | ✓ | · |
| Multiple perceptual tasks | ✓ | · | · | · |
| Verifiable robustness bounds | ✓ | · | · | · |
Explanation of compared properties:
- Multi-platform support: supports at least three deep learning frameworks.
- Consistent API design: implementations of evaluation methods are platform-agnostic. More specifically, the same piece of code for an evaluation method (e.g., a C&W attack) can run against models on all supported platforms.
- Custom adversarial criterion: a criterion defines under what circumstances an (input, label) pair is considered an adversary. Customized adversarial criteria other than misclassification should be supported.
- Multiple perceptual tasks: supports computer vision tasks other than …
- Standardized metrics: makes DNN models’ robustness comparable across all security and safety properties.
- Verifiable robustness bounds: supports verification of certain safety properties. It returns either a verifiable bound, indicating that the model is robust against perturbations within that bound, or counter-examples.
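To make the "bound or counter-example" contract concrete, here is a minimal sketch for the simplest case that can be verified exactly: a binary linear classifier f(x) = w·x + b under L∞ perturbations. This is not Perceptron's verification method; the helper `verify_linear_linf` and the toy model are hypothetical and were chosen only because the bound has a closed form. For any δ with ‖δ‖∞ ≤ ε we have |w·δ| ≤ ε‖w‖₁, so the score cannot change sign when ε < |f(x)| / ‖w‖₁. Otherwise, moving every coordinate by ε against the current class yields a counter-example.

```python
import math
from typing import List, Optional, Sequence, Tuple


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def predict(w: Sequence[float], b: float, x: Sequence[float]) -> int:
    """Toy binary classifier: class 1 if w.x + b >= 0, else class 0."""
    return 1 if dot(w, x) + b >= 0 else 0


def verify_linear_linf(
    w: Sequence[float], b: float, x: Sequence[float], eps: float
) -> Tuple[bool, float, Optional[List[float]]]:
    """Hypothetical helper: try to certify the L-inf ball of radius eps around x.

    Returns (certified, certified_radius, counter_example_or_None).
    """
    score = dot(w, x) + b
    l1 = sum(abs(wi) for wi in w)
    if l1 == 0.0:
        # Constant classifier: no perturbation can change its output.
        return True, math.inf, None
    # |w . delta| <= ||delta||_inf * ||w||_1, so the score keeps its sign
    # for every delta with ||delta||_inf < |score| / ||w||_1.
    radius = abs(score) / l1
    if eps < radius:
        return True, radius, None
    # Not certified: push each coordinate by eps against the current class.
    # For eps > radius this point is classified differently from x;
    # at eps == radius it lies exactly on the decision boundary.
    s = 1.0 if score >= 0 else -1.0
    x_adv = [
        xi - eps * s * math.copysign(1.0, wi) if wi != 0 else xi
        for xi, wi in zip(x, w)
    ]
    return False, radius, x_adv


if __name__ == "__main__":
    w, b, x = [2.0, -1.0], 0.5, [1.0, 1.0]  # score 1.5, ||w||_1 = 3
    print(verify_linear_linf(w, b, x, 0.4))  # (True, 0.5, None)
    certified, radius, x_adv = verify_linear_linf(w, b, x, 0.6)
    print(certified, predict(w, b, x_adv))  # False 0
```

Real networks need approximate techniques (e.g., interval or linear relaxations) instead of this closed form, but the output has the same shape: a certified radius, or a concrete input that breaks the property.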
You can run an evaluation against DNN models with chosen parameters using:

    python perceptron/launcher.py \
        --framework keras \
        --model resnet50 \
        --criteria misclassification \
        --metric carlini_wagner_l2 \
        --image example.png

In the command above, the user sets the framework to keras, the model to resnet50, the criterion to misclassification (i.e., we want to generate an adversary that is similar to the original image but has a different predicted label), the metric to carlini_wagner_l2, and the input image to example.png.
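A criterion is essentially a predicate over the model's output and the true label. The sketch below is a standalone illustration, not Perceptron's own criterion classes: `Misclassification` mirrors the criterion used in the command above, and `TargetClass` stands in for a custom criterion. The class names, the `is_adversarial(scores, label)` signature, and the use of a plain list of class scores are all assumptions made for illustration.

```python
from typing import Sequence


def argmax(scores: Sequence[float]) -> int:
    """Index of the highest score (the first one wins on ties)."""
    return max(range(len(scores)), key=lambda i: scores[i])


class Criterion:
    """Hypothetical base class: decides whether a model output is adversarial."""

    def is_adversarial(self, scores: Sequence[float], label: int) -> bool:
        raise NotImplementedError


class Misclassification(Criterion):
    """Adversarial if the top-1 prediction differs from the true label."""

    def is_adversarial(self, scores: Sequence[float], label: int) -> bool:
        return argmax(scores) != label


class TargetClass(Criterion):
    """Custom criterion: adversarial only if the model predicts a chosen class."""

    def __init__(self, target: int):
        self.target = target

    def is_adversarial(self, scores: Sequence[float], label: int) -> bool:
        return argmax(scores) == self.target


if __name__ == "__main__":
    scores = [0.1, 0.7, 0.2]  # the model predicts class 1
    print(Misclassification().is_adversarial(scores, label=0))  # True
    print(TargetClass(2).is_adversarial(scores, label=0))  # False
```

Keeping the success condition separate from the search procedure is what lets the same attack (e.g., carlini_wagner_l2) be reused with a different criterion.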
You can try different combinations of frameworks, models, criteria, and metrics. To see more options, use -h to print the help message:

    python perceptron/launcher.py -h

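To try several combinations without retyping the command, a small driver can build one launcher invocation per combination. This is a hypothetical helper, not part of Perceptron. It only uses the flags shown above, and the value lists in the example contain just the documented example values; extend them with the options printed by `python perceptron/launcher.py -h`.

```python
import itertools
import shlex
import subprocess
from typing import Dict, Iterator, List, Sequence

LAUNCHER = "perceptron/launcher.py"
KEYS = ["framework", "model", "criteria", "metric", "image"]


def build_command(framework: str, model: str, criteria: str,
                  metric: str, image: str) -> List[str]:
    """Return the argv list for a single launcher run."""
    return ["python", LAUNCHER,
            "--framework", framework,
            "--model", model,
            "--criteria", criteria,
            "--metric", metric,
            "--image", image]


def sweep(options: Dict[str, Sequence[str]]) -> Iterator[List[str]]:
    """Yield one command per combination of the given option values."""
    for values in itertools.product(*(options[k] for k in KEYS)):
        yield build_command(**dict(zip(KEYS, values)))


def run_all(options: Dict[str, Sequence[str]], dry_run: bool = True) -> None:
    """Print every command; execute them too when dry_run is False."""
    for cmd in sweep(options):
        print(shlex.join(cmd))
        if not dry_run:
            # Run from the repository root so the launcher path resolves.
            subprocess.run(cmd, check=True)


if __name__ == "__main__":
    run_all({
        "framework": ["keras"],
        "model": ["resnet50"],
        "criteria": ["misclassification"],
        "metric": ["carlini_wagner_l2"],
        "image": ["example.png"],
    })
```

With the default `dry_run=True`, the script only prints the commands (here, the single command from the example above), so you can review a sweep before launching it.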
We also provide a code example that serves the same purpose as the command line above. Please refer to Examples for more details.