Title: With Great Training Comes Great Vulnerability: Practical Attacks against Transfer Learning
Authors: Bolun Wang, Yuanshun Yao, Bimal Viswanath, Haitao Zheng and Ben Y. Zhao
Published: 2018
Link: https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-wang.pdf
Summary: Building a model via transfer learning increases its vulnerability to misclassification attacks.

Sketchnote by Daniel Etzold: With Great Training Comes Great Vulnerability: Practical Attacks against Transfer Learning

Extended summary:

Building a state-of-the-art deep learning model from scratch requires large datasets and a lot of computational power. Hence, only a few companies like Google, Microsoft or Facebook are in a position to do this. Transfer learning is a popular technique to build new models from these existing models when large datasets and the computational power are not available.

The idea is to adapt an existing model to a similar task. For example, a face recognition system that was trained on faces from one dataset can easily be retrained to identify faces from another dataset. This is typically achieved by retraining only the last few layers of the model (which are responsible for high-level features) and freezing the parameters of the preceding layers (low-level features).
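As a rough illustration, the sketch below shows this kind of layer freezing in PyTorch. A torchvision VGG16 stands in for the teacher and a 10-class student task is assumed; these are not the paper's exact setup.

```python
# Minimal transfer learning sketch: freeze low-level layers, retrain the head.
import torch
import torch.nn as nn
from torchvision import models

# Hypothetical teacher: an ImageNet-pretrained VGG16 (stand-in for illustration).
teacher = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)

# Freeze all convolutional (low-level feature) layers.
for param in teacher.features.parameters():
    param.requires_grad = False

# Replace the final classifier layer for a hypothetical 10-class student task;
# only the unfrozen layers will be updated during retraining.
student = teacher
student.classifier[6] = nn.Linear(4096, 10)

optimizer = torch.optim.Adam(
    (p for p in student.parameters() if p.requires_grad), lr=1e-4
)
```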

Obviously, this centralized model training results in a lack of diversity, and the authors argue that this allows an attacker to launch highly effective misclassification attacks. An adversary who wants to attack a model (the target model) that was built from some public model via transfer learning can leverage his knowledge of the public model even if he cannot access the parameters of the target model (black-box attack).

The authors make the following assumptions:

  • Attacker has white-box access to the teacher model (public model), i.e. he knows its architecture and parameters.
  • Attacker has no knowledge about the student dataset (the dataset which is used to retrain the model).
  • Attacker has no knowledge about the parameters of the student model.
  • The number of queries to the teacher model is limited.
  • Attacker knows which layers are frozen during the retraining.

The authors' key insight is that if an adversary can craft a sample whose internal representation at some layer K (i.e. the output of the neurons at that layer) perfectly matches the internal representation of a target image at that layer, then it must be misclassified into the same label as the target image.
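For concreteness, here is one hypothetical way to obtain the "teacher up to layer K" computation in PyTorch. The VGG16 teacher and the cutoff index K are assumptions for illustration, not the paper's exact models or layer choices.

```python
# Sketch: build a sub-network that outputs the internal representation at layer K.
import torch.nn as nn
from torchvision import models

teacher = models.vgg16(weights=models.VGG16_Weights.IMAGENET1K_V1)
K = 10  # hypothetical cutoff; the attacker assumes layers up to K stay frozen
teacher_up_to_K = nn.Sequential(*list(teacher.features.children())[:K]).eval()
for p in teacher_up_to_K.parameters():
    p.requires_grad_(False)
```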

Hence, the optimization problem is reformulated. Instead of optimizing over the output of the network, i.e. the last layer (the way it's usually done), the attacker minimizes the distance between internal representations at layer K. The idea is that if the victim freezes layer K and all preceding layers during retraining, the crafted sample will be classified into the target label no matter how the subsequent layers are retrained.
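A minimal sketch of that reformulated optimization is shown below, assuming white-box access to the teacher, the `teacher_up_to_K` sub-network from the previous sketch, and source/target images already preprocessed into tensors with values in [0, 1]. The perturbation constraint here is a simple L-infinity clamp; the paper constrains perceptual distortion rather than raw pixel values.

```python
import torch

def craft_mimic(teacher_up_to_K, source, target, budget=0.03, steps=400, lr=0.01):
    """Perturb `source` so its layer-K representation matches `target`'s."""
    target_repr = teacher_up_to_K(target).detach()
    delta = torch.zeros_like(source, requires_grad=True)
    opt = torch.optim.Adam([delta], lr=lr)
    for _ in range(steps):
        adv = (source + delta).clamp(0.0, 1.0)
        # Objective: L2 distance between internal representations at layer K.
        loss = torch.norm(teacher_up_to_K(adv) - target_repr)
        opt.zero_grad()
        loss.backward()
        opt.step()
        with torch.no_grad():
            delta.clamp_(-budget, budget)  # crude L-infinity perturbation budget
    return (source + delta).clamp(0.0, 1.0).detach()

# Hypothetical usage: source and target are preprocessed image tensors
# of shape (1, 3, 224, 224) with values in [0, 1].
# adversarial = craft_mimic(teacher_up_to_K, source, target)
```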

Experiments were done on facial, iris, traffic sign and flower recognition with different transfer learning approaches. The authors find that the attack is effective for facial and iris recognition, where a large number of layers could be frozen, and less effective for the other tasks, where many layers had to be retrained to achieve good accuracy on normal data.