Title: Skill Squatting Attacks on Amazon Alexa
Authors: Deepak Kumar, Riccardo Paccagnella, Paul Murley, Eric Hennenfent, Joshua Mason, Adam Bates, Michael Bailey
Published: 2018
Link: https://www.usenix.org/system/files/conference/usenixsecurity18/sec18-kumar.pdf
Summary: Systematic interpretation errors that Amazon Alexa makes when listening to users can be exploited to route a user to an Alexa skill developed by an adversary. The user does not realize that they are talking to a malicious application under the adversary's control.

[Sketchnote by Daniel Etzold: Skill Squatting Attacks on Amazon Alexa]

Extended summary:

Motivated by the fact that many users experience frequent misinterpretations when talking to Amazon Alexa, the authors conduct an empirical analysis of the interpretation errors Alexa makes. Based on their observations, they develop a new attack, which they call the skill squatting attack. An attacker can leverage systematic interpretation errors (i.e., a specific word is reliably misunderstood as some other word) to route a user to a malicious application. The user does not realize that they are talking to an Alexa skill (an application developed by a third-party vendor for Alexa) that was created and is controlled by an adversary.

The authors also develop a variant of the attack, which they call the spear skill squatting attack. It works like skill squatting but targets a specific group of people (e.g., only men, only women, or only people from a particular region). The name comes from spear phishing, i.e., phishing attacks aimed at specific individuals or groups.

The attacks are possible because Alexa sometimes makes systematic interpretation errors. The authors' experiments showed that specific words are predictably misinterpreted, i.e., a given word is consistently heard as the same wrong word. For example, “coal” is interpreted as “call”, “sell” as “cell”, and “boil” as “boyle”. Short words are particularly prone to such errors.
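To make "systematic" concrete, here is a minimal sketch, assuming we have repeated transcriptions of each spoken word and call an error systematic when a single wrong transcription accounts for at least a chosen fraction of all trials. The function name `classify_errors`, the 0.5 threshold, and the transcriptions are hypothetical and made up for illustration; they are not the paper's methodology or data.

```python
from collections import Counter


def classify_errors(trials, threshold=0.5):
    """Return {spoken_word: (top_misinterpretation, rate)} for words whose
    most frequent wrong transcription makes up at least `threshold` of all
    trials. The threshold is a hypothetical cutoff, not the paper's."""
    systematic = {}
    for spoken, transcripts in trials.items():
        # Count only the transcriptions that differ from what was said.
        wrong = Counter(t for t in transcripts if t != spoken)
        if not wrong:
            continue  # never misheard
        top, count = wrong.most_common(1)[0]
        rate = count / len(transcripts)
        if rate >= threshold:
            systematic[spoken] = (top, rate)
    return systematic


# Made-up transcriptions (illustrative only, not data from the paper).
trials = {
    "coal": ["call", "call", "call", "coal"],
    "sell": ["cell", "cell", "sell", "cell"],
    "boil": ["boyle", "boyle", "boil", "boyle"],
    "apple": ["apple", "ample", "apple", "chapel"],  # scattered, random errors
}

print(classify_errors(trials))
# {'coal': ('call', 0.75), 'sell': ('cell', 0.75), 'boil': ('boyle', 0.75)}
```

"apple" is misheard as often as it is heard correctly here, but its errors are scattered across different words, so no single misinterpretation is predictable enough for an attacker to squat on.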

Spear skill squatting attacks are possible because Alexa tends to make different predictable misinterpretations depending on the speaker's gender or accent.
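The idea behind spear skill squatting can be sketched as follows, assuming transcriptions are grouped by a speaker attribute: find a misinterpretation that dominates for one group but that no other group shares. The group labels, the transcriptions, and the helper names `dominant_error` and `spear_targets` are all hypothetical; for brevity the sketch also ignores how often the error occurs, which a real analysis would need to check.

```python
from collections import Counter


def dominant_error(transcripts, spoken):
    """Return the most frequent wrong transcription of `spoken`, or None."""
    wrong = Counter(t for t in transcripts if t != spoken)
    return wrong.most_common(1)[0][0] if wrong else None


def spear_targets(trials_by_group, spoken):
    """Keep only the groups whose dominant misinterpretation of `spoken`
    is not shared by any other group."""
    per_group = {group: dominant_error(trials[spoken], spoken)
                 for group, trials in trials_by_group.items()}
    # How many groups share each dominant error.
    counts = Counter(e for e in per_group.values() if e is not None)
    return {group: e for group, e in per_group.items()
            if e is not None and counts[e] == 1}


# Hypothetical speaker groups and made-up transcriptions of "bean".
trials_by_group = {
    "speakers_a": {"bean": ["been", "been", "been", "bean"]},
    "speakers_b": {"bean": ["bean", "bean", "bean", "bean"]},
    "speakers_c": {"bean": ["bin", "bin", "bean", "bin"]},
    "speakers_d": {"bean": ["been", "been", "bean", "been"]},
}

print(spear_targets(trials_by_group, "bean"))  # {'speakers_c': 'bin'}
```

In this toy data, a skill name built around “been” would capture two groups, whereas one built around “bin” would capture only `speakers_c`, which is the kind of selectivity a spear attack relies on.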

Predictable misinterpretations can be exploited as follows:

  • The attacker first chooses a group of users to target. In this example, the targets are users of a legitimate skill called “Bean Stock”.
  • The attacker creates a squatting skill and registers it under the name “Been Stock”.
  • When users then ask Alexa for the “Bean Stock” skill, Alexa routes them to the “Been Stock” skill instead, because it systematically misinterprets “bean” as “been”.
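The three steps above can be modeled with a toy simulation. It assumes, as a deliberate simplification, that speech recognition applies a fixed word-level confusion map and that the platform then looks up the transcribed name by exact match in a registry of skills. Real Alexa routing is more complex; `CONFUSIONS`, `transcribe`, `squat_name`, `route`, and the registry are hypothetical names for illustration only.

```python
# Hypothetical word-level confusion map: each entry is a systematic error,
# e.g. "bean" is reliably heard as "been".
CONFUSIONS = {"bean": "been", "coal": "call", "sell": "cell", "boil": "boyle"}


def transcribe(utterance):
    """Toy speech recognizer: applies the systematic confusions word by word."""
    return " ".join(CONFUSIONS.get(w, w) for w in utterance.lower().split())


def squat_name(invocation_name):
    """Attacker side: predict how the target's name will be transcribed."""
    return transcribe(invocation_name)


def route(utterance, registry):
    """Platform side: look up the *transcribed* name among registered skills."""
    return registry.get(transcribe(utterance), "no matching skill")


# Step 1: the legitimate skill the attacker targets.
registry = {"bean stock": "legitimate Bean Stock skill"}

# Step 2: the attacker registers the predicted misinterpretation.
registry[squat_name("bean stock")] = "attacker's Been Stock skill"

# Step 3: a user asks for "Bean Stock" and lands on the attacker's skill.
print(route("Bean Stock", registry))  # attacker's Been Stock skill
```

Note that the attacker never has to interfere with the user or the legitimate skill: registering the predicted misinterpretation is enough, because the recognizer itself produces the squatted name.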

This attack is related to domain name typosquatting, in which an attacker registers domain names that match common typos of popular domains. However, typosquatting relies on the user making a mistake; here, the error is intrinsic to the speech recognition service itself.