To paraphrase Mark Twain, the reports of the death of the password are greatly exaggerated. Proclaiming their imminent demise is wishful thinking, for they are with us for the indefinite future.
Biometric, multifactor and other forms of authentication will become more prevalent, but passwords will remain the ubiquitous form of authentication. We should focus on making them better, not replacing them.
The problems with passwords
When passwords were first used at MIT in the early 1960s, they comprised letters, numbers and punctuation from the American Standard Code for Information Interchange (ASCII) character encoding. ASCII has 95 printable characters (including the space) and 33 teletype control characters. Despite its age, it remains the dominant source for password characters around the world.
The fundamental flaw of conventional passwords is the limited number of permitted characters. For a password of x characters drawn from y allowed characters, there are y^x (y raised to the power x) possible passwords. Historically, most passwords have been limited to a subset of the 95 printable ASCII characters, but today no valid technical reason remains for prohibiting a space or any other "special" character in your password. Yet these counterproductive limitations abound.
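As a minimal sketch of that count, the Python snippet below computes the number of possible passwords for a given alphabet size and length. The function name `keyspace` and the sample alphabet sizes are illustrative choices, not taken from any library or from the original text.

```python
def keyspace(alphabet_size: int, length: int) -> int:
    """Number of distinct passwords of exactly `length` characters,
    each drawn from `alphabet_size` allowed characters."""
    return alphabet_size ** length


# Illustrative alphabets: lowercase letters, letters plus digits, printable ASCII.
for size in (26, 62, 95):
    print(f"{size:>3} allowed characters, length 10: {keyspace(size, 10):,}")
```

Because the alphabet size is the base of the exponent, every character removed from the allowed set shrinks the count at every position of the password.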
Attackers, researchers and penetration testers use highly optimized password cracking tools like hashcat and John the Ripper to break passwords in a number of ways. The most fruitful attack is to load these tools with a dictionary of tens of millions of passwords pilfered from previous exploits, sorted with the most prevalent passwords first. If the dictionary attack fails, a cracker tries every combination of allowed characters in a brute force attack until the password has been correctly guessed.
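The order of guesses described above can be sketched as a toy generator. This is only an illustration of the two phases, assuming a hypothetical helper named `candidate_guesses`; it is not how hashcat or John the Ripper are implemented, and the tiny word list and two-letter alphabet are made up for the example.

```python
from itertools import product
from typing import Iterable, Iterator


def candidate_guesses(dictionary: Iterable[str], alphabet: str,
                      max_length: int) -> Iterator[str]:
    """Yield guesses in the order described: dictionary first, then brute force."""
    # Phase 1: previously leaked passwords, assumed sorted most common first.
    yield from dictionary
    # Phase 2: every combination of allowed characters, shortest first.
    for length in range(1, max_length + 1):
        for combo in product(alphabet, repeat=length):
            yield "".join(combo)


guesses = list(candidate_guesses(["123456", "password"], "ab", 2))
print(guesses)
```

The second phase is the one whose cost depends directly on how many characters the system allows.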
Today, $20,000 can buy a computer filled with repurposed graphics cards that can try up to 100 billion passwords per second. If the attacker knows that your bank prohibits spaces and ten other punctuation characters, then there are, at most, 84 characters that could be used at each position of your password. Excluding those eleven characters cuts the number of possible 10-character passwords from roughly 6 x 10^19 to under 2 x 10^19, a reduction of about 70 percent. Thus, prohibiting even one character makes things easier for hackers, for no good technical reason and to no benefit.
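To put the guessing rate in perspective, here is a rough sketch that converts a search space into worst-case time. It assumes the attacker sustains the full 100 billion guesses per second quoted above and must try every combination; the names `GUESSES_PER_SECOND` and `years_to_exhaust` are illustrative.

```python
GUESSES_PER_SECOND = 100_000_000_000  # 100 billion, the rate quoted above
SECONDS_PER_YEAR = 365 * 24 * 60 * 60


def years_to_exhaust(alphabet_size: int, length: int,
                     rate: int = GUESSES_PER_SECOND) -> float:
    """Worst-case years to try every password of the given length."""
    return alphabet_size ** length / rate / SECONDS_PER_YEAR


# Compare the restricted bank alphabet with all printable ASCII characters.
for size in (84, 95):
    years = years_to_exhaust(size, 10)
    print(f"{size} allowed characters: {years:.1f} years for all 10-character passwords")
```

On average an attacker finds the password after searching about half the space, so the expected time is roughly half the worst case.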
Most systems use a "byte," or 8-bit value, to store each character in your password. A byte can take one of 256 numeric values, from 0 to 255. In our bank example, an attacker need only try 84 of 256, or a measly 32.8 percent, of the possible values for each character of the password.
What we need is a password scheme in which an attacker must try all 256 possible values for each character in a brute force attack. The difference is striking. For a 10-character password, there are about 69,000 times more combinations (256^10 versus 84^10) when all 256 byte values are potentially used, making brute force attacks infeasible.
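One way to compare the two cases is in bits, where each additional bit doubles the attacker's work. The sketch below assumes every position is an independent, uniformly random choice, which real passwords rarely are; `search_space_bits` is an illustrative name.

```python
import math


def search_space_bits(values_per_position: int, positions: int) -> float:
    """Size of the search space expressed in bits (log base 2)."""
    return positions * math.log2(values_per_position)


restricted = search_space_bits(84, 10)   # the bank example
full_bytes = search_space_bits(256, 10)  # every byte value allowed

print(f"84 values per position:  {restricted:.1f} bits")
print(f"256 values per position: {full_bytes:.1f} bits")
print(f"ratio of search spaces:  about {2 ** (full_bytes - restricted):,.0f}x")
```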
The solution
Passwords have barely evolved from their teletype form in the 1960s, but the solution has been within our grasp since the early 1990s: the Unicode Standard. The principal aim of Unicode is to define all written scripts, modern and ancient, including symbols and emoji.
The latest version, Unicode 9.0, defines nearly 130,000 characters. Unicode is the native encoding scheme in all major operating systems and the internet. The first block of 128 Unicode characters, Basic Latin, maps precisely to ASCII, including the unusable, obsolete teletype control codes.
One of the useful attributes of Unicode is that all 256 byte values are used due to the sheer number of encoded characters, provided one uses characters from blocks in addition to Basic Latin. For example, the Thai currency symbol baht, ฿, is encoded as three bytes in UTF-8: 224 184 191, none of which are ASCII encodings. The baht character is visually distinctive and easy to remember, and of course one does not need to know what it represents to use it in a password.
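The byte values are easy to check. The short Python sketch below encodes the baht sign (code point U+0E3F) with the built-in `str.encode` and confirms that every byte falls outside the ASCII range of 0 to 127.

```python
baht = "\u0e3f"  # THAI CURRENCY SYMBOL BAHT
encoded = baht.encode("utf-8")

print(list(encoded))                   # [224, 184, 191]
print(all(b >= 128 for b in encoded))  # True: no byte is an ASCII value
print(list("a".encode("utf-8")))       # [97]: an ASCII letter stays one byte
```

Note that well-formed UTF-8 never produces a few byte values, 0xFF among them, so 256 possibilities per byte is best read as an upper bound.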
Using even just one such non-ASCII character in your password can make brute force and dictionary attacks infeasible. Since the number of possible passwords is y^x, the use of the baht symbol makes the password stronger in two ways. First, three bytes are necessary to encode the character in UTF-8, so x, the length in bytes, has increased. Second, the use of Unicode forces an attacker mounting a brute force attack to test all 256 possible byte values, so y has increased from 84 to 256.
A Unicode-enabled system would also permit any of thousands of symbols and emojis to be used, further strengthening the password. For example, the emoji ‘family with mother, father, son and daughter’ is encoded in UTF-8 as 25 bytes, none of which are used in Basic Latin/ASCII. Thus, for an attacker who must try all 256 values for each byte, the addition of this single emoji multiplies the number of possible combinations by 256^25, or about 10^60.
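The 25-byte figure can be reproduced in a few lines. This sketch assumes the emoji in question is the standard family sequence of man, woman, girl and boy joined by zero-width joiners; the byte count is the same whichever order the two children appear in.

```python
ZWJ = "\u200d"  # ZERO WIDTH JOINER, glues the four emoji into one glyph

# Man, woman, girl, boy (assumed sequence for the family emoji).
family = ZWJ.join(["\U0001F468", "\U0001F469", "\U0001F467", "\U0001F466"])
encoded = family.encode("utf-8")

print(len(family))                     # 7 code points: 4 emoji + 3 joiners
print(len(encoded))                    # 25 bytes: 4 * 4 + 3 * 3
print(all(b >= 128 for b in encoded))  # True: no ASCII byte values
```

What looks like a single character on screen is therefore a long byte string as far as the password system is concerned.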
Where do we go from here?
Benjamin Franklin said, "Experience keeps a dear school, but fools will learn in no other." We're well beyond mere foolishness in using conventional passwords — they are an unmitigated disaster.
Several billion passwords were compromised in 2016 alone, and without decisive action, 2017 could be worse. Future articles will look at other aspects of the solution, but permitting the unrestricted use of Unicode in passwords is the most important step we can take to level the playing field against attackers.