🦄 django-pwny

A look at password handling in light of changes to the NIST guidelines.

2018-08-29

Originally published at The Data Shed.

Much as we might lament their continued use, passwords are an important part of modern life. A part that seems only to be increasing in ubiquity; as I sit here I can see at least half a dozen devices, from this laptop to my television, that each require an inordinate number of passwords to keep them from turning into little more than novelty paperweights.

So let's take an inexplicable foray into the fun we have managing our little "memorized secrets"...

NIST: National Institute of Standards and Technology

NIST Special Publication 800-63

I'm sure that many are intimately familiar with Appendix A of NIST Special Publication 800-63, by Bill Burr. Well, if not the document itself then at least its consequences. Therein, amidst discussions of Claude Shannon's work on entropy in information systems, he outlined what effectively became the de facto best practices for password generation for over a decade, asking us to consider:

a minimum of 8 characters per password, selected by subscribers from an alphabet of 94 printable characters;

a requirement that subscribers include at least one upper case letter, one lower case letter, one number and one special character; and

a dictionary check to prevent subscribers from including common words, along with a bar on permutations of the username as a password.

He recently had this to say on the subject:

"Much of what I did I now regret. It just drives people bananas and they don't pick good passwords no matter what you do."

He's not wrong, is he.

NIST Special Publication 800-63B

Released in June 2017, the latest NIST guidelines make some surprising (or perhaps not so surprising) changes. Certainly it's a significantly lengthier document, given the weight of the task at hand, and there's a definite formality to the language this time around. Perhaps the biggest change is a more general recognition that passwords themselves aren't enough.

Some of the more significant things to note:

Gone are the periodic password changes ("Verifiers SHOULD NOT require memorized secrets to be changed arbitrarily (e.g., periodically)").
Gone is the convoluted alphanumeric-song-and-dance ("Verifiers SHOULD NOT impose other composition rules (e.g., requiring mixtures of different character types or prohibiting consecutively repeated characters) for memorized secrets").

Oh, and for all you sites that won't let me paste my password: "Verifiers SHOULD permit claimants to use "paste" functionality when entering a memorized secret"!

Most noteworthy of all (at least in the context of my writing this), there's this little nugget from section 5.1.1.2:

When processing requests to establish and change memorized secrets, verifiers SHALL compare the prospective secrets against a list that contains values known to be commonly-used, expected, or compromised. For example, the list MAY include, but is not limited to:

Passwords obtained from previous breach corpuses.

Dictionary words.

Repetitive or sequential characters (e.g. 'aaaaaa', '1234abcd').

Context-specific words, such as the name of the service, the username, and derivatives thereof.
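As a rough illustration of the last two items in that list, here is a minimal Python sketch. The function names, the run length of four and the plain substring matching are all assumptions of this sketch rather than anything NIST prescribes, and the breach and dictionary lookups are left out entirely.

```python
def is_repetitive(password: str) -> bool:
    """True when the password is one character repeated, e.g. 'aaaaaa'."""
    return len(password) > 1 and len(set(password)) == 1


def has_sequential_run(password: str, run_length: int = 4) -> bool:
    """True when `run_length` consecutive characters ascend or descend by
    one code point, e.g. '1234' or 'dcba' (case is ignored)."""
    codes = [ord(ch) for ch in password.lower()]
    for start in range(len(codes) - run_length + 1):
        window = codes[start:start + run_length]
        # The set of differences between neighbouring characters.
        steps = {b - a for a, b in zip(window, window[1:])}
        if steps == {1} or steps == {-1}:
            return True
    return False


def contains_context_word(password: str, context_words: list[str]) -> bool:
    """True when any context-specific word (service name, username, ...)
    appears inside the password, ignoring case."""
    lowered = password.lower()
    return any(word.lower() in lowered for word in context_words if word)
```

Both of NIST's own examples, 'aaaaaa' and '1234abcd', are caught by these helpers: the first by `is_repetitive`, the second by `has_sequential_run`.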

What's that? "Passwords obtained from previous breach corpuses" you say?

';--have i been pwned?

Launched in 2013, the site Have I Been Pwned? stores and processes data acquired from breaches, typically when said data are exposed publicly or some attempt is made to profit therefrom in some of the Web's more insidious locales. It's a matter the site's creator, Troy Hunt, can better explain, but the part most interesting to this particular topic is the fact that it exposes an API.

Specifically, the API allows us to send it a password and check whether it has appeared in a previous data breach, thus fulfilling the "previous breach corpuses" part of the NIST guidelines.

Obviously, sending your password across the Web to any third-party service is Not A Very Good Thing To Do™ and thankfully the API's design takes this into account: the password isn't sent directly; instead we send a cryptographic hash (SHA-1) derived from the password. Nor do we send the entire hash, passing only the first five characters of its hexadecimal form.

The data returned from the API comprise a list of matching hash suffixes and the number of occurrences of each hash within the corpus.

Perhaps a demonstration?

Let's say we have a password, a truly terrible password that no one would ever use:

$ BAD_PASSWORD="password"

No one would ever be silly enough to use that, right? So, purely for demonstration purposes as no one would ever have this password, we calculate the SHA1:

$ BAD_PASSWORD_HASH=$(
    printf '%s' "${BAD_PASSWORD}" | \
        sha1sum | \
        cut -d ' ' -f1 | \
        tr '[:lower:]' '[:upper:]'
)

Now we have a cryptographic hash of a password that no one would ever use, 'cause that would be silly, and can send the first five characters to the Have I Been Pwned API:

$ curl "https://api.pwnedpasswords.com/range/${BAD_PASSWORD_HASH:0:5}"
003D68EB55068C33ACE09247EE4C639306B:3
012C192B2F16F82EA0EB9EF18D9D539B0DD:1
01330C689E5D64F660D6947A93AD634EF8F:1
0198748F3315F40B1A102BF18EEA0194CD9:1
01F9033B3C00C65DBFD6D1DC4D22918F5E9:2
0424DB98C7A0846D2C6C75E697092A0CC3E:5
047F229A81EE2747253F9897DA38946E241:1
04A37A676E312CC7C4D236C93FBD992AA3C:5
...

Well that's a lot of results, 511 in fact (at time of writing). Of course, that doesn't mean that our password is actually in that list—because, of course, no one would ever use it—but we can double-check by seeing if the suffix from our SHA1 is in the list:

$ curl -s "https://api.pwnedpasswords.com/range/${BAD_PASSWORD_HASH:0:5}" | \
    grep "${BAD_PASSWORD_HASH:5}"
1E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804

Oh. Oh dear. So "password" has been used as a password, and turns up no fewer than 3,645,804 times in known data breaches? Burr was right: "they don't pick good passwords no matter what you do".
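For anyone who would rather not do this in the shell, the same lookup can be sketched in Python. This is only an illustration: the helper names `split_sha1` and `pwned_count` are made up for this post, and the sample response is a handful of lines copied from the output above rather than a live API call.

```python
import hashlib


def split_sha1(password: str) -> tuple[str, str]:
    """Return the (prefix, suffix) of the upper-case SHA-1 hex digest."""
    digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
    return digest[:5], digest[5:]


def pwned_count(suffix: str, response_body: str) -> int:
    """Look up a hash suffix in a range response of SUFFIX:COUNT lines.

    Returns 0 when the suffix is absent.
    """
    for line in response_body.splitlines():
        candidate, _, count = line.strip().partition(":")
        if candidate == suffix:
            return int(count)
    return 0


# A few lines taken from the curl output above, standing in for a live response.
SAMPLE_RESPONSE = """\
003D68EB55068C33ACE09247EE4C639306B:3
012C192B2F16F82EA0EB9EF18D9D539B0DD:1
1E4C9B93F3F0682250B6CF8331B7EE68FD8:3645804
"""

prefix, suffix = split_sha1("password")
print(prefix)                                # 5BAA6
print(pwned_count(suffix, SAMPLE_RESPONSE))  # 3645804
```

Only `prefix` would ever be sent over the wire; the comparison against `suffix` happens locally.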

Django

To perhaps give a more practical demonstration of how this might be integrated into a functioning site, I'm going to look to Django. Password validation was introduced in 1.9 and essentially comprises a list of validators—objects with a validate() method—into each of which is passed the incoming password.

Of course, passwords aren't stored in plain text, so there are only two points at which you can do this:

User registration, where potentially you can protect users from using at-risk passwords.
Login, at which point you can merely warn users about the potential risk.

To perhaps put the escalating problem of password management into perspective, in version 1.9 (December 2015) Django's CommonPasswordValidator contained a list of 1,000 commonly-used passwords. By version 2.1 (August 2018) it had grown to 20,000.

`pwny.validators.HaveIBeenPwnedValidator`

Here's a quick implementation, hereby dubbed django-pwny (and hence the ridiculous title of this post):

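import hashlib

import requests
from django.core.exceptions import ValidationError


class HaveIBeenPwnedValidator:
    """Reject passwords that appear in the Pwned Passwords corpus."""

    def validate(self, password, user=None):
        # Only the first five hex characters of the SHA-1 digest are sent.
        digest = hashlib.sha1(password.encode("utf-8")).hexdigest().upper()
        prefix, suffix = digest[:5], digest[5:]
        url = f"https://api.pwnedpasswords.com/range/{prefix}"
        # The timeout (an arbitrary 5 seconds) stops a slow API from
        # hanging the request indefinitely.
        r = requests.get(url, headers={"User-Agent": "django-pwny"}, timeout=5)
        # Surface HTTP errors rather than trying to parse an error page.
        r.raise_for_status()
        # Each response line is "<hash suffix>:<occurrence count>".
        for line in r.text.splitlines():
            candidate, count = line.split(":")
            if candidate == suffix:
                raise ValidationError(
                    f"Your password has been pwned {count} times!"
                )

    def get_help_text(self):
        # Note the trailing space: adjacent string literals are joined as-is.
        return (
            "Your password should not appear in a list of compromised "
            "passwords."
        )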

While I'm sure there's room for improvement, once added to the settings.py file it should alert users whose chosen password is a little too common:

AUTH_PASSWORD_VALIDATORS = [
    {
        "NAME": "pwny.validators.HaveIBeenPwnedValidator",
    },
]
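In a real project the validator would more likely sit alongside some of Django's built-in ones, which already cover a few of the other items on the NIST list (context-specific words, commonly-used passwords). A possible arrangement is sketched below; the built-in dotted paths are Django's own, but the selection, the ordering and the min_length value are assumptions made for illustration.

```python
# settings.py (sketch): selection, ordering and min_length are illustrative
# assumptions, not a recommendation from Django or NIST.
AUTH_PASSWORD_VALIDATORS = [
    {
        # Rejects passwords too similar to the user's own attributes.
        "NAME": "django.contrib.auth.password_validation.UserAttributeSimilarityValidator",
    },
    {
        "NAME": "django.contrib.auth.password_validation.MinimumLengthValidator",
        "OPTIONS": {"min_length": 8},
    },
    {
        # Django's bundled list of commonly-used passwords.
        "NAME": "django.contrib.auth.password_validation.CommonPasswordValidator",
    },
    {
        # The breach-corpus check from this post.
        "NAME": "pwny.validators.HaveIBeenPwnedValidator",
    },
]
```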

So there it is: a way, in accordance with the latest NIST guidelines, to compare users' passwords against a substantial corpus of passwords from known breaches. Quite whether this particular recommendation will see widespread adoption, we'll have to wait and see.

Shortly after I drafted this, GitHub decided to get in on the act. While that's definitely a major voice in the industry following the guidelines, they're still very much in the minority.