Blasted Bioinformatics!?: Validating ID via Gravatar

Most people will have seen a Gravatar user icon online; the name is short for the rather grand sounding "Globally Recognized Avatar". For example, GitHub.com and StackOverflow use them, and many blog platforms use them for user comments (sadly Blogger doesn't, yet). To get a user's icon, you construct a URL containing the MD5 checksum of their email address - and if the user isn't registered you get a default image or a uniquely generated abstract icon. This means you can cross-reference a list of email addresses with a list of Gravatar icon URLs (i.e. a list of email MD5 checksums).

How does Gravatar work? Starting with an email address, trim any white space, convert to lower case, and compute the MD5 as a hex digest (see creating the hash). Then use this MD5 string to make a Gravatar icon URL (see requesting the image).
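The steps above can be sketched in a few lines of Python 3. This is only an illustration: the function names gravatar_hash and gravatar_url are my own, and I am assuming the usual www.gravatar.com/avatar/ URL form with the s (size), d (default image) and r (rating) query options.

```python
import hashlib
from urllib.parse import urlencode


def gravatar_hash(email):
    """Trim white space, convert to lower case, return the MD5 hex digest."""
    cleaned = email.strip().lower()
    # hashlib works on bytes, so encode the string first
    return hashlib.md5(cleaned.encode("utf-8")).hexdigest()


def gravatar_url(email, size=50, default="identicon", rating="g"):
    """Build an icon URL (hypothetical helper, assumed URL layout)."""
    query = urlencode({"s": size, "d": default, "r": rating})
    return "https://www.gravatar.com/avatar/%s?%s" % (gravatar_hash(email), query)


print(gravatar_url("  Steve@Apple.com "))
```

Because of the trimming and lower-casing, "  Steve@Apple.com " and "steve@apple.com" give the same URL.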

For example, 205e460b479e2e5b48aec07710c08d50 is the MD5 hex digest of an email address belonging to Beau Lebens who works for Gravatar - it is used as an example in their documentation. This gives the following Gravatar icons - the default personalised one, and the auto-generated ones too:

Suppose you have found an old blog comment with a Gravatar icon claiming to be from "Steve Jobs", and you want to verify this. The blog owner might be able to see the commentator's email address, but it isn't so easy for the blog reader. However, if you know some candidate email addresses (say steve@apple.com, sjobs@apple.com and steve.jobs@apple.com), you can check if the MD5 in the Gravatar icon's URL matches that of the expected email address(es) or not. If it does, the comment probably was by the person you think it was.

You can calculate the MD5 in Python like this:

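import hashlib

emails = ["steve@apple.com",
          "sjobs@apple.com",
          "steve.jobs@apple.com"]
for email in emails:
    # Trim white space and lower-case the address first, as Gravatar does.
    # In Python 3, md5() wants bytes, hence the encode().
    digest = hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()
    print(email, digest)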

Using that I've constructed Gravatar image URLs requesting 50 pixel squares, G rated, showing the various abstract image options:

steve@apple.com
dc8dd7b4026f67f9c2f46b170875305c

sjobs@apple.com
ed263e3881d6ae44d258eda63a1fbda1

steve.jobs@apple.com
d47d026acd8074a8feb0736e7f047fac

Clearly none of those are a user defined icon, and I doubt Steve Jobs would have been happy with any of those defaults - of which the identicon is the safest bet ;)

But suppose an icon matched the blog post you're trying to verify - then you might have found a true blog comment from Steve. You should check if the MD5 checksum in the Gravatar URL matches, rather than just the picture. It would be trivial to register a Gravatar account (with any email address) and deliberately upload a copy of the avatar you want to impersonate - a social engineering trick which would work better with a personalised Gravatar icon.
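Checking the checksum rather than the picture is easy to automate. Here is a rough sketch, assuming the icon URL contains /avatar/ followed by the 32 character hex digest; the helper names and the example.com addresses are made up for illustration.

```python
import hashlib
import re


def email_md5(email):
    """MD5 hex digest of the trimmed, lower-cased email address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def hash_from_gravatar_url(url):
    """Return the 32 character hex digest found in a Gravatar URL, or None."""
    match = re.search(r"/avatar/([0-9a-fA-F]{32})", url)
    return match.group(1).lower() if match else None


def matching_emails(url, candidates):
    """List the candidate addresses whose MD5 matches the one in the URL."""
    target = hash_from_gravatar_url(url)
    return [email for email in candidates if email_md5(email) == target]


# Hypothetical icon URL, built here from a made-up address so the example
# is self-consistent; in practice you would copy it from the blog comment.
icon_url = "https://www.gravatar.com/avatar/" + email_md5("someone@example.com") + "?s=50"
print(matching_emails(icon_url, ["someone@example.com", "other@example.com"]))
# prints ['someone@example.com']
```

An empty list means none of your candidate addresses match, which tells you nothing about who did write the comment.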

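However, strictly speaking, even a matching MD5 is not proof. It is possible for two different strings to have the same MD5 checksum, called a collision, which means a clever hacker could potentially generate an email address with the same MD5 as someone they want to impersonate, and thus get to piggy back on the original email address's Gravatar icon. To be precise, matching one specific existing checksum like this is a second-preimage attack, which is much harder than finding a collision: pairs of arbitrary colliding inputs have been practical to generate for MD5 since the mid-2000s, but no practical second-preimage attack is known, and the result would also have to be a working email address. A proof of principle would be pretty cool, but for now it looks out of reach.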

Being aware of all this will make it possible to spot some impersonations.

Which brings us to the next logical step: can you work out someone's email address from the Gravatar icon? i.e. can we go from the MD5 checksum back to the email address? Of course, I'm not the first to think about this. One write-up on why publishing your email's hash is not a good idea found that 10% of Stack Overflow users' email addresses could be determined from their user name and the email MD5 checksum, by trying various guesses at big email providers. It also talked about rainbow tables: if you have a list of email addresses you could pre-compute all their MD5 checksums, and check those against the Gravatar URLs.
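To give a feel for how little work that guessing takes, here is a minimal sketch. It is not the method used in that study: the provider list, the name-mangling rules and all the function names are my own assumptions, and the lookup is a plain pre-computed dictionary, which is simpler than a true rainbow table.

```python
import hashlib

# Illustrative list of big email providers (an assumption, not from the study)
PROVIDERS = ["gmail.com", "hotmail.com", "yahoo.com"]


def email_md5(email):
    """MD5 hex digest of the trimmed, lower-cased email address."""
    return hashlib.md5(email.strip().lower().encode("utf-8")).hexdigest()


def candidate_emails(user_name, providers=PROVIDERS):
    """Guess plausible addresses from a display name like 'Jane Doe'."""
    parts = user_name.lower().split()
    local_parts = {"".join(parts), ".".join(parts), "_".join(parts)}
    return [local + "@" + domain
            for local in sorted(local_parts)
            for domain in providers]


def build_lookup(emails):
    """Pre-compute a checksum -> email dictionary for a list of addresses."""
    return {email_md5(email): email for email in emails}


def guess_email(user_name, target_md5):
    """Return the guessed address matching the checksum, or None."""
    return build_lookup(candidate_emails(user_name)).get(target_md5.lower())


# Made-up target so the example is self-contained
target = email_md5("jane.doe@gmail.com")
print(guess_email("Jane Doe", target))
# prints jane.doe@gmail.com
```

With a longer provider list and a few more naming patterns this is still only a handful of MD5 calculations per user, which is why the approach scales to a whole site's worth of icons.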

In my last blog post, I'd been tricked by someone posting comments under someone else's good name. Unfortunately the imposter comments have been deleted from the original blog, so I can't study their MD5 signature (it was a WordPress blog using Gravatar). It would have been fun to run that against a list of suspect email addresses to see if I could ID the hoaxer ;)


2011-12-13
