by Chris Lane
Affiliate marketers will from time to time have to process what’s called an “MD5 suppression list”. In brief, an MD5 suppression list is a list of email addresses which a marketer must remove from her mailing lists in order to comply with the CAN-SPAM Act of 2003 and respect the rights of individuals to opt out of email marketing campaigns. The list itself is simply a file containing a long list of MD5 hashes of unsubscribers’ email addresses. The hashing is a security measure designed to prevent unscrupulous marketers from using suppression lists themselves as sources of new email addresses for their campaigns.
To use a suppression list, an email marketer must compare each hash in the suppression list against the MD5 hash of each contact’s email address in her mailing lists. A matched pair of MD5 hashes indicates that the email address appears in the suppression list, and thus must be removed from the marketer’s email lists. (The mechanism here is similar to how user passwords are hashed before being stored in a database.)
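The comparison described above can be sketched in a few lines of Python. This is only an illustration, not the tool I ended up writing: the function names and sample addresses are hypothetical, and it assumes the list provider hashed trimmed, lowercased addresses and supplied one hex digest per line. Check the provider's instructions, since a different normalization would produce different hashes.

```python
import hashlib


def md5_hex(email: str) -> str:
    """Return the lowercase hex MD5 digest of a normalized email address."""
    # Assumption: the provider hashed trimmed, lowercased addresses.
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def load_suppression_hashes(lines) -> set:
    """Build a set of hashes from an iterable of lines, one hash per line."""
    # A set gives constant-time membership checks on average.
    return {line.strip().lower() for line in lines if line.strip()}


def scrub(contacts, suppressed_hashes):
    """Return only the contacts whose MD5 hash is not in the suppression set."""
    return [c for c in contacts if md5_hex(c) not in suppressed_hashes]


# Hypothetical data for illustration only.
suppression_lines = [
    md5_hex("alice@example.com") + "\n",
    md5_hex("bob@example.com") + "\n",
]
contacts = ["Alice@Example.com", "carol@example.com", " bob@example.com "]

suppressed = load_suppression_hashes(suppression_lines)
kept = scrub(contacts, suppressed)
print(kept)  # ['carol@example.com']
```

Loading the hashes into a set keeps each lookup cheap, but holding a very large list in memory this way may not be practical, so treat this as a sketch of the matching logic only.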
Recently, at work, I had to process a 2-gigabyte suppression list (about 62 million rows) from Groupon. To my surprise, I didn’t find any readily available tools to do this, so I rolled my own.