The Data Diode

TL;DR: Create an Amazon S3 bucket that your application can write to but not read from. Don’t forget to enable versioning.

Recently, one of our projects had to meet a compliance requirement: inactive data must no longer be readable by our application. The goal isn’t just that the application denies users access to that data, but also that no misconfiguration or hacking attempt can ever leak it.

The idea: the Data Diode. A diode is an electronic component that lets current flow in only one direction. In our case, data flows into the archive but never back out to the application.

On Amazon Web Services (AWS) we store data in Amazon S3 buckets. You can think of these buckets as virtual disks that are fast, reliable and highly flexible.

Ideally, your application uses AWS access keys that are restricted to a specific set of services and to the actions permitted on those services, for instance read and write access to your hot data storage on Amazon S3.

You can also add ACL rules that only allow writing to a given bucket and prohibit any read or list action. You can even allow other AWS accounts to write to your bucket.
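To make the write-only idea concrete, here is a minimal sketch in Python that builds such a rule set. Note that it expresses the restriction as a bucket policy rather than an ACL, which is a swapped-in technique and not necessarily what we deployed. The bucket name, the account ID and the role name are made up for illustration.

```python
import json


def write_only_bucket_policy(bucket, writer_arn):
    """Build a bucket policy that lets writer_arn add objects but nothing else."""
    bucket_arn = f"arn:aws:s3:::{bucket}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                # The only thing the writer may do: upload objects.
                "Sid": "AllowWriteOnly",
                "Effect": "Allow",
                "Principal": {"AWS": writer_arn},
                "Action": "s3:PutObject",
                "Resource": f"{bucket_arn}/*",
            },
            {
                # Explicit deny, so a permissive policy elsewhere cannot
                # re-enable reading, listing or deleting for this principal.
                "Sid": "DenyReadListDelete",
                "Effect": "Deny",
                "Principal": {"AWS": writer_arn},
                "Action": [
                    "s3:GetObject",
                    "s3:GetObjectVersion",
                    "s3:ListBucket",
                    "s3:ListBucketVersions",
                    "s3:DeleteObject",
                    "s3:DeleteObjectVersion",
                ],
                "Resource": [bucket_arn, f"{bucket_arn}/*"],
            },
        ],
    }


# Hypothetical names: replace with your own bucket and writer identity.
policy = write_only_bucket_policy(
    "example-archive-bucket",
    "arn:aws:iam::111122223333:role/example-archiver",
)
print(json.dumps(policy, indent=2))
```

If the writer lives in a different AWS account than the bucket, the writer's own account also has to grant it s3:PutObject on this bucket; the bucket policy alone is not enough in that case.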

We wrote a script that takes records older than 90 days, erases all data except what is required to run some anonymized statistics, and exports the result to our write-only Amazon S3 bucket.
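The shape of such a script can be sketched as follows. This is not our actual script, just an illustration under a few assumptions: records are dicts with an ISO 8601 `created_at` timestamp and a non-identifying `id`, the field names in `STATS_FIELDS` are invented, and `put_object` is any callable that uploads a body under a key.

```python
import json
from datetime import datetime, timedelta, timezone

RETENTION = timedelta(days=90)
# Hypothetical whitelist: the only fields kept for anonymized statistics.
STATS_FIELDS = ("created_at", "country", "plan")


def anonymize(record):
    """Keep only the whitelisted fields; everything else is dropped."""
    return {key: record[key] for key in STATS_FIELDS if key in record}


def archive_old_records(records, put_object, now=None):
    """Export anonymized records older than RETENTION via put_object(key, body).

    Returns the ids of the exported records so the caller can erase
    them from the hot storage afterwards.
    """
    now = now or datetime.now(timezone.utc)
    cutoff = now - RETENTION
    exported = []
    for record in records:
        if datetime.fromisoformat(record["created_at"]) >= cutoff:
            continue  # still active, stays in the hot storage
        body = json.dumps(anonymize(record), sort_keys=True)
        put_object(f"archive/{record['id']}.json", body)
        exported.append(record["id"])
    return exported
```

With boto3, `put_object` could be a small wrapper around the S3 client's `put_object(Bucket=..., Key=..., Body=...)` call. Passing it in as a callable keeps the sketch testable without AWS access.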

That way we make sure that we never leak old data, whether by accident or because we get hacked. An attacker can only access the most recent data, never the old data, not even with access to the AWS credentials.

One more thing we considered is enabling versioning, because a write-only client can still overwrite existing objects. If an attacker succeeds or our application goes rogue due to a bug, we can restore any overwritten data.
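Enabling versioning is a single API call. The sketch below assumes a boto3 S3 client created with administrative credentials (not the application's write-only keys); the bucket name is made up.

```python
def enable_versioning(s3_client, bucket):
    """Turn on versioning so an overwrite adds a new version instead of replacing data."""
    s3_client.put_bucket_versioning(
        Bucket=bucket,
        VersioningConfiguration={"Status": "Enabled"},
    )


# Usage, assuming boto3 is installed and admin credentials are configured:
#   import boto3
#   enable_versioning(boto3.client("s3"), "example-archive-bucket")
```

As long as the writer is not allowed to delete object versions, every earlier version of an overwritten object stays recoverable.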

I also recommend using a separate AWS account to host your archive. That way, not even an insider with access to the application's account can get at that data.

Are you interested in implementing this in your company or project? Feel free to contact me!