Monday, June 14, 2010

Encrypting Fields in Google App Engine

Recently, I've been coding an application (MindWell) in Google App Engine.  One of the more difficult features to implement was storing data in an encrypted format.  This is needed because MindWell is intended for therapists, counselors, and other mental health professionals, so the security of clients' data is very important.  Below is some code taken from MindWell.  For the complete code listing, see models.py.

Code to create an Encrypted Field:


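import codecs
import os

from Crypto.Cipher import AES
from Crypto.Hash import SHA256
from google.appengine.ext import db
from google.appengine.ext.db import BadValueError

from secret_info import secret_passphrase


class EncryptedField(db.StringProperty):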
    data_type = str
  
    def __GetSHADigest(self, random_number = None):
        """ Returns the SHA-256 digest of a random
            number mixed with the secret passphrase,
            together with the random number used."""
        sha = SHA256.new()
        if not random_number:
            random_number = os.urandom(16)
        # mix in a random number      
        sha.update(random_number)
        # mix in our secret password
        sha.update(secret_passphrase)
        return (sha.digest(), random_number)
      

    def encrypt(self, data):
        """Encrypts the data to be stored in the
           datastore"""
        if data is None:
            return None
        if data == 'None':
            return None
        # AES works on 16-byte blocks, so pad the
        # data to a multiple of 16 bytes
        mod_res = len(data) % 16
        if mod_res != 0:
            # pad the data with ^ (hopefully no one
            # uses that as the last character; if so,
            # it will be deleted on decryption)
            data += '^' * (16 - mod_res)
        (sha_digest, random_number) = self.__GetSHADigest()
        alg = AES.new(sha_digest, AES.MODE_ECB)
        result = random_number + alg.encrypt(data)
        # encode the data as hex to store in a string;
        # the result would otherwise have characters that cannot be displayed
        ascii_text = str(result).encode('hex')
        return unicode(ascii_text)
      
    def decrypt(self, data):
        """ Decrypts the data from the
            datastore.  Basically the inverse of
            encrypt."""
        # nothing to decrypt if the value is None
        # or the string 'None'
        if data is None:
            return None
        if data == 'None':
            return None
        hex_decoder = codecs.getdecoder('hex')
        hex_decoded_res = hex_decoder(data)[0]
        random_number = hex_decoded_res[0:16]
        (sha_digest, random_number) = self.__GetSHADigest(random_number)
        alg = AES.new(sha_digest, AES.MODE_ECB)
        dec_res = alg.decrypt(hex_decoded_res[16:])
        # strip the ^ padding that encrypt may have added
        return unicode(dec_res.rstrip('^'))
      
    def get_value_for_datastore(self, model_instance):
        """ For writing to datastore """
        data = super(EncryptedField,
                     self).get_value_for_datastore(model_instance)
        enc_res = self.encrypt(data)
        if enc_res is None:
            return None
        return str(enc_res)

    def make_value_from_datastore(self, value):
        """ For reading from datastore. """
        if value is not None:
            return str(self.decrypt(value))
        return ''

    def validate(self, value):
        if value is not None and not isinstance(value, str):
            raise BadValueError('Property %s must be convertible '
                                'to a str instance (%s)' %
                                (self.name, value))
        return super(EncryptedField, self).validate(value)

    def empty(self, value):
        return not value
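
The '^' padding in encrypt has the drawback noted in its comments: a value that really ends in '^' loses that character on decryption. Below is a minimal sketch of PKCS#7-style padding, which is a different technique from the one used in the code above and is shown only for comparison. The names BLOCK_SIZE, pkcs7_pad, and pkcs7_unpad are hypothetical and are not part of MindWell or of any library.

```python
BLOCK_SIZE = 16  # AES block size in bytes


def pkcs7_pad(data):
    """Pad bytes to a multiple of BLOCK_SIZE; always adds 1 to 16 bytes."""
    pad_len = BLOCK_SIZE - (len(data) % BLOCK_SIZE)
    # every padding byte holds the padding length itself
    return data + bytes(bytearray([pad_len]) * pad_len)


def pkcs7_unpad(padded):
    """Remove the padding added by pkcs7_pad, checking that it is well formed."""
    pad_len = bytearray(padded)[-1]
    if pad_len < 1 or pad_len > BLOCK_SIZE or pad_len > len(padded):
        raise ValueError('invalid padding length')
    if padded[-pad_len:] != bytes(bytearray([pad_len]) * pad_len):
        raise ValueError('invalid padding bytes')
    return padded[:-pad_len]


# a value that ends in '^' is padded to one full block and restored intact
padded = pkcs7_pad(b'Smith^')
restored = pkcs7_unpad(padded)
```

Because the last byte always records how much padding was added, a genuine trailing '^' survives the round trip. The cost is that data whose length is already a multiple of 16 grows by one extra block.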

Code to use Encrypted Field:

class ClientInfo(db.Model): 
    lastname = EncryptedField(verbose_name='Last Name')

In a separate file, secret_info.py:
secret_passphrase = '0123456789abcdef' # example value; it is hashed with SHA-256 before use, so any length works

Explanation:
Why mix in a random number? To encrypt the data more securely, a random number (sometimes called a salt) is mixed into the key.  This ensures that even when the same algorithm and passphrase are used, the encrypted result is different each time.  So if the same piece of information is entered twice, the two copies will have different values stored in the database.  The random number is stored in front of the encrypted data so that decrypt can rebuild the same key.  See Wikipedia for more information about the problems of not mixing in some randomness.
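
The effect of the random number can be sketched with the standard library alone. This is only an illustration under a few assumptions: hashlib stands in for the SHA256 class used in the code above, and derive_key and PASSPHRASE are hypothetical names that do not appear in MindWell.

```python
import hashlib
import os

# Hypothetical passphrase, for illustration only.
PASSPHRASE = b'0123456789abcdef'


def derive_key(passphrase, salt=None):
    """Return (key, salt): SHA-256 of a 16-byte salt followed by the passphrase."""
    if salt is None:
        salt = os.urandom(16)
    sha = hashlib.sha256()
    sha.update(salt)        # mix in the random number
    sha.update(passphrase)  # mix in the secret passphrase
    return sha.digest(), salt


# two calls with the same passphrase pick different salts...
key_a, salt_a = derive_key(PASSPHRASE)
key_b, salt_b = derive_key(PASSPHRASE)
# ...so the resulting keys differ
keys_differ = key_a != key_b

# re-deriving with the stored salt reproduces the key, which is what decrypt relies on
key_again, _ = derive_key(PASSPHRASE, salt_a)

# the stored salt alone is not enough: a different passphrase gives a different key
wrong_key, _ = derive_key(b'a wrong guess', salt_a)
```

Since each stored value gets its own key, encrypting the same last name twice produces two unrelated ciphertexts.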

There is also a secret_passphrase, which is used in the encryption but is not stored in the database.  So in order to decrypt the fields, an attacker needs to determine this secret passphrase or acquire it by some other method.