Friday, December 31, 2010

Speeding up pages in App Engine - Part 2

So you've got AppStats configured and you've sped up some page loads by removing obviously unnecessary queries and gets.  You've also used prefetch_refprops from Speeding up pages in App Engine - Part 1 to remove the staircase of gets.  Now what?

In my case I realized that one of my models no longer needed to be stored.  I was storing DOSRecurr objects (each of which references a DOS object, which in turn references a ClientInfo object) in the datastore.  Fetching objects from the datastore is slow, so if you can avoid going to the datastore, your pages will load faster.

I realized I had to recalculate all of the DOSRecurr objects every time anyway, since a change to a DOS could change its DOSRecurr objects as well.  So I kept the same DOSRecurr model, but instead of storing the objects in the datastore, I now create them every time and throw them away afterwards.

Keeping the DOSRecurr model let me keep all the logic and implementation built on it, and no longer storing it in the datastore sped up my application significantly.
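
As a rough illustration of the idea, here is a minimal sketch that builds recurring appointments in memory instead of fetching them.  The field names (first_date, interval_days) and the recurrences_for helper are hypothetical stand-ins, not the actual MindWell models, and plain namedtuples are used in place of db.Model so the sketch runs anywhere.

```python
from collections import namedtuple
from datetime import date, timedelta

# Hypothetical stand-ins for the stored DOS entity and the derived DOSRecurr.
DOS = namedtuple('DOS', ['client', 'first_date', 'interval_days'])
DOSRecurr = namedtuple('DOSRecurr', ['dos', 'when'])


def recurrences_for(dos, start, end):
    """Build the DOSRecurr objects for one DOS that fall within [start, end]."""
    step = timedelta(days=dos.interval_days)
    when = dos.first_date
    # Skip ahead to the first occurrence inside the window.
    while when < start:
        when += step
    results = []
    while when <= end:
        results.append(DOSRecurr(dos=dos, when=when))
        when += step
    return results


weekly = DOS(client='client-1', first_date=date(2010, 12, 1), interval_days=7)
december = recurrences_for(weekly, date(2010, 12, 1), date(2010, 12, 31))
```

Nothing here touches the datastore: the derived objects are built from the DOS that has already been loaded and are simply discarded after the page is rendered.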

The main takeaway from this lesson is:
Don't store and fetch things from the App Engine Datastore if you don't have to!

Saturday, November 6, 2010

Speeding up pages in App Engine - Part 1

After adding numerous features to MindWell, things had started to slow down a bit.  In this article I'll discuss various ways I sped up MindWell and improved page loading times.  After going through these steps perhaps your users will thank you and start to feel like they're flying...



First, a little information about MindWell:
In MindWell there are three main models:
ClientInfo - which contains information about clients (encrypted of course).
DOS - date of service, essentially this is information about an individual session with a client.
DOSRecurr - this is a model for recurring appointments.  So if a client comes every week this is used to model those appointments.

DOS uses a reference property to ClientInfo and DOSRecurr uses a reference property to DOS.  So the hierarchy looks like:
ClientInfo <- DOS <- DOSRecurr

Techniques
First of all, use AppStats.  It tells you exactly how much time is spent on each query and datastore call.  In my case I noticed the deadly staircase of gets, which can be resolved by prefetching reference properties.  Essentially, rather than triggering one get per item as you iterate through a list, you collect the referenced keys and fetch them all in a single batch get.  This alone gave my application a remarkable speedup.
Below is a modified version of prefetch_refprops that ignores empty references.  Some of my objects do not set a reference to another model, so I first filter out the empty references.

from google.appengine.ext import db

def prefetch_refprops(entities, *props):
   # Pair each entity only with the reference properties it actually sets,
   # so no empty (None) key is ever passed to db.get.
   fields = [(entity, prop) for entity in entities for prop in props
             if prop.get_value_for_datastore(entity)]
   ref_keys = [prop.get_value_for_datastore(x) for x, prop in fields]

   # Fetch every distinct referenced entity in a single batch get.
   # db.get returns None for keys that no longer exist.
   unique_keys = list(set(ref_keys))
   ref_entities = dict(zip(unique_keys, db.get(unique_keys)))
   for (entity, prop), ref_key in zip(fields, ref_keys):
       if ref_entities[ref_key] is not None:
           prop.__set__(entity, ref_entities[ref_key])
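
To make the difference between the staircase and the batched get concrete, here is a small sketch that counts round trips against a toy in-memory store.  FakeDatastore and the dictionary-based sessions are hypothetical and only imitate the shape of a batch get; this is not the App Engine API.

```python
class FakeDatastore(object):
    """Toy in-memory store that counts round trips (not the App Engine API)."""

    def __init__(self, rows):
        self.rows = rows
        self.round_trips = 0

    def get(self, keys):
        # One call is one round trip, however many keys it carries.
        self.round_trips += 1
        return [self.rows.get(key) for key in keys]


store = FakeDatastore({'c1': 'Client One', 'c2': 'Client Two'})
sessions = [{'client_key': 'c1'}, {'client_key': 'c2'},
            {'client_key': 'c1'}, {'client_key': None}]

# Staircase: one get per session that has a reference.
for session in sessions:
    if session['client_key']:
        session['client'] = store.get([session['client_key']])[0]
staircase_trips = store.round_trips

# Batched: collect the distinct non-empty keys and fetch them once.
store.round_trips = 0
keys = sorted(set(s['client_key'] for s in sessions if s['client_key']))
fetched = dict(zip(keys, store.get(keys)))
for session in sessions:
    if session['client_key']:
        session['client'] = fetched[session['client_key']]
batched_trips = store.round_trips
```

With these four sessions the staircase makes three round trips and the batched version makes one; the gap grows with the length of the list.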

Monday, June 14, 2010

Encrypting Fields in Google App Engine

Recently, I've been coding an application (MindWell) in Google App Engine.  One of the more difficult features to implement was storing data in an encrypted format.  This is needed because MindWell is intended for therapists, counselors, and other mental health professionals, so the security of clients' data is very important.  Below is some code taken from MindWell.  For the complete code listing see: models.py

Code to create an Encrypted Field:


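import codecs
import os

from Crypto.Cipher import AES
from Crypto.Hash import SHA256
from google.appengine.ext import db
from google.appengine.ext.db import BadValueError

from secret_info import secret_passphrase


class EncryptedField(db.StringProperty):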
    data_type = str
  
    def __GetSHADigest(self, random_number = None):
        """ This function returns a sha hash of a
            random number  and the secret
            password."""
        sha = SHA256.new()
        if not random_number:
            random_number = os.urandom(16)
        # mix in a random number      
        sha.update(random_number)
        # mix in our secret password
        sha.update(secret_passphrase)
        return (sha.digest(), random_number)
      

    def encrypt(self, data):
        """Encrypts the data to be stored in the
           datastore"""
        if data is None:
            return None
        if data == 'None':
            return None
        # AES works on 16-byte blocks, so pad the data
        # to a multiple of 16 bytes
        mod_res = len(data) % 16
        if mod_res != 0:
            # pad the data with ^
            # (hopefully no one uses that as
            # the last character; if so, it
            # will be deleted on decryption)
            data += '^' * (16 - mod_res)
        (sha_digest, random_number) = self.__GetSHADigest()
        alg = AES.new(sha_digest, AES.MODE_ECB)
        result = random_number + alg.encrypt(data)
        # encode the data as hex so it can be stored in a string;
        # otherwise the result would contain characters that cannot be displayed
        ascii_text = str(result).encode('hex')
        return unicode(ascii_text)
      
    def decrypt(self, data):
        """ Decrypts the data from the
            datastore.  Basically the inverse of
            encrypt."""
        # check that either the string is None
        # or the data itself is none
        if data is None:
            return None
        if data == 'None':
            return None
        hex_decoder = codecs.getdecoder('hex')
        hex_decoded_res = hex_decoder(data)[0]
        random_number = hex_decoded_res[0:16]
        (sha_digest, random_number) = self.__GetSHADigest(random_number)
        alg = AES.new(sha_digest, AES.MODE_ECB)
        dec_res = alg.decrypt(hex_decoded_res[16:])
        #remove the ^ from the strings in case of padding
        return unicode(dec_res.rstrip('^'))
      
    def get_value_for_datastore(self, model_instance):
        """ For writing to datastore """
        data = super(EncryptedField,
                     self).get_value_for_datastore(model_instance)
        enc_res = self.encrypt(data)
        if enc_res is None:
            return None
        return str(enc_res)

    def make_value_from_datastore(self, value):
        """ For reading from datastore. """
        if value is not None:
            return str(self.decrypt(value))
        return ''

    def validate(self, value):
        if value is not None and not isinstance(value, str):
            raise BadValueError('Property %s must be convertible '
                                'to a str instance (%s)' %
                                (self.name, value))
        return super(EncryptedField, self).validate(value)

    def empty(self, value):
        return not value
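
The '^' padding used by encrypt and decrypt can be tried on its own.  The sketch below is a standalone approximation with hypothetical pad and unpad helpers, not the code from models.py, and it shows the caveat mentioned in the comments: a value that really ends in '^' loses that character on the way back.

```python
BLOCK_SIZE = 16


def pad(data, fill='^'):
    """Right-pad data with fill so its length is a multiple of BLOCK_SIZE."""
    remainder = len(data) % BLOCK_SIZE
    if remainder:
        data += fill * (BLOCK_SIZE - remainder)
    return data


def unpad(data, fill='^'):
    """Strip trailing fill characters, as decrypt does with rstrip."""
    return data.rstrip(fill)


padded = pad('Smith')
restored = unpad(padded)

# A value that legitimately ends in '^' does not survive the round trip.
lossy = unpad(pad('2^'))
```

If trailing '^' characters matter for your data, a padding scheme that records how many bytes were added would avoid the loss, at the cost of changing the stored format.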

Code to use Encrypted Field:

class ClientInfo(db.Model): 
    lastname = EncryptedField(verbose_name='Last Name')

In a separate file, secret_info.py:
secret_passphrase = '0123456789abcdef' # example value; it is hashed with SHA-256 to build the AES key, so any length works

Explanation:
Why mix in a random number? To encrypt the data more securely, a random number (sometimes called a salt) is mixed into the key.  This ensures that encrypting the same value twice with the same algorithm and passphrase produces different results, so two identical pieces of information are stored as different values in the datastore.  The random number is stored in front of the encrypted data so that the same key can be rebuilt when decrypting.  See Wikipedia for some more information about the problems of not mixing in some randomness.
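
The effect of the random number can be seen with nothing but the standard library.  This is a minimal sketch that mirrors the hashing step using hashlib instead of PyCrypto; PASSPHRASE and derive_key are hypothetical names chosen for the illustration.

```python
import hashlib
import os

PASSPHRASE = b'example passphrase'  # hypothetical; not the real secret


def derive_key(salt=None):
    """Return (key, salt), where key = SHA-256(salt + passphrase)."""
    if salt is None:
        salt = os.urandom(16)
    sha = hashlib.sha256()
    sha.update(salt)        # mix in the random number
    sha.update(PASSPHRASE)  # mix in the secret passphrase
    return sha.digest(), salt


# Two encryptions of the same value get two different keys...
key_a, salt_a = derive_key()
key_b, salt_b = derive_key()

# ...but the stored salt lets decryption rebuild the exact same key.
key_a_again, _ = derive_key(salt_a)
```

Because each value gets its own key, identical plaintexts do not produce identical stored strings, while anyone holding the passphrase can still recover the key from the salt saved with the data.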

There is also a secret_passphrase used in the encryption, which is not stored in the datastore alongside the data.  So in order to decrypt the fields, an attacker needs to determine this secret passphrase or acquire it some other way.