I recently implemented paging for my Google App Engine app, Braineos, ordering results using a unique inequality filter. My implementation is based on Joe Gregorio’s Paging through large datasets tutorial, but tweaked in order to overcome apparent flaws in Gregorio’s implementation and to meet the search result ordering requirements of my application.
Paginating results has important performance implications. While GAE limits results sets to 1000 and it’s unlikely that a user will ever require more than a 1000 results, fetching up to the maximum number of results is less optimal that fetching a smaller defined limit with each query: pagination is included in Google’s Best practices for writing scalable applications.
Gregorio describes several means of ordering paginated results, and the paging on a unique property approach was most appropriate for the needs of my application: I wished to order user generated content first by the number of positive ‘votes’ the content received and second by the most recently created content. This approach uses shard counters to enable the creation of a unique key as well as quick updates to the datastore. Gregorio’s Contributor class and unique ‘when’ property creation were added to my code base (the when property added to the model object representing the user generated content):
class Contributor (db.Model):count = db.IntegerProperty(default=0) def _unique_user(user):email = user.email() def txn():contributor = Contributor.get_by_key_name(email)if contributor == None:contributor = Contributor(key_name=email) contributor.count += 1contributor.put()return contributor.count count = db.run_in_transaction(txn) return hashlib.md5(email + '|' + str(count)).hexdigest() def whenfromcreated(created):return created.isoformat()[0:19] + '|' + _unique_user(users.get_current_user())
My search module then used the ‘when’ property when ordering results sets:
def find(self, searchTerm, pagesize=configuration.Configuration.PAGESIZE):searchTerm = searchTerm.split(',')searchTerm = [term.strip().lower() for term in searchTerm] query = UGC.all()for term in searchTerm:query.filter('searchTerms =', term)ugcs = query.order('-when').fetch(pagesize + 1) return ugcs def findNext(self, searchTerm, next, pagesize=configuration.Configuration.PAGESIZE):searchTerm = searchTerm.split(',')searchTerm = [term.strip().lower() for term in searchTerm] query = UGC.all()for term in searchTerm:query.filter('searchTerms =', term)query.filter('when <= ', next).order('-when')ugcs = query.fetch(pagesize + 1)return ugcs
This implementation fitted with the app’s existing shard counting implementation and provided an excellent starting point. I came up against two issues. First, appending the count to the user’s email before digesting with MD5 resulted in unreliable desc ordering. Second, because the ‘when’ property is used as an inequality filter, ordering must first be on that property (so I couldn’t first order my results on ‘votes’ and then ‘when’).
I worked around the first issue by appending the count value after the user’s email was digested:
return hashlib.md5(email).hexdigest() + '|' + str(count)
I worked around the second issue by prepending the number of votes to the ‘when’ property:
def whenfromcreated(created, votes=0):return votes + '|' + created.isoformat()[0:19] + '|' + _unique_user(users.get_current_user())
This approach appeared to behave as expected, however my initial unit test used a sample of less than 10 UGC objects and it became apparent that once a user’s contributions exceeded nine, desc ordering would be inaccurate: a unique user string ending in ‘|9’ is deemed greater than a string ending in ‘|10’. Happily a fellow GAE app developer encountered a similar problem when attempting to use int values in a str comparison and suggested converting the int values to hex to allow alphabetical comparison. I altered the code to wrap votes and count in the hex() factory function and my unit test, which seeks to order 10 items into two pages, is now passing.
There is significant discussion around pagination in GAE and while my implementation seems to stick pretty close to Google’s reference implementation, if others have devised more elegant solutions, I’d be interested to know more – comment below!