April 26, 2009

Detour into Python and optimizations (Part 2)

Posted in Python tagged , , , , at 6:45 pm by davy803

In my last post, I showed you my first implementation of  a simple word game using an A* algorithm.  I also mentioned that I managed to optimize it by two orders of magnitude.  It’s also worth noting that which direction you go in matters a fair amount.  For example, in the test I showed at the end of my last post, I was converting from “poets” to “verse”  and that took 189 seconds.  However if I go the other way and tried to convert from “verse” to “poets” it only takes 24 seconds.  I’m fairly certain it has to do with the branching factor and the number of different words that can be made from each one.  I tried to first see the branching factor of each the initial branch, but in this instance they both returned 4 words, so it didn’t make any difference other than the slight overhead of the first check being slower.

From that information it seems that the expensive branching is further down the line, and we’d have to keep going down until we find it and then switch directions.  Alternatively, it seems like it might be a good candidate for a Bidirectional Search, but that seems a little more complex than I’d like for a simple program, and the slowness is well isolated to a specific call within the algorithm, so let’s focus on optimizing that.

Let’s get started and look at the first optimization made:

The first thing I noticed was that that len is called 32 million times.  Looking at our distance method we see that len is called twice for input validation:

def distance(word1, word2):
    if len(word1) != len(word2):
        raise ValueError("Word length do not match")
    matches = 0
    for i in range(len(word1)):
        if word1[i] == word2[i]:
            matches += 1
    return len(word1) - matches

Now if I were developing a library for external use, it might be good to leave the check there for the sake of fail fast, but for a one-off program that I know verifies the input before it’s passed to the function, it’s unnecessary, and better left for either a debug assert (which I haven’t learned how to do yet) or unit tests.  What happens if I remove those 2 lines and try again?

(poets:verse) 187.18 -> 154.42 (22%, 22%) 
(verse:poets)  23.79 ->  18.74 (27%, 27%)

The format I’ll be using is shown above where “Results(1)” means it’s the first optimization, times are in seconds, and the first % is percent improvement from the previous version, and the second % is percent improvement from the initial version in my first post.  

A respectable improvement for a simple change.  But there’s another spot that we do unnecessary checking:

def getchildren(word, wordlist):
    return [ w for w in wordlist if len(word) == len(w)
            and distance(word, w) == 1 ]

For every word in the wordlist we check if that word has the same length as the word we’re finding children of.  Well we already know that they are because we filtered out wordlist before we passed it to the AStar function.  So let’s take out that check, which give this:

def getchildren(word, wordlist):
    return [ w for w in wordlist if distance(word, w) == 1 ]
(poets:verse) 154.42 -> 101.82 (52%, 84%) 
(verse:poets)  18.74 ->  13.91 (35%, 80%)

Now that seems may seem a little surprising at first if you remember that getchildren is only called about 1000 times compared to the 5 million times distance is called, but the list comprehension is actually a foreach loop, so for each time getchildren is called, len is called twice for EVERY word in wordlist.

The next optimization is something I mentioned in the last post that seemed obvious in hindsight but didn’t strike me until this stage in the optimization game.  And that is the fact that our distance method counts the number of letters that are the same and then subtracts it from the total length.  What we want is the letters that are different so we can just do that directly:

def distance(word1, word2):
    difference = 0
    for i in range(len(word1)):
        if word1[i] != word2[i]:
            difference += 1
    return difference

That eliminates a single call to len and a subtraction operator, but every little thing counts right?  Well it does when you’re doing it 5 million times.  Here’s the results:

(poets:verse) 101.82 -> 79.30 (28%, 136%) 
(verse:poets)  13.91 -> 11.92 (17%, 100%)

We’ve doubled the initial speed, which is great.  It’s still a fair bit slower than I’d like though.  I was at a bit of a loss as to where I should optimize from here, so I asked the great people over at Stackoverflow and got a lot of good advice and suggestions (I recommend you check it out after reading this post).  It was pointed out that there’s a reason they left out the standard for loop that most other popular languages have, and even though you can fake it with the range method, it’s more “Pythonic” to iterate through the actual list.  

I didn’t consider this before because well I had to iterate through the letters of 2 words at once.  Turns out Python has a way to deal with this: the zip(list1, list2) method (where a “list” here just means any collection that’s iteratable, which I was happy to find out includes strings).  What zip does is it takes 2 lists and creates a list of tuples.  To show you what I mean, this is the result of zip(“ABC”, “abc”):

[(‘A’, ‘a’), (‘B’, ‘b’), (‘C’, ‘c’)]

Here is what distance looks like after changing to use zip:

def distance(word1, word2):
    difference = 0
    for x,y in zip (word1, word2):
        if x != y:
            difference += 1
    return difference

Multi-variable assignment is totally sweet!  That’s a lot more elegant looking, but how does it perform?

(poets:verse) 79.30 -> 74.59 ( 6%, 151%) 
(verse:poets) 11.92 ->  9.18 (30%, 159%)

This seemed to help the 2nd case a lot more than the first.  Why?  I have no idea, but I’ll take any kind of optimization I can get.  (Although I’d be nice if it helped the slower case more instead of the one that’s already fast)

The next thing that was pointed out to me was the fact that I’m counting the number of differences, when all I want to know (at least for the purposes of getchildren) is whether or not they differ by one letter.  I can stop checking at that point.  I still need the actual distance for calculating the h_score, so I created a new function called is_neighbors that returns true if there is a difference of exactly because that way it can stop checking when the difference is more than one, because we know it’s not a neighbor.  Here’s what that looks like:

def is_neighbors(word1,word2):
    different = False
    for c1,c2 in zip(word1,word2):
        if c1 != c2:
            if different:
                return False
            different = True
    return different

What this does now is to flag the word as different if one letter is found to be different.  If a second letter is found to be different when the flag is already set, then we can return False right away without checking the remaining letters.  If that doesn’t happen then that means either the different flag was set meaning one letter was different, or it’s not set meaning they were the same word.  Let’s see how much this helps:

(poets:verse) 74.59 -> 70.14 (6%, 181%) 
(verse:poets)  9.18 ->  8.83 (4%, 169%)

Another small gain, but still an improvement nonetheless.  But where do we go now?  Well I was directed to the izip function in the itertools module instead of zip because it returns an iterator instead of generating a list.  A list generates more overhead because it is indexed and allows for random access, but we don’t need this since we just need to iterate through the letters in order exactly once.  I won’t bother with a code snippet for this one since all I did was “from itertools import izip” and changed zip to izip in distance and is_neighbors.  Here’s the results:

(poets:verse) 70.14 -> 41.69 (68%, 349%) 
(verse:poets)  8.83 ->  5.02 (76%, 374%)

Wow! It’s amazing what the right data structure can do.  At this point I was going to consider it done.  It’s still not lightning fast, but I didn’t really think I could squeeze much more out of it.  Mark Ransom left a comment on my question that he wanted me to try out the latest version of his answer.  He first checks if the first letter is different, and if it is, then see if the rest of the string is identical.  If it is then we can return true, then it’s different by exactly one letter (the first one), otherwise there’s more than one letter that’s different.  We don’t care which ones.  If the first letter is the same then we continue as usual.  Here is his solution:

if word1[0] != word2[0]:
        return word1[1:] == word2[1:]
    different = False
    for i in range(1, len(word1)):
        if word1[i] != word2[i]:
            if different:
                return False
            different = True
    return different

We already determined that zip/izip is faster than range but I wanted to try it as posted first.  (He later said he meant to use izip, and he did in his testing, just forgot to update his answer).  The “word[1:]” syntax is called slicing which is kinda like substring except that it works with any indexable collection.  If the first letter is omitted it is assumed from the beginning and if the last is omited it is assumed to the end.  As a side-note, an interesting note is you can use list[:] it create a copy of the whole list.  Anyway, let’s see what the profiler has to say about this short-circuit method.


(poets:verse) 41.69 -> 29.82 (40%, 528%) 
(verse:poets)  5.02 ->  3.59 (40%, 563%)


I was initially shocked at how much of a difference that made.  It didn’t make sense to me how that would speed it up so much, and that’s even with reverting to using range which we’ve shown is much slower than izip.  Let’s consider this for a second though.  Most of the time the first letter is different.  We don’t need to compare the rest of the letters one at a time, a straight equality comparison on 2 strings is much faster.  Well you might say we were already cutting it off as soon as we found a 2nd letter that was different, and that’s what I thought at first.  But really what happens is that if the next letter to differ isn’t until the last letter, then we still ended up checking the whole word.  If there is only one letter different, again we end up checking the whole word.  With this, if the first letter is different, which is very likely given that there’s 26 letters in the alphabet and thus only a 1 in 26 chance that the first letter is the same (okay that’s not true because the list consists of real words and not every combination of possible letters, but it’s close enough for the sake of example) so 25/26 times we only need to check a single letter and we know the answer with a simple straight comparison.

Let’s see what happens if we change it back to use izip


def is_neighbors(word1,word2):
    if word1[0] != word2[0]:
        return word1[1:] == word2[1:]
    different = False
    for x,y in izip(word1[1:],word2[1:]):
        if x != y:
            if different:
                return False
            different = True
    return different


Instead of zipping the whole word this time, we only need to zip starting from the 2nd letter.  So let’s see how this compares:


(poets:verse) 29.82 -> 27.88 (7%, 571%) 
(verse:poets)  3.59 ->  3.38 (6%, 604%)


Not quite as big of a difference as what we saw before when we first switching to izip, but that confirms our earlier suspicions that the vast majority of the time we know the answer after checking a single letter.

So now we’re done right?  I thought so too.  Another person, Sumudu Fernando, suggested an entirely different approach:


If your wordlist is very long, might it be more efficient to generate all possible 1-letter-differences from ‘word’, then check which ones are in the list? I don’t know any Python but there should be a suitable data structure for the wordlist allowing for log-time lookups.

I suggest this because if your words are reasonable lengths (~10 letters), then you’ll only be looking for 250 potential words, which is probably faster if your wordlist is larger than a few hundred words.


At first I was skeptical.  For each word you had to generate a list of all strings that was one letter off,  and that’s huge, for a 5 letter word, there’s 25 possibilities foe each letter, so that’s 25 ^ 5, that’s 9 million strings, I don’t know where you’re getting 250 potential words for a 10 letter word from!

I was actually going to write a comment to that effect, until luckily I caught my own mistake before I posted it.  That’s all possible words, that don’t share any letters in common.  We want words with exactly one letter in common, which is what he said.  The number of possibilities was 25 * 5 not 25 ^5, and 125 is a lot more reasonable than 9 million.  I still felt that generating all those words would bring some overhead, but it’s possible that it could speed it up, so I tried it out and this is what it looked like:


def one_letter_off_strings(word):
    import string
    dif_list = []
    for i in xrange(len(word)):
        dif_list.extend((word[:i] + l + word[i+1:] 
            for l in string.ascii_lowercase if l != word[i]))
    return dif_list
def getchildren(word, wordlist):
    oneoff = one_letter_off_strings(word)
    return ( w for w in oneoff if w in wordlist )


Someone else introduced me to xrange instead of range, which is analogous to what izip is to zip.  The extend function of a list adds the element of a 2nd list to the first.  You may notice I did a little magic with the segment:

word[:i] + l + word[i+1:]

Remember the slicing we did earlier, well for each i, we create a word that’s identical to word and replace it with the letter l (that’s an L not the number one, probably should’ve used something less confusing, but I wasn’t thinking about that at the time).   Slicing includes the first index, but excludes the last one, so for i=0 and the word “hello”, you get word[0:0] + l + word[1:end], which translates into “” + l + “ello”, then “h” + l + “llo”, and so forth.  We do this for every letter in the alphabet that isn’t equal to the the actual one in the word.

We then change getchildren to loop through the oneoff list and return a tuple of words in the oneoff list that are also in wordlist.  A tuple is basically a read-only list that has a slight performance bonus since it’s read-only.  The one_letter_off function doesn’t look too pretty and probably can be optimized a bit, but I’m not quite sure how, so let’s just see how this does:

(poets:verse) 27.88 -> 34.40 (-23%, 444%) 
(verse:poets)  3.38 ->  3.74 (-11%, 536%)

Ok, this did a little worse, and I can’t really think of any ways to optimize the one_letter_off function, so this is probably a dead-end.  A closer look at the profiler shows that my gut feeling was wrong and that the one_letter_off function ended up being an insignificant amount of time.  The majority of the time was spent in this line:

return ( w for w in oneoff if w in wordlist )

Well, I’m stumped.  I don’t see how much simpler it could get than that.  What if I switched wordlist and oneoff?  What are we really looking for anyway?  We’re looking for the intersection of the 2 lists.  Intersection… that’s a term that’s often applied to sets in set theory.  I wonder if there’s a built-in fast way to do that.  With a bit of google-fu it seems that we can convert a list to a set which is another built-in type, and we can get the intersection with the bitwise and operator (&).  This is what it looks like:

def getchildren(word, wordlist):
    oneoff = one_letter_off_strings(word)
    return set(oneoff) & set(wordlist)

Should get a bit of an improvement with this, I’m sure they optimized set operations on sets.

(poets:verse) 27.88 -> 2.25 (1139%,  8219%) 
(verse:poets)  3.38 -> 0.23 (1370%, 10243%)

!@$!!  Really?  Are you serious?  I had to run it again to make sure it wasn’t a fluke.  Ran it with different inputs to make sure the answer was correct.  Holy $!@%.  Apparently set operations are absurdly fast.  I don’t know how it’s implemented under the hood, but WOW!

For fun I added some little debug prints to see what kind of data it was working with since it seemed like it was checking a lot of words.  What I found was that, there were lots of words with the same f_score, but different h_score.  The algorithm cares about f_score only, but it seems that if the f_score is the same, we’d have more luck with trying words that are closer to the goal first, or in otherwords, have a secondary the h_score if f_score is the same.  Here’s what changed in the main algorithm function:

def AStar(start, goal, wordlist):
    import Queue
    closedset = []
    openset = [start]
    pqueue = Queue.PriorityQueue(0)
    g_score = {start:0}         #Distance from start along optimal path.
    h_score = {start:distance(start, goal)}
    f_score = {start:h_score[start]}
    pqueue.put((f_score[start], h_score[start], start))
    parent_dict = {}
    while len(openset) > 0:
        x = pqueue.get(False)[2]
        if x == goal:
            return reconstruct_path(parent_dict,goal)
        for y in getchildren(x, wordlist):
            if y in closedset:
            temp_g_score = g_score[x] + 1
            temp_is_better = False
            appended = False
            if (not y in openset):
                appended = True
                h_score[y] = distance(y, goal)
                temp_is_better = True
            elif temp_g_score < g_score[y] :
                temp_is_better = True
            else :
            if temp_is_better:
                parent_dict[y] = x
                g_score[y] = temp_g_score
                f_score[y] = g_score[y] + h_score[y]
                if appended :
                    pqueue.put((f_score[y],h_score[y], y))
    return None

I highlighted the changes.  I added the h_score to the tuple in the priority queue so that acts as a secondary sort.  This change was actually made just now, as in the time of this posting.  As happy as I was with the previous optimizations, once I noticed all the words being checked that didn’t need to be, I was compelled to give this a try.  Here’s the results:

(poets:verse) 2.25 -> 1.28 (76%, 14523%) 
(verse:poets) 0.23 -> 0.19 (21%, 12421%)

A pretty nice gain if I do say so myself.  One thing to keep in mind though, is that these results are looking at the time for the AStar method.  I did this initially because compared to that, everything else had negligible execution time.  It actually takes about 1 second to clean the wordlist, which is not so insignificant now compared to the optimizations done, but for the sake of consistency, and not wanting to wait several minutes for all the earlier tests to rerun, that’s not included in here.  I’m sure that could be optimized as well, but I’m happy with this, so I’m calling it done.  Also, filtering out wordlists by number of letters could easily be pre-calculated so it doesn’t really make sense to bother optimizing that.  There you have it, showed you the optimizations I promised and one freebie.  Guess this goes to show optimizations still matter.  And also profiling is important.  Otherwise I would’ve wasted my time trying to optimize the one_letter_off method in version 9 and completely missed the biggest optimization of all, using sets for intersection.

Well, hope you enjoyed reading that was much as I enjoyed optimizing it.  Until next time!


Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google+ photo

You are commenting using your Google+ account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )


Connecting to %s

%d bloggers like this: