def count_words_split(sen): return len(sen.split())
This is nice since split automatically takes care of multiple consecutive spaces if present. However, in my case all the words were guaranteed to be separated by exactly one space, so the following should get the job done with a little less work:
def count_words_count(sen): return sen.count(' ') + 1
This is essentially a single pass over the string with no need to create an intermediate list of strings, so it should run faster. Surprisingly, on Python 2.5, the first method is twice as fast as the second one. I have no idea why. However, sanity is restored on Python 2.6: the second version is not only faster, but its advantage also grows with the size of the input.
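For anyone who wants to repeat the comparison, here is a rough sketch of how the two functions could be timed with the standard timeit module. The helper name time_both, the input sizes and the repeat counts are my own choices for illustration, not the setup used for the numbers above, and the absolute timings will differ from one interpreter version and machine to another.

```python
import timeit

def count_words_split(sen):
    # Builds an intermediate list of words, then counts its elements.
    return len(sen.split())

def count_words_count(sen):
    # Counts separators; only valid for non-empty, single-spaced input.
    return sen.count(' ') + 1

def time_both(n_words, repeats=1000):
    # Hypothetical helper: time both functions on a sentence of n_words words.
    sen = ' '.join(['word'] * n_words)
    # Sanity check: on single-spaced input both functions must agree.
    assert count_words_split(sen) == count_words_count(sen) == n_words
    t_split = timeit.timeit(lambda: count_words_split(sen), number=repeats)
    t_count = timeit.timeit(lambda: count_words_count(sen), number=repeats)
    return t_split, t_count

if __name__ == "__main__":
    for n in (10, 1000, 100000):
        t_split, t_count = time_both(n, repeats=200)
        print("%7d words: split %.4fs, count %.4fs" % (n, t_split, t_count))
```

Note that the two functions are interchangeable only under the single-space guarantee: on "a  b" (two spaces) the split version returns 2 while the count version returns 3.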
This got me thinking about what makes a good introductory programming language. I learned programming with C and algorithms with Java. Many people have argued that Python makes a better introductory programming language, and I have also liked Python in the one year I have been using it. One nice feature of CPython, the primary Python implementation, is that the critical parts of your program where you need speed can be written as C extensions, which gives a significant performance benefit. Many standard Python modules are written as C extensions.
To a newcomer, however, it is not always clear what is implemented in C and what is not. Most of the time that is fine for somebody who is only learning to program. However, an important part of learning to program is learning about data structures and algorithms and how they compare on a given problem. If an algorithm that should run faster in theory ends up slower because it uses parts of the language implemented in pure Python, while the other algorithm silently makes use of parts ported to C and runs faster, it can be confusing. This was the situation I found myself in while running my two functions on Python 2.5.
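One rough way to peek behind the curtain is the standard inspect module. The sketch below is a CPython-specific heuristic, not a definitive test, and the helper name looks_like_c_callable is made up for this example: it treats built-in functions and method descriptors as "probably implemented in C" and ordinary def functions as pure Python.

```python
import inspect
import textwrap

def looks_like_c_callable(obj):
    # Heuristic, CPython only: built-in functions (such as len) and
    # method descriptors (such as str.split) are defined in C, whereas
    # functions written with def are ordinary Python function objects.
    return inspect.isbuiltin(obj) or inspect.ismethoddescriptor(obj)

if __name__ == "__main__":
    for name, obj in [("len", len),
                      ("str.split", str.split),
                      ("textwrap.dedent", textwrap.dedent)]:
        print("%-16s %s" % (name, looks_like_c_callable(obj)))
```

This only says where a callable lives, not how fast it is, so it complements timing rather than replacing it.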
In fact, even beyond Python, I would argue that a good introductory programming language should give consistent timing results, even if they are not the fastest. Perhaps Jython is a better bet in that sense, since there would be no optimized C modules skewing the timing results. Perhaps it is also possible to create a dumbed-down version of CPython that does not use modules written in C and uses pure Python replacements instead.
PS: In fact, given that algorithms often have a space-time trade-off, even GC might play spoilsport. So does that mean going back to C? :)