Defying Classification

by Malcolm Tredinnick

Topic: software/python

Mon 3 Sep 2007

This Stuff Can Be Dangerous

Posted at 20:36 +1000 (edited 21:55)

Always a bit worrying to read things like this:

A member of our team has sustained two injuries today in the course of our efforts. Fortunately they were minor and she seems confident that she can continue work tomorrow. I will be sure to file a full injury report and an updated risk assessment with our safety advisor before then.

Then you realise this comes from a diary entry for a programming event and you just have to wonder, what were they doing?! Still, on balance, I think that particular team has their priorities straight.

Bonus Feature

The paparazzi were at work and took a shot of two-thirds of our team trying to come up with an idea for pyweek. Suffice it to say, the topic ("twisted") was a little challenging.

Unfortunately, this is one of about 43 things I am juggling at the moment and not the highest priority, by far, so I suspect I'm not going to be a good team member here.

Topics: software/python/pyweek

Sat 7 Apr 2007

Odd Python Fact

Posted at 19:26 +1000

Using the long weekend to get some intensive hacking done, I'm converting Django's internals to be more transparently Unicode-aware. All this character encoding twiddling has me thinking about performance, so I've been writing lots of little test programs to time features.

One unusual result that popped up this afternoon concerned reading a UTF-8-encoded file. Contrary to my intuition, this version:

data = open(filename, 'rb').read()
data = data.decode('utf-8')

was consistently a little bit faster than this version:

import codecs
data = codecs.open(filename, 'r', 'utf-8').read()

Admittedly the differences were generally (much) less than 5%, in favour of the first version, but I was a little surprised there was any real difference at all. I'm not worried by this result, but I would have guessed incorrectly.

In both cases, I'm reading in the data and converting it to a Unicode string. I ran the tests against some sample files from Markus Kuhn that I had lying around. The results held up when I changed the order of the tests or intermixed them. Aliasing codecs.open to a global variable sped up the second method very slightly, but not enough to catch up. I was careful to pre-fill the disk buffer cache and to run each test enough times in a loop for any noise in a single run to be absorbed.
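For anyone wanting to repeat the experiment, the methodology above can be sketched with the standard library's timeit module. This is a minimal sketch in modern Python 3, not my original test script: the helper names read_then_decode, codecs_read and compare are hypothetical, as are the repeat and number values and the made-up sample text. It reads the file once with each version (warming the disk cache and checking both give the same string), then keeps the best of several timed runs so one noisy run can't skew things. Numbers on a current Python won't necessarily match what I saw, and newer Python releases steer people towards open(filename, encoding='utf-8') rather than codecs.open().

```python
import codecs
import os
import tempfile
import timeit


def read_then_decode(filename):
    # Version 1: read the raw bytes, then decode them in one step.
    with open(filename, 'rb') as f:
        return f.read().decode('utf-8')


def codecs_read(filename):
    # Version 2: let codecs.open() decode while reading.
    with codecs.open(filename, 'r', 'utf-8') as f:
        return f.read()


def compare(filename, number=200, repeat=5):
    """Time both readers on one file; return (best_v1, best_v2) in seconds."""
    # Read once with each version: warms the OS disk cache and checks they agree.
    if read_then_decode(filename) != codecs_read(filename):
        raise ValueError('the two readers disagree on %r' % filename)
    results = []
    for func in (read_then_decode, codecs_read):
        # Keep the fastest of several runs; slower runs mostly reflect noise.
        timings = timeit.repeat(lambda: func(filename), number=number, repeat=repeat)
        results.append(min(timings))
    return tuple(results)


if __name__ == '__main__':
    # Hypothetical sample: Greek (2-byte) and runic (3-byte) characters plus ASCII.
    sample = 'αβγδε ᚠᛇᚻ abc\n' * 1000
    fd, path = tempfile.mkstemp(suffix='.txt')
    with os.fdopen(fd, 'wb') as f:
        f.write(sample.encode('utf-8'))
    try:
        v1, v2 = compare(path)
        print('read+decode: %.4fs  codecs.open: %.4fs' % (v1, v2))
    finally:
        os.remove(path)
```

Swapping the order of the two functions in the loop, or interleaving calls, is an easy way to check the result isn't an ordering artefact.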

As it turns out, the results were closest (essentially identical speed) for files with mostly one byte per character (pure ASCII files being the fastest) and diverged most for text whose characters need more bytes. The runic poem, with lots of three-byte characters, and the Greek text, whose letters are all two-byte characters, were the most divergent.
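To make the byte counts concrete, here is a small Python 3 sketch that measures how many UTF-8 bytes each kind of text needs. The helper name utf8_profile and the sample strings are made up for illustration. ASCII characters take one byte each, Greek letters two and runic letters three, so the decoder has more multi-byte sequences to assemble for the latter two. It measures size only, not speed.

```python
def utf8_profile(text):
    """Return (characters, bytes, bytes per character) for text encoded as UTF-8."""
    if not text:
        return 0, 0, 0.0
    encoded = text.encode('utf-8')
    return len(text), len(encoded), len(encoded) / len(text)


samples = {
    'ASCII': 'hello world',
    'Greek': 'αβγδε',  # U+03B1..U+03B5: two bytes each in UTF-8
    'Runic': 'ᚠᛇᚻ',    # Runic block (U+16A0..U+16FF): three bytes each
}

for name, text in samples.items():
    chars, nbytes, ratio = utf8_profile(text)
    print('%-6s %3d chars %3d bytes %.2f bytes/char' % (name, chars, nbytes, ratio))
```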

Topics: software/django, software/python

Mon 26 Feb 2007

Why Changing "Just This One Thing" Can Lead To Tears

Posted at 20:21 +1100

Apropos of nothing in particular (I was looking for something totally different and typed in a lazy Google query), I stumbled across a quote Guido van Rossum made last year:

"This is an illustration of the dilemma of maintaining a popular language: Everybody hates change (me too!) but everybody also has one thing that's bothering them so much they absolutely want it to be changed. If you were to implement all those personal pet peeves, you'd get a language that's more different from Python than Python is from Fortran." — from the python-dev archives

Too true, and it strikes me that this doesn't just apply to Python. It is also very true for any large system with a userbase. This isn't an argument against change, but it's a rationale for why caution is a good idea.

Topics: software/django, software/python