My First Python Program

For my first Python program I got the idea of making an ORM. I haven't learnt how to use most libraries yet and so it would need to be an ORM to Python's built-in data structures. To make it persistent, I used the cPickle library to serialize and load data to/from a file. Right now you might be thinking: "What problem does this solve?" or "It would never be able to handle concurrent data saving predictably." Well, it solves no existing problem and you're right, I wouldn't want more than one person modifying the data at the same time! This was an exercise in learning how to program with Python.
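
A minimal sketch of what that persistence step could look like (the DataStore class and its file handling are my own illustration, not the actual code):

try:
    import cPickle as pickle  # Python 2, as used in this article
except ImportError:
    import pickle  # Python 3: the C implementation is used automatically

import os

class DataStore(object):
    # Hypothetical helper: keeps every data set in memory and
    # serializes the whole thing to a single file.
    def __init__(self, path):
        self.path = path
        self.tables = {}  # data set name -> list of rows

    def load(self):
        if os.path.exists(self.path):
            with open(self.path, 'rb') as f:
                self.tables = pickle.load(f)

    def save(self):
        with open(self.path, 'wb') as f:
            pickle.dump(self.tables, f)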

Right off the bat I wanted to be able to store and access the data as if it were a SQL table. This meant that Python's dictionary object (similar to PHP's associative arrays) would be involved. I decided early on that storing an individual dictionary for each row of data would be not only overkill but also inflexible. Just imagine trying to change a column name when the column names are hard-coded into every row! That left me with either a tuple (an immutable list) or a list (similar to PHP's numeric arrays). Tuples are faster, but they would have been a pain to work with: I need to set default values for columns with no value specified, and I also need to be able to set a row's values without necessarily knowing their order, only which columns they belong to. Lists it was.
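
To make that concrete, here is a minimal sketch of the layout I'm describing (the Table class and its columns are illustrative, not the actual code): the column names and default values live on the class, stored once, and every row is just a list in column order.

class Table(object):
    # Illustrative data set: column names and defaults stored once,
    # each row stored as a plain list aligned with `columns`.
    columns = ['name', 'email', 'age']
    defaults = {'age': 0}
    data = []

    @classmethod
    def insert(cls, **values):
        # Build the row in column order, filling in a default for any
        # column that was not given a value.
        row = [values.get(col, cls.defaults.get(col)) for col in cls.columns]
        cls.data.append(row)
        return len(cls.data) - 1  # the new row's index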

Another interesting thing I did, which turned out to be instrumental in getting everything to work nicely, was the idea of primary keys. Each row in the data set has a primary key, which also happens to be the index of that row in the list. Reading this, you might realize that if that's the case, then I can't actually delete any rows from the data set, or else the primary keys stored in the individual rows of data would no longer correspond to the list indices. Well, you're right. I solved this problem by simply setting the deleted row to None.
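
Continuing the sketch above, deleting a row just blanks out its slot, so no other row's index (and therefore no other primary key) ever shifts:

    @classmethod
    def delete(cls, pk):
        # Leave the slot in place so the indices of all other rows,
        # and hence their primary keys, stay valid.
        cls.data[pk] = None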

Filtering data from the data sets was an interesting challenge, but Python ended up making the task easy. My requirement was that the programmer could simply pass a column name along with either the value it should equal or a lambda that returns True/False given a value. Remember, rows in the data set are not dictionaries, so given the column names, the filter function needs to figure out which piece of data in each row corresponds to which column. This turned out to be fairly easy.
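
A rough sketch of such a filter (the details are mine, not the actual code): each condition's column name is mapped to its position in the row, and a plain value is wrapped in an equality check so that values and lambdas go through the same path.

    @classmethod
    def filter(cls, **conditions):
        tests = []
        for col, cond in conditions.items():
            idx = cls.columns.index(col)
            # Accept either a literal value or a callable returning True/False.
            test = cond if callable(cond) else (lambda v, want=cond: v == want)
            tests.append((idx, test))
        # Keep only the primary keys of the rows that pass every test.
        cls.pks = [i for i, row in enumerate(cls.data)
                   if row is not None and all(t(row[idx]) for idx, t in tests)]
        return cls

With that in place, Table.filter(name='Peter') and Table.filter(age=lambda a: a > 18) both end up going through the same code path.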

Remember how I said that each row has a primary key and that that key is also the row's index in the data set? Well, this makes some things wildly easy. For example, if no arguments are passed to the filter function, then presumably we should return all rows in the data set--excluding deleted rows. Using a list comprehension, this task is easy:

cls.pks = [i for i in range(len(cls.data)) if cls.data[i] is not None]

Notice how this comprehension doesn't actually return any data (if it did, instead of [i for ..., it would read [cls.data[i] for ...). This is the best idea behind my whole system: data isn't fetched unless it is explicitly asked for. You might wonder why, and the main reason is that I want to minimize how many times the data needs to be copied. As I said, the data is stored in lists but represented as dictionaries. That means that at some point a translation from a list to a dictionary (a copy) has to take place, and by using Python's generators I am able to do this only when the data is asked for.
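
In the sketch, that lazy translation is a small generator (again, the method name is mine): nothing is copied until the caller actually iterates over the results.

    @classmethod
    def rows(cls):
        for pk in cls.pks:
            row = cls.data[pk]
            if row is not None:  # skip rows deleted after the filter ran
                # The list-to-dictionary copy happens here, one row at a
                # time, and only when the data is actually asked for.
                yield dict(zip(cls.columns, row))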

This has a huge implication: most of the time the data itself is not passed around, only the primary keys to that data. This means that to limit or slice the data set, all one is actually doing is slicing the list of primary keys! So, instead of a slice returning a (possibly) large list of dictionaries, it's returning a list of integers.
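
In the same sketch, limiting the data set is then nothing more than slicing that list of integers (the limit method here is illustrative):

    @classmethod
    def limit(cls, start, count):
        # Only the primary keys are sliced; no row data is copied
        # until the rows are actually generated.
        cls.pks = cls.pks[start:start + count]
        return cls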

I feel like I've written quite a bit, so the next article will probably go more into the code side of the ORM, how I solved the problems that came up, and examples of syntax for all I/O operations.

