Sunday, November 18, 2012

Math In The Time of Cholera

I got enough nice comments on the last post that when I was struck by a piece of curiosity I decided to go for another. To the extent that this blog has a purpose, it is essentially going to be my tinkering, which may or may not involve programming, but will (I imagine) pretty much always involve some form of mathematical modeling.

This week, I've decided that what's cool is evolution. In particular, antibiotic resistance. We all know that antibiotics, as they are (over)used, eventually fail to work. As bacteria are exposed to them, they evolve around them, learning to survive the deluge of drugs. Although I understand that this is a serious health problem for many, somehow I can't help but feel a smidgen of respect for these microorganisms, or at least for the machinery of life that enables their adaptation. As a great chaos theorist once said, "Life finds a way."

Disclaimer: Last time, I put up a disclaimer saying I didn't know anything, really, about cryptography. I would like to similarly note that I'm quite ignorant with respect to microbiology. At least with cryptography, I was staying in the realm of my academic training, which is highly mathematical. What I'm doing here is mathematical, but it's really a biological application, and the math is just how I approach it, because that's what I'm comfortable with. I worry that here I enter a long tradition of physical scientists who think they know what they're talking about. Hopefully I can get away with it by being self-aware: I am not a doctor, I am not pre-med, I am not in the life sciences. The analysis and programs that follow are ridiculously simplistic and took me only a few hours, and are not to be taken seriously by anyone, including me. Feel free to point out all of the failings in this analysis.

So, anyway, we all know what antibiotic resistance is. When I examined the model I had in my head of antibiotic resistance, though, I discovered that it didn't really hold up, probably because I constructed most of it when I was ten. In my head, I imagined two kinds of bacteria--"resistant" and "vulnerable." When exposed to antibiotics, the resistant ones lived and the vulnerable ones died. Afterwards, only the resistant ones were around to produce offspring, so the population as a whole would be resistant.

The flaws in this model became apparent when I considered the frequent advice that we hear: always finish your antibiotic regimen, because that helps prevent resistant bacteria from arising. This doesn't make a whole lot of sense in the context of my previous mental model. If the resistant bacteria are already resistant, then taking more antibiotics (or antibiotics for longer) won't affect them. In fact, arguably, you'd just be applying more selective pressure to the population, leading future generations to be even more resistant.

The problem arises because I conceived of bacteria as falling into two categories, a binary distinction between "dies when exposed to antibiotic" and "lives when exposed to antibiotic." Perhaps, instead, it would make more sense to talk about probabilities. After all, often in toxicity we hear about "LD50"--the lethal dose that kills 50% of a population. No antibiotic is 100% effective, right? Even bleach, alcohol, and other chemicals so strong that bacteria just straight-up do not evolve around them advertise "kills 99.9% of germs!"

In that last paragraph, I switched from talking about words ("resistant" and "not resistant") to talking about numbers ("how resistant?"), which is my personal sign that I could analyze this for real. My goal was this: I wanted to watch a virtual population of bacteria evolve antibiotic resistance.

The first part of any mathematical modeling is abstraction. For example, if I calculate the trajectory of a baseball, I will treat the baseball as if it were a point mass. I will ignore air resistance and how fast it spins in flight and the gravitational effects of the moon. Of course, these things can factor into an analysis, if I need a very detailed or comprehensive answer--but in general, why worry about them? Similarly, I don't need to do a real simulation of a human body full of bacteria, or to worry about how the antibiotics get there, or anything like that. I want to abstract out so all I have to worry about is the actual resistance, since that's what I'm interested in.

To do this, I organized my program like so: I have a list, called Petri. Petri is a petri dish, except obviously not a real one, because doing biotech in my bedroom is a good way to give myself a terrible disease by accident. Petri is a list of objects which are just called "cells." (An aside, to any programmers reading: this was my first-ever delve into the object-oriented world, with "cell" the first class I've ever defined. I had read about the concept and so I had some idea of what I was doing, but still. Milestone.) A cell has two associated variables. One variable is "alive," which is either "true" or "false" to tell me whether the cell is alive, naturally. The other variable is "resistance." Resistance goes from .05 to .95, and represents the chance that the cell survives a dose of antibiotic.
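If you're curious, a bare-bones sketch of the idea might look like the following. (This is a simplified sketch, not the exact code from the tarball--the names Cell, alive, and resistance are from my description above; the rest is filled in for illustration.)

```python
import random

class Cell:
    """One bacterium on the virtual petri dish."""
    def __init__(self, resistance):
        self.alive = True             # every cell starts out alive
        self.resistance = resistance  # chance of surviving one dose, from .05 to .95

# A dish of 1000 cells with random starting resistances
petri = [Cell(random.uniform(0.05, 0.95)) for _ in range(1000)]
```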

Resistance is an exceptionally abstract statistic, and I'm somewhat proud of how much information it carries in it. Everything about the interaction of the cell with the outside chemical is contained in that number. If you took a larger volume of antibiotics, all that would do is change resistance. If you took them more frequently, I would just have to apply a dose more frequently. It doesn't matter how the antibiotic works, or how it's distributed, or anything. One thing I have thought of that it doesn't account for is this: suppose all the drug does is stop replication, rather than killing? That bacterium still takes up space on my virtual petri dish, but doesn't replicate (presumably eventually dying of, I dunno, entropy). To work that in, I'd have to allow for bacteria that don't die but also don't replicate, and so continue to take up space for a little while, etc. An interesting future extension, maybe.

Ok, so now that I have my list of cells, what do I do with it? Well, I can "dose" the dish, which makes every bacterium draw a random number between 0 and 1; if that number is higher than its resistance, it dies. I can also call a function called "generation." Generation means that every bacterium that is still alive makes a copy of itself into one of the empty spaces. (The number of total spaces is fixed, and initially every space is full of bacteria, so the population is limited by its starting total.) I also want to be able to keep track of how the population is doing, so I have a function that counts the number of remaining live cells and another that calculates the average resistance of all the cells, which is interesting because this should increase as doses continue.
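Here's one way those functions might be written, using a plain list of resistance values to stand in for live cells (a dead cell just drops out of the list). The fixed capacity and the survival roll are from my description; the rest is my own simplification:

```python
import random

CAPACITY = 1000  # total spots on the dish; the population can never exceed this

def dose(cells):
    """Each cell draws a number in [0, 1); it dies if the draw exceeds its resistance."""
    return [r for r in cells if random.random() < r]

def generation(cells):
    """Every survivor copies itself into an empty space, as long as spaces remain."""
    empty = CAPACITY - len(cells)
    return cells + cells[:empty]

def count(cells):
    """Number of live cells remaining."""
    return len(cells)

def average_resistance(cells):
    """Mean resistance across the dish; this should rise as doses continue."""
    return sum(cells) / len(cells)
```

One simulated step is then just `generation(...)`, with a `dose(...)` thrown in at every even step.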

There are several arbitrary aspects of the simulation I'd like to talk about, because there are parameters that I essentially pull out of thin air. One of these is the generations/dose ratio. How many generations pass between doses? A thousand? Two? Ten? Manipulating this turns out to be very interesting, more on that later. Another is the initial resistance distribution. I have to pick a resistance for these guys to start with, after all, before any selection can occur. Also, they'll have to have different resistances, because otherwise there is no natural selection.

For the trials I'm about to show you, the generation/dose ratio was 2/1. That is, my program counts time in steps, t = 0, 1, 2..., and there is a new generation every step, and a dose of antibiotic applied at every even step. Again, this is arbitrary. I also started with an initial resistance distributed exponentially, with the average resistance at .15 or so. Also, there are 1000 spots in my petri dish. (This is arbitrary too; it needs to be large enough that the statistics work, but small enough that the program doesn't take too long to run. It is, therefore, limited mainly by patience, and more would be better.)

Anyway, here are the results. First, let's look at a chart of how many cells were alive:

Notice what happens here. First, the population drops extremely quickly. It makes a few small rebounds and is driven further down--and then there's a turnaround, and the bacteria win from there on out, their growth outpacing the antibiotic. Eventually they reach saturation, and every dose of the antibiotic does indeed drive them down--but their reproduction is fast enough to recover before another dose comes. Notably, they are consistently driven down to about six hundred or so. Let's look at what happened to the average resistance during the simulation:

First of all, it obviously rose quite a bit--remember our starting average was on the order of .1, and we ended around the .6 mark. That aligns perfectly with the survival rates we saw at the end of the simulation. But why didn't it go higher? Well, the fact is that the simulation only contains certain resistance values, and it appears that eventually every surviving cell had a common ancestor with that resistance, so it stops there.

That reveals a flaw in the program that goes with another flaw I had already noticed and decided not to tell you about. First: no new traits emerge in this. Natural selection operates on the resistances that are already present, but if no cell in the sample had a resistance of .9, nothing will ever generate a resistance that high. Second: I don't think I have a very good distribution. Every biological trait I can think of tends to be distributed normally--on a bell curve, that is. But if I had a bell curve centered on .1, I might not have any cells of high resistance at all, and then my colony could never evolve it, so I cheated and used an exponential distribution.

Fixing one of these problems requires fixing the other, so here's what I'm going to do: First, I am replacing the exponential distribution with a bell curve centered on .15, with a standard deviation of .03. (On a normal distribution, 99.7% of samples lie within three standard deviations, so virtually all of our bacteria will lie between .06 and .24.) Second, I am introducing a new stat that each cell carries called "mutability." A cell's mutability ranges from 0 to .1, and when that cell reproduces, its daughter cell's resistance is shifted by a random amount, up to its mutability, in either direction. That is, a cell with mutability .1 and resistance .2 will have daughters equally likely to be anywhere from .1 to .3. (I could make this a bell curve as well, I suppose, but it's not as crucial.) This will allow the population to produce resistances not present in the initial sample, as offspring may mutate to have better resistance than their parents.
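Sketching the new reproduction rule--the bell-curve start and the plus-or-minus-mutability shift are exactly as described above; representing a cell as a (resistance, mutability) pair is just my shortcut here:

```python
import random

MAX_RESISTANCE = 0.95  # perfect resistance didn't seem realistic

def initial_cell():
    """Resistance from a bell curve centered on .15 (std dev .03), random mutability."""
    return (random.gauss(0.15, 0.03), random.uniform(0.0, 0.1))

def reproduce(cell):
    """The daughter's resistance shifts by up to +/- the parent's mutability."""
    resistance, mutability = cell
    shifted = resistance + random.uniform(-mutability, mutability)
    # Clamp to the allowed range; mutability itself is inherited unchanged (for now)
    return (min(max(shifted, 0.0), MAX_RESISTANCE), mutability)
```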

Now, with our new simulations, here are the outputs:


Oh man. Ok, I kind of make a life out of geeking out about this stuff, but still, you gotta admit this is cool. First, look at the cell counts. Much like the others, it drops precipitously, recovers, reaches saturation, and oscillates as dosing continues. Unlike the others, it doesn't get stuck. Every time the dose applies, another round of natural selection occurs, until eventually the population is maximally resistant--which I set at .95, because it didn't seem all that realistic to allow perfect resistance. (In development, before I put that limit in, the resistance would go to 1.) Especially cool is the average resistance chart, because we started with a very tightly clustered distribution, but it doesn't take very long before every member of the population is more resistant than the entire original generation, which I think is pretty sweet.

Alright, until now, I've only shown you depressing graphs in which disease is nearly beaten back and then roars to a maximally-resistant dominance over feeble medicine. Luckily for you and me as humans, this is not inevitable. The variable I'm going to tweak is the generations/dose ratio, and I explored this by trial and error. If we flip it--two doses per generation--the bacteria die out in less than a dozen generations. I wanted to find something close to the tipping point, so I experimented a bit more, and it turns out we were never far from it--if there are three generations for every two doses, instead of four, the antibiotic wins. This only takes twenty-something generations, because there is no dramatic turnaround--only the cold hard victory of medical science:


Note that the resistance still goes up--the bacteria are, in fact, evolving--but because we finish our antibiotics they die and their resistance never escapes. If you stopped taking antibiotics, you would indeed potentially birth a new population of more-resistant (if probably not totally-resistant) bacteria. Repeat this a few times and you could well end up with maximally-resistant germs, at which point you are well and truly screwed.

Since I'm so big on the power of mathematical abstraction, let me point something out that represents the other side, that is, the flaws in this. My "doses" and my "generations" are quite abstract--in real life, antibiotics have a continuous effect if you're taking them constantly, and bacteria don't all divide in lockstep. This is fine for my extremely-simplistic model, but it means that ideas like "doses per generation" don't necessarily translate into real-life terms very well. In other words, don't read "doses per generation" as "pills per hour." Still, the basic concept--you need enough drugs to overcome their reproduction, and you need to stay on them long enough to actually wipe out the population--is there.

I want to point out that there are several other interesting things I could manipulate here--one would be starting resistance, one would be mutability range. Biological modeling is a real field, obviously, and I don't really have time this afternoon to reinvent it. There is one other experiment I do really want to run, though, and it didn't occur to me until I was typing this. I want to make mutability, uh, mutable. Before, a cell was exactly as mutable as its parent--now, I'm going to modify that number by saying that each cell's mutability can change...by its mutability. So mutability could as much as double, or it could drop all the way to zero. I don't know what will happen here exactly, but I have a hunch I'm highly interested in.
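My guess at how that rule gets coded--the "can change by its mutability" behavior is the idea above; clamping the result back into the original 0-to-.1 range is my assumption:

```python
import random

def mutate_mutability(mutability):
    """A daughter's mutability shifts by up to +/- the parent's own mutability,
    so it can at most double, or drop all the way to zero."""
    new = mutability + random.uniform(-mutability, mutability)
    return min(max(new, 0.0), 0.1)  # keep within the original 0-.1 range
```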

Neat. Super neat. Check it out: mutability at first increased, which makes sense because when you're under attack by terrible deadly chemicals it makes sense to be flexible. These cells found it favorable to be changeable, in other words. But at some point, the cells were as resistant as they were going to get, and at that point, mutations became a detriment. If there's nowhere to go but down, you're better off staying still. So mutability fell and continued falling for the rest of the simulation. (This is what I expected and so it makes me feel really cool, though I could easily lie about that. You'll have to trust me.) Again, we see the power of abstraction: real mutability presumably depends on, I dunno, DNA transcription error correction and stuff that I have no idea about. But because I looked at behavior (rather than mechanisms of behavior) I can still see these trends even though my "cells" are nothing like the real thing.

Natural selection: ain't life grand? That's kind of how I feel at the end of this project. I wanted to watch simple digital evolution, and I did. I hope you enjoyed coming along for the ride, if you've read this far.

Once again I'm attaching a tarball of my Python code. It is not very fast. (I can see ways to optimize it but don't really need to at this point, because, ah, I'm done with it for right now.) It is probably not very good. But, here it is, if you are interested.

http://dl.dropbox.com/u/32594622/antibio.tgz

wait a minute don't i have a test tomorrow

Tuesday, November 13, 2012

Codes and Keys

You know what's cool? Cryptography. Cryptography is SO DAMN COOL. Cryptography is cool, I think, because it sits in this really neat spot between math and language and computer science. And, of course, if you're like me, you like playing with ideas of information and all that. The idea that a text could be full of information utterly unreadable to most people is fascinating. I don't remember not being able to read, because I learned to read very early. But there's something enchanting about the familiar symbols of literacy--word, space, letter--twisted into something incomprehensible to you. Simple cryptograms, the kind you get in newspaper puzzle columns, have these disparate patterns that some part of the mind latches onto, trying to twist the letters back into the words you know and love--is that "ads" a "the"? Would that explain the "adsks" later on in the sentence?

Of course, if you think the idea of turning information into apparent gibberish and back is interesting when you have the key, I think you probably can't help but be enchanted by the idea of breaking codes. I recently learned to pick locks; the psychological drive is no different. Actually it's not that different from scientific curiosity. This is a pattern, there is information here--how can I tease it out? Pushing pins, analyzing letter frequencies, solving equations: these are just more ways of me asserting my mind on the unknown or (for locks) disallowed. Anything I don't understand and anywhere I can't go are limitations, and what can you do with a limitation but try to think your way around it?

Enough philosophizing, let's talk crypto.

Disclaimer: I don't actually know a whole lot about cryptography, I'll admit. I have read Neal Stephenson's Cryptonomicon, and that's probably the most advanced text I've gotten on the subject. I once read Dan Brown's Digital Fortress but I think that counts negative. Cryptographers: if I misuse terminology in this blog post, I apologize. This is certainly an example of me just fooling around.

My recent interest in cryptography started in my intro-level Botany class on Monday. (We all need our Gen Eds.) I was bored, and the professor was talking about evolution. I don't know if it was evolutionary algorithms or genetic codes or what, but I started thinking about codes. On the back of last week's exam, I scrawled out a simple rotating cipher. For those who don't know, what you do is write out the message, then write the key underneath it. Then, you "add" the letters together, numbering them 1 to 26. So A + B = C because 1 + 2 = 3. A + I = J because 1 + 9 = 10. If you go over 26, you divide by 26 and take the remainder. (Math people will recognize this as "mod 26" arithmetic.) The message "MESSAGE" encoded via the key "ABC" would look like:

MESSAGE
ABCABCA
NGVTCJF

So "MESSAGE" becomes "NGVTCJF." This code has some good properties. A simple cipher, where you just exchange each letter with one other letter, can be broken easily by picking out common words, or just counting the letters--"E" will appear the most. But in our code, "E" maps not to one letter but to several--so that attack can't be done as simply.

Once I started thinking about this I thought several things, kind of all at once.

1) It would be really easy to teach a computer to encrypt and decrypt messages this way, given a key.
2) It wouldn't be that much harder to teach a computer to break these codes, perhaps. More on the method below.

One more complication: I decided to write this all in Python. This is a complication because I didn't really know any Python until I decided this--I had installed it once and flipped through the beginning of a tutorial, but had never motivated myself to actually learn it. My default programming language is Fortran 90, which is what I use to do research. (yes, yes, Fortran. Laugh, why don't you?) Thinking through how to do this in Fortran, I realized it might be time to pick up a more modern language. I've attached all my code--if it looks hilariously awkward or weird, please blame my inexperience with the language. I have no doubt it could be waaaay more elegant.

The first thing I needed to write was a dictionary. I needed to define a map from letters to numbers. But wait. There's more to writing than letters, right? I mean, for one, there are uppercase and lowercase letters. Also, spaces, periods, commas, semicolons, dashes...you get the idea. I opted for simplicity--all letters are treated as lowercase. I added periods, commas, dashes, question marks, and spaces. Anything that wasn't in those categories got translated to a space. Once those were numbered, I was done--I could take a number and get its corresponding character, and vice versa. I could also put in a string of characters--a word, a paragraph, a page--and get a list of numbers out, and vice versa. That's good! Computers are better at working with numbers than letters, so I needed to be able to convert that. (All these functions are in dictionary.py, if you want to see the code.)

Next up, the actual encryption program. This part is pretty simple, actually, once the dictionary was done. Take in a string. Convert it to a list of numbers. Take a key. Convert it to a list of numbers. Add them together, meaning you get a new list of numbers. Turn that into a string. Boom, encryption. Make that addition a subtraction, and it's decryption. Let's see an example. I'll encrypt the first line of the Gettysburg address with the key "lincoln":


>>> encrypt.encrypttext("Four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in Liberty, and dedicated to the proposition that all men are created equal.","lincoln")

Which outputs:

'qwctn-pzzrboyqk.rxsymemntbknrwmqd?mqibjs?akj qdru hsqa ukw.bcsv-hpq, vym.vkknkvrynyn q,pkkpzvpgwbrohvpnwvmm vhhmlvqbrpqtknvsom wmvvpm.z,r?-v q,pn ul,mczwmxm.bo?rkk go rohrsdlyg'

Since I have the key, this is easily decrypted:

>>> encrypt.decrypttext(_,"lincoln")
'four score and seven years ago our fathers brought forth on this continent, a new nation, conceived in liberty, and dedicated to the proposition that all men are created equal.'


Notice that this is now in all-lowercase. This is very fixable--all I have to do is expand the dictionary--but I'm not worried about it right now.
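The attached tarball has the real dictionary.py and encrypt.py; as a self-contained sketch of the same steps, the whole pipeline might look like this. (The symbol set and its ordering here are my stand-ins--numbered from 0 rather than 1, for simplicity--so the exact ciphertext won't match the output above, but the structure is the same.)

```python
ALPHABET = "abcdefghijklmnopqrstuvwxyz.,-? "  # lowercase letters plus my punctuation
N = len(ALPHABET)
TO_NUM = {c: i for i, c in enumerate(ALPHABET)}

def to_numbers(text):
    """String in, list of numbers out; anything not in the alphabet becomes a space."""
    return [TO_NUM.get(c, TO_NUM[" "]) for c in text.lower()]

def encrypt(text, key):
    """Add the (repeating) key's numbers to the text's numbers, mod the alphabet size."""
    nums, knums = to_numbers(text), to_numbers(key)
    return "".join(ALPHABET[(n + knums[i % len(knums)]) % N] for i, n in enumerate(nums))

def decrypt(text, key):
    """Same thing with subtraction."""
    nums, knums = to_numbers(text), to_numbers(key)
    return "".join(ALPHABET[(n - knums[i % len(knums)]) % N] for i, n in enumerate(nums))
```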

Ok, well, that's cool I guess, if you like nerdy shit. (I do.) But suppose we stole this encrypted message off of one of our enemies, and we want to break it? We could try to see what the most common characters are--the problem is that what means "e" in one place means something different in another, so our letter frequencies won't be right.

Here, we apply a bit of insight. "E" doesn't map to the same letter every time--but every seventh "E" will, since "Lincoln" has seven letters. In fact, if we looked only at every seventh letter, then we'd be adding "l" every time--and that is a very simple cipher to solve, no more difficult than a newspaper cryptogram. While we won't have any particular words we can pick out, we don't actually need our encoded text (ciphertext) to have any meaning--we just need it to display standard English letter frequencies. Since English letters don't have a strong pattern every seventh letter, a "subtext" (my word--I don't know the real crypto word for this) of every seventh character works fine.

So, here's what I came up with: suppose I know the key length. Suppose I know the key is a seven-character string. Then I look at every seventh character. The most common letter must stand for the most common plaintext character--which is actually not "E" but space, sorry for the deceit. Let's say the most common letter is "cipher-space." Cipher-space could be any character in my dictionary, but I know that it is formed from

"Space" + [first letter of the key].
I subtract
Cipher-space - Space = [first letter of key]

And I've found the first letter of the key! To find the second letter, I just add a letter of offset, so I'm looking at the second, ninth, sixteenth...etc. I repeat this until I'm done with the whole key.
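The procedure in code, under the same kind of symbol-numbering scheme (the alphabet and the little encrypt helper here are my own stand-ins, not the actual dictionary.py):

```python
from collections import Counter

ALPHABET = "abcdefghijklmnopqrstuvwxyz.,-? "  # my stand-in symbol set
N = len(ALPHABET)
TO_NUM = {c: i for i, c in enumerate(ALPHABET)}

def encrypt(text, key):
    """Stand-in encryptor: add key numbers to text numbers, mod the alphabet size."""
    nums = [TO_NUM.get(c, TO_NUM[" "]) for c in text.lower()]
    knums = [TO_NUM[c] for c in key]
    return "".join(ALPHABET[(n + knums[i % len(knums)]) % N] for i, n in enumerate(nums))

def find_key(ciphertext, key_length):
    """For each key position, take every key_length-th character; the most common
    symbol in that subtext is assumed to be an encrypted space, and
    cipher-space minus space gives that letter of the key."""
    key = ""
    for offset in range(key_length):
        subtext = ciphertext[offset::key_length]
        cipher_space = Counter(subtext).most_common(1)[0][0]
        key += ALPHABET[(TO_NUM[cipher_space] - TO_NUM[" "]) % N]
    return key
```

Given enough ciphertext for the statistics to work, the key falls right out.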

Ok, well, that is a lot of nice theory. Does it work? To find out I had to first write a program that would generate letter frequencies from a string. That was neat to program but not particularly interesting, cryptographically, so I won't talk about it. Then, I wrote a program that implemented exactly the procedure I detailed above. Let's see how it does on our earlier example.


'qwctn-pzzrboyqk.rxsymemntbknrwmqd?mqibjs?akj qdru hsqa ukw.bcsv-hpq, vym.vkknkvrynyn q,pkkpzvpgwbrohvpnwvmm vhhmlvqbrpqtknvsom wmvvpm.z,r?-v q,pn ul,mczwmxm.bo?rkk go rohrsdlyg'
>>> breakcipher.findkey(_,7)
'liscoan'

(The "_" symbol refers to the previous output--so it is just plugging in the ciphertext) Well, that didn't quite work--notably, it did ALMOST work. Here's a problem with our method, then: it doesn't work on short amounts of text. A seven-character key means that our message got cut down to sevenths, and then we tried to do statistics on that. Understandably, that's prone to error. Suppose we try the entire Address?

>>> encrypt.fencrypt("gettysburg.txt","gettysburg.txt","lincoln")
>>> open("gettysburg.txt").read()
'qwctn-pzzrboyqk.rxsymemntbknrwmqd?mqibjs?akj qdru hsqa ukw.bcsv-hpq, vym.vkknkvrynyn q,pkkpzvpgwbrohvpnwvmm vhhmlvqbrpqtknvsom wmvvpm.z,r?-v q,pn ul,mczwmxm.bo?rkk go rohrsdlygfkp?cmcmmcapmpvtcupqkq.bokt?mnvnnvbqybfl hhbgb vyomyvpbsm bcsn h.cct,yemqaknybmpo vzvmu?kpzvpgwbrohnprkazhqgrtpl,rfkkplvmn?ytkm.fd?rghegnl phzgck,yhnbu?rl,mdo bwmlhwpyoh,hn ul,myo?ik rbvldphpq.pm wmfsovnibgnlm.w vwz.kwsbcsn hskswqhhnunlmqq.czk p.bk,rm.tneskszzmvvzaphej?kupzrbuldphbjst ktvxs-m pnvn ul,mpo vzvmowru hykepikqbbw-mltbqupbsm bttb q.inl.oh?t?.r?hbjo mcmmuvzcwlmf?kbsqa?lioa,jbwymlhycarr?hag,-rhhegnnnyh.qckqplveo rkglbfpmni.b,zbkk,pbpp?ibgnjlk rbql.kv,vnsnwt,ynjlk,ukbkt?wcprgm prbp?nbmmosyjktvxwytki.fnorlljbfs,k.btdrtwmqbvp pemjobrkk,pbpp?ibgrkv emho?mlj,xsk,azmr?z kx,ys?m wmcromzzmfs  lkb?n upheqawqk vnzkyt,bnsk.z,r-ny,?hyq,rm?mzg.mr?hejo mcmmuoemsm gkkoa,mkckplvmpsbr?hsqarr hejo m pr.novohugapikqbbw-mqw bd-m prbztdtvt-n?n prtkkbzhognoroqpccpqkprtskbzhbjskcynvpw-uplmy??xk ukqsm pr.ncuzhsqdru hugapmsidgn ua.mho?m-wmp?myehnfel.nmq?ntbkqabalbsm btz k?abczmmmmjs?rklrfwnn mqbczm prbu?rl,mvo-xkzroot.tvtbppszzrbd-mjgmvvlbkn q.kbsmagns,yw grkqpiqbfpm ixgnt.nzrcbpqklrx? vzvmv?kbsibbqlc-mmh??mcpvevkbsmgbuldphbjskyl.bbtaywhzgo-c?mmqtkqp-,vwz.kglbcsn hegnsr?mmjwruwbmts-,w-rbcsn hbjs-rklrcrkasiynny, hucepmoqrfnt.k-nk,kljhbjo m pvunyn q,pkkcylrtnr,oemuvlywhucepmlh.gfkotzbjnzskn gso,xhlanl.ohbjo mrwdgayzpvbb?qm prb-p,.tr-nmgk,ugn.rzxygkkszzmvvpm.m,rzpjk.uczwmywbb-p t.ubt?,xhbjskrlzbjji'
>>> breakcipher.findkey(_,7)
'lincoln'

THERE WE GO! Awesome, awesome. We took a scrambled text, knowing only the length of the key, and we pulled out the key via frequency analysis. Pretty sweet, imo, but I think we could do better. I mean, why would we know how long the key is? Let's get real here. 

Ok, ok. Well. How do we do this, then? Suppose we thought we knew the key length. We could decrypt it, then, we know that. And then we could check whether we had it right. Obviously, we could do this by just looking at it--but the point of this is to make computers do all our work, isn't it? (It is.) What's the difference between a real, decrypted text and an incorrectly decrypted string of gibberish? Well, actually, we already have the answer: letter frequency! Gibberish doesn't use e's more than q's, or vice versa. Those are properties of English language, and that's what we're trying to end up with.

So, here's our algorithm: first try a single-character key, and determine what letter frequencies that would give. If the difference between that and normal English letter frequencies (drawn from some reference text--I used Project Gutenberg's Beowulf, but obviously there are lots of possibilities for this) is too large, try a two-character key. Then a three-character key. Etc. We do this until the key gets up to some particular proportion of the total length of the text.

There are two arbitrary parameters here. First: how far off of normal letter frequency is too far? I picked a total 5% deviation, but there's no strong science behind it other than "this works." I tried 1% but found that too stringent. Another question is: how long a key do we try? At this point my algorithm stops when the key is one-fifth of the size of the message. That's waaaaaaay too big, because that only gives us five characters in each subtext. So far, it stops before it gets that high because it solves the problem. But obviously there's no guarantee of that, and a (much) lower stopping point would be a better idea--I just don't know what it should be. Sounds like a good time to do some math.
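The "how English is this?" check can be sketched as a frequency-deviation score. (This is my own formulation of the 5% test; in practice the reference frequencies come from a long text like Beowulf.)

```python
from collections import Counter

def frequencies(text):
    """Fraction of the text made up by each symbol."""
    counts = Counter(text)
    return {c: n / len(text) for c, n in counts.items()}

def deviation(text, reference_freqs):
    """Sum of absolute differences between the text's symbol frequencies and the
    reference's. Correctly decrypted English should score low; gibberish, high."""
    freqs = frequencies(text)
    symbols = set(freqs) | set(reference_freqs)
    return sum(abs(freqs.get(c, 0.0) - reference_freqs.get(c, 0.0)) for c in symbols)
```

The unknown-key-length loop then tries key lengths 1, 2, 3, ... and accepts the first whose recovered key decrypts the message below the deviation threshold.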

Again, that's all theory. Let's test it.

'qwctn-pzzrboyqk.rxsymemntbknrwmqd?mqibjs?akj qdru hsqa ukw.bcsv-hpq, vym.vkknkvrynyn q,pkkpzvpgwbrohvpnwvmm vhhmlvqbrpqtknvsom wmvvpm.z,r?-v q,pn ul,mczwmxm.bo?rkk go rohrsdlygfkp?cmcmmcapmpvtcupqkq.bokt?mnvnnvbqybfl hhbgb vyomyvpbsm bcsn h.cct,yemqaknybmpo vzvmu?kpzvpgwbrohnprkazhqgrtpl,rfkkplvmn?ytkm.fd?rghegnl phzgck,yhnbu?rl,mdo bwmlhwpyoh,hn ul,myo?ik rbvldphpq.pm wmfsovnibgnlm.w vwz.kwsbcsn hskswqhhnunlmqq.czk p.bk,rm.tneskszzmvvzaphej?kupzrbuldphbjst ktvxs-m pnvn ul,mpo vzvmowru hykepikqbbw-mltbqupbsm bttb q.inl.oh?t?.r?hbjo mcmmuvzcwlmf?kbsqa?lioa,jbwymlhycarr?hag,-rhhegnnnyh.qckqplveo rkglbfpmni.b,zbkk,pbpp?ibgnjlk rbql.kv,vnsnwt,ynjlk,ukbkt?wcprgm prbp?nbmmosyjktvxwytki.fnorlljbfs,k.btdrtwmqbvp pemjobrkk,pbpp?ibgrkv emho?mlj,xsk,azmr?z kx,ys?m wmcromzzmfs  lkb?n upheqawqk vnzkyt,bnsk.z,r-ny,?hyq,rm?mzg.mr?hejo mcmmuoemsm gkkoa,mkckplvmpsbr?hsqarr hejo m pr.novohugapikqbbw-mqw bd-m prbztdtvt-n?n prtkkbzhognoroqpccpqkprtskbzhbjskcynvpw-uplmy??xk ukqsm pr.ncuzhsqdru hugapmsidgn ua.mho?m-wmp?myehnfel.nmq?ntbkqabalbsm btz k?abczmmmmjs?rklrfwnn mqbczm prbu?rl,mvo-xkzroot.tvtbppszzrbd-mjgmvvlbkn q.kbsmagns,yw grkqpiqbfpm ixgnt.nzrcbpqklrx? vzvmv?kbsibbqlc-mmh??mcpvevkbsmgbuldphbjskyl.bbtaywhzgo-c?mmqtkqp-,vwz.kglbcsn hegnsr?mmjwruwbmts-,w-rbcsn hbjs-rklrcrkasiynny, hucepmoqrfnt.k-nk,kljhbjo m pvunyn q,pkkcylrtnr,oemuvlywhucepmlh.gfkotzbjnzskn gso,xhlanl.ohbjo mrwdgayzpvbb?qm prb-p,.tr-nmgk,ugn.rzxygkkszzmvvpm.m,rzpjk.uczwmywbb-p t.ubt?,xhbjskrlzbjji'
>>> breakcipher.decode(_)
'lincoln'

Success! I have functions now to encrypt, decrypt, and use frequency analysis to forcibly extract a key. Given a message which I know has been encrypted a particular way, I am one short script and a few seconds of waiting away from getting the actual secret message. Not too shabby for a couple of nights of screwing around in Python, I'd say.

Anyone interested in my (probably sloppy and amateurish, definitely uncommented) code can find it here: 

http://dl.dropbox.com/u/32594622/simplecrypto.tar

If you actually read all this, I salute you. This has been an episode of "Zach screws around with computers", and I had a lot of fun with it. If nothing else it kept my mind occupied through a lecture or two. If you can suggest some good websites/books about crypto, I'd appreciate that, and if you have comments or questions, I'd be happy to answer them.