Ready to get your nerd on?
I’ve decided to start playing around with programming. I took a couple of computer science classes in college and have always wanted to get better at it. I’m also about to start work at a software company, so I guess it’s a good thing to be good at. Obviously, I’m building baseball-related programs, and ideally I’ll get good enough at this that I can start doing some real research, or at least build some handy tools for fantasy baseball.
Full disclosure before I start: I’m not very good at this, so I’m going to basically be walking through some of the things I’m doing and what I’m struggling with. If you know nothing about programming, then this will probably be very uninteresting for you. If you are an expert software developer, then I encourage you to laugh at how badly I’m screwing up the simplest code. If you’re a nice expert software developer, then after you are done laughing, I would appreciate any advice you might have about my particular code, or any other baseball-related projects that you think would be good practice for me.
Ok, so for the past few days, I’ve been fiddling with two different projects. The first is fantasy-baseball related. Usually when I’m making my preseason rankings, I download an Excel projections spreadsheet, then I make a bunch of columns with z-scores for the categories I want, total them, and adjust for position and the like. It’s a painfully long and arduous process, so I want to create a program that does it for me.
There are a lot of ways I could do this, and I probably chose the worst one. I decided to use MySQL, and after hours and hours of figuring out how to install the MySQL module for Python on my computer, I finally got it working and started coding. First I had to write a script to move the data (ZiPS hitting projections for 2012) from the .csv file to an SQL table, with a column for every stat in the projection sheet. Easy enough.
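For anyone curious what that first step looks like, here is a minimal sketch of the CSV-to-table load. It is not the actual script: it swaps in Python's built-in sqlite3 module for MySQL so it runs without installing anything, and the sample data, the `hitters` table name, and the `load_projections` function are all made up for illustration.

```python
import csv
import io
import sqlite3

# Hypothetical sample standing in for the real ZiPS projections file.
SAMPLE_CSV = """Name,AB,HR,RBI,BA
Player A,550,30,100,0.300
Player B,500,20,80,0.280
Player C,450,10,60,0.260
"""

def load_projections(conn, csv_file, table="hitters"):
    """Create `table` with one column per CSV header and insert every row."""
    reader = csv.reader(csv_file)
    header = next(reader)
    # Quote column names so stats like "2B" or "3B" are valid identifiers.
    columns = ", ".join('"{}"'.format(name) for name in header)
    placeholders = ", ".join("?" for _ in header)
    conn.execute('CREATE TABLE "{}" ({})'.format(table, columns))
    # The remaining rows of the reader are inserted as-is (as text).
    conn.executemany(
        'INSERT INTO "{}" VALUES ({})'.format(table, placeholders), reader
    )
    conn.commit()
    return header

conn = sqlite3.connect(":memory:")  # in-memory database, nothing written to disk
header = load_projections(conn, io.StringIO(SAMPLE_CSV))
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by an open file handle. Values land in the table as text here, so numeric comparisons in queries would need a cast.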
Then came the fun part. My end goal: have a .csv file with all the projections for the categories of my choice, plus z-scores for each category, total of the z-scores, somehow normalized and adjusted for position. So how the hell would I do that?
Here’s what I did. I’m realizing now that I probably went about this project entirely the wrong way, so right now I’m at the point where I can either keep moving forward or start over using a different method.
- Got data from the SQL table based on inputted stat categories, and created a dictionary with the keys being player names and the values being lists containing their projections in the order that the stats were inputted.
- Iterated through the dictionary to create a “statPopulations” list of lists, each list containing every value for a certain category.
- Iterated through the dictionary again, this time creating z-scores (using statlib) for each category for each player and appending them to the dictionary entry. For batting average, since it is a rate stat, added another entry that was the z-score times the number of AB (called zBATimesAB).
- Iterated through the dictionary a third time (you see why I think this was a bad way to do it now) to find z-scores for zBATimesAB and added that to each dictionary entry.
- Used csv.writer to create a header row with stat names and a row for each player, containing each of the original projections plus z-scores for each one and the extra two calculations for batting average.
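The steps above can be sketched roughly like this. It is a simplified stand-in, not the real program: it uses the standard library's `statistics` module instead of statlib, assumes the population standard deviation, skips the SQL query, and uses made-up players. The names `build_table` and `zOfZBATimesAB` are hypothetical.

```python
import csv
import io
from statistics import mean, pstdev

CATEGORIES = ["HR", "RBI", "BA"]

# Made-up numbers standing in for the real projections.
projections = {
    "Player A": {"AB": 550, "HR": 30, "RBI": 100, "BA": 0.300},
    "Player B": {"AB": 500, "HR": 20, "RBI": 80, "BA": 0.280},
    "Player C": {"AB": 450, "HR": 10, "RBI": 60, "BA": 0.260},
}

def z_scores(values):
    """Standardize a list of numbers (population standard deviation assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def build_table(projections, categories):
    """Return (header, rows): raw stats followed by the z-score columns."""
    names = list(projections)
    columns = {}  # column name -> list of values, one per player
    for cat in categories:
        columns["z" + cat] = z_scores([projections[n][cat] for n in names])
    if "BA" in categories:
        # Weight the batting-average z-score by at-bats, then standardize again.
        weighted = [z * projections[n]["AB"] for n, z in zip(names, columns["zBA"])]
        columns["zBATimesAB"] = weighted
        columns["zOfZBATimesAB"] = z_scores(weighted)
    header = ["Name"] + list(categories) + list(columns)
    rows = [
        [n] + [projections[n][c] for c in categories]
        + [col[i] for col in columns.values()]
        for i, n in enumerate(names)
    ]
    return header, rows

header, rows = build_table(projections, CATEGORIES)
out = io.StringIO()  # for a real file: open("rankings.csv", "w", newline="")
writer = csv.writer(out)
writer.writerow(header)
writer.writerows(rows)
```

Each category's mean and standard deviation are computed once for the whole population here, rather than once per player.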
After a lot of fiddling and error handling (banging my head on the computer), it somehow worked! I created a .csv file, which I opened in Excel, that looked just how I wanted it to look. Just a few huge problems though.
First of all, it is a horribly inefficient program, and takes about 20 seconds to run.
Secondly, I don’t have the z-score totals there, and I really don’t want to have to iterate through the dictionary again to make it. This makes me think that building a dictionary wasn’t the best course of action.
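One possible way around the extra pass, assuming the stats are kept as one list per category with every list in the same player order: the totals then fall out of a single `zip`. The function names below are hypothetical, and the population standard deviation is an assumption.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Standardize a list of numbers (population standard deviation assumed)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma if sigma else 0.0 for v in values]

def total_z(columns):
    """columns maps category -> list of values, all in the same player order."""
    z_columns = [z_scores(values) for values in columns.values()]
    # zip(*z_columns) regroups the per-category lists into per-player tuples.
    return [sum(player_zs) for player_zs in zip(*z_columns)]

# Made-up numbers for three players.
totals = total_z({"HR": [10, 20, 30], "RBI": [60, 80, 100]})
```

For what it's worth, looping over about 1000 dictionary entries a few more times should take milliseconds, so the extra pass by itself is unlikely to be what makes the program slow.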
I also still need to adjust for position, and figure out what my population is. I have about 1000 players so far, but obviously not all of them are getting drafted. I need to figure out who is getting drafted, and then calculate z-scores based on that. But that’s going to depend on position and the z-scores themselves! Yeah, it’s a pain.
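One way to attack that circular problem is to guess a draft pool, compute z-scores against it, re-pick the pool from the new rankings, and repeat until the pool stops changing. This is only a sketch of that idea: it ignores position entirely, nothing guarantees it settles down on real data (hence the round limit), and the function names and `pool_size` parameter are made up.

```python
from statistics import mean, pstdev

def total_z_scores(players, categories, pool):
    """Total z-score for every player, with mean/stdev taken from `pool` only."""
    totals = {name: 0.0 for name in players}
    for cat in categories:
        values = [players[name][cat] for name in pool]
        mu, sigma = mean(values), pstdev(values)
        if sigma:  # skip a category with no spread inside the pool
            for name in players:
                totals[name] += (players[name][cat] - mu) / sigma
    return totals

def find_draft_pool(players, categories, pool_size, max_rounds=20):
    """Repeat: score against the current pool, then keep the top `pool_size`."""
    pool = set(players)  # start by assuming everyone gets drafted
    for _ in range(max_rounds):
        totals = total_z_scores(players, categories, pool)
        ranked = sorted(players, key=totals.get, reverse=True)
        new_pool = set(ranked[:pool_size])
        if new_pool == pool:
            break  # the pool is stable, so totals are relative to it
        pool = new_pool
    return pool, totals

# Made-up numbers for five players.
players = {
    "Player A": {"HR": 40, "RBI": 120},
    "Player B": {"HR": 35, "RBI": 110},
    "Player C": {"HR": 20, "RBI": 70},
    "Player D": {"HR": 10, "RBI": 50},
    "Player E": {"HR": 5, "RBI": 30},
}
pool, totals = find_draft_pool(players, ["HR", "RBI"], pool_size=2)
```

Adjusting for position would presumably mean running something like this per position, or subtracting a per-position baseline, which this sketch does not attempt.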
So that’s where I’m at with this. I’m happy that I’ve managed to actually write a program that works to some extent, but I have a feeling that if I really want it to be useful, I’m going to have to rewrite it completely. Using MySQL might be completely pointless, and building dictionaries instead of lists might be wrong too. I have a feeling that I either need to use SQL the whole time, adding z-scores to the SQL table instead of making dictionaries, or scrap SQL altogether and figure out a better way to get the original data into Python.
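As a rough idea of what the "scrap SQL" option could look like, the standard library's `csv.DictReader` can read the projections straight into a list of dictionaries. The sample data and the `read_projections` function are hypothetical, and the real file's column names may differ.

```python
import csv
import io

# Hypothetical sample standing in for the real projections file.
SAMPLE_CSV = """Name,AB,HR,RBI,BA
Player A,550,30,100,0.300
Player B,500,20,80,0.280
Player C,450,10,60,0.260
"""

def read_projections(csv_file, numeric_columns):
    """Read each CSV row into a dict, converting the listed columns to floats."""
    players = []
    for row in csv.DictReader(csv_file):
        for col in numeric_columns:
            row[col] = float(row[col])
        players.append(row)
    return players

players = read_projections(io.StringIO(SAMPLE_CSV), ["AB", "HR", "RBI", "BA"])
# One list per category, in player order, ready for z-score math.
hr_column = [p["HR"] for p in players]
```

With a real file, `io.StringIO(SAMPLE_CSV)` would be replaced by something like `open(path, newline="")`.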
Wow, I can’t believe I just wrote all that. Well, if you got through it all and have any suggestions, questions, or comments for me, let me know. Sometime later, I’ll talk about the other project I’ve been messing with: simulations (also related to fantasy baseball – I have no life)!