Category Archives: Programming

Simulating Head-to-Head Fantasy Baseball

Pat Sheridan approves of my fantasy baseball programming posts.

In this post, I will, yet again, not only bring together two nerdy activities, fantasy baseball and programming, but I will also write about the amalgamation of said activities in moderate detail, which is enough detail to make the vast majority of you bored and somewhat uncomfortable. Pumped? Me too, my friends, me too.

Last week, I talked about my experimenting with creating a z-score spreadsheet for fantasy baseball projections. The other project that I’ve been working on is, or at least will be, a little more theoretical. Well, I’m not sure theoretical is the right word for it. Let me explain.

The question: What makes a fantasy baseball team win? More specifically, in head-to-head leagues, do certain types of teams, or teams with certain types of players, perform better than others, even if their overall stats are similar?

The end goal: Simulate fantasy baseball seasons using various types of team builds.

The process: Ha. As if I actually know how to go about doing this. But I’ll try, because I have a week until I start my new job and nothing to do. Instead of telling you what I’ll do, I’ll show you what I’ve done so far, then my plan – if I have one – for the future of the program.

Here are my two functions so far:

import random
import numpy

def createTeams(numOfTeams,numOfCategories):
    # teams[n]: team n's chance of winning each category against an average opponent.
    # records[n]: team n's [wins, losses].
    teams = {}
    records = {}
    for i in range(numOfTeams):
        teams[i+1] = []
        for j in range(numOfCategories):
            teams[i+1].append(random.random())
        records[i+1] = [0,0]
    return teams, records

def simulateWeek(teamDict,records,numOfTeams,numOfCategories):
    teamsYetToPlay = []
    team1 = 0
    team2 = 0
    for i in range(numOfTeams):
        #List of teams that have yet to play this week.
        teamsYetToPlay.append(i+1)
    while len(teamsYetToPlay)>1:
        while team1 not in teamsYetToPlay:
            #Draws random team numbers until it finds a team that hasn't played yet.
            team1 = random.randint(1,numOfTeams)
        teamsYetToPlay.remove(team1)
        while team2 not in teamsYetToPlay:
            team2 = random.randint(1,numOfTeams)
        teamsYetToPlay.remove(team2)
        for i in range(numOfCategories):
            A = teamDict[team1][i]
            B = teamDict[team2][i]
            #log5: chance that team1 beats team2 in this category.
            log5 = (A-A*B)/(A+B-(2*A*B))
            #One geometric draw equals 1 with probability log5 (success on the first trial).
            z = numpy.random.geometric(log5)
            if z==1:
                records[team1][0] += 1
                records[team2][1] += 1
            else:
                records[team1][1] += 1
                records[team2][0] += 1
    return records

Yay giant blocks of code! But what does this mean?

Well it’s pretty simple. “createTeams” creates two dictionaries: the first pairs the number of the team with a list of values that represent the probability of that team winning a particular category against an average opponent in one week. The second dictionary pairs the team number with their win-loss record. (As I write this, I’m realizing that there is no point in these being dictionaries if the keys are just numbers, since lists are ordered anyway.)
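A minimal sketch of that list-based version, assuming it's fine for team numbers to start at 0 (the list index) rather than 1. The name `create_teams` and the optional `seed` argument are hypothetical additions for illustration, not part of the program above.

```python
import random

def create_teams(num_teams, num_categories, seed=None):
    """Return (teams, records) as plain lists indexed by team number 0..num_teams-1.

    teams[i][c] is team i's (randomly drawn) chance of winning category c
    against an average opponent; records[i] is team i's [wins, losses].
    """
    rng = random.Random(seed)  # seeded generator so runs can be repeated
    teams = [[rng.random() for _ in range(num_categories)] for _ in range(num_teams)]
    # One separate [0, 0] list per team.
    records = [[0, 0] for _ in range(num_teams)]
    return teams, records

teams, records = create_teams(12, 10, seed=1)
print(len(teams), len(teams[0]), records[0])
```

Building `records` with a comprehension matters here: `[[0, 0]] * num_teams` would give every team the same inner list, so one team's win would show up on everyone's record.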

The second function, “simulateWeek”, is my attempt to – you guessed it! – simulate a week in a fantasy baseball head-to-head league. Basically, I keep drawing random team numbers from the teams that haven't played yet and pair them up, so that each team plays once (with an odd number of teams, whichever team is left over just sits out that week).
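The "keep drawing random integers until an unused team turns up" loops can be replaced by shuffling the teams once and pairing neighbors. A minimal sketch, assuming teams are numbered from 0; `weekly_matchups` is a hypothetical name.

```python
import random

def weekly_matchups(num_teams, rng=random):
    """Randomly pair teams 0..num_teams-1 for one week.

    Shuffle once, then pair positions (0, 1), (2, 3), ...
    With an odd number of teams, the last team after the shuffle gets a bye.
    """
    order = list(range(num_teams))
    rng.shuffle(order)
    return [(order[k], order[k + 1]) for k in range(0, num_teams - 1, 2)]

print(weekly_matchups(12, random.Random(7)))
```

Every team still has the same chance of facing every other team, but the function does a fixed amount of work instead of retrying random numbers.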

Determining the probability of one team beating another in a category given two probabilities was tough, but after some research I came across Bill James’ “log5” formula, which calculates a team’s chance of beating another team given their respective winning percentages. I used this formula to calculate the chance of one team beating the other in a category, then used numpy’s geometric distribution sample function to generate one trial based on the log5 probability. A geometric draw equals 1 exactly when the very first trial succeeds, which happens with the log5 probability, so this works out to a weighted coin flip. If the draw is 1, team1 wins the category; otherwise, team2 wins, and their records are updated accordingly.
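Here is a small sketch of the log5 calculation on its own, plus the coin flip done with the standard library instead of numpy (a swapped-in but equivalent way to run one trial with probability p). The function names are hypothetical. One handy property: against a .500 opponent, log5 returns the team's own rate, which fits the "chance against an average opponent" interpretation of the values in createTeams.

```python
import random

def log5(a, b):
    """Bill James' log5: chance that a side with rate a beats a side with rate b.

    Same expression as in simulateWeek: (A - A*B) / (A + B - 2*A*B).
    """
    return (a - a * b) / (a + b - 2 * a * b)

def first_team_wins(a, b, rng=random):
    # A geometric draw is 1 exactly when the first trial succeeds (probability p),
    # so a single uniform draw compared against p gives the same distribution.
    return rng.random() < log5(a, b)

print(log5(0.6, 0.5))  # against an average opponent, the team's own rate
print(log5(0.6, 0.4))
```

The two directions always add up to 1 (`log5(a, b) + log5(b, a) == 1`, up to rounding), so exactly one team wins each category.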

So that’s what my program does so far. The good news is that it works! In my main function I called the simulateWeek function 20 times to simulate a season, with 12 teams in the league and 10 categories. It returned the probabilities of each team winning the categories as well as their final records.
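For reference, a self-contained sketch of a season loop along those lines (12 teams, 10 categories, 20 weeks). It folds everything into one hypothetical function, `simulate_season`, and uses a shuffle for the pairings and `random() < p` for the category flip, so it is a variation on the code above rather than a copy of it.

```python
import random

def simulate_season(num_teams=12, num_categories=10, num_weeks=20, seed=None):
    """Random category probabilities, random weekly pairings, and one
    log5-weighted coin flip per category per matchup.

    Returns (probs, records) where records[i] == [category wins, category losses].
    """
    rng = random.Random(seed)
    probs = [[rng.random() for _ in range(num_categories)] for _ in range(num_teams)]
    records = [[0, 0] for _ in range(num_teams)]
    for _ in range(num_weeks):
        order = list(range(num_teams))
        rng.shuffle(order)
        for k in range(0, num_teams - 1, 2):
            t1, t2 = order[k], order[k + 1]
            for c in range(num_categories):
                a, b = probs[t1][c], probs[t2][c]
                p = (a - a * b) / (a + b - 2 * a * b)  # log5
                winner, loser = (t1, t2) if rng.random() < p else (t2, t1)
                records[winner][0] += 1
                records[loser][1] += 1
    return probs, records

probs, records = simulate_season(seed=42)
for team, (wins, losses) in enumerate(records):
    print(team, wins, losses)
```

With an even number of teams, everyone plays every week, so each team ends up with `num_weeks * num_categories` category decisions (200 here).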

The bad news is that it’s pretty much useless at this point. I’ve created a nice little simulation of a season using random probabilities, but remember that my goal is to simulate with teams of differing structures and types of players. For example, would a team with a bunch of good relievers and only a few good starters be better than a team with a ton of middle-of-the-pack starters and only a few mediocre closers? What about a team that punts batting average in exchange for more home runs and RBIs?

To do that kind of simulation, I need to do more than provide random probabilities for the categories. I probably need to use some sort of projection system so that for every week simulation, the program creates totals for the categories based on the projections and random variation. This seems doable at first, but like all programmable ideas, I’m sure it will take a lot more time and effort than one would initially think.
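One possible shape for that, as a rough sketch under several assumptions: each team's weekly projection is a mean and standard deviation per category (the numbers below are made up), weekly totals are drawn from a normal distribution, and ERA is the only lower-is-better category. None of the names come from a real projection system.

```python
import random

# Categories where the lower weekly total wins (assumption: only ERA here).
LOWER_IS_BETTER = {"ERA"}

def simulate_week_totals(projection, rng):
    """One week's totals: each category's projected mean plus normal noise.

    projection maps category -> (weekly mean, weekly standard deviation).
    Normal noise is a simplifying assumption; it can even produce negative counts.
    """
    return {cat: rng.gauss(mu, sd) for cat, (mu, sd) in projection.items()}

def categories_won(proj1, proj2, rng):
    """Number of categories team 1 wins in one simulated week (ties go to team 2)."""
    totals1 = simulate_week_totals(proj1, rng)
    totals2 = simulate_week_totals(proj2, rng)
    won = 0
    for cat in proj1:
        if cat in LOWER_IS_BETTER:
            won += totals1[cat] < totals2[cat]
        else:
            won += totals1[cat] > totals2[cat]
    return won

# Made-up weekly projections for two team builds, purely for illustration.
power_team = {"HR": (12, 4), "RBI": (40, 8), "SB": (3, 2), "ERA": (4.10, 0.90)}
speed_team = {"HR": (7, 3), "RBI": (32, 7), "SB": (9, 3), "ERA": (3.90, 0.90)}

rng = random.Random(2012)
weeks = 1000
power_wins = sum(categories_won(power_team, speed_team, rng) for _ in range(weeks))
print(power_wins / weeks)  # average categories (out of 4) won by the power build
```

Comparing builds then becomes a matter of swapping in different projection dictionaries and running many weeks.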

I’ll get working on this and, sometime in the future, write up another post updating you all on my progress. Until then, let me know if you have any suggestions, comments, questions, etc. I’m writing up these posts partly because I think writing my thoughts down will help me think about these projects, but also because a lot of you are much better at this than I am, and I want help. So help me.


Z-Scores in Python, Part 1

Ready to get your nerd on?

I’ve decided to start playing around with programming. I took a couple computer science classes in college, and I’ve always wanted to get better at it, plus I’m about to start work at a software company, so I guess it’s a good thing to be good at. Obviously, I’m building baseball-related programs, and ideally, I’ll get good enough at this that I can start doing some real research, or at least build some handy tools for fantasy baseball.

Full disclosure before I start: I’m not very good at this, so I’m going to basically be walking through some of the things I’m doing and what I’m struggling with. If you know nothing about programming, then this will probably be very uninteresting for you. If you are an expert software developer, then I encourage you to laugh at how badly I’m screwing up the simplest code. If you’re a nice expert software developer, then after you are done laughing, I would appreciate any advice you might have about my particular code, or any other baseball-related projects that you think would be good practice for me.

Ok, so for the past few days, I’ve been fiddling with two different projects. The first is fantasy baseball related. Usually when I’m making my preseason rankings, I download an Excel projections spreadsheet, then I make a bunch of columns with z-scores for the categories I want, total them, and adjust for position and the like. It’s a painfully long and arduous process, so I want to create a program that does it for me.

There are a lot of ways I could do this, and I probably chose the worst one. I decided to use MySQL, and after hours and hours of figuring out how to install the MySQL for Python module on my computer, I finally got it working and started coding. First I had to write a script to move the data (ZiPS hitting projections for 2012) from the .csv file to an SQL table, with columns for every stat in the projection sheet. Easy enough.
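A sketch of that CSV-to-table step. It uses Python's built-in `sqlite3` module as a stand-in for MySQL (so it runs without installing anything), and the sample rows are made-up placeholders, not actual ZiPS numbers. The function name `load_projections` and the table name `hitting` are hypothetical.

```python
import csv
import io
import sqlite3

# Made-up sample rows standing in for a projections .csv file.
SAMPLE_CSV = """Name,AB,H,HR,R,RBI,SB
Player A,550,160,30,90,100,5
Player B,600,180,12,100,60,35
Player C,480,130,20,70,75,10
"""

def load_projections(conn, csv_file, table="hitting"):
    """Copy a projections CSV into a table with one column per header field.

    The first column is stored as text and the rest as numbers. Column and
    table names are pasted into the SQL, so the header must be trusted.
    """
    reader = csv.reader(csv_file)
    header = next(reader)
    cols = ", ".join([f'"{header[0]}" TEXT'] + [f'"{h}" REAL' for h in header[1:]])
    placeholders = ", ".join("?" for _ in header)
    conn.execute(f'CREATE TABLE "{table}" ({cols})')
    conn.executemany(f'INSERT INTO "{table}" VALUES ({placeholders})', reader)
    conn.commit()

conn = sqlite3.connect(":memory:")
load_projections(conn, io.StringIO(SAMPLE_CSV))
print(conn.execute('SELECT "Name", "HR" FROM hitting ORDER BY "HR" DESC').fetchall())
```

With a real file, `open("projections.csv", newline="")` would replace the `io.StringIO` wrapper.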

Then came the fun part. My end goal: have a .csv file with all the projections for the categories of my choice, plus z-scores for each category, total of the z-scores, somehow normalized and adjusted for position. So how the hell would I do that?

Here’s what I did. I’m realizing now that I probably went about this project in entirely the wrong way, so right now I’m at the point where I can either keep moving forward or start over using a different method.

  1. Got data from the SQL table based on inputted stat categories, and created a dictionary with the keys being player names and the values being lists containing their projections in the order that the stats were inputted.
  2. Iterated through the dictionary to create a “statPopulations” list of lists, each list containing every value for a certain category.
  3. Iterated through the dictionary again, this time calculating z-scores (using statlib) for each category for each player and appending them to the dictionary entry. For batting average, since it is a rate stat, added another entry that was the z-score times number of AB (called zBATimesAB).
  4. Iterated through the dictionary a third time (you see why I think this was a bad way to do it now) to find z-scores for zBATimesAB and added that to each dictionary entry.
  5. Used csv.writer to create a header row with stat names and a row for each player, containing each of the original projections plus z-scores for each one and the extra two calculations for batting average.
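Steps 2 through 4 can be sketched column by column, so each category's list is built once and z-scored in one go. This uses the standard library's `statistics` module in place of statlib, takes population z-scores (`pstdev`), and follows the same batting-average treatment described above. The function names, the `zzBATimesAB` key, and the sample numbers are hypothetical.

```python
from statistics import mean, pstdev

def z_scores(values):
    """Population z-scores for a list of numbers (needs at least two distinct values)."""
    mu, sigma = mean(values), pstdev(values)
    return [(v - mu) / sigma for v in values]

def add_z_scores(projections, categories):
    """projections: {name: [stat values in the same order as `categories`]}.

    Returns {name: {column: z-score}}. Assumes "BA" and "AB" are in `categories`.
    """
    names = list(projections)
    # Step 2: one list per category, like the statPopulations lists.
    columns = {cat: [projections[n][i] for n in names] for i, cat in enumerate(categories)}
    # Step 3: z-score each whole column at once instead of player by player.
    z = {"z" + cat: z_scores(col) for cat, col in columns.items()}
    # Batting average is a rate stat, so weight its z-score by at-bats...
    z["zBATimesAB"] = [zb * ab for zb, ab in zip(z["zBA"], columns["AB"])]
    # ...and (step 4) z-score that weighted column as well.
    z["zzBATimesAB"] = z_scores(z["zBATimesAB"])
    # Regroup by player so there is one row per player for writing out.
    return {n: {key: col[k] for key, col in z.items()} for k, n in enumerate(names)}

# Made-up projections: batting average and at-bats for three players.
example = {"Player A": [0.300, 500], "Player B": [0.250, 600], "Player C": [0.275, 550]}
result = add_z_scores(example, ["BA", "AB"])
print(result["Player A"])
```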

After a lot of fiddling and error handling (banging my head on the computer), it somehow worked! I created a .csv file, which I opened in Excel, that looked just how I wanted it to look. Just a few huge problems though.

First of all, it is a horribly inefficient program, and takes about 20 seconds to run.

Secondly, I don’t have the z-score totals there, and I really don’t want to have to iterate through the dictionary again to make it. This makes me think that building a dictionary wasn’t the best course of action.
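Once the z-scores live in one structure per player, the total can be computed in the same loop that writes the rows. A minimal sketch with `csv.writer`, assuming the caller picks which columns count toward the total (for example, the at-bat-weighted batting-average column but not the raw one). `write_with_totals` and the sample values are hypothetical.

```python
import csv
import io

def write_with_totals(z_by_player, total_keys, out):
    """Write one row per player with a zTotal column, best total first.

    z_by_player: {name: {column: value}}, every player having the same columns.
    total_keys: the columns that count toward zTotal.
    """
    columns = list(next(iter(z_by_player.values())))
    rows = [(name, z, sum(z[k] for k in total_keys)) for name, z in z_by_player.items()]
    rows.sort(key=lambda row: row[2], reverse=True)
    writer = csv.writer(out)
    writer.writerow(["Name"] + columns + ["zTotal"])
    for name, z, total in rows:
        writer.writerow([name] + [round(z[c], 3) for c in columns] + [round(total, 3)])

# Hypothetical z-scores for two players.
example = {
    "Player A": {"zHR": 1.0, "zSB": -0.5},
    "Player B": {"zHR": -1.0, "zSB": 0.5},
}
buffer = io.StringIO()
write_with_totals(example, ["zHR", "zSB"], buffer)
print(buffer.getvalue())
```

To write a real file instead, pass `open("zscores.csv", "w", newline="")` as `out`.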

I also still need to adjust for position, and figure out what my population is. I have about 1000 players so far, but obviously not all of them are getting drafted. I need to figure out who is getting drafted, and then calculate z-scores based on that. But that’s going to depend on position and the z-scores themselves! Yeah, it’s a pain.
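One common way around that chicken-and-egg problem is to iterate: compute z-scores against the current pool, keep the top N players by total as the new pool, and repeat until the pool stops changing. A rough sketch that ignores positions entirely and assumes higher is better in every category (rate stats like ERA would need their sign flipped first). `draftable_pool` and the sample data are hypothetical.

```python
from statistics import mean, pstdev

def draftable_pool(stats, pool_size, max_rounds=10):
    """Iteratively estimate who gets drafted, ignoring position.

    stats: {name: [category values]}, higher is better in every category.
    Each round: take each category's mean and standard deviation from the
    current pool only, score every player against them, and make the top
    `pool_size` totals the new pool. Needs pool_size >= 2.
    """
    pool = set(stats)
    num_cats = len(next(iter(stats.values())))
    totals = {}
    for _ in range(max_rounds):
        params = []
        for c in range(num_cats):
            values = [stats[n][c] for n in pool]
            params.append((mean(values), pstdev(values)))
        totals = {
            n: sum((v - mu) / sd for v, (mu, sd) in zip(vals, params))
            for n, vals in stats.items()
        }
        new_pool = set(sorted(totals, key=totals.get, reverse=True)[:pool_size])
        if new_pool == pool:
            break  # the pool is stable
        pool = new_pool
    return pool, totals

# Made-up [HR, SB] projections for five players; keep the top three.
sample = {"A": [30, 5], "B": [25, 10], "C": [10, 2], "D": [5, 1], "E": [28, 20]}
pool, totals = draftable_pool(sample, 3)
print(sorted(pool))
```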

So that’s where I’m at with this. I’m happy that I’ve managed to actually write a program that works to some extent, but I have a feeling that if I really want it to be useful, I’m going to have to rewrite it completely. Using MySQL might be completely pointless, and building dictionaries instead of lists might be wrong too. I have a feeling that I either need to use SQL the whole time, adding z-scores to the SQL table instead of making dictionaries, or scrap SQL altogether and figure out a better way to transfer the original data to Python form.

Wow, I can’t believe I just wrote all that. Well, if you got through it all and have any suggestions, questions, or comments for me, let me know. Sometime later, I’ll talk about the other project I’ve been messing with: simulations (also related to fantasy baseball – I have no life)!
