Pat Sheridan approves of my fantasy baseball programming posts.

In this post, I will, yet again, not only bring together two nerdy activities, fantasy baseball and programming, but I will also write about the amalgamation of said activities in moderate detail, which is enough detail to make the vast majority of you bored and somewhat uncomfortable. Pumped? Me too, my friends, me too.

Last week, I talked about experimenting with a z-score spreadsheet for fantasy baseball projections. The other project that I’ve been working on is, or at least will be, a little more theoretical. Well, I’m not sure theoretical is the right word for it. Let me explain.

The question: What makes a fantasy baseball team win? More specifically, in head-to-head leagues, do certain types of teams, or teams with certain types of players, perform better than others, even if their overall stats are similar?

The end goal: Simulate fantasy baseball seasons using various types of team builds.

The process: Ha. As if I actually know how to go about doing this. But I’ll try, because I have a week until I start my new job and nothing to do. Instead of telling you what I’ll do, I’ll show you what I’ve done so far, then my plan – if I have one – for the future of the program.

Here are my two functions so far:

import random

import numpy


def createTeams(numOfTeams, numOfCategories):
    teams = {}
    records = {}
    for i in range(numOfTeams):
        # One random win probability per category for each team.
        teams[i+1] = []
        for j in range(numOfCategories):
            teams[i+1].append(random.random())
        # Each record is [wins, losses].
        records[i+1] = [0, 0]
    return teams, records


def simulateWeek(teamDict, records, numOfTeams, numOfCategories):
    teamsYetToPlay = []
    team1 = 0
    team2 = 0
    for i in range(numOfTeams):
        # List of teams that have yet to play this week.
        teamsYetToPlay.append(i+1)
    while len(teamsYetToPlay) > 1:
        while team1 not in teamsYetToPlay:
            # Draws random integers until it finds a team that hasn't played yet.
            team1 = random.randint(1, numOfTeams)
        teamsYetToPlay.remove(team1)
        while team2 not in teamsYetToPlay:
            team2 = random.randint(1, numOfTeams)
        teamsYetToPlay.remove(team2)
        for i in range(numOfCategories):
            A = teamDict[team1][i]
            B = teamDict[team2][i]
            # log5: chance that team1 beats team2 in this category.
            log5 = (A-A*B)/(A+B-(2*A*B))
            # A geometric draw equals 1 when the very first trial succeeds.
            z = numpy.random.geometric(log5, size=1)
            if z == 1:
                records[team1][0] += 1
                records[team2][1] += 1
            else:
                records[team1][1] += 1
                records[team2][0] += 1
    return records

Yay giant blocks of code! But what does this mean?

Well it’s pretty simple. “createTeams” creates two dictionaries: the first pairs the number of the team with a list of values that represent the probability of that team winning a particular category against an average opponent in one week. The second dictionary pairs the team number with their win-loss record. (As I write this, I’m realizing that there is no point in these being dictionaries if the keys are just numbers, since lists are ordered anyway.)
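Since the keys are just 1 through numOfTeams, the same data fits in plain lists. Here is a minimal sketch of what that could look like. The name `create_teams_as_lists` is made up for illustration, and it assumes team k lives at index k - 1.

```python
import random


def create_teams_as_lists(num_teams, num_categories):
    """Hypothetical list-based variant of createTeams.

    Team k is stored at index k - 1 instead of under dictionary key k.
    """
    # One random per-category win probability for each team.
    teams = [[random.random() for _ in range(num_categories)]
             for _ in range(num_teams)]
    # One [wins, losses] pair per team, built in a comprehension so
    # each team gets its own list rather than a shared reference.
    records = [[0, 0] for _ in range(num_teams)]
    return teams, records


teams, records = create_teams_as_lists(12, 10)
print(len(teams), len(teams[0]), records[0])
```

The one thing to watch for with lists is the off-by-one: team 1's data is at index 0.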

The second function, “simulateWeek”, is my attempt to – you guessed it! – simulate a week in a fantasy baseball head-to-head league. Basically, I keep a list of teams that haven’t played yet and randomly pull pairs out of it until every team has played once (with an odd number of teams, the one left over just sits the week out).
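The matchup step could also be done by shuffling the team numbers and pairing up neighbors, instead of drawing random integers until an unused team turns up. This is a sketch of that alternative, not what simulateWeek currently does, and `make_matchups` is a hypothetical helper name.

```python
import random


def make_matchups(num_teams):
    """Hypothetical helper: randomly pair up teams 1..num_teams for one week."""
    order = list(range(1, num_teams + 1))
    random.shuffle(order)
    # Pair neighbors in the shuffled order: indices (0, 1), (2, 3), ...
    # With an odd number of teams, the last team is left out (a bye).
    return [(order[i], order[i + 1]) for i in range(0, len(order) - 1, 2)]


print(make_matchups(12))
```

The shuffle does a fixed amount of work per week, while the retry loop wastes more and more draws as the pool of unplayed teams shrinks. For a 12-team league the difference is negligible either way.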

Determining the probability of one team beating another in a category given two probabilities was tough, but after some research I came across Bill James’ “log5” formula, which calculates a team’s chance of beating another team given their respective winning percentages. I used this formula to calculate the chance of one team beating the other in a category, then used numpy’s geometric distribution sample function to generate one trial based on the log5 probability. The sample is the number of attempts it takes to get the first success, so a result of 1 means the first attempt “succeeded”, which happens with probability equal to the log5 value. If the trial “succeeds”, team1 wins; otherwise, team2 wins, and both records are updated accordingly.
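To make the formula concrete, here is a small sketch. If A and B are each team's chance of winning the category against an average opponent, log5 puts team1's chance of beating team2 at (A - AB) / (A + B - 2AB). The function names are mine, and I've swapped the geometric draw for a plain uniform draw, which succeeds with the same probability.

```python
import random


def log5(a, b):
    """Chance that a team with win probability a beats one with win probability b."""
    return (a - a * b) / (a + b - 2 * a * b)


def team1_wins_category(a, b):
    """One simulated category: True if team1 wins.

    Uses a uniform draw instead of numpy's geometric sampler; both
    succeed with probability log5(a, b).
    """
    return random.random() < log5(a, b)


print(log5(0.5, 0.5))                   # two average teams: a coin flip
print(log5(0.6, 0.5))                   # vs. an average team, roughly your own probability
print(log5(0.6, 0.4) + log5(0.4, 0.6))  # the two sides' chances sum to 1
```

One caveat: the denominator is zero when both probabilities are exactly 0 or both are exactly 1, so the formula is undefined there. With values from random.random() that is vanishingly unlikely, but it is worth guarding against once real inputs are involved.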

So that’s what my program does so far. The good news is that it works! In my main function I called the simulateWeek function 20 times to simulate a season, with 12 teams in the league and 10 categories. It returned the probabilities of each team winning the categories as well as their final records.

The bad news is that it’s pretty much useless at this point. I’ve created a nice little simulation of a season using random probabilities, but remember that my goal is to simulate with teams of differing structures and types of players. For example, would a team with a bunch of good relievers and only a few good starters be better than a team with a ton of middle-of-the-pack starters and only a few mediocre closers? What about a team that punts batting average in exchange for more home runs and RBIs?

To do that kind of simulation, I need to do more than provide random probabilities for the categories. I probably need to use some sort of projection system so that for every simulated week, the program creates totals for the categories based on the projections and random variation. This seems doable at first, but like all programming ideas, I’m sure it will take a lot more time and effort than one would initially think.
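Here is a rough sketch of what that next version might look like, just to pin the idea down. Everything in it is an assumption: the category names, the projected weekly means, the spreads, and the choice of a normal distribution for the week-to-week noise are all invented for illustration, not taken from any real projection system.

```python
import random

# Hypothetical weekly projections: category -> (mean, standard deviation).
# All numbers are invented for illustration.
SLUGGERS = {"HR": (12.0, 3.0), "RBI": (38.0, 7.0), "AVG": (0.245, 0.020)}
CONTACT = {"HR": (8.0, 3.0), "RBI": (33.0, 7.0), "AVG": (0.280, 0.020)}


def simulate_week_totals(projections, rng=random):
    """Draw one week of category totals: projection plus normal noise."""
    return {cat: rng.gauss(mean, sd) for cat, (mean, sd) in projections.items()}


def play_matchup(team_a, team_b, rng=random):
    """Return (categories won by team_a, categories won by team_b) for one week.

    Assumes higher is better in every category, which would not hold
    for pitching categories like ERA or WHIP. Exact ties (practically
    impossible with continuous noise) are counted for team_b.
    """
    totals_a = simulate_week_totals(team_a, rng)
    totals_b = simulate_week_totals(team_b, rng)
    wins_a = sum(totals_a[cat] > totals_b[cat] for cat in team_a)
    return wins_a, len(team_a) - wins_a


# Simulate a 20-week season between the two hypothetical builds.
record = [0, 0]
for _ in range(20):
    a, b = play_matchup(SLUGGERS, CONTACT)
    record[0] += a
    record[1] += b
print(record)
```

The point of this shape is that a team build becomes a set of projections rather than a set of win probabilities, so a team that punts batting average for power can be written down directly and played against a different build.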

I’ll get working on this and, sometime in the future, write up another post updating you all on my progress. Until then, let me know if you have any suggestions, comments, questions, etc. I’m writing up these posts partly because I think writing my thoughts down will help me think about these projects, but also because a lot of you are much better at this than I am, and I want help. So help me.