Friday 24 March 2017

From @baseballsavant to @Tableau; a step-by-step for Batters #baseball #visualization

 

BaseballSavant is a fantastic way to access baseball data. It provides great search capabilities and ways to display the data, but for a data nerd the visualization and analyzation options are limited. This is where Tableau enters the scene. Numerous other tools and languages can achieve something similar, but if you have read other articles on this blog you will have noticed Tableau is my tool of choice.

 

So the goal is to download batter data from Baseball Savant, and display it in Tableau as shown above.

 

1.       Download some batter data from Baseball Savant. For demo purposes is limited the size of the data to only records from Khris Davis in 2016 with a Batted Ball Distance of 1ft and more (this excludes records like swing & miss, balls, etc.). I used this query but you can use anything you want, as long as you make sure to select Batter as Player Type.

2.       You could load this data straight into Tableau, but I used Excel to remove some unneeded columns first.

3.       Connect to the data source in Tableau and create a Custom SQL (if you don’t see the option in your version of Tableau, connect to data source > select your text or xls file > click little triangle next to open > click Open with Legacy Connection > click on New Custom SQL button

   

4.       Because Tableau needs a record per X/Y coordinate in a line, we need to pull in the same data set numerous times. How many times depends on how smooth you want you arcs to look, and how fast you want your report to perform. Downloading all batter data for 2016 and then cloning it 20 times leads to, well a LOT of records. In this example I have cloned 6 copies for just under 2,500 records. To identify the different clones, not the added Type field in which you need to add a unique identifier. Here is the top part of the  Custom SQL I used:

The first SELECT loads the original data set, every following UNION ALL SELECT adds one clone.

5.       Now we need to let Tableau know when to use what X and Y value, fields I call Height and Distance. My Distance Calculation looks like this:

IF [Type] = "0" THEN 0                                                                                    The first time I look at distance (when Davis hits the ball) the distance is 0

ELSEIF [Type] = "1" THEN [hit_distance_sc] *0.2                                The 2nd time I look at distance I look at 20% of the total distance

ELSEIF [Type] = "2" THEN [hit_distance_sc] * 0.4                               The 3rd time I look at distance I look at 40% of the total distance

ELSEIF [Type] = "3" THEN [hit_distance_sc] * 0.6                               Etc. I used these percentages but you can pick your own, as long as

ELSEIF [Type] = "4" THEN [hit_distance_sc] *0.75                              your last time uses 100% of the total distance.

ELSEIF [Type] = "5" THEN [hit_distance_sc] * 0.89

ELSEIF [Type] = "6" THEN [hit_distance_sc]

END

6.       Height is slightly different in the sense that we don’t have a value from BaseballSavant, but it can be calculated fairly easily with the formula TAN([hit_angle_radians])*[hit_distance_sc] which is used to calculate the Max height (not accounting for drag, gravity and such). The TAN function uses angles in radians rather than degrees which can be converted from hit_angle using ([hit_angle]*PI())/180. Now that we have the height sorted out we can assign the correct heights to the different data clones. I should note that these height percentages are debatable. I used the excellent tool on The Physics of Baseball by Alan M. Nathan to look at a nice looking trajectory and looked at distance and height percentages that lined up well. You can of course change as you see fit.

IF [Type] = "0" THEN 2                                                                                                                                    The first time looking at height is not 0 but ~2ft above ground

ELSEIF [Type] = "1" THEN (TAN([hit_angle_radians])*[hit_distance_sc])* 0.225                  I assume the max. height of the ball is at 50% of the max. height without

ELSEIF [Type] = "2" THEN (TAN([hit_angle_radians])*[hit_distance_sc]) * 0.41                   gravity, drag, etc.

ELSEIF [Type] = "3" THEN (TAN([hit_angle_radians])*[hit_distance_sc]) * 0.50

ELSEIF [Type] = "4" THEN (TAN([hit_angle_radians])*[hit_distance_sc]) * 0.44

ELSEIF [Type] = "5" THEN (TAN([hit_angle_radians])*[hit_distance_sc]) * 0.23

ELSEIF [Type] = "6" THEN 0

END

7.       I don’t use this field just yet as I don’t want the hit balls with negative hit_angle to go underground:

IIF([height]>= 0 ,[height],0)        Turns all height values below 0 into a 0.

8.       Now you can draw the lines by dragging Distance onto Columns, Height onto Rows, sv_id on details to show all batted balls, and Type on Path so tell Tableau in what order of data clones to draw the data.

9.       You can show all batted balls defined as barrels by creating a calculated field called barrels:

IF [hit_speed] >= 98 AND [hit_angle] >= 26 - ([hit_speed] - 98)

AND [hit_angle] <= 30 + ([hit_speed] -98)

THEN "Barrel"

ELSE ""

END

10.   Now you pretty it up to your liking or add a spray chart, like I will do in the next article.

 

 

 

 

 

 

 

 




This email and any files transmitted with it are confidential and intended solely for the use of the individual or entity to whom they are addressed. If you have received this email in error please notify the system manager.

This message contains confidential information and is intended only for the individual named. If you are not the named addressee you should not disseminate, distribute or copy this e-mail.

1 comment: