Friday, 7 April 2017

From #baseballsavant to #Tableau; Part III of a step-by-step for Batters #baseball #visualization #pitchFx

In step 1 of this series I explained how to visualize baseball savant data from a side perspective, where part II did the same from top-down, or bird's eye view.

The typical way to show lines and curves in Tableau is to duplicate every record multiple times and calculate the X and Y for every copy of a record. I never liked this approach but it was the best option I knew of to do what I wanted.

Now I found a new and in my view better approach to do the same thing. Rather than duplicating records in the batted ball files, I just link it up to a file that has "slices" of a batted ball arc. Let me explain.

As mentioned before, Alan M. Nathan's must-read blog The Physics of Baseball offers tools to calculate the path, or arc of a batted baseball, based on numerous factors like ball speed, hitting angle, gravity, air pressure, etc. For my purposes this is a bit too detailed but I was able to use his calculator non-the-less. When entering averages for angle and speed the tool generates a list of basically X/Y's for the trajectory, with X measuring distance, and Y the height above ground.

I used this list (let's call it the average-baseball-trajectory-list, or abt-list) to calculate a list that shows at what percent of distance a certain percentage of the the max height is measured. For example, based on the calculated list I know that - in general terms - at 58.8% of my batted ball distance, the height of the ball above ground is 100% of it's maximum height. 50% height is around 20% distance, as well as around 88% distance.

Now that I have this table (with over 500 some records) I ordered it by PercentDistance, added a T(ime) field with just 0, 1, 2, etc. and a link field with a consistent value in the table. Then I loaded a baseball savant data export into a spreadsheet, with the calculated list from above in a second sheet.
The baseball savant data needs another column also called link with the same constant value (you can do this in the spreadsheet or as a calculated field in Tableau).

Loading both sheets (in the version I use to join data it needs to be in the same spreadheet; no longer the case in Tableau 10.x) in Tableau and combined through a LEFT JOIN based on the link fields, I now have the same result that in every baseball savant record in duplicated as many times as there are records in the calculated version of the abt-list (remember that one?).

The nice thing here is that I control in my calculated version of the abt-list how many duplicates I want to create, perhaps adjust my arc slightly, and I don't have to worry about Custom SQL anymore. I also don't have to hard code calculated field for every duplicate record, as X and Y are calculated as follows:

X (or batted distance)
([Perc Distance]*[Hit Distance Sc])/100

Y (or height of ball above field)
IF [T] = 0 THEN 2
ELSE
(IIF([Hit Angle]<0,0,[Perc Height]*(TAN([hit_angle_radians])*[Hit Distance Sc])* 0.5))/100

The if statement is strictly to avoid balls going underground.

The big difference is that by managing the number of records in the calc. abt-list I control the smoothness of the line:


Now I wanted to keep the processing / functioning in Tableau workable, so I looked at how many records from the calc-abt-list I could remove to reduce file size but maintain the smooth lines, and found that about 90% of the over 500 records can be removed.



Now that I finally figured out a way to draw arcs without cloning the data source for every XY on the line, I'm moving on to combining the side and top-down view in a dashboard, with some back ground imagery. But that's for the next post.

RJ

PS when if have figured this all out a bit more I'll pu the cal-abt-list on dropbox or Google drive for people to use.




No comments:

Post a Comment