Friday 5 May 2017

First dabbles in R

Made some progress on the scraper using the fantastic baseballr package by @billpetti:

setwd("C:/TRAINING/R")

rm(list=ls())                              # removes all datasets, vars etc. from the environment

library(baseballr)                         # provides scrape_statcast_savant_pitcher_all()

for (y in c(2015, 2016, 2017))             # loop through the years
{
  yearpart <- y
  for (m in 4:9)                           # loop through the months (April to September)
  {
    monthpart <- m

    for (d in 1:31)                        # loop through the days of the month
    {
      daypart <- d
      startpartD <- paste(yearpart,"-",monthpart,"-",daypart,sep="")
      endpartD <- startpartD
      #print(endpartD)

      # try() keeps the loop running when the scrape fails (e.g. no play data for that day)
      scrapebb <- try(scrape_statcast_savant_pitcher_all(startpartD,endpartD),silent=TRUE)

      # combine variables and text for the file name
      Filetext <- paste("bbsavant_",startpartD,"_",endpartD,".csv",sep="")

      # write the csv; after a failed scrape, scrapebb holds an error object instead of data,
      # so write.csv fails and this try() skips the file
      try(write.csv(scrapebb,file = Filetext))
      print(Filetext)
      remove(list=c("d","daypart","endpartD","Filetext","startpartD"))   # removes all daily vars from the environment
    }
    remove(list=c("m", "monthpart"))       # removes all monthly vars from the environment
  }
}

So far the results are promising: files are created for every day that has data, and days without data are just skipped without problems. Later today I'll try to check the content to make sure everything runs as expected, and then will look into combining this into one data source.
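For the combining step, here is a minimal sketch of one possible approach in base R. It assumes the daily files all share the same columns; the helper name combine_daily_files and the demo columns (pitcher, release_speed) are made up for illustration and are not taken from the real Statcast output.

```r
# Hypothetical helper: read every daily bbsavant_*.csv in a folder and stack them.
# Assumes all files have identical columns (rbind fails otherwise).
combine_daily_files <- function(dir = ".", pattern = "^bbsavant_.*\\.csv$") {
  files <- list.files(dir, pattern = pattern, full.names = TRUE)
  if (length(files) == 0) return(data.frame())            # nothing to combine
  daily <- lapply(files, read.csv, stringsAsFactors = FALSE)
  combined <- do.call(rbind, daily)
  rownames(combined) <- NULL
  combined
}

# Self-contained demo with two made-up files in a temporary folder
demo_dir <- tempfile("bbsavant_demo")
dir.create(demo_dir)
write.csv(data.frame(pitcher = "A", release_speed = 95.1),
          file.path(demo_dir, "bbsavant_2017-4-3_2017-4-3.csv"), row.names = FALSE)
write.csv(data.frame(pitcher = c("B", "C"), release_speed = c(88.4, 91.0)),
          file.path(demo_dir, "bbsavant_2017-4-4_2017-4-4.csv"), row.names = FALSE)

all_days <- combine_daily_files(demo_dir)
print(nrow(all_days))   # total rows across both demo files
```

Note that write.csv with its default settings also writes the row numbers, which read.csv brings back as a column named X; passing row.names = FALSE when writing avoids that.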

-----

UPDATE: got a little further ahead:
for (m in 4:9)                             # months April to September
{
  monthpart <- m

  for (d in 1:30)                          # days of the month
  {
    daypart <- d
    startpartD <- paste("2017-",monthpart,"-",daypart,sep="")
    endpartD <- startpartD
    #need to find a way to skip the next step if the data is null (no play data for that day)
    scrapebb <- scrape_statcast_savant_pitcher_all(startpartD,endpartD)

    Filetext <- paste("bbsavant-",startpartD,"--",endpartD,".csv",sep="")
    #write.csv(scrapebb,file = Filetext)
    print(Filetext)
  }
}
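
One possible way to skip the write on days without data is tryCatch() from base R. The sketch below is only an illustration: fetch_day is a hypothetical stand-in for the real scraper call so the example runs offline, and whether the real function raises an error or returns an empty result on such days is an assumption, so both cases are checked.

```r
# fetch_day is a hypothetical stand-in for the scraper; it fails for one
# date to mimic a day without games.
fetch_day <- function(day) {
  if (day == "2017-4-2") stop("no data for ", day)
  data.frame(game_date = day, pitches = 1)
}

out_dir <- tempfile("bbsavant_skip")
dir.create(out_dir)

for (d in 1:3) {
  day <- paste("2017-4-", d, sep = "")
  # tryCatch() returns NULL on an error instead of stopping the loop
  scrapebb <- tryCatch(fetch_day(day), error = function(e) NULL)
  # skip the write when the call failed or came back empty
  if (is.null(scrapebb) || nrow(scrapebb) == 0) next
  Filetext <- paste("bbsavant-", day, ".csv", sep = "")
  write.csv(scrapebb, file = file.path(out_dir, Filetext))
}

written <- list.files(out_dir)
print(written)   # lists only the days that returned data
```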



After years of thinking about getting into R, I finally had some time to do so. Using the R tutorial from Bill Petti, I fairly quickly got to the point of scraping Baseball Savant data into daily data sets:

library(baseballr)

for (i in 1:30) {                                    # April has 30 days
  startpartD <- sprintf("2017-04-%02d", i)           # zero-padded day, e.g. "2017-04-09"
  endpartD <- format(as.Date(startpartD) + 1)        # the next day; rolls over into May correctly
  dat <- scrape_statcast_savant_batter_all(startpartD,endpartD)
  Filetext <- paste("bbsavant-",startpartD,"--",endpartD,".csv",sep="")
  write.csv(dat,file = Filetext)
  #print(Filetext)
}


Definitely not the cleanest, and all the hard coding is done by the packages used, but hey, it's a start!
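
One way the date handling could be tidied up, sketched here with base R only: let seq() generate the calendar days, so that only valid dates appear and the zero-padding comes for free. The file name pattern simply mirrors the one used above.

```r
# Build every calendar date in a range; seq() on Date objects only yields valid days
days <- seq(as.Date("2017-04-01"), as.Date("2017-04-30"), by = "day")
day_strings <- format(days, "%Y-%m-%d")              # zero-padded, e.g. "2017-04-09"
file_names <- paste("bbsavant-", day_strings, ".csv", sep = "")
print(head(file_names, 3))
```

Each element of day_strings could then be passed to the scraper as both start and end date.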

RJ
