
Baseball Info Solutions Data (from FanGraphs): Percentage of Pitches Thrown and
Average Velocity for the Following Pitches: Fastball, Cutter, Slider, and Curveball
Tommy John Database: A dummy variable signaling whether or not a pitcher had
previously had Tommy John surgery
We did trim some of the data that we did not find representative of a decent MLB
pitcher. For one, we excluded three pitcher seasons in which the pitcher made an
appearance but did not record an out. With zero innings pitched, FIP (which divides by
innings pitched) is undefined, making those seasons unusable. Also, a pitcher who does
not record an out the entire year is more likely than not a mediocre talent who should
not influence the model.
In addition, we did not include pitchers who recorded fewer than 10 innings in a season.
We realize that this may be a bit problematic, in that some pitchers may have gotten
injured fewer than 10 innings into the season and did not play for that reason rather
than for lack of talent; seven percent of these pitchers ended up getting injured. That
being said, the overall injury rate for pitchers hovered around 23%, so most of the
pitchers who pitched fewer than 10 innings were simply underutilized rather than injured.
Moreover, there was very little Baseball Info Solutions data for pitchers with few
appearances, so including them would have distorted the estimated impact of pitch speed
and selection. In the end, we were left with 2,749 pitcher seasons.
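
The two exclusion rules above can be sketched as a simple filter. This is a minimal illustration, not the code used for the study: the `PitcherSeason` record, its field names, and the sample rows are hypothetical, and innings are derived from outs recorded (three outs per inning).

```python
from dataclasses import dataclass

MIN_INNINGS = 10  # threshold stated in the text


@dataclass
class PitcherSeason:
    # Hypothetical record layout; the field names are illustrative only.
    pitcher: str
    year: int
    outs: int  # outs recorded in the season (3 outs = 1 inning)

    @property
    def innings(self) -> float:
        return self.outs / 3


def keep_season(season: PitcherSeason) -> bool:
    """Return True if the season passes both exclusion rules."""
    if season.outs == 0:
        return False  # appeared but never recorded an out
    return season.innings >= MIN_INNINGS


# Made-up rows for illustration, not real data.
seasons = [
    PitcherSeason("Pitcher A", 2010, outs=0),
    PitcherSeason("Pitcher B", 2010, outs=20),
    PitcherSeason("Pitcher C", 2010, outs=30),
    PitcherSeason("Pitcher D", 2010, outs=540),
]
kept = [s.pitcher for s in seasons if keep_season(s)]
print(kept)
```

The zero-out check is implied by the 10-inning threshold, but it is kept as a separate branch here because the text treats the two exclusions as distinct rules.
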
Variable Selection:
Among the variables we elected to eliminate from the regression model was innings
pitched, which showed signs of multicollinearity with batters faced (correlation greater
than 0.99). We kept batters faced in the model. We also did not include games started,
as it was strongly correlated with batters faced (0.941).
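
A pairwise correlation screen of this kind can be sketched as follows. This is an illustration under stated assumptions, not the authors' procedure: the 0.9 cutoff, the helper names, and the toy columns are all hypothetical, and Pearson correlation is assumed as the measure.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def high_correlation_pairs(columns, threshold=0.9):
    """List (name_a, name_b, r) for every pair with |r| >= threshold."""
    names = list(columns)
    pairs = []
    for i, a in enumerate(names):
        for b in names[i + 1:]:
            r = pearson(columns[a], columns[b])
            if abs(r) >= threshold:
                pairs.append((a, b, r))
    return pairs


# Toy columns for illustration, not real data.
columns = {
    "IP": [10, 50, 100, 200],
    "BF": [45, 210, 420, 830],
    "WP": [3, 1, 4, 2],
}
flagged = high_correlation_pairs(columns)
for a, b, r in flagged:
    print(f"{a} vs {b}: r = {r:.3f}")
```

For each flagged pair, one variable would then be dropped by judgment, as with innings pitched and games started above.
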
Another variable we did away with was hit by pitches (HBP), as HBP and wild pitches (WP)
had also shown signs of collinearity (0.43). We kept wild pitches, but we did transform
the variable (see below). Hits and runs showed collinearity (0.97), so we decided to keep
hits, because we hypothesize that hits are more stable across years than runs, which
depend more on the clustering of hits.
We also noticed collinearity between three variables (strikeouts, hits, and walks) and
batters faced (0.93, 0.98, and 0.89, respectively), which makes sense: the more a pitcher
plays, the more hits and walks he allows and the more strikeouts he can register,
regardless of skill level. Thus, we transformed the three counting statistics (SO, H, BB)
into rates by dividing each by batters faced (SO/BF, H/BF, BB/BF). These new statistics
eliminated the strong correlation between the aforementioned variables and batters faced.
We also converted wild pitches to a rate (WP/BF) for similar reasons, even though the
collinearity between WP and BF was not strong.
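
The rate transformation described above amounts to dividing each counting statistic by batters faced. A minimal sketch, assuming each season is a dictionary keyed by the abbreviations used in the text (the `to_rates` helper and the two sample seasons are hypothetical):

```python
COUNTING_STATS = ("SO", "H", "BB", "WP")


def to_rates(season):
    """Convert counting statistics to per-batter-faced rates."""
    bf = season["BF"]
    if bf <= 0:
        raise ValueError("batters faced must be positive")
    return {f"{stat}/BF": season[stat] / bf for stat in COUNTING_STATS}


# Made-up seasons: the same per-batter results at two very different workloads.
starter = {"BF": 800, "SO": 200, "H": 180, "BB": 60, "WP": 8}
reliever = {"BF": 200, "SO": 50, "H": 45, "BB": 15, "WP": 2}

print(to_rates(starter))
print(to_rates(reliever))
```

The two sample pitchers differ fourfold in every raw count but have identical rates, which is why the rates no longer track playing time.
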
We also noticed a relationship between FIP and the rate statistics for SO, H, and BB
(correlations of 0.590, 0.454, and 0.322, respectively), so we removed FIP from the
model. This correlation makes sense, because FIP is computed from SO, BB, HBP, and HR
relative to innings pitched.
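
For reference, the standard FIP formula can be sketched as below. The additive constant is league- and season-specific; the default of 3.10 is only a placeholder assumption, and the function name and sample inputs are illustrative.

```python
def fip(hr, bb, hbp, so, ip, constant=3.10):
    """Fielding Independent Pitching.

    FIP = (13*HR + 3*(BB + HBP) - 2*SO) / IP + constant

    The constant varies by league and season; 3.10 is a placeholder.
    """
    if ip <= 0:
        # With no innings pitched the ratio has no defined value.
        raise ValueError("FIP is undefined without innings pitched")
    return (13 * hr + 3 * (bb + hbp) - 2 * so) / ip + constant


# Made-up season line for illustration, not real data.
example = fip(hr=20, bb=50, hbp=5, so=180, ip=200)
print(round(example, 3))
```

Because strikeouts and walks enter the numerator directly, some correlation between FIP and the SO/BF and BB/BF rates is expected by construction.
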