Data modeling: Does hitting or pitching help a Major League Baseball team win a World Series?

Joey Stipek
Oct 25, 2022

Hitting or pitching? What is the most important factor in assembling a championship Major League Baseball team? Coaches, general managers and scouts assess a variety of variables when assembling a winning team. In baseball there are three areas where the game is played: defense, offense and pitching. The author Michael Lewis published a book titled Moneyball about how the Oakland A’s tried to leverage data to their advantage with limited resources in comparison to other baseball teams like the Boston Red Sox or New York Yankees.

This analysis attempts to determine whether offense matters more than pitching to a team’s overall success on the field, using data from the 10-year period covering the 2006 to 2015 seasons. A recent window was chosen because analysis of trends tends to favor current data over older historical data.

The hitting data analyzed will feature batting average, home runs, and runs batted in (RBI).

The pitching data analyzed will feature the following attributes: Earned run average (ERA), strikeouts and win-loss records.

The compiled data features the ERA of teams that went on to win the World Series during the 2006 to 2015 seasons. There are 40 values in the dataset, covering 20 teams and their batting averages during the seasons they played in the World Series. The question here is whether a team playing in the World Series tends to rank sixth in team ERA for pitching and eighth in team batting average for offense.

Explanation

The data has been compiled by Sean Lahman. He is an investigative journalist for the Rochester Democrat and Chronicle and works on the investigative team at USA Today. He has compiled historical data from 1871 to 2020 as well as managerial records, post-season data, statistics, standings, and team stats. Lahman has compiled and written a variety of books based on statistical performance data for baseball and professional football.

Complementing Lahman’s data will be articles from ESPN and Major League Baseball tracking free agent signings to an organization. Free agents are players who, after fulfilling their rookie contracts with an organization, can choose which team they want to play for based on personal preference or on the value of the contract offered. A bad contract can cripple or hinder an organization’s growth and success depending on its length and value. Any other notable changes will be noted or discussed in the context of the paper.

A pitcher’s success will be defined as an average ERA of 3.00 or lower.

ERA is calculated by dividing the total number of earned runs allowed by the number of innings pitched, then multiplying by nine.
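As a minimal sketch of the ERA calculation, the Python function below divides earned runs by innings pitched and multiplies by nine. The function name and the sample pitcher are illustrative assumptions, not values taken from the dataset.

```python
def earned_run_average(earned_runs: float, innings_pitched: float) -> float:
    """Return ERA: earned runs allowed per nine innings pitched."""
    if innings_pitched <= 0:
        raise ValueError("innings_pitched must be positive")
    return 9 * earned_runs / innings_pitched


# Hypothetical pitcher (not from the dataset): 60 earned runs over 180 innings.
sample_era = earned_run_average(60, 180)
print(f"{sample_era:.2f}")  # 3.00
```

A pitcher with these hypothetical numbers would sit exactly at the 3.00 mark used here as the cutoff for success.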

A hitter’s success will be defined as a batting average of .300 or above and 90 RBIs. Each team plays 162 games over the course of a regular season.
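The success thresholds can be expressed as two simple checks. This is only a sketch: the function and constant names are hypothetical, and it assumes "90 RBIs" means at least 90.

```python
# Thresholds as stated in the article.
ERA_THRESHOLD = 3.00          # pitcher succeeds at or below this ERA
BATTING_AVG_THRESHOLD = 0.300  # hitter succeeds at or above this average
RBI_THRESHOLD = 90             # assumption: "90 RBIs" means 90 or more


def is_successful_pitcher(era: float) -> bool:
    """True when the pitcher's ERA is 3.00 or lower."""
    return era <= ERA_THRESHOLD


def is_successful_hitter(batting_average: float, rbi: int) -> bool:
    """True when the hitter meets both the average and RBI thresholds."""
    return batting_average >= BATTING_AVG_THRESHOLD and rbi >= RBI_THRESHOLD


# Hypothetical players, for illustration only.
print(is_successful_pitcher(2.85))       # True
print(is_successful_hitter(0.312, 84))   # False: average is high enough, RBIs are not
```

Requiring both hitting conditions at once makes the hitter test stricter than the pitcher test, which is worth keeping in mind when comparing how often each group clears its bar.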

There are potential outliers within the data. For example, individual performances might not be reflected in a team’s overall win-loss record. Even though the analysis is based upon 10 years’ worth of data, these outliers are worth noting to help determine the makeup of a successful team.

Decision Tree

The decision tree will utilize top-down decision modeling. The top-down approach is used to compare salaries for hitting and pitching and to see whether player salaries represented a financial loss for the organization.

The salary will start at $5.5 million to see which has more value: a low-cost hit or a game won. The tree compares the likelihood of a hitter having a season with a batting average below .300 against the likelihood of a pitcher having a season with an ERA over 3.00. It also factors in a cost that is double for signing a player whose primary skill is hitting in contrast to a starting pitcher.

An analysis of the starting hitting and pitching of teams that played in the World Series showed that teams ranking in the top three for hitting had a 5% chance of making the World Series, in comparison to organizations whose team ERA ranked in the bottom three in the league.

Of the teams that won the World Series, half won on the strength of their hitting and half on the strength of their pitching. If I were operating as a General Manager of a Major League Baseball franchise, it would not make much of a difference whether I acquired starting hitting or pitching in free agency. Ultimately, it is up to the discretion of the club whether the organization chooses to allocate its resources to hitting or pitching.

Evaluation of the Results

From the perspective of a General Manager of an MLB franchise, the analysis considers whether to acquire starting hitting or pitching before the beginning of free agency. The General Manager evaluates each player’s statistical performance at the end of the season and measures it against the data to determine the value of the position.

When evaluating players, the General Manager is looking for runs scored for hitters and games won for pitchers. A player whose primary skill is hitting is considered successful by batting in 90 runs a season, and a player whose primary skill is pitching is considered successful by winning 15 games per season. The General Manager associates a cost variable with the players when determining whether it makes more sense to add hitting or pitching.

When further evaluating the cost of starting hitting and pitching from 2006–2015, the cost for a free agent player whose primary skill is hitting is $20,032,000 and for a player whose primary skill is pitching is $14,660,000. At the thresholds for success, the cost for hitting averages out to $222,587 per RBI and the cost of pitching averages out to $977,344 per win. The figure for hitting is low because hitters play in far more games than starting pitchers, so the salary increase per unit of production is small.
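The per-unit costs can be approximated by dividing each salary by its success threshold. The sketch below assumes the hitter figure is salary divided by 90 RBIs and the pitcher figure is salary divided by 15 wins; that derivation is an inference from the numbers, not something the analysis states outright, and the names are hypothetical.

```python
# Salary figures and thresholds quoted in the article.
HITTER_SALARY = 20_032_000
PITCHER_SALARY = 14_660_000
RBI_THRESHOLD = 90   # RBIs for a successful hitter
WIN_THRESHOLD = 15   # wins for a successful starting pitcher


def cost_per_unit(salary: float, units: int) -> float:
    """Return salary divided by the number of units produced (RBIs or wins)."""
    if units <= 0:
        raise ValueError("units must be positive")
    return salary / units


# Assumption: the article's per-unit costs were derived this way.
cost_per_rbi = cost_per_unit(HITTER_SALARY, RBI_THRESHOLD)
cost_per_win = cost_per_unit(PITCHER_SALARY, WIN_THRESHOLD)

print(f"Hitter:  ${cost_per_rbi:,.0f} per RBI")
print(f"Pitcher: ${cost_per_win:,.0f} per win")
```

Under these assumptions the results come to roughly $222,578 per RBI and $977,333 per win, close to the quoted figures. The small gaps may come from rounding in the quoted salaries.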

From an organizational viewpoint, it’s up to the General Manager whether he or she values runs batted in or wins from starting pitching. Statistically, there is a roughly $5 million difference between hitters and pitchers, which affects the success threshold at the bottom of the decision tree. There is a better cost value for starting hitting based on the cost of RBIs per team win. Teams more often than not choose to sign players whose primary skill is pitching in spite of the data skewing favorably toward hitting.

What are the limitations of the data?

Every data set has limitations, including bias. A few areas were not considered when analyzing this data. The first was the age of the players, which was omitted from this analysis. Organizations covet younger free agents more than older ones when acquiring the services of a player for the length of a contract.

Often, General Managers pay players based upon past performance. The omission of player ages could skew the evaluation model used here. Ensuring that the proper inputs, such as hitting and pitching statistics, are used in the model is essential.

In this instance, there is a confounding bias in the data. Confounding bias is “a systematic distortion in the measure of association between exposure and the health outcome caused by mixing the effect of the exposure of primary interest with extraneous risk factors.”

When attempting to lure free agents, ultimately it is up to the General Manager of a Major League Baseball organization to determine whether the club values hitting or pitching more. When analyzing the data, the decision tree shows value in pitching in spite of the offensive impact hitting can provide an organization.

Pitching plays a greater role in the game of baseball than hitting in terms of production, measured by outs obtained and time on the field. Furthermore, contracts in Major League Baseball are fully guaranteed, so an erroneous free agent signing can make or break a team’s season.

Joey Stipek

Joey Stipek’s data research + writing has been featured at newsrooms including The Oklahoman, Colorado Springs Gazette, New York Times, The Frontier, and KOSU.