AI in Politics Applications, Part 1: Building a Machine Learning Model to Predict Election Outcomes
The potential for AI in political campaigns, as in many industries, is immense. While AI took off in 2023 thanks to advances in computing power, developments in deep learning, and the availability of data, it has yet to meaningfully disrupt political campaigns.
This blog post, split into two parts, shows how even very basic applications of artificial intelligence can intersect with political campaigns to increase the understanding, efficiency, and effectiveness of movement building. This first post aims to provide transparency into how machine learning techniques might be effectively applied, while the second will dig deeper into how to best leverage the results.
Last year, Campaign Brain founder Nate Levin experimented with developing a machine learning model, written in Python, to answer the question: can artificial intelligence help predict which candidates are most likely to win future races based on fundraising and expenditure?
Using Federal Election Commission data on 1,814 Senate and House races from 2016, he built a machine learning model to understand the role of fundraising and expenditure in predicting a given candidate's likelihood of victory.
The first step in building an AI model is to define the target intelligent action. In this case, the target intelligent action is to predict the likelihood that a candidate will win a House or Senate election based on their fundraising levels. The inputs the machine uses to produce this intelligent action are the candidates' fundraising and expenditure figures. The value proposition of the model is to provide an additional data point for predicting elections beyond polls.
The second step in building an AI model is to collect data. In this case, the dataset comes from the Federal Election Commission and covers 1,814 Senate and House races from 2016. It contains the office each candidate is running for, where they are located, and myriad data points on candidate fundraising and expenditure. Critically, the dataset also records whether the candidate won, which served as the dependent variable when building the model.
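As a rough sketch, loading this data with pandas might look like the following; the filename here is illustrative, not the exact file used in the original project (the FEC publishes bulk data as downloadable files at fec.gov):

```python
import pandas as pd

# Load the FEC data for the 2016 cycle. The filename is a placeholder;
# the FEC offers this data as bulk downloads on fec.gov.
df = pd.read_csv("fec_house_senate_2016.csv")

# Quick sanity check: row count and available columns.
print(df.shape)            # expect roughly (1814, ...) for this dataset
print(df.columns.tolist())
```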
The third step in building an AI model is to label the data. In this case, the dataset already came with labels, so no additional labeling work was required to build the model.
The fourth step in building an AI model is to extract features. Because many of the variables pertaining to fundraising, receipts, and expenditure were redundant, feature extraction here was largely a matter of feature selection. For instance, party contributions and individual contributions are both included in total contributions, so those columns were removed. The dataset also contained superfluous columns, such as loans taken out by the candidate. The features kept to build the model are: total contributions, total amount of money received, total operating expenditure, total cash-on-hand at the end of the political fundraising period, total net contributions, total net operating expenditure, and whether the candidate won. These features were retained because they are considered critical determinants of a campaign's success, and together they cover each of the main categories the dataset provides insight on: contributions, expenditure, and cash on hand.
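A hedged sketch of this selection step, with illustrative column names standing in for the FEC file's actual headers:

```python
# Column names are illustrative stand-ins for the FEC file's real headers.
features = [
    "total_contributions",
    "total_receipts",
    "total_operating_expenditure",
    "cash_on_hand_end",
    "net_contributions",
    "net_operating_expenditure",
]
target = "winner"  # 1 if the candidate won, 0 otherwise

# Keeping only the selected features plus the label drops the redundant
# and superfluous columns (party/individual contributions, loans, etc.)
# simply by not selecting them.
data = df[features + [target]].dropna()
```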
The fifth step in building an AI model is to design the machine learning algorithm architecture. The architecture employed in this project was a logistic regression. Because the dependent variable is a binary outcome (winner or loser of a political race, indicated by 1 or 0), logistic regression is an effective tool, especially when the dataset is large, as was the case here with more than 1,800 rows. This choice of architecture also informed which features to extract, since the features were selected to minimize multicollinearity.
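For intuition, logistic regression passes a weighted sum of the input features through the sigmoid function, producing a number between 0 and 1 that can be read as a win probability. A minimal illustration:

```python
import numpy as np

def sigmoid(z):
    # Squashes any real-valued input into (0, 1), so the output can be
    # interpreted as a probability.
    return 1.0 / (1.0 + np.exp(-z))

# Logistic regression models P(win) = sigmoid(w . x + b), where x is a
# candidate's feature vector (contributions, expenditures, cash on hand,
# ...) and the weights w and intercept b are learned during training.
```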
The sixth step in building an AI model is to train it. To train the model, the 'winner' column in the dataset was designated as the target. Then, using a standard train-test split, the full dataset was divided: 20% was set aside for validation, and the remaining 80% was used to train the model.
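A sketch of this split-and-fit step, continuing the illustrative variable names from above; scikit-learn is assumed here, as the post does not name the library used:

```python
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split

X = data[features]
y = data[target]

# Hold out 20% of the races for validation; train on the remaining 80%.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

model = LogisticRegression(max_iter=1000)
model.fit(X_train, y_train)
```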
The seventh step in building an AI model is to test and validate it. To test the model, the 20% of the dataset withheld for testing was used. Holding back part of the original dataset preserves data that can be leveraged to evaluate the model's predictive power; the testing code specifically called for the subset of data not incorporated into training. The resulting logistic regression score was 0.7438, meaning the model is 74.38% accurate in predicting victory from the input variables. While not an exceptionally high score, it may still have value in helping to predict the desired outcome.
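Continuing the sketch, scoring the model on the held-out races is a one-liner:

```python
# Evaluate on the 20% of races the model never saw during training.
accuracy = model.score(X_test, y_test)
print(f"Held-out accuracy: {accuracy:.4f}")  # the original run reported 0.7438
```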
Overall, the 74.38% accuracy means the model is meaningfully better than a coin flip at guessing whether a candidate will win, but far from certain. In the long run, a user relying on the model's predictions would be correct more often than not, but it would evidently be unwise to expect the model to always be right.
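In practice, this argues for reading the model's output as a probability rather than a verdict. Scikit-learn exposes this directly; continuing the sketch above:

```python
# Probability of victory for each held-out candidate, rather than a
# hard win/lose call. Column 1 corresponds to the positive class (winner).
win_probabilities = model.predict_proba(X_test)[:, 1]
print(win_probabilities[:5])
```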
In total, this model for predicting the likelihood of victory in United States Senate and House races has real value for candidates and campaigns, individual supporters, and the allied groups and political parties that provide resources. By utilizing fundraising levels, which are consistently reported in line with FEC standards, campaigns can better understand their position in a race, and groups allocating resources gain a clearer starting point for the game-theory analysis that informs where they direct support. Much of this analysis is currently built on district analysis and polling; this model represents a cheaper, replicable way to understand relative position in a race. Moreover, as total political fundraising and expenditure continue to grow each cycle, the model is likely to become stronger over time.
At Campaign Brain, we are working to democratize artificial intelligence for progressive political campaigns. To learn more, visit us at campaignbrain.ai.