For 2024, 538 built a new presidential election forecast model from the ground up. This article explains how we did it: the ingredients that go into the forecast, how we process them and how we ultimately end up with a set of several thousand potential outcomes for the election.

## 1. Calculate polling averages

Aside from two major differences (and a few minor ones), the way we calculate polling averages for our forecasting model is identical to the methodology for the presidential general election polling averages we publish on the 538 website.

The first major difference is that, when calculating averages for our forecasting model, **we also apply a convention adjustment to the averages**. We have found that parties tend to enjoy a reliable temporary bounce in the polls immediately following their nominating conventions in the summer. For example, former Secretary of State Hillary Clinton posted her best numbers of the 2016 campaign in the two weeks after the Democratic National Convention that year — and the only time former President Donald Trump led in 538’s “polls-only” forecast that year was just after the Republican National Convention in late July.

Our convention adjustment works by subtracting a small amount from each candidate’s polling average once their party’s convention begins. This penalty quickly increases over the course of the convention, peaks the day after the convention concludes, and then slowly fades to zero over the next 20 days according to a geometric decay function. Unlike in past years, we do not assume conventions generate a polling bounce of a certain size. Rather, our model fits this convention adjustment for each party according to actual movement in the polling data. If no bounce is observed in the data, we won’t adjust a candidate’s numbers down; that would unduly punish them.
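The shape of that penalty can be sketched in a few lines. This is an illustration rather than 538's code: the linear ramp during the convention, the `ratio` decay rate and the function name are all assumptions, and `peak` stands in for the bounce size the model fits from the polling data.

```python
def convention_penalty(day, start, end, peak, fade_days=20, ratio=0.8):
    """Points to subtract from a candidate's polling average on `day`.

    `day`, `start` and `end` are integer day indexes; `peak` is the fitted
    bounce size. The linear ramp and `ratio` are illustrative assumptions.
    """
    peak_day = end + 1  # the penalty peaks the day after the convention ends
    if peak <= 0 or day < start or day >= peak_day + fade_days:
        return 0.0  # no observed bounce, or outside the adjustment window
    if day <= peak_day:
        # ramp up from the first day of the convention to the peak
        return peak * (day - start + 1) / (peak_day - start + 1)
    # geometric decay: each day keeps `ratio` of the previous day's penalty
    return peak * ratio ** (day - peak_day)
```

With a hypothetical four-day convention on days 10 through 13 and a fitted bounce of 2 points, the penalty is 0.4 points on day 10, the full 2 points on day 14, and zero again from day 34 onward.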

The second major difference between our published polling averages and the ones we calculate for our forecasting model is that **we allow movement to be correlated between states** in addition to between a state and the nation as a whole. In our published averages, if a candidate gains nationally, they gain a corresponding amount in every state; but if a candidate gains in a specific state, they don’t automatically gain in other states.* This is an intentional choice; our polling averages are meant to represent the state of the polls on any given day, and we want people to be able to use them to gauge the reliability of the polls in each state separately after the election is over.

But our forecast is different; its goal is a little more predictive, and our research has found that inferring across-state trends increases predictive accuracy. So our forecast both uses national polls to steer state polling averages and lets polls in one state influence the average in similar states. For instance, if President Joe Biden improves his standing in Nevada, our forecast will also expect him to be polling better in states such as Arizona and New Mexico, which have similar demographics and are part of the same political region.

To measure similarity between states, we look at three factors. The first is how similarly they have voted in presidential elections since 1948. Specifically, we look at the correlation in the two-party vote share across those 19 presidential elections. States where the Democratic or Republican vote share has tended to rise and fall in tandem — such as New York and Connecticut — receive higher similarity scores.
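As a rough illustration of this first factor, the correlation between two states' vote histories can be computed directly. The function name is hypothetical, and the example shares in the usage note are made up rather than real election results.

```python
import numpy as np

def vote_similarity(shares_a, shares_b):
    """Pearson correlation between two states' two-party vote shares.

    Each argument is a sequence with one share per election, in the
    same election order for both states.
    """
    return float(np.corrcoef(shares_a, shares_b)[0, 1])
```

Two states whose shares rise and fall together, such as the made-up series `[0.40, 0.50, 0.60]` and `[0.45, 0.55, 0.65]`, score close to 1; states that move in opposite directions score close to -1.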

The second factor we take into account is demographic similarity between states. To quantify this, we use the U.S. Census Bureau’s 2018-2022 American Community Survey to collect the share of each state’s voting-eligible population that is Black, Hispanic, white and white without a college degree, as well as the median age of the VEP and the VEP’s median income.** We also factor in the share of the state’s adult population that is white and self-declared evangelical Christian from the Public Religion Research Institute’s American Values Atlas, and we calculate a measure of the urban density of the state using tract-level population from the Census.

Finally, we account for geographic similarity — i.e., whether two states belong to the same 538-defined political region. To define the regions, we begin by conducting fuzzy c-means clustering based on how similarly the states have voted historically. We then manually move states into new clusters when we feel we have a strong qualitative reason to do so.*** That’s because geographic factors provide some residual information that electoral history does not. These residual factors could include things such as the concentration of manufacturing jobs in the Midwest, natural disasters such as Superstorm Sandy affecting the Mid-Atlantic or really anything else that would not be evident in states’ political or demographic similarities.

We then fit a regression that uses these three measures of between-state similarity to predict the overall correlation between each pair of states. The resulting correlation matrix is used to project movement in the polls in one state onto similar states. It is also used in a separate regression to make sure potential polling bias is correlated across states, a process detailed in Step 3.

This approach can be thought of as predicting polling averages using regression models with various political and demographic variables. However, it is considerably more robust to overfitting than most common predictive models and means we don’t put ourselves in a situation where we have to assign ad hoc weights to different model components that are inflexible when responding to new data. With this approach, the forecasting model decides how much polls in one state seem to align with polls in other states based on these three measures. If there is little match, we generate a low correlation, and vice versa if things are looking correlated.

There are a few additional minor differences between the polling averages published on our website and the ones our forecast uses. First, our forecast caps a poll’s sample size at 1,500 respondents as a way to decrease the weight of surveys with huge samples, which, in our modern era of increased nonresponse and other poll biases, don’t make polls *that* much more accurate than polls with fewer interviews. Second, the forecast uses 538’s pollster ratings to scale the effective sample size of each poll — whereas the published polling averages use the ratings to weight polls directly in our regression model. These techniques generally yield the same results, but we use the sample-size-scaling approach here because the forecast model sees polls as binomial distributions of raw respondents instead of vote shares.
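Here is a minimal sketch of the sample-size handling described above. How a pollster rating translates into a multiplier is not spelled out in this article, so `quality` (a number between 0 and 1) and the function name are assumptions for illustration.

```python
def effective_sample_size(n_respondents, quality, cap=1500):
    """Cap a poll's sample size, then scale it by pollster quality.

    `quality` is a hypothetical multiplier in (0, 1] derived from a
    pollster rating; the real mapping is not described in the article.
    """
    if not 0.0 < quality <= 1.0:
        raise ValueError("quality must be in (0, 1]")
    return min(n_respondents, cap) * quality
```

Under these assumptions, a 4,000-person poll from a top-rated pollster counts as 1,500 interviews, while an 800-person poll with a quality multiplier of 0.5 counts as 400.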

## 2. Make ‘fundamentals’ predictions

General election polls are not the only information we have about what may happen in November. We also consult a set of economic and political indicators that political scientists and forecasters have collectively dubbed the “fundamentals.” The most familiar of these indicators is yearly real growth in the country’s gross domestic product, which measures the value of all the goods and services produced in America per year. When GDP growth is strong, income usually rises and unemployment falls — and presidents tend to be reelected.

However, we don’t want to build a predictive model based on just one variable. For one thing, GDP updates too slowly for our purposes; for another, what if something happens that causes the metric to become an unreliable predictor? So instead of GDP, our forecast considers a variety of fundamental indicators, which we separate into two buckets.

The first bucket is exclusively related to **economic conditions**. We use 11 indicators that have historically correlated with election outcomes:

One quick comment on this mix of indicators: Usually when political scientists talk about “fundamental” economic indicators they are referring only to objective metrics of the economy. But recently, subjective evaluations of growth, such as the Index of Consumer Sentiment, have differed significantly from actual growth. We don’t know which type will be a stronger predictor of voters’ choices this year, so we are including both for a more robust prediction.

We collect data for each of these indicators as far back as they are available. (Our longest-running series is stock market data, which goes back to January 1938.) Our model transforms each of the national economic variables into a daily measure of change in the indicator over the last year, then standardizes them so all variables have a mean of 0 and a standard deviation of 1. For variables that are not available from January 1940 onward, we impute the standardized growth values on a daily basis by averaging predictions from many multivariate regression models using the values for all other indicators on that day as the predictors. We do not impute any values between the day we are running the model and the last release date for any indicator, instead carrying the last value forward to the present day. For our state-level personal income variable, we just calculate the yearly growth in state personal incomes over the prior year. (So, for 2024, we look at the change in personal income from Q4 2022 to Q4 2023.)
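The transformation for the national indicators can be sketched as follows. The article does not say whether growth is measured as a percent change or a log change; this example assumes percent change, and the function name is hypothetical.

```python
import numpy as np

def standardized_yearly_change(levels, lag=365):
    """Turn a daily series of levels into standardized year-over-year growth.

    Assumes percent change over `lag` days; the result has mean 0 and
    standard deviation 1 (the series must not have constant growth).
    """
    levels = np.asarray(levels, dtype=float)
    growth = levels[lag:] / levels[:-lag] - 1.0  # change vs. `lag` days earlier
    return (growth - growth.mean()) / growth.std()
```

The output is `lag` entries shorter than the input, because the first year of data has nothing to be compared against.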

But this raises another question: Are the rates of change of these indicators more predictive than their actual values? If so, over what time period? Political science research suggests that voters are most reactive to recent growth, but they don’t completely forget about change in the not-too-distant past, either. This is an important duality this year: America’s economy has appeared to recover sharply in terms of annual growth, with inflation receding dramatically on a year-over-year basis — but this ignores how much prices rose in 2022 and 2023.

We strike a balance between short- and medium-term economic growth empirically by calculating an aggregate economic index that averages the daily annual growth rates for each of our fundamental economic indicators together over the last two years of a president’s term. On any given day, the annual growth in an indicator over the previous 365 days receives two weights. The first gives a larger weight to variables that have more strongly predicted elections historically. We trim that weight so that no variable receives too little or too much weight in the overall average — the highest-weighted variables get 15 percent of the weight in the overall index; the lowest-weighted, 5 percent. The second weight is based on how far away each day is from the election according to an exponential decay function. The decay rate we selected maximizes the predictiveness of the aggregate economic index for presidential elections since 1952 — equal to giving annual growth on Election Day a weight of 100 percent and a year before Election Day a weight of 33 percent.
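The two weights can be illustrated with a short sketch. The decay curve below is pinned to the figures given above (100 percent on Election Day, 33 percent a year out); the way the two weights are combined into one index, and the function names, are assumptions.

```python
import numpy as np

def time_weights(days_before_election, weight_one_year_out=0.33):
    """Exponential decay: weight 1.0 on Election Day, 0.33 a year earlier."""
    d = np.asarray(days_before_election, dtype=float)
    return weight_one_year_out ** (d / 365.0)

def economic_index(z_scores, days_before_election, indicator_weights):
    """Weighted average of standardized growth, shape (n_days, n_indicators).

    Indicator weights are trimmed to the 5-15 percent range and then
    normalized; combining the weights this way is an assumption.
    """
    z = np.asarray(z_scores, dtype=float)
    w_ind = np.clip(indicator_weights, 0.05, 0.15)
    daily = z @ w_ind / w_ind.sum()          # one index value per day
    w_t = time_weights(days_before_election)
    return float(np.sum(w_t * daily) / np.sum(w_t))
```

Under this curve, growth measured two years before the election gets a weight of about 11 percent (0.33 squared).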

Our aggregate economic index today measures weighted-average annual economic growth at 0.05 standard deviations below average, implying roughly historically average (for presidential elections since 1952) economic change in the two years before the election. When the economic fundamentals have been at this level historically, the incumbent party has typically won the national popular vote by about 1 percentage point, but there has been considerable variation. In 1988, with a similar (+0.2) economic growth rating as of early June, Republican Vice President George H. W. Bush bested Democratic Massachusetts Gov. Michael Dukakis by nearly 8 points in the national popular vote. But in 2012, with a 0.0 economic index in late May, President Barack Obama went on to win the popular vote by just 4 points.

This is where the **political fundamentals** come into play. Several political factors also impact the performance of presidential candidates. Incumbents, for example, have tended to enjoy a small boost when running for reelection — though that bonus has shrunk from its historical highs in the 20th century, and we don’t have any cases in recent history of an incumbent president running against a former president (incumbency advantage may be smaller in that case). Additionally, voters may punish the candidate of the party controlling the White House if the president’s approval rating is low, which we account for by including 538’s average presidential approval rating as a fundamental. We apply a smooth rolling average to approval so that our fundamentals index does not jump around too much on a day-to-day basis; after all, it is intended to broadly represent overall conditions during the election, not provide a hyper-accurate prediction of vote choice.

At the state level,**** there are several more political factors to consider. Presidential candidates tend to perform well in their home states — as do candidates for vice president, though their home-state advantage is smaller. Last, but not least, each state tends to vote similarly for the major parties over time, varying (increasingly) little from year to year. So we include the two-party vote share in the last election as a political fundamental as well.

Recently, it has also become clear that political polarization is decreasing the electorate’s responsiveness to external shocks to the system. We see this in two mainstream variables. In the University of Michigan’s Index of Consumer Sentiment, for example, Democrats and especially Republicans have become increasingly reactive not to changes in economic conditions but to which party controls the White House. Presidential approval ratings are also now dragged down by political and ideological polarization, with a shrinking share of the opposing party that any candidate can win over. This may partially explain why presidential approval was a poor predictor of the 2022 midterm elections.*****

To account for this, we allow our model to decrease the effect of presidential approval and economic growth in more polarized elections. We consider all elections after 1996 to be “polarized,” based on both an increase in party polarization in U.S. House roll call votes after the 104th Congress and the decreasing shares of Americans since that time who have reported switching parties between presidential elections, according to a study of polling data from the American National Election Studies.

We include these economic and political fundamentals in a Bayesian regression to predict the two-party vote share in each state from 1952 to 2020. We use a technique called regularization to “shrink” the coefficients for each variable back toward zero, which helps guard against fitting the model too strongly to past data (thus making it bad at predicting the future). We also include a dummy variable for each year, accounting for any residual nationwide impacts of variables not included in our model. This is useful, as it allows us to make predictions with the appropriate amount of uncertainty on future data.
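To show what shrinking coefficients toward zero does, here is a small stand-in example using ridge regression. Ridge is swapped in for simplicity (it corresponds to a normal prior centered on zero for each coefficient); the article does not say which prior the actual model uses, and `ridge_fit` and `penalty` are hypothetical names.

```python
import numpy as np

def ridge_fit(X, y, penalty):
    """Least-squares coefficients with an L2 penalty pulling them toward zero."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n_features = X.shape[1]
    # Solve (X'X + penalty * I) beta = X'y
    return np.linalg.solve(X.T @ X + penalty * np.eye(n_features), X.T @ y)
```

With `penalty=0` this is ordinary least squares; raising the penalty pulls every coefficient closer to zero, trading a little fit on past elections for more stable predictions on new ones.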

The economic and political fundamentals offer robust measures of the conditions that exert a general gravitational pull on voter behavior in presidential elections. But as a general rule, on their own, they are not super accurate at predicting outcomes, even on Election Day. To wit: We think we have developed just about the best fundamentals model a person can develop, considering all the variables one needs to account for and the acute risk of overfitting that other more conventional methods suffer from — and still, we would expect our fundamentals model to miss the vote margin in the average state by about 6.5 points. That means that a state predicted to vote for Democrats by 3 points would go to Republicans about 32 percent of the time. For reference, if the polls in a state showed Democrats up by 3 points on Election Day, you’d expect the Republicans to win only around 15-20 percent of the time.
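The 32 percent figure can be reproduced with a quick calculation, assuming the error in the predicted margin is normally distributed and treating 6.5 points as its standard deviation. Both are simplifying assumptions for this sketch, and the function name is hypothetical.

```python
from math import erf, sqrt

def upset_probability(predicted_margin, error_sd):
    """P(actual margin < 0) if actual ~ Normal(predicted_margin, error_sd)."""
    z = -predicted_margin / error_sd
    return 0.5 * (1.0 + erf(z / sqrt(2.0)))  # standard normal CDF at z
```

For a 3-point predicted lead and a 6.5-point standard deviation, `upset_probability(3, 6.5)` comes out at roughly 0.32.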

## 3. Combine polls and fundamentals and account for uncertainty

Despite the uncertainty in our fundamentals predictions, our historical testing revealed they are still helpful in producing a reliable election forecast. This is due to several factors, one of which is that the fundamentals are still much better than nothing (in other words, a blind guess), and there are often states without much polling. Another is that, as a matter of coincidence, the bias in our fundamentals predictions has tended to counteract bias in the polls; in years like 2020 when the polls underestimated Trump, the fundamentals overestimated his support, generating a combined prediction that was closer to the actual outcome of the election than either component prediction on its own.

For this reason, our forecasting model this year will never fully phase out fundamentals-based predictions, even by Election Day. Instead, the model puts the appropriate amount of weight on each indicator depending on how much noise is in the projections.

When we say “the appropriate amount,” we’re referring to the weight that is derived by applying Bayes’s rule to our predictions and their uncertainty. Formally, what we are doing is using our fundamentals-based prediction as an informative prior for what a polls-based model should predict the Election Day vote share to be. You can think of it sort of as stacking statistical distributions on top of each other: When we have few polls or when it’s early in the campaign, our model’s predictions are mostly based on the fundamentals — with their standard deviation usually around 6 points or so. But when it’s Election Day in a state with a lot of polls, our uncertainty about public opinion is a lot smaller, so we will put a lot more weight on the polling data (and subsequently, our polling averages that feed into the model) when generating our final prediction.
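In the simplest textbook case, where both the fundamentals prediction and the polling average are treated as normal distributions, Bayes's rule reduces to a precision-weighted average. The sketch below shows only that simplified case; the full model is far richer, and the function name is hypothetical.

```python
def combine(prior_mean, prior_sd, poll_mean, poll_sd):
    """Normal-normal update: weight each source by 1 / variance."""
    prior_precision = 1.0 / prior_sd ** 2
    poll_precision = 1.0 / poll_sd ** 2
    post_var = 1.0 / (prior_precision + poll_precision)
    post_mean = post_var * (prior_precision * prior_mean
                            + poll_precision * poll_mean)
    return post_mean, post_var ** 0.5
```

With a fundamentals prior of 0 points (standard deviation 6) and a polling average of 4 points, the combined estimate is 2.0 when the polls are just as uncertain as the prior, but 3.6 when the polling standard deviation shrinks to 2. The less noise in the polls, the more the final prediction follows them.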

Until now, we have mostly talked about uncertainty in the fundamentals. Let’s now talk about uncertainty in the polling average, since that determines the weight the model puts on the polls.

On any given day we run the model, it considers two ways the polls could be wrong in predicting the election outcome. The first is **temporal drift in public opinion**. Since we’re making forecasts before Election Day, there is time for people to change their minds and for polls to change. So we set up the model to explore thousands of different scenarios in which the polls change by varying amounts across states. That change is correlated across states, as discussed in Step 1.

But we still need to know how much opinion may drift over time. We can calculate this very easily empirically. First, for all of the contests in 538’s state and national presidential general election polling database (which goes back all the way to 1948, thanks to the tireless work of our research team!), we calculate retroactive state polling averages starting 300 days before each election and update them every day until Election Day. Then, we can calculate the average raw change in the polling averages from any day to November historically.

We find that the polling margin in the average state tends to move by 12 points over the entire course of the election. From 160 days out — around when we are launching this forecast — to Election Day, the expected change in margin is closer to 9 points. And by September, there is about 6 points of change left on average in the campaign. (From this, you get a good sense of how much the polls could still change here in 2024.)
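The drift calculation described above can be sketched as an average absolute change from each day to the final day. The array layout and function name are assumptions for illustration.

```python
import numpy as np

def expected_drift(averages):
    """Mean absolute change from each day's average to the final average.

    `averages` has shape (n_contests, n_days); the last column holds
    each contest's Election Day polling margin.
    """
    a = np.asarray(averages, dtype=float)
    final = a[:, -1][:, None]              # Election Day value per contest
    return np.abs(final - a).mean(axis=0)  # one expected change per day
```

The result has one entry per day, shrinking to zero on the last day, which is the pattern of 12, 9 and 6 points described above.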

The second component of polling error is **industrywide polling bias**, or when surveys from different pollsters miss the outcome in a similar direction. Simulating different levels of polling error lets us look into potential futures in which polls have underestimated support for each candidate by various amounts. As with the process for simulating temporal error, we need both an estimated state-by-state correlation matrix and a value determining the spread of the errors. We extract these numbers from a separate model that predicts historical misses in preelection polls.

This model tells us that the average polling miss for each party’s vote share in a competitive state is a hair over 2 points, or around 3.8 points on the margin between the candidates. (Errors in the two parties’ vote shares do not offset each other one-for-one; some amount of error also comes from voters floating to and from the “not sure” and third-party options.) We draw potential polling errors for the future from a fat-tailed distribution: specifically, a Student’s t distribution with five degrees of freedom (a parameter that increases or decreases the likelihood of surprise “tail” events in our simulations). Error is also correlated across states, with the correlations fit using a similar method to the way we get correlations for the polling average and simulate temporal error. These values (expected error, the distribution’s degrees of freedom and the correlations between states) come directly from the historical model we train to predict polling bias.
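One standard way to draw correlated, fat-tailed errors is a multivariate Student's t, built by dividing correlated normal draws by a shared chi-square term. This is a sketch of that general technique, not a description of 538's implementation; `scale` and the function name are hypothetical, and the article does not say how the scale relates to the average miss.

```python
import numpy as np

def draw_polling_errors(corr, scale, df=5, n_draws=10000, seed=0):
    """Draw correlated Student's t errors, one column per state."""
    rng = np.random.default_rng(seed)
    corr = np.asarray(corr, dtype=float)
    chol = np.linalg.cholesky(corr)            # corr = chol @ chol.T
    normal = rng.standard_normal((n_draws, corr.shape[0])) @ chol.T
    chi2 = rng.chisquare(df, size=(n_draws, 1))
    # A small chi-square draw inflates every state's error at once,
    # which produces the fat tails.
    return scale * normal / np.sqrt(chi2 / df)
```

Because all states in a draw share one chi-square term, a surprise polling miss tends to hit correlated states at the same time.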

These three parameters — expected fundamentals uncertainty, temporal drift and polling bias — then get input into our single combined poll-averaging and forecasting model for the current election. That model uses Markov chain Monte Carlo to simulate tens of thousands of different ways the election could go, each time varying the hundreds of parameters in our model. Those parameters include major factors, like how big the incumbency bonus should be or how much a president has historically been punished for a bad approval rating, as well as seemingly tiny factors like the house effect for a pollster who does one poll in Missouri in late July.

For each of these simulations (we usually call them “draws” in statistics), the Markov chain also picks random values from the distributions of fundamentals, temporal drift and polling bias. There may be one simulation with an optimistic fundamentals projection for Biden, owing to the above parameter error and prediction uncertainty, that also explores a world in which he gains 5 points in the polls nationally over the next five months. Then, we might find another simulation in which polls underestimate Trump by 4-5 points on the margin — a repeat of what we saw in the 2020 election. By repeating this process tens of thousands of times, we end up with a list of many different ways the election could unfold — which is what you see in the topline chart of our interactive.
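One common way to summarize a set of draws like this is to tally electoral votes in each simulation and count how often a candidate reaches the threshold. The sketch below assumes a matrix of simulated margins with one row per draw and one column per state; the names are hypothetical, and this is not necessarily how the interactive's numbers are produced.

```python
import numpy as np

def win_probability(margin_draws, electoral_votes, needed):
    """Share of simulations in which the candidate wins enough electoral votes.

    `margin_draws` has shape (n_draws, n_states); a positive margin means
    the candidate carries that state in that draw.
    """
    wins = np.asarray(margin_draws) > 0
    totals = wins @ np.asarray(electoral_votes)  # electoral votes per draw
    return float(np.mean(totals >= needed))
```

For example, with four made-up draws over two states worth 3 and 2 electoral votes and a threshold of 3, the candidate wins whenever they carry the first state.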

## Conclusion

That’s it for the methodology! If you see any polls that are missing or spot a methodological bug (these things do happen when we launch new models, especially with this many moving parts), hit us up with an email at [email protected]. Otherwise, I’ll conclude with a somewhat more technical description of the model and an acknowledgment of past academic work on similar forecasting models.

First, if you’ve read this far, then you might be interested in more details on the actual statistical model powering 538’s forecast this year. It is based on my past forecasting work (you can find some of the original model code online), which built a multilevel dynamic linear model in Stan, a program for Bayesian statistical inference. This year’s model adds lots of bells and whistles, such as bringing the fundamentals model directly into the forecasting model instead of running it separately and importing predictions, and, importantly, gives 538 the ability to make predictions for major third-party candidates.

I also did not work on this model alone. Holly Fuong, 538’s data editor, spent hundreds of hours reviewing the model code and making valuable methodological suggestions. We also worked with a freelance statistician who helped to massively improve our model’s statistical coherence and computational speed this spring. I owe another debt to the many academics who have worked on similar models previously. (This list is not exhaustive but should give you a good sense of the history of this approach.) Robert Erikson and Christopher Wlezien were the first to model support for candidates with house effects and as a smooth function of time over the campaign. Political scientist Simon Jackman later formalized a Bayesian time-series model of polls with house effects for Australian elections. Statisticians Kari Lock and Andrew Gelman employed a time-series model of polls as part of a paper forecasting election outcomes with polls and other data.

More recently, political scientist/survey statistician/pollster Drew Linzer combined aspects of these and other approaches in a Bayesian dynamic linear model of state and national polls of the 2012 general election. In 2016, Natalie Jackson at The Huffington Post also worked on a statistical forecasting model that relied on Markov chain methods. And, as linked above, I worked with Gelman and statistician Merlin Heidemanns to add additional poll-level adjustments as well as other factors to model the 2020 general election in this way.

## Footnotes

*Except insofar as a candidate’s movement in a handful of states budges the national average, which then trickles down to other states.

**The Hispanic percentage is exclusive; regardless of race, respondents who said they were Hispanic were excluded from all other racial groups.

***We move states to and from the following groups:

- Alaska, Arizona, Montana and Texas start in one region. We move Alaska and Montana to the Mountain region, Arizona to the Southwest and Texas to its own region.
- We collapse Maine, Massachusetts, New Hampshire and Vermont into one region, New England.
- Florida and Virginia start in the Southwest region. We move them to a new Southeast region and add Georgia and North Carolina, which both start in the South region.
- Oklahoma starts in the Mountain region; we move it into the Texas group and take Louisiana from the South region to create one Texas mega-state we lovingly call “Tex-ish.”
- We move Illinois from a group with the Pacific states to the Rust Belt region.
- Indiana starts with the Plains states. We move it to the Rust Belt.
- We combine traditional Deep South and Southern border states such as Kentucky and Tennessee into a bigger South region.

****Because Maine and Nebraska split their electoral votes by congressional district — awarding one electoral vote to the winner of each of their districts — we also gather all this information at the district level in these states. Our model treats districts as separate geographic units similar to states, but with larger confidence intervals.

*****Additional research also suggests that voters blame the incumbent party more for a bad economy when the president is running for reelection than when the party runs a new candidate (such as when the incumbent is term-limited). We thought about accounting for that, too, but we think adding too many variables to the model would get us firmly into overfitting territory, so we didn’t end up doing so.