Election Forecasting: Monte Carlo Electoral Vote Simulation

Election Forecasting Methodology

TruthIsAll

This overview contains a brief discussion of the following:

. Time-series regression models vs. Monte Carlo polling simulation

. Final 2004 state and national projections confirmed by the exit polls

. Analysis of 2004 registered voter (RV) and likely voter (LV) polls

Election Model methodology

. Basic Polling Mathematics

. Overview of Monte Carlo Electoral Vote Simulation

The 2004 Election Model

. Final state and national pre-election polls and projections vs. the exit polls

. Real Clear Politics (RCP) 102 RV and 31 LV poll trend and analysis

The 2008 Election Model is updated frequently for the latest state and national polls.

There are two basic methods used to forecast presidential elections:

1) Projections based on state and national polls

2) Time-series regression models.

Statistical polling (state and national) is an indicator of current voter preference. In the Election Model, state poll shares are adjusted for undecided voters and the associated win probabilities are then input to a 5000 election trial Monte Carlo simulation. The goal is to calculate the expected electoral vote shares and the probability of an electoral vote victory. The probability is simply the number of winning election trials divided by 5000. The projection is not a long-term forecast; it assumes the election is held on the day of the projection.

Intuitively, the probability of winning the True (no fraud) popular vote should correlate to the Monte Carlo simulation probability of winning the electoral vote. In fact, if both probabilities are within a percentage point of each other, we can have confidence that they are correct mathematically. Probabilities generated by academics are inconsistent with forecast vote shares (see below) and do not check them against the probability of winning the electoral vote.

The Election Monte Carlo simulation uses individual state vote projections to determine the probability of winning the state. The probability is calculated for all 50 states and 5000 simulated election trials are executed to determine the average electoral vote split and the number of winning trials for each candidate. The probability of winning the electoral vote is just the number of winning trials divided by 5000.

The probability of winning the popular vote is based on the projected aggregate state 2-party vote share and margin of error. These are input to the Excel normal distribution function NORMDIST: Prob (popular vote win) = NORMDIST (vote share, 0.50, MoE/1.96, True)

As of July 1, the Election Model produced virtually identical probabilities that Obama would win the popular vote and the electoral vote if the election were held on that day. His projected 53.2% popular vote share and 2% MoE are input to NORMDIST (0.532, 0.50, 0.02/1.96, TRUE), which returns a probability of about 99.9%.

Obama won 4996 (99.92%) of 5000 Monte Carlo simulated election trials. Coincidentally, the two probabilities match to within about .01%!
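The NORMDIST calculation can be reproduced outside Excel. The following is a minimal Python sketch, assuming the standard library's statistics.NormalDist as a stand-in for NORMDIST; the function name popular_vote_win_prob is illustrative and is not part of the Election Model.

```python
from statistics import NormalDist


def popular_vote_win_prob(share: float, moe: float) -> float:
    """Probability of winning a two-party majority.

    Mirrors NORMDIST(share, 0.50, moe/1.96, TRUE): the cumulative normal
    distribution with mean 0.50 and standard deviation moe/1.96,
    evaluated at the projected two-party share.
    """
    sigma = moe / 1.96  # the MoE is 1.96 standard deviations
    return NormalDist(mu=0.50, sigma=sigma).cdf(share)


# July 1 inputs quoted in the text: 53.2% projected share, 2% MoE.
print(f"{popular_vote_win_prob(0.532, 0.02):.4f}")
```

The same function applies to any projected share and margin of error, so it can be used to check a stated win probability against a stated vote share.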

Academics and political scientists create multiple regression models to forecast election vote shares months in advance. The models utilize time-series data as relevant input variables such as economic growth, inflation, job growth, interest rates, foreign policy, historical elections, incumbency, approval rating, etc. Regression modeling is an interesting theoretical exercise which does not account for the daily events which affect voter psychology.

Polling and regression models are analogous to the current market value of a stock and its intrinsic (theoretical) value. The intrinsic value is based on forecast annual cash flows and rarely is equal to market value. The latest poll is to the current stock price as the regression model vote share is to intrinsic value.

Inherent problems exist in election models, the most important of which is never discussed: Election forecasts and media pundits never account for the probability of fraud. The implicit assumption is that the official recorded vote will accurately reflect the True Vote; the election will be fraud-free.

The following 2004 election forecasting models were executed 2-9 months before the election.

The average Bush 53.9% 2-party projection deviated sharply from the aggregate unadjusted state exit poll (47.7%).

None of the models forecast the electoral vote or mentioned the possibility of election fraud.

Except for Beck/Tien, the popular vote win probabilities were incompatible with forecasted vote share.

Assuming a 3.0% margin of error, a 53% vote share implies a 97.5% popular vote win probability.

A 54% vote share implies a probability of about 99.6%.

Final state and national polls, adjusted for undecided voters and estimated turnout, are superior to time-series models executed months in advance.



Author	Pick	2-party share (%)	Date	Win prob (%)	Notes
Beck/Tien	Kerry	50.1	27-Aug	50	
Abramowitz	Bush	53.7	31-Jul		
Campbell	Bush	53.8	6-Sep	97	
Wlezien/Erikson	Bush	52.9	27-Jul	75	
Holbrook	Bush	54.5	30-Aug	92	
Lockerbie	Bush	57.6	21-May	92	
Norpoth	Bush	54.7	29-Jan	95	

Recorded Vote	Bush	51.2	2-Nov		

Exit Polls
State	Kerry	52.3	2-Nov	100.0	Unadjusted WPE method
Nat EP1	Kerry	51.9	2-Nov	99.9	39 Gore/41 Bush Voted 2k weights
Nat EP2	Kerry	52.9	2-Nov	100.0	37.6/37.4 wtg, 122.3m recorded

Election Model
State Model	Kerry	51.8	1-Nov	99.9	EV Simulation: 4995 wins/5000 trials
National Model	Kerry	51.8	1-Nov	99.9	Final 5 national poll avg. projection
Election Calculator	Kerry	53.7	2-Nov	100.0	Voted 2k, 39.5/37.1 wtg, 125.7m votes cast
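The claim that the academic win probabilities are incompatible with their forecast vote shares can be checked directly. The sketch below is an illustration only: it assumes the 3.0% margin of error used in the text and applies the normal distribution formula to each forecast share in the table. The helper name implied_win_prob is hypothetical.

```python
from statistics import NormalDist

# (author, forecast two-party share in %, stated win probability in % or None)
# Values are copied from the table; Abramowitz has no stated probability.
FORECASTS = [
    ("Beck/Tien", 50.1, 50),
    ("Abramowitz", 53.7, None),
    ("Campbell", 53.8, 97),
    ("Wlezien/Erikson", 52.9, 75),
    ("Holbrook", 54.5, 92),
    ("Lockerbie", 57.6, 92),
    ("Norpoth", 54.7, 95),
]


def implied_win_prob(share_pct: float, moe_pct: float = 3.0) -> float:
    """Win probability (in %) implied by a share, assuming a normal distribution."""
    sigma = moe_pct / 1.96
    return 100.0 * NormalDist(mu=50.0, sigma=sigma).cdf(share_pct)


for author, share, stated in FORECASTS:
    stated_text = "n/a" if stated is None else f"{stated}%"
    print(f"{author:16s} share {share:.1f}%  stated {stated_text:>4s}  "
          f"implied {implied_win_prob(share):.1f}%")
```

A different assumed margin of error changes the implied values, so the comparison is only as good as the 3.0% assumption.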

Extensive statistical analysis indicates that Kerry defeated Bush in 2004.

But Bush was the official winner by 62 - 59m recorded votes.

Were the above models accurate since they correctly forecast Bush the winner?

Were Zogby and Harris correct in projecting a Kerry victory?

In 2000, Gore had 540,000 more recorded votes than Bush.

Gore won by at least 50,000 votes in Florida (75,000 under-votes and 110,000 over-votes were uncounted).

Were the pollsters who forecast that Bush would win correct?

Were Zogby and Harris wrong in projecting a Gore victory?

Fact: millions of Democratic voters are disenfranchised and never cast a vote.

Fact: millions of mostly Democratic votes (70-80%) are uncounted in every election.

Presidential election forecasting models should have the following disclaimer: The forecast will deviate from the official recorded vote. If they are nearly equal it would indicate one or more of the following: a) input data errors, b) incorrect assumptions, c) faulty model logic and/or methodology.

In 2000, 110.8m votes were cast, but only 105.4m recorded.

Gore won the True Vote by at least 3m. He won officially by 540,000.

In 2004, 125.7m votes were cast, but only 122.3m recorded.

Kerry won the True Vote by 8-10m, but lost the recorded vote by 3m.

Why should we expect 2008 to be any different? Can we be confident that unverifiable DRE touch screens will reflect voter intent? Can we assume that central tabulator software will not be hacked to switch votes?

The True Vote (T) always differs from the official recorded vote (R) due to uncounted (U) and switched votes (S). The recorded vote is given by:

R = T - U - S (formula does not include disenfranchised voters).

Based on what we know from prior elections (especially since 2000), Democrats need a landslide to overcome massive, multi-level fraud.

Monte Carlo Electoral Vote Simulation

The Election Model tracks state and national polls to project not only the popular vote but the expected electoral vote and win probability.

It actually contains two independent models:

a) Monte Carlo Electoral Vote simulation - based on the latest pre-election state polls.

b) National average model - based on the latest national polls.

The only assumption in the Model is for the allocation of undecided/other voters. Historically, 70-80% of undecided voters break for the challenger. If the race is tied at 45-45, a 60-40% split of undecided voters results in a 51-49% projected vote share. The win probability is calculated using the projected vote shares as input to the normal distribution function.
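The allocation arithmetic can be written out as a short Python sketch. It assumes that everything not given to the two candidates in the poll counts as undecided/other; the function name project_shares and its parameters are illustrative, not taken from the Election Model.

```python
def project_shares(poll_a: float, poll_b: float, undecided_to_a: float):
    """Allocate undecided/other voters and return projected shares in percent.

    poll_a, poll_b  -- poll shares in percent (e.g. 45 and 45)
    undecided_to_a  -- fraction of the undecided allocated to candidate A
    """
    undecided = 100.0 - poll_a - poll_b
    proj_a = poll_a + undecided_to_a * undecided
    proj_b = poll_b + (1.0 - undecided_to_a) * undecided
    return proj_a, proj_b


# The tied 45-45 race from the text with a 60-40 split of the undecided.
a, b = project_shares(45, 45, 0.60)
print(f"{a:.1f} - {b:.1f}")
```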

In the state model, the average weighted poll share is calculated. The vote shares are projected by adjusting the polls for the allocation of the undecided voters. In the simulation, 5000 election trials are executed to calculate the expected electoral vote and win probability.

The simulation produces an expected electoral vote which is unaffected by minor deviations in the state polls. It is much more accurate than a single poll.

A powerful feature of the model is the built-in sensitivity analysis. Five scenarios of undecided voter allocation project the state and national vote shares, electoral votes and win probability. The winner of the popular vote will almost certainly win the electoral vote if the margin exceeds 0.5%.

A major advantage of national polling is its relative simplicity. If the polling spread exceeds the margin of error (3% for a 1000 sample) then the leader has a minimum 97.5% probability of winning, assuming the poll is an unbiased sample. If three independent national polls are done on the same day, it is essentially the equivalent of a single poll of 3000 with a 1.8% MoE. Assuming a 52-48% split, the probability is 95% that the leader will receive 50.2-53.8%. The probability is 97.5% that his vote share will exceed 50.2%.
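The margins of error quoted here follow from the usual formula for a simple random sample, MoE = 1.96 * sqrt(p(1-p)/n). The text does not state this formula, so the sketch below is an assumption about how the figures were obtained; it uses p = 0.5, the worst case.

```python
from math import sqrt


def margin_of_error(n: int, p: float = 0.5, z: float = 1.96) -> float:
    """95% margin of error for a simple random sample of size n."""
    return z * sqrt(p * (1.0 - p) / n)


# Sample sizes mentioned in the text: a state poll, a national poll,
# and three pooled national polls.
for n in (600, 1000, 3000):
    print(f"n = {n:4d}  MoE = {100 * margin_of_error(n):.1f}%")

# 95% interval for a 52% share measured by the pooled 3000 sample.
moe = margin_of_error(3000)
print(f"{100 * (0.52 - moe):.1f}% to {100 * (0.52 + moe):.1f}%")
```

Pooling only behaves like one large sample if the polls are independent and use comparable methods, which is the assumption the text makes.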

The MoE is 1.96 times the standard deviation, the statistical measure of variability. The standard deviation and projected vote share are input to the normal distribution function in order to determine the probability of winning a vote share majority. To calculate the expected EV from state polling data, the final vote is projected.

Typical state polls sample 600 voters with a 4% MoE. National polls of 1000 sample size have a 3% MoE. The probability of winning a state is based on the poll. For a 50-50 projection, each candidate has a 50% probability of winning the state. For a 51-49 split, the leader has a 69% probability; 84% for 52-48; 93% for 53-47; 97% for 54-46.

The 2-party vote share for each state is projected by first applying the undecided voter allocation. The win probability is then calculated based on the projected vote shares. A random number (RND) between 0 and 1 is generated and compared to the probability of winning the state. For example, assume the latest poll indicates that Obama has a 90% probability of winning Oregon. If the RND is less than 0.90, Obama wins 7 electoral votes; if the RND is greater than .90, McCain wins.
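The Oregon example amounts to one comparison of a random number with the state win probability. Here is a minimal sketch, assuming Python's random module as the source of RND; the function name state_winner is hypothetical.

```python
import random


def state_winner(p_win: float, rng: random.Random) -> bool:
    """Return True if the candidate with win probability p_win carries the state."""
    return rng.random() < p_win  # RND is uniform on [0, 1)


rng = random.Random(2008)  # fixed seed so the sketch is reproducible
draws = 5000
obama_wins = sum(state_winner(0.90, rng) for _ in range(draws))

# Over many draws the share of wins should settle near the 90% input.
print(f"Obama carries Oregon in {obama_wins / draws:.1%} of draws")
```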

The procedure is repeated for all 50 states and DC. The election trial winner is the candidate who has at least 270 EV. A total of 5000 election trials are executed; the probability of winning the electoral vote is therefore equal to the number of trial wins divided by 5000. The average (expected) electoral vote is calculated. A major advantage of a simulation is that minor shifts in the polls have minimal impact. The EV is projected as the average of 5000 simulations - not a single snapshot.

In summary, the Election Model projects the latest national and state polls after adjusting for the allocation of undecided voters. The probability of winning each state is calculated. A Monte Carlo simulation of 5000 election trials is then executed (using the individual state probabilities) to determine the expected final electoral vote and win probability. Independent national and state polling models provide a mathematical confirmation of each method.

The Election Model tracks state and national polls to project the popular vote as well as the expected electoral vote and win probability. It consists of two independent models:

a) Monte Carlo Electoral Vote Simulation - calculates the expected EV and win probability using projections based on the latest state polls.

b) National Model – projects national vote shares from a moving average projection based on the latest national polls.

Based on state polls as of June 22, the simulation determined that if the election were held that day, Obama would win by 351-187 electoral votes with 52.8% of the 2-party vote. Since he won 4997 of 5000 simulated elections, his win probability was virtually 100%.

A caveat: the Election Model assumes that the True Vote will be the same as the official Recorded Vote. It never is. Every election is marred by a combination of uncounted and miscounted votes. That is a historical fact. Nevertheless, we continue to run our models hoping that this time the True Vote will be equal to the Recorded Vote and the election will be fraud-free.

Projecting state and national vote shares

A major advantage of national polls in projecting vote share is their relative simplicity. The poll split represents a snapshot of the total electorate. If the polling spread exceeds the margin of error (3% for a 1000 sample) then the leader has a minimum 97.5% probability of winning - assuming the poll is an unbiased sample. If three independent national polls are done on the same day, that is essentially equivalent to a single 3000 sample with a 1.8% MoE. Assuming a 52-48% split, the probability is 95% that the leader will receive 50.2-53.8%. The probability is 97.5% that his vote share will exceed 50.2%.

In the Monte Carlo model, two-party vote shares are projected for each state. The latest polls are adjusted for an assumed allocation of undecided voters. In the simulation, 5000 election trials are executed to determine the expected (average) electoral vote and win probability.

A major advantage of Monte Carlo is that the results are hardly affected by minor daily deviations in the state polls. On the other hand, electoral vote projections from media pundits and Internet bloggers use a single snapshot of the latest polls to determine a projected electoral vote split. This approach has the advantage of simplicity, but can be very misleading since it often results in wild electoral vote swings. Snapshot projections cannot provide a robust expected electoral vote split and win probability. That’s because unlike the Monte Carlo method, they fail to consider the two bedrocks of statistical analysis: The Law of Large Numbers and the Central Limit Theorem.

For example, assume that Florida's polls shift from 46-45 Obama to 46-45 McCain. In a snapshot projection, this would have a major impact on the electoral vote split. On the other hand, in a Monte Carlo simulation of 5000 election trials, the change would have just a minimal effect on the expected (average) electoral vote and win probability. The 46-45 poll split means that the race is much too close to clearly project a winner; both Obama and McCain have a nearly equal chance.
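The Florida example can be made concrete with a small sketch. It rests on two assumptions that are not in the text: the undecided voters are split in proportion to the poll shares (rather than by the model's allocation rule), and the state MoE is the 4% typical of state polls. Florida had 27 electoral votes in 2008. A state's contribution to the expected electoral vote is its electoral votes times the win probability, so no simulation is needed for a single state.

```python
from statistics import NormalDist

FLORIDA_EV = 27  # Florida's electoral votes in 2008


def win_prob(poll_a: float, poll_b: float, moe: float = 0.04) -> float:
    """Win probability from the two-party share (proportional split assumed)."""
    two_party = poll_a / (poll_a + poll_b)
    return NormalDist(mu=0.50, sigma=moe / 1.96).cdf(two_party)


def snapshot_ev(poll_a: float, poll_b: float) -> int:
    """Snapshot method: the poll leader gets all of the state's EV."""
    return FLORIDA_EV if poll_a > poll_b else 0


def expected_ev(poll_a: float, poll_b: float) -> float:
    """Expected-value method: EV weighted by the win probability."""
    return FLORIDA_EV * win_prob(poll_a, poll_b)


before, after = (46, 45), (45, 46)  # Obama-McCain before and after the shift
snapshot_swing = snapshot_ev(*before) - snapshot_ev(*after)
expected_swing = expected_ev(*before) - expected_ev(*after)
print(f"snapshot swing: {snapshot_swing} EV")
print(f"expected swing: {expected_swing:.1f} EV")
```

Under these assumptions the snapshot moves all 27 electoral votes, while the expected value moves by only a fraction of that.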

Typical state polls sample 600 voters with a 4% margin of error (MoE). National polls of 1000-2000 sample size have a 2.5-3% MoE. The probability of winning a state is based on the 2-party poll split and the MoE, after adjusting for undecided voters.

Monte Carlo simulation methodology

1. The 2-party vote share is projected for each state after allocating undecided voters. The win probability is then calculated based on the projected vote shares. For example, assuming a 50-50 projection and a 4% MoE, each candidate has a 50% probability of winning the state. For a 51-49 split, the leader has a 69% probability; 84% for 52-48; 93% for 53-47; 97% for 54-46.

The MoE is 1.96 times the standard deviation, a statistical measure of volatility. The standard deviation and projected vote share are input to the normal distribution function in order to determine the probability of winning at least 50% of the two-party vote.

2. In a simulated election trial, a random number (RND) between 0 and 1 is generated for each state and compared to the candidate's probability of winning the state. If the RND is less than the win probability, that candidate wins the state; otherwise the opponent wins.

For example, if the latest Oregon poll indicates that Obama has a 90% probability of winning, then if the RND is less than 0.90, Obama wins Oregon’s 7 electoral votes; if the RND is greater than 0.90, McCain wins. The same test is applied in each state (comparing the RND to the state win probability) to determine who wins the state. The winner of this election trial is the candidate who has won at least 270 EV.

3. The process is repeated 5000 times (election trials). The probability of winning the electoral vote is just simple division; it’s equal to the number of trial wins divided by 5000. The expected electoral vote for each candidate is the average of the 5000 trials.
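The three steps can be put together in a short sketch. This is an illustration under stated assumptions, not the Election Model itself: the six "states" and their projected shares are hypothetical toy values, the MoE is the 4% quoted for state polls, each state is drawn independently, and the winning threshold is a majority of the toy map's electoral votes (which equals 270 when the total is 538).

```python
import random
from statistics import NormalDist


def state_win_prob(share: float, moe: float = 0.04) -> float:
    """Step 1: win probability from the projected two-party share."""
    return NormalDist(mu=0.50, sigma=moe / 1.96).cdf(share)


def simulate(states, trials: int = 5000, seed: int = 0):
    """Steps 2 and 3: run the election trials.

    states -- list of (electoral_votes, projected_two_party_share)
    Returns (expected electoral vote, probability of winning).
    """
    rng = random.Random(seed)
    needed = sum(ev for ev, _ in states) // 2 + 1  # 270 of 538
    probs = [(ev, state_win_prob(share)) for ev, share in states]
    wins = 0
    ev_total = 0
    for _ in range(trials):
        # One RND per state; the candidate wins the state if RND < probability.
        ev_won = sum(ev for ev, p in probs if rng.random() < p)
        ev_total += ev_won
        wins += ev_won >= needed
    return ev_total / trials, wins / trials


# Hypothetical toy map: (electoral votes, projected share). Not real data.
TOY_STATES = [(55, 0.58), (34, 0.44), (27, 0.505),
              (21, 0.53), (20, 0.51), (15, 0.47)]

mean_ev, p_win = simulate(TOY_STATES)
print(f"expected EV: {mean_ev:.1f}  win probability: {p_win:.1%}")
```

A fixed seed keeps the run reproducible; a different seed should change the results only slightly, which is the stability the text attributes to averaging over 5000 trials.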

To repeat, there are two major advantages of the simulation method:

1) Minor shifts in state polls have minimal impact on the expected EV.

2) The probability of winning the electoral vote is a simple calculation: the number of election trial wins/total number of election trials.

Undecided Voter Allocation

The only assumption used in the model is the allocation of undecided/other voters. Historically, 70-80% of undecided voters break for the challenger. For example, if the race is tied at 45-45, a 60-40 split of undecided voters results in a 51-49% projected vote share. The win probability is calculated using the projected vote shares as input to the normal distribution function.

Some may disagree with the base case undecided voter allocation assumption. That's why a sensitivity analysis of five (5) scenarios of undecided voter allocation is executed to project the individual state (and aggregate) vote shares to determine the corresponding electoral vote, aggregate national vote shares and the win probability.
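A sensitivity analysis of this kind can be sketched for a single poll. The text does not list the five allocation scenarios, so the values in SCENARIOS are hypothetical placeholders, as are the function name and the 4% MoE default; the full model would repeat this for every state and then aggregate.

```python
from statistics import NormalDist

# Hypothetical scenarios: fraction of undecided voters allocated to candidate A.
SCENARIOS = (0.50, 0.55, 0.60, 0.67, 0.75)


def sensitivity(poll_a: float, poll_b: float, moe: float = 0.04,
                scenarios=SCENARIOS):
    """Return (allocation, projected share, win probability) per scenario."""
    undecided = 100.0 - poll_a - poll_b
    dist = NormalDist(mu=0.50, sigma=moe / 1.96)
    rows = []
    for frac in scenarios:
        share = (poll_a + frac * undecided) / 100.0  # projected two-party share
        rows.append((frac, share, dist.cdf(share)))
    return rows


rows = sensitivity(45, 45)  # the tied 45-45 race used in the text
for frac, share, prob in rows:
    print(f"undecided to A: {frac:.0%}  share: {share:.1%}  win prob: {prob:.1%}")
```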

In summary, the Election Model projects the latest national and state polls after adjusting for the allocation of undecided voters. The probability of winning each state is calculated. A Monte Carlo simulation of 5000 election trials is executed using the individual state win probabilities to determine the expected final electoral vote and win probability. If the independent national and state projections are in close agreement, that is a strong confirmation that the models are consistent and are probably representative of the True Vote.