Election Forecasting
Methodology
This overview contains a brief discussion of the following:
. Time-series regression models vs.
. Final 2004 state and national projections confirmed by the exit polls
. Analysis of 2004 registered voter (RV) and likely voter (
Election Model methodology
. Basic Polling Mathematics
. Overview of
. Final state and national pre-election polls and projections vs. the exit polls
. Real Clear Politics (RCP) 102 RV and 31
The 2008
Election Model is updated frequently for the latest state and national
polls.
There are two basic
methods used to forecast presidential elections:
1) Projections based
on state and national polls
2) Time-series
regression models.
Statistical polling
(state and national) is an indicator of current voter preference. In the
Election Model, state poll shares are adjusted for undecided voters and the
associated win probabilities are then input to a 5000 election trial
Intuitively, the
probability of winning the True (no fraud) popular vote should correlate to the
The Election
The probability of
winning the popular vote is based on the projected aggregate state 2-party vote
share and margin of error. These are input to the Excel normal distribution
function NORMDIST: Prob (popular vote win) = NORMDIST
(vote share, 0.50, MoE/1.96, True)
As of July 1, the
Election Model has produced an identical 99.92% probability that Obama would win the popular and electoral vote if the
election were held on that day. His projected 53.2% popular vote and 2% MoE are input to NORMDIST(0.532, 0.50, 0.02/1.96, TRUE), which
returns a probability of approximately 99.9%.
Obama won 4996 (99.92%) of 5000 election trials.
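The NORMDIST calculation above can be reproduced outside Excel. The following is a minimal Python sketch, assuming the standard library's NormalDist as a stand-in for NORMDIST; the function name win_probability is hypothetical and is not part of the Election Model itself.

```python
from statistics import NormalDist

def win_probability(vote_share, moe, threshold=0.50):
    """Mirror NORMDIST(vote_share, 0.50, MoE/1.96, TRUE).

    vote_share and moe are fractions (0.532 for 53.2%, 0.02 for 2%).
    """
    std_dev = moe / 1.96  # the MoE is 1.96 standard deviations
    # Cumulative normal distribution with mean 0.50, evaluated at the share.
    return NormalDist(mu=threshold, sigma=std_dev).cdf(vote_share)

# The July 1 example from the text: 53.2% projected share, 2% MoE.
print(f"{win_probability(0.532, 0.02):.2%}")
```

A share exactly equal to 50% returns 0.5, and a share that exceeds 50% by exactly the MoE returns 97.5%.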
Academics and political scientists create multiple regression
models to forecast election vote shares months in advance. The models utilize
time-series data as relevant input variables such as economic growth,
inflation, job growth, interest rates, foreign policy, historical elections, incumbency,
approval rating, etc. Regression modeling
is an interesting theoretical exercise which does not account for the daily
events which affect voter psychology.
Polling and regression models are analogous to the current
market value of a stock and its intrinsic (theoretical) value. The intrinsic
value is based on forecast annual cash flows and rarely is equal to market
value. The latest poll is to the current
stock price as the regression model vote share is to intrinsic value.
Inherent problems
exist in election models, the most important of which is never discussed: election
forecasters and media pundits never account for the probability of fraud. The
implicit assumption is that the official recorded vote will accurately reflect
the True Vote; that is, that the election will be fraud-free.
The following 2004 election forecasting
models were executed 2-9 months before the election.
The average Bush 53.9%
2-party projection deviated sharply from the aggregate unadjusted state exit
poll (47.7%).
None of the models
forecast the electoral vote or mentioned the possibility of election fraud.
Except for Beck/Tien, the popular vote win probabilities were incompatible
with forecasted vote share.
Assuming a 3.0% margin
of error, a 53% vote share implies a 97.5% popular vote win probability.
A 54% vote share
implies a probability of about 99.5%.
Final state and
national polls, adjusted for undecided voters and estimated turnout, are
superior to time-series models executed months in advance.
Author              | Pick  | 2-pty (%) | Date   | Win Prob (%) | Notes
Beck/Tien           | Kerry | 50.1      | 27-Aug | 50           |
Abramowitz          | Bush  | 53.7      | 31-Jul |              |
                    | Bush  | 53.8      | 6-Sep  | 97           |
Wlezien/Erikson     | Bush  | 52.9      | 27-Jul | 75           |
Holbrook            | Bush  | 54.5      | 30-Aug | 92           |
Lockerbie           | Bush  | 57.6      | 21-May | 92           |
Norpoth             | Bush  | 54.7      | 29-Jan | 95           |

Recorded Vote       | Bush  | 51.2      | 2-Nov  |              |

Exit Polls
State               | Kerry | 52.3      | 2-Nov  | 100.0        | Unadjusted WPE method
Nat EP1             | Kerry | 51.9      | 2-Nov  | 99.9         | 39 Gore/41 Bush Voted 2k weights
Nat EP2             | Kerry | 52.9      | 2-Nov  | 100.0        | 37.6/37.4 wtg, 122.3m recorded

Election Model
State Model         | Kerry | 51.8      | 1-Nov  | 99.9         | EV Simulation: 4995 wins/5000 trials
National Model      | Kerry | 51.8      | 1-Nov  | 99.9         | Final 5 national poll avg. projection
Election Calculator | Kerry | 53.7      | 2-Nov  | 100.0        | Voted 2k, 39.5/37.1 wtg, 125.7m votes cast
Extensive statistical analysis indicates that Kerry defeated
Bush in 2004.
But Bush was the official winner by 62m to 59m recorded votes.
Were the above models accurate since they correctly forecast
Bush the winner?
Were Zogby and Harris correct in
projecting a Kerry victory?
In 2000, Gore had 540,000 more recorded votes than Bush.
Gore won by at least 50,000 votes in
Were the pollsters who forecast that Bush would win correct?
Were Zogby and Harris wrong in
projecting a Gore victory?
Fact: millions of Democratic voters are disenfranchised and
never cast a vote.
Fact: millions of mostly Democratic votes (70-80%) are uncounted
in every election.
Presidential election
forecasting models should have the following disclaimer: The forecast will
deviate from the official recorded vote. If they are nearly equal it would
indicate one or more of the following: a) input data errors, b) incorrect
assumptions, c) faulty model logic and/or methodology.
In 2000, 110.8m votes were cast, but only 105.4m recorded.
Gore won the True Vote by at least 3m. He won officially by
540,000.
In 2004, 125.7m votes were cast, but only 122.3m recorded.
Kerry won the True Vote by 8-10m, but lost the recorded vote by
3m.
Why should we expect 2008 to be any different? Can we be
confident that unverifiable DRE touch screens will reflect voter intent? Can we
assume that central tabulator software will not be hacked to switch votes?
The True Vote (T) always differs from the official recorded vote
(R) due to uncounted (U) and switched votes (S). The recorded vote is given by:
R = T - U - S (formula does not include disenfranchised voters).
Based on what we know from prior elections (especially since
2000), Democrats need a landslide to overcome massive, multi-level fraud.
The Election Model tracks state and national polls to project
not only the popular vote but the expected electoral vote and win probability.
It actually contains two independent models:
a)
b) National average model - based on the latest national polls.
The only assumption in the Model is for the allocation of
undecided/other voters. Historically, 70-80% of undecided voters break for the
challenger. If the race is tied at 45-45, a 60-40% split of undecided voters
results in a 51-49% projected vote share. The win probability is calculated
using the projected vote shares as input to the normal distribution function.
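The undecided-voter adjustment described above is simple arithmetic. Here is a minimal Python sketch of it; the function name project_two_party and its parameters are hypothetical, and the inputs are the tied 45-45 example from the text.

```python
def project_two_party(poll_a, poll_b, share_to_a):
    """Allocate undecided/other voters and return projected shares (percent).

    poll_a, poll_b: current poll shares in percent.
    share_to_a: fraction of the undecided vote allocated to candidate A.
    """
    undecided = 100.0 - poll_a - poll_b
    projected_a = poll_a + undecided * share_to_a
    projected_b = poll_b + undecided * (1.0 - share_to_a)
    return projected_a, projected_b

# Tied 45-45 race with a 60-40 split of the remaining 10%.
a, b = project_two_party(45.0, 45.0, 0.60)
print(f"{a:.0f}-{b:.0f}")
```

The two projected shares always sum to 100, so they can be passed directly to the normal distribution function as 2-party shares.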
In the state model, the average weighted poll share is calculated. The
vote shares are projected by adjusting the polls for the allocation of the
undecided voters. In the simulation, 5000 election trials are executed to
calculate the expected electoral vote and win probability.
The simulation produces
an expected electoral vote which is unaffected by minor deviations in the state
polls. It is much more accurate than a single poll.
A
powerful feature of the model is the built-in sensitivity analysis. Five
scenarios of undecided voter allocation project the state and national vote
shares, electoral votes and win probability. The
winner of the popular vote will almost certainly win the electoral vote if the
margin exceeds 0.5%.
A major advantage of national polling is its relative
simplicity. If the leader's 2-party share exceeds 50% by more than the margin of error (3% for a 1000
sample), then the leader has a minimum 97.5% probability of winning, assuming the
poll is an unbiased sample. If three independent national polls are done on the
same day, it is essentially the equivalent of a single poll of 3000 with a 1.8% MoE. Assuming a 52-48% split, the probability is 95% that
the leader will receive 50.2-53.8%. The probability is 97.5% that his vote
share will exceed 50.2%.
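The margins of error quoted here follow from the sample size. The sketch below assumes the textbook formula for a simple random sample at the 95% confidence level, which the text does not state explicitly; the function name margin_of_error is hypothetical.

```python
import math

def margin_of_error(sample_size, share=0.5, z=1.96):
    """95% margin of error for a simple random sample (assumed formula)."""
    return z * math.sqrt(share * (1.0 - share) / sample_size)

# A state poll of 600, a national poll of 1000, and three pooled
# national polls treated as one sample of 3000.
for n in (600, 1000, 3000):
    print(n, f"{margin_of_error(n):.1%}")
```

Pooling three polls of 1000 cuts the MoE by a factor of the square root of 3, which is how a 3% MoE becomes roughly 1.8%.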
The MoE is 1.96 times the standard
deviation, the statistical measure of variability. The standard deviation and
projected vote share are input to the normal distribution function in order to
determine the probability of winning a vote share majority. To calculate the
expected EV from state polling data, the final vote is projected.
Typical state polls sample 600 voters with a 4% MoE. National polls of 1000 sample size have a 3% MoE. The probability of winning a state is based on the
poll. For a 50-50 projection, each candidate has a 50% probability of winning
the state. For a 51-49 split, the leader has a 69% probability; 83% for 52-48%;
93% for 53-47; 97% for 54-46.
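The state win probabilities listed above can be checked with a few lines of Python. This is a sketch under the stated 4% MoE, using the standard library's NormalDist in place of Excel's NORMDIST; the computed values agree with the figures in the text to within rounding.

```python
from statistics import NormalDist

STATE_MOE = 0.04  # typical state poll of 600, from the text

def state_win_probability(projected_share, moe=STATE_MOE):
    """Probability that a projected 2-party share is a true majority."""
    return NormalDist(mu=0.50, sigma=moe / 1.96).cdf(projected_share)

for share in (0.50, 0.51, 0.52, 0.53, 0.54):
    print(f"{share:.0%}-{1 - share:.0%}: {state_win_probability(share):.1%}")
```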
The 2-party vote share for each state is projected by first applying the
undecided voter allocation. The win probability is then calculated based on the
projected vote shares. A random number (RND) between 0 and 1 is generated and
compared to the probability of winning the state. For example, assume the latest
poll indicates that Obama has a 90% probability of
winning
The procedure is repeated for all 50 states and DC. The election
trial winner is the candidate who has at least 270 EV. A total of 5000 election
trials are executed; the probability of winning the electoral vote is therefore
equal to the number of trial wins divided by 5000. The average (expected) electoral
vote is calculated. A major advantage of a simulation is that minor shifts
in polls have minimal impact. The EV is projected as the average of 5000
simulations, not a single snapshot.
In summary, the Election Model projects the latest national and
state polls after adjusting for the allocation of undecided voters. The
probability of winning each state is calculated.
The Election Model
tracks state and national polls to project the popular vote as well as the
expected electoral vote and win probability. It consists of two independent models:
a)
b) National Model – projects
national vote shares from a moving average projection based on the latest
national polls.
Based on state polls
as of June 22, the simulation determined that if the election were held that
day, Obama would win by 351-187 electoral votes with
52.8% of the 2-party vote. Since he won 4997 of 5000 simulated elections, his
win probability was virtually 100%.
A caveat: the Election Model assumes that the True Vote will be the
same as the official Recorded Vote. It never is. Every election is marred by a
combination of uncounted and miscounted votes. That is a historical fact. Nevertheless,
we continue to run our models hoping that this time the True Vote will be equal
to the Recorded Vote and the election will be fraud-free.
Projecting state and national vote shares
A major advantage of
national polls in projecting vote share is their relative simplicity. The poll split represents a snapshot of the
total electorate. If the leader's 2-party share exceeds 50% by more than the margin of error (3% for a
1000 sample), then the leader has a minimum 97.5% probability of winning, assuming
the poll is an unbiased sample. If three independent national polls are done on
the same day, that is essentially equivalent to a single 3000 sample with a
1.8% MoE.
Assuming a 52-48% split, the probability is 95% that the leader will
receive 50.2-53.8%. The probability is 97.5% that his vote share will exceed
50.2%.
In the
A major advantage of
For example, assume
that
Typical state polls
sample 600 voters with a 4% margin of error (MoE).
National polls of 1000-2000 sample size have a 2.5-3% MoE.
The probability of winning a state is based on the 2-party poll split and the MoE, after adjusting for undecided voters.
1. The 2-party vote
share is projected for each state after allocating undecided voters. The win
probability is then calculated based on the projected vote shares. For example,
assuming a 50-50 projection and a 4% MoE, each
candidate has a 50% probability of winning the state. For a 51-49 split, there
is a 69% probability; 83% for 52-48; 93% for 53-47; 97% for 54-46.
The MoE is 1.96 times the standard deviation, a statistical
measure of volatility. The standard deviation and projected vote share are
input to the normal distribution function in order to determine the probability
of winning at least 50% of the two-party vote.
2. In a simulated election
trial, a random number (RND) between
0 and 1 is generated for each state. The
RND is compared to the candidate's probability of
winning the state: if the RND is less than that probability, the candidate
wins the state; otherwise the opponent wins.
For example, if the
latest Oregon poll indicates that Obama has a 90%
probability of winning, then if the RND is less than 0.90, Obama
wins Oregon’s 7 electoral votes; if the RND is greater than 0.90, McCain wins.
The same test is applied in each state (comparing the RND to the state win probability)
to determine who wins the state. The winner of this election trial is the
candidate who has won at least 270 EV.
3. The process is
repeated 5000 times (election trials).
The probability of winning the electoral
vote is just simple division; it’s equal to the number of trial wins divided by 5000. The expected electoral vote for each candidate is the average of the
5000 trials.
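The three steps above can be sketched in Python as follows. This is an illustration under stated assumptions, not the Election Model's actual implementation: the function simulate_election is hypothetical, and apart from the Oregon example, the entries in hypothetical_states are invented blocs chosen only so that the electoral votes sum to 538.

```python
import random

def simulate_election(states, trials=5000, ev_to_win=270, seed=None):
    """Return (win probability, expected EV) for one candidate.

    states maps a name to (electoral votes, probability of winning).
    """
    rng = random.Random(seed)
    wins = 0
    total_ev = 0
    for _ in range(trials):
        # Step 2: draw one RND per state; the candidate takes the
        # state's electoral votes when the RND is below the win probability.
        ev = sum(votes for votes, p_win in states.values()
                 if rng.random() < p_win)
        total_ev += ev
        if ev >= ev_to_win:
            wins += 1
    # Step 3: trial wins divided by trials, and the average EV.
    return wins / trials, total_ev / trials

# Hypothetical inputs for illustration only.
hypothetical_states = {
    "Oregon": (7, 0.90),           # the example from the text
    "Swing state X": (20, 0.60),   # invented
    "Swing state Y": (27, 0.45),   # invented
    "Safe for candidate": (240, 1.0),
    "Safe for opponent": (244, 0.0),
}

win_prob, expected_ev = simulate_election(hypothetical_states, seed=2008)
print(f"win probability {win_prob:.1%}, expected EV {expected_ev:.1f}")
```

Fixing the seed makes a run reproducible; leaving it unset gives a slightly different result on each run, with the variation shrinking as the number of trials grows.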
To repeat, there are
two major advantages of the simulation method:
1) minor shifts in
state polls have minimal impact on the expected EV.
2) The probability of
winning the electoral vote is a simple calculation: the number of election trial
wins/total number of election trials.
Undecided Voter Allocation
The only assumption used
in the model is the allocation of undecided/other voters. Historically, 70-80%
of undecided voters break for the challenger. For example, if the race is tied
at 45-45, a 60-40 split of undecided voters results in a 51-49% projected vote
share. The win probability is calculated using the projected vote shares as
input to the normal distribution function.
Some may disagree
with the base case undecided voter
allocation assumption. That's why a sensitivity
analysis of five (5) scenarios of undecided voter allocation is executed to
project the individual state (and aggregate) vote shares to determine the
corresponding electoral vote, aggregate national vote shares and the win
probability.
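A sensitivity analysis of this kind can be sketched for a single race in a few lines of Python. The five allocation values, the 47-43 poll, and the 3% MoE below are hypothetical choices for illustration; the text does not list the scenarios the Election Model actually uses, and the real analysis also covers each state and the electoral vote.

```python
from statistics import NormalDist

def sensitivity(poll_a, poll_b, moe, allocations):
    """Projected share and win probability for A under each allocation.

    poll_a, poll_b and moe are fractions; each allocation is the
    fraction of undecided voters assigned to candidate A.
    """
    undecided = 1.0 - poll_a - poll_b
    rows = []
    for alloc in allocations:
        share_a = poll_a + undecided * alloc
        prob_a = NormalDist(mu=0.50, sigma=moe / 1.96).cdf(share_a)
        rows.append((alloc, share_a, prob_a))
    return rows

# Hypothetical scenarios: 50% to 75% of undecided voters go to A.
for alloc, share, prob in sensitivity(0.47, 0.43, 0.03,
                                      (0.50, 0.55, 0.60, 0.67, 0.75)):
    print(f"{alloc:.0%} of undecided -> share {share:.1%}, win prob {prob:.1%}")
```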