Sample Sizes to Detect Election Errors

We especially want to detect errors which change an election winner. The same arithmetic applies whether errors are accidental or hacks.

If a winning margin is 2%, an error affecting just 1% of ballots could have caused it: taking 1% of the total votes from candidate A and adding them to candidate B changes the winning margin by 2 percentage points.
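This arithmetic can be checked with a short sketch. The vote counts are hypothetical, chosen only to give a 2% margin, and the function name is illustrative.

```python
def margin_after_shift(votes_a, votes_b, shifted):
    """Margin of A over B, as a fraction of all ballots,
    after moving `shifted` ballots from A's tally to B's."""
    total = votes_a + votes_b
    return ((votes_a - shifted) - (votes_b + shifted)) / total

# Hypothetical contest: 1,000 ballots, A reported ahead 510 to 490 (2% margin).
reported = margin_after_shift(510, 490, 0)

# Moving 10 ballots (1% of the total) from A to B erases the whole margin.
corrected = margin_after_shift(510, 490, 10)

print(f"reported margin {reported:.1%}, margin after shift {corrected:.1%}")
```

Each shifted ballot counts twice: once as a loss for A and once as a gain for B.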

A. Random samples of individual ballots.

How many ballots do we need to sample individually, to make it likely we'll find any errors which changed outcomes?

| Winning margin of each contest, as % of all ballots | 1.25% | 2.50% | 5% | 10% | 20% | 40% |
|---|---|---|---|---|---|---|
| With that margin, outcome could be wrong if there were this many erroneous ballots | 0.63% | 1.25% | 2.50% | 5% | 10% | 20% |
| You can word that as 1 error in this many records | 1 in 160 | 1 in 80 | 1 in 40 | 1 in 20 | 1 in 10 | 1 in 5 |
| This random sample of ballots gives 63.2% probability of detecting that level of error (sample = 2 divided by the winning margin) | 160 | 80 | 40 | 20 | 10 | 5 |
| A bigger random sample of ballots gives 90% probability of detecting that level of error (sample ≈ 4.6 divided by the winning margin) | 368 | 184 | 91 | 45 | 22 | 11 |

Tables are calculated in a spreadsheet.

So a sample of 184 individual ballots has a 90% chance of detecting errors if 1.25% of all ballots were counted erroneously. This sample reassures the public that winning margins of 2.5% or more weren't created by erroneous counts.

This same sample of 184 is also bigger than the sample of 160 listed for winning margins down to 1.25%. However, the probability that it will find such a small error level (0.63% of ballots) is only about 68%, not 90%, which is not as reassuring.
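The sample sizes in the table can be reproduced with a short sketch, assuming a simple random sample and treating the draws as independent. The function name is illustrative. The 63.2% level corresponds to a risk of 1/e, which is where the rule of 2 divided by the margin comes from. The 90% level corresponds to 2 × ln(10), about 4.6, divided by the margin.

```python
import math

def sample_size(margin, confidence):
    """Smallest simple random sample of ballots that detects, with the given
    confidence, an error rate equal to half the winning margin."""
    error_rate = margin / 2      # erroneous ballots needed to change the outcome
    risk = 1 - confidence        # chance the sample misses every erroneous ballot
    return math.ceil(math.log(risk) / math.log(1 - error_rate))

margins = (0.0125, 0.025, 0.05, 0.10, 0.20, 0.40)
for margin in margins:
    n63 = sample_size(margin, 1 - math.exp(-1))   # the 63.2% row
    n90 = sample_size(margin, 0.90)               # the 90% row
    rule_of_thumb = 2 * math.log(10) / margin     # about 4.6 / margin
    print(f"{margin:6.2%}  63.2%: {n63:4d}  90%: {n90:4d}  (rule of thumb {rule_of_thumb:.0f})")
```

Rounding up to a whole ballot is why the exact values can differ by one from the rule of thumb.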

The only way to be 100% sure of finding small levels of error is to check all ballots, not a sample. Image audits check them all for errors in the software which interprets and tallies the votes. These still need a sample to check for some kinds of scanner errors which are not visible on the scanned image. Fortunately that kind of scanner error has not been found in elections so far.

How likely is it that each sample of individual ballots finds error, if present?

| Winning margin of each contest, as % of all ballots | 1.25% | 2.50% | 5% | 10% | 20% | 40% |
|---|---|---|---|---|---|---|
| With that margin, outcome could be wrong if there were this many erroneous ballots | 0.63% | 1.25% | 2.50% | 5% | 10% | 20% |
| Size of random sample | Probability that sample size at left will detect the error rate above | | | | | |
| 1 | 0.63% | 1.25% | 2.5% | 5.0% | 10.0% | 20.0% |
| 2 | 1.25% | 2.5% | 4.9% | 9.8% | 19.0% | 36.0% |
| 11 | 6.7% | 12.9% | 24.3% | 43.1% | 68.6% | 91.4% |
| 22 | 12.9% | 24.2% | 42.7% | 67.6% | 90.2% | 99.3% |
| 45 | 24.6% | 43.2% | 68.0% | 90.1% | 99.1% | 100.0% |
| 91 | 43.5% | 68.2% | 90.0% | 99.1% | 100.0% | 100.0% |
| 184 | 68.5% | 90.1% | 99.1% | 100.0% | 100.0% | 100.0% |
| 368 | 90.0% | 99.0% | 100.0% | 100.0% | 100.0% | 100.0% |
| 500 | 95.6% | 99.8% | 100.0% | 100.0% | 100.0% | 100.0% |
| 1,000 | 99.8% | 100.0% | 100.0% | 100.0% | 100.0% | 100.0% |

As in the table above, a sample of 184 individual ballots has 90% chance of detecting errors, if 1.25% of all ballots were counted erroneously. This table shows the probabilities for bigger and smaller samples. A sample of 500 has 99.8% chance of catching this 1.25% error level.
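The detection probabilities can be recomputed with a short sketch. It assumes each sampled ballot is an independent draw, and the function name is illustrative.

```python
def detection_probability(sample_size, error_rate):
    """Chance that a simple random sample contains at least one erroneous item."""
    return 1 - (1 - error_rate) ** sample_size

# Error rates from the table: half of each winning margin.
error_rates = (0.00625, 0.0125, 0.025, 0.05, 0.10, 0.20)

for n in (1, 2, 11, 22, 45, 91, 184, 368, 500, 1000):
    row = "  ".join(f"{detection_probability(n, e):6.1%}" for e in error_rates)
    print(f"{n:5d}  {row}")
```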

B. Random samples of precincts, voting machines, or other batches of ballots.

How many batches do we need to sample, to make it likely we'll find any errors which changed outcomes?

Many states don't sample individual ballots. They keep ballots in batches. Each batch may be a precinct, a voting machine, or a group of ballots which went through a scanner together. Each batch usually has a few hundred ballots.

These states choose a sample of batches. They tally all ballots in each sampled batch, and compare the result to the original election machine's tally of the same batch. This finds every sampled batch with errors. Tallying hundreds of ballots in each batch is costly and time-consuming, so sampling individual ballots can save work where it's possible.

If we do re-tally a sample of batches, how big a sample do we need? The hardest errors to find are those where a hack or error happened in only a few scanners, precincts, etc. Such an error might affect all or a big fraction of the ballots in some batches and no ballots in other batches, so we need to find the few erroneous batches.

| Winning margin of each contest, as % of all ballots | 1.25% | 2.50% | 5% | 10% | 20% | 40% |
|---|---|---|---|---|---|---|
| With that margin, outcome could be wrong if there were this many erroneous ballots | 0.63% | 1.25% | 2.50% | 5% | 10% | 20% |
| If worst batches are 100% wrong, this random sample of batches gives 90% chance of detecting error | 368 | 184 | 91 | 45 | 22 | 11 |
| If worst batches are 50% wrong, this random sample of batches gives 90% chance of detecting error | 184 | 91 | 45 | 22 | 11 | 5 |
| If worst batches are 25% wrong, this random sample of batches gives 90% chance of detecting error | 91 | 45 | 22 | 11 | 5 | 2 |

So a sample of 45 batches has a 90% chance of detecting errors if 1.25% of all ballots were counted erroneously and error levels averaged 25% in the worst batches. This sample reassures the public that winning margins of 2.5% or more weren't created by errors.

If some batches were entirely erroneous, by mis-programming or bad scanners, a 45-batch sample has 90% chance of detecting error levels of 5%, so smaller errors could sneak through. A sample of 184 batches has 90% chance of detecting errors, if 1.25% of all ballots were counted erroneously, and if error levels averaged 100% in the worst batches.

Having batches 100% wrong is rare. It could happen, for example, if ballots from a drop box in a 100% Democratic area go through an erroneous election scanner whose programming is off by a line and tallies Democratic votes as Republican in one or more contests. No one labels which drop box a batch comes from, so this batch would look as if it came from an all-Republican area and would not necessarily be investigated.

Even looking for batches which are 25% erroneous requires re-tallying 45 batches to check winning margins of 2.5% or more, and 91 batches for margins of 1.25% or more.

C. Formulas

There are formulas connecting sample size, error level, and risk limit or confidence level in detecting those errors. The following formulas apply to simple random samples.

e = erroneous items (ballots or batches) as fraction of all items

e = probability that one random item in the sample contains error

(1-e) = probability that one random item in the sample is accurate. Multiply this n times for sample of n:

(1-e)^n = r = risk that n random items will all be accurate, i.e. all n will miss the erroneous items

That final formula can be rearranged to calculate error rate and sample size:

1 - r^(1/n) = e = error rate which a sample of n can detect, with only r chance of missing it (the fractional exponent means the nth root).

e multiplied by the total number of items in the jurisdiction (ballots or batches) = number of erroneous items which a sample of n can detect, with only r chance of missing it

log(r) / log(1-e) = n = sample size needed, rounded up to a whole number, so the chance of missing error level e is only r (logarithms can be natural or any base, as long as they have the same base as each other).
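The rearranged formulas can be sketched in a few lines. The function names and the jurisdiction size of 100,000 ballots are assumptions for illustration, not figures from the text.

```python
import math

def detectable_error_rate(sample_size, risk):
    """Error rate e that a sample of this size detects,
    with only `risk` chance of missing it: e = 1 - r^(1/n)."""
    return 1 - risk ** (1 / sample_size)

def required_sample(error_rate, risk):
    """Sample size n, rounded up, so the chance of missing `error_rate`
    is at most `risk`: n = log(r) / log(1-e)."""
    return math.ceil(math.log(risk) / math.log(1 - error_rate))

total_ballots = 100_000   # hypothetical jurisdiction size

e = detectable_error_rate(184, 0.10)
print(f"a sample of 184 detects an error rate of {e:.2%} with 10% risk")
print(f"that is about {e * total_ballots:.0f} of {total_ballots} ballots")

# Check: plugging e back into (1-e)^n should return the risk r.
print(f"(1-e)^184 = {(1 - e) ** 184:.3f}")

print(required_sample(0.0125, 0.10))
```

The detectable rate for 184 comes out slightly under 1.25% because the sample size was rounded up from about 183.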

Errors in batches depend on how concentrated erroneous ballots are:

f = erroneous ballots as fraction of all ballots

w = erroneous ballots as fraction of the ballots in the worst batches

f / w = erroneous batches as fraction of all batches. This can be used as e in the formulas above.
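A minimal sketch of the batch calculation follows, using f / w as the error rate e. It assumes all erroneous ballots are concentrated in the worst batches and that batches are drawn as a simple random sample. The function name and the guard for f / w of 1 or more are illustrative additions.

```python
import math

def batch_sample_size(margin, worst_batch_error, confidence=0.90):
    """Batches to re-tally so that at least one erroneous batch is found
    with the given confidence."""
    f = margin / 2                 # erroneous ballots as fraction of all ballots
    e = f / worst_batch_error      # erroneous batches as fraction of all batches
    if e >= 1:
        return 1                   # every batch would be erroneous
    risk = 1 - confidence
    return math.ceil(math.log(risk) / math.log(1 - e))

margins = (0.0125, 0.025, 0.05, 0.10, 0.20, 0.40)
for w in (1.0, 0.5, 0.25):
    sizes = [batch_sample_size(m, w) for m in margins]
    print(f"w = {w:4.2f}: {sizes}")
```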

For example, this is the table of batch sample sizes used above, with w and f/w explicitly shown.

| Winning margin of each contest, as % of all ballots | 1.25% | 2.50% | 5% | 10% | 20% | 40% | Symbol |
|---|---|---|---|---|---|---|---|
| With that margin, outcome could be wrong if there were this many erroneous ballots | 0.63% | 1.25% | 2.50% | 5% | 10% | 20% | f |
| If worst batches are 100% wrong (w=1), this many batches would be erroneous | 0.63% | 1.25% | 2.50% | 5% | 10% | 20% | f/w |
| If worst batches are 50% wrong (w=0.5), this many batches would be erroneous | 1.25% | 2.50% | 5% | 10% | 20% | 40% | f/w |
| If worst batches are 25% wrong (w=0.25), this many batches would be erroneous | 2.50% | 5% | 10% | 20% | 40% | 80% | f/w |
| If worst batches are 100% wrong, this random sample of batches gives 90% chance of detecting error | 368 | 184 | 91 | 45 | 22 | 11 | |
| If worst batches are 50% wrong, this random sample of batches gives 90% chance of detecting error | 184 | 91 | 45 | 22 | 11 | 5 | |
| If worst batches are 25% wrong, this random sample of batches gives 90% chance of detecting error | 91 | 45 | 22 | 11 | 5 | 2 | |

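The table shows that smaller samples are needed if we assume the worst batches are 25% erroneous rather than 100% erroneous, because the same number of erroneous ballots is then spread over more batches, and a random sample is more likely to include one of them.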