[EM] Calculated failure/success rates using randomized ballots and candidates

VoteFair-2
I've written a C++ program that generates randomized ballots, feeds
these ballots to a separate program that calculates a winner according
to various vote-counting methods, compares the results, and calculates
failure/success rates.

The program does two kinds of tests:

* IIA: Tests successes/failures according to the Independence of
Irrelevant Alternatives (IIA) criterion.  Specifically it calculates
which candidate would win with all the candidates, and then it removes
each of the non-winning candidates, one by one, to test whether a
different candidate would win.  If any of the comparisons yield a
different result -- such as a 6-candidate contest giving a different
winner compared to the 7-candidate contest that uses the same ballots
(with one candidate omitted in the 6-candidate contest) -- then that's
counted as one failure.

* "Agree/Disagree": Tests how often one counting method yields a winner
who is the same as, or different from, the winner according to another
vote-counting method.
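To make the IIA procedure concrete, here is a minimal C++ sketch of one IIA test case. It is not taken from the author's programs. The names `Ballot`, `Method`, `IiaOutcome`, and `testIia` are hypothetical, and the counting method is passed in as a function. The sketch also assumes that a tie in any of the compared contests excludes the whole case.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// A ballot is a strict ranking: candidate indices, most preferred first.
using Ballot = std::vector<int>;

// A counting method sees only candidates with active[c] == true and returns
// the winning candidate index, or -1 when the winner is tied.
using Method = std::function<int(const std::vector<Ballot>&, const std::vector<bool>&)>;

enum class IiaOutcome { Success, Failure, Ignored };

// Finds the winner with all candidates, then removes each non-winner in turn
// and recounts the same ballots. Assumption: a tie in ANY of these contests
// makes the whole case "Ignored"; otherwise any changed winner is one failure.
IiaOutcome testIia(const Method& method, const std::vector<Ballot>& ballots, int numCandidates) {
    assert(numCandidates >= 2);
    std::vector<bool> active(numCandidates, true);
    const int fullWinner = method(ballots, active);
    if (fullWinner < 0) return IiaOutcome::Ignored;

    bool winnerChanged = false;
    for (int removed = 0; removed < numCandidates; ++removed) {
        if (removed == fullWinner) continue;
        active[removed] = false;                  // omit one non-winner
        const int reducedWinner = method(ballots, active);
        active[removed] = true;                   // restore for the next pass
        if (reducedWinner < 0) return IiaOutcome::Ignored;
        if (reducedWinner != fullWinner) winnerChanged = true;
    }
    return winnerChanged ? IiaOutcome::Failure : IiaOutcome::Success;
}
```

Plugging in any method that fits the `Method` signature, such as a simple plurality count, gives one success, failure, or ignored result per randomized case.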

ASSUMPTIONS/CONDITIONS:

The randomized ballots assume that the voters are expressing their
sincere preferences, without any tactical voting, and without ranking or
rating two candidates at the same preference level.

For both tests, when a tie occurs for the winner of either test case,
the tied case is ignored.  This means ties do not count as a failure or
a success.

For my purposes I've used 4,000 randomized cases per test, and each test
uses 17 ballots.  Unless otherwise specified, the test (or full test in
the case of IIA) uses 7 candidates.
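Here is a minimal sketch of the kind of unbiased ballot generation described above. It assumes each ballot is an independent, uniformly random strict ranking of all candidates. This is not the code in generate_random_ballots.cpp. The function name `randomBallots` is hypothetical, and using `std::mt19937` with `std::shuffle` is an illustrative choice.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <random>
#include <vector>

// A ballot is a strict ranking: candidate indices, most preferred first.
using Ballot = std::vector<int>;

// Each ballot is a uniformly random permutation of 0..numCandidates-1, so no
// candidate is favored and no two candidates share a preference level.
std::vector<Ballot> randomBallots(int numBallots, int numCandidates, std::mt19937& rng) {
    assert(numBallots > 0 && numCandidates > 0);
    std::vector<Ballot> ballots;
    ballots.reserve(numBallots);
    Ballot order(numCandidates);
    for (int i = 0; i < numBallots; ++i) {
        std::iota(order.begin(), order.end(), 0);   // 0, 1, ..., n-1
        std::shuffle(order.begin(), order.end(), rng);
        ballots.push_back(order);
    }
    return ballots;
}
```

Calling `randomBallots(17, 7, rng)` produces one test case with the sizes described above. Repeating the call 4,000 times gives a full run.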

IIA RESULTS:

The Independence of Irrelevant Alternatives (IIA) success rate of the
Condorcet-Kemeny method is 79%, which is a failure rate of 21%.

The IIA success rates of the following methods are all about 1% or 2%,
which is a failure rate of 99% or 98%:

* IRV: Instant Runoff Voting

* IPE: Instant Pairwise Elimination (described in ElectoWiki)

* IRMPL: Instant Runoff Minus Pairwise Loser, which uses PLE (see below)
as a safety net under IRV.

* STAR: Score Then Automatic Runoff

Another method, PLE, which is an abbreviation for Pairwise Loser
Elimination, has a calculated 100% success rate, which is zero failures.
This method eliminates the Condorcet loser among the remaining
candidates, one round at a time, and stops with a tie when it encounters
a Condorcet (rock-paper-scissors-like) cycle.  This perfect success rate
occurs because tied cases are ignored.  That leaves only cases that have
no cycles at any level, which means the method finds the Condorcet
winner in both the 7-candidate case and the 6-candidate case.
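The PLE procedure described above can be sketched in C++ as follows. The function names are hypothetical, and the sketch assumes strict rankings over all candidates. With an odd number of such ballots, such as 17, no pairwise contest can be tied, so finding no loser means a cycle.

```cpp
#include <cassert>
#include <vector>

// A ballot is a strict ranking: candidate indices, most preferred first.
using Ballot = std::vector<int>;

// prefer[a][b] = number of ballots ranking a above b.
std::vector<std::vector<int>> pairwiseCounts(const std::vector<Ballot>& ballots, int n) {
    std::vector<std::vector<int>> prefer(n, std::vector<int>(n, 0));
    for (const Ballot& b : ballots) {
        assert(static_cast<int>(b.size()) == n);
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j)
                ++prefer[b[i]][b[j]];
    }
    return prefer;
}

// Repeatedly removes the candidate who loses every head-to-head contest
// against the other remaining candidates. If no such loser exists (a cycle,
// or a pairwise tie), the method stops and reports a tie as -1.
int pairwiseLoserElimination(const std::vector<Ballot>& ballots, int n) {
    const auto prefer = pairwiseCounts(ballots, n);
    std::vector<bool> remaining(n, true);
    for (int left = n; left > 1; --left) {
        int loser = -1;
        for (int c = 0; c < n && loser < 0; ++c) {
            if (!remaining[c]) continue;
            bool losesAll = true;
            for (int d = 0; d < n; ++d)
                if (d != c && remaining[d] && prefer[c][d] >= prefer[d][c]) { losesAll = false; break; }
            if (losesAll) loser = c;
        }
        if (loser < 0) return -1;   // no pairwise loser: stop with a tie
        remaining[loser] = false;
    }
    for (int c = 0; c < n; ++c)
        if (remaining[c]) return c;
    return -1;
}
```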

Conclusion: Methods that eliminate one candidate at a time frequently
fail the Independence of Irrelevant Alternatives (IIA) criterion.  In
contrast, the Condorcet-Kemeny method, which uses all the pairwise
counts (not just the biggest or smallest pairwise counts or differences
between pairwise counts), yields a dramatically better IIA success rate.
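For small numbers of candidates, the Condorcet-Kemeny winner can be found by brute force. Doing it that way shows how every pairwise count contributes to the result. This sketch scores all n! orderings. It is not taken from votefair_ranking.cpp, which may use a different approach. `kemenyWinner` is a hypothetical name, and reporting a tie only when the best-scoring orderings disagree on first place is an assumption.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// A ballot is a strict ranking: candidate indices, most preferred first.
using Ballot = std::vector<int>;

// Scores each ordering by summing prefer[a][b] over every pair where a is
// ranked above b, then keeps the highest-scoring ordering(s). Returns the top
// candidate of the best ordering, or -1 if best orderings differ at the top.
int kemenyWinner(const std::vector<Ballot>& ballots, int n) {
    assert(n >= 1 && n <= 8);   // n! orderings; 8! = 40320 is still quick
    std::vector<std::vector<long>> prefer(n, std::vector<long>(n, 0));
    for (const Ballot& b : ballots)
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j)
                ++prefer[b[i]][b[j]];

    std::vector<int> order(n);
    std::iota(order.begin(), order.end(), 0);   // sorted start: visit all n!
    long bestScore = -1;
    int winner = -1;
    bool tiedTop = false;
    do {
        long score = 0;
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j)
                score += prefer[order[i]][order[j]];
        if (score > bestScore) { bestScore = score; winner = order[0]; tiedTop = false; }
        else if (score == bestScore && order[0] != winner) tiedTop = true;
    } while (std::next_permutation(order.begin(), order.end()));
    return tiedTop ? -1 : winner;
}
```

Because the score sums every pairwise count, removing a non-winning candidate only removes that candidate's terms. This is one intuition for why the method holds up better in the IIA test than one-at-a-time elimination.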

Important: In real elections the success rates would be higher -- there
would be fewer failures -- because real elections have meaningful
differences between candidates.  Remember that this software randomizes
the ballots without any bias.  This means that almost all the test cases
are "semi-balanced" or "sitting on the fence" (or maybe "finding the
highest sand dune rather than finding the highest mountain") kinds of cases.

Advocates of STAR voting may claim these numbers are not meaningful for
STAR voting because STAR voting uses Score ballots, not ranked ballots.
(On Score ballots the gap between preference levels is significant.)
Yet fans of STAR voting also claim that its use of a top-two runoff
discourages tactical voting, particularly the tactic of favoring the use
of high and low preference levels, and avoiding the use of middle
preference levels.  Keeping in mind that these tests assume the voters
are voting sincerely, I believe these two claims are contradictory.
(Feedback on this or any other part of this message is welcome.)

AGREE/DISAGREE RESULTS:

Below are the results from the "Agree/Disagree" test, which compares
the Condorcet-Kemeny winner with the winner from each of the indicated
vote-counting methods.  Specifically the "agree" percentages refer to
matches with the Condorcet-Kemeny winner, and the "disagree" percentages
apply when the method identifies a different winner.  (By definition the
Condorcet-Kemeny method would yield 100% agreement.)

About the "ties" numbers specified in parentheses:  They are counts out
of 4,000 cases, not percentages.  These tied cases are not counted in
either the agree or the disagree percentages.
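Here is a sketch of the agree/disagree bookkeeping. It assumes each case is recorded as a pair (Condorcet-Kemeny winner, other method's winner), with -1 meaning a tie. The names `AgreeTally` and `tallyAgreement` are hypothetical. The point is that tied cases go into the tie count and are left out of the denominator of both percentages. The structured binding requires C++17.

```cpp
#include <cassert>
#include <utility>
#include <vector>

struct AgreeTally {
    int agree = 0;
    int disagree = 0;
    int ties = 0;   // either method tied; excluded from both percentages
    double agreePercent() const {
        const int decided = agree + disagree;
        return decided == 0 ? 0.0 : 100.0 * agree / decided;
    }
};

// Each pair is (reference winner, other method's winner); -1 marks a tie.
AgreeTally tallyAgreement(const std::vector<std::pair<int, int>>& winners) {
    AgreeTally t;
    for (const auto& [reference, other] : winners) {
        assert(reference >= -1 && other >= -1);
        if (reference < 0 || other < 0) ++t.ties;
        else if (reference == other) ++t.agree;
        else ++t.disagree;
    }
    return t;
}
```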

Note that when there are only two candidates, all the methods always agree.

number of candidates: 2
IPE    agree/disagree: 100.0%   0.0%  (0 ties)
IRMPL  agree/disagree: 100.0%   0.0%  (0 ties)
STAR   agree/disagree: 100.0%   0.0%  (0 ties)
IRV    agree/disagree: 100.0%   0.0%  (0 ties)
PLE    agree/disagree: 100.0%   0.0%  (0 ties)

number of candidates: 3
IPE    agree/disagree:  95.1%   4.8%  (0 ties)
IRMPL  agree/disagree:  95.7%   4.2%  (0 ties)
STAR   agree/disagree:  95.2%   4.7%  (296 ties)
IRV    agree/disagree:  93.0%   6.9%  (643 ties)
PLE    agree/disagree: 100.0%   0.0%  (286 ties)

number of candidates: 4
IPE    agree/disagree:  92.5%   7.4%  (59 ties)
IRMPL  agree/disagree:  91.3%   8.6%  (9 ties)
STAR   agree/disagree:  94.8%   5.1%  (440 ties)
IRV    agree/disagree:  84.0%  15.9%  (1582 ties)
PLE    agree/disagree: 100.0%   0.0%  (943 ties)

number of candidates: 5
IPE    agree/disagree:  92.3%   7.6%  (103 ties)
IRMPL  agree/disagree:  88.9%  11.0%  (14 ties)
STAR   agree/disagree:  93.6%   6.3%  (435 ties)
IRV    agree/disagree:  77.7%  22.2%  (2485 ties)
PLE    agree/disagree: 100.0%   0.0%  (1724 ties)

number of candidates: 6
IPE    agree/disagree:  90.6%   9.3%  (172 ties)
IRMPL  agree/disagree:  84.9%  15.0%  (27 ties)
STAR   agree/disagree:  91.4%   8.5%  (420 ties)
IRV    agree/disagree:  69.7%  30.2%  (3203 ties)
PLE    agree/disagree: 100.0%   0.0%  (2513 ties)

number of candidates: 7
IPE    agree/disagree:  88.7%  11.2%  (219 ties)
IRMPL  agree/disagree:  81.6%  18.3%  (67 ties)
STAR   agree/disagree:  89.4%  10.5%  (441 ties)
IRV    agree/disagree:  59.5%  40.4%  (3517 ties)
PLE    agree/disagree: 100.0%   0.0%  (3063 ties)

As the number of candidates increases, the methods more often disagree
with the Condorcet-Kemeny method.  So the bottom numbers, where there
are 7 candidates, are the most revealing.

The bottom numbers show that IRV (Instant Runoff Voting) is the worst
of these methods at matching the Condorcet-Kemeny winner.  It agrees in
about 60 percent of the non-tie cases, and with 3,517 ties only 483 of
the 4,000 cases are non-tie cases.  The other three methods, IPE, IRMPL,
and STAR, have similar agreement rates of roughly 80 to 90 percent.

Of course this result -- that IRV is not a good vote-counting method --
is not surprising.  Yet it's nice to have numeric confirmation.

LINKS:

Here are links to the two programs used in these tests:

https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/generate_random_ballots.cpp

https://github.com/cpsolver/VoteFair-ranking-cpp/blob/master/votefair_ranking.cpp

GOAL:

My hope is that this software takes us a step closer to yielding numbers
to better characterize HOW OFTEN each vote-counting method passes or
fails each of the "fairness" criteria, the ones that are currently
flagged as "yes" or "no" in this comparison table:

https://en.wikipedia.org/wiki/Comparison_of_electoral_systems#Compliance_of_selected_single-winner_methods

I realize the numbers calculated by my software are not suitable as
estimates for real-life elections -- because randomized ballots and
randomized candidates do not match real-life elections.  Yet these
calculated numbers provide a peek at ways to compare methods more
meaningfully than just flagging methods as pass-or-fail.

THANKS:

If you find any software bugs, please tell me, either here or on GitHub.

Feedback is welcome.  That's why I've posted these results here.

Richard Fobes
----
Election-Methods mailing list - see https://electorama.com/em for list info

Re: [EM] Calculated failure/success rates using randomized ballots and candidates

Jan Kok
Please consider merging your code with Warren D. Smith's election simulator available at https://www.rangevoting.org/IEVS/IEVS.c

On Mon, Jan 4, 2021 at 12:15 AM VoteFair <[hidden email]> wrote:

Re: [EM] Calculated failure/success rates using randomized ballots and candidates

VoteFair-2
On 1/4/2021 3:03 PM, Jan Kok wrote:
> Please consider merging your code with Warren D. Smith's election
> simulator available at https://www.rangevoting.org/IEVS/IEVS.c

Thanks for the suggestion.  I share your desire for compatible software
tools for analyzing and implementing better election methods.

However, I looked at the IEVS.c code and see that it serves a different
purpose from what my code is intended to accomplish.  And it's
incompatible with what I think we need.  Specifically:

IEVS does not include the STAR (Score Then Automatic Runoff) voting
method, which is surprising since the STAR method is based on Score
ballots, which is what Warren Smith favors.  I'll let someone else who
is already familiar with IEVS conventions add that missing method.

IEVS does not include the Condorcet-Kemeny method, and adding that
method to the IEVS code would not be simple.

I believe that separating ballot-generation software from vote-counting
software is very important for these reasons:

* It allows tech-savvy voters to test vote-counting software to decide
whether that software is ready to be used for real surveys, elections,
etc.  The IEVS program does not have this separation between method
calculations and the generation of test cases.

* My vote-counting code accepts ballot codes that indicate actual
ballots from an actual survey, election, etc., which contrasts with the
IEVS code that generates ballots internally.

* IEVS does not allow a ballot to rank multiple candidates at the same
preference level, yet this is needed for real voting situations.  My
vote-counting software handles these ballots, so I don't see value in
stripping out this important code to fit into the IEVS framework.

* The IEVS code does not fully handle tie resolution, which is yet
another important part of calculating results for real surveys,
elections, etc.

* IEVS can simulate dishonest voters by using the candidate input order
(from most popular to least popular) to decide how to "mark" ballots
dishonestly.  This is great for academic analysis.  But this feature
destroys the trustworthiness of the IEVS vote-counting code for use in
real surveys, elections, etc.
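To illustrate what supporting equal rankings involves, here is a small sketch. It parses a ballot written like `A>B=C>D` and adds it to pairwise counts. The text format, `parseBallot`, and `addBallot` are hypothetical and are not VoteFair's ballot-code format. The key point is that candidates sharing a preference level add nothing to either side of their pairwise count.

```cpp
#include <cassert>
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Parses "A>B=C>D" (single letters are candidates, '>' means preferred to,
// '=' means same preference level) into rank groups: {{0}, {1, 2}, {3}}.
std::vector<std::vector<int>> parseBallot(const std::string& text) {
    std::vector<std::vector<int>> groups;
    std::stringstream levels(text);
    std::string level;
    while (std::getline(levels, level, '>')) {
        std::vector<int> group;
        std::stringstream names(level);
        std::string name;
        while (std::getline(names, name, '=')) {
            assert(name.size() == 1 && name[0] >= 'A' && name[0] <= 'Z');
            group.push_back(name[0] - 'A');
        }
        groups.push_back(group);
    }
    return groups;
}

// Adds one ballot to prefer[a][b] (ballots ranking a above b). Candidates in
// the same group are never counted against each other.
void addBallot(const std::vector<std::vector<int>>& groups,
               std::vector<std::vector<int>>& prefer) {
    for (std::size_t i = 0; i < groups.size(); ++i)
        for (std::size_t j = i + 1; j < groups.size(); ++j)
            for (int a : groups[i])
                for (int b : groups[j]) {
                    assert(static_cast<std::size_t>(a) < prefer.size());
                    assert(static_cast<std::size_t>(b) < prefer.size());
                    ++prefer[a][b];
                }
}
```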

Yes, the IEVS code is very useful for academic analysis.

Yet I believe we need more ready-to-use "real-life" vote-counting
software -- and external/separate testing software -- because that
combination provides an important bridge to getting better methods
adopted in real surveys, elections, etc.

Richard Fobes

Re: [EM] Calculated failure/success rates using randomized ballots and candidates

Kristofer Munsterhjelm-3
On 05/01/2021 06.12, VoteFair wrote:

> On 1/4/2021 3:03 PM, Jan Kok wrote:
>> Please consider merging your code with Warren D. Smith's election
>> simulator available at https://www.rangevoting.org/IEVS/IEVS.c
>
> Thanks for the suggestion.  I share your desire for compatible software
> tools for analyzing and implementing better election methods.
>
> However, I looked at the IEVS.c code and see that it serves a different
> purpose from what my code is intended to accomplish.  And it's
> incompatible with what I think we need.  Specifically:
>
> IEVS does not include the STAR (Score Then Automatic Runoff) voting
> method, which is surprising since the STAR method is based on Score
> ballots, which is what Warren Smith favors.  I'll let someone else who
> is already familiar with IEVS conventions add that missing method.
>
> IEVS does not include the Condorcet-Kemeny method, and adding that
> method to the IEVS code would not be simple.

My voting simulator, Quadelect (https://github.com/kristomu/quadelect)
supports Kemeny. It doesn't currently support STAR, but it does support
Smith,Range. It also supports equal-rank ballots.

-km

Re: [EM] Calculated failure/success rates using randomized ballots and candidates

VoteFair-2
On 1/6/2021 4:23 AM, Kristofer Munsterhjelm wrote:
> My voting simulator, Quadelect (https://github.com/kristomu/quadelect)
> supports Kemeny. It doesn't currently support STAR, but it does support
> Smith,Range. It also supports equal-rank ballots.

I looked at this software and I like it!

I was not able to find the portions of code that would indicate which
tests it does (besides monotonicity, which I did find), and which other
methods it calculates.  Nor could I find documentation that would point
out which folders contain that code.

It looks like it reads ballot info (which Warren's software does not do)
so if I discover that it does some tests of interest I would consider
looking at your code for further possibilities.

Thank you for bringing it to our attention.

Richard Fobes

Re: [EM] Calculated failure/success rates using randomized ballots and candidates

Kristofer Munsterhjelm-3
On 08/01/2021 21.36, VoteFair wrote:

> On 1/6/2021 4:23 AM, Kristofer Munsterhjelm wrote:
>> My voting simulator, Quadelect (https://github.com/kristomu/quadelect)
>> supports Kemeny. It doesn't currently support STAR, but it does support
>> Smith,Range. It also supports equal-rank ballots.
>
> I looked at this software and I like it!
>
> I was not able to find the portions of code that would indicate which
> tests it does (besides monotonicity, which I did find), and which other
> methods it calculates.  Nor could I find documentation that would point
> out which folders contain that code.

The test system isn't fully implemented yet; the one thing it can do
currently is test for strategic susceptibility (as defined in my
previous posts), but adding methods requires a recompile. I should fix
that when I have the time.

Just calling quadelect without any parameters will give you a usage
screen, which states that the -M parameter displays a list of every
method that's available. However, it can get pretty cluttered because it
implements set,method and set//method for any set and method I can think
of (e.g. Smith,Plurality or Copeland//DSC).
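As an illustration of the set,method composition, here is a sketch that computes the Smith set and then counts Plurality only among its members. It assumes that "Smith,Plurality" means exactly that. This is a reading of the syntax, not something confirmed from the Quadelect source. It also assumes strict rankings and an odd number of ballots, so there are no pairwise ties. The function names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <numeric>
#include <vector>

// A ballot is a strict ranking: candidate indices, most preferred first.
using Ballot = std::vector<int>;

// Smith set: the smallest set whose members each beat every non-member.
std::vector<bool> smithSet(const std::vector<Ballot>& ballots, int n) {
    std::vector<std::vector<int>> prefer(n, std::vector<int>(n, 0));
    for (const Ballot& b : ballots)
        for (int i = 0; i < n; ++i)
            for (int j = i + 1; j < n; ++j) ++prefer[b[i]][b[j]];
    auto beats = [&](int a, int b) { return prefer[a][b] > prefer[b][a]; };

    // Copeland score = pairwise wins; Smith members outscore all non-members.
    std::vector<int> wins(n, 0), order(n);
    for (int a = 0; a < n; ++a)
        for (int b = 0; b < n; ++b)
            if (a != b && beats(a, b)) ++wins[a];
    std::iota(order.begin(), order.end(), 0);
    std::sort(order.begin(), order.end(), [&](int a, int b) { return wins[a] > wins[b]; });

    // The shortest prefix that beats everyone outside it is the Smith set.
    for (int k = 1; k <= n; ++k) {
        bool dominates = true;
        for (int i = 0; i < k && dominates; ++i)
            for (int j = k; j < n && dominates; ++j)
                if (!beats(order[i], order[j])) dominates = false;
        if (dominates) {
            std::vector<bool> member(n, false);
            for (int i = 0; i < k; ++i) member[order[i]] = true;
            return member;
        }
    }
    return std::vector<bool>(n, true);   // not reached: k == n always dominates
}

// Plurality counted only among Smith-set members; -1 means a tie.
int smithPlurality(const std::vector<Ballot>& ballots, int n) {
    const std::vector<bool> member = smithSet(ballots, n);
    std::vector<int> firsts(n, 0);
    for (const Ballot& b : ballots)
        for (int c : b)
            if (member[c]) { ++firsts[c]; break; }
    const auto top = std::max_element(firsts.begin(), firsts.end());
    assert(*top > 0);
    if (std::count(firsts.begin(), firsts.end(), *top) != 1) return -1;
    return static_cast<int>(top - firsts.begin());
}
```

Sorting by pairwise wins works because, without pairwise ties, every Smith-set member has more pairwise wins than any candidate outside the set. That makes the Smith set the shortest prefix of the sorted order that beats everyone outside it.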

I should probably put the usage screen up on the repo intro readme as
well as say what functions quadelect has (reading ballots, graphing Yee
diagrams, determining Bayesian regret, and displaying barycentric
fingerprints).

> It looks like it reads ballot info (which Warren's software does not do)
> so if I discover that it does some tests of interest I would consider
> looking at your code for further possibilities.

Yes. For instance, you can do the following after cloning the repo:

cmake .
make
echo "Kemeny(wv)" >methods.txt
echo "Plurality" >>methods.txt
echo "Eliminate-[Plurality]/fd" >>methods.txt
./quadelect -i -if examples/ballots/Burlington2009.txt -m methods.txt

and it will give you the results for Kemeny, Plurality, and IRV (the
Eliminate-[Plurality]/fd entry):

Kemeny(wv)
Plurality
Eliminate-[Plurality]/fd
Method constraint: loaded 3 methods.
Random number generator: using seed 3632605727039915387
Setting up ballot interpretation...
                - input file: examples/ballots/Burlington2009.txt
                - number of interpreters: 1
                - number of methods: 3
*     0: obj =   1.297800000e+04 inf =   0.000e+00 (12)
*    23: obj =   7.636800000e+04 inf =   0.000e+00 (0)
Done getting results for Kemeny(wv)
Done getting results for Plurality
Done getting results for Eliminate-[Plurality]/fd
Result for Kemeny(wv) : Montroll > Kiss > Wright > Smith > Simpson > Write-In
Result for Plurality : Wright > Kiss > Montroll > Smith > Write-In > Simpson
Result for Eliminate-[Plurality]/fd : Kiss > Wright > Montroll > Smith > Write-In > Simpson

> Thank you for bringing it to our attention.

And thank you for pointing out what's missing, from the perspective of
someone who isn't familiar with the program.

-km