Re: [EM] Ballot Data Format

John Karr
As the author of Vote::Count, I think a standardized format for ballots would be
a big plus. When I've been able to collect sample data, the first thing
I need to do is convert it to my format. Currently Vote::Count has two
formats: a text format for ranked ballots and a JSON/YAML format for range
ballots. The documentation on my formats is here:
https://metacpan.org/pod/Vote::Count::ReadBallots

I'm not on Reddit, but I think creating a working group of people with
an interest in proposing a standard would be a great idea, and I'm
interested in helping.

A standard format would allow creation of a library of data for which
electowiki would seem to be a natural home.

On 5/27/21 4:02 PM, [hidden email] wrote:

> Send Election-Methods mailing list submissions to
> [hidden email]
>
> To subscribe or unsubscribe via the World Wide Web, visit
> http://lists.electorama.com/listinfo.cgi/election-methods-electorama.com
>
> or, via email, send a message with subject or body 'help' to
> [hidden email]
>
> You can reach the person managing the list at
> [hidden email]
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of Election-Methods digest..."
>
>
> Today's Topics:
>
>     1. (no subject) (Rob Lanphier)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Wed, 26 May 2021 23:38:14 -0700
> From: Rob Lanphier <[hidden email]>
> To: [hidden email]
> Subject: [EM] (no subject)
> Message-ID:
> <CAK9hOYn2T=ympC7gEd8wS_8S8yjzK==[hidden email]>
> Content-Type: text/plain; charset="UTF-8"
>
> Hi folks,
>
> There's an interesting discussion happening on reddit about ASCII
> formats for aggregated ballot images.  I'll provide a deep link to my
> comment here:
>
> <https://www.reddit.com/r/EndFPTP/comments/nkm2cd/standardizing_cardinal_ballot_notation/gzls6pj/>
>
> What the original reddit poster (/user/jman722) made me realize is
> that it's possible to come up with a format that works for both range
> ballots and ranked ballots.  The range ballots can be on a scale of
> 0-5, where 5 is "awesome", and 0 is "awful".  The ranked ballots can
> be A>B>C.
>
> I'm going to use the example that the original reddit poster made:
>
> 12: Allie/5, Billy/5, Candace/4, Dennis/3, Edith/3, Frank/2, Georgie/1, Harold/0
> 7: Allie/4, Billy/0, Candace/2, Dennis/3, Edith/1, Frank/0, Georgie/5, Harold/3
> 5: Allie/0, Billy/3, Candace/2, Dennis/3, Edith/4, Frank/5, Georgie/3, Harold/4
>
> That format is good but not great.  It takes a careful eye to see that
> Allie, Billy, Frank, and Georgie are the passionate favorites (earning
> a "5" score), and another close look to see that Allie, Billy, Frank,
> and Harold are listed as completely unacceptable (earning a "0" score).
>
> The format I used in the 1996 Perl script that I wrote and
> published in The Perl Journal would express those ballots this way:
>
> 12: Allie=Billy>Candace>Dennis=Edith>Frank>Georgie>Harold
> 7: Georgie>Allie>Dennis=Harold>Candace>Edith>Billy=Frank
> 5: Frank>Edith=Harold>Billy=Dennis=Georgie>Candace>Allie
>
> With this format, it becomes clear that 12 voters really like Allie
> and Billy and really don't like Harold.  The next 7 voters really like
> Georgie, and really don't like Billy and Frank.  The remaining 5
> voters really like Frank, but really dislike Allie.  One has to add up
> 12+7+5 to realize there are 24 voters in this election.
>
> The ratings are stripped from my old 1996-ish format.  It only
> provides the following parse tokens:
>
> [quantity]: [cand5yay] [> or =] [cand4good] [> or =] ... [cand0boo]
>
> It seems as though it would be possible to come up with a merged
> format that would express the range ballots above like this:
>
> 12: Allie/5 =Billy/5 >Candace/4 >Dennis/3 =Edith/3 >Frank/2 >Georgie/1 >Harold/0
> 7: Georgie/5 >Allie/4 >Dennis/3 =Harold/3 >Candace/2 >Edith/1 >Billy/0 =Frank/0
> 5: Frank/5 >Edith/4 =Harold/4 >Billy/3 =Dennis/3 =Georgie/3 >Candace/2 >Allie/0
>
> The ">", "=", and "," characters could all be optional delimiters
> between the candidate/score tuples on each line (though at least one
> of those three delimiters WOULD be required). If ">" or "=" is used as
> a delimiter, then the candidates MUST be ordered by score (highest
> score first). Candidate tokens can be one or more ASCII letters
> ([A-Z] or [a-z]), OR the candidate token MUST start with a square
> bracket ([) and end with the closing square bracket (]), and the
> intervening text can be any Unicode characters (e.g. [Doña García
> Márquez] or [Ximena Peña] or [???]). Whitespace can be discarded, but
> SHOULD be included for legibility.
>
> Linters could be created to deduplicate ballot lines, sort the
> candidates by score on each line, convert commas to ">" and "=" (for
> ranked ballot equivalents), and add whitespace for readability. They
> could optionally normalize the candidates to a range of ASCII letters
> (e.g. changing "Allie" to "A", "Billy" to "B", etc).
>
> The goal would be to make it useful for two people debating whether
> the Condorcet criterion or the Monotonicity criterion is more
> important. They could both easily crank out a set of ballots that
> could be fed into either a ranked-ballot counter or a rated-ballot
> counter. Having the candidate tuples sorted on each line makes it
> clearer what the preferences of the voters represented by that line
> were.
>
> I think that parsers could be written for this format such that they
> follow Postel's Law (a.k.a. the "robustness principle"):
> https://en.wikipedia.org/wiki/Robustness_principle
>
> To quote that^: "be conservative in what you do, be liberal in what
> you accept from others"
>
> People trying to express ranked ballots could drop the scores, and
> ONLY include ">" and "=" as delimiters between candidates. People
> trying to express rated ballots could use commas (",") instead of ">"
> and "=". Programmers trying to parse handcrafted scenarios could
> figure out how to fill in the blanks.
>
> I'm tempted to write a reference parser for this, but first, what do
> you all think?  Let the list know!  Let me know!  Let reddit know!
> :-D
>
> Thanks
> Rob
>
> p.s.  I'm thinking of calling my version "ABIF", standing for
> "Aggregated Ballot Image Format".  I may just document it here:
> https://electowiki.org/wiki/User:RobLa/ABIF
>
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> Election-Methods mailing list
> [hidden email]
> http://lists.electorama.com/listinfo.cgi/election-methods-electorama.com
>
>
> ------------------------------
>
> End of Election-Methods Digest, Vol 202, Issue 7
> ************************************************
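
A minimal Python sketch of a tokenizer for the merged line format proposed in the quoted message. It assumes the rules as described there: a count followed by a colon; candidates written either as runs of ASCII letters or as [bracketed] names; an optional /score after each candidate; and tuples separated by ">", "=", or ",". The function name parse_line is hypothetical, and this is not a reference parser for ABIF.

```python
import re

# A candidate is a run of ASCII letters or a [bracketed] name (any characters
# except "]"), optionally followed by "/score". Surrounding spaces are ignored.
TUPLE_RE = re.compile(r"\s*(?:([A-Za-z]+)|\[([^\]]+)\])\s*(?:/\s*(\d+))?\s*")
DELIM_RE = re.compile(r"\s*([>=,])\s*")
LINE_RE = re.compile(r"^\s*(\d+)\s*:\s*(.*)$")


def parse_line(line):
    """Split one line into (count, [(name, score or None, delimiter before)])."""
    m = LINE_RE.match(line)
    if m is None:
        raise ValueError(f"missing 'count:' prefix: {line!r}")
    count, body = int(m.group(1)), m.group(2)
    tuples, pos, delim = [], 0, None
    while True:
        t = TUPLE_RE.match(body, pos)
        if t is None:
            raise ValueError(f"expected a candidate at column {pos}: {body!r}")
        name = t.group(1) or t.group(2)
        score = int(t.group(3)) if t.group(3) is not None else None
        tuples.append((name, score, delim))
        pos = t.end()
        if pos == len(body):
            return count, tuples
        # At least one of ">", "=" or "," must separate adjacent tuples.
        d = DELIM_RE.match(body, pos)
        if d is None:
            raise ValueError(f"expected '>', '=' or ',' at column {pos}: {body!r}")
        delim, pos = d.group(1), d.end()
```

Each tuple records the delimiter that preceded it, so a separate linter pass could check the rule that ">"/"=" lines be sorted by score; the sketch itself does not enforce that rule.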

----
Election-Methods mailing list - see https://electorama.com/em for list info

Re: [EM] Ballot Data Format

Carl Schroedl
Hi All,

Another source of inspiration could be the Pivot Libre project's Ballot File Format. Documentation is hosted on GitHub Pages.


Issues and contributions can be made on GitHub.


I'm interested in staying in the loop on any ranked ballot community standards efforts. I don't have time to drive the conversation right now.

Whether or not the specifics of BFF are to the community's liking, I recommend considering a similar approach, with GitHub as the collaboration and website hosting platform. I believe several organizations have used this approach successfully for some fairly complex standards.

Example:


All the best,

Carl

On Thu, May 27, 2021, 3:34 PM John Karr <[hidden email]> wrote:

Re: [EM] Ballot Data Format

James Gilmour

There is a recognised format for the publication of the full ballot data for STV-PR elections, i.e. BLT format, first described in this paper (at section 3.1):

Hill, Wichmann & Woodall (1987) "Algorithm 123 - Single Transferable Vote by Meek's method"  http://www.dia.govt.nz/diawebsite.NSF/Files/meekm/$file/meekm.pdf

The format is quite “primitive” (stream of numbers, with minimal text and no explanation) because it predates even the DOS operating system.

The full ballot data, in this preference profile format, have been published by the Returning Officers for every multi-member ward in every one of the 32 Local Authorities in Scotland for the STV-PR elections held in 2012 and 2017.  The data for the wards in the City of Glasgow were published following the 2007 elections, although that was not then a legal requirement.

Unfortunately, there is (as yet) no central depository of these STV-PR ballot data, but the BLT files are available among the “election results” on the respective 32 Council websites.

James Gilmour

Edinburgh, Scotland
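
To relate BLT files to the one-line-per-group notation discussed in this thread, here is a minimal Python sketch that converts a simple BLT file into "count: A>B>C" lines. It assumes the commonly used layout: a header line with the number of candidates and seats, an optional line of withdrawn candidates given as negative numbers, ballot lines made of an integer weight followed by candidate numbers and a terminating 0, a lone 0 closing the ballot section, then one quoted name per candidate and a quoted title. Section 3.1 of the paper above is the authority on the details. Equal rankings and fractional weights, which some tools emit, are not handled, and the helper names are hypothetical.

```python
import re
from collections import Counter


def _token(name):
    # Bare ASCII-letter names as-is; anything else goes in square brackets.
    return name if re.fullmatch(r"[A-Za-z]+", name) else f"[{name}]"


def blt_to_ranked_lines(text):
    """Convert a simple BLT file into aggregated 'count: A>B>C' lines."""
    lines = [ln.strip() for ln in text.strip().splitlines() if ln.strip()]
    n_candidates, _n_seats = map(int, lines[0].split())
    i = 1
    if lines[i].startswith("-"):  # optional withdrawn-candidate line (skipped)
        i += 1
    totals = Counter()  # identical rankings are merged, their weights summed
    while lines[i] != "0":  # a lone 0 ends the ballot section
        weight, *prefs = (int(x) for x in lines[i].split())
        if not prefs or prefs[-1] != 0:
            raise ValueError(f"ballot line must end with 0: {lines[i]!r}")
        totals[tuple(prefs[:-1])] += weight
        i += 1
    # Quoted strings after the ballots: candidate names, then the title.
    names = re.findall(r'"([^"]*)"', "\n".join(lines[i + 1:]))
    if len(names) < n_candidates:
        raise ValueError("fewer candidate names than declared in the header")
    return [
        f"{weight}: " + ">".join(_token(names[p - 1]) for p in prefs)
        for prefs, weight in totals.items()
    ]
```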

 

From: Election-Methods [mailto:[hidden email]] On Behalf Of Carl Schroedl
Sent: 27 May 2021 23:16
To: John Karr <[hidden email]>
Cc: EM <[hidden email]>
Subject: Re: [EM] Ballot Data Format

Re: [EM] Ballot Data Format

Rob Lanphier
In reply to this post by John Karr
Hi everyone

John, thanks for putting the subject on this thread and for the
pointer to Vote::Count.  Carl and James, thanks for the pointers.
I'll endeavor to keep all of you (and everyone on this mailing list)
in the loop on the progress we're making.  The reddit thread has
continued, so I'll repeat the important stuff that I said there.

The name (and file extension) for the format that I'm gravitating
toward is ABIF (".abif"), which stands for "aggregated ballot image
format".  I'm using the term "ballot image" because that seems to be
the term of art for publishing real-world electoral results.  Once
upon a time, "ballot image" meant "a picture of the ballot", but now
it just means a crude ASCII representation in a line of text.

I did some processing of the ballot images from San Francisco's 2018
mayoral election, which involved some coding and some manual shell
processing with grep and friends.  My work was ugly the way that all
manual futzing in bash is ugly, but I got a few regexps and some test
data (and some experience) that I'm applying here.  As I was
processing the results, I wished they had been aggregated in an
easier-to-process form.  I would love to finish my processing work
and publish it in a sane format that other programmers can use, which
I'm hoping ABIF can become.

I've been working for WAY TOO MANY years on text formats for
aggregated results.  The format that I used in my Perl script
published in the Perl Journal in 1996 was a noble attempt, and was
better (in many ways) than the revised JSON-based format I published
in 2005 with electowidget.  I think my obsession with JSON was an
unfortunate detour in coming up with a good format.  I've been
studying text formats for structured data for a very long time, ever
since I learned Perl (in 1994 or so), and that study intensified with the
work on RTSP in 1996 through 1998.  We briefly dabbled with making
RTSP a binary protocol using ASN.1, but Dr. Henning Schulzrinne from
Columbia University convinced us (by publishing his "RTSP prime"
draft) that a text-based HTTP knockoff (with MIME headers) was the way
to go.  As we worked on RTSP (and SMIL), we saw the rise of XML, and
the slow steady fading of XML as a data format with the advent of
JSON, and YAML, and TOML, and many other simpler formats.

That's my longwinded way of saying that the test cases that I've
published on the ABIF electowiki page are (I think) a respectable
start for a flexible text format that a wide variety of programmers
can get their heads around:
https://electowiki.org/wiki/ABIF

The thing that I love about ABIF (as it's shaping up) is that it
solves several big problems which my 2005 electowidget format didn't.
It goes back to the roots of the 1996 Perl script, back before I "knew
what I was doing", and seems simpler to work with for a reasonably new
programmer (as I was in '96).  My electowidget format REQUIRED
ratings, which I would normalize to rankings as appropriate.   But I
was often trying to express elections that only had ranked ballots
available (e.g. the 2003 Debian election, and the 2009 Burlington
mayoral race).  Having a format that ALLOWS for ratings, but doesn't
require ratings seems appropriate given that IRV/RCV is much more
common in municipal use right now than rating systems.  I love that
the format is very similar to the ad hoc format many have used here on
the EM list for expressing rankings.  I'm reasonably sure that writing
a parser in any language that has reasonable regular expression
support will be easy, and can probably be done with a single-pass
parser.  I haven't really started it yet, but I know how to write a
spec that many programmers can look at, nod their heads, and say
"yeah, I can work with that".  I think having a test suite with
well-specified expected output is going to be a key part of solving
the interoperability problem, and it will be helpful for others to
inspect in a piecemeal fashion rather than feeling obligated to read a
ponderously long specification.
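
The normalization described here (ratings optional, rankings derivable from them) is mechanical. A minimal Python sketch, assuming ratings have already been parsed into a dict mapping candidate name to integer score: candidates are sorted by descending score, equal scores are joined with "=", and score groups are separated by ">". Ties within a group are listed alphabetically so the output is deterministic; that ordering choice and the name ratings_to_ranking are assumptions of this sketch.

```python
from itertools import groupby


def ratings_to_ranking(count, ratings, keep_scores=True):
    """Render {candidate: score} as one sorted aggregated-ballot line."""
    # Highest score first; alphabetical within a score so output is stable.
    ordered = sorted(ratings.items(), key=lambda kv: (-kv[1], kv[0]))
    tie, rank = (" =", " >") if keep_scores else ("=", ">")
    groups = []
    # groupby only merges adjacent items, which the sort above guarantees.
    for _score, members in groupby(ordered, key=lambda kv: kv[1]):
        tokens = [f"{name}/{score}" if keep_scores else name
                  for name, score in members]
        groups.append(tie.join(tokens))
    return f"{count}: {rank.join(groups)}"
```

Applied to the three rated lines from the original reddit example, it reproduces both the merged lines (with scores) and the 1996-style ranked lines (without scores) quoted earlier in this thread.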

In the next few days, I'll take a look at the implementations that
have been mentioned in this thread.  For example: Pivot's format that
Carl pointed to.  It looks to me like the format that Pivot uses is
very similar to this proposed ABIF format; the only difference I
see (at first glance) is that this ABIF:

   27: A > B > C > D
   26: B > A = C > D
   24: C > A = D > B
   23: D > C > A > B

...would become this in Pivot:

   27 * A > B > C > D
   26 * B > A = C > D
   24 * C > A = D > B
   23 * D > C > A > B

Changing colon (":") to asterisk ("*") is an interesting change to
consider.  I suspect that as we all look more closely at Pivot and
other formats, there are going to be other incompatibilities and mindset
differences to hash out.  These all seem like easily solved problems,
because I get the sense that many programmers are hungry for
compatible solutions in this space, and are willing to write
converters to be part of the compatibility party.
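
Accepting both separators is cheap if the count prefix is split off before any candidate parsing. A minimal Python sketch, in the spirit of the Postel's Law remark earlier in the thread: the regular expression accepts either "27:" or "27 *" and normalizes to the colon form. The names split_count and to_colon_form are hypothetical.

```python
import re

# "27: A > B" (ABIF-style) or "27 * A > B" (Pivot-style, as compared above).
COUNT_RE = re.compile(r"^\s*(\d+)\s*([:*])\s*(.*?)\s*$")


def split_count(line):
    """Return (count, separator, ranking_text) for one aggregated line."""
    m = COUNT_RE.match(line)
    if m is None:
        raise ValueError(f"expected an 'N:' or 'N *' prefix: {line!r}")
    return int(m.group(1)), m.group(2), m.group(3)


def to_colon_form(line):
    """Rewrite a line so that ':' is the multiplier separator."""
    count, _sep, ranking = split_count(line)
    return f"{count}: {ranking}"
```

Summing split_count(line)[0] over the four example lines gives 27 + 26 + 24 + 23 = 100 ballots.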

At any rate, I think (if others on the mailing list don't mind) that
we should just use this mailing list and electowiki as places to hash
out the format.  If we do this right, it will be easy enough to use
that people on this mailing list (and over on reddit, and many other
places) will keep ABIF compatibility in mind when they write examples
of elections to consider.  Hopefully having more software
compatibility in our ecosystem will make it easier for us to
collaborate on analysis and speed up reform efforts.
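
As an example of the kind of shared analysis this could enable, here is a minimal Python sketch of a pairwise (Condorcet) tally over ranked lines in the colon form above. It assumes bare candidate names, "=" for equal rank, and no scores, and it treats candidates missing from a ballot as ranked below every listed candidate; that last rule is an assumption of the sketch, not part of any proposed spec. The names pairwise_matrix and condorcet_winner are hypothetical.

```python
from itertools import combinations


def pairwise_matrix(lines):
    """Count, for each ordered pair (x, y), the ballots ranking x above y."""
    ballots = []
    for line in lines:
        count_text, ranking = line.split(":", 1)
        ranks = {}
        # Each ">"-separated group is one rank level; "=" joins equals.
        for level, group in enumerate(ranking.split(">")):
            for name in group.split("="):
                ranks[name.strip()] = level
        ballots.append((int(count_text), ranks))
    names = sorted({n for _, r in ballots for n in r})
    wins = {(x, y): 0 for x in names for y in names if x != y}
    for count, ranks in ballots:
        unranked = len(names)  # level given to candidates not on this ballot
        for x, y in combinations(names, 2):
            rx, ry = ranks.get(x, unranked), ranks.get(y, unranked)
            if rx < ry:
                wins[x, y] += count
            elif ry < rx:
                wins[y, x] += count
    return names, wins


def condorcet_winner(names, wins):
    """Return the candidate who beats every other head-to-head, or None."""
    for x in names:
        if all(wins[x, y] > wins[y, x] for y in names if y != x):
            return x
    return None
```

Run on the four ABIF lines above, it finds A over B by 74 to 26, B over C by 53 to 47, and C over A by 47 to 27, so condorcet_winner returns None: the example contains a Condorcet cycle among A, B, and C.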

Rob

On Thu, May 27, 2021 at 1:34 PM John Karr <[hidden email]> wrote:

>
> As the author of Vote::Count, a standardized format for ballots would be
> a big plus. When I've been able to collect sample data, the first thing
> I need to do is convert it to my format. Currently Vote::Count has two
> formats, a text one for ranked ballots and a json/yaml format for range
> ballots. The documentation on my formats is here:
> https://metacpan.org/pod/Vote::Count::ReadBallots
>
> I'm not on Reddit, but I think creating a working group of people with
> an interest to propose a standard would be  a great idea, and I'm
> interested in helping.
>
> A standard format would allow creation of a library of data for which
> electowiki would seem to be a natural home.
>
> On 5/27/21 4:02 PM, [hidden email] wrote:
>
> > Send Election-Methods mailing list submissions to
> >       [hidden email]
> >
> > To subscribe or unsubscribe via the World Wide Web, visit
> >       http://lists.electorama.com/listinfo.cgi/election-methods-electorama.com
> >
> > or, via email, send a message with subject or body 'help' to
> >       [hidden email]
> >
> > You can reach the person managing the list at
> >       [hidden email]
> >
> > When replying, please edit your Subject line so it is more specific
> > than "Re: Contents of Election-Methods digest..."
> >
> >
> > Today's Topics:
> >
> >     1. (no subject) (Rob Lanphier)
> >
> >
> > ----------------------------------------------------------------------
> >
> > Message: 1
> > Date: Wed, 26 May 2021 23:38:14 -0700
> > From: Rob Lanphier <[hidden email]>
> > To: [hidden email]
> > Subject: [EM] (no subject)
> > Message-ID:
> >       <CAK9hOYn2T=ympC7gEd8wS_8S8yjzK==[hidden email]>
> > Content-Type: text/plain; charset="UTF-8"
> >
> > Hi folks,
> >
> > There's an interesting discussion happening on reddit about ASCII
> > formats for aggregated ballot images.  I'll provide a deep link to my
> > comment here:
> >
> > <https://www.reddit.com/r/EndFPTP/comments/nkm2cd/standardizing_cardinal_ballot_notation/gzls6pj/>
> >
> > What the original reddit poster (/user/jman722) made me realize is
> > that it's possible to come up with a format that works for both range
> > ballots and ranked ballots.  The range ballots can be on a scale of
> > 0-5, where 5 is "awesome", and 0 is "awful".  The ranked ballots can
> > be A>B>C.
> >
> > I'm going to use the example that the original reddit poster made:
> >
> > 12: Allie/5, Billy/5, Candace/4, Dennis/3, Edith/3, Frank/2, Georgie/1, Harold/0
> > 7: Allie/4, Billy/0, Candace/2, Dennis/3, Edith/1, Frank/0, Georgie/5, Harold/3
> > 5: Allie/0, Billy/3, Candace/2, Dennis/3, Edith/4, Frank/5, Georgie/3, Harold/4
> >
> > That format is good but not great.  It takes a careful eye to see that
> > Allie, Billy, Frank, and Georgie are the passionate favorites (earning
> > a "5" score), and another close look to see that Allie, Billy, Frank,
> > and Harold are listed as completely unacceptable (earning a "0" score)
> >
> > My old format that I used for my 1996 Perl script that I wrote and
> > published in The Perl Journal would express those ballots this way:
> >
> > 12: Allie=Billy>Candace>Dennis=Edith>Frank>Georgie>Harold
> > 7: Georgie>Allie>Dennis=Harold>Candace>Edith>Billy=Frank
> > 5: Frank>Edith=Harold>Billy=Dennis=Georgie>Candace>Allie
> >
> > With this format, it becomes clear that 12 voters really like Allie
> > and Billy and really don't like Harold.  The next 7 voters really like
> > Georgie, and really don't like Billy and Frank.  The remaining 5
> > voters really like Frank, but really dislike Allie.  One has to add up
> > 12+7+5 to realize there are 24 voters in this election.
> >
> > The ratings are stripped from my old 1996-ish format.  It only
> > provides the following parse tokens:
> >
> > [quantity]: [cand5yay] [> or =] [cand4good] [> or =] ... [cand0boo]
> >
> > It seems as though it would be possible to come up with a merged
> > format that would express the range ballots above like this:
> >
> > 12: Allie/5 =Billy/5 >Candace/4 >Dennis/3 =Edith/3 >Frank/2 >Georgie/1 >Harold/0
> > 7: Georgie/5 >Allie/4 >Dennis/3 =Harold/3 >Candace/2 >Edith/1 >Billy/0 =Frank/0
> > 5: Frank/5 >Edith/4 =Harold/4 >Billy/3 =Dennis/3 =Georgie/3 >Candace/2 >Allie/0
> >
> > The ">", "=", and "," characters could all be optional delimiters
> > between the candidate/score tuples on each line (though at least one
> > of those three delimiters WOULD be required). If ">" or "=" is used as
> > a delimiter, then the candidates MUST be ordered by score (highest
> > score first). Candidate tokens can be one or more ASCII characters
> > ([A-Z] or [a-z]) OR the candidate token MUST start with a square
> > bracket ([) and end with the closing square bracket (]), and the
> > intervening text can be any unicode character (e.g. [Do?a Garc?a
> > M?rquez] or [Ximena Pe?a] or [???]) . Whitespace can be discarded, but
> > SHOULD be included for legibility.
> >
> > Linters could be created to deduplicate ballot lines, sort the
> > candidate by score on each line, convert commas to ">" and "=" (for
> > ranked ballot equivalents), and add whitespace for readability. They
> > could optionally normalize the candidates to a range of ASCII letters
> > (e.g. changing "Allie" to "A", "Billy" to "B", etc).
> >
> > The goal would be to make it useful for two people debating whether
> > the Condorcet criterion or the Monotonicity criterion is more
> > important. They could both easily crank out a set of ballots that
> > could be fed into either a ranked-ballot counter or a rated-ballot
> > counter. Having the candidate tuples sorted in each line makes it
> > clearer what the preferences were of the set of voters represented by
> > the given line.
> >
> > I think that parsers could be written for this format such that they
> > follow Postel's Law (a.k.a the "robustness principle"):
> > https://en.wikipedia.org/wiki/Robustness_principle
> >
> > To quote that^: "be conservative in what you do, be liberal in what
> > you accept from others"
> >
> > People trying to express ranked ballots could drop the scores, and
> > ONLY include ">" and "=" as a delimiter between candidates,  People
> > trying to express rated ballots could use commas (",") instead of ">"
> > and "=". Programmers trying to parse handcrafted scenarios could
> > figure out how to fill in the blanks.
> >
> > I'm tempted to write a reference parser for this, but first, what do
> > you all think?  Let the list know!  Let me know!  Let reddit know!
> > :-D
> >
> > Thanks
> > Rob
> >
> > p.s.  I'm thinking of calling my version "ABIF", standing for
> > "Aggregated Ballot Image Format".  I may just document it here:
> > https://electowiki.org/wiki/User:RobLa/ABIF
> >
> >
> > ------------------------------
> >
> > Subject: Digest Footer
> >
> > _______________________________________________
> > Election-Methods mailing list
> > [hidden email]
> > http://lists.electorama.com/listinfo.cgi/election-methods-electorama.com
> >
> >
> > ------------------------------
> >
> > End of Election-Methods Digest, Vol 202, Issue 7
> > ************************************************
>
> ----
> Election-Methods mailing list - see https://electorama.com/em for list info
----
Election-Methods mailing list - see https://electorama.com/em for list info
Reply | Threaded
Open this post in threaded view
|

Re: [EM] Ballot Data Format

Jan Šimbera
Hi Rob and all,
I like the simultaneous readability, versatility and robustness of ABIF.
If the multiplier separator is allowed to vary (asterisk, colon, or both),
I think the only difference between Pivot and ABIF would be the
significance of whitespace in candidate tokens, something that the
parser could also be liberal in accepting. I see the [] bracketed form
as more readable (and thus the default), but I see no problem in making
the non-bracketed form acceptable.
Also, if leaving out the scores for the unordered variant is allowed,
the format will be able to record approval votes in a very readable
form, which I would support.
If the community eventually arrives at some sort of consensus,
I'll put the ABIF read/write implementation into the backlog of
votelib (which can already read BLT and STV formats) so that
it has an interchange format for ordinal ballots as well.
The implementation would benefit from a formal parsing definition
(BNF or similar); if not, I might create it in the process.
All the best,
Jan
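
If the score-free, comma-separated variant mentioned above is read as "every listed candidate is approved", tallying approval ballots becomes a single pass over the lines. A minimal Python sketch under that reading; the function name approval_totals is hypothetical, and it assumes candidate names contain no commas.

```python
from collections import Counter


def approval_totals(lines):
    """Sum approvals from lines such as '10: Allie, Candace'."""
    totals = Counter()
    for line in lines:
        count_text, listed = line.split(":", 1)
        count = int(count_text)
        for name in listed.split(","):
            name = name.strip()
            if name:  # tolerate a trailing comma
                totals[name] += count
    return totals
```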


On Fri, May 28, 2021 at 11:32 AM Rob Lanphier <[hidden email]> wrote:
Hi everyone

John, thanks for putting the subject on this thread and for the
pointer to Vote::Count.  Carl and James, thanks for the pointers.
I'll endeavor to keep all of you (and everyone on this mailing list)
in the loop on the progress we're making.  The reddit thread
has continued, so I'll repeat here the important stuff that I said there.

The name (and file extension) for the format that I'm gravitating
toward is ABIF (".abif"), which stands for "aggregated ballot image
format".  I'm using the term "ballot image" because that seems to be
the term of art for publishing real-world electoral results.  Once
upon a time, "ballot image" meant "a picture of the ballot", but now
it just means a crude ASCII representation in a line of text.

I did some processing of the ballot images from San Francisco's 2018
mayoral election, which involved some coding and some manual shell
processing with grep and friends.  My work was ugly the way that all
manual futzing in bash is ugly, but I got a few regexps and some test
data (and some experience) that I'm applying here.  As I was
processing the results, I wished they had been aggregated in an
easier-to-process form.  I would love to finish my processing work
and publish it in a sane format that other programmers can use, which
I'm hoping ABIF can become.
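This minimal Python sketch shows what "aggregated" means in practice: it collapses identical per-voter rankings into count-prefixed lines. It assumes each voter's ranking has already been extracted into a string like "A>B>C". Turning San Francisco's raw ballot-image records into that form is not shown, and `aggregate` is a hypothetical helper name.

```python
from collections import Counter


def aggregate(ballots):
    """Collapse identical individual ballots into ABIF-style count lines.

    Each ballot is assumed to be a ranking already written as a string
    such as "A>B>C".
    """
    counts = Counter(ballots)
    # Most common ranking first; equal counts keep first-seen order.
    return [f"{n}: {ranking}" for ranking, n in counts.most_common()]


raw = ["A>B>C", "B>A>C", "A>B>C", "C>B>A", "A>B>C", "B>A>C"]
for line in aggregate(raw):
    print(line)
# 3: A>B>C
# 2: B>A>C
# 1: C>B>A
```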

I've been working for WAY TOO MANY years on text formats for
aggregated results.  The format that I used in my Perl script
published in the Perl Journal in 1996 was a noble attempt, and was
better (in many ways) than the revised JSON-based format I published
in 2005 with electowidget.  I think my obsession with JSON was an
unfortunate detour in coming up with a good format.  I've been
studying text formats for structured data for a very long time, ever
since I learned Perl (in 1994 or so), and that study intensified with
the work on RTSP from 1996 through 1998.  We briefly dabbled with making
RTSP a binary protocol using ASN.1, but Dr. Henning Schulzrinne from
Columbia University convinced us (by publishing his "RTSP prime"
draft) that a text-based HTTP knockoff (with MIME headers) was the way
to go.  As we worked on RTSP (and SMIL), we saw the rise of XML, and
the slow steady fading of XML as a data format with the advent of
JSON, and YAML, and TOML, and many other simpler formats.

That's my longwinded way of saying that the test cases that I've
published on the ABIF electowiki page are (I think) a respectable
start for a flexible text format that a wide variety of programmers
can get their heads around:
https://electowiki.org/wiki/ABIF

The thing that I love about ABIF (as it's shaping up) is that it
solves several big problems which my 2005 electowidget format didn't.
It goes back to the roots of the 1996 Perl script, back before I "knew
what I was doing", and seems simpler to work with for a reasonably new
programmer (as I was in '96).  My electowidget format REQUIRED
ratings, which I would normalize to rankings as appropriate.   But I
was often trying to express elections that only had ranked ballots
available (e.g. the 2003 Debian election, and the 2009 Burlington
mayoral race).  Having a format that ALLOWS for ratings, but doesn't
require ratings seems appropriate given that IRV/RCV is much more
common in municipal use right now than rating systems.  I love that
the format is very similar to the ad hoc format many have used here on
the EM list for expressing rankings.  I'm reasonably sure that writing
a parser in any language that has reasonable regular expression
support will be easy, and can probably be done with a single-pass
parser.  I haven't really started it yet, but I know how to write a
spec that many programmers can look at, nod their heads, and say
"yeah, I can work with that".  I think having a test suite with
well-specified expected output is going to be a key part of solving
the interoperability problem, and it will be helpful for others to
inspect in a piecemeal fashion rather than feeling obligated to read a
ponderously long specification.
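The following Python sketch illustrates the kind of single-pass, regex-based parser described above. It handles ranked lines with optional "/score" ratings. The grammar in the comments is an assumption made for this sketch, not a published ABIF grammar, and comma-delimited lines are out of scope.

```python
import re

# Assumed grammar (a sketch, not the ABIF spec):
#   line := count ":" cand (("=" | ">") cand)*
#   cand := name ["/" score]
#   name := [A-Za-z]+ | "[" any text except "]" "]"
LINE_RE = re.compile(r"^\s*(\d+)\s*:\s*(.+?)\s*$")
CAND_RE = re.compile(r"\s*(\[[^\]]*\]|[A-Za-z]+)(?:/(\d+))?\s*([>=]|$)")


def parse_ranked_line(line):
    """Return (count, tiers); tiers is a list of lists of (name, score).

    Candidates joined by "=" share a tier; ">" starts a new, lower tier.
    Scores are optional and come back as None when absent.
    """
    m = LINE_RE.match(line)
    if not m:
        raise ValueError(f"not a ballot line: {line!r}")
    count, body = int(m.group(1)), m.group(2)
    tiers, current, pos = [], [], 0
    while pos < len(body):  # one left-to-right pass over the line
        c = CAND_RE.match(body, pos)
        if not c:
            raise ValueError(f"cannot parse near: {body[pos:]!r}")
        name = c.group(1).strip("[]")
        score = int(c.group(2)) if c.group(2) else None
        current.append((name, score))
        if c.group(3) != "=":  # ">" or end of line closes this tier
            tiers.append(current)
            current = []
        pos = c.end()
    return count, tiers


print(parse_ranked_line("26: B > A = C > D"))
```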

In the next few days, I'll take a look at the implementations that
have been mentioned in this thread.  For example: Pivot's format that
Carl pointed to.  It looks to me like the format that Pivot uses is
very similar to this proposed ABIF format; the only difference I
see (at first glance) is that this ABIF:

   27: A > B > C > D
   26: B > A = C > D
   24: C > A = D > B
   23: D > C > A > B

...would become this in Pivot:
   27 * A > B > C > D
   26 * B > A = C > D
   24 * C > A = D > B
   23 * D > C > A > B

Changing colon (":") to asterisk ("*") is an interesting change to
consider.  I suspect that as we all look more closely at Pivot and
other formats, there's going to be other incompatibilities and mindset
differences to hash out.  These all seem like easily solved problems,
because I get the sense that many programmers are hungry for
compatible solutions in this space, and are willing to write
converters to be part of the compatibility party.
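Converting between the two spellings only requires a one-line substitution on the leading count. This Python sketch assumes Pivot writes its multiplier as "<count> *", exactly as in the example above. Pivot's full grammar may differ, and the function names are hypothetical. Lines that do not start with a digit, such as header label lines, pass through unchanged.

```python
import re

# Leading "<count>:" multiplier of an ABIF-style ballot line.
ABIF_COUNT_RE = re.compile(r"^(\s*)(\d+)\s*:\s*")
# Leading "<count> *" multiplier of a Pivot-style ballot line (assumed).
PIVOT_COUNT_RE = re.compile(r"^(\s*)(\d+)\s*\*\s*")


def abif_to_pivot(line):
    """Rewrite "27: A > B" as "27 * A > B"; other lines pass through."""
    return ABIF_COUNT_RE.sub(r"\1\2 * ", line, count=1)


def pivot_to_abif(line):
    """Rewrite "27 * A > B" as "27: A > B"; other lines pass through."""
    return PIVOT_COUNT_RE.sub(r"\1\2: ", line, count=1)


print(abif_to_pivot("27: A > B > C > D"))
print(pivot_to_abif("26 * B > A = C > D"))
```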

At any rate, I think (if others on the mailing list don't mind) that
we should just use this mailing list and electowiki as places to hash
out the format.  If we do this right, it will be easy enough to use
that people on this mailing list (and over on reddit, and many other
places) will keep ABIF compatibility in mind when they write examples
of elections to consider.  Hopefully having more software
compatibility in our ecosystem will make it easier for us to
collaborate on analysis and speed up reform efforts.

Rob

On Thu, May 27, 2021 at 1:34 PM John Karr <[hidden email]> wrote:
> [...]

Re: [EM] Ballot Data Format

Neal McBurnett
In reply to this post by Rob Lanphier-3
I too am an aficionado of election data formats, and have been active e.g. on the working groups run by NIST for the Election Assistance Commission, and previous related efforts by OASIS and IEEE years before that.

I appreciate that the goal here is a format that is
  * human-readable and easy to enter (e.g. for mailing list discussions), and
  * easily parsed (e.g. for software simulations).

The other working groups have been focused on standards for voting system use, and only recently have even handled ranked and rated methods.  E.g. the comprehensive, but verbose and not-very-human-readable CVR standard documented at:

 Cast Vote Records Common Data Format Specification Version 1.0 | NIST
  https://www.nist.gov/publications/cast-vote-records-common-data-format-specification-version-10

Thankfully we did at least get both ratings and rankings supported there.

At the moment, hopefully early enough in this discussion, I want to push back on the use of the term "ballot image" to mean something other than a graphical representation. The latter is how it is defined in the latest Voluntary Voting System Guidelines 2.0.
 https://www.eac.gov/sites/default/files/TestingCertification/Voluntary_Voting_System_Guidelines_Version_2_0.pdf

> ballot image: Archival digital image (e.g. JPEG, PDF, etc.) captured from one or more sides of a paper ballot cast by an individual voter.

I know all too well that there was a time (before scanners produced images) that some folks and even standards decided to use the term "ballot image" to mean what we now usually call a "cast vote record". But calling that an "image" is counterintuitive for the general public, and that usage has been on the way out for a long time. At the same time, all the recent talk of "ballot images" is about making the graphical representations that all modern voting systems use internally available for quality control and other purposes.

So I'm delighted to see a more convenient format for rated methods, but I'd ask that you not put the word "image" in the name.

Cheers,

Neal McBurnett                 http://neal.mcburnett.org/

On Fri, May 28, 2021 at 02:31:25AM -0700, Rob Lanphier wrote:

> The name (and file extension) for the format that I'm gravitating
> toward is ABIF (".abif"), which stands for "aggregated ballot image
> format".  I'm using the term "ballot image" because that seems to be
> the term of art for publishing real-world electoral results.  Once
> upon a time, "ballot image" meant "a picture of the ballot", but now
> just means a crude ASCII representation in a line of text.
>
> I did some processing of the ballot images from San Francisco's 2018
> mayoral election, which involved some coding and some manual shell
> processing with grep and friends.  My work was ugly the way that all
> manual futzing in bash is ugly, but I got a few regexps and some test
> data (and some experience) that I'm applying here.  As I was
> processing the results, I had wished the results were aggregated in an
> easier to process manner.  I would love to finish my processing work
> and publish it in a sane format that other programmers can use, which
> I'm hoping ABIF can become.

Re: [EM] Ballot Data Format

Rob Lanphier-3
Jan and Neal,

Thank you for reviving this thread!  This is awesome!  I owe each of you a more detailed reply, which I will do under separate cover.

Rob


Re: [EM] Ballot Data Format

John Karr
In reply to this post by John Karr

After creating two native formats for a Vote Counting program (both designed to be the minimum that could implement the ballots I needed at the moment, not suitable as a standard), what I would propose is an expressive format that can be implemented identically in either JSON or YAML.

A record would need two top level keys HEADER and BALLOTS.

The HEADER would contain fields used to describe the election. These might include an alphanumeric identifier for the ballot set itself, keys for the date and location of the election, the division (if the data is partial), and a candidate key that would allow assigning a short identifier to each choice. If a choice is "Maine Blueberry Sorbet, Maggie's Ice Cream and Confectionery, Collinsport, Maine, USA" or "抹茶アイスクリーム、京都ナンバーワンアイスクリーム、京都、日本" ("Matcha Ice Cream, Kyoto Number One Ice Cream, Kyoto, Japan"), a shorter numeric or ASCII alphanumeric identifier is desirable for the BALLOTS section; the long field could default to the short one when the long one is omitted.

The BALLOTS section would have several different structures defined. Regular RCV ballots are easily represented with a simple array, Range Ballots need to be key value pairs, RCV ballots allowing equal preferences are different, etc. The ballot format would be indicated in the HEADER. This would also help future proof the format, permitting revision to add new ballot structures if they are ever needed.
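The following Python sketch shows what such a record could look like when serialized as JSON. Only HEADER and BALLOTS come from the proposal. Every other key, the "ballot_format" value, and all of the sample values are hypothetical, and "display_name" is an illustrative helper for the long-name-defaults-to-short-id rule.

```python
import json

# Only HEADER and BALLOTS come from the proposal; all other keys and
# values here are made up for illustration.
record = {
    "HEADER": {
        "id": "sample-ballot-set-001",
        "ballot_format": "ranked",  # tells readers how to read BALLOTS
        "choices": {
            "MBS": "Maine Blueberry Sorbet, Maggie's Ice Cream and "
                   "Confectionery, Collinsport, Maine, USA",
            "MATCHA": "抹茶アイスクリーム、京都ナンバーワンアイスクリーム、京都、日本",
            "VAN": None,  # long name omitted: falls back to the short id
        },
    },
    "BALLOTS": [
        {"count": 12, "ranking": ["MBS", "MATCHA", "VAN"]},
        {"count": 7, "ranking": ["MATCHA", "VAN"]},
    ],
}


def display_name(header, short_id):
    """Return the long choice name, defaulting to the short identifier."""
    return header["choices"].get(short_id) or short_id


text = json.dumps(record, ensure_ascii=False, indent=2)
assert json.loads(text) == record  # the JSON round-trip preserves the record
print(display_name(record["HEADER"], "VAN"))  # prints VAN
```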

If anyone on this list who has experience with the procedures of standards bodies (like the IETF) is willing to volunteer to help organize a working group, I would be eager to participate.

ABIF is a good acronym / short name for such a standard.

----------------------------------------------------------------------
Date: Sun, 6 Jun 2021 13:29:46 +0200
From: Jan Šimbera [hidden email]
Subject: Re: [EM] Ballot Data Format

Hi Rob and all,
I like the simultaneous readability, versatility and robustness of ABIF.
If the multiplier separator is made variant (asterisk / colon / both),
I think the only difference between Pivot and ABIF would be the
significance of whitespace in candidate tokens, something that the
parser could also be liberal in accepting - I see the [] bracketed form
as more readable (and thus default) but no problem in making
the non-bracketed form acceptable.
Also, if leaving out the scores for the unordered variant is allowed,
the format will be able to record approval votes in a very readable
form, which I would support.
If the community eventually arrives at some sort of consensus,
I'll put the ABIF read/write implementation into the backlog of
votelib (which can already read BLT and STV formats) so that
it has an interchange format for ordinal ballots as well.
The implementation would benefit from a formal parsing definition
(BNF or similar); if not, I might create it in the process.
All the best,
Jan


Re: [EM] Ballot Data Format

VoteFair-2
In reply to this post by Neal McBurnett
I agree with Neal that the word "image" should not be part of the ABIF name.

I suggest that ABIF can stand for:

"Aggregated Ballot Information Format"

(And perhaps another appropriate "A" word can be used instead of
"aggregated," if needed.)

I also suggest adding a case number.

A case number allows the ballot data to be processed through separate
vote-counting software while the metadata -- such as precinct number,
political-party affiliations, etc. -- can follow a different path and be
re-joined to produce the published results.
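Here is a minimal Python sketch of the case-number idea, using made-up data and hypothetical function names. The counting function only ever sees rankings, and the metadata is re-joined afterward through the shared case number.

```python
from collections import Counter

# Made-up sample data. Each ballot carries a hypothetical case number so
# the counting path and the metadata path can be re-joined later.
ballots = {101: "A>B>C", 102: "B>A>C", 103: "A>B>C"}
metadata = {
    101: {"precinct": "P1"},
    102: {"precinct": "P2"},
    103: {"precinct": "P2"},
}


def first_choice_counts(ballots):
    """Counting path: sees only rankings, never the metadata."""
    return Counter(ranking.split(">")[0] for ranking in ballots.values())


def first_choice_by_precinct(ballots, metadata):
    """Re-join path: attach metadata to each ballot via its case number."""
    result = Counter()
    for case, ranking in ballots.items():
        result[(metadata[case]["precinct"], ranking.split(">")[0])] += 1
    return result


print(first_choice_counts(ballots))
print(first_choice_by_precinct(ballots, metadata))
```

Because this layout keys each entry by case number, it keeps one entry per ballot and gives up the compactness of aggregated count lines.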

In particular, my vote-counting software focuses on the numbers/counts,
and I use different software (written in my Dashrep programming
language) to process the text info.

The use of a case number also has other benefits.

Otherwise, bravo for making it easier to share and transfer data between
vote-counting programs.

Richard Fobes


On 6/6/2021 1:00 PM, Neal McBurnett wrote:
 > At the moment, hopefully early enough in this discussion, I want
 > to push back on the use of the term "ballot image" to mean
 > something other than a graphical representation. The latter is
 > how it is defined in the latest Voluntary Voting System
 > Guidelines 2.0.
 > ...

 > On Fri, May 28, 2021 at 02:31:25AM -0700, Rob Lanphier wrote:
 >> The name (and file extension) for the format that I'm gravitating
 >> toward is ABIF (".abif"), which stands for "aggregated ballot image
 >> format".  I'm using the term "ballot image" because that seems to be
 >> the term of art for publishing real-world electoral results.  Once
 >> upon a time, "ballot image" meant "a picture of the ballot", but now
 >> just means a crude ASCII representation in a line of text.
 >> ...


[EM] ABIF: Optional use of asterisk instead of colon (Re: Ballot Data Format)

Rob Lanphier-3
In reply to this post by Jan Šimbera
Hi Jan,

Thanks again for reviving this thread.  I agree with you that ABIF has
potential to be simultaneously readable, versatile, and robust, as
well as easy to type for people discussing elections in online forums
(and reasonably easy to embed in an email). We need to be ruthless
about cutting features to keep it simple (more on this in a bit).

There's other stuff to cover in your email, but I'm going to focus on
the asterisk vs colon issue in this email, because I think it's
important to get it right now (before too many parsers and writers get
written).  More below:

On Sun, Jun 6, 2021 at 4:30 AM Jan Šimbera <[hidden email]> wrote:
> If the multiplier separator is made variant (asterisk / colon / both), [...]

My initial instinct is to say "sure!  let's allow many delimiters!",
because that seems easy enough at this stage, and getting
implementations like Pivot on board will help adoption.  But the more
optional representations that we add, the more complicated that
writing parsers for the format is going to be.  Perhaps we should
replace the colon with an asterisk.  I'm not sure.  We should decide
whether we're going to have an optional (non-preferred) delimiter, and
which delimiter is preferred for software emitting this format.
Small decisions now will have a big impact later.

So let's talk about asterisks.  What you seem to be suggesting is that
ALL of the following test cases should be valid:

A - inlined labels with colon
   27: [Doña García Márquez]/5, [Steven B. Jensen]/2, [Sue Ye (蘇業)]/1, [Adam Muñoz]/0
   26: [Doña García Márquez]/3, [Steven B. Jensen]/5, [Sue Ye (蘇業)]/3, [Adam Muñoz]/1
   24: [Doña García Márquez]/2, [Steven B. Jensen]/1, [Sue Ye (蘇業)]/5, [Adam Muñoz]/2
   23: [Doña García Márquez]/1, [Steven B. Jensen]/0, [Sue Ye (蘇業)]/3, [Adam Muñoz]/5

B - inlined labels with asterisk
   27 * [Doña García Márquez]/5, [Steven B. Jensen]/2, [Sue Ye (蘇業)]/1, [Adam Muñoz]/0
   26 * [Doña García Márquez]/3, [Steven B. Jensen]/5, [Sue Ye (蘇業)]/3, [Adam Muñoz]/1
   24 * [Doña García Márquez]/2, [Steven B. Jensen]/1, [Sue Ye (蘇業)]/5, [Adam Muñoz]/2
   23 * [Doña García Márquez]/1, [Steven B. Jensen]/0, [Sue Ye (蘇業)]/3, [Adam Muñoz]/5

C - labels in header with colon delimiter
   [Doña García Márquez]: DGM
   [Steven B. Jensen]: SBJ
   [Sue Ye (蘇業)]: SY
   [Adam Muñoz]: AM

   27: DGM/5, SBJ/2, SY/1, AM/0
   26: DGM/3, SBJ/5, SY/3, AM/1
   24: DGM/2, SBJ/1, SY/5, AM/2
   23: DGM/1, SBJ/0, SY/3, AM/5

D - labels in header with asterisk delimiter
   [Doña García Márquez]: DGM
   [Steven B. Jensen]: SBJ
   [Sue Ye (蘇業)]: SY
   [Adam Muñoz]: AM

   27 * DGM/5, SBJ/2, SY/1, AM/0
   26 * DGM/3, SBJ/5, SY/3, AM/1
   24 * DGM/2, SBJ/1, SY/5, AM/2
   23 * DGM/1, SBJ/0, SY/3, AM/5

(Examples A, C, and D above are equivalent to test cases 4, 5, and 9
on the ABIF wiki page: <https://electowiki.org/wiki/ABIF>)
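This lenient Python sketch supports the claim that A through D are easy to parse. It accepts either ":" or "*" after the count, expands header labels, and returns the same structure for all four examples. The regular expressions are assumptions made for this sketch, not an agreed grammar. Note that `findall` silently skips text it does not recognize, which errs toward leniency.

```python
import re

# Assumed patterns for this sketch only (not an agreed ABIF grammar).
HEADER_RE = re.compile(r"^\s*\[([^\]]*)\]\s*:\s*([A-Za-z]+)\s*$")
BALLOT_RE = re.compile(r"^\s*(\d+)\s*(?::|\*)\s*(.+?)\s*$")
TUPLE_RE = re.compile(r"(\[[^\]]*\]|[A-Za-z]+)\s*/\s*(\d+)")


def parse_rated(text):
    """Parse rated ABIF-style text into (labels, ballots).

    labels maps short ids to full names (from header lines); ballots is a
    list of (count, {full_name: score}) with short ids expanded.
    """
    labels, ballots = {}, []
    for line in text.splitlines():
        if not line.strip():
            continue
        h = HEADER_RE.match(line)
        if h:  # e.g. "[Adam Muñoz]: AM"
            labels[h.group(2)] = h.group(1)
            continue
        b = BALLOT_RE.match(line)
        if not b:
            raise ValueError(f"unrecognized line: {line!r}")
        scores = {}
        for token, score in TUPLE_RE.findall(b.group(2)):
            if token.startswith("["):
                name = token[1:-1]  # inline bracketed label
            else:
                name = labels.get(token, token)  # expand short id if known
            scores[name] = int(score)
        ballots.append((int(b.group(1)), scores))
    return labels, ballots


case_b = ("27 * [Doña García Márquez]/5, [Steven B. Jensen]/2, "
          "[Sue Ye (蘇業)]/1, [Adam Muñoz]/0")
case_c = """[Doña García Márquez]: DGM
[Steven B. Jensen]: SBJ
[Sue Ye (蘇業)]: SY
[Adam Muñoz]: AM

27: DGM/5, SBJ/2, SY/1, AM/0"""
print(parse_rated(case_b)[1] == parse_rated(case_c)[1])
```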

All of the examples above (A, B, C, and D) seem easy enough to write
parsing code for.  Still, we need to be opinionated about what format
new ABIF writers should emit, so that we can steer people toward a
single, simple format, and possibly deprecate unused aspects of the
format.  We also want to reduce the complexity of the format now so
that we can add stuff after the format is more widely-used.

You'll note that there's a whitespace difference between examples C
and D.  When using colon, it seems more natural to put the colon
directly after the number.  For asterisk, it seems better to put a
space in to make it clear that it's not intended to be used as a
footnote (like the dagger † and double-dagger ‡ frequently are as
well).  Note that many fonts display asterisks in superscripted
format, which makes it a confusing replacement for the multiplication
symbol ("×") for non-programmers.  Of course programmers look at
asterisk and immediately think of multiplication, but then also start
thinking about the order of operations, and want to see a parenthesis
or two to make it clear that each line isn't being multiplied by the
first candidate.

I'd love to have Pivot on board for ABIF compatibility, so (for now)
I'm publishing a test case on the ABIF page for asterisk being used
instead of colon.  As of right now, I personally prefer colon, but
could be convinced that asterisk is a better choice.  Regardless, we
should decide which one is the preferred delimiter and which one is
the optional delimiter.
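On the writer side, this Python sketch takes the delimiter as a parameter, so the choice between colon and asterisk can be settled later. Sorting candidates by descending score is an assumption of this sketch; examples A through D keep a fixed candidate order instead. Any name that is not plain ASCII letters is bracketed, which assumes the bare-letters-or-brackets token rule proposed earlier in the thread. `write_rated_line` is a hypothetical name.

```python
def write_rated_line(count, scores, delimiter=":"):
    """Emit one rated ballot line, highest score first.

    delimiter is ":" (written directly after the count) or "*" (written
    with a space on each side, as in examples B and D).
    """
    if delimiter == ":":
        prefix = f"{count}: "
    elif delimiter == "*":
        prefix = f"{count} * "
    else:
        raise ValueError("delimiter must be ':' or '*'")
    # Sort by score descending; break ties by name for stable output.
    ordered = sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))
    parts = []
    for name, score in ordered:
        # Bare token only for plain ASCII letters; otherwise bracket it.
        token = name if name.isascii() and name.isalpha() else f"[{name}]"
        parts.append(f"{token}/{score}")
    return prefix + ", ".join(parts)


print(write_rated_line(27, {"SBJ": 2, "DGM": 5, "AM": 0, "SY": 1}))
```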

I filed this as an issue in the Electorama github issue tracker for ABIF here:
<https://github.com/electorama/abif/issues/3>

We can continue discussing this on the mailing list, but we can also
discuss it on GitHub.  I'm fine with either for now, since the ABIF
repository on GitHub is new (I made it a few hours ago).

Rob

[EM] ABIF: Naming (Re: Ballot Data Format)

Rob Lanphier-3
In reply to this post by Neal McBurnett
Hi Neal,

It would be delightful to schedule a phone call or something to talk
about your experience working with NIST/OASIS/IEEE/etc on this.  I've
worked with all of them as well (a little bit), but my focus (when I
was doing standards stuff) was IETF and W3C, and it was unrelated to
voting systems.  I was paying attention to this stuff when W3C
splintered off of IETF, and it's been interesting watching WHATWG's
on-again/off-again relationship with W3C.  But we should take that
discussion off of this list.

There's a lot to talk about in your email, but I'm going to zero in on
one particular item for this email:

On Sun, Jun 6, 2021 at 1:01 PM Neal McBurnett <[hidden email]> wrote:
> At the moment, hopefully early enough in this discussion, I want to push
> back on the use of the term "ballot image" to mean something other than
> a graphical representation.

As much as I enjoy naming discussions (ALMOST more than I enjoy eating
glass), you're right.  Now is the time to change the name if it's
going to get changed.  As the old joke goes, there are only two hard
problems in computer science: cache invalidation, naming things, and
off-by-one errors.  We'll focus on the second one for now. :-)

Speaking of "two hard problems", I've given this the honorary "#2"
spot in the issue tracker:
https://github.com/electorama/abif/issues/2

Something tells me we're going to have more than two hard problems
with ABIF.  I still want to call it ABIF even if we decide that "I"
doesn't stand for "Image".  We have other alternatives (more on that
in a bit).

The reason for sticking with "ballot image" in the name:  it seems
likely that some jurisdictions (like San Francisco) may have codified
the term "ballot image" to refer to a single line of ASCII text
representing a ballot, based on this:
https://sfelections.sfgov.org/june-5-2018-election-results-detailed-reports

...and this:
https://www.sfelections.org/results/20180605/data/BallotImageRCVhelp.pdf

It seems as though it's a term of art that has been used in many
places, even if it doesn't make sense.  Given that San Francisco was
arguably where the term "Ranked Choice Voting" was invented (and they
got FairVote to change from "Instant Runoff Voting" to that name), I'm
not sure I want to try to fight the powers that be in SF on a naming
dispute.

In fairness to your point of view, the "Voluntary Voting System
Guidelines" that you cited (published by the Election Assistance
Commission, or EAC) seem to draw a distinction between "ballot image"
and "ballot record":
 https://www.eac.gov/sites/default/files/TestingCertification/Voluntary_Voting_System_Guidelines_Version_2_0.pdf

That's a pretty compelling reason to stop using "ballot image" for
anything that isn't a scan or photograph of some variety.

This could become a BRAF (Ballot Record Aggregation Format).  I like
that name too, but I'll still call it ABIF unless there's a flurry of
BRAF support.  I like having "Aggregation" in the name to make it
clear that this format is for (optionally) expressing the stated
preferences of many voters (hundreds, thousands, more) in a single,
readable line of ASCII text.
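As a rough illustration of the "aggregation" idea, here is a minimal sketch that collapses identical ranked ballots into one line per distinct ranking, prefixed by a count. The `count:A>B>C` line syntax and the `aggregate_ballots` helper are hypothetical, chosen only for illustration; they are not taken from any ABIF specification.

```python
from collections import Counter

def aggregate_ballots(ballots):
    """Collapse identical ranked ballots into 'count:A>B>C' lines.

    Each ballot is a sequence of candidate names, most preferred first.
    The output line syntax is a hypothetical placeholder, not the ABIF spec.
    """
    counts = Counter(tuple(ballot) for ballot in ballots)
    # Most common rankings first; equal counts keep first-seen order.
    return [f"{n}:{'>'.join(ranking)}" for ranking, n in counts.most_common()]

ballots = [
    ["Alice", "Bob", "Carol"],
    ["Bob", "Alice"],
    ["Alice", "Bob", "Carol"],
]
for line in aggregate_ballots(ballots):
    print(line)
```

Three individual ballots become two lines here; with thousands of voters, the savings come from many voters sharing the same ranking.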

It seems like we can make "ABIF" work. Renaming ABIF at this point is
going to create some busywork for me; not nearly as much as it would
if we wait, but I already have many little pockets of things with
"ABIF" on them (e.g. https://github.com/electorama/abif ).  Let's not
rename this thing a lot. I'm okay with changing what the "I" stands
for (if we really must): "Ballot Information", "Ballot Inventory", and
"Ballot Igloo" all work for me (well, maybe only two of those three,
but you'll need to guess which one)  :-D

Assuming you're signed up on GitHub, let's continue the conversation there:
https://github.com/electorama/abif/issues/2

We can also use the mailing list, but I suspect that other mailing
list members will appreciate this list not becoming the ABIF naming
mailing list (or the BRAF mailing list, or whatever we call it).

Rob
----
Election-Methods mailing list - see https://electorama.com/em for list info

[EM] Metadata in ABIF (Re: Ballot Data Format)

Rob Lanphier-3
In reply to this post by VoteFair-2
Hi Richard,

Thanks for weighing in on this thread, and more inline below:

On Sun, Jun 6, 2021 at 5:20 PM VoteFair <[hidden email]> wrote:
> I agree with Neal that the word "image" should not be part of the ABIF name.
>
> I suggest that ABIF can stand for:
>
> "Aggregated Ballot Information Format"

That's what we're calling it now!  See:
https://github.com/electorama/abif/issues/2

...which is now "closed" (and represents the first closed issue in our
little GitHub repo).

> (And perhaps another appropriate "A" word can be used instead of
> "aggregated," if needed.)

I still like "aggregated", but "A" is a versatile letter.  We'll be
able to bacronym all sorts of words into the ABIF abbreviation.  I'm
starting to like "Aggravated" already.  :-D  Also, do you like how I
verbed the word "bacronym"?  Englishing is fun!!
<https://www.gocomics.com/calvinandhobbes/1993/01/25>

> I also suggest adding a case number.
>
> A case number allows the ballot data to be processed through separate
> vote-counting software while the metadata -- such as precinct number,
> political-party affiliations, etc. -- can follow a different path and be
> re-joined to produce the published results.

At first I didn't realize what you meant by "case number".  Are you
suggesting "case" as in "a case of ballots", which can be stacked on a
pallet to create a "pallet of ballots", and then moved around a
"warehouse of ballots"?

I've been referring to each line in an ABIF file that starts with the
quantity field as a "bundle of ballots".  Something tells me that
English is becoming an obstacle to understanding in this conversation
:-)
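For anyone following along, here is a minimal sketch of reading such "bundle" lines, assuming a hypothetical `quantity:A>B>C` syntax (the quantity field first, then a ranking). The syntax and the helper names `parse_bundle` and `total_ballots` are illustrative assumptions, not the ABIF grammar.

```python
def parse_bundle(line):
    """Split a hypothetical 'quantity:ranking' line into (quantity, [candidates])."""
    quantity_text, sep, ranking_text = line.partition(":")
    if not sep:
        raise ValueError(f"no quantity separator in {line!r}")
    quantity = int(quantity_text.strip())
    # Candidates are separated by '>' (most preferred first); empty names are dropped.
    ranking = [name.strip() for name in ranking_text.split(">") if name.strip()]
    return quantity, ranking

def total_ballots(lines):
    """Sum the quantity field across all bundles, skipping blank lines."""
    return sum(parse_bundle(line)[0] for line in lines if line.strip())

sample = ["250:Alice>Bob>Carol", "175:Bob>Carol", "", "75:Carol"]
print(parse_bundle(sample[0]))
print(total_ballots(sample))
```

The point of the quantity field is that a counter never has to expand a bundle into individual ballots; it can just add the quantity.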

Regardless, we need to come up with a standard way of embedding
metadata into ABIF files (such as some of the extended data that you
need, which most ABIF users won't need at first).  I've filed an issue in the
ABIF issue tracker to discuss this further:
https://github.com/electorama/abif/issues/6
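To make the case-number idea concrete, here is a minimal sketch. It assumes each ballot record and each metadata record carry the same case number. The counting path sees only case numbers and rankings, and the metadata (here a precinct) is rejoined afterward. The record shapes and field names (`case`, `ranking`, `precinct`) are hypothetical.

```python
from collections import defaultdict

# Hypothetical records: ranked ballots and metadata share a case number.
ballot_records = [
    {"case": 1, "ranking": ["Alice", "Bob"]},
    {"case": 2, "ranking": ["Bob", "Alice"]},
    {"case": 3, "ranking": ["Alice"]},
]
metadata_records = [
    {"case": 1, "precinct": "P-101"},
    {"case": 2, "precinct": "P-102"},
    {"case": 3, "precinct": "P-101"},
]

def first_choices(ballots):
    """Counting path: sees only case numbers and rankings, no metadata."""
    return {b["case"]: b["ranking"][0] for b in ballots if b["ranking"]}

def first_choices_by_precinct(choices, metadata):
    """Rejoin counting output with metadata using the shared case number."""
    precinct_of = {m["case"]: m["precinct"] for m in metadata}
    tally = defaultdict(lambda: defaultdict(int))
    for case, candidate in choices.items():
        tally[precinct_of[case]][candidate] += 1
    # Convert nested defaultdicts to plain dicts for readable output.
    return {precinct: dict(counts) for precinct, counts in tally.items()}

print(first_choices_by_precinct(first_choices(ballot_records), metadata_records))
```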

> In particular, my vote-counting software focuses on the numbers/counts,
> and I use different software (written in my Dashrep programming
> language) to process the text info.

Oh, interesting!  I wonder if we should have an optional metadata line
at the top of an ABIF file that would look something like this:

@ABIF - {"SerialNumber" : "001"}

The rules for parsing the second line and beyond could be different
from those for the first line.  This seems to be something we should consider.
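A minimal sketch of that two-phase parse, assuming the header is an optional first line starting with `@ABIF`, followed by an optional `-` and a JSON object. The marker handling and the `split_header` name are assumptions, not settled ABIF syntax. Note that JSON does not allow leading zeros in numbers, so a bare `001` would be rejected; this sketch quotes it as a string.

```python
import json

HEADER_PREFIX = "@ABIF"  # assumed marker, following the example above

def split_header(text):
    """Return (metadata dict, remaining body lines) for a hypothetical ABIF file.

    Only the first line is treated as a header, and only if it starts with the
    marker; everything after it is left for the ballot-line parser.
    """
    lines = text.splitlines()
    if lines and lines[0].startswith(HEADER_PREFIX):
        # Drop the marker and an optional '-' separator, then parse the JSON.
        payload = lines[0][len(HEADER_PREFIX):].strip().lstrip("-").strip()
        metadata = json.loads(payload) if payload else {}
        return metadata, lines[1:]
    return {}, lines

doc = '@ABIF - {"SerialNumber": "001"}\n250:Alice>Bob\n75:Bob'
print(split_header(doc))
```

Keeping the header to a single, optional first line means files without metadata parse unchanged.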

Anyway, thanks for weighing in on this!  Let's keep discussing this
over in the issue tracker for this project
(<https://github.com/electorama/abif/issues>)

Rob
P.S. I've been stubbing out an ABIF test suite, and I'm starting to
flesh out a few Python-based tests (in pytest) for a ballot-counting
script.
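The actual test suite isn't shown in this thread. The following is only a guess at the general shape of a pytest-style test for a ballot-counting function. `count_first_choices` and the `(quantity, ranking)` bundle shape are hypothetical.

```python
from collections import Counter

def count_first_choices(bundles):
    """Toy counter: tally first choices from (quantity, ranking) bundles."""
    tally = Counter()
    for quantity, ranking in bundles:
        if ranking:  # a blank ballot contributes nothing
            tally[ranking[0]] += quantity
    return tally

# pytest collects functions named test_*; plain asserts are enough.
def test_first_choice_totals():
    bundles = [(250, ["Alice", "Bob"]), (175, ["Bob"]), (75, ["Alice"])]
    assert dict(count_first_choices(bundles)) == {"Alice": 325, "Bob": 175}

def test_blank_ballots_are_ignored():
    assert not count_first_choices([(10, [])])
```

Running `pytest` on a file containing this would collect and run both test functions.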