Nationalratswahlen Basel 2023

On 2023-10-22 Nationalratswahlen took place in Switzerland, so also in Basel. The results can be seen on https://www.staatskanzlei.bs.ch/politische-rechte/wahlen-abstimmungen/wahlen-2023.html including the protocols how the 4 Basel seats in the Nationalrat were assigned.

A nice open government data service are the downloadable vote counts from a link on this page. As the Swiss election system is far from simple I thought it informativ to look into the vote counts and see whether I could compute the election results myself and see whether I arrive at the same results as the Basel Staatskanzlei.

Show the code

library(data.tree)
library(dplyr, warn.conflicts = FALSE)
library(philentropy)
library(readxl, warn.conflicts = FALSE)

distinctness <- function(x) philentropy::H(table(x, useNA = 'ifany')/length(x)) / log(length(x), 2)

Data download

If not already in the active R session, I download the Excel file containing the vote counts. Excel file, because it is type safe (“01” != 1) and provides a familiar table format.

Show the code

# read in data if not already present
if (!exists('dat0')) {
  download.file('https://data.bs.ch/api/explore/v2.1/catalog/datasets/100281/exports/xlsx', tf <- tempfile(fileext = '.xlsx'))
  dat0 <- readxl::read_xlsx(tf)
}
dim(dat0)

[1] 610  90

Data scrubbing

This is a pretty wide data table with 90 columns. A description of the columns can be found as “Datensatzschema” in the tab “Information” on https://data.bs.ch/explore/dataset/100281/information. The first columns contain constant values, like wahlbezeichnung “Wahl von vier Mitgliedern des Nationalrates”.

To find and remove such columns I employ the entropy of a column’s values. The distinctness() function defined above returns 0 for columns with constant values and 1 for columns that have a different value for each row in the table (row identifying columns).

Show the code

# determine constant columns, move them to "general" dataframe
entropy0 <- sapply(dat0, distinctness)
general <- unique(dat0[,names(entropy0[entropy0==0])])
general

Show the code

dat1 <- dat0[, setdiff(names(dat0), names(general))]
dim(dat1)

[1] 610  85

Wahlkreise

Among the remaining 85 columns there are 3 that identify Wahlkreise. I am not interested in the individual Wahlkreise (electoral district), but in the “Kanton Basel-Stadt”. So I split dat1 accordingly and concentrate on this pseudo Wahlkreis.

Show the code

## find right Wahlkreis for Kanton
table(dat1$wahlkreisbezeichnung)


  Auslandschweizer Gemeinde Bettingen    Gemeinde Riehen Kanton Basel-Stadt        Stadt Basel 
               122                122                122                122                122

Show the code

# split dat1 by Wahlkreise
wahlkreise <- split(dat1, dat1$wahlkreisbezeichnung)
# concentrate on Kanton
kanton0 <- wahlkreise[['Kanton Basel-Stadt']]
dim(kanton0)

[1] 122  85

scrub kanton

There are a number of constant columns in the kanton data table. Like with the original data table I move them out to a separate table.

Show the code

entropy_kanton <- sapply(kanton0, distinctness)
kanton_general <- unique(kanton0[,names(entropy_kanton[entropy_kanton==0])])

kanton <- kanton0[, setdiff(names(kanton0), names(kanton_general))]
dim(kanton)

[1] 122  67

The entropy shows that the candidates’ name and its equivalents kandidaten_nr and personen_id uniquely determine a single row in the kanton table.

excursion: distribution of candidate ages

Just for fun let us have a look at the distribution of the candidates’ ages:

Show the code

## age distribution of candidates
age_distr <- function(dat, dataspec=NULL) {
  hist(dat$alter_am_jahresende_2023, freq = FALSE,
       main = paste('Altersverteilung der Kandidaten', dataspec), xlab = 'Kandidatenalter',
       breaks = seq(15,90,5))
  lines(density(dat$alter_am_jahresende_2023, adjust=.5), col='blue')
  lines(density(dat$alter_am_jahresende_2023), col='red')
}
age_distr(kanton)

Show the code

# by gender
par(mfrow=c(2,1))
age_distr(subset(kanton, geschlecht=='F'), 'weiblich')
age_distr(subset(kanton, geschlecht=='M'), 'männlich')

Show the code

par(mfrow=c(1,1))

candidate votes

Back to the main topic: number of votes. How many did each candidate collect?

Show the code

kanton %>% 
  dplyr::select(kandidaten_nr, parteikurzbezeichnung, bisher, gewahlt, ganzer_name, stimmen_total_aus_wahlzettel, hlv_nr) %>% 
  dplyr::arrange(desc(stimmen_total_aus_wahlzettel))

Naively one could imagine the seats go to the 4 candidates with the most votes - but that ignores the intention to try to assign seats in a fashion that mirrors the votes for lists; here in Switzerland the most decisive seat distribution factor are the votes for a “Listenverbindung”.

Listen and Listenverbindungen

We have 32 lists, most of them sent in 4 candidates:

Show the code

table(kanton$listen_nr, useNA = 'ifany')  # 32 lists run


01 03 04 05 06 07 08 09 10 11 12 14 17 18 20 21 22 23 24 25 26 27 28 30 31 32 33 34 35 36 37 38 
 4  4  4  4  4  4  4  3  4  3  4  2  4  4  4  4  4  4  4  4  4  4  3  4  3  4  4  4  4  4  4  4

Show the code

(lists <- kanton %>% 
  group_by(listen_nr, hlv_nr, ulv_nr, partei_id, parteikurzbezeichnung, parteibezeichnung) %>% 
  summarise(listenstimmen=mean(kandidatenstimmen_unveranderte_wahlzettel+kandidatenstimmen_veranderte_wahlzettel+
                                zusatzstimmen_unveranderte_wahlzettel+zusatzstimmen_veranderte_wahlzettel),
            kandidates=n()))

`summarise()` has grouped output by 'listen_nr', 'hlv_nr', 'ulv_nr', 'partei_id',
'parteikurzbezeichnung'. You can override using the `.groups` argument.

Looking at the distinctiveness of the columns in table lists we see:

Show the code

sort(sapply(lists, distinctness))

           kandidates                hlv_nr                ulv_nr             partei_id 
            0.1476127             0.3406070             0.5857458             0.7751614 
            listen_nr parteikurzbezeichnung     parteibezeichnung         listenstimmen 
            1.0000000             1.0000000             1.0000000             1.0000000

listen_nr, parteikurzbezeichnung, and parteibezeichnung all identify uniquely the lists. What partei_id is supposed to designate escapes me, the “Datensatzschema” from https://data.bs.ch/explore/dataset/100281/information does not help.

tree structure of Listenverbindungen

There are Hauptlistenverbindungen and Unterlistenverbindungen, though these are not mandatory for lists, they generate a tree structure with 3 major blocks competing for 4 seats.

Analysis of a values in tree structures is not mainstream in R. I employ the package data.tree to do so and first need to construct another column in table lists that indicates the position in the tree:

Show the code

# tree structure of the lists:
lists$pathString <- paste('basel',
                          dplyr::coalesce(lists$hlv_nr, ''),
                          dplyr::coalesce(lists$ulv_nr, ''),
                          lists$listen_nr,
                          sep = '/'
                          )
zz <- data.tree::as.Node(lists)
print(zz, 'parteikurzbezeichnung')

        levelName parteikurzbezeichnung
1  basel                               
2   ¦--1                               
3   ¦   ¦--1                           
4   ¦   ¦   ¦--01                   FDP
5   ¦   ¦   ¦--06                  JFDP
6   ¦   ¦   °--26                  FDP+
7   ¦   ¦--2                           
8   ¦   ¦   ¦--03                   LDP
9   ¦   ¦   ¦--23                   JLB
10  ¦   ¦   °--34               LDP-Gew
11  ¦   ¦--04                       EVP
12  ¦   ¦--4                           
13  ¦   ¦   ¦--07                 Mitte
14  ¦   ¦   ¦--22                JMITTE
15  ¦   ¦   °--36              Mitte60+
16  ¦   °--6                           
17  ¦       ¦--10                   GLP
18  ¦       ¦--25                  JGLP
19  ¦       ¦--32               GLP-K-U
20  ¦       ¦--33                 GLP-B
21  ¦       ¦--35               GLP-R-E
22  ¦       ¦--37               GLP-KMU
23  ¦       °--38                 GLP-I
24  ¦--2                               
25  ¦   ¦--3                           
26  ¦   ¦   ¦--05                    SP
27  ¦   ¦   ¦--20                  JUSO
28  ¦   ¦   °--24                 SP60+
29  ¦   °--5                           
30  ¦       ¦--08                   BGB
31  ¦       ¦--17              Basta-rm
32  ¦       ¦--21                   jgb
33  ¦       °--27              BastA-jA
34  ¦--3                               
35  ¦   ¦--09                       EDU
36  ¦   °--7                           
37  ¦       ¦--12                   SVP
38  ¦       ¦--18                  JSVP
39  ¦       ¦--30               SVP-Gew
40  ¦       °--31                SVP60+
41  ¦--11                           PdA
42  ¦--14                            VA
43  °--28                            MV

Three lists, PdA, VA, and MV, do not belong to any Listenverbindung. There are three Hauptlistenverbindungen:

1 conservatives
2 moderate left
3 SVP

among which the seats have to be distributed.

candidates per Hauptlistenverbindung

Show the code

# number of candidates per Hauptliste
zz$Do( function(x) {
  x$kandidaten_total <- Aggregate(node = x,
                       attribute = "kandidates",
                       aggFun = sum)
                  }
     )
data.tree::Get(zz$children, 'kandidaten_total')

 1  2  3 11 14 28 
68 28 18  3  2  3

Extensive use of Listenverbindungen blows up the number of candidates; Hauptlistenverbindung 1 comes with 68 of these!

top level votes

How many votes did the Hauptlistenverbindungen and the Einzellisten come up with?

Show the code

## top-level votes
(hl_votes <- sort(data.tree::Get(zz$children, function(x) data.tree::Aggregate(x, 'listenstimmen', sum)), decreasing = TRUE))

     2      1      3     28     11     14 
109934  77354  31600   2163   1933   1690

seats per Hauptlistenverbindung

First step in seat distribution is to determine the quorum: the number of votes is divided by number of seats plus 1, then the next higher integer number is set as quorum. The integer part of the division of the votes of Hauptlistenverbindungen by the quorum indicates number of seats given to the Hauplistenverbindung.

It is possible that this first round does not distribute all seats - as is the case here. Votes of the Hauptlistenverbindung then are divided by 1 plus number of seats received and the remaining seat goes to the Hauptlistenverbindung with the highest quotient, here 1.

Show the code

# 1st seat distribution
(quorum1 <- ceiling(sum(hl_votes)/(4+1)))

[1] 44935

Show the code

(seats1 <- floor(hl_votes/quorum1))

 2  1  3 28 11 14 
 2  1  0  0  0  0

Show the code

# 2nd seat distribution
sort(hl_votes/(1+seats1), decreasing = TRUE)

       1        2        3       28       11       14 
38677.00 36644.67 31600.00  2163.00  1933.00  1690.00

Within the Hauptlistenverbindung the seats get distributed according to the same multi-step process first to Unterlistenverbindung, then Liste.

Detailed protocols of this process are provided by the Basel Staatskanzlei, see https://www.staatskanzlei.bs.ch/dam/jcr:302a9016-ef8c-41b8-888a-f6d7332ff1ab/Ausw_5a.pdf.

For a general picture what Swiss voters can do with their ballot paper, see https://www.srf.ch/news/schweiz/wahlhilfe/kumulieren-und-panaschieren-so-waehlt-man-mit-listen.