| Title: | German Election Database (GERDA) |
|---|---|
| Description: | Provides tools to download datasets of German elections covering local, state, federal, mayoral, European Parliament, and county (Kreistag) elections, with federal county-level coverage from 1953 and other families extending through 2025. The package supplies turnout, vote shares, and derived indicators at the municipal and county level, including geographically harmonized datasets that account for changes in municipal boundaries over time and incorporate mail-in voting districts. Bundled data includes county-level INKAR covariates (1995-2022) and municipality-level Zensus 2022 indicators. Data is sourced from <https://github.com/awiedem/german_election_data>. |
| Authors: | Hanno Hilbig [aut, cre] |
| Maintainer: | Hanno Hilbig <[email protected]> |
| License: | MIT + file LICENSE |
| Version: | 0.6.0 |
| Built: | 2026-05-15 21:25:40 UTC |
| Source: | https://github.com/hhilbig/gerda |
Convenience function to merge Zensus 2022 municipality-level data with GERDA election data. The census provides a cross-sectional snapshot (2022), so the same values are attached to all election years.
The function works with both municipality-level and county-level election data:
Municipality-level data: Direct merge using 8-digit AGS codes
County-level data: Census data is aggregated to the county level (population-weighted means for shares, sums for counts) before merging
add_gerda_census(election_data)
election_data |
A data frame containing GERDA election data. Must contain
either an |
The input data must contain one of:
ags: 8-digit municipal code for municipality-level data
county_code: 5-digit county code for county-level data
Since the census is a 2022 cross-section, census values are the same for all election years. The merge is on geography only (no year join).
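A minimal sketch of what a geography-only left join looks like, using base R and made-up toy values rather than the package's internals. The AGS codes, turnout figures, and population numbers are illustrative assumptions; only the column names ags, election_year, and population_census22 come from this documentation.

```r
# Toy election panel: two municipalities observed in two election years.
# All values are invented for illustration.
elections <- data.frame(
  ags = c("01051001", "01051001", "01053001", "01053001"),
  election_year = c(2017, 2021, 2017, 2021),
  turnout = c(0.71, 0.74, 0.68, 0.70),
  stringsAsFactors = FALSE
)

# Toy 2022 census cross-section: one row per municipality, no year column.
census <- data.frame(
  ags = c("01051001", "01053001"),
  population_census22 = c(1000, 2000),
  stringsAsFactors = FALSE
)

# Left join on geography only; all.x = TRUE keeps every election row.
merged <- merge(elections, census, by = "ags", all.x = TRUE)
merged <- merged[order(merged$ags, merged$election_year), ]

nrow(merged) == nrow(elections)  # row count is unchanged
```

Each municipality carries the same census value in 2017 and 2021, which is why census columns cannot explain within-municipality change over time.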
For county-level data, municipality-level census data is first aggregated:
Share variables: Population-weighted means
Count variables (population_census22, total_dwellings_census22): Sums
Other variables (avg_household_size_census22, avg_rent_per_m2_census22): Population-weighted means
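The aggregation rules above can be sketched in base R as follows. This is an illustration under stated assumptions, not the package's implementation: the toy values are invented, share_example is a hypothetical share column, population_census22 is assumed to be the weight, and missing values are not handled.

```r
# Toy municipality-level census data (all values invented).
muni <- data.frame(
  ags = c("01051001", "01051002", "01053001"),
  population_census22 = c(1000, 3000, 2000),
  total_dwellings_census22 = c(500, 1400, 900),
  avg_household_size_census22 = c(2.0, 2.4, 2.1),
  share_example = c(0.10, 0.20, 0.05),  # hypothetical share variable
  stringsAsFactors = FALSE
)

# The county code is the first 5 digits of the 8-digit AGS.
muni$county_code <- substr(muni$ags, 1, 5)

# Aggregate one county at a time: sums for counts,
# population-weighted means for shares and averages.
aggregate_county <- function(g) {
  data.frame(
    county_code = g$county_code[1],
    population_census22 = sum(g$population_census22),
    total_dwellings_census22 = sum(g$total_dwellings_census22),
    avg_household_size_census22 =
      weighted.mean(g$avg_household_size_census22, g$population_census22),
    share_example = weighted.mean(g$share_example, g$population_census22),
    stringsAsFactors = FALSE
  )
}

county <- do.call(rbind, lapply(split(muni, muni$county_code), aggregate_county))
rownames(county) <- NULL
county
```

Weighting by population means a large municipality dominates its county's share, which matches how a county-wide share would be computed from individual-level counts.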
The input data frame with additional census columns appended. The number of rows remains unchanged (left join).
gerda_census for direct access to the census data
gerda_census_codebook for variable descriptions
    ## Not run:
    library(gerda)

    # Municipality-level merge
    muni_data <- load_gerda_web("federal_muni_harm_21") |>
      add_gerda_census()

    # County-level merge (aggregated from municipalities)
    county_data <- load_gerda_web("federal_cty_harm") |>
      add_gerda_census()
    ## End(Not run)
Convenience function to merge INKAR county-level (Kreis) covariates with GERDA election data. This is the recommended way to add covariates, as it automatically uses the correct join keys and prevents common merge errors.
The function works with both county-level and municipal-level election data:
County-level data: Direct merge using county codes
Municipal-level data: Automatically extracts county code from municipal AGS (first 5 digits) and merges
Important: Covariates are always at the county level. When merging with municipal data, all municipalities within the same county will receive identical covariate values.
The function performs a left join: every row of the election data is kept, and covariates are added where available. Because the join is keyed on election_year, covariate rows for non-election years never enter the result.
add_gerda_covariates(election_data)
election_data |
A data frame containing GERDA election data. Must
contain a column with county or municipal codes (see Details) and
|
The input data must contain election_year and one of:
county_code: 5-digit county code (AGS) for county-level data
ags: 8-digit municipal code (AGS) for municipal-level data
The function automatically detects which column is present and performs the appropriate merge. For municipal data, the county code is extracted from the first 5 digits of the AGS.
Covariates are at the county (Kreis) level:
County-level merge: One-to-one match, each county gets its covariates
Municipal-level merge: Many-to-one match, all municipalities in the same county receive identical covariate values
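A rough base R sketch of the many-to-one case, assuming codes are stored as zero-padded character strings (numeric storage would drop leading zeros and break substr). The toy values are invented, and the covariate key name year is taken from the manual-merge example elsewhere in this manual; this is not the package's internal code.

```r
# Toy municipal election rows (values invented).
muni_elections <- data.frame(
  ags = c("01051001", "01051002", "01053001"),
  election_year = c(2021, 2021, 2021),
  stringsAsFactors = FALSE
)

# Toy county-level covariates keyed by county_code and year.
county_covs <- data.frame(
  county_code = c("01051", "01053"),
  year = c(2021, 2021),
  unemployment_rate = c(5.2, 4.1),
  stringsAsFactors = FALSE
)

# Derive the county code from the first 5 digits of the AGS.
muni_elections$county_code <- substr(muni_elections$ags, 1, 5)

# Many-to-one left join: several municipalities map to one county row.
merged <- merge(
  muni_elections, county_covs,
  by.x = c("county_code", "election_year"),
  by.y = c("county_code", "year"),
  all.x = TRUE
)
merged
```

Both municipalities in county 01051 end up with the same unemployment_rate, so standard errors in downstream models should usually account for clustering at the county level.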
Covariates are available from 1995-2022. For GERDA federal elections:
Elections 1990, 1994: No covariates (before 1995)
Elections 1998-2021: Covariates available
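The coverage rule can be checked with a short snippet. The vector of federal election years is written out by hand here as an assumption about which years a dataset contains; with real data it could be taken from unique(election_data$election_year).

```r
# Federal election years from 1990 to 2021, entered manually.
federal_years <- c(1990, 1994, 1998, 2002, 2005, 2009, 2013, 2017, 2021)

# Covariates span 1995-2022 inclusive.
covered <- federal_years >= 1995 & federal_years <= 2022

federal_years[covered]   # elections that can receive covariates
federal_years[!covered]  # elections whose covariate columns will be NA
```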
Some covariates have missing values. Use gerda_covariates_codebook()
to check data availability for specific variables.
The input data frame with additional columns for all 30 county-level covariates. The number of rows remains unchanged (left join).
gerda_covariates for direct access to the covariate data
gerda_covariates_codebook for variable descriptions
load_gerda_web for loading GERDA election data
    ## Not run:
    library(gerda)
    library(dplyr)

    # Example 1: County-level election data
    county_data <- load_gerda_web("federal_cty_harm") %>%
      add_gerda_covariates()

    # Check the result
    names(county_data)                # See new covariate columns
    table(county_data$election_year)  # Only election years

    # Example 2: Municipal-level election data
    # Note: All municipalities in the same county will get identical covariates
    muni_data <- load_gerda_web("federal_muni_harm_21") %>%
      add_gerda_covariates()

    # Verify: municipalities in same county have same covariate values.
    # The county code is the first 5 digits of the 8-digit municipal AGS.
    muni_data %>%
      mutate(county_code = substr(ags, 1, 5)) %>%
      group_by(county_code, election_year) %>%
      summarize(
        n_munis = n(),
        unemp_range = max(unemployment_rate) - min(unemployment_rate)
      )

    # Analyze with covariates
    county_data %>%
      filter(election_year == 2021) %>%
      filter(!is.na(unemployment_rate)) %>%
      summarize(cor_unemployment_afd = cor(unemployment_rate, afd))
    ## End(Not run)
Returns municipality-level demographic and socioeconomic data from the German Census 2022 (Zensus 2022). This is a cross-sectional snapshot covering all German municipalities.
For most users, we recommend using add_gerda_census instead,
which automatically merges census data with GERDA election data.
gerda_census()
The dataset includes:
Demographics: Population, age structure
Migration: Migration background, foreign nationals
Households: Average household size
Housing: Dwellings, vacancy, ownership, rents, building types
Municipality codes are 8-digit AGS codes. Since the census is a single 2022 snapshot, there is no year dimension.
A data frame with approximately 10,800 rows (one per municipality)
and 16 columns containing census indicators. See
gerda_census_codebook for variable descriptions.
add_gerda_census for automatic merging with election data
gerda_census_codebook for variable descriptions
    # Get the census data
    census <- gerda_census()
    head(census)

    # Check available municipalities
    nrow(census)
Returns the data dictionary for municipality-level Census 2022 indicators. Provides variable names, labels, units, and data sources.
gerda_census_codebook()
A data frame with 16 rows documenting all variables in the census dataset.
gerda_census for the actual census data
    # View the codebook
    codebook <- gerda_census_codebook()
    print(codebook)
Returns county-level socioeconomic and demographic covariates from INKAR. This function provides flexible access to the raw covariate data for advanced users who want to inspect or manipulate it before merging with county-level election data.
For most users, we recommend using add_gerda_covariates instead,
which automatically performs the merge with correct join keys.
Note: These covariates are at the county (Kreis) level and should be
merged with county-level GERDA data (e.g., federal_cty_harm).
gerda_covariates()
The dataset includes 30 socioeconomic and demographic variables:
Demographics: Age structure, foreign population, gender
Economy: GDP, sectoral composition, enterprise structure
Labor Market: Unemployment rates (overall, youth, long-term)
Education: School completion rates, students, apprentices
Income: Purchasing power, low-income households
Healthcare: Physician density, hospital beds, GP density
Childcare: Coverage rates for under-3 and 3-6 age groups
Housing: Building permits, rent levels, living space
Transport: Cars per capita
Public Finances: Municipal debt, tax revenue
County codes are formatted as 5-digit AGS codes matching GERDA's harmonized county codes (2021 boundaries).
A data frame with 11,200 rows and 32 columns containing county-level
covariates for 400 German counties from 1995 to 2022. See
gerda_covariates_codebook for variable descriptions.
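The stated dimensions are consistent with a balanced county-by-year panel, as this quick check shows. It assumes the two non-covariate columns are the county_code and year keys, which is inferred from the join keys used in the manual-merge example rather than stated explicitly.

```r
n_counties <- 400
years <- 1995:2022       # 28 years, inclusive
n_covariates <- 30
n_key_columns <- 2       # assumed: county_code and year

expected_rows <- n_counties * length(years)
expected_cols <- n_covariates + n_key_columns

expected_rows  # 11200
expected_cols  # 32
```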
add_gerda_covariates for automatic merging (recommended)
gerda_covariates_codebook for variable descriptions
    # Get the covariates data (bundled, no network call)
    covs <- gerda_covariates()

    # Inspect the data
    head(covs)
    summary(covs)

    # Manual merge (advanced): downloads election data from GitHub
    library(dplyr)
    elections <- load_gerda_web("federal_cty_harm")
    merged <- elections %>%
      left_join(covs, by = c("county_code" = "county_code", "election_year" = "year"))
Returns the data dictionary for county-level (Kreis) covariates from INKAR. Provides variable names, labels, units, categories, original INKAR codes, and missing data information for all county-level socioeconomic and demographic indicators.
gerda_covariates_codebook()
A data frame with 32 rows documenting all variables in the county covariates dataset.
gerda_covariates for the actual covariate data
    # View the full codebook
    codebook <- gerda_covariates_codebook()
    print(codebook)

    # Find variables by category
    library(dplyr)
    codebook %>%
      filter(category == "Demographics")

    # Find variables with good coverage
    codebook %>%
      filter(missing_pct < 5)
Lists the available GERDA datasets together with their descriptions, as a quick reference for the names accepted by load_gerda_web.
gerda_data_list(print_table = TRUE)
print_table |
A logical value indicating whether to print the table in the console (TRUE) or return the data as a tibble (FALSE). Default is TRUE. |
In addition to downloadable datasets, the package includes bundled covariate data accessible via dedicated functions:
gerda_covariates: County-level INKAR covariates (1995-2022)
gerda_census: Municipality-level Census 2022 data
A tibble containing the available GERDA data with descriptions. When print_table = TRUE, the function prints a formatted table to the console and invisibly returns the data tibble. When print_table = FALSE, the function directly returns the data tibble.
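The print-or-return behaviour described above follows a common R pattern built on invisible(). The sketch below shows that pattern with a hypothetical function and placeholder dataset names; it is not the source of gerda_data_list, and it uses a plain data frame instead of a tibble to stay dependency-free.

```r
# Hypothetical stand-in illustrating the print_table pattern.
list_datasets_sketch <- function(print_table = TRUE) {
  datasets <- data.frame(
    data_name = c("example_a", "example_b"),          # placeholder names
    description = c("Hypothetical dataset A", "Hypothetical dataset B"),
    stringsAsFactors = FALSE
  )
  if (print_table) {
    print(datasets)
    # Returned invisibly so the table is not printed a second time.
    return(invisible(datasets))
  }
  datasets
}

# The value is still available for assignment when printed.
tbl <- list_datasets_sketch(print_table = TRUE)
nrow(tbl)
```

In practice this means `x <- gerda_data_list()` both prints the table and stores it, while `print_table = FALSE` is the quieter choice inside scripts.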
gerda_data_list()
Downloads a GERDA dataset from the project's GitHub repository and returns it as a tibble.
load_gerda_web(file_name, verbose = FALSE, file_format = "rds", on_error = getOption("gerda.on_error", "warn"))
file_name |
A character string specifying the name of the file to load. For a list of available data, see |
verbose |
A logical value indicating whether to print additional messages to the console. Default is FALSE. |
file_format |
A character string specifying the format of the file. Must be either "csv" or "rds". Default is "rds". |
on_error |
How to handle errors (unknown dataset name, failed download, corrupt file, invalid |
A tibble containing the loaded data, or NULL if the data could not be loaded.
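Because the function can return NULL instead of a tibble, scripts may want to fail early rather than pass NULL into later steps. The helper below is a hypothetical convenience, not part of the package, and is one possible way to guard a load.

```r
# Hypothetical helper: stop with a clear message if loading returned NULL.
require_loaded <- function(data, name) {
  if (is.null(data)) {
    stop("Dataset '", name, "' could not be loaded", call. = FALSE)
  }
  data
}

# Intended usage (needs the gerda package and a network connection):
# federal <- require_loaded(load_gerda_web("federal_cty_harm"), "federal_cty_harm")

# Offline demonstration with a stand-in value:
result <- tryCatch(
  require_loaded(NULL, "federal_cty_harm"),
  error = function(e) conditionMessage(e)
)
result
```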
Election datasets expose one column per party (e.g. cdu, spd, gruene, afd).
These columns hold the party's share of valid votes and are expressed as
fractions of 1. They do not sum to 1 across the named major parties:
the remainder is held by smaller parties with their own columns and, at
the tail, an other category. For example, in federal_cty_harm for 2021,
cdu + csu + spd + gruene + fdp + linke_pds + afd is typically around
0.91 and ranges roughly 0.78 to 0.97 across counties. To reconstruct a
full 1.0 share, include every party column or use other together with
turnout and invalid-vote columns.
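A toy illustration of the point about sums, using invented shares and only a handful of columns. Real GERDA datasets contain many more party columns, so the column lists below are assumptions to adapt to the dataset at hand.

```r
# Invented vote shares for two counties, expressed as fractions of 1.
shares <- data.frame(
  county_code = c("01051", "01053"),
  cdu = c(0.25, 0.20),
  spd = c(0.28, 0.30),
  gruene = c(0.12, 0.18),
  fdp = c(0.11, 0.10),
  afd = c(0.09, 0.06),
  other = c(0.15, 0.16),
  stringsAsFactors = FALSE
)

major <- c("cdu", "spd", "gruene", "fdp", "afd")

# Named major parties alone fall short of 1.
major_sum <- rowSums(shares[, major])

# Adding the remaining columns (here only `other`) recovers the full share.
full_sum <- rowSums(shares[, c(major, "other")])

round(major_sum, 2)  # 0.85 0.84
round(full_sum, 2)   # 1 1
```

Checking such a sum on real data is a cheap sanity test before computing quantities like vote-share differences or effective number of parties.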
    # Load harmonized municipal elections data
    data_municipal_harm <- load_gerda_web("municipal_harm", verbose = TRUE, file_format = "rds")

    # Load federal election data harmonized to 2025 boundaries (includes 2025 election)
    data_federal_2025 <- load_gerda_web("federal_muni_harm_25", verbose = TRUE, file_format = "rds")
Creates a crosswalk between GERDA party names and ParlGov's view_party attributes.
If a party name is not found, the corresponding output element is NA. This function
expects GERDA party names (lowercase, underscores); other naming schemes will mostly
return NA.
party_crosswalk(party_gerda, destination)
party_gerda |
A character vector containing the GERDA party names to be converted. |
destination |
A single string naming the target column. Available destinations:
|
A vector of the same length as party_gerda with the mapped values.
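The lookup semantics described above (same-length output, NA for unmatched names) can be sketched with match(). The lookup table, its example_score column, and its values are hypothetical; the real function draws on ParlGov attributes that are not reproduced here.

```r
# Hypothetical lookup table; values are placeholders, not ParlGov data.
lookup <- data.frame(
  party_gerda = c("cdu", "spd"),
  example_score = c(6, 4),
  stringsAsFactors = FALSE
)

crosswalk_sketch <- function(party_gerda, destination, table = lookup) {
  # destination must be a single existing column name.
  stopifnot(
    is.character(destination),
    length(destination) == 1,
    destination %in% names(table)
  )
  # match() returns NA for names not in the table, which propagates to the output.
  table[[destination]][match(party_gerda, table$party_gerda)]
}

# "CDU" is not in lowercase GERDA form, so it maps to NA, as does NA itself.
crosswalk_sketch(c("cdu", "CDU", NA), "example_score")
```

Normalising names with tolower() before the lookup is one way to reduce accidental NA results when party labels come from another source.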
    party_crosswalk(c("cdu", "spd", "linke_pds", NA), "left_right")
    party_crosswalk(c("cdu", "afd"), "family_name_short")