panelbuild provides tools for auditing, validating, and preparing panel datasets before statistical analysis.
Installation
You can install the development version of panelbuild from GitHub:
# install.packages("remotes")
remotes::install_github("desirajulavanya/panelbuild")
Quick start
library(panelbuild)
audit <- audit_panel(example_panel, id = id, time = year)
audit
## Panel audit
##
## Data: example_panel
## Unit variable: id
## Time variable: year
##
## Units: 3
## Time periods: 4
## Observed rows: 9
## Observed id-time cells: 8
## Expected id-time cells: 12
## Missing id-time cells: 4
## Duplicate id-time cells: 1
## Balanced panel: No
audit_report(audit)
## panelbuild Panel Audit Report
## ==========================
##
## Dataset
## -------
## Data: example_panel
## Unit variable: id
## Time variable: year
##
## Panel structure
## ---------------
## Units: 3
## Time periods: 4
## Observed rows: 9
## Observed unit-time cells: 8
## Expected unit-time cells: 12
## Missing unit-time cells: 4
## Duplicate unit-time cells: 1
## Balanced panel: No
##
## Recommended next steps
## ----------------------
## * Resolve duplicate unit-time observations before completing the panel.
## * Use `duplicate_cells(audit)` or `duplicate_summary()` to inspect duplicates.
## * Inspect missing unit-time cells before estimation.
## * Use `missing_cells(audit)` or `gap_summary()` to review panel gaps.
## * Use `complete_panel()` only after duplicate unit-time cells are resolved.
Why panelbuild?
Panel datasets often contain missing unit-time cells, duplicate observations, irregular time gaps, and imbalance. These issues can affect fixed effects models, difference-in-differences designs, event studies, and other panel-data methods.
panelbuild helps researchers identify and document these problems before estimation.
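To make these problems concrete, the counts that the audit reports can be reproduced by hand. The following is a minimal base R sketch, not panelbuild's implementation: it retypes the id and year columns from the example_panel rows printed in this README, and the variable names (n_units, n_missing, and so on) are illustrative assumptions.

```r
# id and year values retyped from the example_panel printout in this README
panel <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  year = c(2020, 2021, 2021, 2020, 2022, 2020, 2021, 2022, 2023)
)

n_units   <- length(unique(panel$id))    # distinct units
n_periods <- length(unique(panel$year))  # distinct time periods

# Expected cells: every unit observed in every period
n_expected <- n_units * n_periods

# Observed cells: distinct id-year combinations actually present
n_observed <- nrow(unique(panel[, c("id", "year")]))

# Missing cells: expected combinations that never appear
n_missing <- n_expected - n_observed

# Duplicated cells: id-year combinations that appear more than once
cell_counts  <- table(paste(panel$id, panel$year))
n_duplicated <- sum(cell_counts > 1)

# Assumption: "balanced" here means no missing cells
is_balanced <- n_missing == 0
```

On this data the sketch gives 12 expected cells, 8 observed cells, 4 missing cells, and 1 duplicated cell, which matches the audit output shown in the Quick start.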
Basic example
library(panelbuild)
data(example_panel)
example_panel
Audit a panel dataset
audit_panel(example_panel, id = id, time = year)
Find duplicate unit-time observations
duplicate_summary(example_panel, id = id, time = year)
Summarize panel gaps
gap_summary(example_panel, id = id, time = year)
Flag row-level panel issues
flag_panel_issues(example_panel, id = id, time = year)
## # A tibble: 9 × 7
##      id  year outcome treatment panelbuild_row_id panelbuild_id_time_n
##   <dbl> <dbl>   <dbl>     <dbl>             <int>                <int>
## 1     1  2020      10         0                 1                    1
## 2     1  2021      12         1                 2                    2
## 3     1  2021      13         1                 3                    2
## 4     2  2020      20         0                 4                    1
## 5     2  2022      25         1                 5                    1
## 6     3  2020      30         0                 6                    1
## 7     3  2021      31         0                 7                    1
## 8     3  2022      32         1                 8                    1
## 9     3  2023      33         1                 9                    1
## # ℹ 1 more variable: panelbuild_duplicate_cell <lgl>
Complete the panel grid
complete_panel() creates a full unit-time grid while preserving observed values. It does not impute missing outcomes.
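The idea of a full unit-time grid can be sketched in base R with expand.grid() and merge(). This is only an illustration under stated assumptions, not how complete_panel() is implemented: the data frame is retyped from the de-duplicated example rows shown in this README, and the was_observed column is a hypothetical stand-in for the package's own marker columns.

```r
# De-duplicated example rows, retyped from the output shown in this README
observed <- data.frame(
  id        = c(1, 1, 2, 2, 3, 3, 3, 3),
  year      = c(2020, 2021, 2020, 2022, 2020, 2021, 2022, 2023),
  outcome   = c(10, 12, 20, 25, 30, 31, 32, 33),
  treatment = c(0, 1, 0, 1, 0, 0, 1, 1)
)

# Tag the rows that really exist (hypothetical marker column)
observed$was_observed <- TRUE

# Every combination of the observed units and observed time periods
grid <- expand.grid(
  id   = sort(unique(observed$id)),
  year = sort(unique(observed$year))
)

# Left join: keep every grid cell; cells with no observed row get NA
completed <- merge(grid, observed, by = c("id", "year"), all.x = TRUE)
completed <- completed[order(completed$id, completed$year), ]
rownames(completed) <- NULL

# Rows added by the join have NA in the marker; turn that into FALSE
completed$was_observed <- !is.na(completed$was_observed)
```

The result has 12 rows, 4 of which were added by the join. Their outcome and treatment values stay NA, so nothing is imputed.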
Because complete_panel() requires unique id-time cells, we first create a version of the example data without duplicates.
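# distinct() keeps the first row of each duplicated id-year cell;
# inspect the duplicates first if a different row should be kept.
example_panel_unique <- example_panel |>
  dplyr::distinct(id, year, .keep_all = TRUE)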
complete_panel(example_panel_unique, id = id, time = year)
## # A tibble: 12 × 7
##       id  year outcome treatment panelbuild_original_row panelbuild_completed_…¹
##    <dbl> <dbl>   <dbl>     <dbl> <lgl>                   <lgl>
##  1     1  2020      10         0 TRUE                    FALSE
##  2     1  2021      12         1 TRUE                    FALSE
##  3     1  2022      NA        NA FALSE                   TRUE
##  4     1  2023      NA        NA FALSE                   TRUE
##  5     2  2020      20         0 TRUE                    FALSE
##  6     2  2021      NA        NA FALSE                   TRUE
##  7     2  2022      25         1 TRUE                    FALSE
##  8     2  2023      NA        NA FALSE                   TRUE
##  9     3  2020      30         0 TRUE                    FALSE
## 10     3  2021      31         0 TRUE                    FALSE
## 11     3  2022      32         1 TRUE                    FALSE
## 12     3  2023      33         1 TRUE                    FALSE
## # ℹ abbreviated name: ¹panelbuild_completed_cell
## # ℹ 1 more variable: panelbuild_audit_action <chr>
Main functions
- audit_panel() gives a full panel diagnostic summary.
- duplicate_summary() finds duplicate unit-time observations.
- gap_summary() summarizes missing time periods by unit.
- flag_panel_issues() flags row-level panel problems.
- complete_panel() creates a complete panel grid without imputing missing values.
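As a rough illustration of what "missing time periods by unit" means, the gaps in the example data can be listed with base R. This sketch is not gap_summary() and does not reproduce its output format. It assumes that the expected periods for every unit are all periods observed anywhere in the data, which may differ from the package's definition, and the names all_years, gaps_by_unit, and n_gaps are illustrative.

```r
# id and year values retyped from the example_panel printout in this README
panel <- data.frame(
  id   = c(1, 1, 1, 2, 2, 3, 3, 3, 3),
  year = c(2020, 2021, 2021, 2020, 2022, 2020, 2021, 2022, 2023)
)

# Assumption: every unit is expected in every period seen in the data
all_years <- sort(unique(panel$year))

# For each unit, list the expected years that it never reports
gaps_by_unit <- lapply(
  split(panel$year, panel$id),
  function(years) setdiff(all_years, years)
)

# Number of missing periods per unit
n_gaps <- lengths(gaps_by_unit)
```

Here unit 1 is missing 2022 and 2023, unit 2 is missing 2021 and 2023, and unit 3 has no gaps, which agrees with the rows that complete_panel() adds in the example above.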