Data Science

Hosted datasets and visual analysis.

A place for empirical projects that pair reproducible data with direct, inspectable presentation. The featured analysis now ships as an auditable hosted bundle: raw inputs, processed outputs, checksums, a machine-readable manifest, and a report that links back to the canonical GitHub repository.

Unionization and Income

This project combines a BLS state union membership table, the Census ACS 2024 median household income estimates, and the BEA 2024 Regional Price Parities into a 50-state dataset with both nominal and cost-of-living-adjusted income. It also includes two simple descriptive regressions and a hosted audit bundle that mirrors the companion repository.

Slope (nominal) $1,256

Change in median household income per additional percentage point of union membership, before adjusting for state cost of living.

Slope (RPP-adjusted) $537

Same regression after dividing income by the BEA Regional Price Parity (US = 100). The positive association survives, but the slope shrinks by roughly 57%.
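As a rough illustration of the adjustment described above, the sketch below deflates a nominal income by an RPP expressed on the US = 100 scale and recomputes the slope compression from the two slopes reported on this page. The function names and the example state are hypothetical, and the repository's scripts may implement the step differently.

```python
def rpp_adjust(nominal_income: float, rpp: float) -> float:
    """Deflate a nominal dollar figure by a Regional Price Parity (US = 100)."""
    if rpp <= 0:
        raise ValueError("RPP must be positive")
    return nominal_income / (rpp / 100.0)


def slope_compression(nominal_slope: float, adjusted_slope: float) -> float:
    """Fraction by which the slope shrinks after the adjustment."""
    return 1.0 - adjusted_slope / nominal_slope


# Hypothetical state: $80,000 nominal income, RPP of 110 (prices 10% above the US average).
print(round(rpp_adjust(80_000, 110.0)))        # 72727

# Slopes reported on this page: $1,256 nominal and $537 RPP-adjusted.
print(round(slope_compression(1256, 537), 3))  # 0.572
```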

R-squared 0.292 → 0.124

Share of cross-state income variation explained by union density alone. It shrinks substantially once the RPP adjustment removes state-level price differences.
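For readers who want to see how a slope and an R-squared of this kind are obtained, here is a minimal one-predictor least-squares sketch in plain Python. It is an illustration under assumptions, not the repository's code: the function name `ols_fit` is hypothetical and the four data points are made up, not taken from the dataset.

```python
def ols_fit(x, y):
    """Return (slope, intercept, r_squared) for a one-predictor least-squares fit."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    syy = sum((yi - mean_y) ** 2 for yi in y)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    if sxx == 0 or syy == 0:
        raise ValueError("x and y must each vary")
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # With a single predictor, R-squared equals the squared correlation.
    r_squared = sxy ** 2 / (sxx * syy)
    return slope, intercept, r_squared


# Made-up illustration: union membership (%) and median income ($) for four places.
union_pct = [5, 10, 15, 20]
income = [60_000, 70_000, 65_000, 85_000]
slope, intercept, r2 = ols_fit(union_pct, income)
print(slope, intercept, round(r2, 3))  # 1400.0 52500.0 0.7
```

Running the same function once on nominal income and once on RPP-adjusted income would give the two slope and R-squared pairs that the cards above compare.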

Caveat Descriptive

Both regressions are ecological (fitted to state-level aggregates, not individuals) and should not be interpreted as causal.

What’s included

  • A standalone auditable HTML report with side-by-side scatter plots for nominal and RPP-adjusted income, a regression comparison table, and a rank-shift panel.
  • All three raw inputs: the BLS extract, the Census ACS snapshot, and the unmodified BEA SARPP state dump.
  • A merged 50-state CSV with both nominal and cost-of-living-adjusted income columns, plus machine-readable and human-readable regression summaries for both models.
  • A machine-readable audit manifest and a checksum file for validating the hosted copies.
  • Human-readable audit docs plus a canonical GitHub repository with the scripts and commit history.
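The rank-shift panel listed above compares where each state sits in the income ranking before and after the cost-of-living adjustment. A minimal sketch of that comparison follows. The function name, the sign convention, and the three placeholder states with their values are assumptions for illustration only.

```python
def rank_shift(nominal: dict[str, float], adjusted: dict[str, float]) -> dict[str, int]:
    """Places gained (+) or lost (-) when moving from the nominal to the adjusted ranking."""

    def ranks(values: dict[str, float]) -> dict[str, int]:
        # Rank 1 is the highest income.
        ordered = sorted(values, key=values.get, reverse=True)
        return {state: position + 1 for position, state in enumerate(ordered)}

    nominal_rank = ranks(nominal)
    adjusted_rank = ranks(adjusted)
    return {state: nominal_rank[state] - adjusted_rank[state] for state in nominal}


# Placeholder states with made-up nominal incomes and RPPs (US = 100).
nominal = {"A": 90_000.0, "B": 80_000.0, "C": 70_000.0}
rpp = {"A": 115.0, "B": 95.0, "C": 90.0}
adjusted = {state: nominal[state] / (rpp[state] / 100.0) for state in nominal}

print(rank_shift(nominal, adjusted))  # {'A': -1, 'B': 1, 'C': 0}
```

In this toy example the high-price state A drops one place after adjustment and the lower-price state B moves up one.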

Why this is useful

  • Publishing nominal and RPP-adjusted views side by side exposes how much of the headline union–income gradient reflects expensive coastal labor markets.
  • The hosted directory exposes the exact source files behind the charts, not just a summary page.
  • The checksum file and manifest make it easier to verify that the hosted artifacts are internally consistent.
  • The GitHub repository remains the canonical source for scripts, provenance, and future updates.

Auditable report

The hosted report page, now wired to local audit artifacts and the canonical repository.

Read report

Merged dataset

The joined state-level file used for both regressions and the hosted visuals.

Download CSV

Census snapshot

The checked-in ACS 2024 state income snapshot used for the default offline reproduction path.

Download CSV

BLS source extract

The state union membership table preserved alongside the hosted report.

Download CSV

BEA RPP source dump

The unmodified BEA Regional Price Parities state table (SARPP, 2008-2024). The pipeline reads the latest year's "All items" line directly from this file.
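One way to read the latest year's "All items" line from a wide, year-per-column table is sketched below. The column names `GeoName` and `Description`, the helper name, and the sample rows are assumptions made for illustration. The actual SARPP layout and the repository's parser may differ, so check the downloaded file before relying on this.

```python
import csv
import io


def latest_all_items_rpp(csv_text: str) -> tuple[str, dict[str, float]]:
    """Return (latest_year, {area name: RPP}) for rows whose description mentions 'All items'."""
    reader = csv.DictReader(io.StringIO(csv_text))
    # Treat any four-digit header as a year column (assumed layout).
    year_columns = [
        name for name in (reader.fieldnames or [])
        if name.strip().isdigit() and len(name.strip()) == 4
    ]
    if not year_columns:
        raise ValueError("no year columns found")
    latest = max(year_columns, key=lambda name: int(name))

    values: dict[str, float] = {}
    for row in reader:
        description = row.get("Description") or ""
        if "all items" not in description.lower():
            continue  # skips other RPP lines and any footer notes
        try:
            values[row["GeoName"].strip()] = float(row[latest])
        except (AttributeError, TypeError, ValueError):
            continue  # missing or non-numeric cell
    return latest.strip(), values


# Made-up sample in the assumed layout; not real BEA values.
SAMPLE = """GeoName,Description,2023,2024
Exampleland,RPPs: All items,101.0,102.5
Exampleland,RPPs: Goods,99.0,98.0
Otherstate,RPPs: All items,95.0,94.0
"""

print(latest_all_items_rpp(SAMPLE))
# ('2024', {'Exampleland': 102.5, 'Otherstate': 94.0})
```

Selecting the year column by its header, rather than by position, keeps the sketch working if a later release appends another year.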

Download CSV

Audit manifest

A machine-readable inventory of the hosted bundle files and their SHA-256 hashes, plus the canonical repository URL.

Open JSON

Checksums and audit docs

Hosted checksums plus the audit and source notes used to explain how the bundle was assembled.

Open SHA256SUMS
Open AUDIT.md
Open SOURCES.md
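A downloaded copy of the bundle can be checked against the checksum file with a few lines of Python. The sketch below assumes SHA256SUMS uses the conventional `sha256sum` layout (hex digest, whitespace, file name). The function names and the file name in the demo line are hypothetical.

```python
import hashlib
from pathlib import Path


def parse_sha256sums(text: str) -> dict[str, str]:
    """Map file name -> expected hex digest from sha256sum-style lines."""
    entries: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        digest, _, name = line.partition(" ")
        # sha256sum writes "digest  name" (text mode) or "digest *name" (binary mode).
        entries[name.lstrip(" *")] = digest.lower()
    return entries


def verify_bundle(sums_text: str, root: Path) -> dict[str, bool]:
    """Return {file name: True if the file exists under root and its hash matches}."""
    results: dict[str, bool] = {}
    for name, expected in parse_sha256sums(sums_text).items():
        path = root / name
        actual = hashlib.sha256(path.read_bytes()).hexdigest() if path.is_file() else None
        results[name] = actual == expected
    return results


# Demo with a placeholder digest and a hypothetical file name.
print(parse_sha256sums("ab" * 32 + "  merged.csv\n"))
```

Calling `verify_bundle(Path("SHA256SUMS").read_text(), Path("."))` from the download directory would then report, per file, whether the local copy matches the published hash.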

Regression outputs

Machine-readable JSON plus a short Markdown summary of both OLS fits.

Open JSON
Open Markdown

GitHub repository

Exact provenance for every hosted file, including official URLs and the local scripts used to produce the outputs.

Review provenance

This website hosts a static mirror of the artifact bundle for convenience. The canonical source for scripts, provenance notes, and future revisions is SamuelSchlesinger/us-state-union-income-analysis. The fitted lines are intentionally simple and should be read as descriptive relationships rather than causal claims.