4 min read

Sonoma County School Accountability Dashboard

In this blog post, I describe the data architecture behind the Sonoma County School Accountability Dashboard, including the data preparation in DuckDB and data visualization in Evidence.

Last month, I released the Sonoma County School Accountability Dashboard app. This app takes the district and school accountability data released by the California Department of Education (CDE) in November 2024 and displays it for districts in Sonoma County.

The app uses custom-made SQL and Python scripts to download the raw data from CDE and save it to a DuckDB database. The data visualization is done in Evidence. All source code is available on GitHub and data is accessible through Motherduck.

In this blog post, I am going to describe the data architecture of this project, include the data preparation in DuckDB, and data visualization in Evidence.

District and School Accountability in California

The Every Student Succeeds Act (ESSA) is the federal law that governs K-12 education policy in the United States. As part of this bill, states are directed to develop district and school accountability measures and to release these results annually. After some regulatory delays due to the 2017 US Department of Education administration transition, California introduced a new ESSA-aligned accountability system in 2018.

This accountability system includes six indicators: (1) Academic (reported separately for English language arts/literacy (ELA) and mathematics assessments), (2) English Learner Progress, (3) Chronic Absenteeism, (4) Graduation Rate, (5) Suspension Rate, (6) College/Career, scored on a five-color scale, from highest to lowest: Blue, Green, Yellow, Orange, and Red, corresponding to levels: Very High, High, Medium, Low, and Very Low.

Performance levels are a composite score that combines yearly Status levels and year-to-year Change levels.

Status levels are calculated by comparing the distance from expected standard-based performance, with scores around the expected performance threshold receiving a rating of Yellow/Medium, scores above receiving Green/High or Blue/Very High, and scores below receiving Orange/Low or Red/Very Low.

Change levels are similarly scores, but they compare year-to-year performance, assigning Yellow/Medium level if performance was maintained, Green/High or Blue/Very High if performance improved, and Orange/Low or Red/Very Low if performance declined.

These two metrics are then combined into a five-by-five matrix that identifies twenty-five potential cells, which identify the performance color for a specific indicator. The figure below shows an example of how to use the five-by-five matrix to identify the performance color for a sample indicator.

Five-by-Five Colored Table
Five-by-Five Colored Table
Source: 2024 Dashboard Technical Guide, p. 11

Schools Eligible for Support and Improvement

ESSA also requires states to identify schools and districts for additional support based on specific criteria. California has three separate support programs called Comprehensive Support and Improvement (CSI), Additional Targeted Support and Improvement (ATSI), and Targeted Support and Improvement (TSI). These programs come with additional funding to facilitate school improvement and additional supervision from County Offices of Education. In general, CSI eligibility is based on a low graduation rate for high schools or being in the lowest-performing five percent of Title I-funded schools. ATSI and TSI eligibility is based on having one or more student groups that meet any of the criteria used to determine CSI−Low Performing, while qualifying for CSI assistance.

CSI and ATSI determinations are made on a three-year cycle, beginning with the 2023 Dashboard (i.e., the last school year). TSI is considered on non-CSI and non-ATSI entry years (i.e., this school year).

Data Processing

Accountability data is released annually during the fall. All data files are made public and available for download through CDE's website.

To facilitate data processing, I developed an Extract, Transform, and Load (ETL) pipeline in Python and DuckDB. DuckDB is an in-process, analytical database that I use to download the raw data CDE released and to prepare analytics-ready data tables used in the data visualization app.

The process is outlined in the etl.py script. This script follows four design decisions.

First, I offload downloading and processing of two data files to their respective helper functions. The ESSA assistance file comes in an Excel spreadsheet that isn't ingestible right away in DuckDB, so I need to do some minimal header cleaning with pandas. The fileserver hosting the school directory doesn't allow range requests, which breaks native processing in DuckDB.

Second, the script processes each indicator and prepares a combined table for all available dashboard years. All data is directly processed within DuckDB, including downloading of the CSV files and preparation of the combined, indicator-level tables. This includes directly downloading the comma-separated files with

create or replace table CCI_2024 as 
select * 
from read_csv(
  'https://www3.cde.ca.gov/researchfiles/cadashboard/ccidownload2024.txt',
  delim = '\t',
  header = true
);

and then stacking all years into a single table with a series of UNION statements. As the files have included different columns over the years, I only include a subset of columns that are consistent across the years.

Third, I calculate whether a district is eligible for differentiated assistance. While actual eligibility takes into consideration multiple measures and offers different levels of assistance, I base eligibility on whether a district is scoring in Red/Very Low on one indicator district-wide or on two student subgroups. This should identify districts that could potentially be eligible for differentiated assistance.

Finally, I combine all these data into a single view called dashboard that I use in Evidence as the main data source for analysis. This view combines all indicators and differentiated assistance eligibility into a single data source and does some minimal data cleaning to the original raw data.

Data Presentation

I use Evidence as the analysis platform in this project. Evidence is an open-source framework for building data products with SQL and Node.js.

The data visualization app is built around two principles.

First, I wanted it to be simple, with a focus on just displaying the dashboard data. This is accomplished by the first table you see on the dashboard home page. It just lists all school districts in Sonoma County, alongside their differentiated assistance status.

Second, each page presents data for a single district or school for a single school year. These pages are accessible by clicking on a row on the county (or district) page and display the overall results for each indicator as color gauges, sub-group results or equity overview as a table, and historical overview as a table. Finally, historical, year-level results are accessible by clicking the top menu.

These two design decisions constrain what is possible to do on the app. For example, it is not possible to compare year-to-year changes for student sub-groups or compare yearly performance across districts or schools. These limitations mirror the official California School Dashboard, which also doesn't have a way to display year-to-year changes or between-district comparisons.