Info: Data Confidence Scores are considered a Beta release while we continue to fine-tune the scoring logic.

Data Confidence

SkaldMaps pulls in over 400 attributes from 20+ different sources. These sources differ in refresh schedule and native geometry. As a result, depending on your selected geography level (ZIP, county, tract) and attribute, some data may be more accurate than other data.

What Is a Data Confidence Medal?

We aim to classify and thereby quantify this accuracy automatically, based on a wealth of metadata we collect about our data sources and our processing methods.

For instance, census data at the tract level will, with very high confidence, be as accurate as the source data. The same holds for relatively static values, such as land area in square miles.

On the other hand, mapping weather data to a ZIP code is inherently imprecise. Weather data is not usually reported at the ZIP level; the ZIP value is instead a custom aggregate metric based on available weather stations. The data is still useful - it'll be in the right ballpark - but it is not to be confused with a weather report.

SkaldMaps shows a data confidence medal to help you audit whether a field or rating model is built from fresh, complete data that matches your selected geography.

The medal is context-specific. The same field can have different confidence for ZIPs, counties, and tracts, and it can vary by state because coverage is measured after the current data build.

assets/gold_example.png

Medal Levels

  • Gold - Fresh enough, well covered, and not materially modeled for the selected geography.
  • Silver - Generally usable, but exercise caution.
  • Bronze - Use carefully. Key confidence inputs are missing or the field has meaningful age, coverage, or modeling limitations.

Missing information counts against confidence. A field cannot be Gold when source age or coverage is unavailable, and missing both forces Bronze.
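The missing-information rule can be read as a cap on the best medal a field can still reach. The following is a minimal sketch of that reading, not the SkaldMaps implementation; the function names and the use of `None` for "unavailable" are assumptions made for illustration.

```python
from typing import Optional

# Medals ordered from weakest to strongest, used to compare two medals.
MEDAL_ORDER = ["Bronze", "Silver", "Gold"]


def cap_for_missing_inputs(source_year: Optional[int],
                           coverage: Optional[float]) -> str:
    """Return the best medal still reachable given which inputs are known.

    Hypothetical helper that only illustrates the documented rule:
    - both inputs missing -> forced Bronze
    - one input missing   -> cannot be Gold (Silver at best)
    - nothing missing     -> no cap (Gold remains possible)
    """
    missing = (source_year is None) + (coverage is None)
    if missing == 2:
        return "Bronze"
    if missing == 1:
        return "Silver"
    return "Gold"


def apply_cap(medal: str, cap: str) -> str:
    """Lower `medal` to `cap` if it ranks above the cap."""
    return min(medal, cap, key=MEDAL_ORDER.index)
```

For example, a field that would otherwise score Gold but has no coverage figure would be reported as Silver at best under this sketch.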

A Bronze field isn't inherently bad; it simply means that you may want to rate it a bit lower than a Gold-level field.

The current breakdown of attributes looks like this:

assets/data_quality_by_tier.png

What Lowers Confidence

The confidence score combines:

  • Age - Older source years lower confidence. Mixed source ages also lower confidence when a field combines inputs from years that are far apart.
  • Coverage - Low state/geography coverage lowers confidence. Coverage is measured from the built data tables, not estimated in the browser.
  • Modeled geography - Fields that are rolled up, allocated, interpolated, or otherwise modeled because the source data is not native to the selected geography have lower confidence.

In compact views, the UI shows only the medal and age. Hover the help icon or open the rating modal to see the coverage and short reason labels such as Older data, Mixed ages, Modeled, or Coverage gaps. On mobile, tap the ? icon.
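One way to think about the reason labels is that each one corresponds to a confidence input that lowered the score. The sketch below assumes a label appears whenever the matching penalty is non-zero; that trigger rule and the dictionary keys are illustrative assumptions, and only the label texts come from this page.

```python
# Label texts are taken from the UI description above; the keys and the
# "show when penalty > 0" rule are assumptions for illustration.
REASON_LABELS = {
    "age": "Older data",
    "mixed_age": "Mixed ages",
    "modeled": "Modeled",
    "coverage": "Coverage gaps",
}


def reason_labels(penalties: dict) -> list:
    """Return the labels for every input that carries a non-zero penalty."""
    return [
        label
        for key, label in REASON_LABELS.items()
        if penalties.get(key, 0) > 0
    ]
```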

Examples

The score starts at 100 and subtracts penalties for age, coverage gaps, mixed source years, and modeled geography. The final score becomes Gold at 80+, Silver at 60-79, and Bronze below 60.
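The arithmetic described here can be sketched in a few lines of Python. The function and constant names are hypothetical, and clamping the score at 0 is an assumption, since the text does not say what happens when penalties exceed 100. The penalty sizes themselves are inputs here, not something this sketch derives.

```python
GOLD_MIN = 80    # Gold at 80+
SILVER_MIN = 60  # Silver at 60-79, Bronze below 60


def confidence_score(age: int = 0, coverage: int = 0,
                     mixed_age: int = 0, modeled: int = 0) -> int:
    """Start at 100 and subtract each penalty.

    Clamping at 0 is an assumption made for this sketch.
    """
    return max(0, 100 - age - coverage - mixed_age - modeled)


def medal_for(score: int) -> str:
    """Map a final score to its medal using the documented thresholds."""
    if score >= GOLD_MIN:
        return "Gold"
    if score >= SILVER_MIN:
        return "Silver"
    return "Bronze"


# Penalty values from the Georgia School Quality Index example on this page.
georgia = confidence_score(age=12, mixed_age=10, modeled=10)
print(georgia, medal_for(georgia))  # 68 Silver
```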

Gold example: Land Area (sq mi), Colorado ZIPs

This is a very straightforward field.

  • Age: 2025; the scoring year is 2025, so there is no age penalty.
  • Coverage: 527 of 527 ZIPs, so there is no coverage penalty.
  • Source ages: one source period, so there is no mixed-age penalty.
  • Geography: native ZIP geometry, so there is no modeled-geography penalty.
  • Calculation: 100 - 0 age - 0 coverage - 0 mixed-age - 0 modeled = 100, so the field is Gold.

assets/co_land_area.jpg

Silver example: School Quality Index, Georgia ZIPs

This is a custom attribute we calculate.

  • Age: Avg 2023; the component source years average to 2023, so older data subtracts 12.
  • Coverage: 749 of 751 ZIPs, about 99.7%, so there is no coverage penalty.
  • Source ages: components span 2021 to 2025, so mixed ages subtract 10.
  • Geography: school-district outcomes are area-weighted to ZIPs, so modeled geography subtracts 10.
  • Calculation: 100 - 12 age - 0 coverage - 10 mixed-age - 10 modeled = 68, so the field is Silver.

Silver does not mean the field is bad. It means the field is useful, but the rating should be auditable because it combines several source years and a modeled geography match.

Bronze example: Median Real Estate Taxes, Alaska ZIPs

Alaska is an interesting case, since it is very sparsely populated.

  • Age: 2020-2024; the scoring year is 2024, so there is no age penalty.
  • Coverage: 62 of 245 ZIPs, about 25.3%, so severe coverage gaps subtract 40.
  • Source ages: one source period, so there is no mixed-age penalty.
  • Geography: native to ZIPs, so there is no modeled-geography penalty.
  • Calculation before cap: 100 - 0 age - 40 coverage - 0 mixed-age - 0 modeled = 60.
  • Severe coverage gaps cap the score below Silver, so the final score is 59, which is Bronze.
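The cap in the last step can be sketched as a simple `min`. This is an illustration under stated assumptions: the names are hypothetical, and whether a gap counts as "severe" is passed in as a flag because this page does not state the threshold.

```python
SILVER_MIN = 60  # lowest score that still earns Silver


def apply_coverage_cap(raw_score: int, severe_coverage_gap: bool) -> int:
    """Keep a field with severe coverage gaps below the Silver threshold.

    `severe_coverage_gap` is supplied by the caller; the cutoff that makes
    a gap "severe" is not specified here.
    """
    if severe_coverage_gap:
        return min(raw_score, SILVER_MIN - 1)
    return raw_score


# Alaska example: 62 of 245 ZIPs covered, raw score 100 - 40 = 60.
coverage_pct = round(62 / 245 * 100, 1)
final = apply_coverage_cap(100 - 40, severe_coverage_gap=True)
print(coverage_pct, final)  # 25.3 59
```

Scores that are already below the cap are left unchanged, so the cap only matters for fields sitting exactly on or above the Silver boundary.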

Please note that Alaska has low overall ZIP code coverage in general:

assets/zip_code_ak_coverage.png

This by itself does not count against coverage: the ZIP-based Land Area field (the Gold example) is still Gold in Alaska.

Rating Models

Rating model confidence is based on the active criteria in your model. More important criteria count more heavily. After results are calculated, result coverage also affects the model confidence summary.
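As a rough illustration of "more important criteria count more heavily", the sketch below uses an importance-weighted average of per-criterion confidence scores. The weighted average is an assumption chosen for illustration, not the documented formula, and the later adjustment for result coverage is deliberately left out because its form is not described here.

```python
from typing import List, Optional, Tuple


def model_confidence(criteria: List[Tuple[float, float]]) -> Optional[float]:
    """Importance-weighted mean of confidence scores.

    `criteria` holds (confidence_score, importance_weight) pairs for the
    active criteria only. Returns None when there is nothing to weigh.
    """
    total_weight = sum(weight for _, weight in criteria)
    if total_weight == 0:
        return None
    weighted_sum = sum(score * weight for score, weight in criteria)
    return weighted_sum / total_weight
```

With one criterion scoring 100 at weight 3 and another scoring 60 at weight 1, this sketch gives 90, so the heavier criterion dominates the summary.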

Use confidence as an audit signal, not as an automatic yes/no rule. A Bronze field may still be useful when it measures something rare or hard to observe, but it should not carry a model without review.

Coverage

Coverage answers: "How many areas in this selected state and geography level have usable values for this field?"

For example, a ZIP field with 90% coverage in Georgia means roughly 9 out of 10 Georgia ZIPs in the current app dataset have a usable value for that field. Areas without a usable value do not contribute that criterion to their rating score.
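The coverage question above amounts to a simple ratio. The sketch below assumes field values are held in a mapping from area ID to value, with `None` standing for "no usable value"; both the data shape and the function name are assumptions for illustration.

```python
from typing import Mapping, Optional


def coverage(values: Mapping[str, Optional[float]]) -> float:
    """Share of areas in the selected state and level with a usable value."""
    if not values:
        return 0.0
    usable = sum(1 for value in values.values() if value is not None)
    return usable / len(values)


# Ten hypothetical ZIPs, nine of which have a usable value.
sample = {f"zip_{i}": float(i) for i in range(9)}
sample["zip_9"] = None
print(coverage(sample))  # 0.9
```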