Transparency

How the benchmarks are built

Every statistic in a RateScope report is the output of a documented pipeline with seven explicit decisions. This page describes each decision, the alternatives considered and rejected, and the thresholds that govern what gets published. If you want to verify the methodology before purchasing, this is the right place.

Source files

Benchmarks are derived exclusively from Machine-Readable Files (MRFs) published by health insurers under 45 CFR § 147.210 — the CMS Transparency in Coverage rule. No estimates, no scraping, no secondary sources.

Phase 1 File
UHC OHBS-5 (Optum Health Behavioral Services)
UHC publishes ~85,000 employer-indexed JSON blobs that reference shared in-network rate files. For Texas mental health benchmarks, the operative file is OHBS-5-UHC-Texas: the Optum Health Behavioral Services professional network. It is 5.9 MB compressed, fully streaming-parseable, and lists CPT codes directly. UHC's general commercial files, by contrast, begin with facility revenue codes, so reaching their CPT data means streaming past gigabytes of irrelevant records.
Files considered and skipped for Phase 1: Choice-Plus-POS (9.9 GB), Core-POS (9.3 GB), Behavior-Health-P3 (15.3 GB) — all start with facility revenue codes (0xxx series). CPT data is present but requires streaming through billions of bytes of facility rates first. These are marked phase1-skip and will be incorporated in Phase 2 when chunked streaming is implemented.
Provider Registry
NPPES March 2026 Bulk Dissemination File
Provider names, states, taxonomy codes, and credential classifications are sourced from the NPPES monthly bulk dissemination CSV (~9.4M rows, ~11 GB), filtered to Texas individual providers (Entity Type 1) with mental health taxonomy codes. This yields 31,492 Texas MH providers; 6,903 of them are present in the OHBS-5 network. The bulk file is used rather than the real-time NPPES API because the API returns at most 200 results per request and cannot skip past the first 1,000 matches, so a single query tops out at about 1,200 results: far too few to enumerate all Texas therapists.

Pipeline steps

The pipeline runs in five discrete stages. Each produces a versioned artifact. No stage mutates its inputs.

1

Fetch: Download and parse MRF

The UHC table-of-contents (85,321 blobs) is fetched and the OHBS-5 download URL extracted. The MRF is streamed with ijson using the Schema 2.0 provider reference structure: NPIs live at provider_references → provider_groups → npi (not directly at provider_references → npi). Output: Parquet of NPI×rate rows for CPT 90837.

Script: pipeline/fetch/save_ohbs_parquet.py
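A minimal sketch of the Schema 2.0 lookup described above. It assumes the provider_references and in_network arrays arrive as iterables of dicts (for example, from two streaming passes with ijson.items(f, "provider_references.item") and ijson.items(f, "in_network.item")). The function names and the tiny sample are hypothetical and not taken from save_ohbs_parquet.py; provider_groups embedded inline in negotiated_rates are not handled here.

```python
import json
from typing import Dict, Iterable, List, Set, Tuple


def index_provider_groups(provider_references: Iterable[dict]) -> Dict[int, Set[str]]:
    """Map provider_group_id -> NPIs found at provider_references -> provider_groups -> npi."""
    index: Dict[int, Set[str]] = {}
    for ref in provider_references:
        npis = index.setdefault(int(ref["provider_group_id"]), set())
        for group in ref.get("provider_groups", []):
            # NPIs may be serialized as numbers or strings; normalize to str.
            npis.update(str(npi) for npi in group.get("npi", []))
    return index


def rate_rows(in_network: Iterable[dict], index: Dict[int, Set[str]],
              billing_code: str = "90837") -> List[Tuple[str, str, float]]:
    """Expand in_network items for one billing code into (npi, negotiated_type, rate) rows."""
    rows: List[Tuple[str, str, float]] = []
    for item in in_network:
        if item.get("billing_code") != billing_code:
            continue
        for negotiated in item.get("negotiated_rates", []):
            # Here provider_references holds group IDs that point back into the index.
            npis: Set[str] = set()
            for group_id in negotiated.get("provider_references", []):
                npis |= index.get(int(group_id), set())
            for price in negotiated.get("negotiated_prices", []):
                for npi in sorted(npis):
                    rows.append((npi, price["negotiated_type"], float(price["negotiated_rate"])))
    return rows


# Hypothetical two-provider sample in the Schema 2.0 shape (made-up NPIs).
sample = json.loads("""
{"provider_references": [
   {"provider_group_id": 1, "provider_groups": [{"npi": [1111111111, 2222222222]}]}],
 "in_network": [
   {"billing_code": "90837", "negotiated_rates": [
     {"provider_references": [1],
      "negotiated_prices": [{"negotiated_type": "negotiated", "negotiated_rate": 110.30}]}]}]}
""")
rows = rate_rows(sample["in_network"], index_provider_groups(sample["provider_references"]))
```

Keeping the group index separate from the rate pass means the large in_network array can be streamed once without holding it in memory.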

2

Registry: Build Texas MH provider list

NPPES bulk CSV is filtered to Texas individual providers with MH taxonomy codes. Each NPI is assigned one credential bucket using T5 precedence rules. Output: Parquet with NPI, state, taxonomy, bucket.

Script: pipeline/nppes/build_tx_mh_registry.py
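The registry filter can be sketched as a single pass over csv.DictReader rows. This assumes the NPPES bulk header names shown below and uses the practice-location state column; which state field build_tx_mh_registry.py actually reads is not stated, and the function name and sample excerpt are hypothetical. The taxonomy set mirrors the codes listed under T5 on this page.

```python
import csv
import io
from typing import Iterable, Iterator, List, Tuple

# Assumed NPPES bulk CSV headers; the real pipeline's choice of state field is not stated.
STATE_COL = "Provider Business Practice Location Address State Name"
TAXONOMY_COLS = [f"Healthcare Provider Taxonomy Code_{i}" for i in range(1, 16)]

# MH taxonomy codes as listed under T5.
MH_TAXONOMIES = {
    "103TC0700X", "103TC2200X", "103TP2700X", "103TP0016X", "103T00000X",
    "103TF0000X", "103TH0004X", "103TM1800X",  # psychologist
    "1041C0700X",                              # LCSW
    "106H00000X",                              # LMFT
    "101YM0800X", "101YP1400M",                # LPC/LMHC
}


def tx_mh_providers(rows: Iterable[dict]) -> Iterator[Tuple[str, List[str]]]:
    """Yield (NPI, MH taxonomy codes) for Texas individual providers."""
    for row in rows:
        if row.get("Entity Type Code") != "1":  # 1 = individual, 2 = organization
            continue
        if row.get(STATE_COL) != "TX":
            continue
        # Missing or empty taxonomy columns are simply skipped.
        codes = [row[col] for col in TAXONOMY_COLS if row.get(col) in MH_TAXONOMIES]
        if codes:
            yield row["NPI"], codes


# Hypothetical reduced excerpt (the real file has hundreds of columns).
excerpt = "\n".join([
    f"NPI,Entity Type Code,{STATE_COL},{TAXONOMY_COLS[0]},{TAXONOMY_COLS[1]}",
    "1000000001,1,TX,1041C0700X,",
    "1000000002,2,TX,1041C0700X,",
    "1000000003,1,OK,106H00000X,",
    "1000000004,1,TX,207Q00000X,101YM0800X",
])
providers = list(tx_mh_providers(csv.DictReader(io.StringIO(excerpt))))
```

On the full ~11 GB file the same generator would run over csv.DictReader(open(path, newline="")), so only one row is in memory at a time.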

3

Canonical: Join and deduplicate

Rate Parquet joined to provider registry on NPI (inner join — only providers with confirmed Texas MH taxonomy are kept). T3 rate-type filter applied. T4 dedup key applied. Output: canonical rates fact table.

Script: pipeline/aggregate/build_canonical_facts.py
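The inner join on NPI reduces to a dictionary lookup when the registry fits in memory. This is a hypothetical sketch (names and sample rows are illustrative, not from build_canonical_facts.py); it shows how rate rows for NPIs absent from the Texas MH registry are dropped rather than kept with empty fields.

```python
from typing import Dict, List


def inner_join_on_npi(rate_rows: List[dict], registry: Dict[str, dict]) -> List[dict]:
    """Keep only rate rows whose NPI appears in the TX MH registry, adding registry fields."""
    joined = []
    for row in rate_rows:
        provider = registry.get(row["npi"])
        if provider is None:
            # Inner join: rates for NPIs without a confirmed TX MH taxonomy are dropped.
            continue
        joined.append({**row, **provider})
    return joined


registry = {"1000000001": {"state": "TX", "bucket": "lcsw"}}
rates = [
    {"npi": "1000000001", "billing_code": "90837", "rate": 110.30},
    {"npi": "9999999999", "billing_code": "90837", "rate": 95.00},  # not in registry
]
canonical = inner_join_on_npi(rates, registry)
```

With pandas DataFrames the equivalent is rates_df.merge(registry_df, on="npi", how="inner").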

4

Score: Compute cell statistics

Grouped by (payer, billing code, credential bucket). Computes P10–P90, IQR, IQR/median ratio, T1 heterogeneity check, T5 subgroup comparison. T6 confidence label assigned. Output: cell statistics Parquet + QA sheet.

Script: pipeline/scorecard/score_cells.py
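The per-cell percentile and dispersion statistics can be sketched with the standard library. The page does not say which percentile interpolation score_cells.py uses; this sketch assumes the "inclusive" method of statistics.quantiles, and the row layout and sample rates are hypothetical.

```python
import statistics
from collections import defaultdict
from typing import Dict, Iterable, Tuple

CellKey = Tuple[str, str, str]  # (payer, billing_code, bucket)


def cell_statistics(rows: Iterable[dict]) -> Dict[CellKey, dict]:
    cells = defaultdict(list)
    for row in rows:
        cells[(row["payer"], row["billing_code"], row["bucket"])].append(row["rate"])

    stats = {}
    for key, rates in cells.items():
        # "inclusive" interpolation is an assumption; quantiles() needs >= 2 rates per cell.
        deciles = statistics.quantiles(rates, n=10, method="inclusive")  # P10..P90
        p25, p50, p75 = statistics.quantiles(rates, n=4, method="inclusive")
        iqr = p75 - p25
        stats[key] = {
            "n": len(rates),
            "p10": deciles[0], "p25": p25, "p50": p50, "p75": p75, "p90": deciles[-1],
            "iqr": iqr,
            "iqr_over_median": iqr / p50,
        }
    return stats


# Hypothetical five-rate cell.
rows = [{"payer": "UHC", "billing_code": "90837", "bucket": "psychologist", "rate": r}
        for r in (100, 110, 110, 110, 120)]
cell = cell_statistics(rows)[("UHC", "90837", "psychologist")]
```

A cell where the middle half of rates is identical yields IQR = 0 and IQR/median = 0.00, which is how a tightly clustered cohort can score 0.00 on dispersion.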

5

Publish: Build frozen payload and render PDF

For each Moderate+ confidence cell, a frozen JSON payload is constructed with all statistics, metadata, and methodology fields. A SHA-256 hash is computed over the payload body and stamped as source_snapshot_id. WeasyPrint renders the PDF from a Jinja2 template; the output PDF hash is written back to the payload. The same payload always produces the same PDF.

Scripts: pipeline/report/build_payload.py · pipeline/report/render_pdf.py
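The snapshot hash only stays stable if the payload body is serialized the same way every time. A minimal sketch, assuming canonical JSON (sorted keys, fixed separators) and assuming source_snapshot_id and pdf_hash are excluded from the hashed body because they are written after hashing; the field names other than those two are hypothetical, and build_payload.py may canonicalize differently.

```python
import hashlib
import json

# Fields written after hashing are excluded from the hashed body (assumed field set).
POST_HASH_FIELDS = {"source_snapshot_id", "pdf_hash"}


def canonical_body(payload: dict) -> bytes:
    body = {k: v for k, v in payload.items() if k not in POST_HASH_FIELDS}
    # Sorted keys and fixed separators make the bytes independent of dict insertion order.
    return json.dumps(body, sort_keys=True, separators=(",", ":"),
                      ensure_ascii=False).encode("utf-8")


def source_snapshot_id(payload: dict) -> str:
    return hashlib.sha256(canonical_body(payload)).hexdigest()


payload = {"release_id": "TX-90837-UHC-MASTERS-2026Q1", "n": 6569, "iqr_over_median": 0.16}
payload["source_snapshot_id"] = source_snapshot_id(payload)
reordered = {"n": 6569, "iqr_over_median": 0.16, "release_id": "TX-90837-UHC-MASTERS-2026Q1"}
```

Excluding the post-hash fields lets pdf_hash be written back into the payload without changing source_snapshot_id.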

T1 — Pooling rule

T1
How rates from different sub-networks are pooled
Statewide commercial pool. All Texas MH providers in a network are pooled into one cohort. Sub-network heterogeneity is tested and disclosed.
A single statewide cohort maximizes n (important for confidence) and matches the question buyers actually have: "What does UHC pay therapists in Texas?" Pooled commercial rates are tested for top/bottom-quartile cluster divergence. If the gap between cluster medians exceeds 30%, a heterogeneity flag is added to the report. In the current OHBS-5 cohort, no heterogeneity flag is triggered.
Options considered and rejected: geographic sub-markets (MSA-level) and plan-type split (HMO vs PPO). Both reduce n below the publishability threshold for most cells, and there is not yet enough data to publish credibly at the sub-market level.
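The 30% heterogeneity test can be sketched as a comparison of two cluster medians. This page does not spell out how the top and bottom clusters are formed or which median the gap is measured against, so this hypothetical helper takes the two clusters as given and assumes the gap is relative to the lower median.

```python
import statistics
from typing import Sequence, Tuple

HETEROGENEITY_THRESHOLD = 0.30  # flag when cluster medians differ by more than 30%


def heterogeneity_flag(cluster_a: Sequence[float], cluster_b: Sequence[float],
                       threshold: float = HETEROGENEITY_THRESHOLD) -> Tuple[bool, float]:
    """Return (flag, gap); gap is measured relative to the lower cluster median (assumption)."""
    low, high = sorted((statistics.median(cluster_a), statistics.median(cluster_b)))
    gap = (high - low) / low
    return gap > threshold, gap
```

For example, cluster medians of $100 and $125 give a 25% gap and no flag, while $100 and $140 give 40% and trigger it.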

T2 — File inclusion

T2
Which MRF files are included vs excluded
Commercial MH professional networks only. Medicare, Medicaid, CHIP, dental, vision, and allowed-amount files are excluded.
Exclusion is applied by regex against file names: medicare|medicaid|chip|dental|vision|allowed.amount. Only in-network rate files for commercial products are included. Allowed-amount files report historical allowed amounts for out-of-network claims, not contracted in-network rates; that is a different statistic and would contaminate the benchmark.
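The exclusion pattern above can be applied as a single compiled regex. Case-insensitive matching is an assumption here (the page gives the pattern but not its flags), and the file names in the usage lines are hypothetical. Note that the unescaped dot in allowed.amount matches any single character, so it catches allowed-amount, allowed_amount, and similar spellings.

```python
import re

# Pattern from the T2 rule; re.IGNORECASE is an assumption.
EXCLUDED = re.compile(r"medicare|medicaid|chip|dental|vision|allowed.amount", re.IGNORECASE)


def is_included(file_name: str) -> bool:
    """True if the file name contains none of the excluded terms."""
    return EXCLUDED.search(file_name) is None
```

Because this is a substring match, a name containing a word such as "provision" would also be excluded, so a hit is worth eyeballing before a file is dropped for good.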

T3 — Rate types included

T3
Which negotiated_type values are included
Negotiated and fee schedule rates only. Derived rates are excluded.
TiC files can carry several negotiated_type values. The three that matter here are negotiated (bilateral contract price), fee schedule (fixed schedule applied uniformly), and derived (computed from another rate, e.g. "95% of allowed amount"). Derived rates introduce secondary calculation uncertainty and are not comparable across providers. Including them would inflate variance and reduce interpretability without adding signal.
Composition of OHBS-5 CPT 90837 cohort: The current dataset is 100% negotiated type. No fee schedule or derived rows are present. This is disclosed in each report's rate-type mix field.

For the buyer: every rate in this benchmark reflects a bilateral contract between UHC and an individual provider, not a computed estimate or percentage-of-allowed formula. When the master's-level median is $110.30, that is the median of 6,569 actual contracted prices.
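The T3 rule works naturally as a whitelist, which also drops any other negotiated_type value a file might carry. A hypothetical sketch of the filter and of a rate-type mix calculation like the one disclosed in each report; the function names and sample rows are illustrative only.

```python
from collections import Counter
from typing import Dict, Iterable, List

# Whitelist: anything not listed (e.g. "derived") is dropped.
INCLUDED_TYPES = {"negotiated", "fee schedule"}


def filter_rate_types(prices: Iterable[dict]) -> List[dict]:
    return [p for p in prices if p.get("negotiated_type") in INCLUDED_TYPES]


def rate_type_mix(prices: Iterable[dict]) -> Dict[str, float]:
    """Share of each negotiated_type among the given rows (empty input -> empty mix)."""
    counts = Counter(p["negotiated_type"] for p in prices)
    total = sum(counts.values())
    return {t: c / total for t, c in counts.items()} if total else {}


# Hypothetical rows: three negotiated, one derived.
prices = ([{"negotiated_type": "negotiated", "negotiated_rate": 110.30}] * 3
          + [{"negotiated_type": "derived", "negotiated_rate": 104.00}])
kept = filter_rate_types(prices)
```

For a cohort like the current one, the mix of the kept rows comes out as {"negotiated": 1.0}.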

T4 — Dedup key

T4
How duplicate rate rows are eliminated
Canonical dedup key: (payer_brand, network_id, npi, billing_code, negotiated_type, rate, expiration_date). One row per unique combination.
UHC's TiC files reference the same NPI-rate pair through multiple provider_group_id entries. Without dedup, the same $110 rate for a given NPI would appear multiple times, over-representing providers with complex plan structures. Because provider_group_id is not part of the key, these repeats collapse to one row, while meaningfully distinct entries (e.g. different expiration dates, networks, or rate types) are preserved. The result is one effective rate per NPI in the current cohort.
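A minimal sketch of dedup on the T4 key, keeping the first row seen for each key. The row values below (including network_id "OHBS-5") are hypothetical, and rates are compared exactly as stored.

```python
from typing import Iterable, List

DEDUP_KEY = ("payer_brand", "network_id", "npi", "billing_code",
             "negotiated_type", "rate", "expiration_date")


def dedup(rows: Iterable[dict]) -> List[dict]:
    """Keep the first row for each distinct dedup key; provider_group_id is ignored."""
    seen = set()
    unique = []
    for row in rows:
        key = tuple(row[field] for field in DEDUP_KEY)
        if key not in seen:
            seen.add(key)
            unique.append(row)
    return unique


base = {"payer_brand": "UHC", "network_id": "OHBS-5", "npi": "1000000001",
        "billing_code": "90837", "negotiated_type": "negotiated", "rate": 110.0,
        "expiration_date": "9999-12-31", "provider_group_id": 1}
rows = [base,
        {**base, "provider_group_id": 2},              # same rate via another group: dropped
        {**base, "expiration_date": "2026-12-31"}]     # distinct expiration: kept
unique = dedup(rows)
```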

T5 — Credential bucket precedence

T5
How providers with multiple taxonomy codes are classified
Precedence: psychologist > lcsw > lmft > lpc_lmhc. One provider, one bucket.
Some providers hold multiple active taxonomy codes. Assigning one bucket per NPI prevents the same person's rate from inflating multiple cohort counts. Precedence follows typical clinical hierarchy (doctoral-level before master's-level) and license specificity (clinical social work before generic counselor). Published cohorts are: psychologist (doctoral-level) and master_level (combined lcsw + lmft + lpc_lmhc). Master's sub-credential breakdown is included in the full report.
Taxonomy codes included by bucket:
Psychologist: 103TC0700X, 103TC2200X, 103TP2700X, 103TP0016X, 103T00000X, 103TF0000X, 103TH0004X, 103TM1800X
LCSW: 1041C0700X
LMFT: 106H00000X
LPC/LMHC: 101YM0800X, 101YP1400M
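The precedence rule above can be sketched as a lookup plus a min() over precedence rank. The taxonomy-to-bucket map mirrors the list on this page; the function names are hypothetical and not taken from build_tx_mh_registry.py.

```python
from typing import Iterable, Optional

PRECEDENCE = ["psychologist", "lcsw", "lmft", "lpc_lmhc"]  # highest precedence first

TAXONOMY_TO_BUCKET = {
    **dict.fromkeys(["103TC0700X", "103TC2200X", "103TP2700X", "103TP0016X",
                     "103T00000X", "103TF0000X", "103TH0004X", "103TM1800X"], "psychologist"),
    "1041C0700X": "lcsw",
    "106H00000X": "lmft",
    "101YM0800X": "lpc_lmhc",
    "101YP1400M": "lpc_lmhc",
}
MASTER_LEVEL = {"lcsw", "lmft", "lpc_lmhc"}


def assign_bucket(taxonomy_codes: Iterable[str]) -> Optional[str]:
    """Return the single highest-precedence bucket, or None if no code maps to one."""
    buckets = {TAXONOMY_TO_BUCKET[c] for c in taxonomy_codes if c in TAXONOMY_TO_BUCKET}
    if not buckets:
        return None
    return min(buckets, key=PRECEDENCE.index)


def published_cohort(bucket: str) -> str:
    """Collapse the master's-level sub-credentials into the published master_level cohort."""
    return "master_level" if bucket in MASTER_LEVEL else bucket
```

For example, a provider holding both 101YM0800X and 1041C0700X is counted once, as lcsw, and reported in the master_level cohort.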

T6 — Confidence labels

T6
Criteria for High / Moderate / Sparse / Suppressed
Confidence is a function of sample size (n) and rate dispersion (IQR/median). A large, tightly-clustered cohort is more credible than a small or wildly-spread one.
Tier | n requirement | Dispersion requirement | Interpretation
High | ≥ 100 | IQR/median ≤ 0.50 | Large, coherent cohort. Statistic is stable and representative.
Moderate | ≥ 30 | IQR/median ≤ 0.80 | Sufficient for directional benchmark. Report with appropriate caveats.
Sparse | ≥ 10 | Any | Small cohort. Use for orientation only; do not cite in negotiations.
Suppressed | < 10 | n/a | Too few providers. Not published. Would identify individual rates.
Current cohorts: UHC 90837 master_level (TX): n=6,569, IQR/p50=0.16 → High. UHC 90837 psychologist (TX): n=1,274, IQR/p50=0.00 → High.
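The tier table translates into a short cascade of checks. This sketch assumes a cohort that misses a tier's dispersion limit falls through to the next tier down (the table does not say so explicitly), and the function names are hypothetical.

```python
def confidence_label(n: int, iqr_over_median: float) -> str:
    """Assign a T6 tier; cohorts failing a tier's dispersion limit fall to the next tier (assumed)."""
    if n >= 100 and iqr_over_median <= 0.50:
        return "High"
    if n >= 30 and iqr_over_median <= 0.80:
        return "Moderate"
    if n >= 10:
        return "Sparse"
    return "Suppressed"


def is_publishable(label: str) -> bool:
    """Only Moderate or better is offered for sale."""
    return label in {"High", "Moderate"}
```

Applied to the current cohorts, (6,569, 0.16) and (1,274, 0.00) both come out High.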

T7 — Release ID and versioning

T7
How reports are identified and updated
Release ID format: {STATE}-{CPT}-{PAYER}-{BUCKET}-{YEAR}Q{QUARTER}. Example: TX-90837-UHC-MASTERS-2026Q1.
Each report is associated with a frozen JSON payload. A SHA-256 hash of the payload body is computed and stored as source_snapshot_id. The rendered PDF hash is separately computed and stored back in the payload. This means: (1) the same input always produces the same PDF, (2) any change to methodology or data produces a different hash, (3) a buyer can verify their PDF against the published hash. When source data is updated (quarterly), a new release ID is issued. Old reports retain their original release IDs and remain valid as historical records.
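The release ID format can be built and checked with one regular expression. The allowed character sets for each component (two-letter state, five-digit CPT, uppercase payer and bucket, quarter 1 to 4) are assumptions inferred from the single example on this page, and the helper names are hypothetical.

```python
import re
from typing import Dict

# Component character sets are assumptions based on TX-90837-UHC-MASTERS-2026Q1.
RELEASE_ID = re.compile(
    r"(?P<state>[A-Z]{2})-(?P<cpt>\d{5})-(?P<payer>[A-Z0-9]+)-"
    r"(?P<bucket>[A-Z_]+)-(?P<year>\d{4})Q(?P<quarter>[1-4])"
)


def make_release_id(state: str, cpt: str, payer: str, bucket: str,
                    year: int, quarter: int) -> str:
    release_id = f"{state}-{cpt}-{payer}-{bucket}-{year}Q{quarter}"
    if RELEASE_ID.fullmatch(release_id) is None:
        raise ValueError(f"malformed release ID: {release_id}")
    return release_id


def parse_release_id(release_id: str) -> Dict[str, str]:
    match = RELEASE_ID.fullmatch(release_id)
    if match is None:
        raise ValueError(f"malformed release ID: {release_id}")
    return match.groupdict()
```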

Confidence and publishability

A cell must reach Moderate confidence before it is offered for sale. Cells below Moderate are suppressed entirely — we do not publish statistics we cannot stand behind.

Currently published:
UHC CPT 90837 · Master's-level (TX): n=6,569, IQR/p50=0.16 → High
UHC CPT 90837 · Psychologist (TX): n=1,274, IQR/p50=0.00 → High

Reproducibility

Every report PDF can be traced back to its exact source data and pipeline state.

Frozen payload guarantee

When you purchase a report, the PDF you receive corresponds to a specific frozen JSON payload. The payload body SHA-256 (source_snapshot_id) and the PDF SHA-256 (pdf_hash) are both stamped in the report's provenance block. If you receive the same release ID from two sources, you can verify they are identical by comparing the hashes. Any difference in methodology or data will produce a different source_snapshot_id.
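Checking a PDF against its provenance block only requires hashing the file. A minimal sketch, assuming pdf_hash is published as a lowercase hex SHA-256 digest; the function names are hypothetical. The file is read in fixed-size chunks so large files never need to fit in memory.

```python
import hashlib
from typing import Iterable


def sha256_hex(chunks: Iterable[bytes]) -> str:
    digest = hashlib.sha256()
    for chunk in chunks:
        digest.update(chunk)
    return digest.hexdigest()


def pdf_matches(path: str, published_pdf_hash: str, chunk_size: int = 1 << 20) -> bool:
    """Hash the PDF in chunks and compare with the provenance block's pdf_hash."""
    with open(path, "rb") as f:
        actual = sha256_hex(iter(lambda: f.read(chunk_size), b""))
    return actual == published_pdf_hash.strip().lower()
```

Outside Python, `sha256sum report.pdf` (Linux) or `shasum -a 256 report.pdf` (macOS) prints the same hex digest for comparison.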

What this data is not

These benchmarks are not predictions of what any payer will offer you. They are not guarantees that the median rate is attainable, and they are not legal advice.

They are an empirical snapshot of what one payer was contracted to pay a specific cohort of Texas providers as of a specific date, derived from that payer's own mandated public filings. The median may be above or below the corresponding Medicare rate. Your rate may be above or below the median. Both are factual findings, not endorsements.

Individual contracted rates depend on your specific agreement with the payer. The benchmark shows you what the market looks like. If the data surprises you, that is the data working as intended.