How do you tell if a therapy reimbursement benchmark is worth using?

A median drawn from twelve contracts can print the same dollar figure as one drawn from twelve thousand. What sits behind the figure determines whether it belongs in a rate-review letter or nowhere near one. Four questions tell you which kind of number you're looking at.

Question 1: Where did the number come from?

The source determines what a number can prove. There are three main sources for therapy reimbursement benchmarks:

Payer Transparency in Coverage filings. Since July 2022, the federal Transparency in Coverage rule has required most commercial health plans to publish their in-network contracted rates monthly in machine-readable files, tied to each provider's NPI. A benchmark built from this data is built from the payer's own disclosures, which means citing it in a rate-review letter is citing the payer back to itself. That's the strongest possible citation.

Provider surveys. Many therapy associations and billing companies publish survey-based rate guides. Surveys capture self-reported rates from participating therapists and are faster to produce than parsing federal filings. Their limitation is selection bias — therapists who respond to surveys may not represent the full distribution of contracted rates, and surveys can't be verified against the payer's own filing.

Synthetic estimates. Some aggregators average multiple sources into a "market rate" estimate. These can be useful for rough orientation but rarely stand up in a rate-review conversation because they can't be traced to a specific payer's actual contracts.

If a page doesn't tell you which type of source it uses, treat it as a survey estimate at best.
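If you want to see what "the payer's own disclosures" means in practice, the machine-readable files can be read directly. Here is a minimal sketch, assuming a small in-network file that has already been downloaded and follows the CMS machine-readable schema; real filings often run to gigabytes and need a streaming parser, and the file path and CPT code below are placeholders:

```python
import json
from statistics import median

CODE = "90837"            # CPT code of interest (placeholder)
PATH = "in_network.json"  # hypothetical, already-downloaded filing

with open(PATH) as f:
    filing = json.load(f)

rates, npis = [], set()
for item in filing.get("in_network", []):
    if item.get("billing_code_type") != "CPT" or item.get("billing_code") != CODE:
        continue
    for group in item.get("negotiated_rates", []):
        # Each negotiated-rate entry ties one or more NPIs to one or more prices.
        for provider in group.get("provider_groups", []):
            npis.update(provider.get("npi", []))
        for price in group.get("negotiated_prices", []):
            # Keep dollar-amount rate types; the schema also allows
            # percentage and per-diem arrangements.
            if price.get("negotiated_type") in ("negotiated", "fee schedule"):
                rates.append(float(price["negotiated_rate"]))

if rates:
    print(f"{len(npis)} NPIs, {len(rates)} rates, median ${median(rates):.2f}")
else:
    print(f"no dollar-amount rates found for CPT {CODE}")
```

The point isn't the tooling; it's that every number in a filing-based benchmark traces back to a specific NPI and a specific negotiated rate in the payer's own file.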

Question 2: How many providers is the median based on?

A median drawn from twelve contracts and a median drawn from twelve hundred can print the same dollar figure. They don't carry the same weight.

Sample size determines how stable the median is — whether a single high or low outlier can pull it meaningfully, and whether the range around it describes the payer's actual contracting pattern or just reflects a few data points.

Practical working thresholds (not formal statistical standards):

  • Under 30 contracts. The median can move several dollars on one unusual contract. Directional at best — you can see the rough neighborhood, but the number isn't stable enough to anchor a rate letter.
  • 30 to 150 contracts. The median is reasonably stable. The range starts to describe the payer's actual contracting pattern.
  • Above 150 contracts. The distribution is well-defined. Your position inside it can support a formal rate-review request.
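The first threshold is easy to check for yourself. A rough simulation with made-up rates (a cohort centered near $95 with a single $160 outlier, all hypothetical) shows how far one contract can move the median at each cohort size:

```python
import random
from statistics import median

random.seed(0)

def avg_shift(n, trials=2000, base=95.0, spread=20.0, outlier=160.0):
    """Average dollars the median moves when one unusual contract joins an n-contract cohort."""
    total = 0.0
    for _ in range(trials):
        cohort = [random.gauss(base, spread) for _ in range(n)]
        total += abs(median(cohort + [outlier]) - median(cohort))
    return total / trials

for n in (12, 60, 300):
    print(f"n={n:>3}: one $160 contract moves the median ~${avg_shift(n):.2f} on average")
```

The exact figures depend on the assumed spread, but the pattern is the point: the thinner the cohort, the more one contract moves the number.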

A source that doesn't publish sample size is asking you to trust the median without showing you how stable it is. That's a reason to be skeptical, not reassured.

Question 3: Is it filtered to your exact payer, code, and state?

A blended average across payers is useful for understanding the category. It's not useful for a rate-review letter with one specific payer, because that payer's analyst will compare your request to their own schedule — not to a multi-payer composite.

The comparison that matters is your rate against the same payer's contracts in the same state for the same code. A benchmark is worth citing when it holds all three constant. When it doesn't, it describes a market — which is harder to act on.

Credential group matters too. Most commercial payers in Texas contract LCSWs, LPCs, and LMFTs at one rate tier and psychologists at a separate, higher tier. A benchmark that mixes the two groups will produce a median that's above the correct reference for master's-level therapists and below the correct reference for psychologists — useful for neither.
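In practice the filter is nothing more than holding payer, code, state, and credential group constant before taking the median. A minimal sketch over hypothetical contract rows (payer names, rates, and field names are illustrative, not drawn from any actual filing):

```python
from statistics import median

# Hypothetical contract rows; a real cohort would come from the payer's filing.
contracts = [
    {"payer": "BCBS TX", "code": "90837", "state": "TX", "credential": "LPC", "rate": 98.00},
    {"payer": "BCBS TX", "code": "90837", "state": "TX", "credential": "PhD", "rate": 132.00},
    {"payer": "BCBS TX", "code": "90834", "state": "TX", "credential": "LPC", "rate": 72.00},
]

def cohort_median(rows, payer, code, state, credentials):
    """Median holding payer, code, state, and credential group constant."""
    rates = [r["rate"] for r in rows
             if r["payer"] == payer and r["code"] == code
             and r["state"] == state and r["credential"] in credentials]
    return median(rates) if rates else None

masters = {"LCSW", "LPC", "LMFT"}
print(cohort_median(contracts, "BCBS TX", "90837", "TX", masters))
```

Swapping the credential set from master's-level licenses to psychologists changes the cohort and therefore the median, which is the blending problem described above in miniature.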

Question 4: Does the source tell you when it doesn't know?

The most trustworthy sources are the ones that flag their own thin spots. Not every payer contracts every credential group at scale in every state. When a cohort is thin — fewer than 30 contracts, for example — a responsible source says so and labels the distribution as directional. A source that shows the same confident-looking number regardless of cohort depth is either papering over the uncertainty or hasn't looked.

A source that flags confidence explicitly — showing a confidence score, a sample size threshold, or a note that the distribution is directional — is demonstrating exactly the kind of methodological discipline that makes a benchmark worth trusting. The willingness to say "we don't know" in the places where the data is thin is what earns credibility in the places where it's strong.
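The flag itself is trivial to compute once the sample size is known, which is part of what makes its absence telling. A minimal sketch reusing the working thresholds from Question 2 (the label names are illustrative):

```python
def confidence_label(sample_size, directional_below=30, strong_above=150):
    """Map cohort depth to a label; thresholds mirror the working ranges in Question 2."""
    if sample_size < directional_below:
        return "directional"    # thin cohort: show the range, flag the uncertainty
    if sample_size <= strong_above:
        return "stable"
    return "well-defined"
```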

What a benchmark that shows its work looks like

A benchmark worth using shows all four of these things before you act on it — ideally before you pay for it:

  • The source: "Built from [payer]'s Transparency in Coverage filing, filed under CMS requirements"
  • The sample size: the number of provider contracts backing the distribution
  • The filter: payer, code, state, and credential group all explicitly stated
  • The confidence framing: a score or label that distinguishes strong cohorts from thin ones

A number that shows all four is data. A number that hides any of them is a claim. The distinction matters most when you're deciding whether to put the number in a rate-review letter — because the payer's analyst will ask exactly these questions when they receive it.
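Put differently, the checklist is four required fields, and a benchmark that hides any of them fails a very simple completeness test. A minimal sketch of that record shape (field names and labels are illustrative):

```python
from dataclasses import dataclass

@dataclass
class BenchmarkRecord:
    """The four disclosures that separate data from a claim (names are illustrative)."""
    source: str        # e.g. "built from the payer's Transparency in Coverage filing"
    sample_size: int   # provider contracts behind the distribution
    cohort: dict       # {"payer": ..., "code": ..., "state": ..., "credential_group": ...}
    confidence: str    # "directional" / "stable" / "well-defined"

def citable(b: BenchmarkRecord) -> bool:
    """Worth putting in a rate-review letter only if nothing is hidden or thin."""
    return all([b.source, b.sample_size, b.cohort, b.confidence]) and b.confidence != "directional"
```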

What's visible before you buy

Every RateScope benchmark page — the BCBS Texas 90837 page, for example — shows sample size, confidence score, the credential-group filter, and a link to the methodology page before purchase. The methodology page explains how each payer's Transparency in Coverage filing becomes a usable cohort — which NPIs are included, how duplicates are resolved, and which cells are flagged as too thin to support a read. If the cohort for a given payer × code × state × credential combination is below the threshold for a confident distribution, the page says so. You know the data quality before committing.