HCC Risk Adjustment Data Model: Building Accurate Risk Score Pipelines in SQL

For a Medicare Advantage plan with 100,000 members, a 0.01 difference in average RAF score translates to roughly $4 million in annual revenue.

A modeling error that drops 3% of valid diagnosis codes does not produce an error. The pipeline runs clean, the submission goes out, and the plan receives significantly less than it should. You find out twelve months later during reconciliation — if at all.

This guide covers the CMS-HCC data model, the pipeline architecture that produces accurate RAF scores, the SQL patterns for each stage, and the common mistakes that silently undercount risk.

How CMS-HCC Risk Adjustment Works

CMS risk-adjusts Medicare Advantage capitation payments to account for member health status:

CMS assigns a base rate (benchmark) for each county
Each member's RAF (Risk Adjustment Factor) score is calculated from demographics and diagnosis history
The plan's capitation payment = benchmark × average RAF score across members
A RAF score of 1.0 means average expected cost. 1.5 means 50% above average.

The RAF score has two components:

Demographic score — based on age, sex, dual eligibility status, and Medicaid eligibility. Computed from enrollment data.

Disease score — based on HCC codes mapped from submitted diagnosis codes. Computed from encounter and claims data.

The CMS-HCC model (Version 28 as of 2024) maps roughly 10,000 ICD-10-CM diagnosis codes into 115 HCC categories. Each HCC has a coefficient that contributes to the RAF score. HCCs interact — comorbidity adjustments apply when certain HCC pairs occur together.

The Core Data Model

The HCC risk adjustment pipeline requires four key tables.

Diagnosis Encounter Staging

CREATE TABLE stg_risk_adjustment_encounters (
    encounter_key           BIGINT          NOT NULL GENERATED ALWAYS AS IDENTITY,
    member_id               VARCHAR(50)     NOT NULL,
    rendering_npi_id        CHAR(10)        NOT NULL,
    service_date            DATE            NOT NULL,
    face_to_face_flag       BOOLEAN         NOT NULL DEFAULT FALSE,
    diagnosis_code          VARCHAR(10)     NOT NULL,
    diagnosis_code_type     CHAR(5)         NOT NULL DEFAULT 'ICD10',
    data_source             VARCHAR(50)     NOT NULL,  -- 'CLAIMS','ENCOUNTER','CHART'
    submission_year         SMALLINT        NOT NULL,
    loaded_datetime         TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT pk_stg_ra_encounters PRIMARY KEY (encounter_key)
);

CREATE INDEX idx_stg_ra_member_year 
    ON stg_risk_adjustment_encounters (member_id, submission_year);
CREATE INDEX idx_stg_ra_diagnosis   
    ON stg_risk_adjustment_encounters (diagnosis_code);

CMS-HCC Mapping Reference and Coefficients

-- Loaded from CMS software release each year
CREATE TABLE ref_icd10_to_hcc (
    icd10_code      VARCHAR(10)  NOT NULL,
    hcc_number      SMALLINT     NOT NULL,
    hcc_label       VARCHAR(200) NOT NULL,
    model_version   VARCHAR(10)  NOT NULL,  -- 'V28', 'V24'
    effective_year  SMALLINT     NOT NULL,

    CONSTRAINT pk_icd10_hcc 
        PRIMARY KEY (icd10_code, hcc_number, model_version, effective_year)
);

-- CMS model coefficients by member segment
CREATE TABLE ref_hcc_coefficients (
    model_version   VARCHAR(10)     NOT NULL,
    payment_year    SMALLINT        NOT NULL,
    hcc_number      SMALLINT        NOT NULL,
    segment         VARCHAR(50)     NOT NULL,  -- 'CNA','ESRD','NE','SNPNE'
    coefficient     DECIMAL(10, 6)  NOT NULL,

    CONSTRAINT pk_hcc_coefficients 
        PRIMARY KEY (model_version, payment_year, hcc_number, segment)
);

Member HCC Summary and RAF Score Output

-- One row per member per HCC per year
CREATE TABLE fct_member_hcc (
    member_hcc_key      BIGINT      NOT NULL GENERATED ALWAYS AS IDENTITY,
    member_id           VARCHAR(50) NOT NULL,
    hcc_number          SMALLINT    NOT NULL,
    submission_year     SMALLINT    NOT NULL,
    encounter_count     SMALLINT    NOT NULL DEFAULT 1,
    first_service_date  DATE        NOT NULL,
    last_service_date   DATE        NOT NULL,

    CONSTRAINT pk_member_hcc PRIMARY KEY (member_hcc_key),
    CONSTRAINT uq_member_hcc_year UNIQUE (member_id, hcc_number, submission_year)
);

-- Final RAF score output table
CREATE TABLE fct_member_raf_score (
    member_id           VARCHAR(50)     NOT NULL,
    payment_year        SMALLINT        NOT NULL,
    segment             VARCHAR(50)     NOT NULL,
    demographic_score   DECIMAL(10, 6)  NOT NULL,
    disease_score       DECIMAL(10, 6)  NOT NULL,
    interaction_score   DECIMAL(10, 6)  NOT NULL DEFAULT 0,
    total_raf_score     DECIMAL(10, 6)  NOT NULL,
    hcc_count           SMALLINT        NOT NULL,
    calculated_datetime TIMESTAMP       NOT NULL DEFAULT CURRENT_TIMESTAMP,

    CONSTRAINT pk_member_raf PRIMARY KEY (member_id, payment_year, segment)
);

Stage 1: Diagnosis Code Ingestion and Validation

The first pipeline stage ingests diagnosis codes from multiple sources — claims (837P/837I), encounter records, and chart reviews — and applies validation rules.

⚠️ The face_to_face_flag filter is critical. CMS only accepts diagnosis codes from face-to-face encounters with eligible provider types. Missing this filter is the #1 cause of CMS submission rejections.

-- Load validated encounters only
INSERT INTO stg_risk_adjustment_encounters (
    member_id, rendering_npi_id, service_date, face_to_face_flag,
    diagnosis_code, data_source, submission_year
)
SELECT
    s.member_id,
    s.rendering_npi_id,
    s.service_date,
    s.face_to_face_flag,
    s.diagnosis_code,
    s.data_source,
    EXTRACT(YEAR FROM s.service_date)::SMALLINT
FROM stg_raw_encounters s
INNER JOIN ref_icd10_codes r
    ON s.diagnosis_code = r.icd10_code
   AND r.effective_year = EXTRACT(YEAR FROM s.service_date)::SMALLINT
WHERE s.face_to_face_flag = TRUE
  AND s.service_date >= DATE_TRUNC('year', CURRENT_DATE - INTERVAL '2 years');

Always run this audit query after loading — it tells you how many codes were silently rejected:

SELECT
    s.diagnosis_code,
    COUNT(*) AS rejected_count
FROM stg_raw_encounters s
LEFT JOIN ref_icd10_codes r
    ON s.diagnosis_code = r.icd10_code
   AND r.effective_year = EXTRACT(YEAR FROM s.service_date)::SMALLINT
WHERE r.icd10_code IS NULL
GROUP BY s.diagnosis_code
ORDER BY rejected_count DESC;

Stage 2: ICD-10 to HCC Mapping

Map valid diagnosis codes to HCC categories using the CMS crosswalk. A member gets credit for an HCC once — regardless of how many times the code appears across encounters.

INSERT INTO fct_member_hcc (
    member_id, hcc_number, submission_year,
    encounter_count, first_service_date, last_service_date
)
SELECT
    e.member_id,
    h.hcc_number,
    e.submission_year,
    COUNT(DISTINCT e.service_date) AS encounter_count,
    MIN(e.service_date)            AS first_service_date,
    MAX(e.service_date)            AS last_service_date
FROM stg_risk_adjustment_encounters e
JOIN ref_icd10_to_hcc h
    ON e.diagnosis_code = h.icd10_code
   AND h.model_version = 'V28'
   AND h.effective_year = e.submission_year
GROUP BY e.member_id, h.hcc_number, e.submission_year
ON CONFLICT (member_id, hcc_number, submission_year)
DO UPDATE SET
    encounter_count   = EXCLUDED.encounter_count,
    last_service_date = EXCLUDED.last_service_date;

Stage 3: HCC Hierarchy Application

The CMS-HCC model uses hierarchies — when a more severe HCC is present, less severe HCCs in the same category are suppressed.

Example: HCC 19 (Diabetes without Complication) is suppressed if HCC 17 or HCC 18 is present.

Skipping this step causes your internal RAF projections to overcount — and your numbers will never match CMS's calculation.

-- Hierarchy suppression reference table
CREATE TABLE ref_hcc_hierarchy (
    model_version   VARCHAR(10) NOT NULL,
    higher_hcc      SMALLINT    NOT NULL,  -- if present...
    lower_hcc       SMALLINT    NOT NULL,  -- ...suppress this one
    CONSTRAINT pk_hcc_hierarchy 
        PRIMARY KEY (model_version, higher_hcc, lower_hcc)
);

-- View that applies hierarchy suppression per member
CREATE OR REPLACE VIEW vw_member_hcc_after_hierarchy AS
SELECT
    m.*,
    EXISTS (
        SELECT 1
        FROM ref_hcc_hierarchy h
        JOIN fct_member_hcc m2
            ON h.higher_hcc = m2.hcc_number
           AND m2.member_id = m.member_id
           AND m2.submission_year = m.submission_year
        WHERE h.lower_hcc = m.hcc_number
          AND h.model_version = 'V28'
    ) AS is_suppressed_flag
FROM fct_member_hcc m;

Stage 4: RAF Score Calculation

Sum coefficients for non-suppressed HCCs to get the disease score, then add the demographic score to produce the final RAF:

WITH disease_scores AS (
    SELECT
        m.member_id,
        m.submission_year,
        SUM(c.coefficient) AS disease_score,
        COUNT(*)           AS hcc_count
    FROM vw_member_hcc_after_hierarchy m
    JOIN ref_hcc_coefficients c
        ON m.hcc_number = c.hcc_number
       AND c.model_version = 'V28'
       AND c.payment_year = m.submission_year + 1
       AND c.segment = 'CNA'
    WHERE m.is_suppressed_flag = FALSE
    GROUP BY m.member_id, m.submission_year
)
INSERT INTO fct_member_raf_score (
    member_id, payment_year, segment,
    demographic_score, disease_score,
    interaction_score, total_raf_score, hcc_count
)
SELECT
    d.member_id,
    d.submission_year + 1               AS payment_year,
    'CNA'                               AS segment,
    dem.total_demographic_score         AS demographic_score,
    d.disease_score,
    0                                   AS interaction_score,
    dem.total_demographic_score 
        + d.disease_score               AS total_raf_score,
    d.hcc_count
FROM disease_scores d
JOIN fct_member_demographic_score dem
    ON d.member_id = dem.member_id
   AND dem.payment_year = d.submission_year + 1
ON CONFLICT (member_id, payment_year, segment)
DO UPDATE SET
    disease_score       = EXCLUDED.disease_score,
    total_raf_score     = EXCLUDED.total_raf_score,
    hcc_count           = EXCLUDED.hcc_count,
    calculated_datetime = CURRENT_TIMESTAMP;

5 Mistakes That Silently Undercount Risk

1. Not filtering for face-to-face encounters

Lab-only visits, ancillary services, and telehealth from non-eligible providers do not count. A pipeline that ingests all encounters without this filter loads codes CMS will reject at submission.

2. Using claim received dates instead of service dates

Late claims arrive in the following year. A service date of December 2025 on a claim loaded in February 2026 should count toward the 2025 submission year. Always use service_date, never claim_received_date.

3. Missing the hierarchy application step

If your pipeline sums coefficients for all HCCs without hierarchy suppression, you will overcount RAF scores internally — and your numbers will not match CMS.

4. Not deduplicating across data sources

The same encounter may appear in claims, an encounter feed, and a chart review extract. Without deduplication at the member-HCC-year level, you count the same HCC multiple times while CMS counts it once.

5. Dropping diagnosis codes with invalid NPI

Validate NPIs separately — do not drop the diagnosis code because the provider record has a formatting issue. You lose legitimate risk score contributions from those encounters.

FAQ

How far back does CMS look at diagnosis codes?

For the standard prospective model, CMS uses diagnoses from the prior calendar year. Diagnosis codes from 2025 determine risk scores for payment year 2026.

What is the difference between RAPS and EDPS submissions?

RAPS (legacy) accepts diagnosis codes extracted from claims. EDPS (current standard) accepts full encounter records — demographics, procedure codes, diagnosis codes, and provider information. Most plans submit via both pathways.

How do we handle mid-year member additions?

CMS applies a partial-year factor for members who join mid-year. Your fct_member_raf_score table should store both the full RAF score and the enrollment months count used by CMS for the prorated calculation.

Free Tools for HCC Risk Adjustment Pipelines

If you're building or validating an HCC risk adjustment pipeline, mdatool provides free tooling at every stage:

🧮 HCC Calculator — verify individual member RAF scores against CMS-HCC V28 coefficients without running a full pipeline
🔍 ICD-10 Search — look up codes and confirm HCC mapping before they enter your submission pipeline
🔧 SQL Linter — catch bugs in your risk adjustment SQL before production
📖 Healthcare Data Dictionary — 100,000+ verified definitions including HCC, RAF, RAPS, EDPS, and RADV terminology

Free to start. No credit card required.

Originally published at mdatool.com

推荐订阅源

DEV Community