HERC's Guide to the Nosos Risk Adjustment Score
Suggested Citation
Wagner T, Moran E, Shen ML, Gehlert E. HERC's Guide to the Nosos Risk Adjustment Score. Health Economics Resource Center, VA Palo Alto Health Care System, U.S. Department of Veterans Affairs. May 2024.
Disclaimers
All tables are stored in an Excel file. Download the tables here.
Many URLs are not live because they are VA intranet-only. Researchers with VA intranet access can access these sites by copying and pasting the URLs into their browser.
For a list of VA acronyms, please visit the VA acronym checker on the VA intranet at https://vaww.va.gov/Acronyms/fulllist.cfm.
1. Overview
This document provides an overview of the VA Nosos risk score, based on the Centers for Medicare and Medicaid Services (CMS) Hierarchical Condition Categories (HCC) risk adjustment model (Note: Nosos is the Greek word for 'chronic disease'). The science behind Nosos is documented in the paper "Risk Adjustment Tools for Learning Health Care Systems: A Comparison of DxCG and CMS-HCC V21" (Wagner et al., 2016, Health Services Research; DOI: 10.1111/1475-6773.12454).
The purpose of the Nosos program is to create risk scores for VA patients, so that researchers may adjust for risk when making comparisons of treatments or outcomes. The Nosos scores are computed by first computing the CMS HCC risk scores using the CMS HCC Risk Model program. The CMS HCC risk scores primarily use the patients’ diagnoses (ICD-9/ICD-10 codes), age and gender. The Nosos risk score builds on this, adding pharmacy records as well as VA-specific items such as VA priority status and VA-computed costs. The risk scores, along with the additional factors, are then used as predictors in a regression model to model the annual VA cost for each patient. Estimates are then rescaled so that the mean Nosos score for the population will always equal one.
To make it easier for researchers to customize the risk scores for their own purposes, we have attempted to present the scoring algorithm as a series of modular SAS macros, with minimal dependence between different tasks. For example, if a researcher wishes to use different diagnoses than what we have presented (such as excluding certain types of visits) it will only be necessary to modify the macro that extracts ICD-9/ICD-10 codes. A researcher who does not want to use pharmacy codes can skip the pharmacy extract macro and use a modified version of the regression macro that does not include pharmacy codes. See Technical Report 30 Appendix A for an example of pulling data and scoring Nosos for a complete fiscal year.
Nosos scores are available for all patients in the VA system for FY2006 and forward, calculated both with pharmacy data (PHA) and without pharmacy data (NOPHA). A brief description of each variable is listed in Appendix A. The instructions for requesting access to the Nosos data are listed in Section 2. For questions regarding V21/Nosos, please contact Elizabeth Gehlert (Elizabeth.Gehlert@va.gov).
1.1. Updates
2023, update 2 (May 2024): Previously, the Nosos scores were available in CDW in separate tables for each fiscal year. We will move all years of data into tables with pharmacy data (PHA) and without pharmacy data (NOPHA), with scrambled Social Security number (SCRSSN) as the patient identifier. Note: Researchers will need to have IRB approval to access SCRSSN before they request access to Nosos risk scores.
2023, update 1 (February 2024): There were several changes to Nosos. First, we stopped creating quarterly Nosos scores and will only create the annual Nosos score (EOY) using data from the full fiscal year. Next, we dropped the registry variable. Separate analyses showed the registry data provided little insight into use of services (see Technical Report 43). Finally, we updated the VA purchased care datasets (moved to the IVC Consolidated Data Set (CDS) file) and revised the race and ethnicity data (moved to the Observational Medical Outcomes Partnership (OMOP) data). See Appendix A for a list of all variables in the Nosos file.
2021: The original Nosos technical report was written to describe the Nosos risk score based on the CMS HCC risk score version 21 (V21) model. In 2020 (FY2019 data), we moved to CMS HCC risk score version 24 (V24) and we updated the guidebook accordingly.
2. Access
Access depends on the intent of the project. For more information, see the VHA Data Portal page on Operations and Quality Improvement projects data access (Intranet-only: http://vaww.vhadataportal.med.va.gov/DataAccess/OperationsAccess.aspx). This information was last updated on May 8, 2024.
2.1. Research Access
Researchers who wish to access the Nosos risk scores should apply for access using the VA Data Request Tracker (DART) process (Intranet-only: http://vaww.vhadataportal.med.va.gov/DataAccess/DARTRequestProcess.aspx). Select at least the following:
- DART Data Sources Page > Requested Datasets: Health Economics Resource Center (HERC) Cost Data - Includes Average Cost Data, V21 and Nosos Risk Scores, and Discharge Data Sets with Subtotals
- DART Data Sources Page > Identifiers: Scrambled SSN
- Research Request Memo: In the memo body denote that you will use Nosos risk score data
Note: Researchers will need to have IRB approval to access SCRSSN before they request access to Nosos risk scores.
2.2. Operations Access
Operations users who wish to access the Nosos risk scores should apply for access following the Operations instructions on the VHA Data Portal (Intranet-only: http://vaww.vhadataportal.med.va.gov/DataAccess/HealthcareOperationsRequestProcess.aspx) by submitting an ePAS “VHA NDS Access Form for Health Operations” and selecting at least the following:
- Request tab > Data Sources: Corporate Data Warehouse
- Corporate Data Warehouse (CDW) tab: select 'CDW SAS Datasets' > 'Medical SAS Files'
3. Create diagnosis and person files
The first program in the process flow (ICD9ExtractNoForward) will load a macro that creates two datasets: the diagnosis file and the person file.
3.1. Diagnosis file
The diagnosis file has two fields per fiscal year (FY): SCRSSN and DIAG. SCRSSN is the patient’s scrambled Social Security number. DIAG is an ICD-9 (before FY2016) or ICD-10 (FY2016 and forward) diagnosis code.
The program extracts diagnostic codes for all VA users in a fiscal year. We exclude any patient without valid ICD-9/ICD-10 diagnosis codes. The data are then converted into a long format so that each row represents a single diagnostic code per person; a person with more than one diagnosis in the year will have multiple rows of data. See Table 1 for a summary of the ICD-10 extract program. See Appendix B for a list of all input VA files.
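The wide-to-long reshaping described above can be sketched as follows. This is a minimal Python illustration, not the SAS extract itself: the patient identifiers and codes are made up, "valid" is approximated as "non-blank", and dropping repeated codes within a person is an assumption rather than documented behavior.

```python
def to_long(records):
    """Reshape (SCRSSN, [codes]) records into one (SCRSSN, DIAG) row per code."""
    rows = set()  # assumption: a code repeated for the same person is kept once
    for scrssn, codes in records:
        for code in codes:
            code = (code or "").strip().upper()
            if code:  # stand-in for the real validity check: skip blank slots
                rows.add((scrssn, code))
    return sorted(rows)


# Hypothetical input: one record per patient with several diagnosis slots.
wide = [
    ("A001", ["E119", "I10", ""]),
    ("A002", [None, ""]),          # no usable codes, so the patient is excluded
    ("A003", ["F329", "F329"]),
]
long_rows = to_long(wide)
```

A patient with several diagnoses ends up with several rows, and a patient with no usable codes does not appear at all.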
3.1.1. FY2023 and Forward
The diagnoses are obtained from the VA Corporate Data Warehouse (CDW) Inpatient Tables, CDW Outpatient Tables, and the Integrated Veterans Care (IVC) Consolidated Data Set (CDS). We dropped the Fee Basis and VA Community Care Program Integrity Tool (PIT) tables; all VA community care data are now within the IVC-CDS tables. Note: IVC-CDS data do not include pharmacy data and data from Camp Lejeune.
3.1.2. FY2019 to FY2022
The diagnoses are obtained from the VA Corporate Data Warehouse (CDW) Inpatient tables, CDW SE workload table, and VA Community Care Program Integrity Tool (PIT) tables.
3.1.3. Prior to FY2019
The diagnoses were obtained from the VA Patient Treatment File (PTF) and National Patient Care Database (NPCD). The inpatient ICD-9 codes are pulled from the main and bedsection files, while the outpatient ICD-9 codes are pulled from the VA outpatient (SE) files. We also include purchased care from the Fee Basis data.
3.2. Person File
The person file lists all patients who had any diagnosis during the indicated fiscal year (FY), along with their gender and date of birth.
3.2.1. FY2023 and Forward
We obtain date of birth and gender from the Observational Medical Outcomes Partnership (OMOP) person table in CDW. If date of birth or gender is not available in the OMOP person table, we reference the CDW Patient table. We also include the following variables, per the CMS program requirements:
- LTIMCAID: number of months in Medicaid during the payment year. We set LTIMCAID=0 (otherwise) for all patients.
- NEMCAID: denotes if a patient is a new Medicare enrollee and the number of months in the payment year. We set NEMCAID=0 (otherwise) for all patients.
- OREC: original reason for entitlement. We set OREC=0 (Old age, OASI) for all patients.
3.2.2. Prior to FY2023
We obtain date of birth and sex from the vital.mini.table, if available. If sex and date of birth are not in the vital table, we obtain them from the SPatient.SPatient table in CDW. We also include the following variables, per the CMS program requirements:
- LTIMCAID: number of months in Medicaid during the payment year. We set LTIMCAID=0 (otherwise) for all patients.
- NEMCAID: denotes if a patient is a new Medicare enrollee and the number of months in the payment year. We set NEMCAID=0 (otherwise) for all patients.
- OREC: original reason for entitlement. We set OREC=0 (Old age, OASI) for all patients.
We have included two variations of this step. The macro icd9p0extr(fy,icddata,person) will pull all diagnoses for the fiscal year. The variation icd9p0extr_qtr(startdt,enddt,icddata,person) will pull diagnoses for a time period other than a complete fiscal year, such as the previous 12 months from a given date or a period of less than 12 months.
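The two extraction windows can be illustrated with a short Python sketch. It relies only on the fact that a federal fiscal year runs from October 1 of the prior calendar year through September 30. The function names are hypothetical and are not part of the SAS macros.

```python
from datetime import date, timedelta


def fiscal_year_window(fy):
    """First and last day of federal fiscal year `fy` (e.g. 2023)."""
    return date(fy - 1, 10, 1), date(fy, 9, 30)


def trailing_12_months(end):
    """Inclusive 12-month window that ends on `end`."""
    try:
        anchor = end.replace(year=end.year - 1)
    except ValueError:  # `end` is Feb 29 and the prior year has no Feb 29
        anchor = end.replace(year=end.year - 1, day=28)
    return anchor + timedelta(days=1), end
```

The first function corresponds to a complete-fiscal-year pull; the second shows one way to derive start and end dates for a window such as "the previous 12 months from a given date".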
4. Score CMS HCC risk score
The V24 scoring program is a SAS program provided by CMS (V2419P1P; see Table 2 for a program summary). We have made no modifications to the scoring algorithm and have only adjusted parameters for directories and formats.
The program supplies parameters to a main macro (%V2419P1M) that calls other external macros specific to V24 HCCs:
- %AGESEXV2: Create age/sex, originally disabled, and disabled variables.
- %V24I0ED1: Perform edits to diagnosis.
- %V24H86L1: Assign labels to HCCs.
- %V24H84H1: Set HCC=0 according to hierarchies.
- %SCOREVAR: Calculate a score variable.
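To make the hierarchy step concrete, the sketch below shows the general idea in Python: when a more severe category is present, the less severe categories it supersedes are set to 0. The category numbers and the hierarchy table are toy values chosen for illustration; they are not the CMS V24 hierarchy table, and this is not the CMS macro.

```python
# Toy hierarchy (assumption, NOT the CMS table): category 1 supersedes 2 and 3,
# and category 2 supersedes 3.
TOY_HIERARCHY = {1: [2, 3], 2: [3]}


def apply_hierarchy(flags, hierarchy):
    """Return a copy of `flags` with superseded categories set to 0."""
    out = dict(flags)
    for parent, children in hierarchy.items():
        if flags.get(parent, 0) == 1:
            for child in children:
                if child in out:
                    out[child] = 0
    return out


patient = {1: 1, 2: 1, 3: 1, 4: 1}
after = apply_hierarchy(patient, TOY_HIERARCHY)
```

Categories outside any hierarchy (category 4 in this toy example) are left unchanged.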
Prior to running this program, it is necessary to set the SAS library references to the location of the external macros and datasets on the user’s machine.
The parameters of the %V2419P1M macro are:
- INP: SAS PERSON dataset created in step 1.
- IND: SAS DIAGNOSIS dataset created in step 1.
- OUTDATA: Name of the file to be created with HCC scores.
- IDVAR: Name of patient identifier. We used SCRSSN. If using other data sources it could be patientSID or real SSN.
- KEEPVAR: List of variables to retain in the output dataset, including "HICNO", "&inputvars", "&scorevars", "&demvars", "&hcc24_list86", and "&cvv24_list86". The default values should be used.
- SEDITS: A switch that controls whether to perform MCE edits on ICD10 (1=Yes, 0=No).
- DATE_ASOF: Date used to compute age, set to "01OCTfy##&fym1", where ## are the last two digits of the federal fiscal year.
4.1. FY2019 and Forward
Nine scores are computed for each patient: New Enrollee, Institutional, C-SNP new enrollee, Community – Non-dual aged, Community – Non-dual disabled, Community – Full benefit dual aged, Community – Full benefit dual disabled, Community – Partial benefit dual aged, and Community – Partial benefit dual disabled. For patients who have 90 days or more in long-term care, we use the Institutional score; for patients who have fewer than 90 days in long-term care, we use the Community – Non-dual aged score.
4.2. Prior to FY2019
Prior to FY2019 (V21/V22), there were three scores generated: New Enrollee, Community, and Institutional. For patients who have 90 or more days in long-term care, we use the Institutional score; for patients who have fewer than 90 days in long-term care, we use the Community score.
See Table 2 for a summary of the V2419P1P program. For information on the V21 scoring program, see Technical Report 30 (Nosos v21).
Note: Basic CMS risk scoring is now complete. The following programs are needed for computing Nosos.
5. Add mental health diagnoses
The program nosos_psych takes a list of patients and ICD-10 diagnosis codes and creates 62 indicators for mental health conditions. The categories are based on the Sloan et al (2006) article “Development and Validation of a Psychiatric Case-Mix System” [1]. In 2012 we updated the Psychiatric Case-Mix System (PsyCMS) code to account for new ICD-9 codes created after publication of the original paper and added a 47th category for Pervasive Developmental Disorder (see Appendix C for mapping scheme). In 2016 we updated the PsyCMS for ICD-10 diagnosis codes. PsyCMS now includes 62 mental health and substance use categories. Details on the transition from ICD-9 to ICD-10 are documented in HERC technical report 31.
See Table 3 for a summary of the nosos_psych program. See Appendix C for the list of PsyCMS ICD-10 mappings.
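The indicator-building step can be sketched in Python as below. The real PsyCMS mapping is in Appendix C; the three-character prefixes, category names, and patient rows here are hypothetical placeholders, and matching on a three-character prefix is an assumption made to keep the example short.

```python
# Hypothetical mapping from ICD-10 prefix to indicator name (see Appendix C for
# the actual PsyCMS mapping, which has 62 categories).
TOY_MAP = {
    "F10": "psy_alcohol",
    "F20": "psy_schizophrenia",
    "F32": "psy_depression",
}


def psych_indicators(long_rows, prefix_map):
    """Build one 0/1 indicator per category for each patient in the long file."""
    categories = sorted(set(prefix_map.values()))
    out = {}
    for scrssn, diag in long_rows:
        row = out.setdefault(scrssn, {c: 0 for c in categories})
        category = prefix_map.get(diag[:3])
        if category is not None:
            row[category] = 1
    return out


rows = [("A001", "F329"), ("A001", "I10"), ("A002", "F1020")]
indicators = psych_indicators(rows, TOY_MAP)
```

Every patient in the input gets a full row of indicators, so patients with no mental health diagnosis carry zeros rather than missing values.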
[1] Sloan KL, Montez-Rath ME, Spiro A, 3rd, et al. Development and validation of a psychiatric case-mix system. Med Care. Jun 2006;44(6):568-580.
6. Add pharmacy data
The dssrx_cdw program creates indicators for 25 VA drug class categories from the VA Corporate Data Warehouse (CDW) Managerial Cost Accounting (MCA, formerly Decision Support System (DSS)) pharmacy table, [CDWWork].[dss].[PHA]. See Table 4 for a summary of the dssrx_cdw program. See Appendix D for the VA drug class mapping scheme. We have also created an Excel file that shows the crosswalk of National Drug Codes (NDC) to VA drug class categories.
The dssrx_cdw macro requires the parameter &fy for fiscal year to be set. If creating Nosos scores for periods other than a complete FY, this macro must be modified. See Technical Report 30 section 'Computing Nosos scores when cost data are not available or for periods other than complete fiscal year' for more information. We have also created a variation, dssrx_cdw_dates, which can be used for periods other than a complete FY. This version will take two extra parameters, &startdt and &enddt, and pulls all drug indicators for patients between the two dates.
The encounter-level drug costs have been cleaned of exorbitant prescription costs. Encounter-level costs that exceed a set threshold were replaced by the mean cost of the clinic stop or treating specialty.
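The cleaning rule can be sketched in Python as follows. The threshold value, the clinic stop labels, and the costs are made up, and computing the replacement mean from the encounters that do not exceed the threshold is an assumption; the guide does not state which encounters enter the mean.

```python
from collections import defaultdict


def trim_costs(encounters, threshold):
    """Replace costs above `threshold` with the mean cost of the same group.

    `encounters` is a list of (group, cost) pairs, where group is a clinic stop
    or treating specialty. Assumption: the mean uses only costs <= threshold.
    """
    sums = defaultdict(float)
    counts = defaultdict(int)
    for group, cost in encounters:
        if cost <= threshold:
            sums[group] += cost
            counts[group] += 1
    cleaned = []
    for group, cost in encounters:
        if cost > threshold and counts[group] > 0:
            cost = sums[group] / counts[group]
        cleaned.append((group, cost))
    return cleaned


# Hypothetical encounters and threshold.
raw = [("323", 100.0), ("323", 300.0), ("323", 50000.0), ("502", 80.0)]
cleaned = trim_costs(raw, threshold=10000.0)
```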
7. Add demographic data
Demographic information is added using the risk_insure and risk_priority macros. (Note: registry information is dropped for FY2023 and forward.) See Table 5 for a summary of the risk_insure program and Table 6 for a summary of the risk_priority program.
7.1. FY2023 and Forward
For FY2023 and forward, race and ethnicity are obtained from the OMOP person table; if missing, we use CDW PatSub.PatientRace, PatSub.PatientEthnicity, and Patient.Patient to complete this variable. Insurance and marital status are obtained from the CDW tables Outpat.Visit, Patient.Patient, and Vital.Master. VA registry information was dropped.
7.2. FY2019 to FY2022
For FY2019 to FY2022, insurance, race and marital status are obtained from several CDW files. We used CDWWork.Outpat.Visit (variables: PatientMaritalStatus and PatientInsuranceType), CDWWork.PatSub.PatientRace (variables: CollectionMethod, LegacyRace), and CDWWork.PatSub.PatientEthnicity (variables: Ethnicity, CollectionMethod). Prior to FY2019, we used the SAS SF file, using the values for the most recent visit day (VIZDAY) in the OUTP library of the VA Informatics and Computing Infrastructure (VINCI), in the risk_insure program.
Missing values of insurance, race or marital status are replaced with their most common values (married, insured, white). We noticed that this occurs in Fee Basis-only patients, and this group had higher costs than the rest of the population. If any of these three variables is missing, the variable missing_demog is set to 1 (1=yes); otherwise it is set to 0 (0=no).
VA priority (1-9) is obtained from the ADUSH Enrollment file, for which VINCI has the SAS libref ENROLL.
Inclusion in one of the 16 VA registries is obtained annually from the Allocation Resource Center (ARC) and is included as an indicator variable (risk_registry; see Appendix E for the list of all VA registries).
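The imputation rule and the missing_demog flag can be sketched in Python as below. The field names other than missing_demog are hypothetical, and treating both None and an empty string as "missing" is an assumption.

```python
# Most common values named in the text; the dictionary keys are hypothetical.
DEFAULTS = {"marital": "married", "insurance": "insured", "race": "white"}


def impute_demographics(person):
    """Fill missing demographics with defaults and set missing_demog."""
    out = dict(person)
    any_missing = False
    for field, default in DEFAULTS.items():
        if out.get(field) in (None, ""):
            out[field] = default
            any_missing = True
    out["missing_demog"] = 1 if any_missing else 0
    return out


complete = impute_demographics(
    {"marital": "single", "insurance": "insured", "race": "black"}
)
partial = impute_demographics({"marital": "single", "insurance": None, "race": ""})
```

Keeping the flag alongside the imputed values lets the regression estimate a separate effect for patients whose demographics were filled in.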
7.3. Prior to FY2019
Prior to FY2019 we used the SAS SF file, using the values for the most recent visit day (VIZDAY) in the OUTP library of the VA Informatics and Computing Infrastructure (VINCI), in the risk_insure program.
Missing values of insurance, race or marital status are replaced with their most common values (married, insured, white). We noticed that this occurs in Fee Basis-only patients, and this group had higher costs than the rest of the population. If any of these three variables is missing, the variable missing_demog is set to 1 (1=yes); otherwise it is set to 0 (0=no).
VA priority (1-9) is obtained from the ADUSH Enrollment file, for which VINCI has the SAS libref ENROLL. We are using the final set for each completed year, which has the name ENONEPER_SEP20%fy for 2014 and prior.
Inclusion in one of the 16 VA registries is obtained annually from the Allocation Resource Center (ARC) and is included as an indicator variable (risk_registry; see Appendix E for the list of all VA registries).
8. Add DSS cost
See Table 7 for a summary of the risk_dsscost program.
8.1. FY2023 and Forward
The program risk_dsscost pulls the cost data for patients (dsscost_&fy). The variables totdss, phar_cst_dss, and cc_total_cost are necessary for the Nosos regression program. This macro can include concurrent cost (dsscost_&fy, where the ICD-10 codes are from the same fiscal year, &fy) or prospective cost (dsscost_&fyp1, where &fyp1 is the following fiscal year). Beginning in FY2023, the Fee Basis costs are dropped and total VA-purchased community care costs are included (cc_total_cost).
8.2. Prior to FY2023
The program risk_dsscost pulls the cost data for patients (dsscost_&fy) from datasets stored in VINCI. The three variables totdss, phar_cst_dss and fee_cost_total are necessary for the Nosos regression program.
This macro can be set to include concurrent cost (dsscost_&fy where the ICD-9/ICD-10 codes are from the same &fy) or prospective cost (dsscost_&fyp1 where &fyp1 is the following fiscal year).
The model may also be changed to run with either incurred or paid Fee Basis costs. Currently, the copies of the DSS cost data in the VINCI folder are based on paid cost (dsscost_fenpaid_fy&fy), but could be replaced with incurred cost (dsscost_fenincurred_fy&fy).
9. Score Nosos
The nosos_combine program takes the list of patients and HCC scores and merges in the demographics, priority, mental health indicators and drug indicators. In addition, the total length of time in nursing homes is used to determine which V24 score (Community -- Non-Dual Aged or Institutional) is appropriate for each patient; that value is assigned to the variable score_cms. For patients with 90 or more days in long-term care, the Institutional risk score is used; for patients with fewer than 90 days in long-term care, the Community -- Non-Dual Aged score is used.
The dssrx= parameter may be left empty when creating a Nosos regression without pharmacy data. For more information see 'Nosos regression.' See Table 8 for a summary of the nosos_combine program.
9.1. FY2023 and Forward
The fields for determining length of time in a nursing home are los_3 from MCA/DSS cost and cc_nh_los from the IVC-CDS data. If los_3 + cc_nh_los ≥ 90, the Institutional score is used.
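The 90-day rule for choosing score_cms can be sketched in Python as follows. The function and argument names for the two candidate scores are hypothetical, and treating a missing length of stay as zero days is an assumption.

```python
def select_score_cms(los_3, cc_nh_los, score_institutional, score_community):
    """Pick the Institutional score at 90+ nursing home days, else Community.

    `score_community` stands for the Community, Non-Dual Aged score.
    Assumption: a missing (None) length of stay counts as zero days.
    """
    ltc_days = (los_3 or 0) + (cc_nh_los or 0)
    return score_institutional if ltc_days >= 90 else score_community


at_threshold = select_score_cms(60, 30, 2.1, 1.4)    # exactly 90 days
below = select_score_cms(60, 29, 2.1, 1.4)           # 89 days
no_stay = select_score_cms(None, None, 2.1, 1.4)     # no nursing home days
```

For years prior to FY2023 the same comparison would use feelos_ip_cltc in place of cc_nh_los.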
9.2. Prior to FY2023
The fields for determining length of time in a nursing home are los_3 and feelos_ip_cltc from MCA/DSS Cost. If los_3 + feelos_ip_cltc ≥ 90, the Institutional score is used.
10. Nosos regression
The nosos_regression program will load the macro %nososreg that computes the Nosos score. Some basic cleaning of data (removing negative costs) is performed and the variable total_cost is created by summing total MCA/DSS, MCA/DSS pharmacy and Fee Basis costs.
The dependent variable in the regression is the square root of total cost. Predictors include score_cms, drug indicators, mental health indicators, priority, age, gender, race, ethnicity, missing demographics, marital status, and no insurance. An ordinary least squares (OLS) regression is performed and a predicted square root of cost (xb_sqrtls) is obtained for each patient. The mean squared error (MSE) of the regression model is saved into the macro variable &mse.
The predicted total cost for each patient (with an additive smearing correction) is then obtained by squaring the predicted square root of cost (xb_sqrtls) and adding the mean squared error. Finally, the mean value of the predicted total cost for the FY is obtained. Each predicted total cost is divided by the overall mean to obtain the Nosos risk score.
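The sequence of steps (square-root transform, OLS, retransformation with an additive correction, rescaling to a mean of one) can be sketched in Python as below. This is a simplified illustration, not the %nososreg macro: it uses a single predictor instead of the full predictor list, made-up data, and an MSE computed as the residual sum of squares divided by n - 2, which is an assumption about the degrees of freedom. Costs are assumed to be non-negative, since negative costs are removed beforehand.

```python
import math


def nosos_like_scores(x, cost):
    """Toy one-predictor version of the square-root regression and rescaling."""
    n = len(x)
    y = [math.sqrt(c) for c in cost]           # dependent variable: sqrt(cost)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx                          # OLS with one predictor
    intercept = mean_y - slope * mean_x
    xb = [intercept + slope * xi for xi in x]  # predicted sqrt(cost)
    mse = sum((yi - fi) ** 2 for yi, fi in zip(y, xb)) / (n - 2)
    predicted = [fi ** 2 + mse for fi in xb]   # square, then add the MSE
    mean_predicted = sum(predicted) / n
    return [p / mean_predicted for p in predicted]


# Hypothetical values of a single risk predictor and annual cost.
scores = nosos_like_scores([0.5, 1.0, 1.5, 3.0], [900.0, 2500.0, 4900.0, 22000.0])
```

Because every predicted cost is divided by the mean predicted cost, the resulting scores average to one by construction.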
The program will also write the coefficients for each regression equation into a dataset, so that the regression model may be reused on another dataset if desired (see Technical Report 30 section 'Computing Nosos scores when cost data are not available or for periods other than complete fiscal year'). See Table 9 for a summary of the nosos_regression program.
A variation on the Nosos regression without pharmacy data has been created, nosos_regression_nopharm.sas. This takes the same inputs as the Nosos regression (nosos_regression) created by nosos_combine, but it does not include pharmacy in the dependent variable and does not include drug class indicators in the independent variables of the regression model or in the final output.
See Appendix A for a list of all variables included in Nosos.
11. Using the Nosos Risk Scores
11.1. Which quarter Nosos score do I use?
Beginning FY2023 we only create the annual Nosos score, previously referred to as the Quarter 4/fiscal year end (EOY) Nosos score. We discontinued the creation of the quarterly Nosos files.
Prior to FY2023 the Nosos risk scores were calculated quarterly for the Office of Productivity, Efficiency and Staffing (OPES). Data at CDW include a variable “Quarter”, which denotes the last quarter of data used in calculating the risk score. The values for the variable “Quarter” are ‘1’ (Quarter 1), ‘2’ (Quarter 2), ‘3’ (Quarter 3), and ‘Null’ (Quarter 4/fiscal year end). Risk scores from Quarters 1-3 are based on mid-year cost data which are a mixture of known costs and projected costs based on a rolling 4 quarters of data, whereas Quarter 4/fiscal year end Nosos scores use regression models where we know the actual costs from the entire prior year. We suggest using Quarter 4 Nosos scores (Quarter=Null), which use data from the entire fiscal year.
11.2. What do Nosos scores mean?
The Nosos scores are centered around 1. A value of 1 means that the patient is expected to have costs equal to the national average for VA patients. If a patient has a risk score of 2.5, then the patient has an expected cost that is 2.5 times that of the average VA patient.
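As a small worked example of this interpretation, the Python sketch below converts a score into an expected cost. The average cost figure is made up for illustration and is not a VA statistic.

```python
def expected_cost(nosos_score, mean_cost):
    """Expected annual cost implied by a Nosos score and the population mean."""
    return nosos_score * mean_cost


# Hypothetical national average annual cost per patient.
average = 10000.0
typical = expected_cost(1.0, average)
high_risk = expected_cost(2.5, average)
```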
11.3. How can I improve a patient's Nosos score?
Some groups have asked if it is possible to improve a patient’s Nosos score. Because Nosos scores are based on diagnostic and demographic observational data from patients’ VA medical records, it is not possible for a site to easily change a risk score. Nosos uses all of the diagnostic information within a 12-month period (full fiscal year or rolling 4 quarters) to compute the HCCs. The diagnostic information is combined in a non-linear fashion along with demographic information. The Nosos score was designed to reflect the patient’s underlying illnesses and was designed to be relatively unaffected by small changes in diagnostic coding.
If there is a concern about patients being under coded or assessed at a lower risk score, you can consider the provider’s underlying coding. Providers should be coding for all conditions with which a patient presents. Additionally, providers should ensure that their coding covers all conditions across the patient’s spectrum of chronic conditions, rather than coding for only one condition. For example, if a patient has a non-curable chronic condition, such as epilepsy, that is first coded in FY2013 but is not recorded again in FY2014 or FY2015, it could cause their FY2014 and FY2015 risk scores to be lower than in FY2013 because epilepsy is not showing up in the diagnostic information for FY2014 or FY2015.
Acknowledgements
Nosos was jointly developed by the Health Economics Resource Center (HERC) and the Office of Productivity, Efficiency and Staffing (OPES). Funding for HERC’s efforts was provided by Operational Analytics and Reporting (OAR). This would not have been possible without major contributions from the following individuals:
- OAR: Peter Almenoff
- OPES: Eileen Moran, Theodore Stefos, Mei-Ling Shen and Jim Campbell
- Ci2i: Steven Asch
- HERC: Todd Wagner, Anjali Updahyay, John Cashy, Elizabeth Gehlert (Cowgill), Winifred Scott, Jeanie Lo, Lakshmi Ananth, Juliette Hong
Last Updated: May 8, 2024