As health experts know all too well, a small proportion of patients account for a large percentage of health costs. In 2016, for example, the top one percent of patients ranked by their healthcare expenditures drove 21.1 percent of health costs, with an annual mean expenditure of $110,003. The top five percent of the population was responsible for 50 percent of total expenditures. The bottom 50 percent of the entire population accounted for less than three percent (2.8 percent) of costs.
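To make the concentration figures concrete, here is a minimal sketch of how a cost share like "the top one percent drove 21.1 percent" can be computed from a list of per-patient annual costs. The function name `top_share` and the ten sample values are hypothetical, chosen only for illustration; they are not the 2016 data.

```python
def top_share(costs, fraction):
    """Return the share of total cost incurred by the costliest `fraction` of patients."""
    if not costs:
        raise ValueError("costs must not be empty")
    ranked = sorted(costs, reverse=True)
    # Always include at least one patient in the top group.
    n_top = max(1, round(len(ranked) * fraction))
    total = sum(ranked)
    return sum(ranked[:n_top]) / total if total else 0.0


# Hypothetical annual costs for ten patients (illustrative values only).
annual_costs = [120_000, 30_000, 9_000, 6_000, 5_000, 4_000, 3_000, 2_000, 1_000, 0]

share_top_10pct = top_share(annual_costs, 0.10)        # costliest patient's share
share_bottom_half = 1 - top_share(annual_costs, 0.50)  # share left to the bottom half
```

In this toy population, one patient accounts for two-thirds of all spending while the bottom half accounts for under six percent, the same skewed shape the national figures describe.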
At the risk of stating the obvious, healthcare organizations are acutely interested in identifying patients at risk of becoming high-cost so they can intervene as early as possible to prevent health deterioration and the costs that come with it.
Machine learning offers the real possibility of identifying high-cost patients:
- Rules-based identification is limited to standard cases.
- Machine learning handles large amounts of data with skewed distributions typical of health costs.
- Machine learning uncovers complex relationships between patient characteristics and future cost.
- Customizable stratification of machine learning predictions complements limited intervention resources.
- Machine learning, as opposed to actuarial models that work at the population level, can provide personalized results.
The Geneia Data Intelligence Lab
ICYMI, Geneia created a Data Intelligence Lab and staffed it with innately curious PhD- and master's-level data scientists who are charged with using leading-edge data science to follow the numbers to answer healthcare’s intractable questions. We prioritize projects that address major cost drivers and improve health outcomes, enabling our health plan, hospital and physician clients to predict, intervene with and engage high-cost patients, including those with conditions such as heart failure and diabetes.
The GDI Lab set a goal of creating a machine learning model to predict which patients are likely to become high-cost claimants in the next 12 months.
The strategy, in summary, was:
- Calculate variables likely correlated with cost during the intake window.
- Determine healthcare costs during the prediction window.
- Use variables to predict future costs, and then compare to actual costs.
- Evaluate the trained model performance on an independent dataset.
- Benchmark this performance against the current cost prediction strategy.
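The first three steps and the benchmarking step might look something like the sketch below. Everything here is an assumption for illustration: the window dates, the claim tuple layout, the two features and the naive "next year equals last year" benchmark are hypothetical stand-ins, not the GDI Lab's actual pipeline or its current cost prediction strategy. Evaluating on an independent dataset would repeat the same error calculation on patients the model never saw.

```python
from datetime import date

# Hypothetical window boundaries; the post only specifies a 12-month prediction window.
INTAKE_START, INTAKE_END = date(2017, 1, 1), date(2017, 12, 31)
PREDICT_START, PREDICT_END = date(2018, 1, 1), date(2018, 12, 31)


def build_rows(claims):
    """Turn (patient_id, service_date, claim_type, paid) claims into one row per
    patient: intake-window features plus the prediction-window cost."""
    rows = {}
    for patient_id, service_date, claim_type, paid in claims:
        row = rows.setdefault(
            patient_id, {"prior_cost": 0.0, "ed_claims": 0, "future_cost": 0.0}
        )
        if INTAKE_START <= service_date <= INTAKE_END:
            row["prior_cost"] += paid          # feature: cost during intake window
            if claim_type == "ED":
                row["ed_claims"] += 1          # feature: emergency department claims
        elif PREDICT_START <= service_date <= PREDICT_END:
            row["future_cost"] += paid         # target: cost during prediction window
    return rows


def mean_absolute_error(predicted, actual):
    """Average absolute gap between predicted and actual costs."""
    return sum(abs(p - a) for p, a in zip(predicted, actual)) / len(actual)


# Hypothetical claims (illustrative values only).
claims = [
    ("A", date(2017, 3, 2), "ED", 1_200.0),
    ("A", date(2017, 9, 15), "OP", 300.0),
    ("A", date(2018, 4, 1), "IP", 20_000.0),
    ("B", date(2017, 6, 10), "OP", 150.0),
    ("B", date(2018, 2, 20), "OP", 200.0),
]
rows = build_rows(claims)

# Naive benchmark: predict that next year's cost equals the previous year's cost.
baseline_mae = mean_absolute_error(
    [r["prior_cost"] for r in rows.values()],
    [r["future_cost"] for r in rows.values()],
)
```

Patient A shows why a simple carry-forward benchmark struggles: $1,500 in the intake window is followed by $20,000 in the prediction window, which is exactly the kind of jump a trained model is meant to anticipate.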
Of the variables considered during the process of creating the trained model, a number emerged as important ones for future cost predictions, including:
- Previous year’s total cost
- Diagnoses of type 2 diabetes, hypertension and/or lower back pain
- Emergency department claims
- Outpatient claims
The High-Cost Claimants Predictive Model
- Cohort selection: At least one month of medical eligibility and one medical claim
- Number of variables: 21
- Data sources: Demographics, medical and pharmacy claims, social determinants of health
- Model output: Regression and classification
The model yields thresholds for actual costs and classifies patients by predicted future cost in the next 12 months.
Geneia’s non-linear model accurately identifies patients’ future costs at high-cost thresholds such as between $50,000 and $99,999 in health costs in the next 12 months. Our model requires less data to train, utilizes novel data sources and outperforms well-known commercial tools.
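As a rough illustration of classifying patients by predicted cost, the sketch below maps a predicted 12-month cost to a band. Only the $50,000 to $99,999 band comes from this post; the other cut points, the band labels and the function name `cost_band` are hypothetical assumptions, not the model's actual thresholds.

```python
from bisect import bisect_right

# Lower bound of each band in dollars. Only the $50,000-$99,999 band is from
# the post; the remaining cut points are illustrative assumptions.
BAND_FLOORS = [0, 10_000, 50_000, 100_000]
BAND_LABELS = [
    "under $10,000",
    "$10,000-$49,999",
    "$50,000-$99,999",
    "$100,000 or more",
]


def cost_band(predicted_cost):
    """Map a predicted 12-month cost to the label of the band it falls in."""
    if predicted_cost < 0:
        raise ValueError("predicted cost cannot be negative")
    # bisect_right counts how many floors are <= the prediction.
    return BAND_LABELS[bisect_right(BAND_FLOORS, predicted_cost) - 1]
```

For example, `cost_band(72_500)` falls in the "$50,000-$99,999" band, while `cost_band(100_000)` moves up to the top band.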
Impactability: A Look Ahead
The GDI Lab is working to determine whose cost can be most affected or, in other words, the impactable patients who are most likely to benefit from care management intervention.
We know targeting only the riskiest patients means lost opportunities. For example, the chart below compares a risky patient with an impactable one, and shows the associated savings opportunity.
High-cost and high-needs are not the same as highly impactable.
As C. Annette DuBard, MD, MPH and Carlos T. Jackson, PhD, discussed in their paper, Active Redesign of a Medicaid Care Management Strategy for Greater Return on Investment: Predicting Impactability, “Targeting strategies that seek to identify patients based on high current or predicted costs or utilization are likely to identify large numbers of individuals whose healthcare needs will not be meaningfully altered by care management intervention.” Their research with Community Care of North Carolina’s Medicaid patients led to the creation of an impactability score.
As you can see in the graph above, high costs and impactability are not strongly correlated. In fact, there is only a 53 percent overlap between the 5,000 costliest patients and the 5,000 patients with the highest impactability scores. Solely focusing on members with high cost would mean missing everyone in the bottom right quadrant, below the cost threshold but with good potential for impactful change.
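An overlap figure like that 53 percent can be computed by comparing two top-N lists. The sketch below is one simple way to do it, using hypothetical patient IDs and scores; the function names are illustrative and this is not DuBard and Jackson's code.

```python
def top_n_ids(scores, n):
    """Return the ids of the n highest-scoring patients."""
    ranked = sorted(scores, key=scores.get, reverse=True)
    return set(ranked[:n])


def top_n_overlap(cost, impactability, n):
    """Fraction of the top-n patients by cost who are also top-n by impactability."""
    return len(top_n_ids(cost, n) & top_n_ids(impactability, n)) / n


# Hypothetical scores for six patients (illustrative values only).
cost = {"A": 90_000, "B": 70_000, "C": 40_000, "D": 8_000, "E": 5_000, "F": 1_000}
impactability = {"A": 0.2, "B": 0.9, "C": 0.8, "D": 0.95, "E": 0.1, "F": 0.3}

overlap = top_n_overlap(cost, impactability, n=3)
```

Here patient D is the low-cost, high-impactability case: absent from the top three by cost, yet the single most impactable patient in the group.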
DuBard and Jackson’s research also demonstrated that the impactability score is associated with more savings than other case management targeting strategies.
As you can see in the graph above, the same investment in care managing 5,000 patients yields very different results depending on whom you choose to manage. Managing the most impactable patients produced per-patient, six-month savings of $4,488, compared to $2,148 for inpatient super-users and $2,748 for emergency department super-users.
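Scaling those per-patient figures to the full group makes the gap easier to see. The arithmetic below assumes the reported per-patient savings are averages that apply across all 5,000 managed patients; that is an assumption of this sketch, not a total stated in the paper.

```python
PATIENTS_MANAGED = 5_000

# Per-patient, six-month savings reported above for each targeting strategy.
SAVINGS_PER_PATIENT = {
    "most impactable": 4_488,
    "inpatient super-users": 2_148,
    "emergency department super-users": 2_748,
}

# Assumption: the per-patient figure is an average over all managed patients.
total_savings = {
    strategy: per_patient * PATIENTS_MANAGED
    for strategy, per_patient in SAVINGS_PER_PATIENT.items()
}

# How many times larger the impactability-based savings are than the
# inpatient super-user strategy.
ratio_vs_inpatient = (
    SAVINGS_PER_PATIENT["most impactable"] / SAVINGS_PER_PATIENT["inpatient super-users"]
)
```

Under that assumption, the impactability-based group would save about $22.4 million over six months, versus roughly $10.7 million for inpatient super-users and $13.7 million for emergency department super-users.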
That’s why the Geneia Data Intelligence Lab is prioritizing an impactability model for all patients. Stay tuned for additional blog updates about this model and how it will complement our high-cost model as well as others under development.