Predictive Analytics World: Using Embeddings in Healthcare | Geneia

September 22, 2020
Fred Rahmanian, Chief Analytics and Technology Officer


Geneia CATO Fred Rahmanian discusses how embeddings simplify the process of identifying similar patients

ICYMI, I delivered a lightning talk at Predictive Analytics World in early June.

You can watch my talk here: https://youtu.be/SCMMKsFrpSA.

In my presentation, “Using Low Dimensional Representation of Medical Concepts to Improve Population Health Management,” I discussed how Geneia’s use of this technique allows health plans, hospitals and provider organizations to better identify and engage patient cohorts. In short, one use of low-dimensional representations of medical concepts is to simplify the process of identifying similar patients whom traditional identification and stratification methods may fail to recognize as similar.

Specifically, I define an embedding as a way of mapping categorical data with thousands of elements to a vector of floating-point numbers with far fewer dimensions. Rather than a one-to-one ratio of categories to features (as with one-hot encoding), one million categorical values would be mapped to a 100- or 500-dimensional (or any dimension you choose) vector of floating-point numbers.
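To make the contrast between one-hot features and an embedding concrete, here is a minimal Python sketch using only the standard library. It assumes a tiny, hypothetical vocabulary of diagnosis codes and an untrained, randomly initialized embedding table (in a real model those values would be learned). It shows that looking up a category’s embedding is the same as multiplying its one-hot vector by the embedding matrix, while the feature width shrinks from the vocabulary size to the chosen dimension.

```python
import random

# Hypothetical vocabulary of diagnosis codes (illustrative only).
VOCAB = ["E11.9", "I10", "J45.909", "N18.3", "F32.9"]
CODE_TO_ID = {code: i for i, code in enumerate(VOCAB)}


def one_hot(index: int, size: int) -> list[float]:
    """Sparse representation: one feature per category, a single 1.0."""
    vec = [0.0] * size
    vec[index] = 1.0
    return vec


def make_embedding_table(vocab_size: int, dim: int, seed: int = 0) -> list[list[float]]:
    """Dense representation: one row of `dim` floats per category.
    Randomly initialized here; training would adjust these values."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 0.1) for _ in range(dim)] for _ in range(vocab_size)]


def lookup(table: list[list[float]], index: int) -> list[float]:
    """An embedding lookup is just selecting a row of the table."""
    return table[index]


def one_hot_times_matrix(vec: list[float], table: list[list[float]]) -> list[float]:
    """Multiply a one-hot row vector by the (vocab_size x dim) table."""
    dim = len(table[0])
    return [sum(vec[i] * table[i][j] for i in range(len(vec))) for j in range(dim)]


table = make_embedding_table(len(VOCAB), dim=3)
idx = CODE_TO_ID["I10"]
print(len(one_hot(idx, len(VOCAB))))  # 5 features, one per code
print(len(lookup(table, idx)))        # 3 features, the embedding dimension
assert one_hot_times_matrix(one_hot(idx, len(VOCAB)), table) == lookup(table, idx)
```

The same idea scales up: with one million codes and a 100-dimensional embedding, each feature vector goes from 1,000,000 entries to 100.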

I also shared additional information about embeddings:

  • In neural networks, embeddings are learned, low-dimensional representations of categorical variables.
  • These learned embeddings are interesting because they place categorical variables in a shared embedding space, where related categories can end up near each other.
  • The embedding space is a low-dimensional vector space in which positions and distances have mathematical meaning.
  • Embeddings can be used as input features for supervised learning tasks.
  • Their distance to other embeddings within the embedding space can be used as a measure of similarity. I illustrated this principle with an example about France and Germany.
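The idea that distance in the embedding space measures similarity can be sketched in a few lines of Python. The vectors below are invented, hand-picked 3-dimensional values for hypothetical medical concepts; they are not learned embeddings and not the France and Germany example from the talk. Cosine similarity is assumed here as the measure because it is a common choice, but other distances work too.

```python
import math

# Toy, hand-written "embeddings" (hypothetical values, not learned).
EMBEDDINGS = {
    "type_2_diabetes": [0.9, 0.1, 0.0],
    "insulin_therapy": [0.8, 0.2, 0.1],
    "asthma": [0.0, 0.9, 0.3],
    "inhaler": [0.1, 0.8, 0.4],
}


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors: 1.0 means same direction."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def most_similar(name: str, embeddings: dict[str, list[float]]) -> str:
    """Return the other concept whose embedding is closest to `name`'s."""
    query = embeddings[name]
    others = (k for k in embeddings if k != name)
    return max(others, key=lambda k: cosine_similarity(query, embeddings[k]))


print(most_similar("type_2_diabetes", EMBEDDINGS))  # insulin_therapy
print(most_similar("asthma", EMBEDDINGS))           # inhaler
```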

There are many reasons I like to use embeddings in healthcare, including:

  • Easy to implement
    • Uses an established embedding algorithm (doc2vec)
  • Requires very little data engineering or preprocessing
  • Saves time and computational resources
    • Can be used for medical prediction problems with simple linear models
    • Can be used in downstream modeling by non-experts
  • Makes it possible to find similar patients by simply clustering their embeddings
  • Powerful for rapidly testing a large number of proof-of-concept ideas in healthcare
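The point about finding similar patients by clustering their embeddings can be illustrated with plain k-means. This is a minimal sketch under stated assumptions, not Geneia’s pipeline: the five patient vectors are invented 2-dimensional values standing in for real patient embeddings (which might come from a model such as doc2vec), and the clustering is textbook k-means (Lloyd’s algorithm) with a simple deterministic seeding step chosen for reproducibility.

```python
import math

# Hypothetical patient embeddings (invented values; in practice these might
# come from a document-embedding model such as doc2vec).
PATIENTS = {
    "patient_a": [0.90, 0.10],
    "patient_b": [0.85, 0.20],
    "patient_c": [0.10, 0.90],
    "patient_d": [0.15, 0.80],
    "patient_e": [0.80, 0.15],
}


def squared_distance(a: list[float], b: list[float]) -> float:
    return sum((x - y) ** 2 for x, y in zip(a, b))


def mean(vectors: list[list[float]]) -> list[float]:
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def kmeans(points: dict[str, list[float]], k: int, iterations: int = 10) -> dict[str, int]:
    """Plain k-means returning a cluster id per patient.
    Seeding: the first point, then repeatedly the point farthest
    from all centroids chosen so far."""
    names = list(points)
    centroids = [points[names[0]]]
    while len(centroids) < k:
        farthest = max(
            names, key=lambda n: min(squared_distance(points[n], c) for c in centroids)
        )
        centroids.append(points[farthest])
    assignment: dict[str, int] = {}
    for _ in range(iterations):
        # Assignment step: each patient joins its nearest centroid.
        assignment = {
            n: min(range(k), key=lambda i: squared_distance(points[n], centroids[i]))
            for n in names
        }
        # Update step: move each centroid to the mean of its members.
        for i in range(k):
            members = [points[n] for n in names if assignment[n] == i]
            if members:  # keep the old centroid if a cluster empties out
                centroids[i] = mean(members)
    return assignment


clusters = kmeans(PATIENTS, k=2)
for name, cluster in clusters.items():
    print(name, cluster)
```

With these toy values, patients a, b and e fall into one cluster and patients c and d into the other. A library implementation such as scikit-learn’s `KMeans` would normally replace the hand-written loop.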

To learn more about how we’re using embeddings in the Geneia Data Intelligence Lab, I invite you to watch my talk.