Disease Prediction Based On Medications Using Indexing In MongoDB

In this article, we discuss a DOCTOR-Y feature that predicts a patient's current medical conditions from their regular medications, using a dataset that maps medicines to their corresponding medical conditions.

The lookups rely on the search techniques provided by MongoDB.

If you don't know what DOCTOR-Y is, check this post.

Objective

Our goal is to generate a chart that shows a patient's conditions as percentages, calculated from all the medications they have taken during a specific period.

For example

Consider a patient who is prescribed three medicines:

Actemra

Duexis

Indocin

The results should be something like this:

Dataset

The dataset consists of 1224 records; each record contains a drug name, its corresponding condition, and the weight of that condition. It is derived from the Medication Guides published by the FDA.

This dataset is uploaded to DOCTOR-Y's database so that the application server can query it.

A sample from the dataset:

Drug name	Condition	Weight
Abilify	Schizophrenia	0.2
Abilify	Bipolar I Disorder	0.2
Abilify	Major Depressive Disorder (MDD)	0.2
Abilify	Irritability	0.2
Abilify	Tourette's Disorder	0.2
Abilify Maintena Kit	Schizophrenia	0.5
Abilify Maintena Kit	Bipolar I Disorder	0.5
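The article does not say how the weights are assigned. The sample is consistent with a drug's total weight of 1 being split evenly across its n conditions (1/5 = 0.2 for Abilify's five conditions, 1/2 = 0.5 for Abilify Maintena Kit's two). The sketch below assumes that rule, rounded to two decimals; `split_weights` is a hypothetical helper, not part of DOCTOR-Y.

```python
def split_weights(drug_conditions):
    """Assumed rule (not confirmed by the source): each of a drug's n
    conditions gets weight 1/n, rounded to two decimal places."""
    rows = []
    for drug, conditions in drug_conditions.items():
        weight = round(1 / len(conditions), 2)
        for condition in conditions:
            rows.append((drug, condition, weight))
    return rows


# Drug-to-conditions mapping taken from the sample rows above.
SAMPLE = {
    "Abilify": [
        "Schizophrenia",
        "Bipolar I Disorder",
        "Major Depressive Disorder (MDD)",
        "Irritability",
        "Tourette's Disorder",
    ],
    "Abilify Maintena Kit": ["Schizophrenia", "Bipolar I Disorder"],
}

# Print the rows in the same tab-separated layout as the sample.
for drug, condition, weight in split_weights(SAMPLE):
    print(drug, condition, weight, sep="\t")
```

Under this assumed rule a drug with three conditions gets 0.33 per condition, which would also explain why the three Duexis rows later in the article add up to 0.99 rather than 1.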

Applied Search Method

To find the records for a given drug, MongoDB by default performs a collection scan, which examines every document in the collection and therefore takes O(n) time.

Instead, we opted for indexing, specifically a single field index on the drug name field. MongoDB indexes use a B-tree data structure, so locating a drug name takes O(log n) time (plus the cost of reading the matching records), which performs better than a collection scan.
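As a rough illustration of why an index helps, the sketch below contrasts a linear scan with a lookup on a sorted list of drug names searched by binary search (Python's `bisect`). The sorted list is only a stand-in for MongoDB's B-tree index, not how MongoDB implements it. In MongoDB itself the index would be created with something like `db.medications.createIndex({ drug_name: 1 })`; the collection name `medications` and field name `drug_name` are assumptions.

```python
from bisect import bisect_left

# (drug name, condition, weight) rows from the dataset sample above.
RECORDS = [
    ("Abilify", "Schizophrenia", 0.2),
    ("Abilify", "Bipolar I Disorder", 0.2),
    ("Abilify", "Major Depressive Disorder (MDD)", 0.2),
    ("Abilify", "Irritability", 0.2),
    ("Abilify", "Tourette's Disorder", 0.2),
    ("Abilify Maintena Kit", "Schizophrenia", 0.5),
    ("Abilify Maintena Kit", "Bipolar I Disorder", 0.5),
]


def collection_scan(records, drug):
    """Compare every record with the drug name: O(n), like a collection scan."""
    return [record for record in records if record[0] == drug]


def build_index(records):
    """Sorted (drug name, position) pairs: a stand-in for a single field index."""
    return sorted((record[0], position) for position, record in enumerate(records))


def index_lookup(records, index, drug):
    """Binary-search for the first matching key in O(log n), then read the k matches."""
    # (drug,) sorts immediately before every (drug, position) pair.
    start = bisect_left(index, (drug,))
    matches = []
    while start < len(index) and index[start][0] == drug:
        matches.append(records[index[start][1]])
        start += 1
    return matches


index = build_index(RECORDS)
print(index_lookup(RECORDS, index, "Abilify Maintena Kit"))
```

Both functions return the same rows for a given drug; the difference is how many entries they compare to get there, which matters as the collection grows.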

Mechanism

Each record has a drug name, a condition, and a weight. The following steps compute the percentage of occurrence of each condition.

For each medicine the patient takes, the drug name field is searched and the matching records are retrieved.
For example, suppose a patient is prescribed {Actemra, Duexis, Indocin}.

The following records are retrieved:

Drug name	Condition	Weight
Actemra	Rheumatoid Arthritis (RA)	0.2
Actemra	Giant Cell Arteritis (GCA)	0.2
Actemra	Polyarticular Juvenile Idiopathic Arthritis (PJIA)	0.2
Actemra	Systemic Juvenile Idiopathic Arthritis (SJIA)	0.2
Actemra	Cytokine Release Syndrome (CRS)	0.2
Duexis	upper gastrointestinal ulcers	0.33
Duexis	Osteoarthritis	0.33
Duexis	Rheumatoid Arthritis (RA)	0.33
Indocin	Rheumatoid Arthritis (RA)	0.2
Indocin	Ankylosing spondylitis (AS)	0.2
Indocin	Osteoarthritis	0.2
Indocin	Acute painful shoulder	0.2
Indocin	Acute gouty arthritis	0.2

Next, we iterate through the retrieved records, inserting each new condition into a hash table whose key is the condition and whose value is its weight.

If a condition already exists in the hash table, we add its weight to the existing weight in the hash table.
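The two steps above amount to summing the weights per condition. A minimal sketch, using a Python dict as the hash table and the records retrieved for {Actemra, Duexis, Indocin}; `accumulate_weights` is a hypothetical name.

```python
from collections import defaultdict

# Records retrieved for a patient prescribed {Actemra, Duexis, Indocin}.
RETRIEVED = [
    ("Actemra", "Rheumatoid Arthritis (RA)", 0.2),
    ("Actemra", "Giant Cell Arteritis (GCA)", 0.2),
    ("Actemra", "Polyarticular Juvenile Idiopathic Arthritis (PJIA)", 0.2),
    ("Actemra", "Systemic Juvenile Idiopathic Arthritis (SJIA)", 0.2),
    ("Actemra", "Cytokine Release Syndrome (CRS)", 0.2),
    ("Duexis", "upper gastrointestinal ulcers", 0.33),
    ("Duexis", "Osteoarthritis", 0.33),
    ("Duexis", "Rheumatoid Arthritis (RA)", 0.33),
    ("Indocin", "Rheumatoid Arthritis (RA)", 0.2),
    ("Indocin", "Ankylosing spondylitis (AS)", 0.2),
    ("Indocin", "Osteoarthritis", 0.2),
    ("Indocin", "Acute painful shoulder", 0.2),
    ("Indocin", "Acute gouty arthritis", 0.2),
]


def accumulate_weights(records):
    """Sum the weights of each condition in a hash table (a Python dict)."""
    weights = defaultdict(float)  # a condition seen for the first time starts at 0.0
    for _drug, condition, weight in records:
        weights[condition] += weight  # new key is inserted, existing key accumulates
    return dict(weights)


weights = accumulate_weights(RETRIEVED)
print(round(weights["Rheumatoid Arthritis (RA)"], 2))  # 0.73
print(round(sum(weights.values()), 2))  # 2.99
```

Floating-point addition can leave tiny residues in the sums, so it is worth rounding before displaying the values.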

Now we have a table of patient conditions with their corresponding weights.

Continuing the previous example, we get the following hash table:

Key (Condition)	Value (Weight)
Rheumatoid Arthritis (RA)	0.73
Osteoarthritis	0.53
upper gastrointestinal ulcers	0.33
Giant Cell Arteritis (GCA)	0.2
Polyarticular Juvenile Idiopathic Arthritis (PJIA)	0.2
Systemic Juvenile Idiopathic Arthritis (SJIA)	0.2
Cytokine Release Syndrome (CRS)	0.2
Ankylosing spondylitis (AS)	0.2
Acute painful shoulder	0.2
Acute gouty arthritis	0.2
Total	2.99

We divide each weight by the total sum of weights and multiply it by 100 to get the percentage of occurrence of each condition.

For the previous example, this gives the following results:
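Written as a formula (the notation is ours), with `w_c` the accumulated weight of condition `c` from the hash table:

```latex
p_c = 100 \times \frac{w_c}{\sum_{c'} w_{c'}},
\qquad
p_{\mathrm{RA}} = 100 \times \frac{0.73}{2.99} \approx 24.4\%
```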

Condition	Percentage
Rheumatoid Arthritis (RA)	24.4%
Osteoarthritis	17.7%
upper gastrointestinal ulcers	11%
Giant Cell Arteritis (GCA)	6.7%
Polyarticular Juvenile Idiopathic Arthritis (PJIA)	6.7%
Systemic Juvenile Idiopathic Arthritis (SJIA)	6.7%
Cytokine Release Syndrome (CRS)	6.7%
Ankylosing spondylitis (AS)	6.7%
Acute painful shoulder	6.7%
Acute gouty arthritis	6.7%
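The normalization step can be sketched as follows, taking the hash table from the example as input; `to_percentages` is a hypothetical helper, and the one-decimal rounding mirrors the table above.

```python
def to_percentages(weights):
    """Divide each weight by the total weight and scale to a percentage (1 decimal)."""
    total = sum(weights.values())
    if total == 0:
        return {}  # no matching records: nothing to chart
    return {condition: round(100 * weight / total, 1) for condition, weight in weights.items()}


# Condition weights from the hash table in the example above.
WEIGHTS = {
    "Rheumatoid Arthritis (RA)": 0.73,
    "Osteoarthritis": 0.53,
    "upper gastrointestinal ulcers": 0.33,
    "Giant Cell Arteritis (GCA)": 0.2,
    "Polyarticular Juvenile Idiopathic Arthritis (PJIA)": 0.2,
    "Systemic Juvenile Idiopathic Arthritis (SJIA)": 0.2,
    "Cytokine Release Syndrome (CRS)": 0.2,
    "Ankylosing spondylitis (AS)": 0.2,
    "Acute painful shoulder": 0.2,
    "Acute gouty arthritis": 0.2,
}

for condition, percent in to_percentages(WEIGHTS).items():
    print(f"{condition}\t{percent}%")
```

Because each percentage is rounded independently, the displayed values may not always add up to exactly 100; here they happen to.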

Integration With DOCTOR-Y

The final conditions and their percentages are sent to the system server, which forwards them to the client side to be displayed on a chart, as shown in the figure below.
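The article does not show the format exchanged between the server and the client. As a purely hypothetical sketch, the percentages could be serialized as a JSON array sorted from highest to lowest, which a client-side chart could consume directly; `build_chart_payload` and the `condition` and `percentage` field names are invented for illustration.

```python
import json


def build_chart_payload(percentages):
    """Serialize {condition: percentage} as a JSON array, highest percentage first.

    Hypothetical format: the field names are not taken from DOCTOR-Y.
    """
    # sorted() is stable, so conditions with equal percentages keep their input order.
    ranked = sorted(percentages.items(), key=lambda item: item[1], reverse=True)
    return json.dumps([{"condition": c, "percentage": p} for c, p in ranked])


payload = build_chart_payload(
    {"Osteoarthritis": 17.7, "Rheumatoid Arthritis (RA)": 24.4, "Acute gouty arthritis": 6.7}
)
print(payload)
```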