Diseases Prediction Based On Medications Using Indexing In MongoDB
In this article, we discuss a DOCTOR-Y feature that predicts patients' current medical conditions from their regular medications, using a dataset that maps medicines to their corresponding medical conditions.
The prediction relies on the search techniques provided by MongoDB.
If you don't know what DOCTOR-Y is, check this post.
Our goal is to generate a chart that shows the patient's conditions as percentages, calculated from all the medications the patient has taken during a specific period.
As a running example, consider a patient who is prescribed 3 medicines:
- Actemra
- Duexis
- Indocin
The dataset consists of 1224 records; each record contains the drug name, its corresponding condition, and the weight of this condition. It is derived from the Medication Guide offered by the FDA.
This dataset is uploaded to DOCTOR-Y's database so that the application server can use it.
Drug name | Condition | Weight |
Abilify | Schizophrenia | 0.2 |
Abilify | Bipolar I Disorder | 0.2 |
Abilify | Major Depressive Disorder (MDD) | 0.2 |
Abilify | Irritability | 0.2 |
Abilify | Tourette's Disorder | 0.2 |
Abilify Maintena Kit | Schizophrenia | 0.5 |
Abilify Maintena Kit | Bipolar I Disorder | 0.5 |
To search the collection, we could rely on MongoDB's default search method, the collection scan, which has a complexity of O(n).
However, we opted for indexing, specifically single-field indexing. This search method uses a B-tree data structure, so a lookup has a complexity of O(log n), which offers better performance than a collection scan.
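The indexing choice can be sketched roughly as follows. This is a minimal illustration, not DOCTOR-Y's actual code: the field name `drug_name`, the helper names, and the database and collection names in the usage comment are all assumptions. The helpers only rely on `create_index` and `find`, which a PyMongo collection provides.

```python
def ensure_drug_index(collection, field="drug_name"):
    """Create an ascending single-field index on the drug name field.

    `field` is an assumed field name; adjust it to the real schema.
    Asking for an index that already exists does not create a duplicate.
    """
    return collection.create_index(field)


def find_records_for(collection, medicines, field="drug_name"):
    """Run one exact-match query per medicine and collect the matching records."""
    records = []
    for name in medicines:
        # With the index in place, each equality match can use the index
        # instead of scanning the whole collection.
        records.extend(collection.find({field: name}))
    return records


# Usage against a real deployment (requires the pymongo package and a running
# server; the URI, database name and collection name below are hypothetical):
#   from pymongo import MongoClient
#   collection = MongoClient("mongodb://localhost:27017")["doctor_y"]["medications"]
#   ensure_drug_index(collection)
#   records = find_records_for(collection, ["Actemra", "Duexis", "Indocin"])
```

Issuing one query per medicine mirrors the steps described in this article; whether that or a single combined query is preferable depends on the application.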
Each record has a drug name, a condition, and a weight. The following steps are taken to get the percentage of occurrence of each condition.
- A search is conducted in the drug name field for each medicine taken by the user, and the matching records are retrieved.
For example, for a patient prescribed {Actemra, Duexis, Indocin}, the following records are retrieved.
Drug name | Condition | Weight |
Actemra | Rheumatoid Arthritis (RA) | 0.2 |
Actemra | Giant Cell Arteritis (GCA) | 0.2 |
Actemra | Polyarticular Juvenile Idiopathic Arthritis (PJIA) | 0.2 |
Actemra | Systemic Juvenile Idiopathic Arthritis (SJIA) | 0.2 |
Actemra | Cytokine Release Syndrome (CRS) | 0.2 |
Duexis | upper gastrointestinal ulcers | 0.33 |
Duexis | Osteoarthritis | 0.33 |
Duexis | Rheumatoid Arthritis (RA) | 0.33 |
Indocin | Rheumatoid Arthritis (RA) | 0.2 |
Indocin | Ankylosing spondylitis (AS) | 0.2 |
Indocin | Osteoarthritis | 0.2 |
Indocin | Acute painful shoulder | 0.2 |
Indocin | Acute gouty arthritis | 0.2 |
- We iterate through the retrieved records, putting each new condition into a hash table with the condition as the key and the weight as the value.
If a condition already exists in the hash table, we add its weight to the weight already stored there.
We now have a table of the patient's conditions with their corresponding weights. Continuing the previous example, we get the following hash table.
Key (Condition) | Value (Weight) |
Rheumatoid Arthritis (RA) | 0.73 |
Osteoarthritis | 0.53 |
upper gastrointestinal ulcers | 0.33 |
Giant Cell Arteritis (GCA) | 0.2 |
Polyarticular Juvenile Idiopathic Arthritis (PJIA) | 0.2 |
Systemic Juvenile Idiopathic Arthritis (SJIA) | 0.2 |
Cytokine Release Syndrome (CRS) | 0.2 |
Ankylosing spondylitis (AS) | 0.2 |
Acute painful shoulder | 0.2 |
Acute gouty arthritis | 0.2 |
Total | 2.99 |
- We divide each weight by the total sum of weights and multiply it by 100 to get the percentage of occurrence of each condition.
For the previous example, we get the following results.
Key (Condition) | Value (Percentage) |
Rheumatoid Arthritis (RA) | 24.4% |
Osteoarthritis | 17.7% |
upper gastrointestinal ulcers | 11% |
Giant Cell Arteritis (GCA) | 6.7% |
Polyarticular Juvenile Idiopathic Arthritis (PJIA) | 6.7% |
Systemic Juvenile Idiopathic Arthritis (SJIA) | 6.7% |
Cytokine Release Syndrome (CRS) | 6.7% |
Ankylosing spondylitis (AS) | 6.7% |
Acute painful shoulder | 6.7% |
Acute gouty arthritis | 6.7% |
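The accumulation and percentage steps above can be sketched in a few lines of Python. This is only an illustration under assumptions: the record keys `drug_name`, `condition` and `weight`, and the function names, are hypothetical and not taken from DOCTOR-Y's code. A plain dictionary plays the role of the hash table.

```python
from collections import defaultdict


def accumulate_weights(records):
    """Sum the weight of each condition across all retrieved records."""
    totals = defaultdict(float)
    for record in records:
        # A new condition starts at 0.0; a repeated one adds to its stored weight.
        totals[record["condition"]] += record["weight"]
    return dict(totals)


def to_percentages(totals):
    """Divide each weight by the sum of all weights and multiply by 100."""
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {
        condition: round(weight / grand_total * 100, 1)
        for condition, weight in totals.items()
    }


# The records retrieved for {Actemra, Duexis, Indocin} in the example above.
rows = [
    ("Actemra", "Rheumatoid Arthritis (RA)", 0.2),
    ("Actemra", "Giant Cell Arteritis (GCA)", 0.2),
    ("Actemra", "Polyarticular Juvenile Idiopathic Arthritis (PJIA)", 0.2),
    ("Actemra", "Systemic Juvenile Idiopathic Arthritis (SJIA)", 0.2),
    ("Actemra", "Cytokine Release Syndrome (CRS)", 0.2),
    ("Duexis", "upper gastrointestinal ulcers", 0.33),
    ("Duexis", "Osteoarthritis", 0.33),
    ("Duexis", "Rheumatoid Arthritis (RA)", 0.33),
    ("Indocin", "Rheumatoid Arthritis (RA)", 0.2),
    ("Indocin", "Ankylosing spondylitis (AS)", 0.2),
    ("Indocin", "Osteoarthritis", 0.2),
    ("Indocin", "Acute painful shoulder", 0.2),
    ("Indocin", "Acute gouty arthritis", 0.2),
]
records = [{"drug_name": d, "condition": c, "weight": w} for d, c, w in rows]

totals = accumulate_weights(records)
percentages = to_percentages(totals)

# Print the conditions from most to least likely.
for condition, pct in sorted(percentages.items(), key=lambda kv: -kv[1]):
    print(f"{condition}: {pct}%")
```

Rounding to one decimal place is a presentation choice made here to match the tables above; the chart could equally use the unrounded values.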