Disease Prediction Based on Medications Using Indexing in MongoDB
In this article, we discuss a DOCTOR-Y feature that predicts patients' current medical conditions from their regular medications, using a dataset that maps medicines to their corresponding medical conditions.
This is done using the search techniques that MongoDB provides.
If you don't know what DOCTOR-Y is, check this post.
Our goal is to generate a chart that shows the patient's conditions as percentages, calculated from all the medications the patient has taken during a specific period.
Example: a patient who is prescribed 3 medicines.
The dataset consists of 1224 records. Each record contains the drug name, its corresponding condition, and the weight of this condition. It is derived from the Medication Guides published by the FDA.
This dataset is uploaded to DOCTOR-Y's database so that the application server can use it.
Drug name | Condition | Weight |
---|---|---|
Abilify | Schizophrenia | 0.2 |
Abilify | Bipolar I Disorder | 0.2 |
Abilify | Major Depressive Disorder (MDD) | 0.2 |
Abilify | Irritability | 0.2 |
Abilify | Tourette's Disorder | 0.2 |
Abilify Maintena Kit | Schizophrenia | 0.5 |
Abilify Maintena Kit | Bipolar I Disorder | 0.5 |
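In the sample rows above, each weight equals one divided by the number of conditions listed for that drug, rounded to two decimals (five conditions give 0.2, two give 0.5). The article does not state how the weights were computed, so the following is only a sketch under the assumption that this pattern holds for the whole dataset. The function name `build_records` and the field names `drug_name`, `condition`, and `weight` are hypothetical.

```python
from collections import defaultdict


def build_records(drug_conditions):
    """Turn (drug, condition) pairs into weighted records.

    Assumption: a drug's weight is split evenly across its conditions,
    which matches the sample rows but is not confirmed by the article.
    """
    conditions_by_drug = defaultdict(list)
    for drug, condition in drug_conditions:
        conditions_by_drug[drug].append(condition)

    records = []
    for drug, conditions in conditions_by_drug.items():
        weight = round(1 / len(conditions), 2)
        for condition in conditions:
            records.append(
                {"drug_name": drug, "condition": condition, "weight": weight}
            )
    return records


pairs = [
    ("Abilify", "Schizophrenia"),
    ("Abilify", "Bipolar I Disorder"),
    ("Abilify", "Major Depressive Disorder (MDD)"),
    ("Abilify", "Irritability"),
    ("Abilify", "Tourette's Disorder"),
    ("Abilify Maintena Kit", "Schizophrenia"),
    ("Abilify Maintena Kit", "Bipolar I Disorder"),
]
records = build_records(pairs)
```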
To look up records in the collection, we could rely on MongoDB's default search method, the collection scan, which examines every document and therefore has a complexity of O(n).
However, we opted to use indexing, single-field indexing to be exact. An index lookup uses a B-tree data structure and has a complexity of O(log n), which offers better performance than a collection scan.
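As a rough illustration of the single-field index and the lookup it speeds up, here is a minimal Python sketch. It assumes a PyMongo-style collection object and a field called `drug_name`. The field name and the helper names `ensure_drug_index` and `find_records` are assumptions, not taken from DOCTOR-Y's code. The sketch also fetches all medications in one `$in` query, whereas the article describes one search per medicine.

```python
def ensure_drug_index(collection):
    """Create an ascending single-field index on the assumed drug_name field.

    With PyMongo, create_index returns the index name and does nothing
    new if an identical index already exists.
    """
    return collection.create_index("drug_name")


def find_records(collection, medications):
    """Return all dataset records whose drug_name exactly matches a medication."""
    query = {"drug_name": {"$in": list(medications)}}
    # The projection drops MongoDB's _id so only the dataset fields remain.
    return list(collection.find(query, {"_id": 0}))
```

With PyMongo, `collection` would be a `Collection` obtained from a `MongoClient`. The database and collection names are not given in the article, so they are left out here. Calling `explain()` on a `find` cursor is one way to check whether the query planner reports an `IXSCAN` stage instead of a `COLLSCAN`.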
Each record has a drug name, a condition, and a weight. The following steps are taken to compute the percentage of occurrence of each condition.
For each medicine taken by the user, a search is run on the drug name field and the matching records are retrieved.
For example, suppose a patient is prescribed {Actemra, Duexis, Indocin}.
The following records are retrieved.
Drug name | Condition | Weight |
---|---|---|
Actemra | Rheumatoid Arthritis (RA) | 0.2 |
Actemra | Giant Cell Arteritis (GCA) | 0.2 |
Actemra | Polyarticular Juvenile Idiopathic Arthritis (PJIA) | 0.2 |
Actemra | Systemic Juvenile Idiopathic Arthritis (SJIA) | 0.2 |
Actemra | Cytokine Release Syndrome (CRS) | 0.2 |
Duexis | upper gastrointestinal ulcers | 0.33 |
Duexis | Osteoarthritis | 0.33 |
Duexis | Rheumatoid Arthritis (RA) | 0.33 |
Indocin | Rheumatoid Arthritis (RA) | 0.2 |
Indocin | Ankylosing spondylitis (AS) | 0.2 |
Indocin | Osteoarthritis | 0.2 |
Indocin | Acute painful shoulder | 0.2 |
Indocin | Acute gouty arthritis | 0.2 |
We iterate over the retrieved records and insert each new condition into a hash table, with the condition as the key and its weight as the value.
If a condition already exists in the hash table, we add the record's weight to the weight already stored for it.
The result is a table of the patient's conditions with their accumulated weights.
Continuing the previous example, we get the following hash table.
Key (Condition) | Value (Weight) |
---|---|
Rheumatoid Arthritis (RA) | 0.73 |
Osteoarthritis | 0.53 |
upper gastrointestinal ulcers | 0.33 |
Giant Cell Arteritis (GCA) | 0.2 |
Polyarticular Juvenile Idiopathic Arthritis (PJIA) | 0.2 |
Systemic Juvenile Idiopathic Arthritis (SJIA) | 0.2 |
Cytokine Release Syndrome (CRS) | 0.2 |
Ankylosing spondylitis (AS) | 0.2 |
Acute painful shoulder | 0.2 |
Acute gouty arthritis | 0.2 |
Total | 2.99 |
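The accumulation step above can be sketched in Python with a dictionary standing in for the hash table. The records are the ones retrieved in the example. The function name `accumulate_weights` and the tuple layout are illustrative assumptions, not the application's actual code.

```python
from collections import defaultdict

# (drug name, condition, weight) rows retrieved for the example patient.
RECORDS = [
    ("Actemra", "Rheumatoid Arthritis (RA)", 0.2),
    ("Actemra", "Giant Cell Arteritis (GCA)", 0.2),
    ("Actemra", "Polyarticular Juvenile Idiopathic Arthritis (PJIA)", 0.2),
    ("Actemra", "Systemic Juvenile Idiopathic Arthritis (SJIA)", 0.2),
    ("Actemra", "Cytokine Release Syndrome (CRS)", 0.2),
    ("Duexis", "upper gastrointestinal ulcers", 0.33),
    ("Duexis", "Osteoarthritis", 0.33),
    ("Duexis", "Rheumatoid Arthritis (RA)", 0.33),
    ("Indocin", "Rheumatoid Arthritis (RA)", 0.2),
    ("Indocin", "Ankylosing spondylitis (AS)", 0.2),
    ("Indocin", "Osteoarthritis", 0.2),
    ("Indocin", "Acute painful shoulder", 0.2),
    ("Indocin", "Acute gouty arthritis", 0.2),
]


def accumulate_weights(records):
    """Sum the weights per condition; the dict plays the role of the hash table."""
    totals = defaultdict(float)
    for _drug, condition, weight in records:
        # A missing key starts at 0.0, so new and existing conditions
        # are handled by the same addition.
        totals[condition] += weight
    return dict(totals)


totals = accumulate_weights(RECORDS)
# Sort by weight, highest first, to mirror the table layout.
for condition, weight in sorted(totals.items(), key=lambda kv: -kv[1]):
    print(f"{condition} | {weight:.2f}")
```

Rheumatoid Arthritis (RA) appears under all three drugs, so its entry accumulates 0.2 + 0.33 + 0.2 = 0.73.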
We divide each weight by the total sum of the weights and multiply the result by 100 to get the percentage of occurrence of each condition.
For the previous example, we get the following results.
Key (Condition) | Value (Percentage) |
---|---|
Rheumatoid Arthritis (RA) | 24.4% |
Osteoarthritis | 17.7% |
upper gastrointestinal ulcers | 11.0% |
Giant Cell Arteritis (GCA) | 6.7% |
Polyarticular Juvenile Idiopathic Arthritis (PJIA) | 6.7% |
Systemic Juvenile Idiopathic Arthritis (SJIA) | 6.7% |
Cytokine Release Syndrome (CRS) | 6.7% |
Ankylosing spondylitis (AS) | 6.7% |
Acute painful shoulder | 6.7% |
Acute gouty arthritis | 6.7% |
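The normalization step can be sketched in the same way. The weights below are copied from the hash table in the example. The function name `to_percentages` and the rounding to one decimal place are assumptions chosen to match the figures shown.

```python
# Accumulated weights per condition, taken from the example hash table.
WEIGHTS = {
    "Rheumatoid Arthritis (RA)": 0.73,
    "Osteoarthritis": 0.53,
    "upper gastrointestinal ulcers": 0.33,
    "Giant Cell Arteritis (GCA)": 0.2,
    "Polyarticular Juvenile Idiopathic Arthritis (PJIA)": 0.2,
    "Systemic Juvenile Idiopathic Arthritis (SJIA)": 0.2,
    "Cytokine Release Syndrome (CRS)": 0.2,
    "Ankylosing spondylitis (AS)": 0.2,
    "Acute painful shoulder": 0.2,
    "Acute gouty arthritis": 0.2,
}


def to_percentages(weights):
    """Convert accumulated weights into percentages of the total weight."""
    total = sum(weights.values())
    if total == 0:
        # No matching records: return an empty result instead of dividing by zero.
        return {}
    return {
        condition: round(100 * weight / total, 1)
        for condition, weight in weights.items()
    }


percentages = to_percentages(WEIGHTS)
for condition, percent in percentages.items():
    print(f"{condition} | {percent}%")
```

For instance, Rheumatoid Arthritis (RA) comes out as 100 × 0.73 / 2.99 ≈ 24.4%.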