Disease Prediction Based On Medications Using Indexing In MongoDB

In this article, we discuss a DOCTOR-Y feature that predicts patients' current medical conditions from their regular medications, using a dataset that maps medicines to their corresponding medical conditions.

This is done using the search techniques provided by MongoDB.

If you don't know what DOCTOR-Y is, check this post.


Our goal is to generate a chart that shows a patient's conditions as percentages, calculated from all the medications the patient has taken during a specific period.

For example, consider a patient who is prescribed three medicines:

  • Actemra
  • Duexis
  • Indocin

The results should be something like this:


The dataset consists of 1,224 records; each record contains a drug name, a corresponding condition, and the weight of that condition. It is derived from the Medication Guides offered by the FDA.

This dataset is uploaded to DOCTOR-Y's database, so it can be used by the application server.

A sample from the dataset:

Drug name             Condition                        Weight
Abilify               Schizophrenia                    0.2
Abilify               Bipolar I Disorder               0.2
Abilify               Major Depressive Disorder (MDD)  0.2
Abilify               Irritability                     0.2
Abilify               Tourette's Disorder              0.2
Abilify Maintena Kit  Schizophrenia                    0.5
Abilify Maintena Kit  Bipolar I Disorder               0.5
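
In the sample, each drug's weights add up to roughly 1 (five conditions at 0.2 each, two at 0.5 each). The article does not say how the weights were derived, so the Python sketch below is only one plausible reading: it assumes a drug's weight is split evenly across its listed conditions and rounded to two decimals. The function name `even_weights` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def even_weights(pairs):
    """Give each (drug, condition) pair a weight of 1 / (conditions listed for that drug).

    Assumption: weights are an even split rounded to two decimals. The
    article does not confirm how the dataset's weights were computed.
    """
    conditions_by_drug = defaultdict(list)
    for drug, condition in pairs:
        conditions_by_drug[drug].append(condition)

    records = []
    for drug, conditions in conditions_by_drug.items():
        weight = round(1 / len(conditions), 2)
        for condition in conditions:
            records.append((drug, condition, weight))
    return records


# (drug, condition) pairs copied from the dataset sample.
sample_pairs = [
    ("Abilify", "Schizophrenia"),
    ("Abilify", "Bipolar I Disorder"),
    ("Abilify", "Major Depressive Disorder (MDD)"),
    ("Abilify", "Irritability"),
    ("Abilify", "Tourette's Disorder"),
    ("Abilify Maintena Kit", "Schizophrenia"),
    ("Abilify Maintena Kit", "Bipolar I Disorder"),
]
sample_records = even_weights(sample_pairs)
```

Under this assumption the sketch reproduces the weights in the sample: 0.2 for each Abilify record and 0.5 for each Abilify Maintena Kit record.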

Applied Search Method

To look up records in the collection, we could use MongoDB's default search method, a collection scan, which has a complexity of O(n).

However, we opted for indexing, specifically single-field indexing. The index is backed by a B-tree data structure, so a lookup has a complexity of O(log n), which offers better performance than a collection scan.
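
To make the difference concrete, here is a minimal Python sketch that contrasts a full scan with a lookup through a sorted single-field index. It is not MongoDB's implementation: the class `SingleFieldIndex` is hypothetical, and binary search over a sorted list stands in for the B-tree, since both find a key in a logarithmic number of comparisons.

```python
from bisect import bisect_left, bisect_right


def collection_scan(records, drug_name):
    """O(n): compare the drug name of every record in the collection."""
    return [rec for rec in records if rec[0] == drug_name]


class SingleFieldIndex:
    """Sorted keys plus record positions, searched with binary search.

    A simplified stand-in for a B-tree index on one field.
    """

    def __init__(self, records, field=0):
        self._records = records
        # Sort (key, position) pairs once, when the index is built.
        entries = sorted((rec[field], pos) for pos, rec in enumerate(records))
        self._keys = [key for key, _ in entries]
        self._positions = [pos for _, pos in entries]

    def find(self, key):
        """O(log n + k): locate the run of equal keys, then fetch k records."""
        lo = bisect_left(self._keys, key)
        hi = bisect_right(self._keys, key)
        return [self._records[pos] for pos in self._positions[lo:hi]]


# A few (drug name, condition, weight) records in the dataset's shape.
records = [
    ("Actemra", "Rheumatoid Arthritis (RA)", 0.2),
    ("Duexis", "Osteoarthritis", 0.33),
    ("Actemra", "Giant Cell Arteritis (GCA)", 0.2),
    ("Indocin", "Osteoarthritis", 0.2),
]
index = SingleFieldIndex(records)
actemra_records = index.find("Actemra")
```

Both approaches return the same records; only the number of comparisons differs. In MongoDB itself, the equivalent step would be creating a single-field index on the drug name field (for example with PyMongo's `create_index`); the actual field name used by DOCTOR-Y is not given in this article.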


Each record has a drug name, a condition, and a weight. The following steps are taken to get the percentage of occurrence of each condition.

  1. For each medicine taken by the user, a search is conducted on the drug name field and the matching records are retrieved.
    For example, suppose a patient is prescribed {Actemra, Duexis, Indocin}.

    The following records are retrieved:
    Drug name  Condition                                           Weight
    Actemra    Rheumatoid Arthritis (RA)                           0.2
    Actemra    Giant Cell Arteritis (GCA)                          0.2
    Actemra    Polyarticular Juvenile Idiopathic Arthritis (PJIA)  0.2
    Actemra    Systemic Juvenile Idiopathic Arthritis (SJIA)       0.2
    Actemra    Cytokine Release Syndrome (CRS)                     0.2
    Duexis     upper gastrointestinal ulcers                       0.33
    Duexis     Osteoarthritis                                      0.33
    Duexis     Rheumatoid Arthritis (RA)                           0.33
    Indocin    Rheumatoid Arthritis (RA)                           0.2
    Indocin    Ankylosing spondylitis (AS)                         0.2
    Indocin    Osteoarthritis                                      0.2
    Indocin    Acute painful shoulder                              0.2
    Indocin    Acute gouty arthritis                               0.2
  2. We iterate over the retrieved records, inserting each new condition into a hash table with the condition as the key and its weight as the value.

    If a condition already exists in the hash table, we add its weight to the weight already stored for it.

    We now have a table of the patient's conditions with their accumulated weights.

    Continuing with the previous example, we get the following hash table:
    Key (Condition)                                     Value (Weight)
    Rheumatoid Arthritis (RA)                           0.73
    Osteoarthritis                                      0.53
    upper gastrointestinal ulcers                       0.33
    Giant Cell Arteritis (GCA)                          0.2
    Polyarticular Juvenile Idiopathic Arthritis (PJIA)  0.2
    Systemic Juvenile Idiopathic Arthritis (SJIA)       0.2
    Cytokine Release Syndrome (CRS)                     0.2
    Ankylosing spondylitis (AS)                         0.2
    Acute painful shoulder                              0.2
    Acute gouty arthritis                               0.2
    Total                                               2.99
  3. We divide each weight by the total sum of weights and multiply the result by 100 to get the percentage of occurrence of each condition. For instance, Rheumatoid Arthritis (RA) gets 0.73 / 2.99 × 100 ≈ 24.4%.

    From the previous example, we get the following results:
    Condition                                           Percentage
    Rheumatoid Arthritis (RA)                           24.4%
    Osteoarthritis                                      17.7%
    upper gastrointestinal ulcers                       11.0%
    Giant Cell Arteritis (GCA)                          6.7%
    Polyarticular Juvenile Idiopathic Arthritis (PJIA)  6.7%
    Systemic Juvenile Idiopathic Arthritis (SJIA)       6.7%
    Cytokine Release Syndrome (CRS)                     6.7%
    Ankylosing spondylitis (AS)                         6.7%
    Acute painful shoulder                              6.7%
    Acute gouty arthritis                               6.7%
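
The accumulation and normalization steps above can be sketched in a few lines of Python. This is an illustration under stated assumptions rather than DOCTOR-Y's actual server code: the function name `condition_percentages` and the tuple layout are hypothetical, and the records are the ones retrieved in the example for {Actemra, Duexis, Indocin}.

```python
from collections import defaultdict

# Records retrieved in step 1, as (drug name, condition, weight).
retrieved = [
    ("Actemra", "Rheumatoid Arthritis (RA)", 0.2),
    ("Actemra", "Giant Cell Arteritis (GCA)", 0.2),
    ("Actemra", "Polyarticular Juvenile Idiopathic Arthritis (PJIA)", 0.2),
    ("Actemra", "Systemic Juvenile Idiopathic Arthritis (SJIA)", 0.2),
    ("Actemra", "Cytokine Release Syndrome (CRS)", 0.2),
    ("Duexis", "upper gastrointestinal ulcers", 0.33),
    ("Duexis", "Osteoarthritis", 0.33),
    ("Duexis", "Rheumatoid Arthritis (RA)", 0.33),
    ("Indocin", "Rheumatoid Arthritis (RA)", 0.2),
    ("Indocin", "Ankylosing spondylitis (AS)", 0.2),
    ("Indocin", "Osteoarthritis", 0.2),
    ("Indocin", "Acute painful shoulder", 0.2),
    ("Indocin", "Acute gouty arthritis", 0.2),
]


def condition_percentages(records):
    """Sum weights per condition (step 2), then convert to percentages (step 3)."""
    totals = defaultdict(float)  # the hash table: condition -> accumulated weight
    for _drug, condition, weight in records:
        totals[condition] += weight  # new keys start at 0.0, repeats accumulate

    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}  # no matching records, so there is nothing to chart
    return {
        condition: round(100 * weight / grand_total, 1)
        for condition, weight in totals.items()
    }


percentages = condition_percentages(retrieved)
```

Here `percentages` maps each of the ten conditions to its share, matching the table above: 24.4 for Rheumatoid Arthritis (RA), 17.7 for Osteoarthritis, 11.0 for upper gastrointestinal ulcers, and 6.7 for each of the remaining conditions.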

Integration With DOCTOR-Y

The final conditions and their percentages are sent to the system server, which forwards them to the client side to be displayed in a chart, as shown in the figure below.