Fix your data, not your code

The exciting part of starting an app from scratch is knowing that you can do everything right. You'll have your CRUD operations well thought out, ensuring that every database request is done properly, with the right data and at the right time.

Well, at least that's how I like to think about it. Even though that should be fairly common, the world changes, and data that was correct at the time might now be incorrect. Also, it's not every day that you get to start an app from scratch.

Data quality plays an important role in how your application functions, even more so if a high volume of end users use your app daily.

This blog post quickly goes through a use case that a team had to deal with recently, where the decision was to fix the data and not the code.

The scenario

It's 2015

  • Two statuses can be assigned to an end user: Approved or Disapproved
    • Approved: use the app with no restrictions
    • Disapproved: use the app with no restrictions but you are not part of the user listings
  • The status and the date it changed are stored in a specific table in our database

It's 2021

  • Disapproved will now come with a few usage restrictions, on top of the existing "invisibility mode"
  • The needed changes have to be implemented quickly

But a question arises:

What about the users that were disapproved in the past 6 years?

The options

  1. Apply the new restrictions only to users that are disapproved from a certain date onwards
  2. Go through each user and work out who should keep the disapproved status, now with the new set of restrictions

Pros of #1

  • Fast & easy

Cons of #1

  • Disapproved now has two different meanings: one before the date it goes live and another after
  • That meaning is hidden in the code (don't get me started on documentation...)
  • Arbitrary, date-based logic decides whether a user gets the new set of restrictions or not
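
To make these cons concrete, here is a minimal sketch of what option #1 tends to look like in code. Everything in it is hypothetical and only for illustration: the `User` shape, the `RESTRICTIONS_GO_LIVE` cutoff date and the `is_restricted` function are assumptions, not code from the real app.

```python
from dataclasses import dataclass
from datetime import date

# Hypothetical cutoff: the day the new restrictions go live.
RESTRICTIONS_GO_LIVE = date(2021, 6, 1)


@dataclass
class User:
    status: str              # "approved" or "disapproved"
    status_changed_on: date  # when the status was last changed


def is_restricted(user: User) -> bool:
    """Option #1: restrictions depend on the status AND on a date."""
    if user.status != "disapproved":
        return False
    # The hidden rule: same status, different meaning depending on the date.
    return user.status_changed_on >= RESTRICTIONS_GO_LIVE


old_user = User("disapproved", date(2016, 3, 14))
new_user = User("disapproved", date(2021, 7, 2))
print(is_restricted(old_user))  # False
print(is_restricted(new_user))  # True
```

Both users have the same status in the database, yet the app treats them differently, and the only place that explains why is the date comparison inside `is_restricted`.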

Pros of #2

  • Data will mirror the current state of affairs
  • It can help uncover other issues that lead to a better solution, e.g. having more statuses than just Approved and Disapproved

Cons of #2

  • Who the hell wants to analyse a spreadsheet row by row?
  • Slower than #1
  • It might lead to hard decisions, e.g. choosing whether or not to keep user X disapproved, and therefore ruin their user experience

The final decision

In the end we went ahead with the second option. It took a while to review all the users and decide which ones should fall under the new changes, but we believe that, in the long run, not having arbitrary decisions in the code (if statements checking and comparing dates, just because) will allow us to have a better product and happier developers.
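
As a rough illustration of why fixing the data keeps the code simple, here is a minimal sketch, not the actual migration. It assumes Python 3.9+, a hypothetical `User` shape, and a hypothetical `keep_disapproved` set of user ids coming out of the manual review. It also assumes, purely for simplicity, that every other disapproved user goes back to approved, which is only one possible outcome of such a review.

```python
from dataclasses import dataclass


@dataclass
class User:
    user_id: int
    status: str  # "approved" or "disapproved"


def apply_review(users: list[User], keep_disapproved: set[int]) -> None:
    """One-off data fix: only users confirmed by the review stay disapproved.

    Assumption for this sketch: everyone else becomes approved.
    """
    for user in users:
        if user.status == "disapproved" and user.user_id not in keep_disapproved:
            user.status = "approved"


def is_restricted(user: User) -> bool:
    # After the data fix the status alone is enough. No dates involved.
    return user.status == "disapproved"


users = [User(1, "disapproved"), User(2, "disapproved"), User(3, "approved")]
apply_review(users, keep_disapproved={2})
print([(u.user_id, u.status) for u in users])
```

Once the data reflects the decisions made during the review, Disapproved has a single meaning again and the check no longer needs to know when a status changed.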

This is my first post on dev.to, and it has been a long time coming.
I hope it helps someone out there :)
