What not to do when using async background jobs (based on Rails + Sidekiq experience)
Asynchronous tasks are really common in web applications. It is useful to move out of the web server any task that is complex and slow, any task that depends on external services you cannot rely on to always be available, and any other task that does not need to run synchronously during the HTTP request, so that the request itself stays short.
But if you are not used to background tasks and do not know what to avoid, it is easy to run into unexpected issues and end up with some messy problems to worry about. That's what I want to share in this post: some practices that I try to avoid, based on my own experience using Sidekiq to run background jobs in Ruby on Rails applications.
Let's say you have a background task that needs to receive a timestamp as a param. If you just pass a DateTime object as a param to the Sidekiq job, Sidekiq will serialize it to JSON when the job is enqueued and parse that JSON back when the job runs. Enqueuing and running usually happen in different processes, often on different machines, and there is no guarantee that the value the job receives is identical to the one you passed: it may come back as a plain string, or with less precision. Maybe it only loses the milliseconds and that makes no difference to your application, but the same can happen with any other complex param.
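To make the round trip concrete, here is a minimal sketch in plain Ruby, with no Rails or Sidekiq loaded. It only assumes that job arguments go through `JSON.generate` and `JSON.parse`; the exact result inside a Rails app can differ, because Active Support changes how times are converted to JSON.

```ruby
require "json"
require "time"

# A timestamp with microsecond precision.
original = Time.utc(2024, 1, 1, 12, 0, 0, 123_456)

# Roughly what happens to a job argument: dumped to JSON on enqueue,
# parsed back when the job runs.
naive = JSON.parse(JSON.generate([original])).first
naive.class # String, not a Time anymore

# Safer: pass a simple type with explicit precision and rebuild it in the job.
payload  = JSON.generate([original.iso8601(6)])
restored = Time.iso8601(JSON.parse(payload).first)
restored == original
```

Passing an ISO 8601 string (or an integer epoch) keeps the conversion explicit on both sides, so the job decides how to rebuild the value instead of relying on whatever the serializer happens to do.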
Passing whole database records to a job is a really common mistake if you configure your mailers to be used with Sidekiq, mostly because while mailers are sent synchronously you can pass entire record objects to them and everything works fine. Then at some point you configure your project to call mailers through Sidekiq, or you start to work on a new project that already has that configuration. We have two problems with this approach.
The first one is the same as in the previous topic. Database records are complex objects with multiple attributes of different types, some of them can lose precision, and you will end up with different values in your job.
The second one, and the most important in my opinion, is that when you call a job you cannot be sure when it will run. It is possible that by the time the job starts, the database record passed to it is already outdated. For example, we can call a job to send an email, passing a database record that contains the email address as an attribute. If the record is updated after the job was enqueued, we end up sending the email to the wrong destination, using outdated information.
So, if it is important to always get the up-to-date data from the database when the job starts to run, it is better to pass only enough information (usually the record's id) for the job to look up the rest in the database by itself. Doing this also makes serialization and deserialization take less time, since simpler objects are quicker to process.
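A minimal sketch of the idea, in plain Ruby so it runs on its own. The `USERS` hash stands in for a database table and `ConfirmationEmailJob` is a made-up job; in a real application the class would be a Sidekiq job enqueued with `perform_async(user.id)` and would load the record with something like `User.find`.

```ruby
# Hypothetical in-memory stand-in for a users table.
USERS = { 1 => { email: "old@example.com" } }
SENT_TO = []

class ConfirmationEmailJob
  # Receives only the id and reads the current data when it actually runs.
  def perform(user_id)
    user = USERS.fetch(user_id)
    SENT_TO << user[:email]
  end
end

# Enqueue: only the job class and the id are stored.
queue = [[ConfirmationEmailJob, 1]]

# The record changes while the job is still waiting in the queue.
USERS[1][:email] = "new@example.com"

# Later, a worker picks the job up and sees the updated email.
queue.each { |job_class, user_id| job_class.new.perform(user_id) }
```

Had the whole record (or just the email string) been enqueued instead, the job would have used the address as it was at enqueue time.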
Calling a job inside a database transaction seems an obvious practice to avoid. But, especially with Active Record, I see it happen a lot when we are using Active Record callbacks. All it takes is calling a job from the wrong callback, that is, any callback that runs before the Active Record transaction has finished and been committed. The job can then start before the data it needs is visible in the database, or run even though the transaction is rolled back. Taking a Rails application as an example, the best solution is to always call jobs from callbacks that run after the transaction commits (such as after_commit), since they run only once the database transaction is finished.
Another scenario where I have seen this happen is when some use cases (or services) are chained to other use cases (or services). For example, let's say we have the classes below:
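A minimal sketch of two such classes, written in plain Ruby so it can run without Rails or Sidekiq. Only the names `ProfileUpdate`, `UserSafetyDataValidation`, `call` and `user.update!` come from this post; `FakeDB`, `SafetyAlertJob`, `ProfileMailer` and the `User` struct are hypothetical stand-ins for a database transaction, a Sidekiq job, a mailer and a model.

```ruby
ENQUEUED_JOBS = []
SENT_EMAILS = []

# Stand-in for a database transaction: an exception aborts the block.
module FakeDB
  def self.transaction
    yield
  end
end

# Stand-in for a model whose update! raises on invalid data.
User = Struct.new(:id, :email) do
  def update!(attrs)
    raise ArgumentError, "email can't be blank" if attrs[:email].to_s.empty?
    self.email = attrs[:email]
  end
end

class SafetyAlertJob
  def self.perform_async(user_id)
    ENQUEUED_JOBS << [name, user_id]
  end
end

module ProfileMailer
  def self.profile_updated(user_id)
    SENT_EMAILS << user_id
  end
end

class UserSafetyDataValidation
  def initialize(user)
    @user = user
  end

  def call
    # Looks harmless here, but the caller may already be inside a transaction.
    SafetyAlertJob.perform_async(@user.id)
  end
end

class ProfileUpdate
  def initialize(user, attrs)
    @user = user
    @attrs = attrs
  end

  def call
    FakeDB.transaction do
      UserSafetyDataValidation.new(@user).call
      @user.update!(@attrs)
    end
    # The mailer is correctly called outside the transaction.
    ProfileMailer.profile_updated(@user.id)
  end
end

user = User.new(1, "old@example.com")
begin
  ProfileUpdate.new(user, { email: "" }).call
rescue ArgumentError
  # The update failed, yet the job enqueued by the inner class is still there.
end
```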
Looking at the ProfileUpdate class, everything appears to be correct: the mailer is being called outside the transaction. But if we look at the UserSafetyDataValidation class, we can see that a job is being called there, and everything in the call method of this class runs inside the database transaction that was opened outside of it. This can cause the job to run even if user.update! fails.
In this case, one approach is to pass the messages back from the inner classes to the outer ones, and to call jobs and other side effects outside the use cases and services, maybe using a listener and broadcast architecture.
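One way to sketch that approach in plain Ruby: the inner class returns a description of the jobs it wants to run, and the outer class enqueues them only after the transaction block has finished. `FakeDB`, `ENQUEUED_JOBS`, the `:safety_alert` job name and the `User` struct are hypothetical stand-ins so the example runs without Rails or Sidekiq; this is one possible shape, not the only one.

```ruby
ENQUEUED_JOBS = []

# Stand-in for a database transaction: an exception aborts the block.
module FakeDB
  def self.transaction
    yield
  end
end

# Stand-in for a model whose update! raises on invalid data.
User = Struct.new(:id, :email) do
  def update!(attrs)
    raise ArgumentError, "email can't be blank" if attrs[:email].to_s.empty?
    self.email = attrs[:email]
  end
end

class UserSafetyDataValidation
  def initialize(user)
    @user = user
  end

  # Returns the jobs it wants to run instead of enqueuing them itself.
  def call
    [[:safety_alert, @user.id]]
  end
end

class ProfileUpdate
  def initialize(user, attrs)
    @user = user
    @attrs = attrs
  end

  def call
    pending_jobs = []
    FakeDB.transaction do
      pending_jobs.concat(UserSafetyDataValidation.new(@user).call)
      @user.update!(@attrs)
    end
    # Only reached when the transaction block finished without raising.
    pending_jobs.each { |job| ENQUEUED_JOBS << job }
  end
end

user = User.new(1, "old@example.com")

begin
  ProfileUpdate.new(user, { email: "" }).call
rescue ArgumentError
  # Failed update: nothing was enqueued.
end
jobs_after_failure = ENQUEUED_JOBS.dup

ProfileUpdate.new(user, { email: "new@example.com" }).call
```

The outer class is now the single place that knows whether the transaction succeeded, so it is also the single place that triggers side effects.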
You can use the isolator gem to help you catch those issues, and you can read about another good approach to this last scenario in this "Rails after_commit everywhere" post.
Changing the signature of a job that already has work enqueued is not as common as the issues listed above, but it is not an edge case either. It can be avoided if we always think about backwards compatibility in the code we are developing. The problem is that when you change a job's params or a job's name, there may be jobs that were enqueued with the old signature, and they will fail when they are picked from the queue to be executed. For example, if you add a new required param to a job, the jobs that were already enqueued will fail, since they were enqueued without it.
When you really need to change a job's signature, one approach is to do it in steps. For example, if it is really necessary to add a new param to a job, you can first add it as an optional param with a default value. After a while, when you can be sure that every enqueued job already carries the new param, you can remove the default value and make it required. If the requirement is to change the job name, you can create a new job with the new name but keep the old job at first, removing it only when there are no tasks left enqueued for it and it is safe to do so.
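Both steps can be sketched in plain Ruby. `ReportJob`, `LegacyReportJob` and the `locale` param are made-up names for illustration; the sketch relies on the fact that Sidekiq stores the job's class name and its argument list in the payload and resolves the class by name when the job runs.

```ruby
require "json"

class ReportJob
  # Step 1: the new param is optional, so payloads enqueued before the
  # change (with only user_id) still work.
  def perform(user_id, locale = "en")
    "report for user #{user_id} in #{locale}"
  end
end

# Renaming: keep the old constant around until its queue is drained.
# Here the old name simply inherits the behaviour of the new class.
class LegacyReportJob < ReportJob
end

# Payloads as they could sit in the queue, old and new.
payloads = [
  JSON.generate({ "class" => "LegacyReportJob", "args" => [42] }),
  JSON.generate({ "class" => "ReportJob", "args" => [42, "pt-BR"] })
]

results = payloads.map do |raw|
  job = JSON.parse(raw)
  # Resolve the class by the name stored in the payload and run it.
  Object.const_get(job["class"]).new.perform(*job["args"])
end
```

Once no payload with the old name or the short argument list can still be in the queue (including the retry and scheduled sets), the default value and the legacy class can be removed.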
Using a single queue for every job is really common in the first months of a project, when most things need to be done quickly to release the application's first version into production. It is important to define queues for at least two different priorities: tasks that need to run with a short delay, which are more transactional, and tasks that can take longer to run, for which the delay is not really important. For example, imagine that you have a job that sends a confirmation SMS with a code after the user changes their phone number, but when that SMS is enqueued a lot of promotional emails are being sent, and this causes an hour of delay for the confirmation SMS. Your user will probably not wait all that time to receive the SMS.
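The effect can be illustrated with a small simulation in plain Ruby. The queue names `critical` and `low` and the `next_job` helper are assumptions made for this sketch; it models a worker that always checks the higher priority queue first, which is how Sidekiq behaves when queues are listed in strict order without weights.

```ruby
promo_emails = Array.new(1_000) { |i| "promo_email_#{i}" }

# Single queue: the SMS waits behind every promotional email.
single_queue = promo_emails + ["confirmation_sms"]
jobs_ahead_of_sms = single_queue.index("confirmation_sms")

# Two queues: the worker looks at "critical" before "low".
queues = {
  "critical" => ["confirmation_sms"],
  "low"      => promo_emails.dup
}

# Picks the next job from the first non-empty queue in priority order.
def next_job(queues, priority_order)
  name = priority_order.find { |queue_name| !queues[queue_name].empty? }
  name && queues[name].shift
end

first_job = next_job(queues, %w[critical low])
second_job = next_job(queues, %w[critical low])
```

With one queue there are 1,000 jobs ahead of the SMS; with two queues it is the first job picked, and the promotional emails follow.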
So, those are some of the practices that I have learned to avoid when working with background jobs, and I think most of these ideas can also be applied to broadcast and event-based applications. But like I said at the beginning of the post, they are based only on my own experience, mostly with Ruby on Rails projects and Sidekiq. Maybe there are other, more important practices to avoid when using background jobs with other frameworks and languages.