Best Practices of Exception Handling
Exception handling is one of the most important aspects of software engineering. There are many articles on this topic. Yet I have seen other developers, and myself, make mistakes and fail to exit gracefully. Over the years, I have encountered many exceptional scenarios and failure reports. I have written a lot of code and made mistakes a thousand times. The following quote by Malcolm Muggeridge summarises exception handling pretty well.
Few men of action have been able to make a graceful exit at the appropriate time.
In this article, I document what I have learned along the way.
Note: I use Java as the reference language in this article because it is my primary programming language, but the principles are language agnostic. I will also cover things that fall outside the scope of coding. As I learn new things about the topic, I will keep this post updated. I use the terms "unchecked exception" and "runtime exception" interchangeably.
I am dividing this article into two sections:
- Handling exceptions inside code
- Handling exceptions outside code
Exceptions should be used for exceptional scenarios only. In Effective Java, Joshua Bloch is extremely vocal about this, which tells us it is a real problem. I hope none of us do this anymore. The following example is from the book:
// Horrible abuse of exceptions. Don't ever do this!
try {
    int i = 0;
    while(true)
        range[i++].climb();
} catch (ArrayIndexOutOfBoundsException e) {
}
Try to guess what it does! The code above looks clever, but it obscures intent, hurts readability, and performs poorly. Placing code inside a try-catch block can also prevent the JVM from applying some optimizations.
The simpler version is readable, easy to understand, and lets the JVM apply its optimizations.
for (Mountain m : range)
    m.climb();
Never ignore an exception. It is obvious but often forgotten. I have seen code like the following.
try {
    executeCommand();
} catch (IllegalStateException e) {
    // Highly suspicious. Anything can happen here
    // without your knowledge.
}
This is not a good practice at all. If something goes wrong in the future, you will never know. I am also guilty of ignoring an exception altogether (by mistake) 😔. I fixed it later while refactoring, but I still don't know what impact it had 🥶. And it still bugs me!
Make sure you have a way to know that an exception has occurred. It might be a metric, or a UI-based log aggregator that captures all logs. Either way, it is a necessity. Otherwise, you will not know when something is wrong with the system.
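As a minimal sketch of the idea, the empty catch block can be replaced with a log entry and a counter. The class name `CommandExecutor`, the method `tryExecute`, and the in-process counter are assumptions made for illustration; a real system would likely publish the count to whatever metrics library it already uses.

```java
import java.util.concurrent.atomic.AtomicLong;
import java.util.logging.Level;
import java.util.logging.Logger;

public class CommandExecutor {
    private static final Logger LOG = Logger.getLogger(CommandExecutor.class.getName());

    // Hypothetical in-process metric; stands in for a real metrics client.
    private final AtomicLong illegalStateCount = new AtomicLong();

    /** Runs the command and reports whether it succeeded. Failures are logged and counted, never swallowed. */
    public boolean tryExecute(Runnable command) {
        try {
            command.run();
            return true;
        } catch (IllegalStateException e) {
            // Count the failure so a dashboard or alarm can pick it up.
            illegalStateCount.incrementAndGet();
            // Log with the exception attached so the stack trace is preserved.
            LOG.log(Level.WARNING, "Command failed; continuing without its result.", e);
            return false;
        }
    }

    public long illegalStateCount() {
        return illegalStateCount.get();
    }
}
```

The caller still decides what to do with a `false` result, but the failure now leaves a trace in both the logs and the counter.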
Either log an exception or rethrow it, but not both. If you log and then throw another exception, whoever catches it may log the same details again. You end up with redundant entries for a single failure, which pollutes the error logs and makes debugging harder.
try {
    executeCommand();
} catch (IllegalStateException e) {
    // Following log is redundant information.
    // It will pollute your error log.
    log.error("Exception at doSomething:", e);
    throw new BadRequest("Exception has occurred.", e);
}
Dropping the exception details when logging or rethrowing is a common practice. Let me warn you! If you are not logging exception details, you are digging a hole for your future self. It will be hard to debug the actual issue. Your on-call engineer will keep searching without knowing where to look. A proper stack trace tells them where to start and helps find the root cause faster.
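A small sketch of what keeping the details means in Java: pass the caught exception as the cause of the new one, so the original message and stack trace travel with it. `CausePreservation`, `BadRequestException`, and the stand-in `executeCommand()` are hypothetical names used only for this illustration.

```java
public class CausePreservation {

    /** Hypothetical caller-level exception that keeps the original failure as its cause. */
    public static class BadRequestException extends RuntimeException {
        public BadRequestException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    // Stand-in for a real operation that fails.
    static void executeCommand() {
        throw new IllegalStateException("Queue is closed.");
    }

    public static void handle() {
        try {
            executeCommand();
        } catch (IllegalStateException e) {
            // Passing e as the cause keeps its message and stack trace
            // available to whoever finally logs this exception.
            throw new BadRequestException("Could not execute command.", e);
        }
    }
}
```

When the outer exception is eventually logged, the "Caused by" section of the stack trace points back to the line that actually failed.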
Dependencies and library calls throw exceptions. Those exceptions are often specific to the callee and seldom meaningful to the caller, yet the caller has to handle them. It is better to map the callee's specific exceptions to exception types that belong to the caller. An exception translator is one of the best tools for this. Another option is to propagate a status code for the operation, such as success or failure. In the following example, CommandArgumentException conveys more meaning to the caller.
try {
    executeCommand();
} catch (IllegalStateException e) {
    throw new CommandArgumentException("Arguments to command are not right.", e);
}
I have also worked on monolithic software written in C (the codebase was 2.2 GB). C has no built-in exception handling framework like Java's. In such cases, an exception translator and a handling framework help a lot. Here are a couple of reasons:
- It simplifies exception handling by routing failures through callback functions. On a failure such as a null pointer dereference, you should hand control over to the exception handler. You cannot simply call exit(), because you need to collect reports for debugging.
- If you don't capture the stack trace in the exception handler, you will not be able to debug the issue. The stack trace helps you create a test case, and it will be your friend throughout debugging.
The compiler forces us to handle checked exceptions, but we often miss runtime exceptions. We should be skeptical about runtime exceptions, especially when calling dependencies. In large-scale distributed systems, failure is common. We once had a dependency call that did not declare any checked exceptions. After careful inspection, we wrapped it in a try-catch for runtime exceptions. That allowed us to exit gracefully.
Because I work on distributed systems, I generally add a catch for generic exceptions as a last resort. It helps to exit with grace. But I make sure to log and track such exceptions. If there is a surge of generic exceptions, we should investigate. The callee may have introduced a new checked exception, or it may be throwing a runtime exception. In either case, we should look into it.
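The following is a rough sketch of guarding a dependency call against runtime exceptions, assuming a fallback value is acceptable for the caller. `SafeDependencyCall` and `callOrFallback` are hypothetical names, and `java.util.logging` stands in for whatever logging setup the service already has.

```java
import java.util.function.Supplier;
import java.util.logging.Level;
import java.util.logging.Logger;

public class SafeDependencyCall {
    private static final Logger LOG = Logger.getLogger(SafeDependencyCall.class.getName());

    /**
     * Invokes the dependency and returns its result. If the dependency throws
     * a runtime exception, the failure is logged and the fallback is returned.
     */
    public static <T> T callOrFallback(Supplier<T> dependencyCall, T fallback) {
        try {
            return dependencyCall.get();
        } catch (RuntimeException e) {
            // Log with the exception attached; a surge of these entries is a signal to investigate.
            LOG.log(Level.SEVERE, "Dependency call failed; using fallback value.", e);
            return fallback;
        }
    }
}
```

Note that only RuntimeException is caught here, so an Error raised by the JVM still propagates.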
Do not catch Throwable. Throwable is the superclass of every exception and error, including Error. The JVM uses Error to signal serious system-level problems such as StackOverflowError. Applications are not supposed to handle these errors. When the JVM throws an Error, something is seriously wrong. Go check it! 🧐
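To illustrate the difference, here is a small sketch in which the handler catches Exception rather than Throwable, so an Error passes straight through. `ThrowableDemo` and `runGuarded` are made-up names for this example.

```java
public class ThrowableDemo {

    /** Runs the task and handles ordinary exceptions only. Errors are deliberately not caught. */
    public static String runGuarded(Runnable task) {
        try {
            task.run();
            return "ok";
        } catch (Exception e) {
            // Exception does not cover Error, so StackOverflowError,
            // OutOfMemoryError and similar still propagate to the caller.
            return "handled: " + e.getClass().getSimpleName();
        }
    }
}
```

Calling `runGuarded` with a task that throws IllegalStateException returns a "handled" message, while a task that throws StackOverflowError is not intercepted.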
This one is interesting. Let me elaborate on failure atomicity. When an operation on an object fails, the object should be left in the state it was in before the operation, with no partial updates. One way to achieve this is to check whether the operation can succeed before changing anything, using a checker method. The JDK has good examples of this. The Iterator interface has the checker method hasNext(). If it returns false, we don't call next(). We can use the same pattern when needed.
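A minimal sketch of the check-before-mutate approach, using a hypothetical `Inventory` class with a fixed capacity. The checker `canAddAll` is exposed to callers and is also used internally, so a rejected call changes nothing.

```java
import java.util.ArrayList;
import java.util.List;

public class Inventory {
    private final List<String> items = new ArrayList<>();
    private final int capacity;

    public Inventory(int capacity) {
        this.capacity = capacity;
    }

    /** Checker method: tells the caller whether addAll would succeed. */
    public boolean canAddAll(List<String> newItems) {
        return items.size() + newItems.size() <= capacity;
    }

    /** Failure atomic: validates first, so a rejected call leaves the inventory unchanged. */
    public void addAll(List<String> newItems) {
        if (!canAddAll(newItems)) {
            throw new IllegalStateException("Adding " + newItems.size() + " items would exceed capacity " + capacity + ".");
        }
        items.addAll(newItems);
    }

    public int size() {
        return items.size();
    }
}
```

Adding items one at a time and failing halfway would leave a partial update behind. Validating up front avoids that without needing a rollback.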
Most developers dislike documentation. Yet every piece of code we write is part of our legacy, and documentation ensures that your successor understands it. Secretly, we all hope that the next developer who reads our code has something good to say about it.
We should document all exceptions, including runtime exceptions. In Java, the throws clause forces callers to handle checked exceptions. To cover all the bases, we should document runtime exceptions as well.
- Document every exception, checked and unchecked, using the JavaDoc '@throws' tag. Don't just mention Exception or Throwable. The example below tells the reader nothing.
/**
 * JavaDoc Exception sample documentation
 * @throws Exception in case of exceptional scenarios.
 */
public void doSomething() throws AnException, AnotherException {
    ...
}
- Include only checked exceptions in the throws clause, not unchecked ones. That way, developers know that an exception mentioned in the JavaDoc but missing from the throws clause is unchecked.
- Document the condition under which each exception is thrown. For the example above, it is desirable to state the condition for each exception.
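As one possible illustration of these three points, here is a sketch with a hypothetical `PortParser` class. The checked `InvalidPortException` appears in both the JavaDoc and the throws clause, while the unchecked NullPointerException is documented only in the JavaDoc, and each tag states the condition that triggers it.

```java
import java.util.Objects;

public class PortParser {

    /** Hypothetical checked exception for input that is not a usable port. */
    public static class InvalidPortException extends Exception {
        public InvalidPortException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    /**
     * Parses a TCP port number from text.
     *
     * @param text decimal port number, for example "8080"
     * @return the parsed port, between 1 and 65535 inclusive
     * @throws InvalidPortException if text is not a decimal integer,
     *         or if the value is outside the range 1 to 65535
     * @throws NullPointerException if text is null
     */
    public static int parsePort(String text) throws InvalidPortException {
        Objects.requireNonNull(text, "text");
        int port;
        try {
            port = Integer.parseInt(text.trim());
        } catch (NumberFormatException e) {
            // Translate the library exception and keep it as the cause.
            throw new InvalidPortException("Not a decimal integer: " + text, e);
        }
        if (port < 1 || port > 65535) {
            throw new InvalidPortException("Port out of range: " + port, null);
        }
        return port;
    }
}
```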
Write unit tests for exceptional scenarios. They will give you confidence that your code works when things go wrong. Check the outputs produced during exceptions. If they are not what you expected, anything can fail in production.
If you emit a counter or a special log entry when an exception occurs, your unit tests for that exception should cover it. The tests will then fail if someone changes the counter or the log. We seldom have a reason to change these, and if there is one, the developer should know why they were there in the first place. Your tests should read like documentation and cover every aspect of your code, including exceptional scenarios.
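Here is a dependency-free sketch of such a test, written in plain Java so it runs without a test framework. `CommandValidator` and its counter are hypothetical. In a real project the same check would more likely use a framework facility such as JUnit 5's `assertThrows`.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class CommandValidator {
    // Hypothetical counter that an alarm or dashboard could be built on.
    private final AtomicInteger rejectedCount = new AtomicInteger();

    public void validate(String command) {
        if (command == null || command.trim().isEmpty()) {
            rejectedCount.incrementAndGet();
            throw new IllegalArgumentException("Command must not be blank.");
        }
    }

    public int rejectedCount() {
        return rejectedCount.get();
    }

    /** Test: a blank command must raise the documented exception and bump the counter. */
    public static void blankCommandIsRejectedAndCounted() {
        CommandValidator validator = new CommandValidator();
        try {
            validator.validate("   ");
            throw new AssertionError("Expected IllegalArgumentException.");
        } catch (IllegalArgumentException expected) {
            // The message is part of the contract on-call relies on, so pin it down.
            if (!"Command must not be blank.".equals(expected.getMessage())) {
                throw new AssertionError("Unexpected message: " + expected.getMessage());
            }
        }
        if (validator.rejectedCount() != 1) {
            throw new AssertionError("Expected the rejection counter to be 1.");
        }
    }
}
```

If someone later removes the counter increment or rewords the message, this test fails and forces a conversation about why they were there.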
Operations are one of the big challenges in large software systems, whether that is a large-scale distributed system or a standalone desktop or mobile app. Keeping our service running is one of the most important goals, so we should have a proper mechanism for handling error scenarios. Here are my two cents on this:
Your on-call engineer might have to debug a production issue at midnight. The last thing they want at that time is confusion. Good logs and indicators (specific to your use case) help them find the root cause faster. Ambiguity and complexity get in the way of finding it.
Now, some food for thought. Consider these two code segments.
Option 1
String serviceA = "Something has happened in serviceA";
String serviceB = "Something has happened in serviceB";
//somewhere else
log.error(serviceA);
//somewhere else
log.error(serviceB);
Option 2
String errorStatement = "Something has happened in ";
String serviceA = "serviceA";
String serviceB = "serviceB";
//somewhere else
log.error(errorStatement + serviceA);
//somewhere else
log.error(errorStatement + serviceB);
Suppose we get an exception at midnight, and the exception signature is "Something has happened in serviceA". Which option do you think helps you find the code faster? I think Option 1. Copying and pasting a single string into a search is easier than reconstructing it from multiple variables. Of course, a full exception signature can take you to the problematic point quickly either way, but I still find Option 1 easier for on-call.
Note: Did you notice the trailing space at the end of the errorStatement definition? Small details like this can give you nightmares.
Add metrics wherever you think an error might occur, and set alarms on them. This will let you know when something is wrong with your system. In the monolith I worked on, we incorporated AddressSanitizer to collect data about runtime exceptions. It was worth it. It helped us find many long-standing issues. I can still remember the days of debugging a problematic pointer! 😫
Metrics and alarms without action items are meaningless. It is like a fire brigade without a protocol. Update the run-book with all relevant details and action items for important exceptions. If required, add the metrics to the dashboard as well.
You don't want a false alarm in the middle of the night. If an alarm needs updating, update it without delay. Regular false alarms can make you lose trust in your system.
Here are some general guidelines for handling exceptions.
Skepticism is better than optimism when handling errors and exceptions. Even if you have done everything right, exceptions will occur. If you have a proper mechanism in place, you can debug an issue faster.
If you build your system to last, you will be better prepared for unknowns. This is one of the signs of a mature developer.
Failure is seldom a single developer's mistake. Getting from coding a solution to deploying it involves many steps, and a mistake at any stage can cause a failure. We should aim to reduce failures by owning the process. There might be an issue unknown to the team. The developer and the reviewers might miss it. A lack of testing might let the bug through. We should focus on improving the process, and a root-cause analysis can help in this regard.
Exception handling is a well-known yet often misunderstood aspect of programming. In summary:
- Build systems to last, and share knowledge along the way. It will help a lot.
- Do not ignore exceptions. Handle them properly. Some of the principles mentioned here might not apply to you. Break those, make your own, and share them with others.
- Do not play blame games. It will not help anyone.
We know how to use try-catch, but we don't know how it can affect our system. This article is my attempt to capture my thoughts on this topic. Please like the post if you enjoyed reading it. Let me know in the comments if you have any thoughts or questions.
Happy Learning!! 😀