Software Performance - A Pragmatic Guide
let i = 0;
const stopAt = Date.now() + 1000;
while (Date.now() < stopAt) {
  i++;
}
console.log(i); // ~8,312,450
Running the script above on my machine outputs a number around 8 million.
This means that my machine can read the current date, compare it to a point in time, and increment a variable 8 million times in a single second.
We've seen developers spend endless hours optimizing their code to gain every last bit of performance possible, and we've seen developers not caring at all about performance as long as the code worked.
To understand the best way of tackling performance, we have to understand why we write software in the first place.
The more we write software, the more tools, technologies and methodologies we get introduced to, the easier it is to forget the sole purpose of the software we build: to help people.
With that in mind, any change we introduce to our software has to achieve one of two goals:
- A change that makes our software more helpful for people.
- A change that enables us to make the software more helpful for people.
Writing code that runs fast serves the people who use our software. But to make our code run fast, we often have to add clever tricks, and clever tricks make our code more difficult to work with.
So we need to find a balance between speed and maintainability. This balance varies a lot depending on our application, who we serve, how fast our code needs to run, and, most importantly in my opinion, how often our code runs.
For example, the way developers handle performance at Google is completely different from how we should handle performance for an MVP for a startup.
- When a developer at Google writes a line of code, it has the potential to run billions of times in a matter of days.
- When we write a line of code for a startup, that line of code can run only a few times a day, or never.
So when we're at a startup, it may make sense to overcome performance issues by paying an extra $10/month for a better machine to execute our code, but it would cost a lot more if developers at Google did that. For Google, spending the time and effort to optimize the code will be cheaper than upgrading hundreds of thousands of machines.
Back in the day, developers used to argue a lot about the following question:
let i = 0;
// Which one is faster? i++? ++i?
i++;
++i;
Which one is faster? i++? ++i?
Here's the answer: It does not matter.
It's really easy to lose focus and try to optimize bits of code that already run in microseconds. A typical machine will do that operation tens of times in less than 1 millionth of a second.
So introducing a change just to optimize bits like that will likely conflict with one of the two main goals of any change, keeping the code easy to work with, and the gain will be a few microseconds at best.
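If you'd like to check this on your own machine, here is a rough sketch. The iteration count is an arbitrary number I picked, the timings are noisy, and the JavaScript engine may optimize both loops into the same machine code, so treat the output as an illustration rather than a proper benchmark.

```javascript
// Arbitrary iteration count chosen for this sketch.
const ITERATIONS = 50000000;

// Time a loop that increments with the postfix form.
function timePostfix() {
  let i = 0;
  const start = Date.now();
  for (let n = 0; n < ITERATIONS; n++) {
    i++;
  }
  return { result: i, ms: Date.now() - start };
}

// Time the same loop with the prefix form.
function timePrefix() {
  let i = 0;
  const start = Date.now();
  for (let n = 0; n < ITERATIONS; n++) {
    ++i;
  }
  return { result: i, ms: Date.now() - start };
}

const postfix = timePostfix();
const prefix = timePrefix();

console.log(`i++ took ${postfix.ms} ms, ++i took ${prefix.ms} ms`);
console.log(`same result: ${postfix.result === prefix.result}`);
```

Both loops end with the same value, and any gap between the two timings is usually within the noise from one run to the next.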
A typical program spends most of its time waiting on I/O. So instead of trying to turn a loop that takes 2 microseconds into one that takes 1 microsecond, that optimization energy is much better spent on optimizing a database query or grouping HTTP requests together.
We may have the following JavaScript code that runs two database queries. The second query only starts after the first one finishes, even though there is no dependency between them:
const user = await User.findOne({ id: 1 });
const orders = await Order.find({ userId: 1 });
If we start the second query without waiting for the first one to finish, we can save ~100 milliseconds (assuming each query takes about that long):
const [user, orders] = await Promise.all([
  User.findOne({ id: 1 }),
  Order.find({ userId: 1 })
]);
With just a couple of lines of code, we saved 100 milliseconds.
Notice that both changes take the same effort, modifying a couple of lines of code, but the first change saves a few microseconds at best, while the second saves 100 milliseconds (about 100,000x a 1-microsecond gain).
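To see the effect without a real database, here is a minimal sketch that simulates the two queries with timers. The fakeQuery helper and its 50 millisecond delay are made up for illustration. They stand in for User.findOne and Order.find and are not part of any real library.

```javascript
// Hypothetical stand-in for a database call: resolves with `value` after `ms` milliseconds.
function fakeQuery(value, ms = 50) {
  return new Promise((resolve) => setTimeout(() => resolve(value), ms));
}

// Waits for the first query to finish before starting the second.
async function sequential() {
  const start = Date.now();
  const user = await fakeQuery({ id: 1 });
  const orders = await fakeQuery([{ userId: 1 }]);
  return Date.now() - start;
}

// Starts both queries right away and waits for both to finish.
async function concurrent() {
  const start = Date.now();
  const [user, orders] = await Promise.all([
    fakeQuery({ id: 1 }),
    fakeQuery([{ userId: 1 }]),
  ]);
  return Date.now() - start;
}

async function main() {
  console.log(`sequential: ${await sequential()} ms`);
  console.log(`concurrent: ${await concurrent()} ms`);
}

main();
```

The sequential version takes roughly the sum of the two delays, while the concurrent version takes roughly as long as the slower of the two.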
Another example would be executing a database query in a for loop:
const usersIds = [1, 2, 3, 4, 5, 6, 7];
for (const userId of usersIds) {
  const user = await User.findOne({ id: userId });
  // do something with the user
}
This code takes an array of user ids and, for each id, queries the database to fetch that user. In our example that means 7 calls and 7 network round trips, which is very expensive.
If we change the code to fetch all the users with one query, then find the matching user in memory, we save 6 network round trips, potentially 600 milliseconds:
const usersIds = [1, 2, 3, 4, 5, 6, 7];
const users = await User.find({ id: { $in: usersIds } });
for (const userId of usersIds) {
  const user = users.find(user => user.id === userId);
  // do something with the user
}
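To make the round trips visible, here is a small sketch that counts them against an in-memory stand-in for the database. The createFakeDb helper and its findOne and findByIds methods are hypothetical names invented for this example. They only mimic the shape of the queries above.

```javascript
// Hypothetical in-memory "database" that counts simulated round trips.
function createFakeDb() {
  const rows = [1, 2, 3, 4, 5, 6, 7].map((id) => ({ id, name: `user-${id}` }));
  const db = {
    roundTrips: 0,
    // One simulated round trip per single-user lookup.
    async findOne({ id }) {
      db.roundTrips++;
      return rows.find((row) => row.id === id) || null;
    },
    // One simulated round trip for the whole batch of ids.
    async findByIds(ids) {
      db.roundTrips++;
      return rows.filter((row) => ids.includes(row.id));
    },
  };
  return db;
}

// One query per id, as in the loop that queries inside the for loop.
async function onePerId(usersIds) {
  const db = createFakeDb();
  for (const userId of usersIds) {
    await db.findOne({ id: userId });
  }
  return db.roundTrips;
}

// One query for all ids, then an in-memory lookup per id.
async function batched(usersIds) {
  const db = createFakeDb();
  const users = await db.findByIds(usersIds);
  for (const userId of usersIds) {
    const user = users.find((candidate) => candidate.id === userId);
  }
  return db.roundTrips;
}

async function main() {
  const usersIds = [1, 2, 3, 4, 5, 6, 7];
  console.log(await onePerId(usersIds)); // 7
  console.log(await batched(usersIds)); // 1
}

main();
```

The counts match the 7 round trips versus 1 described above. The batched version still scans the users array once per id, just like the snippet it mirrors.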
Now, it's really common to look at this code and notice that users.find scans the array again inside the top-level loop, giving an O(n^2) time complexity, and then to want to build a hash map and use it instead, bringing the top-level loop down to O(n):
const usersIds = [1, 2, 3, 4, 5, 6, 7];
const users = await User.find({ id: { $in: usersIds } });
const usersMap = {};
for (const user of users) {
  usersMap[user.id] = user;
}
for (const userId of usersIds) {
  // Look up by the id we are iterating over
  const user = usersMap[userId];
  // do something with the user
}
But let's look at the facts: visiting one array item takes roughly 1 microsecond (1 millionth of a second), so if my usersIds array has 10 items on average, the O(n^2) version would take about 0.1 milliseconds.
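The arithmetic behind that estimate can be sketched in a few lines. Both constants are the rough figures used in this article (about 1 microsecond per array item and about 100 milliseconds per database round trip), not measurements, and the function names are made up for this example.

```javascript
// Rough figures from the article; both are estimates, not measurements.
const MICROSECONDS_PER_ITEM = 1; // cost of visiting one array item
const ROUND_TRIP_MS = 100; // cost of one database round trip

// Estimated time, in milliseconds, of scanning n users once for each of n ids.
function nestedScanMs(n) {
  return (n * n * MICROSECONDS_PER_ITEM) / 1000;
}

// Smallest n at which the nested scan costs as much as a single round trip.
function breakEvenSize() {
  const microsecondsPerRoundTrip = ROUND_TRIP_MS * 1000;
  return Math.ceil(Math.sqrt(microsecondsPerRoundTrip / MICROSECONDS_PER_ITEM));
}

console.log(nestedScanMs(10)); // 0.1
console.log(breakEvenSize()); // 317
```

Under these assumptions, the array would need a few hundred items before the nested scan costs as much as one of the round trips we removed earlier.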
There is room for improvement, but I'd rather spend my time improving performance in places that are worth it. Also notice how we're writing more code to squeeze out those last bits of machine performance, which makes the code harder for developers to understand and costs us their expensive time. It's just not worth it. One could even argue that it degrades the code.
- We write software to help people. As long as people are happy with the software and we can deliver it in a timely manner, we're doing a good job.
- Enabling ourselves to help people is as important as helping them.
- Most software performance bottlenecks lie in I/O, where we read from an external source (HTTP/database) or from the disk. Focus your energy on optimizing those.
- A line of code written by a software engineer at Google will run billions of times more often than a line written by the average developer. Therefore the average developer should not hold their code to the same performance standards.
Should we all follow the standards set by big companies? Or should we come up with our own?
Let me know what you think in the comments below.
I'm a software engineer who's obsessed with automating things and helping people out by using technology.