Heaps explained in PHP | Quick access to the highest or lowest value of a dataset

In this article we'll explore the world of Heaps, the data structure. Because some knowledge of Trees is useful I highly suggest you read my Trees and Tree Traversal in PHP article before diving into this one.

I did my best to provide some visual aid for Heaps, because it can be hard to visualize some steps. If you have trouble understanding the concept through this blog post, I suggest you also watch the videos linked at the end of the post.

What is a Heap?

A Heap is a data structure. So it is a way of how data is organized and how it can be accessed in an efficient way. There are many types of data structures, like: (Doubly) Linked Lists, Graphs, Stacks, Queues, Arrays and HashMaps. Each of these data types can be used for various use cases; but some are more performant than others in certain situations.

🎄 A Heap is a Binary Tree

The data structure of a Heap is a Binary Tree. Starting with a Root-node, every node has a maximum of two children; left & right. But there are two rules these trees have to follow.

Heap rule #1: The parent node has a lower value (MinHeap) or a higher value (MaxHeap) than its children

There are two types of heaps: a MinHeap and a MaxHeap. The difference between these are the order in which the nodes are placed inside the Tree.

The children of a node in a MinHeap have a higher value then their parent, while the children of a node in a MaxHeap will have a lower value then their parent. If there are two identical values, a node can have a child with the same value. This behavior flows all the way down the Tree.

Note: The naming of the heaps refers to the value of the top, or Root-node, of the Tree. This means the top of a MinHeap will always be the lowest value (min) of the nodes, and in a MaxHeap it will be the highest value (max).

Heap rule #2: A Heap is a Complete Binary Tree

There are 5 types of Binary Trees:

  1. Full Binary Trees: every node has zero Ăłr two children; but never one
  2. Perfect Binary Trees: every node has exactly two children
  3. Balanced Binary Trees: the left and right sub-branch of a node never vary by more than one node
  4. Degenerate Binary Trees: every node has a single child, essentially making it like a single line list
  5. Complete Binary Tree: every row of the Tree is filled from left to right, without leaving gaps between nodes.

So in a Complete Binary Tree, and thus also a Heap, it is not possible to have a node with only a right child-node. But it Ă­s possible to have one with only a left child node.

🔨 Creating a Heap

Enough theory; let's look at an example of a Heap.

Note: In this example we'll use numbers (integers) that represent the values. But that's mostly for simplicity's sake. These values can be strings, objects, whatever you want; as long as you can compare them and figure out which is the lesser or greater of the two.

Let's say we have a list of values. I'll put it in a PHP array to make it look pretty:

$values = [2, 18, 36, 5, 2, 0, -5, 72];

We'll make this list into a MaxHeap. Remember; in this case the highest value goes at the Root-node, making the values decrease every level of the tree. So in this case our Root-node will be 72.

This root-node will have two children. Those values will be 36 and 18, because those the next highest two values. The position of these values doesn't really matter at this point, they can both be either left or right.

Both of these values can have 2 children; so lets find the next 4 values in declining order from the list: 5, 2,2 and 0. Again there exact position doesn't matter, because they are all lower then 18 and 36.

And our final remaining number -5 will be the last child; the most left value on the lowest level of the tree. This makes our Complete Binary Tree complete.

So this is the MaxHeap we created:

The last node

The node that is the most right node (filled from the left) at the bottom of a Complete Binary Tree is called the last node. In our example this is -5. We'll come back to this node when we start extracting nodes.

At this point you might be thinking: this feels like cheating. And yes, I know; we cheated a bit by sorting the integers in our head from largest to smallest first, and then filling out the Heap from left to right. But that is actually the fastest way of creating a MaxHeap. However, what if we didn't know the values up front, or if they came in a random order? How would we fill out this MaxHeap?

Adding values to an empty Heap

So now let's create a Heap without sorting the numbers beforehand. In this case it would be a more iterative process:

  1. We add the first value to the heap (in our case 2). This will become our root-node by default.
  2. We insert the second value (18) at the first available location (left child).
  3. We compare this value to its parent (the Root-node in this case). If it is bigger that the parent, swap them; otherwise go back to step 2 for the next value. (we need to swap)
  4. Repeat step 3 until there are no other parents to swap with (there are no other parents)

After this we go back to step 2 and insert the next value: 36 (In our case the right child of the Root-node). We compare it to its parent. It is bigger, so we swap. 36 is now the root with 2 and 18 as its children.

Note: This swapping of values with a parent to put the values in the correct spot is referred to as sifting up.

The next value to add will be 5. Again, we insert it at the first available location in the Tree (step 2). In this case the left child of 2. We compare it to its parent and swap (step 3), because 5 is bigger than 2. We repeat step 3, but this parent (36) is bigger, so we are done. Next!

Boring! - Ok, let's stop this explanation here. I think you get the point. If you really want a more in-depth visualization; I've added some useful links at the end of this article including a video explanation.

Extracting a value from a Heap

We've seen how you can add a value to the Heap By injecting it, and then sifting it up. But how can you extract a value? It isn't as simple as removing the node, because that might cut the Tree in half. Take the Root-node for instance. On a Heap it's very likely you want to extract that value. But simply removing it will create two new Trees.

To avoid this Tree splitting, we need to replace (or swap) the Root-node with the last node. In a heap the last node can always be removed from a Tree, without corrupting it, because the Tree is already sorted.

However, when we swap the Root-node with the last node, and extract it, the Tree will no longer be a Heap at that point, because the wrong value will be at the top. So we need to turn this Tree into a Heap again, starting with the Root-node. This process is not as lengthy as turning an entire unsorted Tree into a Heap, because most of the Tree is already in the correct order.

Note: The transforming of a Binary Tree into a Heap, is known as to Heapify.

Instead of sifting the Root-node up, we need to sift the Root-node down. In this case we need to compare the node to both its children, and swap it with the largest of the two. And keep repeating it, until the node is in its correct position. At this point the Tree is once again a Heap.

Creating a Heap in PHP

Creating a Heap is made easy in PHP by using the abstract SplHeap class. It contains all methods of an Iterator as well as a few helper methods that are specific to a Heap:

  • extract() - Removes and returns the Root-node from the Heap, and reorders the Heap with a new Root node.
  • insert($value) - Adds a new value to the Heap and reorders it.
  • top() - Only returns the current Root node value; it does not change the cursor of the iterator or remove the node.
  • isCorrupted() - Returns whether the current Heap is in a corrupted state (this happens when the compare() function throws an exception).
  • recoverFromCorruption() - resets the corrupted state of the heap and allows for further use.

And then there is also abstract protected function compare($value1, $value2): int. This function is used inside the Heap algorithm to determine how it should order two values / nodes.

PHP also provides a SplMinHeap and a SplMaxHeap class that are concrete implementations of the SplHeap. These classes have an implemented compare() method. Both classes essentially use the spaceship operator to compare the values.

protected function compare($value1, $value2): int {
    // SplMaxHeap
    return $value1 <=> $value2;

    // SplMinHeap
    return $value2 <=> $value1;
}

Knowing all this we can create a MaxHeap from our $values array like this:

$heap = new \SplMaxHeap();
foreach($values as $value) {
    $heap->insert($value);
}

We saw that SplHeap is also an Iterator, so we can foreach over the Heap and have it yield its values in a decreasing order.

foreach($heap as $value){
    var_dump($value);
}
// int(72), int(36), int(18), int(5), int(2), int(2), int(0), int(-5)

Note: Iterating over the Heap will essentially call extract() for every value. This means that those values are gone from the heap. If you call ::rewind() on the Heap, this will not return those values. Using ::current() or ::top() will return the current top value without removing it. When you call ::next() however, this will again extract the current value.

Creating a custom MaxHeap in PHP

When using a MaxHeap to sort a list of objects it is possible the value of those objects is calculated through a function. In that case the regular SplMaxHeap will (probably) not work, but you can create your own MaxHeap by extending splHeap.

Imagine having a shop with product represented by a Product class. It can store a physical weight in different units of measurements; like pounds (lbs) and grams (g). To put these in a MaxHeap, it needs to be able to compare those different types of measurements. A (very simple) implementation could look like this:

class Product
{
    public function __construct(public $name, public string $weight_type, public int $weight_amount) {}
}

class ProductMaxHeap extends \SplHeap
{
    protected function compare($value1, $value2): int
    {
        return $this->asGrams($value1) <=> $this->asGrams($value2);
    }

    private function asGrams(Product $product): int|float
    {
        if ($product->weight_type === 'grams') {
            return $product->weight_amount;
        }

        if ($product->weight_type === 'pounds') {
            return $product->weight_amount * 453.59237;
        }

        throw new InvalidArgumentException('Product has unknown weight type.');
    }
}

In this example we convert the weight amount of pounds into grams in order to compare the values accordingly.

Tip: Whenever dealing with types like these, make sure to use constants that represent that type (like public const WEIGHT_TYPE_GRAMS = 'grams';) or use Enums when using PHP 8.1 or higher. This provides autocompletion in IDE's and prevents typo's like gram instead of grams.

Array Index Algorithm for Heaps

Remember how Heaps are Complete Binary Trees? This is actually a very helpful characteristic of a Heap. If we give every node in the tree a 0-based index key, and moved from top to bottom, left to right, we can actually quite simply figure out what the keys of the children of a specific node are. So we can store a heap as an array.

$heap_as_array = [ // Keys added for visual aid
    0 => 72,
    1 = 36,
    2 => 18,
    3 => 5,
    4 => 2,
    5 => 2, 
    6 => 0, 
    7 => -5,
];

Assuming $i represents the current node's key, the algorithm to figure out the left child of that node is: ($i * 2) + 1, and for the right child it is: ($i * 2) + 2.

Let's try that out. We want the values for the left and right child of 36. Its key is 1. So the left key is (1 * 2) + 1 = 3, and the right key is (1 * 2) + 2 = 4. Which are respectively the nodes: 5 and 2. Which in turn matches our Heap.

We can also do the reverse and find out the parent of the current node. The algorithm for that is: (int) ($i -1) / 2. For the right child of 36 (index: 4) that would be: (4 - 1) / 2) = 1.5. The int of 1.5 = 1. And key 1 is indeed the parent node: 36.

Heapify an 0-based array

Because a Complete Binary Tree can be stored as a 0-based array; we can also see any 0-based array as a Complete Binary Tree. How is this helpful? Because it's very easy and efficient to turn any existing Complete Binary Tree into a Heap.

When we want to convert a Complete Binary Tree into a Heap, we only have to sort half of it. Because the sorting is done by swapping two nodes, the other half of the Tree will automatically also end up in their correct position.

Note: Since we are working with a 0-based array, swapping means we can just switch the index of these values.

You can try this out yourself:

  1. Draw a Complete Binary Tree from the provided array 2, 18, 36, 5, 2, 0, -5, 72
  2. Start at the half of the array: 5
  3. Sift it down by swapping it with the largest of its children and repeat this until it has no children to swap with (It only has one child: 72)
  4. Now work your way back by applying step 3 on every element before 5 (So the next will be 36)

If done correctly, you end up with a Tree like this:

72
       /    \
     18       36
    /  \     /  \
   5    2   0   -5
 /
2

As you can see, by only sifting down half the array, the other half of the Heap automatically ended up in a correct position.

🤷 When are Heaps useful?

Because there are many types of data structures, there are also many ways to solve a problem. Sometimes a simple array is all you need. But Heaps have their time to shine.

Direct access to the highest (or lowest) value

A MaxHeap (or MinHeap) has direct access to the highest (or lowest) value of the dataset. So whenever you are working with large datasets for which you need the maximum (or minimum) value; a Heap is a safe bet. As you've seen we only need to sort half the dataset in order to figure out the minimum or maximum value. Which is pretty fast.

Working with a lot of inserts and removals

When inserting a new value into a dataset, a Heap is more efficient because it only performs (relatively) a few comparisons to end up in the right spot. In the worst case scenario an array would need to perform a comparison for every value it has. Because a Heap is a binary tree, the worst case would need exponentially fewer comparisons. Making it a more efficient.

Sorting arrays

An array can be sorted by using a MaxHeap in a process known as HeapSort.
Because a MaxHeap continuously has the highest value at the top, you can extract that value and place it at it to the beginning of an array. By adding the next value, and the next, and the next, at the beginning of the array, it ends up sorted from lowest to highest.

Note: Although this technically qualifies as HeapSort this isn't the most efficient way. We'll cover HeapSort more in-depth in an upcoming post where we'll make it more efficient.

Priority Queues

While also a topic for a future blog post, queues are essentially a First-In First-Out system that adds new values at the end of a list, and extracts values from the start of the list.

A variant on this is where you use a MaxHeap that contains a priority value for a certain object. When adding this object to the Heap, it will end up higher or lower depending on its priority. This type of queue is called a Priority Queue.

Fun fact: Did you know the letters ueue in Queue are not silent, but are actually just waiting their turn? - I'll let myself out; sorry.

Because this is such a frequently used case for MaxHeaps, PHP actually provides a SplPriorityQueue based on a MaxHeap.

Instead of updating your object to contain a priority, you can insert the object with a priority.

$queue = new SplPriorityQueue();

$queue->insert('task 1', 1);
$queue->insert('task 2', 3);
$queue->insert('task 3', 2);

var_dump($queue->extract()); // string(6) "task 2"

And because we only need to partially sort a Heap find the highest priority, this too is a lot more efficient that sorting an array-queue after every insertion.

Real world example

A very nice real world example of a Heap implemented with a PHP array can be found in the revolt/event-loop package.

In this example they implemented a queue of callbacks based on an expiry time. Whenever the expiry time has passed, the callback will be extracted. So this is an example of a Priority Queue, but it's based on a MinHeap instead of a MaxHeap; because the lowest expiry time has to be on top.

Alternative for the Standard PHP Library (SPL)

As u/sproingie pointed out on Reddit The SPL data structures are not great. They are a great jumping of point in getting started with other data structures. But if you want / need more performance you could install the Data Structures extension. While it has a Priority Queue, there is no generic Heap implementation. But you might be able to build this yourself using the DS\Vector class.

đź”— Useful links

Thanks for reading

Like I stated at the beginning of the post; Trees and Heaps are very visual things and not everything is as easily explained with a bunch of text. I do hope after reading this post you understand the gist of it. If you have any questions please leave a comment.

I hope you enjoyed reading this article! If so, please leave a ❤️ or a 🦄 and consider subscribing! I write posts on PHP almost every week. You can also follow me on twitter for more content and the occasional tip. If you want to be the first to read my next blog; consider subscribing to my newsletter.

15