Generators over arrays

I love ❤️ generators in PHP. They are like supercharged arrays that can save memory when used correctly. I've been type-hinting iterable instead of array ever since I learned about them.

Generators are callback iterators

Generators are simple functions. But where a regular function returns a single value or even void, a generator can produce multiple results. The only thing you have to do to change a function into a generator is to replace return with yield and call it.

A generator is an iterable, meaning you have to loop over it in order to retrieve the results. You can simply foreach over a generator, and it will hand you every value it yields.

function exampleGenerator() {
  yield 1;
  yield 2;
  yield 3;
}

$generator = exampleGenerator();
foreach ($generator as $value) {
  echo $value;
}
// will echo out: 123

Notice that we actually call the function to return the generator. In this example it's pretty obvious we need to do that, but consider an anonymous function that is stored in the $generator variable. You might accidentally try to iterate over that.

$generator = function() {
  yield 1;
  yield 2;
  yield 3;
};

// Incorrect: $generator is still the uncalled Closure, not a `Generator`.
foreach($generator as $value) // ...

// Correct: $generator() returns a `Generator` object.
foreach($generator() as $value) // ...
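
To make the difference concrete, here is a small sketch that checks what each expression actually is. It only relies on PHP's built-in Closure and Generator classes; the variable names are my own.

```php
<?php
$generator = function () {
    yield 1;
    yield 2;
    yield 3;
};

// The variable itself holds a Closure, not a Generator.
$isClosure = $generator instanceof Closure;

// Calling the closure returns a Generator object.
$instance = $generator();
$isGenerator = $instance instanceof Generator;

// Only the Generator can be iterated to get the yielded values.
$values = [];
foreach ($instance as $value) {
    $values[] = $value;
}
// $values now holds 1, 2, 3
```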

Advantages of generators over arrays

While creating a function that yields 1, 2, 3 is very impressive, it's not really practical. So let's look at some reasons why you might consider using generators.

They are called when you start iterating

This might not seem like a big deal, but it actually is. Suppose you have a ChoiceField object with an array $options property, and those options have to be retrieved from a database. When the field is rendered, it obviously needs to show those options. But even when the field is never rendered during a request, the database call is still performed just to instantiate the field.

When you change array $options into iterable $options and provide the options via a generator, the database call will only ever be executed if you foreach over those options.

$options = function() {
  foreach(DB::query('retrieve the options') as $option) {
    yield $option;
  }
};

$field = new ChoiceField($options());

So calling the function only returns the generator, but it will not execute until you start iterating.
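
The sketch below shows that laziness without needing a database. The $fetchOptions closure is a hypothetical stand-in for DB::query(); it records when it was called so we can see that nothing happens until the foreach starts.

```php
<?php
$log = [];

// Hypothetical stand-in for DB::query(): records that it was called.
$fetchOptions = function () use (&$log): array {
    $log[] = 'query executed';
    return ['red', 'green', 'blue'];
};

// Calling the generator function only creates the Generator object.
$options = (function () use ($fetchOptions) {
    foreach ($fetchOptions() as $option) {
        yield $option;
    }
})();

$logBeforeIteration = $log; // still empty: the body has not run yet

$collected = [];
foreach ($options as $option) {
    $collected[] = $option;
}

$logAfterIteration = $log; // the "query" ran once iteration started
```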

Tip: If you already have an iterable result set, like an array or any other iterable, you can use yield from $results. This will in essence foreach over all the results and yield every one of them.

// Use `yield from` instead of looping the results.
$options = (function() {
    yield from DB::query('retrieve the options');
})(); // Notice we called the function directly to return the generator.

// Or shorthand
$options = (fn() => yield from DB::query('retrieve the options'))();
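
As a database-free illustration of yield from, here is a minimal sketch with made-up function names (firstBatch() and allOptions() are not from any library). It delegates first to another generator and then to a plain array. One thing to keep in mind: yield from passes along the keys of the inner iterable, so keys can repeat across delegations.

```php
<?php
// Hypothetical generator that produces part of the options.
function firstBatch(): Generator
{
    yield 'red';
    yield 'green';
}

function allOptions(): Generator
{
    yield from firstBatch();       // delegate to another generator
    yield from ['blue', 'yellow']; // or to a plain array
}

$values = [];
foreach (allOptions() as $value) {
    $values[] = $value;
}
// $values now holds 'red', 'green', 'blue', 'yellow'
```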

They preserve memory

Besides not performing any work until you iterate, a generator only yields one result at a time, meaning it only needs to hold a single result in memory at any moment.

$options = (function() { 
    $results = DB::query('retrieve the options'); 
    foreach($results as $result) {
        // This way there is only one `Option` in memory at all times.
        yield Option::createFromResult($result);
    }
    unset($results);
})();

In this example we retrieve a simple result set from a database query. Only when we yield a result do we build the Option model that represents it. This can save a lot of memory.
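
One way to see this effect without measuring bytes is to count how many objects are alive at the same time. The CountedOption class and countedOptions() function below are hypothetical helpers I made up for the demonstration; the exact peak while streaming depends on PHP's reference counting, so the sketch only claims it stays far below the total.

```php
<?php
// Hypothetical model that tracks how many instances exist at once.
final class CountedOption
{
    public static $alive = 0;
    public static $maxAlive = 0;

    public function __construct()
    {
        self::$alive++;
        self::$maxAlive = max(self::$maxAlive, self::$alive);
    }

    public function __destruct()
    {
        self::$alive--;
    }
}

function countedOptions(int $count): Generator
{
    for ($i = 0; $i < $count; $i++) {
        yield new CountedOption();
    }
}

// Streaming: earlier objects can be freed once the loop moves on.
$streamedPeak = (function (): int {
    CountedOption::$maxAlive = 0;
    foreach (countedOptions(1000) as $option) {
        // work with a single $option here
    }
    return CountedOption::$maxAlive;
})();

// Materializing: the array keeps every object alive until it is released.
$arrayPeak = (function (): int {
    CountedOption::$maxAlive = 0;
    $all = iterator_to_array(countedOptions(1000));
    return CountedOption::$maxAlive;
})();
```

Here $arrayPeak reaches 1000, while $streamedPeak stays at a small handful of objects.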

Code can be executed after returning the results

You might have noticed that we casually called unset($results) after the last result was yielded. This works because a generator keeps running until its function body ends, unlike a return statement, which ends the function immediately. That's pretty awesome. This way you can even clean up some leftover memory after your generator finishes.
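
A caveat worth sketching: code after the last yield only runs when the consumer iterates all the way to the end. If the loop stops early, that code is skipped, whereas a finally block still runs when the generator is destroyed. The example below is my own illustration with made-up log messages, not something from the examples above.

```php
<?php
$log = [];

$numbers = function () use (&$log) {
    try {
        yield 1;
        yield 2;
        // Runs only if the consumer keeps iterating past the last yield.
        $log[] = 'after last yield';
    } finally {
        // Runs on normal completion and when the generator is destroyed early.
        $log[] = 'cleanup';
    }
};

foreach ($numbers() as $number) {
    // consume everything
}
$fullRunLog = $log;

$log = [];
foreach ($numbers() as $number) {
    break; // stop after the first value
}
$earlyBreakLog = $log;
```

So if the cleanup really matters, a finally block is the safer place for it.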

Keys can be reused

When you yield a result, it gets an implicit, 0-based numeric key. You can, however, yield both a key and a value by using the => arrow.

// Without keys.
function fruits() {
  yield 'apple';
  yield 'banana';
  yield 'peach';
}

foreach(fruits() as $key => $fruit) // ... Here $key will be 0, 1, 2

// With keys (renamed, because PHP does not allow declaring fruits() twice).
function fruitsWithKeys() {
  yield 'zero' => 'apple';
  yield 'one' => 'banana';
  yield 'two' => 'peach';
  yield 'two' => 'lime';
}

foreach(fruitsWithKeys() as $key => $fruit) // ... Here $key will be 'zero', 'one', 'two', 'two'

Notice how we yielded the same key twice? Unlike with an array, this is no problem during iteration. However, if you were to turn the generator back into an array by using iterator_to_array(), the key would be there only once, holding the last result for that key.
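
Here is a short sketch of that behavior. The function name duplicateKeyFruits() is just for this example. It also shows the second argument of iterator_to_array(): passing false discards the yielded keys, so no value gets overwritten.

```php
<?php
function duplicateKeyFruits(): Generator
{
    yield 'zero' => 'apple';
    yield 'one' => 'banana';
    yield 'two' => 'peach';
    yield 'two' => 'lime';
}

// Default: keys are preserved, so the second 'two' overwrites the first.
$keyed = iterator_to_array(duplicateKeyFruits());
// ['zero' => 'apple', 'one' => 'banana', 'two' => 'lime']

// With preserve_keys set to false, the values are re-indexed from 0.
$list = iterator_to_array(duplicateKeyFruits(), false);
// ['apple', 'banana', 'peach', 'lime']
```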

Things to consider when using generators over arrays

While generators behave very similarly to arrays, they are not of the array type. This means you can run into the following caveats.

Array functions will not work with generators

PHP's array_ functions all require an actual array. So you cannot, for example, simply call array_map() with your generator. To remedy this, you can use iterator_to_array() to turn your generator into an array. This will, however, reintroduce the memory usage of arrays.
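
If you want to stay lazy, one option is to write a small generator that does the mapping itself. The lazyMap() helper below is hypothetical (it is not part of PHP), and is only meant as a minimal sketch of the idea.

```php
<?php
// Hypothetical helper: a lazy counterpart to array_map() for any iterable.
function lazyMap(callable $callback, iterable $items): Generator
{
    foreach ($items as $key => $item) {
        yield $key => $callback($item);
    }
}

function numbers(): Generator
{
    yield 1;
    yield 2;
    yield 3;
}

// Nothing is computed yet; values are doubled one at a time while looping.
$doubled = lazyMap(fn (int $n): int => $n * 2, numbers());

$result = [];
foreach ($doubled as $value) {
    $result[] = $value;
}
// $result now holds 2, 4, 6
```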

Tip: You might use iterator_apply() to perform a callback for every yielded result, but this is not recommended, as this function returns neither an iterator nor any of the results. It only invokes the callback for every iteration, and the callback doesn't receive the result. You have to provide the iterator as an argument, and you can then retrieve the current() value. It's not worth it.
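
For completeness, this is roughly what that looks like. The numbers() generator is a made-up example; note that the iterator has to be passed in twice, and that the callback has to return true to keep the iteration going.

```php
<?php
function numbers(): Generator
{
    yield 1;
    yield 2;
    yield 3;
}

$iterator = numbers();
$squares = [];

// iterator_apply() returns the number of iterations, not the results.
$iterations = iterator_apply(
    $iterator,
    function (Iterator $iterator) use (&$squares): bool {
        // The callback gets no value, so we read it from the iterator.
        $squares[] = $iterator->current() ** 2;
        return true; // returning true continues the iteration
    },
    [$iterator] // arguments passed to the callback
);
// $squares now holds 1, 4, 9
```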

The count of a generator is not predefined

Since we can yield as many results as we want, and the generator only has one reference in memory at a time, it's not possible to count the results without traversing them. To ease this process you can use iterator_count(). This will loop over every result and return the actual count.

A Generator instance can only be traversed once

When a generator finishes, it closes itself. Once this happens, you can't traverse it again. When you try to do so, you will run into this exception: Cannot traverse an already closed generator.

A solution to this could be to call the generator function again. However, you should probably refactor your code to prevent this.
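
The following sketch reproduces the exception and shows the "call it again" workaround, using a made-up numbers() generator.

```php
<?php
function numbers(): Generator
{
    yield 1;
    yield 2;
    yield 3;
}

$generator = numbers();

$firstPass = [];
foreach ($generator as $number) {
    $firstPass[] = $number;
}

// A second loop over the same, finished instance throws.
$message = null;
try {
    foreach ($generator as $number) {
        // never reached
    }
} catch (Exception $exception) {
    $message = $exception->getMessage();
}

// Calling the function again gives a fresh Generator instance.
$secondPass = [];
foreach (numbers() as $number) {
    $secondPass[] = $number;
}
```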

Note: iterator_count() also closes the iterator, so you can't do a count and then loop. You should probably just keep a record of the count while iterating.
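
A minimal sketch of both approaches, again with a made-up numbers() generator: first iterator_count(), which leaves the generator exhausted, and then a counter kept inside the loop that does the real work.

```php
<?php
function numbers(): Generator
{
    yield 1;
    yield 2;
    yield 3;
}

// iterator_count() walks the whole generator to find the count...
$generator = numbers();
$counted = iterator_count($generator);

// ...which leaves nothing to iterate afterwards.
$stillValid = $generator->valid();

// Alternative: keep a counter while doing the actual work.
$count = 0;
$sum = 0;
foreach (numbers() as $number) {
    $count++;
    $sum += $number;
}
```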

In conclusion

Obviously arrays have their time and place. I'd never use a generator to create a simple list. But whenever I'm working with objects or entity models, I like to use generators to limit the memory usage.

Learned anything new? Don't keep it to yourself, but share it on social media! And if you have any questions or remarks, let me know via Twitter.
