Understanding Ruby - For vs Each
For those coming from other languages with for loops, the concept of each, anonymous functions, blocks, and all of those new terms feels very foreign. Why is it that Ruby doesn't use for loops? Well, we're going to cover that one today.
Difficulty: Foundational
Some knowledge of functions in Ruby is required. This post focuses on foundational and fundamental knowledge for Ruby programmers.
Prerequisite reading: None
Suggested to read Understanding Ruby - Blocks, Procs, and Lambdas after this article if you haven't already.
Let's start with an acknowledgement. Ruby does have a for loop:
for item in [1, 2, 3]
  puts item + 1
end
# 2
# 3
# 4
...but you're not going to see it in common use. You're going to see each far more frequently.
each in Ruby is the de facto way of iterating through a collection:
[1, 2, 3].each do |item|
  puts item + 1
end
# 2
# 3
# 4
There are a few things here which may not be familiar, which are covered in more detail in that article mentioned above, but for now the important parts:
do ... end starts what we call a block function, or anonymous function in other languages, and |item| represents the arguments to that function. For each iteration of the loop, each item will be fed into that function as an argument.
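To make the "fed into that function" part concrete, here is a minimal sketch of a method that accepts a block. The name each_item is made up for illustration and is not part of Ruby; yield is the keyword that calls whatever block was passed in:

```ruby
# Hypothetical helper for illustration only; Array#each already does this for you.
def each_item(items)
  index = 0
  while index < items.size
    yield items[index] # call the block, passing the current item as its argument
    index += 1
  end
  items
end

results = []
each_item([1, 2, 3]) do |item|
  results << item + 1
end
results
# => [2, 3, 4]

# Curly braces are the single-line form of do ... end:
each_item([1, 2, 3]) { |item| results << item * 10 }
results
# => [2, 3, 4, 10, 20, 30]
```

The real Array#each is implemented inside Ruby itself, so treat this only as a mental model of what happens on each iteration.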
Ruby uses blocks heavily, and you'll find them commonly both in reading and writing code in the language. Their usage goes far beyond each, and we'll mention that in a bit, but first let's cover a few concerns about for in Ruby.
There are more than a few legitimate issues with for loops in Ruby, and we'll quickly cover a few of them.
The for loop in Ruby uses each behind the scenes, so even if you never write each yourself you're still using it. That also means it's slightly slower:
require 'benchmark/ips'
# => true
collection = (1..100).to_a
# => [1, 2, 3, 4, 5, 6, 7, 8, 9, ...
Benchmark.ips do |bench|
  bench.report("for loop") do
    sum = 0
    for item in collection
      sum += item
    end
    sum
  end
  bench.report("each loop") do
    sum = 0
    collection.each do |item|
      sum += item
    end
    sum
  end
end
# Warming up --------------------------------------
# for loop 22.017k i/100ms
# each loop 23.543k i/100ms
# Calculating -------------------------------------
# for loop 218.466k (± 2.6%) i/s - 1.101M in 5.042495s
# each loop 231.274k (± 2.1%) i/s - 1.177M in 5.092110s
Granted, at roughly 6% in this run, this is not a significant difference, but it is something to keep in mind.
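The claim that for is built on each can be checked directly. The sketch below uses a made-up Countdown class (not from any library) that defines nothing but each and counts how often it gets called:

```ruby
# Hypothetical class for illustration: the only iteration method it has is each.
class Countdown
  attr_reader :each_calls

  def initialize(from)
    @from = from
    @each_calls = 0
  end

  def each
    @each_calls += 1 # record that each was invoked
    @from.downto(1) { |n| yield n }
  end
end

countdown = Countdown.new(3)
seen = []

# No explicit call to each here, yet the loop still works.
for n in countdown
  seen << n
end

seen
# => [3, 2, 1]
countdown.each_calls
# => 1
```

If for did not go through each, this loop would have no way to iterate over a Countdown at all.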
for loops leak variables into their outer scope:
for item in collection
  sum ||= 0
  sum += item
end
item
# => 100
sum
# => 5050
That means if the code around it has an item, it'll be overwritten. Same with sum. Contrast with each here:
collection.each do |item2|
  sum2 ||= 0
  sum2 += item2
end
item2
# NameError (undefined local variable or method `item2' for main:Object)
sum2
# NameError (undefined local variable or method `sum2' for main:Object)
We'll get into that in a moment, but for now know that block functions are isolated: outside code cannot see inside of them, but they can certainly see the outside code around them.
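The overwriting problem is easiest to see when a variable with the same name already exists. A small sketch, where after_for and after_each are names introduced here just to capture the results:

```ruby
item = :original

for item in [1, 2, 3]
  # nothing to do, we only care about what happens to item
end

after_for = item
# => 3, the loop overwrote the outer variable

item = :original

[1, 2, 3].each do |item|
  # this item is a block parameter, local to the block
end

after_each = item
# => :original, the outer variable is untouched
```

Block parameters always belong to the block, so a parameter that happens to share a name with an outer variable shadows it rather than replacing it.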
So why would one want to use anonymous functions, each, and related methods in Ruby rather than a for loop? This section will look into that.
Going back to the above section, let's clarify what we mean by what the function can "see" or "not see".
A block function is what's called a closure, meaning it captures the outside context (think variables) inside the function, but the outside code cannot see inside, hence sum2 being undefined here. Believe it or not that's quite useful later on, but it has been known to be a stumbling block for some.
Consider this code:
sum = 0
[1, 2, 3].each do |item|
  sum += item
end
sum
# => 6
We can "see" sum as it's in the context of the block function, or what's immediately around it where it's defined. This can be really useful for more advanced code, as that means functions effectively have memory, and in Ruby you can even redefine where it finds its memory by changing its context, but that's considerably more advanced.
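As a rough illustration of functions having memory, here is a small counter built from a closure. make_counter is a hypothetical name for this sketch, and -> { ... } creates a lambda, one of the function types covered in the Blocks, Procs, and Lambdas article:

```ruby
# Hypothetical helper: returns a function that remembers its own count.
def make_counter
  count = 0          # captured by the lambda below
  -> { count += 1 }  # each call updates the captured variable
end

counter = make_counter
first  = counter.call
# => 1
second = counter.call
# => 2

# A second counter gets its own separate memory.
other = make_counter
other_first = other.call
# => 1
```

Nothing outside the lambda can reach count directly, which is the same isolation rule that applies to blocks passed to each.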
The outside code, however, cannot see item as it's only visible inside the block function. This can present some headaches, and early on in my Ruby career this confused me to no end:
require 'net/ssh'
# Don't actually use passwords if you do this, use keys
Net::SSH.start('hostname', 'username', password: 'password') do |ssh|
  config = ssh.exec! "cat /tmp/running.cfg"
end
defined?(config)
# => nil
For those cases I used global variables back then, which I would not recommend. Instead, prefer this pattern:
config = nil
# Don't actually use passwords if you do this, use keys
Net::SSH.start('hostname', 'username', password: 'password') do |ssh|
  config = ssh.exec! "cat /tmp/running.cfg"
end
defined?(config)
# => "local-variable"
...or if you read the Net::SSH docs you might find that the block isn't even entirely necessary for this and get around the issue entirely. Anyways, point being there are some traps there potentially for the unaware, so be careful about what isolated block function scopes mean.
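Another way around the scoping trap, when the method supports it, is to use the value the block returns. The sketch below uses a made-up with_session method as a stand-in; it is not the Net::SSH API, so check the docs of whatever library you're using to see what its block-taking methods actually return:

```ruby
# Hypothetical stand-in for a resource-managing method; NOT the Net::SSH API.
def with_session
  session = { host: "hostname" } # pretend this is a live connection
  yield session                  # the method returns whatever the block returns
end

# No need to declare config up front, it's assigned from the return value.
config = with_session do |session|
  "running config from #{session[:host]}"
end

config
# => "running config from hostname"
```

The design choice here is that the variable is assigned in the outer scope, so there's nothing for the block's isolation to hide.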
Ruby has a collections library called Enumerable which is one of the most powerful features of the language.
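Enumerable is itself built on top of each: any class that defines each and includes the module gets the rest of the library. A minimal sketch, using a made-up Playlist class:

```ruby
# Hypothetical class for illustration: it defines each and nothing else.
class Playlist
  include Enumerable

  def initialize(*songs)
    @songs = songs
  end

  # Enumerable's methods are all implemented in terms of this one method.
  def each(&block)
    @songs.each(&block)
    self
  end
end

playlist = Playlist.new("Intro", "Verse", "Outro")

upcased = playlist.map { |song| song.upcase }
# => ["INTRO", "VERSE", "OUTRO"]
starts_with_i = playlist.select { |song| song.start_with?("I") }
# => ["Intro"]
has_verse = playlist.include?("Verse")
# => true
```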
Let's say I wanted to get the sum of every even number greater than 4 in a collection, but double them as well. With a for loop that might look like this:
sum = 0
for item in 1..100
  sum += item * 2 if item > 4 && item.even?
end
sum
# => 5088
Using Enumerable we can express each one of those conditions as a distinct transformation or filtering of the list:
(1..100).select { |v| v.even? && v > 4 }.map { |v| v * 2 }.sum
# => 5088
It gives us more flexibility in expressing multiple actions we want to take against a collection as distinct pieces rather than combining them all as one.
Some of those, you'll find, can be exceptionally useful beyond the trivial, like a count of what letters words start with in some text:
words = %w(the rain in spain stays mainly on the plane)
words.map { |w| w[0] }.tally
# => {"t"=>2, "r"=>1, "i"=>1, "s"=>2, "m"=>1, "o"=>1, "p"=>1}
...or grouping a collection:
words.group_by { |w| w.size }
# => {3=>["the", "the"], 4=>["rain"], 2=>["in", "on"], 5=>["spain", "stays", "plane"], 6=>["mainly"]}
The flexibility there is really something, and because these can all be chained together you can easily break them out into separate functions and refactor out entire parts of the chain altogether if you need to.
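As a small sketch of what breaking the chain out might look like, the earlier sum example can have each step pulled into a named lambda. The names big_even and double are made up for this example, and & passes a lambda where a block is expected:

```ruby
# Each step of the chain gets a name and can be reused or tested on its own.
big_even = ->(v) { v.even? && v > 4 }
double   = ->(v) { v * 2 }

total = (1..100).select(&big_even).map(&double).sum
# => 5088

# Swapping one step out doesn't disturb the rest of the chain.
triple = ->(v) { v * 3 }
tripled_total = (1..100).select(&big_even).map(&triple).sum
# => 7632
```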
Now there are a lot of things I could get into on where this can go and the implications, but as this is a more beginner friendly article that would not be very kind, so we'll instead hint at a few of them:
- Block functions can have their entire context changed
- A lot of Enumerable-like functions can be parallelized as they're functionally pure
- Closures keep context, meaning you have memory to do some real fun things
- Many Ruby classes, including your own, can be coerced into functions
- A significant number of programming patterns are made much easier by the presence of functions
...and a lot more than I have time for in this particular article, but I would highly encourage you to read the more advanced article on the types of functions in Ruby, Understanding Ruby - Blocks, Procs, and Lambdas.
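To give a taste of one of the hints in that list, here is a brief sketch of classes being coerced into functions. Symbol#to_proc is built into Ruby; the Multiplier class is made up for illustration:

```ruby
# Symbols can be coerced into blocks with & (this calls Symbol#to_proc):
shouted = %w(the rain in spain).map(&:upcase)
# => ["THE", "RAIN", "IN", "SPAIN"]

# Hypothetical class for illustration: defining to_proc lets & coerce it too.
class Multiplier
  def initialize(factor)
    @factor = factor
  end

  def to_proc
    ->(value) { value * @factor }
  end
end

tripled = [1, 2, 3].map(&Multiplier.new(3))
# => [3, 6, 9]
```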
This article is a very high level overview, and it definitely hand-waves over some details I could get into. Be sure to read other parts of the series if you want to get more into the weeds on this, as there's a lot of fascinating detail there.
The intent of this article is to help those coming from languages which primarily use for loops rather than iterables or enumerables, depending on the way you describe them. That said, most languages, Java included, have a streams-style library which does something very close to this.
If you really want to get into the power of block functions and why that's significant be sure to watch out for future posts on functional programming, but until then that's all I have for today.