Gently Embracing Different Regular Expression Approaches in Ruby
This morning I was pairing with a fellow developer, and we had a small coding task we needed to complete.
Namely, we needed to convert the following duration strings into their respective number of seconds.
- 1h4m2s :: 3842 seconds
- 1m :: 60 seconds
- 1s2m :: 121 seconds
- 1d :: 0 seconds (we don’t map the day unit to seconds)
- 10 :: 10 seconds
This was a small and concrete task to fix a bug on Forem. The problem was nestled inside a Ruby object, and testing it required spinning up that entire context.
What I proposed during our pairing session was to jump out of the existing code base and poke at the problem without all of that mega-context.
That meant creating a plain old Ruby file (e.g., parsing-an-encoded-string.rb) and setting up the conditions that I wanted to test.
I posted parsing-an-encoded-string.rb as a gist on GitHub.
They were as follows:
[
  # ["1h4m2s", 3842],
  ["1m", 60],
  # ["1s2m", 121],
  # ["1d", 0],
  # ["10", 10],
].each do |given, expected|
  # Do the magic
end
The above were my test cases; I commented out all but one of them. From there, we began exploring the problem. When we thought we had it right, I ran ruby parsing-an-encoded-string.rb to see if things were working.
We quickly iterated on three different solutions. Below are the three approaches, each of which leans on regular expressions and Ruby-isms to a different degree.
The TIME_UNIT_TO_SECONDS_MAP hash provides a lookup for a given unit and how many seconds one of those units represents.
TIME_UNIT_TO_SECONDS_MAP = {
  "h" => 60 * 60, # seconds in an hour
  "m" => 60, # seconds in a minute
  "s" => 1 # seconds in a second
}
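All three methods below lean on Hash#fetch with a default of 0, which is what lets an unmapped unit such as "d" contribute nothing to the total. Here is a minimal sketch of that lookup behavior; the local variable unit_to_seconds is an illustrative copy of the hash, not a name from the original file.

```ruby
# Illustrative standalone copy of the lookup hash (hypothetical local name).
unit_to_seconds = {
  "h" => 60 * 60,
  "m" => 60,
  "s" => 1
}

# A known unit returns its number of seconds.
hour_seconds = unit_to_seconds.fetch("h", 0)

# An unknown unit falls back to the default instead of raising KeyError.
day_seconds = unit_to_seconds.fetch("d", 0)

# Scalar times unit: the "4m" part of "1h4m2s".
four_minutes = 4 * unit_to_seconds.fetch("m", 0)

puts "h => #{hour_seconds}, d => #{day_seconds}, 4m => #{four_minutes}"
```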
The convert_to_seconds_via_string_splitting method does some simple string splitting into two separate arrays and then walks those arrays together to calculate the number of seconds.
def convert_to_seconds_via_string_splitting(input)
  # Split on one or more numerals
  #
  # Because of the structure, the first element should always be an
  # empty string. We may want to guard better. On the other hand,
  # if the user's giving us junk, whatever.
  units = input.split(/\d+/)[1..-1]
  return input.to_i unless units
  # Split on a single alpha character
  times = input.split(/[a-z]/)
  seconds = 0
  units.each_with_index do |unit, i|
    seconds += TIME_UNIT_TO_SECONDS_MAP.fetch(unit, 0) * times[i].to_i
  end
  return seconds
end
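To make the two splits more concrete, here is a small sketch of the intermediate arrays for one input. The variable names are illustrative, and the zip call is only there to show how the two arrays line up; the method above walks them by index instead.

```ruby
input = "1h4m2s"

# Splitting on runs of digits leaves the units, plus a leading empty string.
raw_units = input.split(/\d+/)
units = raw_units[1..-1]

# Splitting on single lowercase letters leaves the scalars.
times = input.split(/[a-z]/)

# Pair each scalar with its unit to see how the arrays line up.
pairs = times.zip(units)

# A bare number splits into an empty array, so slicing from index 1 gives nil.
# That nil is what triggers the early `input.to_i` return.
bare_units = "10".split(/\d+/)[1..-1]

p raw_units
p units
p times
p pairs
p bare_units
```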
This approach relies on a regular expression with nine capture regions. I assigned that regular expression to a constant: THREE_TIMES_AND_UNITS_REGEXP = /\A((\d+)([a-z]))?((\d+)([a-z]))?((\d+)([a-z]))?\Z/
The nine capture regions are as follows:
- 1,4,7 :: The scalar and its unit (e.g., “10s”, “1h”)
- 2,5,8 :: The scalar (e.g., “10”, “1”)
- 3,6,9 :: The unit (e.g., “s”, “h”)
And the convert_to_seconds_via_verbose_regexp method handles those capture regions.
THREE_TIMES_AND_UNITS_REGEXP =
  /\A((\d+)([a-z]))?((\d+)([a-z]))?((\d+)([a-z]))?\Z/
def convert_to_seconds_via_verbose_regexp(input)
  match = THREE_TIMES_AND_UNITS_REGEXP.match(input)
  seconds = 0
  return input.to_i unless match
  seconds += match[2].to_i *
    TIME_UNIT_TO_SECONDS_MAP.fetch(match[3], 0) if match[1]
  seconds += match[5].to_i *
    TIME_UNIT_TO_SECONDS_MAP.fetch(match[6], 0) if match[4]
  seconds += match[8].to_i *
    TIME_UNIT_TO_SECONDS_MAP.fetch(match[9], 0) if match[7]
  seconds
end
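As a quick way to see the nine capture regions, the sketch below inspects MatchData#captures for a few inputs. The local name pattern is illustrative; the expression itself is copied from the constant above. Note that because the expression is anchored and allows at most three scalar-and-unit segments, an input with a fourth segment does not match at all.

```ruby
# Same expression as THREE_TIMES_AND_UNITS_REGEXP, under an illustrative local name.
pattern = /\A((\d+)([a-z]))?((\d+)([a-z]))?((\d+)([a-z]))?\Z/

# All three segments present: every capture region is filled.
full = pattern.match("1h4m2s").captures

# Only one segment present: the remaining six regions are nil.
partial = pattern.match("1m").captures

# A bare number has no unit, so the whole match fails.
bare = pattern.match("10")

# A fourth segment is more than the expression allows, so this fails too.
too_many = pattern.match("1h4m2s1s")

p full
p partial
p bare
p too_many
```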
In this implementation, the regular expression is much simpler. I assigned that regular expression to a constant: TIME_AND_UNIT_REGEXP = /(\d+)([a-z])/
There are two capture regions:
- 1 :: The scalar (e.g., “10”, “1”)
- 2 :: The unit (e.g., “s”, “h”)
And the convert_to_seconds_via_regexp_scanner method handles those capture regions.
TIME_AND_UNIT_REGEXP = /(\d+)([a-z])/
def convert_to_seconds_via_regexp_scanner(input)
  seconds = 0
  matched = false
  input.scan(TIME_AND_UNIT_REGEXP) do |time, unit|
    matched = true
    seconds += time.to_i *
      TIME_UNIT_TO_SECONDS_MAP.fetch(unit, 0)
  end
  return seconds if matched
  input.to_i
end
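To see what the scanner hands to the block, it can help to call String#scan without a block, in which case it returns an array of the capture pairs. The sketch below does that and then totals the pairs with Array#sum; the local names are illustrative, and the sum-based total is an alternative phrasing of the loop above rather than what the method actually does.

```ruby
# Illustrative local copies of the expression and lookup hash from above.
pattern = /(\d+)([a-z])/
unit_to_seconds = { "h" => 60 * 60, "m" => 60, "s" => 1 }

# Without a block, scan returns one [scalar, unit] pair per match.
pairs = "1s2m".scan(pattern)

# Totaling the pairs gives the same result as accumulating inside the block.
total = pairs.sum { |time, unit| time.to_i * unit_to_seconds.fetch(unit, 0) }

# A bare number produces no pairs, which is why the method tracks `matched`.
no_pairs = "10".scan(pattern)

p pairs
p total
p no_pairs
```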
Below are the "tests" that I wrote to quickly affirm that things were working.
[
  ["1h4m2s", 3842],
  ["1m", 60],
  ["1s2m", 121],
  ["1d", 0],
  ["10", 10],
].each do |given, expected|
  puts "Given: #{given}\tExpected: #{expected}"
  [
    :convert_to_seconds_via_string_splitting,
    :convert_to_seconds_via_verbose_regexp,
    :convert_to_seconds_via_regexp_scanner,
  ].each do |method|
    returned_value = __send__(method, given)
    if returned_value == expected
      puts "\tSuccess for #{method}."
    else
      puts "\tFailure for #{method}. Got: #{returned_value}"
    end
  end
end
Each of the three methods gets the desired results, and each demonstrates a different way to approach a similar problem.
For further work, I could refine the regular expressions to only key on the units of time defined in TIME_UNIT_TO_SECONDS_MAP. And there are, I’m certain, many other ways to approach solving this particular thing.
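One possible shape for that refinement, offered as a sketch rather than a finished solution: build the unit part of the expression from the hash keys with Regexp.union. The names KNOWN_UNIT_SECONDS, KNOWN_TIME_AND_UNIT_REGEXP, and convert_to_seconds_known_units are hypothetical and not part of the original file. One wrinkle is that an input like "1d" no longer produces any matches, so the sketch checks for an all-digit input up front instead of falling back to to_i when nothing matched.

```ruby
# Hypothetical standalone copy of the lookup hash.
KNOWN_UNIT_SECONDS = {
  "h" => 60 * 60,
  "m" => 60,
  "s" => 1
}.freeze

# Regexp.union builds an alternation of the keys, so the expression
# only matches units that the hash knows how to convert.
KNOWN_TIME_AND_UNIT_REGEXP = /(\d+)(#{Regexp.union(KNOWN_UNIT_SECONDS.keys)})/

def convert_to_seconds_known_units(input)
  # A bare number is treated as seconds.
  return input.to_i if input.match?(/\A\d+\z/)

  # Unknown units (e.g. "d") are simply never matched, so they add nothing.
  input.scan(KNOWN_TIME_AND_UNIT_REGEXP).sum do |time, unit|
    time.to_i * KNOWN_UNIT_SECONDS.fetch(unit)
  end
end

%w[1h4m2s 1m 1s2m 1d 10].each do |given|
  puts "#{given}\t#{convert_to_seconds_known_units(given)}"
end
```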