Gently Embracing Different Regular Expression Approaches in Ruby

A Straightforward Problem with Three Different Approaches

This morning I was pairing with a fellow developer, and we had a small coding task we needed to complete.

Namely, we needed to convert the following duration strings into the equivalent number of seconds.

1h4m2s :: 3842 seconds
1m :: 60 seconds
1s2m :: 121 seconds
1d :: 0 seconds (we don’t map the day unit to seconds)
10 :: 10 seconds

This was a small and concrete task to fix a bug on Forem. The problem was nestled inside a Ruby object, and testing it required spinning up that entire context.

What I proposed during our pairing session was to jump out of the existing code base and poke at the problem without all of that mega-context. I wrote up a gist of the pure

That meant creating a plain old Ruby file (e.g., parsing-an-encoded-string.rb) and setting up the conditions that I wanted to test.

I posted parsing-an-encoded-string.rb on GitHub.

The conditions were as follows:

[
  # ["1h4m2s", 3842],
  ["1m", 60],
  # ["1s2m", 121],
  # ["1d", 0],
  # ["10", 10],
].each do |given, expected|
  # Do the magic
end

The array above held my test cases; I commented out all but one of them. From there, we began exploring the problem. When we thought we had it right, I ran ruby parsing-an-encoded-string.rb to see if things were working.

We quickly iterated on three different solutions. Below are the three approaches, each of which uses regular expressions and Ruby-isms to a different degree.

Diving into the Code

Defining the Unit Map

The TIME_UNIT_TO_SECONDS_MAP hash provides a lookup for a given unit and how many seconds one of those units represents.

TIME_UNIT_TO_SECONDS_MAP = {
  "h" => 60 * 60, # seconds in an hour
  "m" => 60, # seconds in a minute
  "s" => 1 # seconds in a second
}
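
To make the lookup concrete, here is a small sketch of the arithmetic for "1h4m2s", done by hand. The local names unit_to_seconds and pairs are only for illustration (they are not part of the original file); the hash repeats the values of TIME_UNIT_TO_SECONDS_MAP so the snippet runs on its own.

```ruby
# Same values as TIME_UNIT_TO_SECONDS_MAP, repeated so this runs standalone.
unit_to_seconds = { "h" => 60 * 60, "m" => 60, "s" => 1 }

# "1h4m2s" broken into scalar/unit pairs by hand (illustrative only).
pairs = [[1, "h"], [4, "m"], [2, "s"]]

# Multiply each scalar by the seconds for its unit, then add them up.
total = pairs.sum { |scalar, unit| scalar * unit_to_seconds.fetch(unit, 0) }
puts total # 1 * 3600 + 4 * 60 + 2 * 1 = 3842

# An unmapped unit such as "d" falls back to 0 through the fetch default.
unmapped = 1 * unit_to_seconds.fetch("d", 0)
puts unmapped # 0
```

The fetch default of 0 is what lets "1d" come out as 0 seconds in each of the approaches that follow.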

String Splitting with Simple Regexp

The convert_to_seconds_via_string_splitting method does some simple string splitting into two separate arrays and then walks those arrays together to calculate the number of seconds.

def convert_to_seconds_via_string_splitting(input)
  # Split on one or more numerals
  #
  # Because of the structure, the first element should always be an
  # empty string. We may want to guard better. On the other hand,
  # if the user's giving us junk, whatever.
  units = input.split(/\d+/)[1..-1]

  return input.to_i unless units

  # Split on a single alpha character
  times = input.split(/[a-z]/)

  seconds = 0

  units.each_with_index do |unit, i|
    seconds += TIME_UNIT_TO_SECONDS_MAP.fetch(unit, 0) * times[i].to_i
  end

  return seconds
end
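
It can help to look at the intermediate arrays that the two split calls produce. The snippet below is a small exploration, not part of the original file, and the variable names are just for illustration.

```ruby
input = "1h4m2s"

# Splitting on runs of digits leaves the units, plus a leading empty string.
raw_units = input.split(/\d+/)
p raw_units # ["", "h", "m", "s"]

# Dropping the first element leaves only the units.
units = raw_units[1..-1]
p units # ["h", "m", "s"]

# Splitting on a single lowercase letter leaves the scalars, as strings.
times = input.split(/[a-z]/)
p times # ["1", "4", "2"]

# A bare number has no units: split returns an empty array, and slicing an
# empty array from index 1 returns nil, which triggers the early return.
bare_units = "10".split(/\d+/)[1..-1]
p bare_units # nil
```

Because units[i] and times[i] line up by position, walking the two arrays with each_with_index pairs each scalar with its unit.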

Large Regular Expression for the Match

This approach relies on a regular expression with nine capture regions. I assigned that regular expression to a constant: THREE_TIMES_AND_UNITS_REGEXP = /\A((\d+)([a-z]))?((\d+)([a-z]))?((\d+)([a-z]))?\Z/

The nine capture regions are as follows:

1,4,7 :: The scalar and its unit (e.g., “10s”, “1h”)
2,5,8 :: The scalar (e.g., “10”, “1”)
3,6,9 :: The unit (e.g., “s”, “h”)

And the convert_to_seconds_via_verbose_regexp method handles those capture regions.

THREE_TIMES_AND_UNITS_REGEXP =
  /\A((\d+)([a-z]))?((\d+)([a-z]))?((\d+)([a-z]))?\Z/

def convert_to_seconds_via_verbose_regexp(input)
  match = THREE_TIMES_AND_UNITS_REGEXP.match(input)
  seconds = 0
  return input.to_i unless match

  seconds += match[2].to_i *
    TIME_UNIT_TO_SECONDS_MAP.fetch(match[3], 0) if match[1]
  seconds += match[5].to_i *
    TIME_UNIT_TO_SECONDS_MAP.fetch(match[6], 0) if match[4]
  seconds += match[8].to_i *
    TIME_UNIT_TO_SECONDS_MAP.fetch(match[9], 0) if match[7]
  seconds
end
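
To see how the nine capture regions fill in, here is a short exploration using MatchData#captures. The regular expression is copied into a local variable so the snippet runs on its own; the variable names are illustrative.

```ruby
regexp = /\A((\d+)([a-z]))?((\d+)([a-z]))?((\d+)([a-z]))?\Z/

# All three optional groups participate in the match.
full = regexp.match("1h4m2s").captures
p full # ["1h", "1", "h", "4m", "4", "m", "2s", "2", "s"]

# Only the first group participates; the rest are nil.
partial = regexp.match("1m").captures
p partial # ["1m", "1", "m", nil, nil, nil, nil, nil, nil]

# A bare number never reaches the end anchor, so there is no match at all.
bare = regexp.match("10")
p bare # nil

# A fourth scalar/unit pair does not fit the three groups either.
too_long = regexp.match("1h2m3s4s")
p too_long # nil
```

The nil captures are why each line in convert_to_seconds_via_verbose_regexp guards on match[1], match[4], or match[7]. The last case also shows a limit of this approach: an input with more than three pairs does not match and falls through to input.to_i.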

Regular Expression and the String scan method

In this implementation, the regular expression is much simpler. I assigned that regular expression to a constant: TIME_AND_UNIT_REGEXP = /(\d+)([a-z])/

There are two capture regions:

1 :: The scalar (e.g., “10”, “1”)
2 :: The unit (e.g., “s”, “h”)

And the convert_to_seconds_via_regexp_scanner method handles those capture regions.

TIME_AND_UNIT_REGEXP = /(\d+)([a-z])/

def convert_to_seconds_via_regexp_scanner(input)
  seconds = 0
  matched = false

  input.scan(TIME_AND_UNIT_REGEXP) do |time, unit|
    matched = true
    seconds += time.to_i *
      TIME_UNIT_TO_SECONDS_MAP.fetch(unit, 0)
  end
  return seconds if matched
  input.to_i
end
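
When called without a block, String#scan returns an array of the captures for every match, which makes it easy to inspect what the block form receives. This is a small exploration with illustrative variable names, not part of the original file.

```ruby
regexp = /(\d+)([a-z])/

# One [scalar, unit] pair per match, in the order they appear.
pairs = "1h4m2s".scan(regexp)
p pairs # [["1", "h"], ["4", "m"], ["2", "s"]]

# Order in the input does not matter to scan.
reordered = "1s2m".scan(regexp)
p reordered # [["1", "s"], ["2", "m"]]

# A bare number has no digit-then-letter pair, so nothing matches.
bare = "10".scan(regexp)
p bare # []
```

The empty array for "10" is the case that the matched flag covers in convert_to_seconds_via_regexp_scanner: with no pairs found, the method falls back to input.to_i.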

The Tests that Guide Me

Below are the "tests" that I wrote to quickly affirm that things were working.

[
  ["1h4m2s", 3842],
  ["1m", 60],
  ["1s2m", 121],
  ["1d", 0],
  ["10", 10],
].each do |given, expected|
  puts "Given: #{given}\tExpected: #{expected}"
  [
    :convert_to_seconds_via_string_splitting,
    :convert_to_seconds_via_verbose_regexp,
    :convert_to_seconds_via_regexp_scanner,
  ].each do |method|
    returned_value = __send__(method, given)
    if returned_value == expected
      puts "\tSuccess for #{method}."
    else
      puts "\tFailure for #{method}.  Got: #{returned_value}"
    end
  end
end

Conclusion

Each of the three methods gets the desired results, and each demonstrates a different way to approach the same problem.

For further work, I could refine the regular expressions to only key on the units of time defined in TIME_UNIT_TO_SECONDS_MAP. And there are, I’m certain, many other ways to approach solving this particular problem.
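
As one possible sketch of that refinement, the regular expression could be built from the keys of the map with Regexp.union. The names UNIT_TO_SECONDS, KNOWN_UNITS_REGEXP, and convert_to_seconds_via_known_units are hypothetical and not from the original file. One wrinkle I am assuming a fix for: once "d" no longer matches, falling back to input.to_i would turn "1d" into 1, so this version only treats all-digit input as bare seconds.

```ruby
# Hypothetical copy of the unit map, renamed so the sketch stands alone.
UNIT_TO_SECONDS = { "h" => 60 * 60, "m" => 60, "s" => 1 }.freeze

# Build the unit alternation (h|m|s) from the map's keys.
KNOWN_UNITS_REGEXP = /(\d+)(#{Regexp.union(UNIT_TO_SECONDS.keys)})/

def convert_to_seconds_via_known_units(input)
  # Assumption: only an all-digit string counts as bare seconds.
  return input.to_i if input.match?(/\A\d+\z/)

  # Sum scalar * seconds-per-unit for every known pair; unknown units
  # such as "d" simply never match, so they contribute nothing.
  input.scan(KNOWN_UNITS_REGEXP).sum do |time, unit|
    time.to_i * UNIT_TO_SECONDS.fetch(unit)
  end
end

puts convert_to_seconds_via_known_units("1h4m2s") # 3842
puts convert_to_seconds_via_known_units("1d")     # 0
puts convert_to_seconds_via_known_units("10")     # 10
```

With this shape, adding a unit to the map is enough to make the regular expression recognize it.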