Gently Embracing Different Regular Expression Approaches in Ruby

A Straightforward Problem with Three Different Approaches
This morning I was pairing with a fellow developer, and we had a small coding task we needed to complete.
Namely, we needed to convert the following duration strings into their respective numbers of seconds.
  • 1h4m2s :: 3842 seconds
  • 1m :: 60 seconds
  • 1s2m :: 121 seconds
  • 1d :: 0 seconds (we don’t map the day unit to seconds)
  • 10 :: 10 seconds
    This was a small and concrete task to fix a bug on Forem. The problem was nestled inside of a Ruby object, and testing it required spinning up that entire context.
    What I proposed during our pairing session was to jump out of the existing code base and poke at the problem without all of that mega-context. I wrote up a gist of the pure
    What that looked like was me creating a plain old Ruby file (e.g., parsing-an-encoded-string.rb) and setting up the conditions that I wanted to test. I posted parsing-an-encoded-string.rb up on GitHub.
    Those conditions were as follows:
    [
      # ["1h4m2s", 3842],
      ["1m", 60],
      # ["1s2m", 121],
      # ["1d", 0],
      # ["10", 10],
    ].each do |given, expected|
      # Do the magic
    end
    The above logic held my test cases; I commented out all but one of them, and from there we began exploring the problem. Whenever we thought we had it right, I ran ruby parsing-an-encoded-string.rb to see if things were working.
    We quickly iterated on three different solutions. Below are the three approaches, each of which leans on regular expressions and Ruby-isms to a different degree.
    Diving into the Code
    Defining the Unit Map
    The TIME_UNIT_TO_SECONDS_MAP hash maps each unit to the number of seconds that one of those units represents.
    TIME_UNIT_TO_SECONDS_MAP = {
      "h" => 60 * 60, # seconds in an hour
      "m" => 60, # seconds in a minute
      "s" => 1 # seconds in a second
    }
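All three methods below look units up with fetch(unit, 0) rather than plain []. The following sketch shows why that default matters for the "1d" case: an unmapped unit contributes 0 instead of nil or a KeyError. It only exercises standard Hash behavior on a copy of the map above.

```ruby
# Same lookup table as in the post, repeated so this snippet runs on its own.
TIME_UNIT_TO_SECONDS_MAP = {
  "h" => 60 * 60, # seconds in an hour
  "m" => 60,      # seconds in a minute
  "s" => 1        # seconds in a second
}

# A known unit resolves to its number of seconds.
p TIME_UNIT_TO_SECONDS_MAP.fetch("h", 0) # prints 3600

# An unmapped unit such as "d" falls back to the default, 0,
# which is why "1d" ends up as 0 seconds.
p TIME_UNIT_TO_SECONDS_MAP.fetch("d", 0) # prints 0

# Hash#[] returns nil for a missing key; multiplying with that nil
# would raise an error instead of contributing 0.
p TIME_UNIT_TO_SECONDS_MAP["d"] # prints nil

# Hash#fetch with no default raises KeyError for a missing key.
begin
  TIME_UNIT_TO_SECONDS_MAP.fetch("d")
rescue KeyError
  puts "KeyError for unit d"
end
```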
    String Splitting with Simple Regexp
    The convert_to_seconds_via_string_splitting method does some simple string splitting into two separate arrays and then walks those arrays together to calculate the number of seconds.
    def convert_to_seconds_via_string_splitting(input)
      # Split on one or more numerals
      #
      # Because of the structure, the first element should always be an
      # empty string. We may want to guard better. On the other hand,
      # if the user's giving us junk, whatever.
      units = input.split(/\d+/)[1..-1]
    
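      # A bare number such as "10" splits into [], and [][1..-1] is nil,
      # so the whole input is treated as seconds.
      return input.to_i unless units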
    
      # Split on a single alpha character
      times = input.split(/[a-z]/)
    
      seconds = 0
    
      units.each_with_index do |unit, i|
        seconds += TIME_UNIT_TO_SECONDS_MAP.fetch(unit, 0) * times[i].to_i
      end
    
      return seconds
    end
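To make the "two separate arrays" concrete, the sketch below prints the intermediate values the string-splitting method works with, including the empty-array case that triggers the input.to_i fallback. It then shows a hypothetical variant, convert_to_seconds_via_zip (not part of the original gist), that pairs the arrays with Array#zip instead of indexing by position. It assumes Ruby 2.4 or later for Array#sum with a block.

```ruby
TIME_UNIT_TO_SECONDS_MAP = { "h" => 60 * 60, "m" => 60, "s" => 1 }

# Intermediate values for "1h4m2s": units (with a leading "") and scalars.
p "1h4m2s".split(/\d+/)   # prints ["", "h", "m", "s"]
p "1h4m2s".split(/[a-z]/) # prints ["1", "4", "2"]

# A bare number splits into an empty array, and slicing an empty array
# from index 1 returns nil, which triggers the input.to_i fallback.
p "10".split(/\d+/) # prints []
p([][1..-1])        # prints nil

# Hypothetical variant: pair each unit with its scalar via Array#zip
# rather than indexing into the second array with each_with_index.
def convert_to_seconds_via_zip(input)
  units = input.split(/\d+/)[1..-1]
  return input.to_i unless units

  times = input.split(/[a-z]/)
  units.zip(times).sum do |unit, time|
    # time.to_i also turns a missing (nil) scalar into 0.
    TIME_UNIT_TO_SECONDS_MAP.fetch(unit, 0) * time.to_i
  end
end
```

The behavior matches convert_to_seconds_via_string_splitting for the five test inputs; zip simply removes the index bookkeeping.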
    Large Regular Expression for the Match
    This approach relied on a regular expression with 9 capture regions. I assigned that regular expression to a constant: THREE_TIMES_AND_UNITS_REGEXP = /\A((\d+)([a-z]))?((\d+)([a-z]))?((\d+)([a-z]))?\Z/
    The nine capture regions are as follows:
  • 1,4,7 :: The scalar and its unit (e.g., “10s”, “1h”)
  • 2,5,8 :: The scalar (e.g., “10”, “1”)
  • 3,6,9 :: The unit (e.g., “s”, “h”)
    The convert_to_seconds_via_verbose_regexp method handles those capture regions.
    THREE_TIMES_AND_UNITS_REGEXP =
      /\A((\d+)([a-z]))?((\d+)([a-z]))?((\d+)([a-z]))?\Z/
    
    def convert_to_seconds_via_verbose_regexp(input)
      match = THREE_TIMES_AND_UNITS_REGEXP.match(input)
      seconds = 0
      return input.to_i unless match
    
      seconds += match[2].to_i *
        TIME_UNIT_TO_SECONDS_MAP.fetch(match[3], 0) if match[1]
      seconds += match[5].to_i *
        TIME_UNIT_TO_SECONDS_MAP.fetch(match[6], 0) if match[4]
      seconds += match[8].to_i *
        TIME_UNIT_TO_SECONDS_MAP.fetch(match[9], 0) if match[7]
      seconds
    end
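The sketch below prints the nine capture regions for "1s2m" so the layout listed above is visible, with the unused third triple coming back as nil. Because the captures arrive in fixed groups of three, the repeated lines in convert_to_seconds_via_verbose_regexp could be folded into a loop. The method convert_to_seconds_via_sliced_captures is a hypothetical refactor built on Enumerable#each_slice, not something from the original post.

```ruby
TIME_UNIT_TO_SECONDS_MAP = { "h" => 60 * 60, "m" => 60, "s" => 1 }

THREE_TIMES_AND_UNITS_REGEXP =
  /\A((\d+)([a-z]))?((\d+)([a-z]))?((\d+)([a-z]))?\Z/

# The nine captures for "1s2m"; the unused third triple is all nil.
p THREE_TIMES_AND_UNITS_REGEXP.match("1s2m").captures
# prints ["1s", "1", "s", "2m", "2", "m", nil, nil, nil]

# Hypothetical refactor: walk the captures three at a time
# (whole pair, scalar, unit) instead of repeating one line per triple.
def convert_to_seconds_via_sliced_captures(input)
  match = THREE_TIMES_AND_UNITS_REGEXP.match(input)
  return input.to_i unless match

  match.captures.each_slice(3).sum do |pair, time, unit|
    # Skip triples whose optional group did not participate in the match.
    pair ? time.to_i * TIME_UNIT_TO_SECONDS_MAP.fetch(unit, 0) : 0
  end
end
```

Like the original, this still caps the input at three scalar/unit pairs, because that limit lives in the regular expression itself.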
    Regular Expression and the String scan method
    In this implementation, the regular expression is much simpler. I assigned that regular expression to a constant: TIME_AND_UNIT_REGEXP = /(\d+)([a-z])/
    There are two capture regions:
  • 1 :: The scalar (e.g., “10”, “1”)
  • 2 :: The unit (e.g., “s”, “h”)
    The convert_to_seconds_via_regexp_scanner method handles those capture regions.
    TIME_AND_UNIT_REGEXP = /(\d+)([a-z])/
    
    def convert_to_seconds_via_regexp_scanner(input)
      seconds = 0
      matched = false
    
      input.scan(TIME_AND_UNIT_REGEXP) do |time, unit|
        matched = true
        seconds += time.to_i *
          TIME_UNIT_TO_SECONDS_MAP.fetch(unit, 0)
      end
      return seconds if matched
      input.to_i
    end
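Called without a block, String#scan returns every match as an array of its captures. An empty result therefore signals "no units found" on its own, which suggests an alternative to the matched flag. The sketch below shows the raw scan output and a hypothetical variant, convert_to_seconds_via_scan_and_sum, that relies on that. It assumes Ruby 2.4 or later for Array#sum.

```ruby
TIME_UNIT_TO_SECONDS_MAP = { "h" => 60 * 60, "m" => 60, "s" => 1 }
TIME_AND_UNIT_REGEXP = /(\d+)([a-z])/

# Without a block, String#scan returns one array of captures per match.
p "1h4m2s".scan(TIME_AND_UNIT_REGEXP) # prints [["1", "h"], ["4", "m"], ["2", "s"]]
p "10".scan(TIME_AND_UNIT_REGEXP)     # prints []

# Hypothetical variant: an empty scan result replaces the `matched` flag.
def convert_to_seconds_via_scan_and_sum(input)
  pairs = input.scan(TIME_AND_UNIT_REGEXP)
  # No scalar/unit pairs means the input is a bare number of seconds.
  return input.to_i if pairs.empty?

  pairs.sum { |time, unit| time.to_i * TIME_UNIT_TO_SECONDS_MAP.fetch(unit, 0) }
end
```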
    The Tests that Guide Me
    Below are the "tests" that I wrote to quickly affirm that things were working.
    [
      ["1h4m2s", 3842],
      ["1m", 60],
      ["1s2m", 121],
      ["1d", 0],
      ["10", 10],
    ].each do |given, expected|
      puts "Given: #{given}\tExpected: #{expected}"
      [
        :convert_to_seconds_via_string_splitting,
        :convert_to_seconds_via_verbose_regexp,
        :convert_to_seconds_via_regexp_scanner,
      ].each do |method|
        returned_value = __send__(method, given)
        if returned_value == expected
          puts "\tSuccess for #{method}."
        else
          puts "\tFailure for #{method}.  Got: #{returned_value}"
        end
      end
    end
    Conclusion
    Each of the three methods gets the desired results, and each demonstrates a different way to approach the same problem.
    For further work, I could refine the regular expressions to key only on the units of time defined in TIME_UNIT_TO_SECONDS_MAP. And there are, I’m certain, many other ways to approach solving this particular problem.
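One possible take on that refinement is sketched below, under the assumption that building the pattern from the map's keys with Regexp.union is acceptable. KNOWN_UNIT_REGEXP and convert_to_seconds_via_known_units are hypothetical names. Note a side effect: "1d" no longer matches at all, so the old input.to_i fallback would return 1 instead of 0. This sketch therefore falls back to input.to_i only when the input is entirely digits.

```ruby
TIME_UNIT_TO_SECONDS_MAP = { "h" => 60 * 60, "m" => 60, "s" => 1 }

# Hypothetical: build the unit alternation ("h|m|s") from the map's keys,
# so only units the map knows about can ever match.
KNOWN_UNIT_REGEXP =
  /(\d+)(#{Regexp.union(TIME_UNIT_TO_SECONDS_MAP.keys).source})/

def convert_to_seconds_via_known_units(input)
  # A bare number such as "10" is already in seconds.
  return input.to_i if input.match?(/\A\d+\z/)

  # Unknown units (e.g. "1d") simply produce no matches, so they add 0.
  input.scan(KNOWN_UNIT_REGEXP).sum do |time, unit|
    # fetch without a default is safe: the regexp only matches known keys.
    time.to_i * TIME_UNIT_TO_SECONDS_MAP.fetch(unit)
  end
end
```

Adding a unit to TIME_UNIT_TO_SECONDS_MAP would then extend the accepted input without touching the regular expression.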
