Fixing Just One False Positive in Brakeman

A while ago, I came across a Brakeman false positive that I wanted to fix.

For a single false positive, it became a bit of an epic journey.

The code looked something like this:

class Task < ApplicationRecord
  enum status: {
    pending: 0,
    success: 1,
    failed: 3,
  }

  NOT_FAILURES = ['pending', 'success'].freeze
end

class TaskRunner
  def get_failures
    start_time = Date.beginning_of_quarter
    end_time = Date.end_of_quarter

    no_failure_enums = Task.statuses.values_at(*Task::NOT_FAILURES)

    query = <<~QUERY
      SELECT COUNT(*)
      FROM `tasks`
      WHERE `tasks`.`status` NOT IN (#{no_failure_enums.join(',')})
      AND `tasks`.`time_end` BETWEEN #{start_time} AND #{end_time}
    QUERY

    # Line below triggers a SQL injection warning
    Task.connection.select_all(query)
  end
end

(You can imagine in reality the query is a bit more complicated and justifies writing it this way.)

This code results in an SQL injection warning from Brakeman:

Confidence: High
Category: SQL Injection
Check: SQL
Message: Possible SQL injection
Code: Task.connection.select_all("SELECT COUNT(*)\nFROM `filings`\nWHERE `filings`.`status` NOT IN (#{Task.statuses.values_at(*["pending", "success"]).join(",")})\nAND `filings`.`time_end` BETWEEN #{Date.beginning_of_quarter} AND #{Date.end_of_quarter}\n")
File: app/models/task.rb
Line: 25

This is a little hard to read, so let's take a look at a better formatted version of the code Brakeman is complaining about:

Task.connection.select_all(
  "SELECT COUNT(*) \
  FROM `filings` \
  WHERE `filings`.`status` NOT IN (#{Task.statuses.values_at(*["pending", "success"]).join(",")}) \
  AND `filings`.`time_end` BETWEEN #{Date.beginning_of_quarter} \
  AND #{Date.end_of_quarter}"
)

Brakeman is warning about this SQL query because it is using string interpolation to unsafely add in values to the query. If an attacker could control those values, they could modify the SQL run by the database.

In particular, it's warning about

Task.statuses.values_at(*["pending", "success"]).join(",")

(the value in no_failure_enums.join(',')).

However, in this case, we know that Task.statuses is actually a constant - it's defined using enum in the Task class. This code is grabbing the integer values for the given enums and joining them back into a comma-separated string.

So how do we get Brakeman to understand that this value is actually safe?

Splatted Arrays

Let's dive in!

The call we care about:

Task.statuses.values_at(*["pending", "success"]).join(",")

First up is *["pending", "success"]. This code converts an array of strings to individual method arguments (i.e., values_at("pending", "success").

This is pretty easy to handle. In the case where a splatted array is the only argument to a method, we'll simply use the elements of the array as the argument list. (Check out the pull request here)

This gets us to:

Task.statuses.values_at("pending", "success").join(",")

Better!

Hash Values

In this case, values_at is Hash#values_at - it returns an array of values from the hash table for the given keys.

This is also not too difficult to implement. I went ahead and covered Hash#values at the same time. (Check out the pull request here)

Brakeman will now do something like this:

h = { a: 1, b: 2, c: 3 }
h.values_at(:a, :c)  #=> [1, 3]

Great! Back to our code sample, how does it look now?

Task.statuses.values_at("pending", "success").join(",")

Oh... it looks exactly the same because Brakeman has no idea what Task.statuses is.

Okay, no problem. We just need to implement support for ActiveRecord's enum!

Detour!

Here I took a little detour. Task.statuses is a method that returns a hash value. Instead of implementing a special case for that, I thought this would be a good time to support methods with single, simple return values.

For example:

class Dog
  def self.sound
    'bark'
  end
end

If Brakeman could know that Dog.sound returns 'bark', I could implement enums as method definitions that return simple arrays or hashes. (More on this later!)

To implement this functionality, I rewrote how Brakeman tracks methods (as real objects) and updated some method lookup code.

The details aren't particularly interesting, but the code changes are here.

Back to Enums

Calling enum essentially defines a bunch of methods.

For example this code:

class Task < ApplicationRecord
  enum status: {
    pending: 0,
    success: 1,
    failed: 3,
  }
end

Will define methods like

Task.statuses # the one we care about!
Task.status
Task#status
Task#pending?
Task#success?
# ..etc.

You can pass in an explicit hash mapping keys to values or just an array of keys and Rails will do the mapping.

To implement this in Brakeman, we simulate the creation of the status and statuses methods:

class Task < ApplicationRecord
  def self.statuses
    {
      pending: 0,
      success: 1,
      failed: 3,
    }
  end
end

Then we can rely on the previous changes to know Task.statuses returns a hard-coded hash table.

Another Detour

You might have noticed that the enum definition uses status but we need statuses.

This required a tiny tweak to Brakeman's extremely over-simplified pluralize.

Back to the False Positive

Where are we now?

With enum support and proper pluralization, this code:

Task.statuses.values_at(*["pending", "success"]).join(",")

Gets reduced like this:

{
  pending: 0,
  success: 1,
  failed: 3,
}.values_at(*["pending", "success"]).join(",")

to

{
  pending: 0,
  success: 1,
  failed: 3,
}.values_at("pending", "success").join(",")

to

[:BRAKEMAN_SAFE_LITERAL, :BRAKEMAN_SAFE_LITERAL].join(",")

to

"BRAKEMAN_SAFE_LITERAL,BRAKEMAN_SAFE_LITERAL"

Well... it's not perfect. Note that the array values are strings, but our enum uses symbol keys. But Brakeman knows the enum is all literal values which are safe, so we end up with this.

The query now looks like:

Task.connection.select_all(
  "SELECT COUNT(*) \
  FROM `filings` \
  WHERE `filings`.`status` NOT IN (#{"BRAKEMAN_SAFE_LITERAL,BRAKEMAN_SAFE_LITERAL"}) \
  AND `filings`.`time_end` BETWEEN #{Date.beginning_of_quarter} \
  AND #{Date.end_of_quarter}"
)

Again, not perfect but at least Brakeman isn't going to warn about interpolating a string literal into the query.

Not Done Yet

Ah, but wait. The warning is not gone!

Confidence: Medium
Category: SQL Injection
Check: SQL
Message: Possible SQL injection
Code: Task.connection.select_all("SELECT COUNT(*)\nFROM `filings`\nWHERE `filings`.`status` NOT IN (#{"BRAKEMAN_SAFE_LITERAL,BRAKEMAN_SAFE_LITERAL"})\nAND `filings`.`time_end` BETWEEN #{Date.beginning_of_quarter} AND #{Date.end_of_quarter}\n")
File: app/models/task.rb
Line: 25

What's wrong now?

Brakeman doesn't know what Date.beginning_of_quarter or Date.end_of_quarter are, so it generates a lower confidence warning about it. For SQL injection, Brakeman is pretty paranoid about any string interpolation, even if it's not sure the values are "dangerous".

But anything coming from Date is likely to be safe, so now Brakeman ignores Date calls in SQL.

Whew. Done?

Yep - now that code will no longer warn.

Except... in the months it took me to address this false positive, the code has changed in such a way that Brakeman now has a false negative problem (should be warning about some things, but isn't). But that's a different problem for a different day.

At least that one false positive is fixed!

34