Skip to main content

Ruby on Rails: Multiple Database Configuration and config.eager_load

I've worked on an app that started on Rails v1.23 way back in 2008. It's come a long way since then.

The app is on Rails 6.1 currently and in-progress to Rails 7.0.

When you upgrade any Ruby on Rails application, it's easy to miss things along the way. You may be aware of new features and breaking changes, but things can slip past unnoticed.

Until it catches you with your pants down.

The Triggering Change

One of the features that Rails 6.1 brought us was the ability to set up database connections for various roles.

Up to now, a database.yml file may look like this for the production environment.

production:
  adapter: mysql2
  encoding: utf8
  host: rw.dbhost.io
  username: db-write-user
  password: <%= ENV['DATABASE_PASSWORD'] %>

Rails 6.1 allows you to complicate your configurations, if needed, to work for sharding, as well as, send read-only queries to a read-only database server, which was my focus recently.

More info can be read on the Multiple Databases with ActiveRecord docs page.

The updated configuration would then be something like the following.

production:
  primary:
    adapter: mysql2
    encoding: utf8
    host: rw.dbhost.io
    username: db-write-user
    password: <%= ENV['DATABASE_PASSWORD'] %>
  primary_replica:
    adapter: mysql2
    encoding: utf8
    host: ro.dbhost.io
    username: db-read-user
    password: <%= ENV['DATABASE_READONLY_PASSWORD'] %>
    replica: true

ActiveRecord, by default, will look for the primary connection defined for an environment or the first set of connection details it can find.

Secondary role definitions only get called if you manually direct a query to that specific role or setup automatic role switching. Manual redirection is exactly our intent.

We can't quite switch yet to automatic role switching of read/write queries. When it's configured, Rails directs queries based on the HTTP request type. So, GET requests redirect all queries to a read-only database connection role. POST, PUT, PATCH, and DELETE requests go to the read/write role.

Our legacy app has a number of cases where we update data within a GET request. Whether that's bad or not, it's what it is.

But we can make use of manually sending queries to our read-only database servers where we would like.

That's a long lead up to this database configuration change uncovering the issue which is the topic of this post.

The First Problem

The above change to our database.yml file led to a weird problem. In some cases I've yet to identify, the first time the database connection is used, it fails by saying it's not connected.

This is regardless of the fact that ActiveRecord::Base.connection.active? returns true. You'd think that means the connection is there.

That's a lie. At least for Rails 6.1.

I found in this case, that while the not connected exception triggers, it results in establishing the connection so that subsequent queries work as expected.

Further, this is only an issue if your environment configuration has this setting.

config.eager_load = false

That setting tells Rails, if true, to completely load your entire application when it boots up.

Production environments are typically set to true. Development and test environments generally are set to false to allow for code reloading during development and test-driven development. For some reason, the issue doesn't seem to occur when running in development mode, but does in test.

I initially found the issue because I observed that I was getting failing specs (we use Rspec) during test runs and found it curious that it was never the same failing spec. Our tests run in a random order. But it was always the very first spec that failed.

Changing our test configuration to config.eager_load = true ended up resolving the issue, but I have to think that there's something else going on. I don't like making that change, so I'm still on the path to identify if something else is going on that better explains the behavior.

The Second Problem

In order to unblock us from using manually redirected queries, I merged the eager_load config change. I felt confident since our production environment already used true for the config.eager_load setting.

It failed to deploy.

One of the steps of our deployment is to restart a messaging server, which is done via a rake task. It was failing with the connection is not established exception mentioned above.

Turns out, at some point, a new config.rake_eager_load configuration setting was added to Rails, which we missed during whichever version introduced the setting.

Rake tasks no longer abide by the config.eager_load value in an environment's configuration and defaults to false. So you must set the following to include eager loading of rake tasks in your environment.

config.rake_eager_load = true

Once, I set config.rake_eager_load to true, I was finally able to deploy.

Next Steps

I want to dig into this a bit deeper and find out why exactly the connections for a multiple database config file cause the issue I've been experiencing.

If you've run into this issue and know more about it, please reach out via one of my methods of contact!

Update 1 Sep 2024

Haven't found a solution to this other than this to catch recurring exceptions. In our gargantuan monolith, there are only a handful of queries we've needed this for.

begin
  attempts ||= 1
  
  # whatever the query is
rescue ActiveRecord::ConnectionNotEstablished => e
  raise e if (attempts += 1) >= 3

  ActiveRecord::Base.establish_connection
  retry
end

Again, this is within local development where we don't eager_load anything. In all cases I've experienced, the first reconnect attempt works and the query retries and moves on normally. The attempts just adds some safety from potentially getting into an infinite loop of retries.

Within our monolith, we've only needed this for a handful of queries. The first that appeared were within middleware, but other devs have found a couple more in other areas of the app. I've found no commonality between the queries, however, that explains why they always exhibit the problem, while other queries are perfectly fine.