YugabyteDB JDBC Smart Driver for proxyless HA/LB

In the previous post of this series, I've created a rate-limiting function to give request tokens at a specific rate. The code was the same for PostgreSQL and YugabyteDB but the race condition behavior is different. To be scalable, on a distributed database, the application must be able to connect to any database node and handle optimistic locking transaction conflicts. This is additional code, but still a lot easier than trying to implement all synchronization in the application.

I'll put this application logic in a Java program, in the next post. And this is the occasion to introduce the JDBC driver enhancements for YugabyteDB. You can use the PostgreSQL driver to connect to any YugabyteDB node, but it is not cluster-aware. Then, to connect to a distributed database, this requires an HA proxy to detect node failures, and load-balance the connections across a list of nodes. Even this list of nodes can be a problem because, with YugabyteDB, you will add or remove nodes to adapt to the load, or allow rolling upgrades. To have the application follow the server elasticity, the easiest is using the enhanced JDBC driver, a fork from the PostgreSQL one, with cluster-aware additions.

Cluster Aware Load Balancer

You will probably follow the instructions from the documentation and get the driver through Maven dependencies. Here I'll simply get the .jar and add it to my CLASSPATH:

wget -qc -O jdbc-yugabytedb.jar https://github.com/yugabyte/pgjdbc/releases/download/v1.0.0/jdbc-yugabytedb-42.3.0.jar
export CLASSPATH=.:./jdbc-yugabytedb.jar

In my Centos7 lab I install the Java SDK:

sudo yum install -y java-1.8.0-devel wget

And create this simple program:

cat > YBDemo.java <<'JAVA'

import java.sql.*;
import com.yugabyte.ysql.YBClusterAwareDataSource ;

public class YBDemo extends Thread {

 private Connection connection; // SQL connection to the database
 private PreparedStatement sql; // SQL statement to call to function

 public YBDemo(YBClusterAwareDataSource ds) throws SQLException {
  this.connection=ds.getConnection();
  // create a sequence to see it used from multiple nodes
  this.connection.createStatement().executeUpdate("create sequence if not exists ybdemo");
  // prepare a statement to show current connection information
  this.sql = connection.prepareStatement(
     "select row_to_json(info)::text from (select *,nextval('ybdemo'), now(),pg_backend_pid() from "
    +"(select host(inet_server_addr()) host) as server"
    +" natural left join (select host,node_type,cloud,region,zone from yb_servers()) servers"
    +") info");
 }

 public void run() {
   ResultSet rs;
   for(;;) {
    try {
     //each thread runs a query every 1.5 second
     Thread.sleep(1500);
     rs = sql.executeQuery();
     rs.next();
     System.out.println(
      currentThread().getName()+"\t"+rs.getString(1)
      );
     } catch(Exception e) {
      System.out.printf("!!! Thread failed with " + e );
      return;
     }
   }
  }

  public static void main(String[] args) throws SQLException {
     // argument 1 is the number of threads (= connections)
     // argument 2 is the JDBC connection string
     YBClusterAwareDataSource ds = new YBClusterAwareDataSource();
     ds.setUrl( args[1] );
     YBDemo thread;
     for (int i=0;i<Integer.valueOf( args[0] );i++){
      thread=new YBDemo(ds);
      thread.start();
      }
     }
 }

JAVA

javac YBDemo.java

I can run it providing any endpoint of my YugabyteDB cluster (or even a comma-separated list to try in order). Here, for this demo, I'm running this on the database node itself, and I'm just using $(hostname -I | sed -e 's/ /:5433,/g' -e 's/,$//') to get the local IP address list (I did that for easy copy/paste but you should use the IP address directly):

java YBDemo 5 "jdbc:yugabytedb://$(hostname -I | sed -e 's/ /:5433,/g' -e 's/,$//')/yugabyte?user=yugabyte&password=yugabyte&ApplicationName=YBDemo&load-balance=true"

My YBDemo displays the host I'm connected to (host(inet_server_addr())) and the pid (pg_backend_pid()). I can see that connections were balanced among the 3 hosts of my cluster:

Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8074,"now":"2022-01-03T15:39:29.329057+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8174,"now":"2022-01-03T15:39:29.373094+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8274,"now":"2022-01-03T15:39:29.435793+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8374,"now":"2022-01-03T15:39:29.505281+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8474,"now":"2022-01-03T15:39:29.61471+00:00","pg_backend_pid":6946}
Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8075,"now":"2022-01-03T15:39:30.832862+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8175,"now":"2022-01-03T15:39:30.87692+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8275,"now":"2022-01-03T15:39:30.939687+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8375,"now":"2022-01-03T15:39:31.009035+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8475,"now":"2022-01-03T15:39:31.118523+00:00","pg_backend_pid":6946}
Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8076,"now":"2022-01-03T15:39:32.336716+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8176,"now":"2022-01-03T15:39:32.380678+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8276,"now":"2022-01-03T15:39:32.443618+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8376,"now":"2022-01-03T15:39:32.512798+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8476,"now":"2022-01-03T15:39:32.622349+00:00","pg_backend_pid":6946}
Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8077,"now":"2022-01-03T15:39:33.840503+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8177,"now":"2022-01-03T15:39:33.884577+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8277,"now":"2022-01-03T15:39:33.947474+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8377,"now":"2022-01-03T15:39:34.016619+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8477,"now":"2022-01-03T15:39:34.126215+00:00","pg_backend_pid":6946}
Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8078,"now":"2022-01-03T15:39:35.344315+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8178,"now":"2022-01-03T15:39:35.388354+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8278,"now":"2022-01-03T15:39:35.451321+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8378,"now":"2022-01-03T15:39:35.520383+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8478,"now":"2022-01-03T15:39:35.630117+00:00","pg_backend_pid":6946}
Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8079,"now":"2022-01-03T15:39:36.848136+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8179,"now":"2022-01-03T15:39:36.892074+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8279,"now":"2022-01-03T15:39:36.955163+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8379,"now":"2022-01-03T15:39:37.024168+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8479,"now":"2022-01-03T15:39:37.133938+00:00","pg_backend_pid":6946}
Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8080,"now":"2022-01-03T15:39:38.351955+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8180,"now":"2022-01-03T15:39:38.395907+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8280,"now":"2022-01-03T15:39:38.459082+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8380,"now":"2022-01-03T15:39:38.527975+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8480,"now":"2022-01-03T15:39:38.637725+00:00","pg_backend_pid":6946}
Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8081,"now":"2022-01-03T15:39:39.855746+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8181,"now":"2022-01-03T15:39:39.899669+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8281,"now":"2022-01-03T15:39:39.962947+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8381,"now":"2022-01-03T15:39:40.031743+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8481,"now":"2022-01-03T15:39:40.141553+00:00","pg_backend_pid":6946}
Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8082,"now":"2022-01-03T15:39:41.35953+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8182,"now":"2022-01-03T15:39:41.403465+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8282,"now":"2022-01-03T15:39:41.466752+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8382,"now":"2022-01-03T15:39:41.535468+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8482,"now":"2022-01-03T15:39:41.645329+00:00","pg_backend_pid":6946}
Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8083,"now":"2022-01-03T15:39:42.863305+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8183,"now":"2022-01-03T15:39:42.907192+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8283,"now":"2022-01-03T15:39:42.970651+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8383,"now":"2022-01-03T15:39:43.039266+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8483,"now":"2022-01-03T15:39:43.149202+00:00","pg_backend_pid":6946}
Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8084,"now":"2022-01-03T15:39:44.367065+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8184,"now":"2022-01-03T15:39:44.410957+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8284,"now":"2022-01-03T15:39:44.474523+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8384,"now":"2022-01-03T15:39:44.543052+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8484,"now":"2022-01-03T15:39:44.653111+00:00","pg_backend_pid":6946}
Thread-1        {"host":"172.159.18.130","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1a","nextval":8085,"now":"2022-01-03T15:39:45.870838+00:00","pg_backend_pid":2170}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8185,"now":"2022-01-03T15:39:45.91472+00:00","pg_backend_pid":6930}
Thread-3        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8285,"now":"2022-01-03T15:39:45.978353+00:00","pg_backend_pid":32062}
Thread-4        {"host":"172.159.62.137","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1c","nextval":8385,"now":"2022-01-03T15:39:46.046868+00:00","pg_backend_pid":32074}
Thread-5        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":8485,"now":"2022-01-03T15:39:46.156907+00:00","pg_backend_pid":6946}

This list of hosts has been dynamically updated once the driver was connected to the provided endpoint. Note that I've used hostname -I to get build list of IP addresses so that it tries one after the other if some are not available. You can use jdbc:yugabytedb://$(hostname)/yugabyte?user=yugabyte&password=yugabyte for simplicity which gets then round-robin.

If you connect through a public address, and the yb-tserver was not started with --server_broadcast_addresses you will get WARNING: TopologyAwareLoadBalancer: Either private or public should have matched with one of the servers. This is because the server has no way to know its own public address and then cannot build the list.

Topology Aware Load Balancer

From my YBDemo output, you can see that each node comes with its placement information, specified as cloud/region/zone when starting the nodes. The cloud provider (aws here - YugabyteDB database can be multi-cloud), a region (eu-west-1 as I've created a regional cluster in Ireland). And Availability zones (I have one node in each eu-west-1a, eu-west-1b, eu-west-1c here).

When running multiple instances of the application, you probably want to connect to the database nodes in the same AZ. One reason is latency. The other is availability: if a zone fails, both will be gone and the other remain without the need to reconnect.

The JDBC smart driver can limit the connections by filtering the list of available nodes. For example I can connect to two zones only with topology-keys=aws.eu-west-1.eu-west-1b,aws.eu-west-1.eu-west-1c:

java YBDemo 5 "jdbc:yugabytedb://$(hostname -I | sed -e 's/ /:5433,/g' -e 's/,$//')/yugabyte?user=yugabyte&password=yugabyte&ApplicationName=YBDemo&load-balance=true&topology-keys=aws.eu-west-1.eu-west-1a,aws.eu-west-1.eu-west-1b"

Or using the Amazon EC2 metadata to connect to the same AZ as my application:

java YBDemo 5 "jdbc:yugabytedb://$(hostname -I | sed -e 's/ /:5433,/g' -e 's/,$//')/yugabyte?user=yugabyte&password=yugabyte&ApplicationName=YBDemo&load-balance=true&topology-keys=aws.$(curl -s http://169.254.169.254/latest/meta-data/placement/region).$(curl -s http://169.254.169.254/latest/meta-data/placement/availability-zone)"

I have only one node per AZ. I change this by scaling our from 3 to 6 nodes:
2 nodes per AZ

Now, starting the same command line as above still distributes the connections to the two nodes of eu-west-1b:

Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43211,"now":"2022-01-04T12:57:31.091212+00:00","pg_backend_pid":27081}
Thread-3        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43311,"now":"2022-01-04T12:57:31.164398+00:00","pg_backend_pid":1545}
Thread-4        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43411,"now":"2022-01-04T12:57:31.231342+00:00","pg_backend_pid":27096}
Thread-5        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43511,"now":"2022-01-04T12:57:31.298354+00:00","pg_backend_pid":1557}
Thread-1        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43112,"now":"2022-01-04T12:57:32.530259+00:00","pg_backend_pid":1532}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43212,"now":"2022-01-04T12:57:32.594813+00:00","pg_backend_pid":27081}
Thread-3        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43312,"now":"2022-01-04T12:57:32.668005+00:00","pg_backend_pid":1545}
Thread-4        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43412,"now":"2022-01-04T12:57:32.73484+00:00","pg_backend_pid":27096}
Thread-5        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43512,"now":"2022-01-04T12:57:32.801996+00:00","pg_backend_pid":1557}
Thread-1        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43113,"now":"2022-01-04T12:57:34.033956+00:00","pg_backend_pid":1532}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43213,"now":"2022-01-04T12:57:34.098408+00:00","pg_backend_pid":27081}
Thread-3        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43313,"now":"2022-01-04T12:57:34.171554+00:00","pg_backend_pid":1545}
Thread-4        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43413,"now":"2022-01-04T12:57:34.238357+00:00","pg_backend_pid":27096}
Thread-5        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43513,"now":"2022-01-04T12:57:34.305574+00:00","pg_backend_pid":1557}

If one of the two nodes in this AZ fail, my program jsut stops the thread when connection is lost:

Thread-5        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43513,"now":"2022-01-04T12:57:34.305574+00:00","pg_backend_pid":1557}
Thread-1        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43114,"now":"2022-01-04T12:57:35.537595+00:00","pg_backend_pid":1532}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43214,"now":"2022-01-04T12:57:35.601979+00:00","pg_backend_pid":27081}
Thread-3        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43314,"now":"2022-01-04T12:57:35.675178+00:00","pg_backend_pid":1545}
Thread-4        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43414,"now":"2022-01-04T12:57:35.741823+00:00","pg_backend_pid":27096}
Thread-5        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43514,"now":"2022-01-04T12:57:35.809218+00:00","pg_backend_pid":1557}
Thread-1        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43115,"now":"2022-01-04T12:57:37.041351+00:00","pg_backend_pid":1532}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43215,"now":"2022-01-04T12:57:37.105528+00:00","pg_backend_pid":27081}
Thread-3        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43315,"now":"2022-01-04T12:57:37.178864+00:00","pg_backend_pid":1545}
Thread-4        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43415,"now":"2022-01-04T12:57:37.245432+00:00","pg_backend_pid":27096}
Thread-5        {"host":"172.159.35.88","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43515,"now":"2022-01-04T12:57:37.312849+00:00","pg_backend_pid":1557}
!!! Thread failed with com.yugabyte.util.PSQLException: FATAL: terminating connection due to administrator commandThread-2      {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43216,"now":"2022-01-04T12:57:38.609038+00:00","pg_backend_pid":27081}
!!! Thread failed with com.yugabyte.util.PSQLException: FATAL: terminating connection due to administrator commandThread-4      {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43416,"now":"2022-01-04T12:57:38.748899+00:00","pg_backend_pid":27096}
!!! Thread failed with com.yugabyte.util.PSQLException: FATAL: terminating connection due to administrator commandThread-2      {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43217,"now":"2022-01-04T12:57:40.112609+00:00","pg_backend_pid":27081}
Thread-4        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43417,"now":"2022-01-04T12:57:40.252475+00:00","pg_backend_pid":27096}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43218,"now":"2022-01-04T12:57:41.616226+00:00","pg_backend_pid":27081}
Thread-4        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43418,"now":"2022-01-04T12:57:41.755983+00:00","pg_backend_pid":27096}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43219,"now":"2022-01-04T12:57:43.119764+00:00","pg_backend_pid":27081}
Thread-4        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43419,"now":"2022-01-04T12:57:43.259628+00:00","pg_backend_pid":27096}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43220,"now":"2022-01-04T12:57:44.623276+00:00","pg_backend_pid":27081}
Thread-4        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43420,"now":"2022-01-04T12:57:44.763154+00:00","pg_backend_pid":27096}
Thread-2        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43221,"now":"2022-01-04T12:57:46.12689+00:00","pg_backend_pid":27081}
Thread-4        {"host":"172.159.45.55","node_type":"primary","cloud":"aws","region":"eu-west-1","zone":"eu-west-1b","nextval":43421,"now":"2022-01-04T12:57:46.266763+00:00","pg_backend_pid":27096}

Ideally, the program should re-connect to load balance on the remaining nodes, or adjust because the failure was transient. I could add this in my demo program but this is a job for the connection pool. We will see that in another blog post.

Troubleshooting

The goal of the Smart Driver is to make the configuration easy. But if troubleshooting is needed you can add &loggerLevel=TRACE to the url. This can show:

Jan 04, 2022 1:21:37 PM com.yugabyte.ysql.TopologyAwareLoadBalancer$CloudPlacement equals
FINE: equals called for this: Placement: aws.eu-west-1.eu-west-1b and other = Placement: aws.eu-west-1.eu-west-1b
Jan 04, 2022 1:21:37 PM com.yugabyte.ysql.TopologyAwareLoadBalancer$CloudPlacement equals
FINE: equals returning: true
Jan 04, 2022 1:21:37 PM com.yugabyte.ysql.TopologyAwareLoadBalancer getCurrentServers
FINE: TopologyAwareLoadBalancer: allowedPlacements set: [Placement: aws.eu-west-1.eu-west-1b] returned contains true for cp: Placement: aws.eu-west-1.eu-west-1b
Jan 04, 2022 1:21:37 PM com.yugabyte.jdbc.PgResultSet getString
FINEST:   getString columnIndex: 1
Jan 04, 2022 1:21:37 PM com.yugabyte.jdbc.PgResultSet getString
FINEST:   getString columnIndex: 8
Jan 04, 2022 1:21:37 PM com.yugabyte.jdbc.PgResultSet getString
FINEST:   getString columnIndex: 5
Jan 04, 2022 1:21:37 PM com.yugabyte.jdbc.PgResultSet getString
FINEST:   getString columnIndex: 6
Jan 04, 2022 1:21:37 PM com.yugabyte.jdbc.PgResultSet getString
FINEST:   getString columnIndex: 7
Jan 04, 2022 1:21:37 PM com.yugabyte.jdbc.PgResultSet getString
FINEST:   getString columnIndex: 2
Jan 04, 2022 1:21:37 PM com.yugabyte.ysql.TopologyAwareLoadBalancer getCurrentServers
FINE: TopologyAwareLoadBalancer: allowedPlacements set: [Placement: aws.eu-west-1.eu-west-1b] returned contains false for cp: Placement: aws.eu-west-1.eu-west-1a
Jan 04, 2022 1:21:37 PM com.yugabyte.ysql.ClusterAwareLoadBalancer getPrivateOrPublicServers
WARNING: TopologyAwareLoadBalancer: Either private or public should have matched with one of the servers
Jan 04, 2022 1:21:37 PM com.yugabyte.Driver getConnectionBalanced
WARNING: yb_servers() refresh failed in first attempt itself. Falling back to default behaviour
Jan 04, 2022 1:21:37 PM com.yugabyte.Driver makeConnection
WARNING: Failed to apply load balance. Trying normal connection
Jan 04, 2022 1:21:37 PM com.yugabyte.jdbc.PgConnection <init>
FINE: PostgreSQL JDBC Driver 42.3.0

This was an example where load balancing failed and the connection continues only on the provided endpoint. If you want to know how the SmartDriver discovers the nodes, here is the view used by it:

yugabyte=# select * from yb_servers();

      host      | port | num_connections | node_type | cloud |  region   |    zone    | public_ip
----------------+------+-----------------+-----------+-------+-----------+------------+-----------
 172.159.51.69  | 5433 |               0 | primary   | aws   | eu-west-1 | eu-west-1c |
 172.159.35.88  | 5433 |               0 | primary   | aws   | eu-west-1 | eu-west-1b |
 172.159.62.137 | 5433 |               0 | primary   | aws   | eu-west-1 | eu-west-1c |
 172.159.18.130 | 5433 |               0 | primary   | aws   | eu-west-1 | eu-west-1a |
 172.159.45.55  | 5433 |               0 | primary   | aws   | eu-west-1 | eu-west-1b |
 172.159.18.58  | 5433 |               0 | primary   | aws   | eu-west-1 | eu-west-1a |
(6 rows)

yugabyte=#

This is dynamically updated when scaling in and out.

The JDBC Smart Driver is currently in beta. Many features will come to adapt the connection balancing to the load, or to follow the placement of tablets, and retry the transactions conflicts that can be restarted transparently.

24