@DanLebrero.

software, simply

Prometheus in a Clojure stack: Duct, Jetty, Compojure/Reitit and Hugsql

How to integrate a Clojure HTTP service with Prometheus

We are going to explain how to integrate Prometheus, one of the most popular open source monitoring systems, into a Clojure stack.

We will use my current Clojure stack of choice:

  1. Duct for dependency injection.
  2. Jetty as the HTTP server.
  3. Reitit for HTTP server routing. We will also see Compojure due to its popularity.
  4. HikariCP for DB connection pooling.
  5. HugSQL for SQL access.

We will use the iapetos Prometheus client library.

All the code is available at https://github.com/dlebrero/clojure-prometheus-example.

The basics

For starters, we will collect the monitoring information for the JVM plus some basic Ring stats.

The dependencies:

[clj-commons/iapetos "0.1.11"]
[io.prometheus/simpleclient_hotspot "0.9.0"]
[duct/core "0.8.0"]
[ring/ring-jetty-adapter "1.8.2"]
[org.eclipse.jetty/jetty-server "9.4.31.v20200723"] ;; See Gotcha section at bottom 
[org.eclipse.jetty/jetty-servlet "9.4.31.v20200723"]

The Duct configuration required is:

:prometheus-example.handler.monitoring-basic/collector {} ;; (1)
:prometheus-example.handler.monitoring-basic/middleware ;; (2) 
{:collector #ig/ref :prometheus-example.handler.monitoring-basic/collector}

:duct.handler/root
{:middleware ^:distinct
           [#ig/ref :prometheus-example.handler.monitoring-basic/middleware]}

The Prometheus Client requires you to register all the metrics that you are going to collect beforehand, which is what we are doing at (1) in :prometheus-example.handler.monitoring-basic/collector:

(defmethod ig/init-key :prometheus-example.handler.monitoring-basic/collector [_ config] ;; This is (1)
  (->
    (iapetos.core/collector-registry)
    (iapetos.collector.jvm/initialize)
    (iapetos.collector.ring/initialize)))

At (2), we add the Ring middleware that:

  1. Collects the actual timing and errors for the HTTP requests.
  2. Exposes a “/metrics” endpoint for Prometheus to collect the data.

(defmethod ig/init-key :prometheus-example.handler.monitoring-basic/middleware [_ {:keys [collector]}] ;; This is (2)
  #(-> %
     (iapetos.collector.ring/wrap-metrics collector {:path "/metrics"})))

After this, we have in our /metrics endpoint the following stats:

  • JVM threads and their state.
  • JVM GC and memory stats.
  • CPU usage.
  • HTTP histograms per path and return code.

Jetty HTTP server

As we are going to be using Jetty, we need to monitor its health. The monitoring provided by the Prometheus-Jetty integration is:

  1. Basic HTTP metrics from the point of view of Jetty itself. The Ring metrics are captured several layers down the stack, so the Jetty metrics give us a more complete view when something goes wrong between the Jetty code and the Ring metrics code.
  2. Health metrics about the Jetty thread pool that is serving the HTTP requests.

The dependencies to add are:

[io.prometheus/simpleclient_jetty "0.9.0"]
[io.prometheus/simpleclient_jetty_jdk8 "0.9.0"]

And the Duct configuration:

:prometheus-example.handler.monitoring-basic/jetty-configurator ;; (3)
{:collector #ig/ref :prometheus-example.handler.monitoring-basic/collector}

:duct.server.http/jetty ;; (4)
{:configurator #ig/ref :prometheus-example.handler.monitoring-basic/jetty-configurator}

At (3), the :prometheus-example.handler.monitoring-basic/jetty-configurator will add the required Jetty configuration plus register the metrics.

Then at (4) we configure the default Jetty server provided by the Duct web module to use this configurator.
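
The configurator itself is not listed here. As a rough sketch of what it could look like, assuming the collector classes shipped in simpleclient_jetty and simpleclient_jetty_jdk8 and a server that uses Jetty's default QueuedThreadPool (the namespace and the "jetty" pool name are made up for illustration, and the example project may do this differently):

```clojure
(ns example.jetty-metrics ;; hypothetical namespace
  (:require [iapetos.registry :as registry]
            [integrant.core :as ig])
  (:import (io.prometheus.client.jetty JettyStatisticsCollector
                                       QueuedThreadPoolStatisticsCollector)
           (org.eclipse.jetty.server Server)
           (org.eclipse.jetty.server.handler StatisticsHandler)
           (org.eclipse.jetty.util.thread QueuedThreadPool)))

(defmethod ig/init-key :prometheus-example.handler.monitoring-basic/jetty-configurator
  [_ {:keys [collector]}]
  ;; ring-jetty-adapter calls the configurator with the Server before starting it.
  (fn [^Server server]
    (let [raw-registry  (registry/raw collector)
          ;; Wrap the handler that is already installed so Jetty records statistics for it.
          stats-handler (doto (StatisticsHandler.)
                          (.setHandler (.getHandler server)))
          ;; Assumption: the server runs on a QueuedThreadPool (Jetty's default).
          ^QueuedThreadPool pool (.getThreadPool server)]
      (.setHandler server stats-handler)
      ;; Register both collectors in the underlying Prometheus registry.
      (.register (JettyStatisticsCollector. stats-handler) raw-registry)
      (.register (QueuedThreadPoolStatisticsCollector. pool "jetty") raw-registry)
      server)))
```

The metrics are registered in the raw Prometheus registry rather than through iapetos, the same approach used for HikariCP later on.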

REST routes

The basic Ring configuration that we have seen so far has a problem: if you use path parameters in your routes, you end up with one metric for each distinct value of that path parameter.

For example, if you have the following Compojure route:

(GET "/some-path/:path-param/before" [path-param]
  {:body {:example "data" :param path-param}})

and you make two HTTP calls, to /user/some-path/dan/before and /user/some-path/lebrero/before, then your metrics will contain:

http_requests_total{method="GET",status="200",statusClass="2XX",path="/some-path/dan/before",} 1.0
http_requests_total{method="GET",status="200",statusClass="2XX",path="/some-path/lebrero/before",} 1.0

What you really want is just one metric like:

http_requests_total{method="GET",status="200",statusClass="2XX",path="/some-path/:path-param/before",} 1.0
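
To make the difference concrete, here is a small dependency-free sketch that counts how many distinct time series each labelling choice produces. The request maps and the series-count helper are invented for illustration and are not part of iapetos or Ring:

```clojure
;; Hypothetical request log: the raw URI of each request plus the route template it matched.
(def requests
  [{:uri "/some-path/dan/before"     :template "/some-path/:path-param/before"}
   {:uri "/some-path/lebrero/before" :template "/some-path/:path-param/before"}])

(defn series-count
  "Number of distinct path label values, that is, time series, when
   `label-key` is used to build the path label."
  [label-key reqs]
  (count (distinct (map label-key reqs))))

;; Labelling by raw URI: one series per distinct parameter value.
(series-count :uri requests)

;; Labelling by route template: one series for the whole route.
(series-count :template requests)
```

With real traffic the first number grows with every new parameter value, while the second stays equal to the number of routes.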

Compojure

To know which route was matched we need to use Compojure’s wrap-routes, which lets us inject middleware that runs just after the matching route has been found. The matched route will be available under the request's :compojure/route key.

Now it is a matter of bubbling up that information to the metrics middleware:

(defmethod ig/init-key ::middleware [_ {:keys [collector]}]
  #(-> %
     (ring/wrap-metrics collector {:path "/metrics"
                                   :label-fn (fn [req resp]
                                               (if resp
                                                 {:path (str (::context resp) (second (::route resp)))}
                                                 {:path (:uri req)}))})
     (compojure/wrap-routes (fn [handler]
                              (fn [req]
                                (assoc
                                  (handler req)
                                  ::route (:compojure/route req)
                                  ::context (:context req)))))))

  1. In the wrap-routes we capture the route and context.
  2. In the wrap-metrics we use that information to build the proper metrics path.
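
The two steps can be tried in isolation, without Compojure or iapetos, by handing the same functions a hand-built request. In this sketch wrap-capture-route and path-label are illustrative names for the two anonymous functions above, and the request map only imitates what Compojure adds after matching a route inside a "/user" context:

```clojure
(defn wrap-capture-route
  "Copies the matched route and context from the request onto the response."
  [handler]
  (fn [req]
    (assoc (handler req)
      ::route (:compojure/route req)
      ::context (:context req))))

(defn path-label
  "Builds the path label: the route template when there is a response,
   the raw URI otherwise."
  [req resp]
  (if resp
    {:path (str (::context resp) (second (::route resp)))}
    {:path (:uri req)}))

;; Hand-built request, imitating a route matched inside a "/user" context.
(def matched-request
  {:uri             "/user/some-path/dan/before"
   :context         "/user"
   :compojure/route [:get "/some-path/:path-param/before"]})

(def handler
  (wrap-capture-route (fn [_] {:status 200 :body "ok"})))

;; With a response, the label is the context plus the route template.
(path-label matched-request (handler matched-request))

;; Without a response, the label falls back to the raw URI.
(path-label matched-request nil)
```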

There are two important quirks with this code:

  1. Exceptions do not get the proper path. To fix, you would need to:
    • Catch any exception in wrap-routes and throw some slingshot or custom exception.
    • Write your own ring/wrap-metrics to handle those exceptions.
    • Note that if the error happens before the route matching, you cannot really do anything about it.
  2. If you have nested Compojure contexts like this example, then you end up with basically the same path parameters issue as in the basic configuration, but for the context. To fix it, you need to:
    • Write a Compojure context-like macro that preserves all the contexts matched, like this one.
    • Adjust the metrics collector to something like this.
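
For the first quirk, one possible shape of the fix is sketched below. It uses plain ex-info instead of slingshot, the function names are invented, and the custom wrap-metrics that would call exception-path is left out:

```clojure
(defn wrap-tag-exceptions
  "Middleware meant to run inside wrap-routes: rethrows any exception as an
   ex-info that carries the matched route and context."
  [handler]
  (fn [req]
    (try
      (handler req)
      (catch Exception e
        (throw (ex-info "Request failed after route matching"
                        {::route   (:compojure/route req)
                         ::context (:context req)}
                        e))))))

(defn exception-path
  "Path label for a failed request: the route template if the exception was
   tagged by wrap-tag-exceptions, the raw URI otherwise."
  [req e]
  (let [data (ex-data e)]
    (if-let [route (::route data)]
      (str (::context data) (second route))
      (:uri req))))

;; Hand-built request, imitating a route matched inside a "/user" context.
(def request
  {:uri             "/user/some-path/dan/before"
   :context         "/user"
   :compojure/route [:get "/some-path/:path-param/before"]})

(def failing-handler
  (wrap-tag-exceptions (fn [_] (throw (IllegalStateException. "boom")))))

;; What a custom metrics middleware could do when the handler throws.
(def failed-path
  (try
    (failing-handler request)
    (catch Exception e
      (exception-path request e))))
```

An exception thrown before route matching carries no route data, so exception-path falls back to the raw URI, which is the limitation noted in the list above.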

You also have the option to do the metrics instrumentation inside the wrap-routes code as in here, but then you are not measuring any middleware or code in a Compojure context, and you will still need to deal with both quirks above.

Reitit

All the quirks with Compojure show us that, at least for this particular problem, a data-driven library ends up allowing a simpler and more elegant solution.

Apart from adding the Reitit dependency, you will need the following Duct configuration:

  :prometheus-example.handler.example-reitit/router ;; (5)
  {:collector #ig/ref :prometheus-example.handler.monitoring-basic/collector
   :db #ig/ref :duct.database/sql}

  :duct.router/cascading
  [#ig/ref [:prometheus-example.handler.example-reitit/router]]

So we are integrating Reitit as a simple middleware, which has a very simple implementation:

(defmethod ig/init-key :prometheus-example.handler.example-reitit/router [_ {:keys [collector db]}] ;; this is (5)
  (let [router (reitit.ring/router (routes db)
                 {:data {:middleware [(fn [handler] ;; (6)
                                        (iapetos.collector.ring/wrap-instrumentation handler collector 
                                          {:path-fn (fn [req] (:template (reitit.ring/get-match req)))}))]}})] ;; (7)
    (reitit.ring/ring-handler router nil
      {:middleware [(fn [handler]
                      (iapetos.collector.ring/wrap-metrics-expose handler collector {:path "/metrics"}))]})))

At (6) we inject the wrap-instrumentation middleware into our Reitit routes, and at (7) it simply extracts the matched path template from the request.

Neither of the issues mentioned for Compojure exists here, and the code ends up being simpler.

Database

For the database we need to instrument two things: the database connection pool and the execution of the SQL queries.

HikariCP

By default, Duct comes with dependencies for HikariCP, an excellent database connection pool.

As HikariCP comes with support for Prometheus, we just need to initialize it properly:

  :prometheus-example.handler.monitoring-database/hikaricp
  {:hikari-cp #ig/ref :duct.database.sql/hikaricp
   :metrics-collector #ig/ref :prometheus-example.handler.monitoring-basic/collector}

And the initialization code:

(defmethod ig/init-key :prometheus-example.handler.monitoring-database/hikaricp
   [_ {:keys [hikari-cp metrics-collector] :as options}]
   (-> hikari-cp
      :spec
      :datasource
      (.setMetricsTrackerFactory
         (com.zaxxer.hikari.metrics.prometheus.PrometheusMetricsTrackerFactory.
            (iapetos.registry/raw metrics-collector))))
   hikari-cp)

See the HikariCP documentation for the metrics collected.

HugSQL

To instrument HugSQL, we need to create a HugSQL adapter that will decorate the underlying HugSQL adapter with the required metrics instrumentation:

(defmacro metrics
  [metrics-collector options & body]
  `(if ~metrics-collector
     (let [labels# {:query (:fn-name ~options), :result "success"}
           failure-labels# {:query (:fn-name ~options), :result "failure"}]
       (iapetos.core/with-success-counter (~metrics-collector :sql/run-total labels#)
         (iapetos.core/with-failure-counter (~metrics-collector :sql/run-total failure-labels#)
           (iapetos.collector.exceptions/with-exceptions (~metrics-collector :sql/exceptions-total labels#)
             (iapetos.core/with-duration (~metrics-collector :sql/run-duration labels#)
               ~@body)))))
     (do ~@body)))

(deftype MetricsAdapter [metrics-collector jdbc-adapter]

  hugsql.adapter/HugsqlAdapter
  (execute [_ db sqlvec options]
    (metrics metrics-collector options
      (hugsql.adapter/execute jdbc-adapter db sqlvec options)))

  (query [_ db sqlvec options]
    (metrics metrics-collector options
      (hugsql.adapter/query jdbc-adapter db sqlvec options)))

...
  )

And then just set the adapter as the default HugSQL adapter:

(hugsql.core/set-adapter!
   (MetricsAdapter.
      (iapetos.core/register metrics-collector ;; See Gotcha section at bottom
         (iapetos.core/histogram
            :sql/run-duration
            {:description "SQL query duration"
             :labels [:query]})
         (iapetos.core/counter
            :sql/run-total
            {:description "the total number of finished runs of the observed sql query."
             :labels [:query :result]})
         (iapetos.collector.exceptions/exception-counter
            :sql/exceptions-total
            {:description "the total number and type of exceptions for the observed sql query."
             :labels [:query]}))
      (hugsql-adapter-case.adapters/kebab-adapter)))

In the example project we do this as part of the :prometheus-example.handler.monitoring-database/hikaricp init key.

The SQL metrics will be collected per HugSQL function name as per:

sql_run_duration_count{query="upsert-user!",} 4.0
sql_run_duration_count{query="get-user-by-email",} 2.0



All together

Putting it all together, you will have an endpoint to expose all the Prometheus metrics to know the health of your service:

  • JVM threads and their state.
  • JVM GC and memory stats.
  • CPU usage.
  • Performance and error metrics per HTTP route.
  • Basic HTTP metrics from the point of view of Jetty itself.
  • Health metrics about the Jetty thread pool.
  • Health and some performance metrics for the HikariCP connection pool.
  • Performance and error metrics per SQL call.

Collecting is just the first step on the road. Now you need to visualize the data and create alerts to know when something requires your attention, which require their own blog posts.


Gotchas

Here is a list of bumps on the road while building the example application.

ClassNotFoundException (org.eclipse.jetty.util.ssl.SslContextFactory$Server)

Symptom: Application not starting because:

Syntax error (ClassNotFoundException) compiling at (ring/adapter/jetty.clj:1:1).
org.eclipse.jetty.util.ssl.SslContextFactory$Server

Cause: The latest version of [io.prometheus/simpleclient_jetty_jdk8 "0.9.0"] pulls in [org.eclipse.jetty/jetty-server "9.4.4.v20170414"], which does not include that class, while ring-jetty-adapter depends on it.

Fix: Explicitly depend on [org.eclipse.jetty/jetty-server "9.4.31.v20200723"] and [org.eclipse.jetty/jetty-servlet "9.4.31.v20200723"], or exclude those dependencies from simpleclient_jetty_jdk8.

ClassCastException when starting Duct

Symptom: Application not starting because:

Execution error (ClassCastException) at duct.core/eval4586$fn (core.clj:258).
class clojure.lang.PersistentList cannot be cast to class clojure.lang.Associative (clojure.lang.PersistentList and clojure.lang.Associative are in unnamed module of loader 'app')

Cause: I had deleted the dev/resources/dev.edn as it was empty. This caused the :duct.profile/dev #duct/include "dev" in the resources/prometheus_example/config.edn file to include the dev/src/dev.clj file in the Duct config.

Fix: Add an empty dev/resources/dev.edn or delete #duct/include "dev" from the resources/prometheus_example/config.edn file.

No implementation of method: :increment* of protocol: #'iapetos.operations/IncrementableCollector found for class: nil

Symptom: Any call to iapetos.core/inc or similar functions fails with:

No implementation of method: :increment* of protocol: #'iapetos.operations/IncrementableCollector found for class: nil

Cause: iapetos wraps the mutable Prometheus CollectorRegistry with its own immutable abstraction, which can leave you with “phantom” metrics: metrics that exist in the Prometheus CollectorRegistry but not in the iapetos registry value you are holding.

This confused me as the /metrics endpoint uses the underlying Prometheus registry, so “phantom” metrics will be shown.

Fix: Have one place to register all the metrics in the iapetos registry, or be aware of the fact when registering the metrics.
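
A minimal way to reproduce the situation, assuming iapetos is on the classpath (the metric name :app/jobs-total is invented for the example):

```clojure
(require '[iapetos.core :as prometheus])

(def registry (prometheus/collector-registry))

;; `register` returns a NEW iapetos registry. The return value is discarded here,
;; so `registry` does not know about the counter, although the underlying
;; Prometheus CollectorRegistry now contains it (the "phantom" metric).
(prometheus/register registry (prometheus/counter :app/jobs-total))

;; Evaluating this fails with the "No implementation of method: :increment*" error,
;; because looking up the counter in `registry` yields nil.
(comment
  (prometheus/inc registry :app/jobs-total))

;; Keeping the registry returned by `register` avoids the problem.
(def fixed-registry
  (-> (prometheus/collector-registry)
      (prometheus/register (prometheus/counter :app/jobs-total))))

(prometheus/inc fixed-registry :app/jobs-total)
```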



Tagged in : Architecture Clojure