Encoding Validation

Ensuring that your configurations are valid can be tricky. What we’re trying to avoid is latent configuration errors [1], which occur because configuration values are not validated upfront. Only when we try to use these values do we realize they are unusable, potentially causing all sorts of problems. For example, as seen below, we might accidentally use weak secret keys in our production environment, or try to start our service on ports we should never occupy in the first place.

Ciris’ approach to avoiding latent configuration errors is to use more precise types for your configuration values, only allowing values which you know are useable to exist in the application. Essentially, values are validated as they are loaded, as part of the configuration loading process, and you’ll end up with a configuration you know is useable. As you’ll see later on, determining what useable means can be difficult on its own, and we’ll discuss how to reason about the concept.

The main thing to remember is that we’re trying to prevent errors where possible, and reduce the possibility of errors where they cannot be fully prevented. Ideally, we want to make only valid configurations representable, and discover invalid configuration values as early as possible. The ultimate goal is to make working with configurations safer.

Precise Configurations

One challenge with loading configuration values is that most values are interpreted as Strings, but that’s rarely the type we want, or should, use to represent values. For example, you probably don’t want to use any String as an API key (surely not the empty String, and not too weak keys), and not String or any Int for the port number (many port numbers are reserved or require sudo permissions to use).

Ciris encourages you to encode validation by using more precise types, and integrates with several external libraries, like enumeratum, refined, and squants, to be able to decode values into types provided by those libraries. One of the easiest and most convenient ways to use more precise types is to use refined and refinement types.

Using refinement types, we can create a type which refines an existing base type by applying a predicate type, which represents the validation logic. For example, we could express a type ApiKey, which, in this case, is any String with a length between 25 and 40 characters, and which only contains alphanumeric characters.

import eu.timepit.refined.api.Refined
import eu.timepit.refined.string.MatchesRegex
import eu.timepit.refined.W

type ApiKey = String Refined MatchesRegex[W.`"[a-zA-Z0-9]{25,40}"`.T]

By using the ApiKey type instead of String whenever we deal with an API key, we can now be confident that the value is not an invalid variant (like the empty String, or a too weak key, for example). Ciris integrates with refined, so you can load configuration values of type ApiKey without writing any additional code.

import ciris.{env, prop}
import ciris.refined._

// The key name "API_KEY" is a placeholder
env[Option[ApiKey]]("API_KEY")
// res0: ciris.ConfigValue[ciris.api.Id,Option[ApiKey]] = ConfigValue$832516911
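The same predicate can also be checked directly against a String that only becomes known at run-time, which is roughly what happens when a value is decoded during loading. The following is a minimal sketch using refined's refineV; the names ApiKeyExample, ApiKeyPredicate, and parseApiKey are hypothetical and only serve the illustration.

```scala
import eu.timepit.refined.W
import eu.timepit.refined.api.Refined
import eu.timepit.refined.refineV
import eu.timepit.refined.string.MatchesRegex

object ApiKeyExample {
  // Hypothetical alias for the predicate, so it can be passed to refineV
  type ApiKeyPredicate = MatchesRegex[W.`"[a-zA-Z0-9]{25,40}"`.T]
  type ApiKey = String Refined ApiKeyPredicate

  // Hypothetical helper: validates a String obtained at run-time.
  // Left carries the predicate failure message, Right the refined value.
  def parseApiKey(raw: String): Either[String, ApiKey] =
    refineV[ApiKeyPredicate](raw)
}
```

An invalid key such as "changeme" ends up as a Left, so the failure surfaces where the value is read rather than where it is first used.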

Refinement types are also useful for ensuring that configuration values residing in code are valid. Because refined provides an auto macro, literal configuration values can be checked against their predicates at compile-time, and all we have to do is use the appropriate import. Note that the actual ApiKey (or any other secret value) shouldn’t be included in code, but rather loaded from, for example, a vault service. The ApiKey below could be used in local tests, where it would not be considered a secret and could therefore reside in code.

import eu.timepit.refined.auto._

val apiKey: ApiKey = "RacrqvWjuu4KVmnTG9b6xyZMTP7jnX"
// apiKey: ApiKey = RacrqvWjuu4KVmnTG9b6xyZMTP7jnX

If the ApiKey is not valid, we’ll get an error at compile-time.

scala> val apiKey: ApiKey = "changeme"
<console>:23: error: Predicate failed: "changeme".matches("[a-zA-Z0-9]{25,40}").
       val apiKey: ApiKey = "changeme"

If we need to use libraries which don’t support our ApiKey type, we can retrieve the underlying String value.

apiKey.value
// res1: String = RacrqvWjuu4KVmnTG9b6xyZMTP7jnX

Also, if we want to avoid accidentally logging secrets, we can use Secret.

import ciris.Secret

// The key name "API_KEY" is a placeholder
env[Option[Secret[ApiKey]]]("API_KEY")
// res2: ciris.ConfigValue[ciris.api.Id,Option[ciris.Secret[ApiKey]]] = ConfigValue$1714997436

For more information about Secret and logging, refer to the logging configurations section.

Refinement types are not limited to Strings, and refined already includes many common refinement types. One example is UserPortNumber for Ints representing port numbers in the closed interval 1024 to 49151. This is a more precise definition of port numbers than Int, and lets us avoid many reserved port numbers.

import eu.timepit.refined.types.net.UserPortNumber

// The key name "PORT" is a placeholder
env[Option[UserPortNumber]]("PORT")
// res3: ciris.ConfigValue[ciris.api.Id,Option[eu.timepit.refined.types.net.UserPortNumber]] = ConfigValue$194704172
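To see which Ints the UserPortNumber type accepts, the value can also be checked by hand. The sketch below assumes a refined version in which the predefined types have companion objects with a from method (RefinedTypeOps); the object name PortExample and the helper parsePort are hypothetical.

```scala
import eu.timepit.refined.types.net.UserPortNumber

object PortExample {
  // Hypothetical helper: Right for ports in the closed interval
  // 1024 to 49151, Left with a failure message otherwise.
  def parsePort(port: Int): Either[String, UserPortNumber] =
    UserPortNumber.from(port)
}
```

With this, 4000 is accepted, while 989 (a system port) and 65536 (out of range) are rejected before any attempt is made to bind them.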

Putting everything together, we’re left with a more precise configuration, with validation encoded in the types.

import eu.timepit.refined.types.numeric.PosInt
import eu.timepit.refined.types.string.NonEmptyString

final case class ApiConfig(
  key: Secret[ApiKey],
  port: UserPortNumber,
  timeoutSeconds: PosInt
)

final case class Config(
  appName: NonEmptyString,
  api: ApiConfig
)

The literal, and default, configuration values are also validated at compile-time. Ciris helps you load refinement types without having to write any additional code, and we’ve already drastically reduced the risk of latent configuration errors.

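import ciris.loadConfig

val config =
  loadConfig(
    // The key names "API_KEY" and "http.port" are placeholders
    env[Secret[ApiKey]]("API_KEY"),
    prop[Option[UserPortNumber]]("http.port")
  ) { (apiKey, port) =>
    Config(
      appName = "my-api",
      api = ApiConfig(
        key = apiKey,
        timeoutSeconds = 10,
        port = port getOrElse 4000
      )
    )
  }
// config: ciris.ConfigResult[ciris.api.Id,Config] = ConfigResult$2120654772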

Useable Configurations

An interesting question arises when using refinement types: how far should we go to ensure that our configuration values are useable? For example, despite having restricted port numbers to UserPortNumbers, there is nothing that guarantees that the specified port is actually available, as another service might already be using the port. Being familiar with refinement types, you might be tempted to write an OpenPort predicate, which checks whether the port is open or not by creating a socket and immediately closing it.

import eu.timepit.refined.api.Validate
import java.net.ServerSocket

final case class OpenPort()

implicit val openPortValidate: Validate.Plain[Int, OpenPort] =
  Validate.fromPartial(new ServerSocket(_).close(), "OpenPort", OpenPort())

We’ll then check whether some Ints conform to the OpenPort predicate.

import eu.timepit.refined.refineV

// System port number, requires sudo permissions
refineV[OpenPort](989)
// res5: Either[String,eu.timepit.refined.api.Refined[Int,OpenPort]] = Right(989)

// User port number, can be used, and is not already used
refineV[OpenPort](10000)
// res7: Either[String,eu.timepit.refined.api.Refined[Int,OpenPort]] = Right(10000)

// Port number outside range, cannot be used
refineV[OpenPort](65536)
// res9: Either[String,eu.timepit.refined.api.Refined[Int,OpenPort]] = Left(OpenPort predicate failed: Port value out of range: 65536)

While this might seem like a good idea at first, when the predicate is used in conjunction with the auto macro, for compile-time safe literal configuration values, the OpenPort check is actually performed at compile-time. This means that the port values you specify in code need to be open on the machine compiling the code, which is not what you would expect.

Maybe it’s not such a good idea to use impure functions in our predicates. There are still some configuration values for which we’ll have to guard against errors when using the values (binding a port number, for example). However, we can still reduce the possibility of errors by being more precise in the definition of the values. For port numbers, for example, it means that we can prevent attempts to use unusable port numbers at compile-time (for port numbers specified in code), or as part of the configuration loading process (for port numbers loaded from the environment). If we’re able to detect unusable configuration values as early as at compile-time, or during configuration loading, we’ve saved valuable time by preventing errors as early as possible.

In general, it’s recommended to only use pure functions in predicates, and to try and be as precise as is practically possible when defining configuration value types – you’ll have to use your own judgement when it comes to this. It might take considerable effort to create very precise predicate types, but it can also pay off in terms of fewer errors and failures. Sometimes it is enough to use a more precise type than you normally would, for example NonEmptyString instead of String, which might not be as precise as possible, but still eliminates some invalid variants.
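One way to picture this split is to keep the range check pure and to guard the impure step where the port is actually used. The following is a minimal sketch using only the standard library; the names PortBinding, isUserPort, and bind are hypothetical, and the range check mirrors the UserPortNumber interval rather than calling refined.

```scala
import java.net.ServerSocket
import scala.util.{Failure, Success, Try}

object PortBinding {
  // Pure check: the same closed interval as UserPortNumber (1024 to 49151)
  def isUserPort(port: Int): Boolean =
    1024 <= port && port <= 49151

  // Impure step, kept out of the predicate: try to bind at the point of use
  // and report a failure instead of throwing.
  def bind(port: Int): Either[String, ServerSocket] =
    if (!isUserPort(port)) Left(s"Not a user port number: $port")
    else
      Try(new ServerSocket(port)) match {
        case Success(socket) => Right(socket)
        case Failure(error)  => Left(s"Could not bind port $port: ${error.getMessage}")
      }
}
```

The pure part can safely run at compile-time or during configuration loading, while a port that is already taken is only reported when the service tries to bind it.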

External Libraries

When interacting with other libraries, you’ll often see uses of imprecise types, like String, even though a more precise type is expected. Often there is validation logic behind the scenes, which can be extracted to a predicate type, to avoid unexpected errors. An example is the name of a Kafka topic, where Kafka libraries typically accept a String for the topic name, but check that it follows some validation rules. Depending on the library, these rules may or may not be well documented, and sometimes you’ll have to dive into the code to find them.

For reference, the following is an example of how to express the Kafka topic name validation rules.

def isKafkaTopicName(topic: String): Boolean =
  1 <= topic.size && topic.size <= 249 && (
    topic != "." && topic != ".." && (
      topic.forall(c => c.isLetterOrDigit || c == '.' || c == '_' || c == '-')
    )
  )

For comparison, the following is an example of how to express the validation rules with refinement types.

import eu.timepit.refined.boolean.{And, Not, Or}
import eu.timepit.refined.char.LetterOrDigit
import eu.timepit.refined.collection.{Forall, Size}
import eu.timepit.refined.generic.Equal
import eu.timepit.refined.numeric.Interval

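type KafkaTopicName = String Refined
  And[Size[Interval.Closed[W.`1`.T, W.`249`.T]],
  And[Not[Equal[W.`"."`.T]],
  And[Not[Equal[W.`".."`.T]],
  Forall[Or[LetterOrDigit, Or[Equal[W.`'.'`.T], Or[Equal[W.`'_'`.T], Equal[W.`'-'`.T]]]]]]]]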

Note the similarities between working at the value-level with isKafkaTopicName, and representing the same validation rules at the type-level with KafkaTopicName. While the type signature above might look complicated at first glance, there is quite often a straightforward translation between validation rules at the value-level and the equivalent rules at the type-level. Note that we instead could have chosen to represent the rules with a regular expression, both at the value-level and type-level (using the MatchesRegex predicate).
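As a rough illustration of the regular expression alternative at the value-level, the rules could be sketched as below. The names KafkaTopicRegex and TopicPattern are hypothetical, and the pattern is an assumption rather than the rule set of any particular Kafka library: it only accepts ASCII letters and digits, whereas isLetterOrDigit also accepts non-ASCII letters, so it is slightly stricter.

```scala
object KafkaTopicRegex {
  // Assumed pattern: 1 to 249 characters from [a-zA-Z0-9._-],
  // with a negative lookahead that rejects exactly "." and "..".
  val TopicPattern: String = """(?!\.\.?$)[a-zA-Z0-9._-]{1,249}"""

  // String.matches requires the whole input to match the pattern
  def isKafkaTopicName(topic: String): Boolean =
    topic.matches(TopicPattern)
}
```

Because all rules live in a single pattern, the same String could serve as the argument to a MatchesRegex predicate at the type-level.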

Kafka topic names are generally not secret, and can therefore reside as configuration values in code. With the refinement type KafkaTopicName, we benefit from being able to validate our Kafka topic names at compile-time, meaning we can be sure at compile-time that our topic names are useable.

val kafkaTopicName: KafkaTopicName = "my-topic-v2"
// kafkaTopicName: KafkaTopicName = my-topic-v2

  1. For more information on latent configuration errors, refer to the paper Early Detection of Configuration Errors to Reduce Failure Damage and Leif Wickland’s presentation Defusing the Configuration Time Bomb on the subject.