Before I delve into what I would like to see in a type system, let me explain the type of code I write and the data I use. I am a data scientist and I write code that analyzes data at rest; the data is in a spreadsheet or in a database that is fed by another system. My programs are not continuously run; they are scripts that have can a do fail.

For that type of work, I want a type system that allows me to put a lot of detail into what the data I am expecting should look like. I want a type system that can help me detect data with impossible values at the earliest possible moment (such as a negative age) [1]. That is simply not possible when working with languages that restrict you to integers, floats, strings, etc.

And in case it was not clear from the title and liberal use words like 'like', this is a very opinionated article about my preference and light on proof and detail.

Constraints

Even if you use a language with strong, static typing (such as Haskell or an ML), you can encounter situations where a variable has the proper type, but there is no assurance that the values it can take are valid.

Consider the following OCaml code where we define a type, body_dim that holds measurements for a persons height and weight:

type body_dim =
  { height = float;
    weight = float;
  }

In this code, we have declared both height and weight to hold float values. That means we can happily set height to be less than zero and weight to hold some ridiculous amount like 10,000kg.

Several years ago, I worked on one study that required the calculation of BMI. When I calculated the BMI for the subjects, one subjects BMI close to 250 [2]. This was caused by the nurse listing the height in inches instead of centimeters.

In both cases, the strong type system of OCaml will not catch the error. These errors would be caught later when the data is visualized or summarized.

What I would like to be able to do is to encode certain known constraints directly into the type system. Extending on the OCaml example above, I would like to write code like this (in pseudo-OCaml):

type body_dim =
  { height = float;
    weight = float;
    weight > 0.0;
    height > 0.0;
    weight < 400.0;
    height < 300.0;
    weight /. (height /. 100.0) ** 2.0 < 100.0;
  }

This code does two things: it sets bounds on individual variables and it sets constraints on the value of the variables in relation to each other. In this way, it serves are an extension to the data dictionary in the code.

So, Objects?

Sure, I can use objects to enforce these constraints. In python, I could write my body_dim type as a class like so:

class BodyDim(object):
    def __init__(self, height, weight):
        if not (isinstance(height, float) or isinstance(height, int)):
            raise ValueError('height is not numeric')
        if not (isinstance(weight, float) or isinstance(weight, int)):
            raise ValueError('weight is not numeric')
        if height < 0 or height > 300:
            raise ValueError('height is out of range')
        if weight < 0 or weight > 400:
            raise ValueError('weight is out of range')
        if weight / (height / 100)**2 > 100:
            raise ValueError('BMI is too large')

        self.height = height
        self.weight = weight

This is really not that much more code than the pseudo-OCaml code. The reason I do not like this approach is separation of concerns. I would rather use types to encode intrinsic attributes of some data and use objects when I need to mix details about a type with methods on that type. In other words, I would want the class BodyDim to take a single value in its constructor that has the type body_dim as defined above [3].

Advantages and Disadvantages

By moving the constraints on data into the type system, we use the type system to describe what values a variable can take. This allows us to encapsulate all the knowledge about the data in the same place: the type.

I would love to say this sort of type system does not have its disadvantages. Two I can think of right of the bat is that it encodes a lot of 'business logic' in the type system and that such a complex type system would make it harder to do quick analysis scripts. These issues are a matter of preference.

Furthermore, I am not sure that adding constraints to a type system and maintaining static typing is realistic. The compiler would have to trace back all calculations and create constrained types for all previous calculations to maintain the constraints on a given variable. For example, consider the following type in pseudo-Ocaml:

type int_gt_10 =
  { x = int;
    x > 10;
  }

If we create a new variable with the type int_gt_10 by subtracting two ints, we would need to constrain the values of the ints so that the result of subtracting one from the other would be greater than 10. Alternatively, we could allow dynamic typing in such instances and if the subtraction resulted in an integer less than 10, there would be a runtime error.

The possibility of creating such a type system is better answered by Computer Scientists. I would just like to see new ways to correctness in the code I write.

[1]If you are using a language, such a C, with unsigned ints, then you can prevent a negative age, but you can't prevent an age of 200.
[2]The highest BMI recorded is just over 200; values over 100 are very rare.
[3]I know, I'm mixing languages but I think it gets the point across.

© 2018 Christopher Louden