The "de Bruijn factor"



In "A survey of the project Automath" N.G. de Bruijn wrote:
A very important thing that can be concluded from all writing experiments is the constancy of the loss factor. The loss factor expresses what we loose in shortness when translating very meticulous "ordinary" mathematics into Automath. This factor may be quite big, something like 10 or 20, but it is constant: it does not increase if we go further in the book. It would not be too hard to push the constant factor down by efficient abbreviations.

Because of this quote, I defined the de Bruijn factor to be the quotient of the size of a formalization of a mathematical text and the size of its informal original. To make this specific, I made the following choices:

  • The size of the informal text is the size of a TeX encoding.
  • To be independent of arbitrary factors like lengths of identifiers and amount of whitespace, the files should be compared compressed with the Unix utility gzip.
I investigated the exact value of de Bruijn factor in a number of specific cases:

Apparently people generally select texts for formalization that are matched to their system in difficulty such that the de Bruijn factor becomes around four.

I did a poster session about the de Bruijn factor at the TPHOLs 2000 conference (here are the ps.gz, dvi and tex versions of the text that I submitted there.)

(last modification 2012-03-01)