when nil is not nil

This week I encountered one of the most awful bugs I have ever dealt with. After spending hours finding it, and then more hours understanding what was happening, I felt it was my responsibility to share what I learned so that others may benefit from my misfortune.

The complete story involves multiple libraries and many different layers of code and abstraction, which made this bug very hard to diagnose. Ultimately it came down to one question: what do you do when nil != nil?

What is nil anyway?

Nil is a rather unusual thing in and of itself. The word is strange: it refers to something that is nothing or, put another way, something that has no value or existence.

Nil in Go

In Go, nil is even more unusual. As it turns out, there is not one but two different kinds of nil in Go: you can have nothing of something, and you can also have nothing of nothing.

Go Types and their Empty Values

Go is a statically typed language, meaning each variable or value has a specific type. While many languages have a somewhat universal nil (or null), Go has multiple kinds of empty values: some types can be nil, while other types can only hold a zero, or empty, value.

Zero Values

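In Go, many types do not have nil values. Instead, they have what Go calls a zero value: the default value a variable of that type holds when it is declared without an initializer. Non-nillable types include strings, booleans, numbers, arrays and structs. (Strictly speaking, nil is itself the zero value of the nillable types covered below; the difference is that these types have no nil at all.)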
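To make this concrete, here is a small sketch that declares one variable of each non-nillable kind without initializing it and formats the result. The point struct and the zeroValues helper are hypothetical names used only for this illustration.

```go
package main

import "fmt"

// point is a hypothetical struct used only to show that a struct's zero
// value is built from the zero values of its fields.
type point struct {
	X, Y int
}

// zeroValues declares variables without initializers, so each one holds
// the zero value of its type, and formats them into a single string.
func zeroValues() string {
	var s string  // ""
	var b bool    // false
	var n int     // 0
	var f float64 // 0
	var a [3]int  // every element is 0
	var p point   // every field is 0
	return fmt.Sprintf("%q %v %v %v %v %+v", s, b, n, f, a, p)
}

func main() {
	fmt.Println(zeroValues()) // "" false 0 0 [0 0 0] {X:0 Y:0}
}
```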

To illustrate, consider strings. In Go you cannot have a string that is equal to nil. You can have a string with a length of 0 (var name string = ""), but that is not nil. Because string has no nil value, the compiler won't even let you compare a string with nil.

var name string

if name == nil {
   fmt.Println("the string is nil")
}

If you try this, the compiler rejects the program with the error invalid operation: name == nil (mismatched types string and nil). Try it yourself at http://play.golang.org/p/O0dIgwvPAU
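Since the nil comparison does not compile, the way to ask whether a string is empty is to compare it with its zero value instead. A minimal sketch, where isEmpty is a hypothetical helper name:

```go
package main

import "fmt"

// isEmpty reports whether s holds the zero value of the string type.
// Comparing with "" (or checking len(s) == 0) is the valid way to ask
// "is this string empty?"; comparing with nil does not compile.
func isEmpty(s string) bool {
	return s == ""
}

func main() {
	var name string // zero value: ""

	fmt.Println(isEmpty(name))   // true
	fmt.Println(len(name) == 0)  // true
	fmt.Println(isEmpty("Hugo")) // false
}
```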

Zero values make a lot of sense from a design standpoint. A variable of one of these types directly represents the value being stored, with no indirection. There is no need for, or even possibility of, nil for these kinds of types, because there is nothing to point at. The value is simply there; there is no third state.

Nil Values

The remaining types in Go do have nil values. These include slices, pointers, channels, maps and functions. These types all have one thing in common: they all refer to a location in memory. Because the location they refer to may or may not currently be set, a nil value is required to represent an unset reference. So a variable of one of these types is either nil or refers to some underlying value in memory.

Because we now have a level of indirection, we need a nil value.

As before, here are a couple of simple examples using variables of different nillable types.

var nillable func()

if nillable == nil {
   fmt.Println("the var is nil")
}

http://play.golang.org/p/JhKaj873Su

var nillable chan int

if nillable == nil {
   fmt.Println("the var is nil")
}

http://play.golang.org/p/aLMC5-dG6c
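The same holds for pointers, slices and maps. The sketch below, with a hypothetical helper named nillableZeroValues, checks that freshly declared variables of those types start out as nil, and that a pointer stops being nil once it refers to a value.

```go
package main

import "fmt"

// nillableZeroValues reports whether freshly declared variables of several
// nillable types start out as nil.
func nillableZeroValues() (ptrNil, sliceNil, mapNil bool) {
	var p *int           // pointer: refers to no int yet
	var s []string       // slice: no backing array yet
	var m map[string]int // map: no table allocated yet
	return p == nil, s == nil, m == nil
}

func main() {
	fmt.Println(nillableZeroValues()) // true true true

	// Once the pointer holds the address of a value, it is no longer nil.
	n := 42
	p := &n
	fmt.Println(p == nil, *p) // false 42
}
```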

What may seem even stranger is that you can call methods on something that is nil.

package main

import "fmt"

type NothingAtAll []struct{}

func (e NothingAtAll) IsNil() {
    if e == nil {
        fmt.Println("I am Nil")
    } else {
        fmt.Println("I am Not Nil")
    }
}

func main() {
    var e NothingAtAll
    e.IsNil()
}

http://play.golang.org/p/kYxIrG4lb2
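The same trick works with pointer receivers, and it is sometimes used on purpose to make a nil pointer behave like an empty value. Here is a minimal sketch using a hypothetical linked list type named node, where a nil *node counts as an empty list:

```go
package main

import "fmt"

// node is a hypothetical singly linked list node, used only to illustrate
// calling a method on a nil pointer receiver.
type node struct {
	value int
	next  *node
}

// Len works even when n is nil: a nil *node is treated as an empty list,
// so the recursion stops when it reaches the nil next pointer.
func (n *node) Len() int {
	if n == nil {
		return 0
	}
	return 1 + n.next.Len()
}

func main() {
	var empty *node // nil pointer
	list := &node{value: 1, next: &node{value: 2}}

	fmt.Println(empty.Len()) // 0
	fmt.Println(list.Len())  // 2
}
```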

Nil, Interfaces and You

What about interfaces? In spite of their apparent simplicity, interfaces remain one of the most challenging concepts in Go, and consequently one of its most underutilized tools. An interface specifies the behavior of an object: values of many different types can be used wherever the interface is expected, as long as they provide the required methods.

Interfaces can be used as both the input and the output of a function.
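As an illustration, here is a sketch with a hypothetical Shape interface. The largest function takes interfaces as input and returns one as output, caring only about the Area behavior and not the concrete types. Note that calling it with no arguments returns a nil Shape, which foreshadows what follows.

```go
package main

import "fmt"

// Shape is a hypothetical interface: any type with an Area method satisfies it.
type Shape interface {
	Area() float64
}

type Rect struct{ W, H float64 }

func (r Rect) Area() float64 { return r.W * r.H }

type Square struct{ Side float64 }

func (s Square) Area() float64 { return s.Side * s.Side }

// largest accepts Shapes as input and returns a Shape as output.
// With no arguments it returns a nil Shape.
func largest(shapes ...Shape) Shape {
	var best Shape
	for _, s := range shapes {
		if best == nil || s.Area() > best.Area() {
			best = s
		}
	}
	return best
}

func main() {
	fmt.Printf("%+v\n", largest(Rect{W: 2, H: 3}, Square{Side: 3})) // {Side:3}
}
```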

Now here's where it really becomes interesting. Interfaces add another level of indirection to the mix. A pointer can be nil because it may or may not refer to a value (of type T). Interfaces can be thought of as a similar abstraction, but instead of abstracting the value, they abstract the type. Under the hood, an interface value holds two things: a dynamic type and a value of that type.

Just as the first level of indirection required a nil value, this second level requires a second kind of nil: the nil type.

This second level of indirection now poses a problem for the Go programmer. If a pointer is equal to nil only when its value is nil, then when is an interface equal to nil?

There were two possible options the Go authors could have chosen:

  1. like a pointer, an interface could be nil if it has a nil value
  2. an interface could be nil if it has a nil type

The authors of Go chose the second option.

An interface is equal to nil only when its type is nil (which also means it holds no value).

“I’ve never heard of a nil type,” you say with confidence. “Go has no such type.” While it is not documented as a type, and there is no way in Go to directly create a variable of type nil, Go does indeed have a nil type, and if you are reading this, chances are you have already used it more often than you can count.
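One way to see this nil type is to print an interface's dynamic type with the %T verb. In the sketch below (describe is a hypothetical helper), an interface holding nothing reports its type as <nil> and equals nil, while an interface holding a nil *int reports the type *int and does not equal nil.

```go
package main

import "fmt"

// describe reports the dynamic type stored in the interface and whether
// the interface itself compares equal to nil.
func describe(i interface{}) string {
	return fmt.Sprintf("type=%T nil=%v", i, i == nil)
}

func main() {
	var p *int // a nil pointer: nil value, but its type is *int

	fmt.Println(describe(nil)) // type=<nil> nil=true
	fmt.Println(describe(p))   // type=*int nil=false
}
```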

Interfaces and Nil Types everywhere

I would postulate that the most commonly written line of Go code is this one:

if err != nil { ... }

or in context:

package main

import "fmt"

func doSomething() error {
    return nil
}

func main() {
    err := doSomething()

    if err != nil {
        fmt.Println("An Error Happened")
    }
}

We use this construct so blindly that we rarely look at what is actually happening. error is not a concrete type but an interface. When we return nil as an error, we are returning an interface with a nil type and a nil value.

We never question this because it works the way we expect: we return nil, assign it to a variable, and then check whether that variable is nil.

Because of the way errors are used, this use of interfaces, and of the nil type in particular, just works in practice, and we don't really need to think about what is actually going on. This is largely because an error typically has a very short life span and is always handled through the error interface.

Beyond Errors

Beyond errors, the use of interfaces becomes much more muddled. In practice it is often not obvious whether a function (especially one from another library) returns an interface or a pointer.

Nil also has a magical property: a nil interface value can be stored in a variable of any interface type, in spite of not actually providing any of the methods that interface requires.
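The flip side is that nothing stops you from calling a method on such a variable, and that call panics at run time. The sketch below uses a hypothetical Greeter interface and a helper, callGreet, that recovers from the panic so we can observe it.

```go
package main

import "fmt"

// Greeter is a hypothetical interface used for illustration.
type Greeter interface {
	Greet() string
}

// callGreet calls g.Greet and converts a run-time panic into an error,
// so we can observe what happens when g is a nil interface.
func callGreet(g Greeter) (s string, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return g.Greet(), nil
}

func main() {
	var g Greeter // nil interface: a valid Greeter variable, but no method to call
	_, err := callGreet(g)
	fmt.Println(err != nil) // true: the method call panicked
}
```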

When writing functions that return interfaces and wrap other functions, it is quite easy to return a value that satisfies the interface without realizing that it could be nil, and that this nil would be converted into a (!nil)nil: an interface that holds a nil value but is itself not equal to nil.
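Here is a minimal sketch of that trap, assuming a hypothetical custom error type MyErr and a hypothetical lookup function that returns a nil *MyErr on success. wrapBad passes the pointer straight through and ends up returning a non-nil error; wrapGood checks the pointer first and returns an untyped nil.

```go
package main

import "fmt"

// MyErr is a hypothetical custom error type.
type MyErr struct{ msg string }

func (e *MyErr) Error() string { return e.msg }

// lookup is a hypothetical helper that returns a nil *MyErr when nothing went wrong.
func lookup() *MyErr { return nil }

// wrapBad passes lookup's result straight through. The nil *MyErr is
// wrapped in an error interface whose type is *MyErr, so it is not nil.
func wrapBad() error { return lookup() }

// wrapGood checks the pointer first and returns an untyped nil, so the
// resulting interface has neither a type nor a value.
func wrapGood() error {
	if e := lookup(); e != nil {
		return e
	}
	return nil
}

func main() {
	fmt.Println(wrapBad() == nil)  // false
	fmt.Println(wrapGood() == nil) // true
}
```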

One must also be extra cautious when accepting interfaces, because a nil value of a concrete type (such as a nil slice or pointer) passed into a function through an interface parameter will no longer be nil.

package main

import "fmt"

func main() {
    fmt.Println("## nil value:")
    var x []struct{}
    fmt.Printf("type(val): %#v\n", x)
    check := (x == nil)
    fmt.Println("does value == nil", check)
    foo(x)

    fmt.Println("\n## nil type:")
    fmt.Printf("type(val): %#v\n", nil)
    fmt.Println("does value == nil", true)
    foo(nil)
}

func foo(in interface{}) {
    if in != nil {
        fmt.Println("Not Nil")
    } else {
        fmt.Println("Nil")
    }
}

http://play.golang.org/p/_Lsc7N3-U_

One might expect both blocks to print “Nil”, but this is not the case: the first call to foo prints “Not Nil”, because x is converted into an interface holding the type []struct{} and a nil value.
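If a function genuinely needs to know whether an interface argument holds a nil pointer, slice, map, channel or function, one possible workaround is to ask the reflect package. This is a sketch of a general technique, not something Hugo or Afero does; isNil is a hypothetical helper name.

```go
package main

import (
	"fmt"
	"reflect"
)

// isNil reports whether i is a nil interface OR an interface holding a nil
// pointer, slice, map, channel, function or unsafe pointer.
func isNil(i interface{}) bool {
	if i == nil {
		return true // no type and no value
	}
	v := reflect.ValueOf(i)
	switch v.Kind() {
	case reflect.Ptr, reflect.Slice, reflect.Map, reflect.Chan, reflect.Func, reflect.Interface, reflect.UnsafePointer:
		return v.IsNil() // a typed nil hidden inside the interface
	}
	return false // other kinds (ints, strings, structs...) can never be nil
}

func main() {
	var x []struct{}
	fmt.Println(isNil(x))   // true, even though foo(x) printed "Not Nil"
	fmt.Println(isNil(nil)) // true
	fmt.Println(isNil(0))   // false
}
```

The cost is reflection on every call, so it is usually better to avoid creating typed nils in the first place.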

The dreaded nil != nil bug

I’m the primary author of Hugo, a website engine written in Go. One of Hugo’s features is that it lets users download a theme and then supply their own versions of any of the theme’s files without touching the theme itself, which keeps the theme in sync with upstream. Hugo accomplishes this with an overlay technique: the theme files are copied to the destination first, and then the local files are copied on top of them. This technique has worked well but has a few drawbacks, the most obvious being that some operations are performed needlessly.

Historically, Hugo performed a full sync of these static files whenever a single file changed. The sync doesn’t re-copy identical files that already exist, but for a file that existed in both the theme static folder and the local static folder, it would first copy the theme’s version and then the local one, every single time.

Thinking there was a more efficient way of doing this, I set out to rewrite the way Hugo responds to a file system event so that it copies over only the file(s) that the operating system tells us have changed. While this sounds simple, due to our overlay approach it can be quite complex. We needed a unified view of the two directories overlaid on each other before performing the sync. To accomplish this, Hugo uses Afero, a powerful filesystem abstraction library that provides a variety of interoperable backends. One of those backends does exactly what Hugo needed: it provides a single view of two different filesystems. In our case we stacked the operating system backend together with a base path backend. While this may sound complex, it is actually quite simple.

base := afero.NewReadOnlyFs(afero.NewBasePathFs(hugofs.SourceFs, themeDir))
overlay := afero.NewReadOnlyFs(afero.NewBasePathFs(hugofs.SourceFs, staticDir))
return afero.NewCopyOnWriteFs(base, overlay)

All of the tests for these backends passed just fine. I wrote the code to use this new Afero feature as the single source for our sync. When I tried the new code on a real site to really put it to the test, something very unexpected happened: the operation stopped at the first directory it encountered during the sync, and the following error message was generated:

ReadDir Error: invalid argument

Afero’s union file system works by checking whether a directory is present in both sources. If it is present in both, and is a directory in both, Afero merges the two directories and presents a unified view.

It does this by using the following code:

if f.layer != nil { ... }
if f.base != nil { ... }

This seems totally fine, but on inspecting the code further I discovered that even when layer or base held no file at all, the code inside the if statement still executed. I dug a bit deeper and discovered that both layer and base were of type afero.File, and both were set by calling Open on the source filesystem (an afero.Fs).

bfile, _ := u.base.Open(name)
lfile, err := u.layer.Open(name)
&UnionFile{base: bfile, layer: lfile}

I eventually discovered the culprit.

func (OsFs) Open(name string) (File, error) {
    return os.Open(name)
}

The OsFs backend is just a very thin wrapper around the functions provided by the os package. This enables Afero to be a drop-in replacement anywhere the os package is used, while providing extra functionality such as memory-backed storage for testing. This backend had been in heavy use for over a year, and not a single issue had been reported with it. Every function looks just like this one: a single line returning the identical values from the corresponding os package function… or so I thought. As it turns out, there was one very critical difference.

The function signature of os.Open is:

func Open(name string) (*File, error)

os.Open returns a pointer to a value of type os.File. *os.File completely satisfies the afero.File interface; in fact, it was the inspiration for it. os.Open returns nil in place of the *File when there isn’t an actual file. All of the Afero backends follow the same behavior of returning nil if the file isn’t present. However, the other Afero backends are not wrappers but implementations in their own right. So while os.Open returned a nil *os.File, OsFs.Open wrapped that nil pointer in a non-nil afero.File interface when it returned, whereas the other backends returned an untyped nil directly.
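The bug can be reproduced with nothing but the standard library. This sketch assumes a hypothetical File interface that is a much smaller stand-in for afero.File (*os.File satisfies it through its Name method), and a hypothetical openWrapped function that mirrors the original OsFs.Open by returning os.Open's result directly.

```go
package main

import (
	"fmt"
	"os"
)

// File is a hypothetical, much smaller stand-in for afero.File.
type File interface {
	Name() string
}

// openWrapped mirrors the original OsFs.Open: the *os.File returned by
// os.Open is converted to the File interface on the way out.
func openWrapped(name string) (File, error) {
	return os.Open(name)
}

func main() {
	f, err := openWrapped("/this/path/should/not/exist")
	fmt.Println(err != nil) // true: the open failed
	fmt.Println(f != nil)   // true: f holds a nil *os.File, so the interface is not nil
}
```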

The solution turned out to be much simpler than I could have imagined: check whether the returned pointer is nil, and if it is, return an untyped nil instead of the pointer.

func (OsFs) Open(name string) (File, error) {
    f, e := os.Open(name)

    if f == nil {
        return nil, e
    }
    return f, e
}

Even though I understand this code (and now so do you), I still find it quite strange.
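For comparison, here is a sketch of a variant of the same fix that uses the more common Go idiom of checking the error before touching the value. It reuses a hypothetical, reduced File interface rather than afero.File, and openChecked is an illustrative name, not the code that went into Afero.

```go
package main

import (
	"fmt"
	"os"
)

// File is a hypothetical, much smaller stand-in for afero.File.
type File interface {
	Name() string
}

// openChecked returns an untyped nil whenever os.Open fails, so callers
// can rely on a plain f == nil check.
func openChecked(name string) (File, error) {
	f, err := os.Open(name)
	if err != nil {
		return nil, err
	}
	return f, nil
}

func main() {
	f, err := openChecked("/this/path/should/not/exist")
	fmt.Println(err != nil, f == nil) // true true
}
```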