TranslateProject/sources/tech/20200516 Fatih-s question.md
DarkSun a074eda472 选题: 20200516 Fatih’s question
sources/tech/20200516 Fatih-s question.md
2020-05-25 19:32:56 +08:00

9.4 KiB
Raw Blame History

Fatihs question

A few days ago Fatih posted this question on twitter.

Im going to attempt to give my answer, however to do that I need to apply some simplifications as my previous attempts to answer it involved a lot of phrases like a pointer to a pointer, and other unhelpful waffling. Hopefully my simplified answer can be useful in building a mental framework to answer Fatihs original question.

Restating the question

Fatihs original tweet showed four different variations of json.Unmarshal. Im going to focus on the last two, which Ill rewrite a little:

package main

import (
    "encoding/json"
    "fmt"
)

type Result struct {
    Foo string `json:"foo"`
}

func main() {
    content := []byte(`{"foo": "bar"}`)
    var result1, result2 *Result

    err := json.Unmarshal(content, &result1)
    fmt.Println(result1, err) // &{bar} <nil>

    err = json.Unmarshal(content, result2)
    fmt.Println(result2, err) // <nil> json: Unmarshal(nil *main.Result)
}

Restated in words, result1 and result2 are the same type; *Result. Decoding into result1 works as expected, whereas decoding into result2 causes the json package to complain that the value passed to Unmarshal is nil. However, both values were declared without an initialiser so both would have taken on the types zero value, nil.

Eagle eyed readers will have spotted that the reason for the difference is the first invocation is passed &result1, while the second is passed result2, but this explanation is unsatisfactory because the documentation for json.Unmarshal states:

Unmarshal parses the JSON-encoded data and stores the result in the value pointed to by v. If v is nil or not a pointer, Unmarshal returns an InvalidUnmarshalError.

Which is confusing because result1 and result2 are pointers. Furthermore, without initialisation, both are nil. Now, the documentation is correct (as youd expect from a package that has been hammered on for a decade), but explaining why takes a little more investigation.

Functions receive a copy of their arguments

Every assignment in Go is a copy, this includes function arguments and return values.

package main

import (
    "fmt"
)

func increment(v int) {
    v++
}

func main() {
    v := 1
    increment(v)
    fmt.Println(v) // 1
}

In this example, increment is operating on a copy of mains v. This is because the v declared in main and increments v parameter have different addresses in memory. Thus changes to increments v cannot affect the contents of mains v.

package main

import (
    "fmt"
)

func increment(v *int) {
    *v++
}

func main() {
    v := 1
    increment(&v)
    fmt.Println(v) // 2
}

If we wanted to write increment in a way that it could affect the contents of its caller we would need to pass a reference, a pointer, to main.v.1 This example demonstrates why json.Unmarshal needs a pointer to the value to decode JSON into.

Pointers to pointers

Returning to the original question, both result1 and result2 are declared as *Result, that is, pointers to a Result value. We established that you have to pass the address of callers value to json.Unmarshal otherwise it wont be able to alter the contents of the callers value. Why then must we pass the address of result1, a **Result, a pointer to a pointer to a Result, for the operation to succeed.

To explain this another detour is required. Consider this code:

package main

import (
    "encoding/json"
    "fmt"
)

type Result struct {
    Foo *string `json:"foo"`
}

func main() {
    content := []byte(`{"foo": "bar"}`)
    var result1 *Result

    err := json.Unmarshal(content, &result1)
    fmt.Printf("%#v %v", result1, err) // &main.Result{Foo:(*string)(0xc0000102f0)} <nil>
}

In this example Result contains a pointer typed field, Foo *string. During JSON decoding Unmarshal allocated a new string value, stored the value bar in it, then placed the address of the string in Result.Foo. This behaviour is quite handy as it frees the caller from having to initialise Result.Foo and makes it easier to detect when a field was not initialised because the JSON did not contain a value. Beyond the convenience this offers for simple examples it would be prohibitively difficult for the caller to properly initialise all the reference type fields in a structure before decoding unknown JSON without first inspecting the incoming JSON which itself may be problematic if the input is coming from an io.Reader without the ability to rewind the input.

To unmarshal JSON into a pointer, Unmarshal first handles the case of the JSON being the JSON literal null. In that case, Unmarshal sets the pointer to nil. Otherwise, Unmarshal unmarshals the JSON into the value pointed at by the pointer. If the pointer is nil, Unmarshal allocates a new value for it to point to.

json.Unmarshals handling of pointer fields is clearly documented, and works as you would expect, allocating a new value whenever there is a need to decode into a pointer shaped field. It is this behaviour that gives us a hint to what is happening in the original example.

Weve seen that when json.Unmarshal encounters a field which points to nil it will allocate a new value of the correct type and assign its address the field before proceeding. Not only is does behaviour is applied recursivelyfor example in the case of a complex structure which contains pointers to other structuresbut it also applies to the value passed to Unmarshal.

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    content := []byte(`1`)
    var result *int

    err := json.Unmarshal(content, &result)
    fmt.Println(*result, err) // 1 <nil>
}

In this example result is not a struct, but a simple *int which, lacking an initialiser, defaults to nil. After JSON decoding, result now points to an int with the value 1.

Putting the pieces together

Now I think Im ready to take a shot at answering Fatihs question.

json.Unmarshal requires the address of the variable you want to decode into, otherwise it would decode into a temporary copy which would be discard on return. Normally this is done by declaring a value, then passing its address, or explicitly initialising the the value

var result1 Result
err := json.Unmarshal(content, &result1) // this is fine

var result2 = new(Result)
err = json.Unmarshal(content, result2) // and this

var result3 = &Result{}
err = json.Unmarshal(content, result3) // this is also fine

In all three cases the address that the *Result points too is not nil, it points to initialised memory that json.Unmarshal decodes into.

Now consider what happens when json.Unmarshal encounters this

var result4 *Result
err = json.Unmarshal(content, result4) // err json: Unmarshal(nil *main.Result)

result2, result3, and the expression &result1 point to a Result. However result4, even though it has the same type as the previous three, does not point to initialised memory, it points to nil. Thus, according to the examples we saw previously, before json.Unmarshal can decode into it, the memory result4 points too must be initialised.

However, because each function receives a copy of its arguments, the callers result4 variable and the copy inside json.Unmarshal are unique. json.Unmarshal can allocate a new Result value and decode into it, but it cannot alter result4 to point to this new value because it was not provided with a reference to result4, only a copy of its contents.

  1. This does not violate the everything is a copy rule, a copy of a pointer to main.v still points to main.v.
  1. Should methods be declared on T or *T
  2. Ice cream makers and data races
  3. Understand Go pointers in less than 800 words or your money back
  4. Slices from the ground up

via: https://dave.cheney.net/2020/05/16/fatihs-question

作者:Dave Cheney 选题:lujun9972 译者:译者ID 校对:校对者ID

本文由 LCTT 原创编译,Linux中国 荣誉推出