Batch importing 6.8k entries in Go

What it's all about

I've recently decided to import 6.8k entries of cryptocurrencies into a database, namely RethinkDB.

I have an old side hustle that never turned into a business, but I still have fun building it. I started on this project four years ago and only just decided to pick it back up. Back then I was just getting started with Go and wasn't a particularly seasoned Go dev.

In this project I had one function that bothered the heck out of me because it took forever to run, though I only had to run it once every time I re-created the database. Still too much waiting time for little moi.

Read along in the next chapter.

The Structs

All of the structs used in the code below are as follows:

type Coin struct {
    ID          string  `json:"id" gorethink:"id,omitempty"`
    Name        string  `json:"name" gorethink:"name" mapstructure:"CoinName"`
    Symbol      string  `json:"symbol" gorethink:"symbol"`
    Algorithm   string  `json:"algorithm" gorethink:"algorithm"`
    Price       float64 `json:"price" gorethink:"price"`
}

type Data struct {
    Data map[string]Coin
}
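For context, here's a tiny, self-contained sketch of how a snippet in the shape I assume coins.json has maps onto the structs above. The sample entry and all of its values are made up; the real file holds around 6.8k of them:

package main

import (
    "encoding/json"
    "fmt"
)

func main() {
    // Hypothetical sample in the assumed shape of coins.json.
    raw := []byte(`{"Data": {"BTC": {"id": "1182", "name": "Bitcoin", "symbol": "BTC", "algorithm": "SHA-256", "price": 40000}}}`)

    var data Data
    if err := json.Unmarshal(raw, &data); err != nil {
        panic(err)
    }

    fmt.Printf("%+v\n", data.Data["BTC"])
}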

The Usual Suspect

I decided to time this one function using the time command. That's done like so:

time go run . -fetch

The fetch flag tells it to populate my database. A stupid name now that I think about it, but that's not what's in question right now.
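For reference, the flag itself is just standard library flag parsing. A minimal sketch of how it might be wired up in main follows; the post doesn't show the real main, so the wiring here is an assumption:

package main

import "flag"

func main() {
    // Hypothetical -fetch flag; the actual project wiring isn't shown in this post.
    fetch := flag.Bool("fetch", false, "populate the database from coins.json")
    flag.Parse()

    if *fetch {
        // FetchCrypto(d) would run here with a real database session.
    }
}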

The timing came back at 2.5 minutes. That's a lot of waiting for a busy little bee like me. The full output is below:

go run . -fetch  2.24s user 1.12s system 2% cpu 2:25.31 total

As you can see it didn't take much CPU, but it took a heck of a long time! So I decided to rewrite it, as shown in the next chapter.

Code

func FetchCrypto(d *db.DB) {
    // Read and decode the full list of coins from disk.
    dat, _ := ioutil.ReadFile("coins.json")

    var data Data
    json.Unmarshal(dat, &data)

    // Insert the coins one at a time: one round trip to RethinkDB per coin.
    for _, c := range data.Data {
        // Clear the ID so RethinkDB generates one, and drop the "N/A" placeholder.
        c.ID = ""
        if c.Algorithm == "N/A" {
            c.Algorithm = ""
        }

        r.Table("coin").
            Insert(c).
            Exec(d.S)
    }
}

The Batch Answer

Now, how would one go about rewriting such a beautifully old function? The answer might surprise you!

Batching! Why? Just because I wanted to see if I could. And success, I did it! It takes batches of one hundred currencies, processes each batch in its own goroutine, and waits for them all to complete.

The timing of this one is as follows:

go run . -fetch  1.89s user 0.81s system 47% cpu 5.683 total

Easy peasy, right? It was. A tad more complicated than the original, but I love how it turned out.

Code

func FetchCryptoV2(d *db.DB) {
    dat, _ := ioutil.ReadFile("coins.json")

    var data Data
    json.Unmarshal(dat, &data)

    var wg sync.WaitGroup
    batch := 100

    // Flatten the map into a slice so it can be split into batches.
    coins := []coin.Coin{}
    for _, c := range data.Data {
        coins = append(coins, c)
    }

    length := len(coins)
    for i := 0; i < length; i += batch {
        wg.Add(1)
        go func(i int) {
            defer wg.Done()

            // Take the slice starting at i, capped to at most `batch` coins.
            b := coins[i:]
            if len(b) > batch {
                b = b[:batch]
            }

            for _, c := range b {
                c.ID = ""
                if c.Algorithm == "N/A" {
                    c.Algorithm = ""
                }

                r.Table("coin").
                    Insert(c).
                    Exec(d.S)
            }
        }(i)
    }

    // Block until every batch goroutine has finished.
    wg.Wait()
}

Bonus Chapter

I also decided to make use of RethinkDB's slice insertion, where you pass a whole slice to a single Insert. This is super fast!

Even faster than mine. I haven't tweaked mine to use anything other than batches of one hundred, but I suspect that lowering the batch size would speed it up a bit.

The timing of this is:

go run . -fetch  1.02s user 0.26s system 28% cpu 4.488 total

Code

func FetchCryptoV3(d *db.DB) {
    dat, _ := ioutil.ReadFile("coins.json")

    var data Data
    json.Unmarshal(dat, &data)

    // Clean up every coin, then collect them all into one slice.
    coins := []coin.Coin{}
    for _, c := range data.Data {
        c.ID = ""
        if c.Algorithm == "N/A" {
            c.Algorithm = ""
        }

        coins = append(coins, c)
    }

    // Insert the whole slice in a single query; RethinkDB handles the rest.
    r.Table("coin").
        Insert(coins).
        Exec(d.S)
}

Final thoughts

No error checking has been done in any of these functions, but if you use them, you probably should add some, especially in production environments.
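As a sketch of what that could look like, here's the bonus-chapter version with errors propagated instead of ignored. The function name is mine; the calls are the same ones used above:

func FetchCryptoV3Checked(d *db.DB) error {
    dat, err := ioutil.ReadFile("coins.json")
    if err != nil {
        return fmt.Errorf("reading coins.json: %w", err)
    }

    var data Data
    if err := json.Unmarshal(dat, &data); err != nil {
        return fmt.Errorf("decoding coins.json: %w", err)
    }

    coins := []coin.Coin{}
    for _, c := range data.Data {
        c.ID = ""
        if c.Algorithm == "N/A" {
            c.Algorithm = ""
        }

        coins = append(coins, c)
    }

    if err := r.Table("coin").Insert(coins).Exec(d.S); err != nil {
        return fmt.Errorf("inserting coins: %w", err)
    }

    return nil
}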

Again, you should play around with the batch sizes too, if you want to use anything along the lines of the second function.
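If you do, a small helper makes the batch size easy to experiment with. This one isn't in the original code; it just splits the slice into chunks, each of which a goroutine could then process like in FetchCryptoV2:

// batches splits coins into chunks of at most size elements.
func batches(coins []coin.Coin, size int) [][]coin.Coin {
    var out [][]coin.Coin
    for i := 0; i < len(coins); i += size {
        end := i + size
        if end > len(coins) {
            end = len(coins)
        }
        out = append(out, coins[i:end])
    }
    return out
}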

Edit

I just ran a new test with a batch size of 10. This more than halved the time it took to insert into the database, making it faster than the slice insert from the bonus chapter.

Making it a mere 2.9s.

go run . -fetch  1.67s user 0.49s system 74% cpu 2.917 total

Best,
Mads Cordes
