How to make an api call faster in Golang?

Time:12-04

I am trying to upload a bunch of files to my account on the company's storage service through its API. There are about 40-50 files. I have the full path of each file and call os.Open so that I can pass an io.Reader. I first called client.Files.Upload() without goroutines, but that took a long time, so I decided to use goroutines. Below is the implementation I tried. When I run the program it uploads just one file (apparently the smallest one), or else it waits for a long time. What is wrong with it? Doesn't each iteration of the for loop start a goroutine and move on, so that there is one goroutine per file? How can I make this as fast as possible with goroutines?

var filePaths []string
var wg sync.WaitGroup

// fill appends the full path of every file under rootpath to filePaths.
func fill() {
    filepath.Walk(rootpath, func(path string, info os.FileInfo, err error) error {
        // Check err first: info may be nil when err is non-nil.
        if err != nil {
            fmt.Println("ERROR:", err)
            return nil
        }
        if !info.IsDir() {
            filePaths = append(filePaths, path)
        }
        return nil
    })
}

func main() {
    fill()

    tokenSource := oauth2.StaticTokenSource(&oauth2.Token{AccessToken: token})
    oauthClient := oauth2.NewClient(context.TODO(), tokenSource)
    client := putio.NewClient(oauthClient)

    for _, path := range filePaths {
        wg.Add(1)

        go func() {
            defer wg.Done()

            f, err := os.Open(path)
            if err != nil {
                log.Println("err:OPEN", err)
            }

            upload, err := client.Files.Upload(context.TODO(), f, path, 0)
            if err != nil {
                log.Println("error uploading file:", err)
            }
            fmt.Println(upload)
        }()
    }
    wg.Wait()
}

Answer:

Consider a worker pool pattern like this: https://go.dev/play/p/p6SErj3L6Yc

In this example application, I've taken out the API call and just list the file names. That makes it work on the playground.

  • A fixed number of worker goroutines is started. A channel distributes the work to them, and closing that channel signals that there is no more work. The number of workers could be 1, 1000, or more; choose it based on how many concurrent operations the putio API can reasonably be expected to support.
  • paths is the chan string used for this purpose.
  • Each worker ranges over the paths channel to receive the next file path to upload.
package main

import (
    "fmt"
    "os"
    "path/filepath"
    "sync"
)

func main() {
    paths := make(chan string)
    var wg = new(sync.WaitGroup)
    for i := 0; i < 10; i++ {
        wg.Add(1)
        go worker(paths, wg)
    }
    if err := filepath.Walk("/usr", func(path string, info os.FileInfo, err error) error {
        if err != nil {
            return fmt.Errorf("Failed to walk directory: %T %w", err, err)
        }
        if info.IsDir() {
            return nil
        }
        paths <- path
        return nil
    }); err != nil {
        panic(fmt.Errorf("failed Walk: %w", err))
    }
    close(paths)
    wg.Wait()
}

func worker(paths <-chan string, wg *sync.WaitGroup) {
    defer wg.Done()
    for path := range paths {
        // do upload.
        fmt.Println(path)
    }
}

This pattern can handle an arbitrarily large number of files without loading the entire list into memory before processing it. As you can see, it doesn't make the code more complicated; if anything, it's simpler.

When I run the program it just uploads one file which is the one

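Function literals capture the scope in which they are defined. This is why your code uploaded only one file: the path variable of the for loop was shared by every goroutine, so when the loop changed that variable, all goroutines saw the change. (Since Go 1.22, for modules that declare go 1.22 or newer in go.mod, each iteration gets its own copy of the loop variable, so this applies to earlier versions.)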

Avoid function literals unless you actually want to inherit scope. Functions defined at the global scope don't inherit any scope, and you must pass all relevant variables to those functions instead. This is a good thing - it makes the functions more straightforward to understand and makes variable "ownership" transitions more explicit.

An appropriate place for a function literal is the callback passed to filepath.Walk: its parameters are fixed by filepath.Walk, so capturing the enclosing scope is the way to reach other values, such as the paths channel in our case.

Speaking of scope, global variables should be avoided unless their use is truly global. Prefer passing variables between functions to sharing global variables. Again, this makes variable ownership explicit and makes it easy to see which functions do and don't access which variables. Neither your wait group nor your filePaths has any reason to be global.

            f, err := os.Open(path)

Don't forget to close any files you open. With 40 or 50 files, letting the open file handles pile up until the program ends isn't so bad, but it's a time bomb that will go off once the number of files exceeds the ulimit on open files. In a worker that loops over many paths, a deferred Close would not run until the worker function returns, long after each file is needed, so defer doesn't make sense here. I would call f.Close() explicitly after uploading each file.
