How to convert parquet-go []interface{} to slice of structs?

Time:10-25

I am reading a parquet file as shown below. The code reads the file and converts its rows to ParquetProduct structs, which I use later to get data out of them.

func (r *clientRepository) read(logg log.Prot, file string, bucket string) error {
    fr, err := pars3.NewS3FileReader(context.Background(), bucket, file, r.s3Client.GetSession().Config)
    if err != nil {
        return errs.Wrap(err)
    }
    defer xio.CloseIgnoringErrors(fr)

    pr, err := reader.NewParquetReader(fr, nil, int64(r.cfg.Workers))
    if err != nil {
        return errs.Wrap(err)
    }

    if pr.GetNumRows() == 0 {
        logg.Infof("Skipping %s due to 0 rows", file)
        return nil
    }

    for {
        rows, err := pr.ReadByNumber(r.cfg.RowsToRead)
        if err != nil {
            return errs.Wrap(err)
        }
        if len(rows) == 0 {
            break
        }

        // doing Marshal here first
        byteSlice, err := json.Marshal(rows)
        if err != nil {
            return errs.Wrap(err)
        }

        var productRows []ParquetProduct
        // and then Unmarshal here
        err = json.Unmarshal(byteSlice, &productRows)
        if err != nil {
            return errs.Wrap(err)
        }

        //.....
        // use productRows here
        //.....
    
    }
    return nil
}

Problem Statement

I marshal the rows to JSON first and then unmarshal them to get the required objects. Is there any way to avoid this round trip? The ReadByNumber function (from the parquet-go library) returns []interface{}, so is there any way to get my []ParquetProduct back directly from the []interface{}?

I am using Go 1.19. This is the library I am using to read the parquet file: https://github.com/xitongsys/parquet-go

Is there a better, more efficient way to do this overall?

CodePudding user response:

Instead of using ReadByNumber, make a []ParquetProduct slice of the desired length and pass a pointer to it to Read.

products := make([]ParquetProduct, r.cfg.RowsToRead) 
// ^ slice with length and capacity equal to r.cfg.RowsToRead
err = pr.Read(&products)
if err != nil {
    // ...
}   
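
On the literal question of turning a []interface{} into a typed slice without going through JSON: if every element's dynamic type really is ParquetProduct, a plain type assertion per element is enough. A minimal sketch follows, using only the standard library. The ParquetProduct fields and the convertRows helper are hypothetical, made up for illustration. Whether the values returned by ReadByNumber actually have that dynamic type is an assumption here; it likely depends on how the reader was constructed (the question passes nil as the schema object), so the helper returns an error rather than panicking when the assertion fails.

```go
package main

import "fmt"

// ParquetProduct is a hypothetical stand-in; the real struct's
// fields are not shown in the question.
type ParquetProduct struct {
	ID   int64
	Name string
}

// convertRows asserts each element of rows to T and collects the
// results in a typed slice. It returns an error on the first element
// whose dynamic type is not T.
func convertRows[T any](rows []interface{}) ([]T, error) {
	out := make([]T, 0, len(rows))
	for i, row := range rows {
		v, ok := row.(T)
		if !ok {
			return nil, fmt.Errorf("row %d: got %T, want %T", i, row, *new(T))
		}
		out = append(out, v)
	}
	return out, nil
}

func main() {
	// Stand-in for the []interface{} a reader might hand back.
	rows := []interface{}{
		ParquetProduct{ID: 1, Name: "a"},
		ParquetProduct{ID: 2, Name: "b"},
	}

	products, err := convertRows[ParquetProduct](rows)
	if err != nil {
		fmt.Println("conversion failed:", err)
		return
	}
	fmt.Println(len(products), products[1].Name) // prints "2 b"
}
```

The helper uses type parameters, which are available from Go 1.18, so it works with the Go 1.19 toolchain mentioned in the question. If the assertion fails for your rows, reading directly into a []ParquetProduct with Read, as shown above, avoids the problem entirely.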