Is there a way to provide schema or auto-detect schema when uploading csv from GCS to BigQuery?

Time:03-31

I am trying to upload a csv file from Google Cloud Storage (GCS) to BigQuery (BQ) and auto-detect schema.

What I tried was enabling schema auto-detection and entering the number of rows to skip in the "Header rows to skip" option.

According to Google's documentation in: https://cloud.google.com/bigquery/docs/schema-detect#auto-detect:

"The field types are based on the rows having the most fields. Therefore, auto-detection should work as expected as long as there is at least one row of data that has values in every column/field."

The problem with my CSV is that the above condition is not met. Also, my CSV contains many rows which do not include any numerical values which I think adds an extra complexity for Google's schema auto detection.

Auto-detect is not detecting the correct column names or the correct field types. BQ is detecting all of my field types as strings and assigning column names such as string_field_0, string_field_1, string_field_2, etc. It is also treating the header row of my CSV as a row of data.

I would like to know what I can do to correctly upload this CSV to BQ, skipping the unwanted leading rows and getting the correct schema (field names and field types).

CodePudding user response:

You can try using a tool like bigquery-schema-generator to generate the schema from your CSV file, then pass that schema to a `bq load` job, for example.
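To illustrate the idea, here is a minimal sketch of what such a schema generator does: it scans *every* row (not just the first data row, as BigQuery's sampler effectively does when no single row is fully populated) and merges the types it sees per column, falling back to STRING on conflicts. The column names, sample data, and type-promotion rules below are simplified illustrations, not the exact logic of bigquery-schema-generator.

```python
import json

def infer_type(value):
    """Guess a BigQuery type for a single CSV cell."""
    if value == "":
        return None  # empty cells carry no type information
    try:
        int(value)
        return "INTEGER"
    except ValueError:
        pass
    try:
        float(value)
        return "FLOAT"
    except ValueError:
        pass
    return "STRING"

def merge(a, b):
    """Combine the types seen so far for one column."""
    if a is None:
        return b
    if b is None or a == b:
        return a
    if {a, b} == {"INTEGER", "FLOAT"}:
        return "FLOAT"  # integers promote cleanly to floats
    return "STRING"     # any other conflict degrades to STRING

def infer_schema(rows, header):
    """Build a bq-style schema list by scanning all rows."""
    types = [None] * len(header)
    for row in rows:
        for i, cell in enumerate(row[:len(header)]):
            types[i] = merge(types[i], infer_type(cell))
    return [
        {"name": name, "type": t or "STRING", "mode": "NULLABLE"}
        for name, t in zip(header, types)
    ]

# Example: no single row has values in every column, which is
# exactly the case where BigQuery's own auto-detect struggles.
data = [["product", "qty", "price"],
        ["widget", "3", ""],
        ["gadget", "", "9.99"]]
schema = infer_schema(data[1:], data[0])
print(json.dumps(schema, indent=2))
```

Written to a file (e.g. `schema.json`), output in this shape can be passed to the load job with `bq load --source_format=CSV --schema=schema.json --skip_leading_rows=1 ...`.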

CodePudding user response:

After reading some of the documentation, specifically the CSV header section, I think what you're observing is the expected behavior.

An alternative would be to manually specify the schema for the data.
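A manual schema for `bq load` is a JSON file listing one object per column. The field names and types below are placeholders; replace them with your actual columns:

```json
[
  {"name": "product_name", "type": "STRING",  "mode": "NULLABLE"},
  {"name": "quantity",     "type": "INTEGER", "mode": "NULLABLE"},
  {"name": "unit_price",   "type": "FLOAT",   "mode": "NULLABLE"}
]
```

Then load with explicit row skipping so the header (and any leading junk rows) is not ingested as data, e.g. `bq load --source_format=CSV --skip_leading_rows=1 --schema=schema.json mydataset.mytable gs://my-bucket/file.csv` (dataset, table, and bucket names here are placeholders).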
