Convert CSV to JSON, split to x JSON files and store the result to a minio bucket

Time:03-16

Inside of a function I have the following code that loads a CSV, converts it to JSON and uploads the converted file to a minio bucket.

    df = pd.read_csv('data.csv').to_json().encode("utf-8")
    client.put_object(
        "bucket",
        "test.json",
        data=BytesIO(df),
        length=len(df),
        content_type='application/csv'
    )

Is it possible to iterate through the data and split it into X JSON files? I tried pandas read_csv(..., iterator=False, chunksize=x) but have had no luck so far.

CodePudding user response:

Something like this should work for you. The code splits the DataFrame into groups of 1000 rows and writes each group to its own JSON file in the bucket.

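    from io import BytesIO

    import pandas as pd

    df = pd.read_csv('data.csv')

    X = 1000  # rows per output file
    # Integer-dividing the default 0..n-1 index by X puts every X consecutive rows in one group
    groups = [g for _, g in df.groupby(df.index // X)]

    # client is the already configured Minio client from the question
    for i, sub_df in enumerate(groups):
        data = sub_df.to_json().encode("utf-8")
        client.put_object(
            "bucket",
            f"test_{i}.json",
            data=BytesIO(data),
            length=len(data),
            content_type='application/json'  # the payload is JSON, not CSV
        )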
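Since the question mentions chunksize, here is a minimal sketch of that route as well. With chunksize set, read_csv returns an iterator of DataFrames, so the whole file never has to sit in memory at once. The function name upload_csv_in_chunks and its parameters are made up for illustration, and client is assumed to be a configured Minio client like the one in the question.

```python
from io import BytesIO

import pandas as pd


def upload_csv_in_chunks(client, csv_source, bucket, rows_per_file, prefix="test"):
    """Read a CSV in chunks and upload each chunk as its own JSON object.

    `client` is assumed to expose put_object() like the Minio client in the
    question. Returns the list of object names that were written.
    """
    names = []
    # With chunksize set, read_csv yields DataFrames of at most
    # rows_per_file rows instead of loading the whole file.
    for i, chunk in enumerate(pd.read_csv(csv_source, chunksize=rows_per_file)):
        payload = chunk.to_json().encode("utf-8")
        name = f"{prefix}_{i}.json"
        client.put_object(
            bucket,
            name,
            data=BytesIO(payload),
            length=len(payload),
            content_type="application/json",
        )
        names.append(name)
    return names
```

Calling upload_csv_in_chunks(client, 'data.csv', "bucket", 1000) should write test_0.json, test_1.json and so on, with the last file holding whatever rows are left over. Note that the row labels inside each JSON file keep counting from the start of the CSV rather than restarting at 0 for every chunk.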