How to convert dataframe column into list of lists json format?-CodePudding

My dataframe looks like below and I want to use sales column and convert it into json files for each month as a list of lists.

sales	dt
156	2022-01
192	2022-01
147	2022-02
192	2022-02

for date in date_range:
        df.loc[df['dt'] == date]['sales'].to_json(f"{out_path}/sales_{date.replace('-', '_')}.json", orient='values',indent=2)

Using this, I am getting this format of json files:

[
156,
192
]

However, I want to have:

[
 [156],
 [192]
]

I have created this as a toy example. I want to implement this on a very large dataframe for several months data. Can anyone point me how to achieve this? Thank you.

CodePudding user response：

I don't know the reason of the single value in a array, but this code works:

First cell:

months_sales = []
sales = []

for index, row in df.iterrows():
    if  [row["dt"]] in months_sales:
        index_month = months_sales.index([row["dt"]])
        sales[index_month].append([row["sales"]])
    else:
        months_sales.append([row["dt"]])
        sales.append([  [row["sales"]]  ])

Second cell:

for index, month in enumerate(months_sales):
   with open(str(month[0]) '.json', 'w') as f:
     f.write(str(sales[index]))
     files.download(str(month[0]) '.json')

You just need to change the path of the file (idk what environment/libs you are using).

The result is a file with the name of the 'dt' column, with the values. Ex:

edit:* libs:

  import pandas as pd
  from google.colab import files
  import json
  import io

CodePudding user response：

You can try:

    df = df.groupby('dt',as_index=False).agg({'sales':list})
    df['sales'] = df['sales'].apply(lambda x: [[e] for e in x])
    df.apply(lambda row: pd.Series(row['sales']).to_json(
             f"{out_path}/sales_{row['dt'].replace('-','_')}.json",
             orient='values',indent=2), axis=1)