My dataframe looks like below and I want to use sales column and convert it into json files for each month as a list of lists.
sales | dt |
---|---|
156 | 2022-01 |
192 | 2022-01 |
147 | 2022-02 |
192 | 2022-02 |
for date in date_range:
df.loc[df['dt'] == date]['sales'].to_json(f"{out_path}/sales_{date.replace('-', '_')}.json", orient='values',indent=2)
Using this, I am getting this format of json files:
[
156,
192
]
However, I want to have:
[
[156],
[192]
]
I have created this as a toy example. I want to implement this on a very large dataframe for several months data. Can anyone point me how to achieve this? Thank you.
CodePudding user response:
I don't know the reason of the single value in a array, but this code works:
First cell:
months_sales = []
sales = []
for index, row in df.iterrows():
if [row["dt"]] in months_sales:
index_month = months_sales.index([row["dt"]])
sales[index_month].append([row["sales"]])
else:
months_sales.append([row["dt"]])
sales.append([ [row["sales"]] ])
Second cell:
for index, month in enumerate(months_sales):
with open(str(month[0]) '.json', 'w') as f:
f.write(str(sales[index]))
files.download(str(month[0]) '.json')
You just need to change the path of the file (idk what environment/libs you are using).
The result is a file with the name of the 'dt' column, with the values. Ex:
[
[156],
[192]
]
edit:* libs:
import pandas as pd
from google.colab import files
import json
import io
CodePudding user response:
You can try:
df = df.groupby('dt',as_index=False).agg({'sales':list})
df['sales'] = df['sales'].apply(lambda x: [[e] for e in x])
df.apply(lambda row: pd.Series(row['sales']).to_json(
f"{out_path}/sales_{row['dt'].replace('-','_')}.json",
orient='values',indent=2), axis=1)