How do I transform data from this DataFrame
pd.DataFrame(
    [
        ['2021-12-14 12:00:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 12:15:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 12:15:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 12:30:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 12:45:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 13:00:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 12:30:00','subgroup_3','group_2','Subgroup 3'],
        ['2021-12-14 12:45:00','subgroup_3','group_2','Subgroup 3'],
        ['2021-12-14 13:00:00','subgroup_3','group_2','Subgroup 3'],
    ], columns=['timestamp','subgroup','group','name']
)
to JSON using pandas? Please help me.
CodePudding user response:
My solution:
import pandas as pd

# `df` is the DataFrame from the question
df["timestamp"] = pd.to_datetime(df.timestamp)
out = {}
for k, df_group in df.groupby("group"):
    out[k] = []
    for _, df_subgroup in df_group.groupby("subgroup"):
        name = df_subgroup["name"].values[0]  # Assuming `name` is unique in this subgroup
        count = len(df_subgroup)
        # Number of rows per timestamp within this subgroup
        timegroup = [
            {"index": ts, "value": n}
            for ts, n in df_subgroup.groupby("timestamp")
            .name.count()
            .to_dict()
            .items()
        ]
        # Complete with missing timestamps
        seen = {t["index"] for t in timegroup}
        for ts in df.timestamp.unique():
            if pd.Timestamp(ts) not in seen:
                timegroup.append({"index": pd.Timestamp(ts), "value": 0})
        # Sort by timestamp
        timegroup = sorted(timegroup, key=lambda x: x["index"])
        # Convert timestamps into strings
        for t in timegroup:
            t["index"] = t["index"].strftime("%Y-%m-%d %H:%M:%S")
        out[k].append({"name": name, "count": count, "timegroup": timegroup})
Result in out:
{
  "group_1": [
    {
      "name": "Subgroup 1",
      "count": 6,
      "timegroup": [
        {"index": "2021-12-14 12:00:00", "value": 1},
        {"index": "2021-12-14 12:15:00", "value": 2},
        {"index": "2021-12-14 12:30:00", "value": 1},
        {"index": "2021-12-14 12:45:00", "value": 1},
        {"index": "2021-12-14 13:00:00", "value": 1}
      ]
    }
  ],
  "group_2": [
    {
      "name": "Subgroup 3",
      "count": 3,
      "timegroup": [
        {"index": "2021-12-14 12:00:00", "value": 0},
        {"index": "2021-12-14 12:15:00", "value": 0},
        {"index": "2021-12-14 12:30:00", "value": 1},
        {"index": "2021-12-14 12:45:00", "value": 1},
        {"index": "2021-12-14 13:00:00", "value": 1}
      ]
    }
  ]
}
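For comparison, the zero-filling step could also be written with `value_counts` followed by `reindex`, which avoids the manual membership check. This is only a sketch, not the answer's method: the function name `build_nested` and the small `demo` frame are made up for illustration, and it assumes the same four columns as the question.

```python
import json

import pandas as pd


def build_nested(df):
    """Return {group: [{"name", "count", "timegroup"}, ...]} with zero-filled timestamps."""
    df = df.copy()
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    # Every timestamp that occurs anywhere in the frame, in chronological order
    all_ts = pd.DatetimeIndex(df["timestamp"].unique()).sort_values()
    out = {}
    for (group, _subgroup), sub in df.groupby(["group", "subgroup"]):
        # Rows per timestamp in this subgroup; absent timestamps become 0
        counts = sub["timestamp"].value_counts().reindex(all_ts, fill_value=0)
        timegroup = [
            {"index": ts.strftime("%Y-%m-%d %H:%M:%S"), "value": int(n)}
            for ts, n in counts.items()
        ]
        out.setdefault(group, []).append(
            # Assumes `name` is constant within a subgroup
            {"name": sub["name"].iloc[0], "count": len(sub), "timegroup": timegroup}
        )
    return out


# Hypothetical three-row sample, shaped like the question's data
demo = pd.DataFrame(
    [
        ["2021-12-14 12:00:00", "subgroup_1", "group_1", "Subgroup 1"],
        ["2021-12-14 12:15:00", "subgroup_1", "group_1", "Subgroup 1"],
        ["2021-12-14 12:15:00", "subgroup_3", "group_2", "Subgroup 3"],
    ],
    columns=["timestamp", "subgroup", "group", "name"],
)
result = build_nested(demo)
print(json.dumps(result, indent=2))
```

Casting each count with `int()` keeps the values as plain Python integers, so `json.dumps` can serialize the result directly.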
CodePudding user response:
To get your desired output, I applied 4 steps:
Code:
import json

# STEP 1: group by group and name, collecting the other columns into lists
df = df.groupby(['group', 'name']).agg(list).reset_index('name')
# STEP 2: add a count column holding the number of elements in each timestamp list
df['count'] = df.apply(lambda x: len(x.timestamp), axis=1)
# STEP 3: replace each timestamp list with a list of {'index', 'value'} dicts
# (iterating over a set, so the order of the entries is arbitrary)
for v in df.index:
    timestamps = df.at[v, 'timestamp']
    l = []
    for i in set(timestamps):
        l.append({'index': i, 'value': timestamps.count(i)})
    df.at[v, 'timestamp'] = l
# STEP 4: convert to JSON keyed by the index (group)
[json.loads(df[['name', 'count', 'timestamp']].to_json(orient="index"))]
Output:
[{'group_1': {'name': 'Subgroup 1',
              'count': 6,
              'timestamp': [{'index': '2021-12-14 12:00:00', 'value': 1},
                            {'index': '2021-12-14 12:30:00', 'value': 1},
                            {'index': '2021-12-14 13:00:00', 'value': 1},
                            {'index': '2021-12-14 12:15:00', 'value': 2},
                            {'index': '2021-12-14 12:45:00', 'value': 1}]},
  'group_2': {'name': 'Subgroup 3',
              'count': 3,
              'timestamp': [{'index': '2021-12-14 12:30:00', 'value': 1},
                            {'index': '2021-12-14 12:45:00', 'value': 1},
                            {'index': '2021-12-14 13:00:00', 'value': 1}]}}]
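Note that the timestamps in this output are not in chronological order, because the entries come from iterating over a set. If ordered entries are needed, the counting step could be sketched with `collections.Counter` and `sorted` instead. The helper name `count_timestamps` is made up for illustration, and the sketch assumes the timestamps are still strings in the `YYYY-MM-DD HH:MM:SS` format, which sort chronologically as plain text.

```python
from collections import Counter


def count_timestamps(timestamps):
    """Return [{'index': ts, 'value': n}, ...] ordered by timestamp string."""
    counts = Counter(timestamps)
    # ISO-like timestamp strings sort chronologically when sorted as text
    return [{"index": ts, "value": counts[ts]} for ts in sorted(counts)]


example = count_timestamps(
    [
        "2021-12-14 12:15:00",
        "2021-12-14 12:00:00",
        "2021-12-14 12:15:00",
    ]
)
print(example)
```

Each list produced by STEP 1 could be passed through such a helper in place of the inner loop of STEP 3.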
CodePudding user response:
Have you tried the .to_json() method?
df = pd.DataFrame( ... )
df.to_json()
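On its own, `to_json()` serializes the frame as it is, so it does not produce the nested per-group counts the question asks for. A minimal sketch of what it returns, using a made-up two-row sample with the question's columns:

```python
import json

import pandas as pd

# Hypothetical two-row sample with the same columns as the question
df = pd.DataFrame(
    [
        ["2021-12-14 12:00:00", "subgroup_1", "group_1", "Subgroup 1"],
        ["2021-12-14 12:30:00", "subgroup_3", "group_2", "Subgroup 3"],
    ],
    columns=["timestamp", "subgroup", "group", "name"],
)

# Default orient for a DataFrame: {column: {row_label: value}}
columns_json = json.loads(df.to_json())
# orient="records": one flat dict per row
records = json.loads(df.to_json(orient="records"))

print(columns_json["name"])
print(records[0])
```

Both layouts are flat, so the grouping and counting still has to be done before or after the call.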