Transform data from a DataFrame to JSON using pandas

Time:09-10

How can I transform the data from this DataFrame

import pandas as pd

df = pd.DataFrame(
    [
        ['2021-12-14 12:00:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 12:15:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 12:15:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 12:30:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 12:45:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 13:00:00','subgroup_1','group_1','Subgroup 1'],
        ['2021-12-14 12:30:00','subgroup_3','group_2','Subgroup 3'],
        ['2021-12-14 12:45:00','subgroup_3','group_2','Subgroup 3'],
        ['2021-12-14 13:00:00','subgroup_3','group_2','Subgroup 3'],
    ], columns=['timestamp','subgroup','group','name']
)

to JSON using pandas? (The desired output was attached as an image.)

Please help me.

CodePudding user response:

My solution:

df["timestamp"] = pd.to_datetime(df.timestamp)

out = {}

for k, df_group in df.groupby("group"):
    out[k] = []
    for _, df_subgroup in df_group.groupby("subgroup"):
        name = df_subgroup["name"].values[0]  # Assuming `name` is unique in this group
        count = len(df_subgroup)
        timegroup = [
            {"index": stamp, "value": n}
            for stamp, n in df_subgroup.groupby("timestamp")
            .name.count()
            .to_dict()
            .items()
        ]

        # Complete with missing timestamps
        for ts in df.timestamp.unique():
            if ts not in [t["index"] for t in timegroup]:
                timegroup.append({"index": pd.Timestamp(ts), "value": 0})

        # Sort by timestamp
        timegroup = sorted(timegroup, key=lambda x: x["index"])
        
        # Convert timestamp into strings
        for t in timegroup:
            t["index"] = t["index"].strftime("%Y-%m-%d %H:%M:%S")
  
        out[k].append({"name": name, "count": count, "timegroup": timegroup})

Result in out:

{
   "group_1": [
      {
         "name": "Subgroup 1",
         "count": 6,
         "timegroup": [
            {
               "index": "2021-12-14 12:00:00",
               "value": 1
            },
            {
               "index": "2021-12-14 12:15:00",
               "value": 2
            },
            {
               "index": "2021-12-14 12:30:00",
               "value": 1
            },
            {
               "index": "2021-12-14 12:45:00",
               "value": 1
            },
            {
               "index": "2021-12-14 13:00:00",
               "value": 1
            }
         ]
      }
   ],
   "group_2": [
      {
         "name": "Subgroup 3",
         "count": 3,
         "timegroup": [
            {
               "index": "2021-12-14 12:00:00",
               "value": 0
            },
            {
               "index": "2021-12-14 12:15:00",
               "value": 0
            },
            {
               "index": "2021-12-14 12:30:00",
               "value": 1
            },
            {
               "index": "2021-12-14 12:45:00",
               "value": 1
            },
            {
               "index": "2021-12-14 13:00:00",
               "value": 1
            }
         ]
      }
   ]
}
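The "complete with missing timestamps" and "sort" steps above can also be sketched more compactly with value_counts and reindex. This is only an illustrative variant, not the answer's own code: it assumes the timestamps stay as 'YYYY-MM-DD HH:MM:SS' strings (which sort chronologically as text), and the names all_ts, counts and result_json are made up for the example.

```python
import json
import pandas as pd

df = pd.DataFrame(
    [
        ['2021-12-14 12:00:00', 'subgroup_1', 'group_1', 'Subgroup 1'],
        ['2021-12-14 12:15:00', 'subgroup_1', 'group_1', 'Subgroup 1'],
        ['2021-12-14 12:15:00', 'subgroup_1', 'group_1', 'Subgroup 1'],
        ['2021-12-14 12:30:00', 'subgroup_1', 'group_1', 'Subgroup 1'],
        ['2021-12-14 12:45:00', 'subgroup_1', 'group_1', 'Subgroup 1'],
        ['2021-12-14 13:00:00', 'subgroup_1', 'group_1', 'Subgroup 1'],
        ['2021-12-14 12:30:00', 'subgroup_3', 'group_2', 'Subgroup 3'],
        ['2021-12-14 12:45:00', 'subgroup_3', 'group_2', 'Subgroup 3'],
        ['2021-12-14 13:00:00', 'subgroup_3', 'group_2', 'Subgroup 3'],
    ],
    columns=['timestamp', 'subgroup', 'group', 'name'],
)

# Every timestamp seen anywhere in the frame, in sorted order.
# Assumption: the fixed-width string format sorts chronologically.
all_ts = sorted(df['timestamp'].unique())

out = {}
for group, df_group in df.groupby('group'):
    out[group] = []
    for _, df_sub in df_group.groupby('subgroup'):
        # Count rows per timestamp, then fill in absent timestamps with 0
        counts = df_sub['timestamp'].value_counts().reindex(all_ts, fill_value=0)
        out[group].append({
            'name': df_sub['name'].iloc[0],  # assumes one name per subgroup
            'count': len(df_sub),
            'timegroup': [
                {'index': ts, 'value': int(n)} for ts, n in counts.items()
            ],
        })

# Serialize the nested dict; int() above keeps NumPy integers out of json.dumps
result_json = json.dumps(out, indent=3)
```

Because the timestamps are never converted to datetime here, no strftime step is needed before serializing.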

CodePudding user response:

To get your desired output, I applied four steps:

Code:

import json

# STEP 1: group by group and name, collecting the other columns into lists
df = df.groupby(['group','name']).agg(list).reset_index('name')

# STEP 2: add a count column holding the number of elements in each timestamp list
df['count'] = df['timestamp'].apply(len)

# STEP 3: replace each timestamp list with a list of {'index', 'value'} dicts
for v in df.index:
    stamps = df.at[v, 'timestamp']
    l = []
    for i in set(stamps):  # set order is arbitrary, so the result is unsorted
        l.append({'index': i, 'value': stamps.count(i)})
    df.at[v, 'timestamp'] = l

# STEP 4: convert to JSON keyed by the index (the group)
[json.loads(df[['name','count','timestamp']].to_json(orient="index"))]

Output:

[{'group_1': {'name': 'Subgroup 1',
   'count': 6,
   'timestamp': [{'index': '2021-12-14 12:00:00', 'value': 1},
    {'index': '2021-12-14 12:30:00', 'value': 1},
    {'index': '2021-12-14 13:00:00', 'value': 1},
    {'index': '2021-12-14 12:15:00', 'value': 2},
    {'index': '2021-12-14 12:45:00', 'value': 1}]},
  'group_2': {'name': 'Subgroup 3',
   'count': 3,
   'timestamp': [{'index': '2021-12-14 12:30:00', 'value': 1},
    {'index': '2021-12-14 12:45:00', 'value': 1},
    {'index': '2021-12-14 13:00:00', 'value': 1}]}}]
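The timestamps in this output are not in chronological order because the loop in STEP 3 iterates over a set, whose order is arbitrary. If a stable order matters, one possible sketch is to count with collections.Counter and iterate over the sorted keys. The list stamps below is a stand-in for one row's timestamp list, and the variable names are illustrative only.

```python
from collections import Counter

# Stand-in for the timestamp list of group_1 after STEP 1
stamps = [
    '2021-12-14 12:00:00',
    '2021-12-14 12:15:00',
    '2021-12-14 12:15:00',
    '2021-12-14 12:30:00',
    '2021-12-14 12:45:00',
    '2021-12-14 13:00:00',
]

# Count each timestamp in a single pass instead of calling list.count per element
counts = Counter(stamps)

# Sorting the keys gives chronological order for this fixed-width string format
timegroup = [{'index': ts, 'value': counts[ts]} for ts in sorted(counts)]
```

Unlike the first answer, this still omits timestamps that never occur in the group, so no zero-valued entries are produced.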

CodePudding user response:

Have you tried the .to_json() method?

df = pd.DataFrame( ... )
df.to_json()
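For reference, here is a small sketch of what to_json returns on a frame like the one in the question, using a shortened two-row version of the data. It shows the default column orientation and orient='records'. Both are flat views of the table, so the nested per-group structure from the other answers would still need explicit grouping.

```python
import json
import pandas as pd

# Shortened two-row version of the question's data, for illustration only
df = pd.DataFrame(
    [
        ['2021-12-14 12:00:00', 'subgroup_1', 'group_1', 'Subgroup 1'],
        ['2021-12-14 12:30:00', 'subgroup_3', 'group_2', 'Subgroup 3'],
    ],
    columns=['timestamp', 'subgroup', 'group', 'name'],
)

# Default orient for a DataFrame is "columns": {column -> {row label -> value}}
by_column = json.loads(df.to_json())

# orient="records" gives one flat dict per row
by_record = json.loads(df.to_json(orient='records'))
```

In the column-oriented result the integer row labels become the string keys '0' and '1', since JSON object keys are always strings.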