How to convert a column in a DataFrame to a nested dictionary in Python?

I have a column of work records that contains data like this:

Records
Name: hours on date, Name: hours on date
Aya: 20 on 18/9/2021, Asmaa: 10 on 20/9/2021, Aya: 20 on 20/9/2021

I want to restructure this column so that when I aggregate over a range of dates, say from 1/9/2021 until 30/9/2021, I get the total hours spent by each name.

I tried converting the column to a list and then to a dictionary, but it did not work.

How can I change this column's structure in Python? Should I use regex?

{18/9/2021 : {Aya:20}, 20/9/2021 : {Asmaa:10}, 20/9/2021 : {Aya:20} }

CodePudding user response:

You can use a dict here, but it has to be nested: a date can have multiple entries, and dict keys must be unique.
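To see why the flat structure from the question cannot hold all the data, here is a minimal sketch using the values from the sample row (the variable names `flat` and `nested` are only illustrative). A repeated date key silently overwrites the earlier entry, while grouping names under each date keeps everything.

```python
# A dict literal with a repeated key keeps only the last value for that key.
flat = {'18/9/2021': {'Aya': 20}, '20/9/2021': {'Asmaa': 10}, '20/9/2021': {'Aya': 20}}
print(flat)  # {'18/9/2021': {'Aya': 20}, '20/9/2021': {'Aya': 20}}

# Grouping every name under its date avoids losing the Asmaa entry.
records = [('18/9/2021', 'Aya', 20), ('20/9/2021', 'Asmaa', 10), ('20/9/2021', 'Aya', 20)]
nested = {}
for date, name, hours in records:
    nested.setdefault(date, {})[name] = hours
print(nested)  # {'18/9/2021': {'Aya': 20}, '20/9/2021': {'Asmaa': 10, 'Aya': 20}}
```

The pandas code that follows builds the same nested shape directly from the column.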

import pandas as pd
df = pd.DataFrame({'Records': ['Name: hours on date, Name: hours on date',
  'Aya: 20 on 18/9/2021, Asmaa: 10 on 20/9/2021, Aya: 20 on 20/9/2021']})

# Keep only rows that have the actual data
data = df.loc[~df['Records'].str.contains('Name')]

# Split on the comma delimiter and explode into a unique row per employee
data = data['Records'].str.split(',').explode()

# Use regex to capture the relevant data and construct the dictionary
data = data.str.extract(r'([a-zA-Z]+):\s(\d+)\son\s(\d{1,2}/\d{1,2}/\d{4})').reset_index(drop=True)

data.groupby(2).apply(lambda x: dict(zip(x[0],x[1]))).to_dict()

Output

{'18/9/2021': {'Aya': '20'}, '20/9/2021': {'Asmaa': '10', 'Aya': '20'}}
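For the date-range totals the question actually asks about, a nested dict may not be the most convenient shape: the hours above are still strings, and a name that appears twice on the same date would keep only its last entry. As a rough sketch, assuming dates are day/month/year and that a long table with one row per entry is acceptable, you could convert the types and filter by date instead. The names `tidy`, `pattern` and `total_hours` are made up for this illustration.

```python
import pandas as pd

df = pd.DataFrame({'Records': [
    'Name: hours on date, Name: hours on date',
    'Aya: 20 on 18/9/2021, Asmaa: 10 on 20/9/2021, Aya: 20 on 20/9/2021']})

# One row per "Name: hours on date" entry, skipping the header-like row
entries = df.loc[~df['Records'].str.contains('Name'), 'Records'].str.split(',').explode()

# Named groups become the column names of the extracted frame
pattern = r'(?P<name>[A-Za-z]+):\s*(?P<hours>\d+)\s+on\s+(?P<date>\d{1,2}/\d{1,2}/\d{4})'
tidy = entries.str.extract(pattern).dropna().reset_index(drop=True)
tidy['hours'] = tidy['hours'].astype(int)
tidy['date'] = pd.to_datetime(tidy['date'], format='%d/%m/%Y')  # assumes day/month/year

def total_hours(frame, start, end):
    """Sum hours per name for dates between start and end, both inclusive."""
    mask = frame['date'].between(pd.Timestamp(start), pd.Timestamp(end))
    sums = frame.loc[mask].groupby('name')['hours'].sum()
    return {name: int(hours) for name, hours in sums.items()}

totals = total_hours(tidy, '2021-09-01', '2021-09-30')
print(totals)  # {'Asmaa': 10, 'Aya': 40}
```

Narrowing the range, for example to 19/9/2021 through 30/9/2021, drops the 18/9/2021 entry, so Aya's total falls to 20.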