I have a column of work records, named Records, with data like this:
| Records |
|---|
| Name: hours on date, Name: hours on date |
| Aya: 20 on 18/9/2021, Asmaa: 10 on 20/9/2021, Aya: 20 on 20/9/2021 |
I want to give this column a structure so that when I aggregate over a range of dates, say from 1/9/2021 until 30/9/2021, I get the total hours spent by each name.
I tried changing the column to a list then to a dictionary, but it is not working.
How can I change this column's structure in Python? Should I use regex? I was aiming for something like:
{18/9/2021 : {Aya:20}, 20/9/2021 : {Asmaa:10}, 20/9/2021 : {Aya:20} }
Answer:
You can use a dict here, but it will have to be nested, because you have multiple entries per date.
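To see why the nesting matters, here is a minimal pure-Python sketch (no pandas) for a single record string. It assumes names contain only ASCII letters and hours are whole numbers; the variable names are illustrative. The flat dict from the question loses data because a repeated key overwrites the earlier one, while a dict keyed by date and then by name keeps every entry:

```python
import re
from collections import defaultdict

record = "Aya: 20 on 18/9/2021, Asmaa: 10 on 20/9/2021, Aya: 20 on 20/9/2021"

# A flat dict literal cannot hold two entries for the same date:
# the second '20/9/2021' key silently overwrites the first one.
flat = {"18/9/2021": {"Aya": 20}, "20/9/2021": {"Asmaa": 10}, "20/9/2021": {"Aya": 20}}
print(flat)  # {'18/9/2021': {'Aya': 20}, '20/9/2021': {'Aya': 20}}

# Capture (name, hours, date) for every "Name: hours on date" chunk.
pattern = re.compile(r"([A-Za-z]+):\s*(\d+)\s+on\s+(\d{1,2}/\d{1,2}/\d{4})")

# Nest by date, then by name, so every entry survives.
nested = defaultdict(dict)
for name, hours, date in pattern.findall(record):
    # Assumption: if a name appears twice on the same date, add the hours up.
    nested[date][name] = nested[date].get(name, 0) + int(hours)

print(dict(nested))  # {'18/9/2021': {'Aya': 20}, '20/9/2021': {'Asmaa': 10, 'Aya': 20}}
```

Summing repeated (date, name) pairs is a design choice made here, not something the question specifies. For a whole DataFrame column, the pandas approach is: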
import pandas as pd
df = pd.DataFrame({'Records': ['Name: hours on date, Name: hours on date',
                               'Aya: 20 on 18/9/2021, Asmaa: 10 on 20/9/2021, Aya: 20 on 20/9/2021']})
# Keep only rows that have the actual data
data = df.loc[~df['Records'].str.contains('Name')]
# Split on the comma delimiter and explode into a unique row per employee
data = data['Records'].str.split(',').explode()
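# Use regex to capture name, hours and date into columns 0, 1 and 2
# (raw string avoids invalid-escape warnings; [a-zA-Z]+ matches a whole name)
data = data.str.extract(r'([a-zA-Z]+):\s(\d{1,2})\son\s(\d{1,2}/\d{1,2}/\d{4})').reset_index(drop=True)
# Group by date (column 2) and build a {name: hours} dict for each date
data.groupby(2).apply(lambda x: dict(zip(x[0], x[1]))).to_dict()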
Output
{'18/9/2021': {'Aya': '20'}, '20/9/2021': {'Asmaa': '10', 'Aya': '20'}}
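The nested dict shows the structure, but for the aggregation the question actually asks about (total hours per name over a date range), a long table with one row per entry is easier to work with. Below is a minimal sketch under a few assumptions: dates are day-first (18/9/2021), hours are whole numbers, and names are ASCII letters. It swaps the split/explode/extract steps for `Series.str.extractall`, which returns one row per regex match; `hours_by_name` is a hypothetical helper name, not part of pandas:

```python
import pandas as pd

df = pd.DataFrame({'Records': ['Name: hours on date, Name: hours on date',
                               'Aya: 20 on 18/9/2021, Asmaa: 10 on 20/9/2021, Aya: 20 on 20/9/2021']})

# One row per "Name: hours on date" match; the header-like row has no digits,
# so it produces no matches and needs no separate filtering.
entries = df['Records'].str.extractall(
    r'(?P<name>[A-Za-z]+):\s*(?P<hours>\d+)\s+on\s+(?P<date>\d{1,2}/\d{1,2}/\d{4})'
).reset_index(drop=True)

# Convert the captured strings so they can be summed and compared.
entries['hours'] = entries['hours'].astype(int)
# Dates are day-first, so the format is given explicitly.
entries['date'] = pd.to_datetime(entries['date'], format='%d/%m/%Y')


def hours_by_name(entries, start, end):
    """Hypothetical helper: total hours per name for dates in [start, end], inclusive."""
    mask = entries['date'].between(pd.Timestamp(start), pd.Timestamp(end))
    return entries.loc[mask].groupby('name')['hours'].sum()


totals = hours_by_name(entries, '2021-09-01', '2021-09-30')
print({name: int(total) for name, total in totals.items()})  # {'Asmaa': 10, 'Aya': 40}
```

Keeping the data in this long form, rather than as nested dicts, lets pandas do the date filtering and summing directly, and any other range can be queried by changing the two bounds.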