I have a DataFrame in which one column contains several days together with their opening times. I want to put each day's time into its respective column.
I have put a '$' between the days so that the string can be split, or otherwise used to route each entry to its column.
import pandas as pd
data = [{'timings' : 'Friday 10 am - 6:30 pm$Saturday 10am-6:30pm$Sunday Closed$Monday 10am-6:30pm$Tuesday 10am-6:30pm$Wednesday 10am-6:30pm$Thursday 10am-6:30pm',
'monday':'','tuesday':'','wednesday':'','thursday':'','friday':'','saturday':'','sunday':''
}]
df = pd.DataFrame.from_dict(data)
For example, if df['timings'] contains "friday 10 am, saturday 6:30pm", then df['friday'] should be '10 am' and df['saturday'] should be '6:30pm'.
I don't know how to put it in words.
Please help me solve this problem.
CodePudding user response:
Use a nested list comprehension to build a list of dictionaries, then pass it to the DataFrame constructor:
L = [dict(y.split(maxsplit=1) for y in x.split('$')) for x in df['timings']]
df = pd.DataFrame(L, index=df.index)
print (df)
            Friday     Saturday  Sunday       Monday      Tuesday  \
0  10 am - 6:30 pm  10am-6:30pm  Closed  10am-6:30pm  10am-6:30pm

     Wednesday     Thursday
0  10am-6:30pm  10am-6:30pm
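The question's DataFrame already has empty lowercase columns ('monday', 'tuesday', ...), while the approach above produces new capitalized ones. A minimal sketch of filling the existing columns instead, assuming every '$'-separated chunk starts with a day name followed by whitespace; the helper name parse_timings and the days list are illustrative, not part of the original answer:

```python
import pandas as pd

timings = ('Friday 10 am - 6:30 pm$Saturday 10am-6:30pm$Sunday Closed$'
           'Monday 10am-6:30pm$Tuesday 10am-6:30pm$Wednesday 10am-6:30pm$'
           'Thursday 10am-6:30pm')
days = ['monday', 'tuesday', 'wednesday', 'thursday',
        'friday', 'saturday', 'sunday']

# Rebuild the question's frame: one 'timings' column plus empty day columns.
df = pd.DataFrame([{'timings': timings, **{d: '' for d in days}}])


def parse_timings(text):
    """Hypothetical helper: 'Day hours$Day hours...' -> {lowercase day: hours}."""
    pairs = (chunk.split(maxsplit=1) for chunk in text.split('$'))
    return {day.lower(): hours for day, hours in pairs}


# One dict per row, aligned on the original index.
parsed = pd.DataFrame([parse_timings(t) for t in df['timings']], index=df.index)

# Copy each parsed day into the matching existing column.
for d in days:
    df[d] = parsed[d]

print(df[days].iloc[0].to_dict())
```

Lowercasing the keys is what lets the parsed values line up with the column names from the question; a row that lacks a day would raise a KeyError here, so real data may need a fallback.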
CodePudding user response:
You can use str.extractall to extract the day names and times, then reshape the DataFrame:
(df['timings'].str.extractall(r'(?P<day>[^$\s]+)\s+([^$]+)')
 .droplevel('match')
 .set_index('day', append=True)[1].unstack('day')
)
Output:
day           Friday       Monday     Saturday  Sunday     Thursday      Tuesday    Wednesday
0    10 am - 6:30 pm  10am-6:30pm  10am-6:30pm  Closed  10am-6:30pm  10am-6:30pm  10am-6:30pm
If you want to keep the original order of the days:
(df['timings'].str.extractall(r'(?P<day>[^$\s]+)\s+([^$]+)')
 .set_index('day', append=True)[1].unstack(['match', 'day'])
 .droplevel('match', axis=1)
)
Output:
day           Friday     Saturday  Sunday       Monday      Tuesday    Wednesday     Thursday
0    10 am - 6:30 pm  10am-6:30pm  Closed  10am-6:30pm  10am-6:30pm  10am-6:30pm  10am-6:30pm
Alternative to sort based on a custom order (here Friday first):
from calendar import day_name
# day_name starts on Monday (i=0), so (i+3)%7 maps Friday to 0, Saturday to 1, ...
sorter = pd.Series({d: (i+3)%7 for i,d in enumerate(day_name)})
out = (df['timings']
       .str.extractall(r'(?P<day>[^$\s]+)\s+([^$]+)')
       .droplevel('match')
       .set_index('day', append=True)[1].unstack('day')
       .sort_index(axis=1, key=sorter.get)
)
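If the desired order is known up front, reindexing the columns with an explicit list may be easier to read than a sort key. A minimal sketch under that assumption; the order list and the second named group hours are illustrative choices, not part of the original answer:

```python
import pandas as pd

df = pd.DataFrame({'timings': [
    'Friday 10 am - 6:30 pm$Saturday 10am-6:30pm$Sunday Closed$'
    'Monday 10am-6:30pm$Tuesday 10am-6:30pm$Wednesday 10am-6:30pm$'
    'Thursday 10am-6:30pm'
]})

# Assumed target order (Friday first), written out explicitly.
order = ['Friday', 'Saturday', 'Sunday', 'Monday',
         'Tuesday', 'Wednesday', 'Thursday']

wide = (df['timings']
        # day = run of non-'$', non-space characters; hours = everything up to the next '$'
        .str.extractall(r'(?P<day>[^$\s]+)\s+(?P<hours>[^$]+)')
        .droplevel('match')
        .set_index('day', append=True)['hours']
        .unstack('day')
        # reorder the columns; a day missing from the data becomes a NaN column
        .reindex(columns=order))

print(wide)
```

Naming the second group means the Series can be selected as 'hours' rather than by the positional label 1.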