Home > Net >  Pivot and combine data, with duplicate IDs
Pivot and combine data, with duplicate IDs

Time:10-06

I have the following dataframe:

ID     Day             StartTime  EndTime
819    Monday          12:30:00   19:00:00
819    Tuesday         12:30:00   19:00:00
819    Wednesday       12:30:00   19:00:00
819    Thursday        12:30:00   19:00:00
819    Friday          12:30:00   19:00:00
823    Monday          09:00:00   12:00:00
823    Tuesday         09:00:00   12:00:00
823    Wednesday       09:00:00   12:00:00
823    Thursday        09:00:00   12:00:00
823    Friday          09:00:00   12:00:00

You can reproduce it by passing inside a df = pd.DataFrame(sample_dict), the following dictionary:

[{'ID': 819, 'Day': 'Monday', 'StartTime': '12:30:00', 'EndTime': '19:00:00'},
 {'ID': 819, 'Day': 'Tuesday', 'StartTime': '12:30:00', 'EndTime': '19:00:00'},
 {'ID': 819,
  'Day': 'Wednesday',
  'StartTime': '12:30:00',
  'EndTime': '19:00:00'},
 {'ID': 819,
  'Day': 'Thursday',
  'StartTime': '12:30:00',
  'EndTime': '19:00:00'},
 {'ID': 819, 'Day': 'Friday', 'StartTime': '12:30:00', 'EndTime': '15:30:00'},
 {'ID': 823, 'Day': 'Monday', 'StartTime': '09:00:00', 'EndTime': '12:00:00'},
 {'ID': 823, 'Day': 'Tuesday', 'StartTime': '09:00:00', 'EndTime': '12:00:00'},
 {'ID': 823,
  'Day': 'Wednesday',
  'StartTime': '09:00:00',
  'EndTime': '12:00:00'},
 {'ID': 823,
  'Day': 'Thursday',
  'StartTime': '09:00:00',
  'EndTime': '12:00:00'},
 {'ID': 823, 'Day': 'Friday', 'StartTime': '09:00:00', 'EndTime': '12:00:00'}]

What I'm trying to achieve, is to keep a single row for each ID. I built the table manually, that should look like this:

ID  MondayStartTime MondayEndTime   TuesdayStartTime    TuesdayEndTime  WednesdayStartTime  WednesdayEndTime    ThursdayStartTime   ThursdayEndTime FridayStartTime FridayEndTime
819 12:30:00        19:00:00        12:30:00            19:00:00        12:30:00     19:00:00   12:30:00    19:00:00    12:30:00    19:00:00
823 09:00:00        12:00:00        09:00:00            12:00:00        09:00:00     12:00:00   09:00:00    12:00:00    09:00:00    12:00:00

Maybe I'm not even close. But I was trying to build something with this code:

df.pivot(index=['SiteID', 'DayID'], columns=['StartTime', 'EndTime'])

But I get nothing from the columns. Any ideas?

CodePudding user response:

#pivot to reformat the DF
df2=df.pivot(index='ID', columns=['Day'], values=['StartTime','EndTime']).reset_index()

# reformat the columns
df2.columns=[ (col[1]   col[0]) for col in df2.columns]
df2


    ID  FridayStartTime     MondayStartTime     ThursdayStartTime   TuesdayStartTime    WednesdayStartTime  FridayEndTime   MondayEndTime   ThursdayEndTime     TuesdayEndTime  WednesdayEndTime
0   819     12:30:00    12:30:00    12:30:00    12:30:00    12:30:00    19:00:00    19:00:00    19:00:00    19:00:00    19:00:00
1   823     09:00:00    09:00:00    09:00:00    09:00:00    09:00:00    12:00:00    12:00:00    12:00:00    12:00:00    12:00:00

CodePudding user response:

One option is with pivot_wider from pyjanitor, specifically the dev version:

# pip install git https://github.com/pyjanitor-devs/pyjanitor.git
import pandas as pd
import janitor

df.pivot_wider(
    index = 'ID', 
    names_from = 'Day',
    values_from = ['StartTime', 'EndTime'],
    names_glue = "{Day}{_value}"
   )

    ID FridayStartTime MondayStartTime ThursdayStartTime TuesdayStartTime WednesdayStartTime FridayEndTime MondayEndTime ThursdayEndTime TuesdayEndTime WednesdayEndTime
0  819        12:30:00        12:30:00          12:30:00         12:30:00           12:30:00      15:30:00      19:00:00        19:00:00       19:00:00         19:00:00
1  823        09:00:00        09:00:00          09:00:00         09:00:00           09:00:00      12:00:00      12:00:00        12:00:00       12:00:00         12:00:00

You can use the names_glue to reorder the column names - _value represents column names in values_from

  • Related