I have the following DataFrame:
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Date': ['2019-12-20', '2018-12-20', '2017-12-20', '2016-12-20', '2015-12-20', '2014-12-20', '2013-12-20'],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green']
})
or, as a table:
ID Name Date Color
0 12 A 2019-12-20 Black
1 12 A 2018-12-20 Blue
2 15 B 2017-12-20 Red
3 15 B 2016-12-20 Yellow
4 16 C 2015-12-20 White
5 17 D 2014-12-20 Sky
6 17 D 2013-12-20 Green
My desired result is the table below. How can I get it?
ID Name Date Color Date_ Color_
0 12 A 2019-12-20 Black 2018-12-20 Blue
1 15 B 2017-12-20 Red 2016-12-20 Yellow
2 16 C 2015-12-20 White 2015-12-20 White
3 17 D 2014-12-20 Sky 2013-12-20 Green
I need your help, thanks in advance!
Answer:
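Use `groupby('Name').cumcount()` as a virtual group key: it numbers each row within its `Name` group (0, 1, ...), and `pivot` then turns each of those numbers into a column. The rest is just formatting.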
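To see what the virtual groups are, here is the `cumcount()` step on its own. The snippet rebuilds the question's `df` so it runs standalone; `col` is just an illustrative variable name.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Date': ['2019-12-20', '2018-12-20', '2017-12-20', '2016-12-20',
             '2015-12-20', '2014-12-20', '2013-12-20'],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green'],
})

# cumcount() numbers the rows inside each Name group, starting at 0,
# in the order they appear: first row of a group -> 0, second row -> 1.
col = df.groupby('Name').cumcount()
print(col.tolist())  # [0, 1, 0, 1, 0, 0, 1]
```

Group `C` only has a 0, which is why its `_1` columns come out of `pivot` as NaN and need the `ffill`.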
# Identify target column for each row
out = df.assign(col=df.groupby('Name').cumcount().astype(str)) \
.pivot(index=['ID', 'Name'], columns='col', values=['Date', 'Color']) \
.ffill(axis=1)
# Sort the columns to match the desired output
out = out.sort_index(level=[1, 0], axis=1, ascending=[True, False])
# Flatten the MultiIndex columns
out.columns = out.columns.to_flat_index().str.join('_')
# Reset index
out = out.reset_index()
Output:
>>> out
ID Name Date_0 Color_0 Date_1 Color_1
0 12 A 2019-12-20 Black 2018-12-20 Blue
1 15 B 2017-12-20 Red 2016-12-20 Yellow
2 16 C 2015-12-20 White 2015-12-20 White
3 17 D 2014-12-20 Sky 2013-12-20 Green
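The column names above are `Date_0`, `Color_0`, `Date_1`, `Color_1` rather than the `Date`, `Color`, `Date_`, `Color_` asked for. A sketch of the same pipeline with one extra `rename` to match the question exactly (the rename mapping is my addition, not part of the answer):

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Date': ['2019-12-20', '2018-12-20', '2017-12-20', '2016-12-20',
             '2015-12-20', '2014-12-20', '2013-12-20'],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green'],
})

out = (
    df.assign(col=df.groupby('Name').cumcount().astype(str))
      .pivot(index=['ID', 'Name'], columns='col', values=['Date', 'Color'])
      .ffill(axis=1)  # copy the first pair into the missing second pair
      .sort_index(level=[1, 0], axis=1, ascending=[True, False])
)
# ('Date', '0') -> 'Date_0', etc.
out.columns = out.columns.to_flat_index().str.join('_')
# Map the generated names onto the ones used in the question
out = out.rename(columns={'Date_0': 'Date', 'Color_0': 'Color',
                          'Date_1': 'Date_', 'Color_1': 'Color_'}).reset_index()
print(out)
```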
After `pivot`, your DataFrame looks like this:
>>> df.assign(col=df.groupby('Name').cumcount().astype(str)) \
.pivot(index=['ID', 'Name'], columns='col', values=['Date', 'Color']) \
.ffill(axis=1)
Date Color
col 0 1 0 1
ID Name
12 A 2019-12-20 2018-12-20 Black Blue
15 B 2017-12-20 2016-12-20 Red Yellow
16 C 2015-12-20 2015-12-20 White White
17 D 2014-12-20 2013-12-20 Sky Green
Answer:
Try a self-merge on `ID` and `Name`, then drop the duplicate pairs:
result = (
df.merge(df, on=['ID', 'Name'])
.drop_duplicates(['ID', 'Name', 'Date_x', 'Color_x'], keep='last')
.drop_duplicates(['ID', 'Name', 'Date_y', 'Color_y'])
.rename(columns={'Date_x': 'Date',
'Color_x': 'Color',
'Date_y': 'Date_',
'Color_y': 'Color_'})
)
This is the result:
ID Name Date Color Date_ Color_
1 12 A 2019-12-20 Black 2018-12-20 Blue
5 15 B 2017-12-20 Red 2016-12-20 Yellow
8 16 C 2015-12-20 White 2015-12-20 White
10 17 D 2014-12-20 Sky 2013-12-20 Green
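The self-merge pairs every row of an `ID` with every row of the same `ID` (n rows give n * n pairs), and the two `drop_duplicates` calls then keep only the (first row, last row) pair. A sketch that checks this on a hypothetical three-row group (`df3` and `pair_first_last` are illustrative names) shows that the middle row is dropped, so the approach assumes at most two rows per `ID`, as in the question:

```python
import pandas as pd

def pair_first_last(df):
    # Same steps as the answer above, wrapped in a function for reuse
    return (
        df.merge(df, on=['ID', 'Name'])
          .drop_duplicates(['ID', 'Name', 'Date_x', 'Color_x'], keep='last')
          .drop_duplicates(['ID', 'Name', 'Date_y', 'Color_y'])
          .rename(columns={'Date_x': 'Date', 'Color_x': 'Color',
                           'Date_y': 'Date_', 'Color_y': 'Color_'})
    )

# Hypothetical data: a single ID with three rows
df3 = pd.DataFrame({
    'ID': [20, 20, 20],
    'Name': ['E', 'E', 'E'],
    'Date': ['2012-12-20', '2011-12-20', '2010-12-20'],
    'Color': ['Pink', 'Gray', 'Brown'],
})

# The self-merge alone yields 3 * 3 = 9 pairs for this ID
print(len(df3.merge(df3, on=['ID', 'Name'])))
res = pair_first_last(df3)
print(res)  # one row pairing Pink with Brown; Gray is lost
```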
Answer:
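An algorithmic rather than Pythonic approach: duplicate the IDs that have only one row, self-merge on `ID`, and keep only the pairs where the first row comes before the second: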
df['count'] = df['ID'].map(df['ID'].value_counts())
df = pd.concat([df, df[df['count'] == 1]]).drop(columns=['count'])
df['s'] = range(len(df))
df = df.merge(df, left_on='ID', right_on='ID')
df = df[df['s_x'] < df['s_y']].drop(columns=['s_x', 's_y', 'Name_y']).rename(
columns={
'Name_x': 'Name',
'Date_x': 'Date',
'Color_x': 'Color',
'Date_y': 'Date_',
'Color_y': 'Color_'})
print(df)
Output:
ID Name Date Color Date_ Color_
1 12 A 2019-12-20 Black 2018-12-20 Blue
5 15 B 2017-12-20 Red 2016-12-20 Yellow
9 16 C 2015-12-20 White 2015-12-20 White
13 17 D 2014-12-20 Sky 2013-12-20 Green
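If every ID has one or two rows, as in the question, a shorter alternative (not one of the answers above) is a named aggregation that takes the first and last row of each group. A sketch under that assumption; with three or more rows per ID it would silently skip the middle rows:

```python
import pandas as pd

df = pd.DataFrame({
    'ID': [12, 12, 15, 15, 16, 17, 17],
    'Name': ['A', 'A', 'B', 'B', 'C', 'D', 'D'],
    'Date': ['2019-12-20', '2018-12-20', '2017-12-20', '2016-12-20',
             '2015-12-20', '2014-12-20', '2013-12-20'],
    'Color': ['Black', 'Blue', 'Red', 'Yellow', 'White', 'Sky', 'Green'],
})

# 'first'/'last' take the first and last row of each group in original order;
# for a single-row group (ID 16) both are the same row, duplicating it as desired.
out = (
    df.groupby(['ID', 'Name'], sort=False)
      .agg(Date=('Date', 'first'), Color=('Color', 'first'),
           Date_=('Date', 'last'), Color_=('Color', 'last'))
      .reset_index()
)
print(out)
```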