Suppose we have the dataframe like below.
ID time feature1 feature2 feature3
0 1 1 10.2 2.3 60
1 2 3 8.9 -1.2 90
3 1 3 3.9 6.2 80
I want to merge all the rows corresponding to each ID into one row with the feature name modified by time column. For example, let's take ID = 1
for example
Before merge:
ID time feature1 feature2 feature3
0 1 1 10.2 2.3 60
3 1 3 3.9 6.2 80
After merge:
ID feature1_1 feature2_1 feature3_1 feature1_3 feature2_3 feature3_1
0 1 10.2 2.3 60 3.9 6.2 80
One of the problem is that, for the time column, lets say the complete set would be [1, 3, 5, 7, 9]
. However, for each ID
, it might contains only subset of [1, 3, 5, 7, 9]
, for example, [1, 3, 5]
So the final result would looks like follows
ID feature1_1 feature2_1 feature3_1 feature1_3 feature2_3 feature3_3
0 1 10.2 2.3 60 3.9 6.2 80
...
1 168 8.7 6.2 45 NaN NaN NaN
CodePudding user response:
You can use melt
to flatten your dataframe then process data and finally reshape your dataframe with pivot
:
rename_feature = lambda x: x['feature'] '_' x.groupby(['ID', 'feature']).cumcount().add(1).astype(str)
out = (df.melt(['ID', 'time'], var_name='feature')
.assign(feature=rename_feature)
.pivot('ID', 'feature', 'value')
.reset_index().rename_axis(columns=None))
Output:
>>> out
ID feature1_1 feature1_2 feature2_1 feature2_2 feature3_1 feature3_2
0 1 10.2 3.9 2.3 6.2 60.0 80.0
1 2 8.9 NaN -1.2 NaN 90.0 NaN