I have a dataframe as follows:
data = {
'Title': ['001C', '001C', '004C', '001C', '004C', '004C', '007C', '010C'],
'Items': ['A', 'B', 'D', 'A', 'A', 'K', 'L', 'M']
}
df = pd.DataFrame(data)
df
Title Items
0 001C A
1 001C B
2 004C D
3 001C A
4 004C A
5 004C K
6 007C L
7 010C M
I want to get the Items
under each Title
without any redundancy. The expected output is
001C 004C 007C 010C
0 A D L M
1 B A
2 K
CodePudding user response:
You can drop_duplicates
, assign
a helper column with increasing indexes per Item per group, and pivot
:
(df.drop_duplicates(subset=['Title', 'Items'])
.assign(index=df.groupby('Title').cumcount())
.pivot(index='index', columns='Title', values='Items')
.rename_axis(index=None, columns=None)
#.fillna('') # uncomment if you want empty strings in place of NaNs
)
output:
001C 004C 007C 010C
0 A D L M
1 B A NaN NaN
2 NaN K NaN NaN
CodePudding user response:
You can also use .drop_duplicates()
.pivot()
. Then, relocate the non-NaN values of each column to the top by .dropna()
, as follows:
(df.drop_duplicates()
.pivot(columns='Title', values='Items')
.apply(lambda x: pd.Series(x.dropna().values))
.rename_axis(columns=None)
)
Result:
001C 004C 007C 010C
0 A D L M
1 B A NaN NaN
2 NaN K NaN NaN