I have the following dataframe:
import pandas as pd
df = pd.DataFrame.from_dict({'data_point_key': {28: 'a',
30: 'b',
31: 'c',
32: 'd',
33: 'e',
34: 'f',
55: 'a',
56: 'b',
57: 'c',
58: 'd',
59: 'e',
60: 'f',
61: 'a',
63: 'b',
64: 'c',
65: 'd',
66: 'e',
67: 'f',
88: 'a',
89: 'b',
90: 'c',
91: 'd',
92: 'e',
93: 'f'},
'val': {28: '8',
30: 'A',
31: '8',
32: 'kuku',
33: 'lulu',
34: 'DE',
55: '8',
56: 'B',
57: '8',
58: 'kuku',
59: 'lulu',
60: 'DE',
61: '8',
63: 'A',
64: '8',
65: 'kuku',
66: 'lulu',
67: 'DE',
88: '8',
89: 'B',
90: '8',
91: 'kuku',
92: 'lulu',
93: 'DE'},
'request_id': {28: 4,
30: 4,
31: 4,
32: 4,
33: 4,
34: 4,
55: 4,
56: 4,
57: 4,
58: 4,
59: 4,
60: 4,
61: 7,
63: 7,
64: 7,
65: 7,
66: 7,
67: 7,
88: 7,
89: 7,
90: 7,
91: 7,
92: 7,
93: 7}})
I am trying to do the following:
pd.crosstab(index=df['request_id'],
columns=df['data_point_key'],
values=df['val'],
aggfunc=','.join)
And getting the following duplicates due to the aggfunc:
How can I avoid these duplicates with commas?
CodePudding user response:
You can remove duplicates before pivoting seems cleanest way in my opinion:
df = df.drop_duplicates(['request_id','data_point_key','val'])
df1 = pd.crosstab(index=df['request_id'],
columns=df['data_point_key'],
values=df['val'],
aggfunc=','.join)
print (df1)
data_point_key a b c d e f
request_id
4 8 A,B 8 kuku lulu DE
7 8 A,B 8 kuku lulu DE
Alternative is change aggfunc
- if order is important:
df1 = (df.pivot_table(index='request_id',
columns='data_point_key',
values='val',
aggfunc=lambda x: ','.join(dict.fromkeys(x).keys())))
print (df1)
data_point_key a b c d e f
request_id
4 8 A,B 8 kuku lulu DE
7 8 A,B 8 kuku lulu DE
Or if order isnot important:
df1 = pd.crosstab(index=df['request_id'],
columns=df['data_point_key'],
values=df['val'],
aggfunc=lambda x: ','.join(set(x)))
print (df1)
data_point_key a b c d e f
request_id
4 8 B,A 8 kuku lulu DE
7 8 B,A 8 kuku lulu DE