Input dataframe
A B
n1 "joe,jack"
n2 "kelly,john"
n3 "adam,sam"
n1 "jack,frank"
n3 "rita"
n4 "steve, buck"
n2 "john, kelly, peter"
Grouping by column A, I want to concatenate the text in column B, separated by commas (,). Any repeated name should appear only once per group, so the expected output would look like this:
A B
n1 joe,jack,frank
n2 kelly,john,peter
n3 adam,sam,rita
n4 steve, buck
CodePudding user response:
Use GroupBy.agg with a custom function that splits each string, removes duplicates with a set comprehension, and joins the result. This works if order is not important:
# split each string on commas, drop duplicates with a set, then join the names
f = lambda x: ','.join({z for y in x for z in y.replace(', ', ',').split(',')})
df = df.groupby('A')['B'].agg(f).reset_index()
print(df)
A B
0 n1 jack,joe,frank
1 n2 john,kelly,peter
2 n3 adam,rita,sam
3 n4 steve,buck
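Note that a set has no guaranteed order, so the order of names in each row above may differ between runs. If a reproducible result matters more than the original order, one option is to sort the unique names before joining. This is a minimal sketch, not part of the original answer; the helper name join_unique_sorted is hypothetical, and it strips whitespace around each name instead of replacing ', '.

```python
def join_unique_sorted(values):
    """Hypothetical helper: merge comma-separated strings into sorted unique names."""
    # Split every string on commas and strip stray spaces around each name.
    unique = {name.strip() for value in values for name in value.split(",")}
    # Sorting makes the result reproducible; a bare set has no guaranteed order.
    return ",".join(sorted(unique))


print(join_unique_sorted(["joe,jack", "jack,frank"]))  # frank,jack,joe
```

It should be usable in place of the lambda, for example df.groupby('A')['B'].agg(join_unique_sorted).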
If order matters when removing duplicates, use the dict.fromkeys trick, which keeps the first occurrence of each name:
# dict.fromkeys drops duplicates but keeps the order of first appearance
f = lambda x: ','.join(dict.fromkeys([z for y in x for z in y.replace(', ', ',').split(',')]))
df = df.groupby('A')['B'].agg(f).reset_index()
print(df)
A B
0 n1 joe,jack,frank
1 n2 kelly,john,peter
2 n3 adam,sam,rita
3 n4 steve,buck
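For reference, here is a self-contained sketch of the order-preserving approach that rebuilds the sample data from the question so it can be run as is. The helper name join_unique is hypothetical, and it strips whitespace around each name rather than replacing ', ', which is an assumption about how inconsistent spacing should be handled.

```python
import pandas as pd

# Rebuild the sample input from the question.
df = pd.DataFrame({
    "A": ["n1", "n2", "n3", "n1", "n3", "n4", "n2"],
    "B": ["joe,jack", "kelly,john", "adam,sam", "jack,frank",
          "rita", "steve, buck", "john, kelly, peter"],
})


def join_unique(values):
    """Hypothetical helper: merge comma-separated strings, keeping first occurrences."""
    # Flatten all strings of the group into one list of stripped names.
    names = [name.strip() for value in values for name in value.split(",")]
    # dict.fromkeys drops duplicates while preserving insertion order (Python 3.7+).
    return ",".join(dict.fromkeys(names))


result = df.groupby("A")["B"].agg(join_unique).reset_index()
print(result)
```

A named function is used here instead of a lambda only to make each step easier to read; the grouping logic is the same as in the answer.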