I have a dataframe like this:
import pandas as pd
df = pd.DataFrame({'val':['a', 'a', 'b', 'a', 'c'], 'g_1':[0, 0, 1,0,2], 'g_2':[0, 0, 0,0,1]})
Now, to get the unique values of column g_1 for each unique value of column val, I do something like this:
print(df['g_1'].groupby(df['val']).unique().apply(pd.Series))
0
val
a 0
b 1
c 2
However, I would like to include column g_2 as well, but the following raises an error:
print(df[['g_1', 'g_2']].groupby(df['val']).unique().apply(pd.Series))
I am looking to get something like this:
g_1 g_2
val
a 0 0
b 1 0
c 2 1
CodePudding user response:
Just pull the non-duplicates using df.duplicated().
df[~df.duplicated()].set_index('val')
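The same idea can also be written with drop_duplicates. This is only a minimal sketch, and it assumes each val maps to exactly one (g_1, g_2) pair, as in the question's data; otherwise a val would show up on several rows of the result.

```python
import pandas as pd

df = pd.DataFrame({'val': ['a', 'a', 'b', 'a', 'c'],
                   'g_1': [0, 0, 1, 0, 2],
                   'g_2': [0, 0, 0, 0, 1]})

# Drop fully duplicated rows, then use 'val' as the index.
result = df.drop_duplicates().set_index('val')
print(result)
```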
CodePudding user response:
Use np.unique as the agg function of groupby:
import numpy as np
>>> df.groupby('val')[['g_1', 'g_2']].agg(np.unique)
g_1 g_2
val
a 0 0
b 1 0
c 2 1
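If you want to confirm that every val really has a single distinct value in each column, nunique can check that, and first() then gives the same table. A rough sketch under that assumption (the names counts and result are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({'val': ['a', 'a', 'b', 'a', 'c'],
                   'g_1': [0, 0, 1, 0, 2],
                   'g_2': [0, 0, 0, 0, 1]})

grouped = df.groupby('val')[['g_1', 'g_2']]

# Number of distinct values per group and column.
counts = grouped.nunique()

# Only take the first value if every group has exactly one distinct value.
if (counts == 1).all().all():
    result = grouped.first()
    print(result)
```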
CodePudding user response:
print(df.groupby(df['val']).agg({g:lambda x:x.unique() for g in df.columns[1:]}))
# g_1 g_2
# val
# a 0 0
# b 1 0
# c 2 1
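When a group can hold more than one distinct value, it may be safer to return a plain list from the aggregation function, since some pandas versions may reject an aggregation that returns a NumPy array. A small sketch, using a hypothetical df2 with one extra row added to the question's data:

```python
import pandas as pd

# Hypothetical data: same as the question, plus one extra row ('a', 3, 0).
df2 = pd.DataFrame({'val': ['a', 'a', 'b', 'a', 'c', 'a'],
                    'g_1': [0, 0, 1, 0, 2, 3],
                    'g_2': [0, 0, 0, 0, 1, 0]})

# Collect the sorted distinct values of each column as a Python list per group.
result = df2.groupby('val')[['g_1', 'g_2']].agg(
    lambda s: sorted(s.unique().tolist())
)
print(result)
```

Each cell then holds a list, so groups with a single distinct value appear as one-element lists.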