How to get the unique values of multiple columns for each unique value of another column in Pandas?

I have a dataframe like this:

import pandas as pd
df = pd.DataFrame({'val':['a', 'a', 'b', 'a', 'c'], 'g_1':[0, 0, 1,0,2], 'g_2':[0, 0, 0,0,1]})

Now, to get the unique values of column g_1 for all unique values of column val, I do something like this:

print(df['g_1'].groupby(df['val']).unique().apply(pd.Series))
     0
val   
a    0
b    1
c    2

However, I would like to include column g_2 as well, but the following raises an error:

print(df[['g_1', 'g_2']].groupby(df['val']).unique().apply(pd.Series))

I am looking to get something like this:

    g_1  g_2
val   
a    0    0
b    1    0
c    2    1

CodePudding user response：

Just drop the duplicate rows using df.duplicated(), then set val as the index.

df[~df.duplicated()].set_index('val')
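
A minimal sketch of the same idea using drop_duplicates(), which should be equivalent to the boolean mask above. It assumes every val maps to exactly one (g_1, g_2) pair, as in the question's data; otherwise val would appear more than once in the index. The name deduped is only illustrative.

```python
import pandas as pd

df = pd.DataFrame({'val': ['a', 'a', 'b', 'a', 'c'],
                   'g_1': [0, 0, 1, 0, 2],
                   'g_2': [0, 0, 0, 0, 1]})

# Keep the first occurrence of each distinct row, then index by 'val'.
deduped = df.drop_duplicates().set_index('val')

# Sanity check for the assumption above: each 'val' appears only once.
assert deduped.index.is_unique
print(deduped)
```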

CodePudding user response：

Use np.unique as the aggregation function of the groupby:

import numpy as np

>>> df.groupby('val')[['g_1', 'g_2']].agg(np.unique)
     g_1  g_2
val          
a      0    0
b      1    0
c      2    1
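
Depending on the pandas version, agg may reject a function such as np.unique because it returns an array rather than a scalar. If that happens, one possible workaround is sketched below. It assumes every group holds a single distinct value per column, as in the question's data: it verifies that with nunique() and then takes the first value of each group. The names grouped, single_valued and result are illustrative.

```python
import pandas as pd

df = pd.DataFrame({'val': ['a', 'a', 'b', 'a', 'c'],
                   'g_1': [0, 0, 1, 0, 2],
                   'g_2': [0, 0, 0, 0, 1]})

grouped = df.groupby('val')[['g_1', 'g_2']]

# Assumption check: every group has exactly one distinct value per column.
single_valued = bool((grouped.nunique() == 1).all().all())

# With one distinct value per group, the first value is the unique value.
result = grouped.first()
print(single_valued)
print(result)
```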

CodePudding user response：


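Pass agg a dict that maps every column after val to a function returning its unique values (this relies on val being the first column):

print(df.groupby(df['val']).agg({g:lambda x:x.unique() for g in df.columns[1:]}))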

#      g_1  g_2
# val          
# a      0    0
# b      1    0
# c      2    1
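
If a group can contain more than one distinct value, a single scalar per cell no longer fits. The sketch below adds a hypothetical extra row (c, 3, 1) that is not part of the question's data, and collects each group's unique values as a sorted list. Returning a list rather than an array is a choice made here for illustration, not something taken from the answers above.

```python
import pandas as pd

# Same data as the question, plus one hypothetical row ('c', 3, 1)
# so that group 'c' has two distinct g_1 values.
df_multi = pd.DataFrame({'val': ['a', 'a', 'b', 'a', 'c', 'c'],
                         'g_1': [0, 0, 1, 0, 2, 3],
                         'g_2': [0, 0, 0, 0, 1, 1]})

# For each group and column, collect the distinct values as a sorted list.
uniques = (df_multi.groupby('val')[['g_1', 'g_2']]
           .agg(lambda s: sorted(s.unique().tolist())))
print(uniques)
```

Each cell then holds a Python list, so uniques.loc['c', 'g_1'] can be iterated directly.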