Home > OS >  how to get unique value in the pandas column?
how to get unique value in the pandas column?

Time:03-31

I have 2 dataframe as below:

df.head(10)
        key program
    0   A   emp
    1   A   dep
    2   A   emp
    3   A   dep
    4   A   dep
    5   B   emp
    6   B   dep
    7   B   emp
    8   B   emp
    9   B   emp
df1.head()
key program value1  value2
0   A   emp 10000   100000
1   A   dep 5000    30000
2   B   emp 20000   40000
3   B   dep 3000    6000

then I merge 2 df by 'key' and 'program'

df_merge = df.merge(df1,how='left',left_on=['key','program'],right_on=['key','program'])
df_merge.head(10)
    key program value1  value2
0   A   emp 10000   100000
1   A   dep 5000    30000
2   A   emp 10000   100000
3   A   dep 5000    30000
4   A   dep 5000    30000
5   B   emp 20000   40000
6   B   dep 3000    6000
7   B   emp 20000   40000
8   B   emp 20000   40000
9   B   emp 20000   40000

I would like to keep unique value in column 'value1' and 'values' base 'key' and 'program', could you please assist how I can do that ? output expected like below:

 key program value1  value2
0   A   emp 10000   100000
1   A   dep 5000    30000
2   A   emp    
3   A   dep     
4   A   dep     
5   B   emp 20000   40000
6   B   dep 3000    6000
7   B   emp    
8   B   emp   
9   B   emp    

CodePudding user response:

You can modify your merge by creating a new index column:

df_merge = (
  df.merge(df1, how='left',
           left_on=['key', 'program', df.groupby(['key', 'program']).cumcount()],
           right_on=['key', 'program', df1.groupby(['key', 'program']).cumcount()])
   .drop(columns='key_2')
)

Output:

>>> df_merge
  key program   value1    value2
0   A     emp  10000.0  100000.0
1   A     dep   5000.0   30000.0
2   A     emp      NaN       NaN
3   A     dep      NaN       NaN
4   A     dep      NaN       NaN
5   B     emp  20000.0   40000.0
6   B     dep   3000.0    6000.0
7   B     emp      NaN       NaN
8   B     emp      NaN       NaN
9   B     emp      NaN       NaN

CodePudding user response:

Use DataFrame.duplicated with set values of columns value1, value2 to empty strings:

df_merge.loc[df_merge.duplicated(['key','value1','value2']), ['value1','value2']] = ''
print (df_merge)
  key program value1  value2
0   A     emp  10000  100000
1   A     dep   5000   30000
2   A     emp               
3   A     dep               
4   A     dep               
5   B     emp  20000   40000
6   B     dep   3000    6000
7   B     emp               
8   B     emp               
9   B     emp               
  • Related