Home > Software engineering >  Add value to new column depending on values in another in pandas
Add value to new column depending on values in another in pandas

Time:12-01

I have a dataframe such as

Names Values 
A     0.20
A     1.30
A     1.2
B     0.30
B     0.40
C     1.2
D     0.70
E     0.12
E     1.3
F     0.90
F     0.78
F     0.88

And I would like to add to a New_col the number :

  • 1 where for each Names with at least one Values > 0.75 and one Values < 0.75
  • 0 for each Names with only Values > 0.75
  • 2 for each Names with only Values < 0.75

I should then get:

Names Values New_col
A     0.20   1
A     1.30   1
A     1.2    1
B     0.30   2
B     0.40   2
C     1.2    0
D     0.70   2
E     0.12   1
E     1.3    1
F     0.90   2
F     0.78   2
F     0.88   2

CodePudding user response:

First test by condition for compare threshold 0.75, get names if match at least one value, compare again membership of Names and last pass to numpy.select:

m = df.Values > 0.75

s1 = df.loc[m, 'Names'].unique()
s2 = df.loc[~m, 'Names'].unique()

m1 = df['Names'].isin(s1)
m2 = df['Names'].isin(s2)

df['New_col'] = np.select([m1 & ~m2, ~m1 & m2], [0, 2], default=1)
print (df)
   Names  Values  New_col
0      A    0.20        1
1      A    1.30        1
2      A    1.20        1
3      B    0.30        2
4      B    0.40        2
5      C    1.20        0
6      D    0.70        2
7      E    0.12        1
8      E    1.30        1
9      F    0.90        0
10     F    0.78        0
11     F    0.88        0

If need another ouput for only 0.75 values per names use:

print (df)
   Names  Values
0      A    0.20
1      A    1.30
2      A    1.20
3      B    0.30
4      B    0.40
5      C    1.20
6      D    0.70
7      E    0.12
8      E    1.30
9      F    0.90
10     F    0.78
11     F    0.88
12     G    0.75
13     G    0.75

m1 = df.Values > 0.75
m2 = df.Values < 0.75

s1 = df.loc[m1, 'Names'].unique()
s2 = df.loc[m2, 'Names'].unique()


m1 = df['Names'].isin(s1)
m2 = df['Names'].isin(s2)

df['New_col'] = np.select([m1 & ~m2, ~m1 & m2, m1 & m2], 
                          [0, 2, 1], default=None)

print (df)
 Names  Values New_col
0      A    0.20       1
1      A    1.30       1
2      A    1.20       1
3      B    0.30       2
4      B    0.40       2
5      C    1.20       0
6      D    0.70       2
7      E    0.12       1
8      E    1.30       1
9      F    0.90       0
10     F    0.78       0
11     F    0.88       0
12     G    0.75    None
13     G    0.75    None

CodePudding user response:

df = pd.DataFrame({"Names":['A','A','A','B','B','C','D','E','E','F','F','F'], "Values":[0.20,1.30,1.2,0.30,0.40,1.2,0.70,0.12,1.3,0.90,0.78,0.88]})

df["New_col"] = None
for val in set(df.Names):
    flags = [True if x>0.75 else False for x in df[df['Names']==val].Values ]
    
    if sum(flags)==0: 
        df.loc[ df['Names']==val, "New_col"] = 2
        
    elif sum(flags)==len(df[df['Names']==val]): 
        df.loc[ df['Names']==val, "New_col"] = 0
        
    else:
        df.loc[ df['Names']==val, "New_col"] = 1

Output:

    Names   Values  New_col
0   A   0.20    1
1   A   1.30    1
2   A   1.20    1
3   B   0.30    2
4   B   0.40    2
5   C   1.20    0
6   D   0.70    2
7   E   0.12    1
8   E   1.30    1
9   F   0.90    0
10  F   0.78    0
11  F   0.88    0

Respect to your question the values for the "F" Nnames columns should be 0 instead of 2

CodePudding user response:

Am a bit late to the party, but you could use a groupby approach:

df = df.merge(df.groupby(by="Names").apply(lambda x: 0 if all(x['Values']>0.75) else (2 if all(x['Values']<0.75) else 1)).reset_index())

Here is the full code:

import pandas as pd
import numpy as np

df = pd.DataFrame({ 'Names': ['A', 'A', 'A', 'B', 'B', 'C', 'D', 'E', 'E', 'F', 'F', 'F'],
                    'Values': [0.2, 1.3, 1.2, 0.3, 0.4, 1.2, 0.7, 0.12, 1.3, 0.9, 0.78, 0.88]})

df = df.merge(df.groupby(by="Names").apply(lambda x: 0 if all(x['Values']>0.75) else (2 if all(x['Values']<0.75) else 1)).reset_index())
df.columns = ['Names', 'Values', 'New_col']

print(df)

OUTPUT:

   Names  Values  New_col
0      A    0.20        1
1      A    1.30        1
2      A    1.20        1
3      B    0.30        2
4      B    0.40        2
5      C    1.20        0
6      D    0.70        2
7      E    0.12        1
8      E    1.30        1
9      F    0.90        0
10     F    0.78        0
11     F    0.88        0
  • Related