I have a dataframe such as
Names Values
A 0.20
A 1.30
A 1.2
B 0.30
B 0.40
C 1.2
D 0.70
E 0.12
E 1.3
F 0.90
F 0.78
F 0.88
And I would like to add to a New_col
the number :
1
where for eachNames
with at least oneValues > 0.75
and oneValues < 0.75
0
for eachNames
with onlyValues > 0.75
2
for eachNames
with onlyValues < 0.75
I should then get:
Names Values New_col
A 0.20 1
A 1.30 1
A 1.2 1
B 0.30 2
B 0.40 2
C 1.2 0
D 0.70 2
E 0.12 1
E 1.3 1
F 0.90 2
F 0.78 2
F 0.88 2
CodePudding user response:
First test by condition for compare threshold 0.75
, get names if match at least one value, compare again membership of Names
and last pass to numpy.select
:
m = df.Values > 0.75
s1 = df.loc[m, 'Names'].unique()
s2 = df.loc[~m, 'Names'].unique()
m1 = df['Names'].isin(s1)
m2 = df['Names'].isin(s2)
df['New_col'] = np.select([m1 & ~m2, ~m1 & m2], [0, 2], default=1)
print (df)
Names Values New_col
0 A 0.20 1
1 A 1.30 1
2 A 1.20 1
3 B 0.30 2
4 B 0.40 2
5 C 1.20 0
6 D 0.70 2
7 E 0.12 1
8 E 1.30 1
9 F 0.90 0
10 F 0.78 0
11 F 0.88 0
If need another ouput for only 0.75
values per names use:
print (df)
Names Values
0 A 0.20
1 A 1.30
2 A 1.20
3 B 0.30
4 B 0.40
5 C 1.20
6 D 0.70
7 E 0.12
8 E 1.30
9 F 0.90
10 F 0.78
11 F 0.88
12 G 0.75
13 G 0.75
m1 = df.Values > 0.75
m2 = df.Values < 0.75
s1 = df.loc[m1, 'Names'].unique()
s2 = df.loc[m2, 'Names'].unique()
m1 = df['Names'].isin(s1)
m2 = df['Names'].isin(s2)
df['New_col'] = np.select([m1 & ~m2, ~m1 & m2, m1 & m2],
[0, 2, 1], default=None)
print (df)
Names Values New_col
0 A 0.20 1
1 A 1.30 1
2 A 1.20 1
3 B 0.30 2
4 B 0.40 2
5 C 1.20 0
6 D 0.70 2
7 E 0.12 1
8 E 1.30 1
9 F 0.90 0
10 F 0.78 0
11 F 0.88 0
12 G 0.75 None
13 G 0.75 None
CodePudding user response:
df = pd.DataFrame({"Names":['A','A','A','B','B','C','D','E','E','F','F','F'], "Values":[0.20,1.30,1.2,0.30,0.40,1.2,0.70,0.12,1.3,0.90,0.78,0.88]})
df["New_col"] = None
for val in set(df.Names):
flags = [True if x>0.75 else False for x in df[df['Names']==val].Values ]
if sum(flags)==0:
df.loc[ df['Names']==val, "New_col"] = 2
elif sum(flags)==len(df[df['Names']==val]):
df.loc[ df['Names']==val, "New_col"] = 0
else:
df.loc[ df['Names']==val, "New_col"] = 1
Output:
Names Values New_col
0 A 0.20 1
1 A 1.30 1
2 A 1.20 1
3 B 0.30 2
4 B 0.40 2
5 C 1.20 0
6 D 0.70 2
7 E 0.12 1
8 E 1.30 1
9 F 0.90 0
10 F 0.78 0
11 F 0.88 0
Respect to your question the values for the "F" Nnames columns should be 0 instead of 2
CodePudding user response:
Am a bit late to the party, but you could use a groupby
approach:
df = df.merge(df.groupby(by="Names").apply(lambda x: 0 if all(x['Values']>0.75) else (2 if all(x['Values']<0.75) else 1)).reset_index())
Here is the full code:
import pandas as pd
import numpy as np
df = pd.DataFrame({ 'Names': ['A', 'A', 'A', 'B', 'B', 'C', 'D', 'E', 'E', 'F', 'F', 'F'],
'Values': [0.2, 1.3, 1.2, 0.3, 0.4, 1.2, 0.7, 0.12, 1.3, 0.9, 0.78, 0.88]})
df = df.merge(df.groupby(by="Names").apply(lambda x: 0 if all(x['Values']>0.75) else (2 if all(x['Values']<0.75) else 1)).reset_index())
df.columns = ['Names', 'Values', 'New_col']
print(df)
OUTPUT:
Names Values New_col
0 A 0.20 1
1 A 1.30 1
2 A 1.20 1
3 B 0.30 2
4 B 0.40 2
5 C 1.20 0
6 D 0.70 2
7 E 0.12 1
8 E 1.30 1
9 F 0.90 0
10 F 0.78 0
11 F 0.88 0