I have a DataFrame like the one below. I need to fill the Validated column according to this rule: when several rows share the same values in the State, ColorName, and Code columns, at least one of them must have a positive value in the Value column. If none of the matching rows has a positive Value, I need to put "Invalid" in the Validated column for all of those rows. Is there a way to do this without iterating over each row?
State ColorName Code Value Validated
Arizona Yellow A 50
Alabama Orange A 150
Arkansas Red B -500
Kentuky Green M -40
Ohio Blue X 100
Alabama Orange A -30
Arizona Yellow A 100
California Blue C 100
California Blue C -100
Arkansas Red B 500
Ohio Yellow X 100
California Blue C 100
CodePudding user response:
import pandas as pd

df = pd.DataFrame({'State': ['Arizona', 'Alabama', 'Arkansas', 'Kentuky', 'Ohio', 'Alabama', 'Arizona', 'California',
                             'California', 'Arkansas', 'Ohio', 'California'],
                   'ColorName': ['Yellow', 'Orange', 'Red', 'Green', 'Blue', 'Orange', 'Yellow', 'Blue', 'Blue', 'Red',
                                 'Yellow', 'Blue'],
                   'Code': ['A', 'A', 'B', 'M', 'X', 'A', 'A', 'C', 'C', 'B', 'X', 'C'],
                   'Value': [50, 150, -500, -40, 100, -30, 100, 100, -100, 500, 100, 100]})
# A group is 'Valid' only if it has more than one row and at least one positive Value.
df['Validated'] = df.groupby(['State', 'ColorName', 'Code'])['Value'].transform(lambda x: 'Valid' if x.shape[0] > 1 and x.max() > 0 else 'Invalid')
print(df)
State ColorName Code Value Validated
0 Arizona Yellow A 50 Valid
1 Alabama Orange A 150 Valid
2 Arkansas Red B -500 Valid
3 Kentuky Green M -40 Invalid
4 Ohio Blue X 100 Invalid
5 Alabama Orange A -30 Valid
6 Arizona Yellow A 100 Valid
7 California Blue C 100 Valid
8 California Blue C -100 Valid
9 Arkansas Red B 500 Valid
10 Ohio Yellow X 100 Invalid
11 California Blue C 100 Valid
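To see why each row gets its label, it can help to look at one summary row per group instead of the full frame. The following is only a sketch of that check, not part of the answer above: it rebuilds the sample data and applies the same rule (more than one row and at least one positive Value) with built-in aggregations. The names summary, n_rows, and max_value are illustrative choices.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Arizona', 'Alabama', 'Arkansas', 'Kentuky', 'Ohio', 'Alabama',
              'Arizona', 'California', 'California', 'Arkansas', 'Ohio', 'California'],
    'ColorName': ['Yellow', 'Orange', 'Red', 'Green', 'Blue', 'Orange',
                  'Yellow', 'Blue', 'Blue', 'Red', 'Yellow', 'Blue'],
    'Code': ['A', 'A', 'B', 'M', 'X', 'A', 'A', 'C', 'C', 'B', 'X', 'C'],
    'Value': [50, 150, -500, -40, 100, -30, 100, 100, -100, 500, 100, 100],
})

# One row per (State, ColorName, Code) group: its row count and its largest Value.
summary = (df.groupby(['State', 'ColorName', 'Code'])['Value']
             .agg(n_rows='size', max_value='max')
             .reset_index())

# Same rule as the lambda: more than one row and at least one positive Value.
is_valid = (summary['n_rows'] > 1) & (summary['max_value'] > 0)
summary['Validated'] = is_valid.map({True: 'Valid', False: 'Invalid'})
print(summary)
```

With this rule, a group that has a single row is labelled Invalid even when its Value is positive, which is why both Ohio rows come out as Invalid in the output above.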
CodePudding user response:
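Assuming a group is valid only if it has more than one row and at least one positive value:
import numpy as np

# flag marks the rows whose Value is positive
g = (df.assign(flag=df['Value'].gt(0))
       .groupby(['State', 'ColorName', 'Code'])
    )
m1 = g.transform('size').gt(1)     # group has more than one row
m2 = g['flag'].transform('any')    # group has at least one positive Value
df['Validated'] = np.where(m1 & m2, 'Valid', 'Invalid')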
Output:
State ColorName Code Value Validated
0 Arizona Yellow A 50 Valid
1 Alabama Orange A 150 Valid
2 Arkansas Red B -500 Valid
3 Kentuky Green M -40 Invalid
4 Ohio Blue X 100 Invalid
5 Alabama Orange A -30 Valid
6 Arizona Yellow A 100 Valid
7 California Blue C 100 Valid
8 California Blue C -100 Valid
9 Arkansas Red B 500 Valid
10 Ohio Yellow X 100 Invalid
11 California Blue C 100 Valid
If you just want at least one positive value:
df['Validated'] = np.where(m2, 'Valid', 'Invalid')
Output:
State ColorName Code Value Validated
0 Arizona Yellow A 50 Valid
1 Alabama Orange A 150 Valid
2 Arkansas Red B -500 Valid
3 Kentuky Green M -40 Invalid
4 Ohio Blue X 100 Valid
5 Alabama Orange A -30 Valid
6 Arizona Yellow A 100 Valid
7 California Blue C 100 Valid
8 California Blue C -100 Valid
9 Arkansas Red B 500 Valid
10 Ohio Yellow X 100 Valid
11 California Blue C 100 Valid
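The question only asks for "Invalid" to be written for groups without a positive value and does not say what the other rows should contain. A minimal sketch of that reading, assuming the remaining rows should hold an empty string (an assumption, as is the name has_positive), could look like this:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    'State': ['Arizona', 'Alabama', 'Arkansas', 'Kentuky', 'Ohio', 'Alabama',
              'Arizona', 'California', 'California', 'Arkansas', 'Ohio', 'California'],
    'ColorName': ['Yellow', 'Orange', 'Red', 'Green', 'Blue', 'Orange',
                  'Yellow', 'Blue', 'Blue', 'Red', 'Yellow', 'Blue'],
    'Code': ['A', 'A', 'B', 'M', 'X', 'A', 'A', 'C', 'C', 'B', 'X', 'C'],
    'Value': [50, 150, -500, -40, 100, -30, 100, 100, -100, 500, 100, 100],
})

# True for every row whose group contains at least one positive Value:
# the group's maximum is broadcast back to each row and compared with 0.
has_positive = (df.groupby(['State', 'ColorName', 'Code'])['Value']
                  .transform('max')
                  .gt(0))

# Assumption: rows in groups that pass the check are left as an empty string.
df['Validated'] = np.where(has_positive, '', 'Invalid')
print(df)
```

On the sample data only the Kentuky row is marked Invalid, since every other group contains at least one positive Value.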