Pandas - Update Column Values If Values in Rows Are Partially Matching

Time:11-24

I have a dataframe similar to the one below. I need to fill in the Validated column according to this rule: when multiple rows share the same values in the State, ColorName, and Code columns, at least one of those rows must have a positive value in the Value column. If none of the matching rows has a positive value, I need to put "Invalid" in the Validated column for all of them. Is there a way to do this without iterating over each row?

State     ColorName    Code    Value      Validated
Arizona    Yellow        A       50 
Alabama    Orange        A      150 
Arkansas   Red           B      -500    
Kentuky    Green         M      -40 
Ohio       Blue          X      100
Alabama    Orange        A      -30 
Arizona    Yellow        A      100 
California Blue          C      100 
California Blue          C     -100 
Arkansas   Red           B      500 
Ohio       Yellow        X      100 
California Blue          C      100

CodePudding user response:

import pandas as pd

df = pd.DataFrame({'State': ['Arizona', 'Alabama', 'Arkansas', 'Kentuky', 'Ohio', 'Alabama', 'Arizona', 'California',
                             'California', 'Arkansas', 'Ohio', 'California'],
                   'ColorName': ['Yellow', 'Orange', 'Red', 'Green', 'Blue', 'Orange', 'Yellow', 'Blue', 'Blue', 'Red',
                                 'Yellow', 'Blue'],
                   'Code': ['A', 'A', 'B', 'M', 'X', 'A', 'A', 'C', 'C', 'B', 'X', 'C'],
                   'Value': [50, 150, -500, -40, 100, -30, 100, 100, -100, 500, 100, 100]})

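# A group is 'Valid' only if it has more than one row and at least one positive value.
# The scalar returned by the lambda is broadcast to every row of the group.
df['Validated'] = df.groupby(['State', 'ColorName', 'Code'])['Value'].transform(lambda x: 'Valid' if x.shape[0] > 1 and x.max() > 0 else 'Invalid')
print(df)

Output:
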
         State ColorName Code  Value Validated
0      Arizona    Yellow    A     50     Valid
1      Alabama    Orange    A    150     Valid
2     Arkansas       Red    B   -500     Valid
3      Kentuky     Green    M    -40   Invalid
4         Ohio      Blue    X    100   Invalid
5      Alabama    Orange    A    -30     Valid
6      Arizona    Yellow    A    100     Valid
7   California      Blue    C    100     Valid
8   California      Blue    C   -100     Valid
9     Arkansas       Red    B    500     Valid
10        Ohio    Yellow    X    100   Invalid
11  California      Blue    C    100     Valid
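
To see why each row gets its label, it can help to look at one line per group instead of one line per row. The following is a minimal sketch, assuming the same sample data as above; the names `keys`, `summary`, `rows`, and `max_value` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'State': ['Arizona', 'Alabama', 'Arkansas', 'Kentuky', 'Ohio', 'Alabama',
              'Arizona', 'California', 'California', 'Arkansas', 'Ohio', 'California'],
    'ColorName': ['Yellow', 'Orange', 'Red', 'Green', 'Blue', 'Orange',
                  'Yellow', 'Blue', 'Blue', 'Red', 'Yellow', 'Blue'],
    'Code': ['A', 'A', 'B', 'M', 'X', 'A', 'A', 'C', 'C', 'B', 'X', 'C'],
    'Value': [50, 150, -500, -40, 100, -30, 100, 100, -100, 500, 100, 100],
})

keys = ['State', 'ColorName', 'Code']  # illustrative name for the grouping columns

# One row per group: how many rows it has and its largest Value.
summary = (df.groupby(keys)['Value']
             .agg(rows='size', max_value='max')
             .reset_index())

# Same rule as the lambda above: more than one row AND at least one positive value.
is_valid = (summary['rows'] > 1) & (summary['max_value'] > 0)
summary['Validated'] = is_valid.map({True: 'Valid', False: 'Invalid'})

print(summary)
```

With this data the summary has seven groups. The three single-row groups (Kentuky/Green/M, Ohio/Blue/X, Ohio/Yellow/X) come out as Invalid under this rule, even though the two Ohio rows have positive values.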

CodePudding user response:

Assuming a group must have more than one row and at least one positive value to be valid:

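import numpy as np

# flag marks rows with a positive Value; group on the three key columns
g = (df.assign(flag=df['Value'].gt(0))
       .groupby(['State', 'ColorName', 'Code'])
     )

m1 = g['flag'].transform('size').gt(1)  # True where the group has more than one row
m2 = g['flag'].transform('any')         # True where the group has a positive Value

df['Validated'] = np.where(m1&m2, 'Valid', 'Invalid')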

Output:

         State ColorName Code  Value Validated
0      Arizona    Yellow    A     50     Valid
1      Alabama    Orange    A    150     Valid
2     Arkansas       Red    B   -500     Valid
3      Kentuky     Green    M    -40   Invalid
4         Ohio      Blue    X    100   Invalid
5      Alabama    Orange    A    -30     Valid
6      Arizona    Yellow    A    100     Valid
7   California      Blue    C    100     Valid
8   California      Blue    C   -100     Valid
9     Arkansas       Red    B    500     Valid
10        Ohio    Yellow    X    100   Invalid
11  California      Blue    C    100     Valid

If you just want at least one positive value:

df['Validated'] = np.where(m2, 'Valid', 'Invalid')

Output:

         State ColorName Code  Value Validated
0      Arizona    Yellow    A     50     Valid
1      Alabama    Orange    A    150     Valid
2     Arkansas       Red    B   -500     Valid
3      Kentuky     Green    M    -40   Invalid
4         Ohio      Blue    X    100     Valid
5      Alabama    Orange    A    -30     Valid
6      Arizona    Yellow    A    100     Valid
7   California      Blue    C    100     Valid
8   California      Blue    C   -100     Valid
9     Arkansas       Red    B    500     Valid
10        Ohio    Yellow    X    100     Valid
11  California      Blue    C    100     Valid
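
The two interpretations above can be wrapped in one helper so that the choice is explicit. This is only a sketch under the same assumptions as the answer: the function name `validate_groups` and its parameters `value_col` and `require_multiple` are hypothetical and do not come from pandas or from the original answer.

```python
import numpy as np
import pandas as pd

def validate_groups(df, keys, value_col='Value', require_multiple=True):
    """Return a 'Valid'/'Invalid' Series aligned with df (hypothetical helper)."""
    grouped = df.assign(flag=df[value_col].gt(0)).groupby(keys)['flag']
    mask = grouped.transform('any')                  # group has a positive value
    if require_multiple:
        mask = mask & grouped.transform('size').gt(1)  # and more than one row
    return pd.Series(np.where(mask, 'Valid', 'Invalid'),
                     index=df.index, name='Validated')

df = pd.DataFrame({
    'State': ['Arizona', 'Alabama', 'Arkansas', 'Kentuky', 'Ohio', 'Alabama',
              'Arizona', 'California', 'California', 'Arkansas', 'Ohio', 'California'],
    'ColorName': ['Yellow', 'Orange', 'Red', 'Green', 'Blue', 'Orange',
                  'Yellow', 'Blue', 'Blue', 'Red', 'Yellow', 'Blue'],
    'Code': ['A', 'A', 'B', 'M', 'X', 'A', 'A', 'C', 'C', 'B', 'X', 'C'],
    'Value': [50, 150, -500, -40, 100, -30, 100, 100, -100, 500, 100, 100],
})
keys = ['State', 'ColorName', 'Code']

strict = validate_groups(df, keys, require_multiple=True)   # first variant
loose = validate_groups(df, keys, require_multiple=False)   # second variant

# Rows where the two interpretations disagree (single-row groups with a positive value)
print(df[strict != loose])
```

On the sample data the two variants differ only for rows 4 and 10, the single-row Ohio groups, so which one to use depends on whether a lone row with a positive value should count as validated.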