Home > Software engineering >  PANDAS : Assigning a value for each row in certain column based on multiple if condition
PANDAS : Assigning a value for each row in certain column based on multiple if condition

Time:03-14

I'm really new to python codes and having trouble with applying some of the answers for similar questions for my own case. Please help

So I have a dataframe with Column A and B, with numerous rows

Both contains negative and positive numbers, and I'm trying to make a new Column C with following conditions

If "Value of row 1, Column A is less than 0" & "Value of row 1, Column B is less than 0", return -100 in "row 1, Column C"

Elif "Value of row 1, Column A is less than 0" & "Value of row 1, Column B is greater than 0", return 100 in "row 1, Column C"

Elif "Value of row 1, Column A is greater than 0" & "Value of row 1, Column B is less than 0", return 100 in "row 1, Column C"

Else : return (Column A.Value / Column B.Value) in Column C

Thanks a lot

CodePudding user response:

I think you are looking for np.select:

condlist = [(df['A'] < 0) & (df['B'] < 0),
            (df['A'] < 0) & (df['B'] > 0),
            (df['A'] > 0) & (df['B'] < 0)]

choicelist = [-100, 100, 100]

default = df['A'] / df['B']

df['C'] = np.select(condlist, choicelist, default)

Output:

>>> df
          A          B           C
0 -0.002639  -1.374507 -100.000000
1 -0.696428   9.923431  100.000000
2  1.410547   3.804043    0.370802
3  1.504908   2.701486    0.557067
4  1.867486   1.889067    0.988576
5 -0.451066 -11.529716 -100.000000
6  5.713800  -7.678271  100.000000
7 -4.318760   5.082725  100.000000
8  5.169819  -4.122461  100.000000
9  0.094524  -1.916718  100.000000

Setup a MRE

import pandas as pd
import numpy as np

np.random.seed(2022)
df = pd.DataFrame(np.random.normal(0, 5, (10, 2)), columns=['A', 'B'])

CodePudding user response:

import pandas as pd

# initialize as list
data = [[1, 10], [3, 45], [56, -6], [-96, -65]] 

# Create the pandas DataFrame
df = pd.DataFrame(data, columns = ['A', 'B'])
print(df)
print('*****************')

df['C'] = None

if df.iat[0,0] < 0 and df.iat[0,1] < 0:
  df.iat[0,2] = -100

elif df.iat[0,0] < 0 and df.iat[0,1] > 0:
  df.iat[0,2]  = 100

elif df.iat[0,0] > 0 and df.iat[0,1] < 0:
  df.iat[0,2]  = 100

else:
  df['C'] = df['A']/df['B'] 

print('New dataFrame:')
print(df)

OUTPUT:

    A   B
0   1  10
1   3  45
2  56  -6
3 -96 -65
*****************
New dataFrame:
    A   B         C
0   1  10  0.100000
1   3  45  0.066667
2  56  -6 -9.333333
3 -96 -65  1.476923

Let me know if any confusion!

CodePudding user response:

One way you could do this is by defining a function with the desired logic and then passing it to the apply function along axis 1 (row-wise). In this case, that function might look like the following:

def f(x):

    if x["A"] < 0 and x["B"] < 0:
        return -100
    elif x["A"] < 0 and x["B"] > 0:
        return 100
    elif x["A"] > 0 and x["B"] < 0:
        return 100
    else:
        return x["A"] / x["B"]

We can then generate some sample data for testing purposes:

>>> import numpy as np
>>> import pandas as pd
>>> data = np.random.randint(-50, 50, size = (10, 2))
>>> df   = pd.DataFrame(data, columns = ["A", "B"])
>>> df
    A   B
0  23   4
1  10  25
2 -14  45
3  31  32
4  49  32
5 -23  34
6 -10 -29
7  10 -19
8 -45 -48
9  31 -31

Finally, we can apply our function to the sample data:

>>> df["C"] = df.apply(f, axis = 1)
>>> df
    A   B          C
0  23   4    5.75000
1  10  25    0.40000
2 -14  45  100.00000
3  31  32    0.96875
4  49  32    1.53125
5 -23  34  100.00000
6 -10 -29 -100.00000
7  10 -19  100.00000
8 -45 -48 -100.00000
9  31 -31  100.00000
  • Related