Pandas create a new column based on existing column with chain rule

Time:04-05

I have the code below:

import pandas as pd
import numpy as np
df = pd.DataFrame({"A":[12, 4, 5, 3, 1],"B":[7, 2, 54, 3, None],"C":[20, 16, 11, 3, 8],"D":[14, 3, None, 2, 6]})
df['A1'] = np.where(df['A'] > 10, 10, np.where(df['A'] < 3, 3, df['A']))

While this works, I want to create the final dataframe (i.e. the result of the last line of code) with method chaining, starting from the line that builds the dataframe. I want to do this to improve readability.

Could you please help me achieve this?

CodePudding user response:

You can use clip here:

df.assign(A1=df['A'].clip(upper=10,lower=3))

    A     B   C     D  A1
0  12   7.0  20  14.0  10
1   4   2.0  16   3.0   4
2   5  54.0  11   NaN   5
3   3   3.0   3   2.0   3
4   1   NaN   8   6.0   3

If you really need to do this in a single chained expression (note that I don't find this readable):

pd.DataFrame({"A":[12, 4, 5, 3, 1],
              "B":[7, 2, 54, 3, None],
              "C":[20, 16, 11, 3, 8],
              "D":[14, 3, None, 2, 6]}).assign(A1=lambda x:x['A'].clip(upper=10,lower=3))

CodePudding user response:

You could use np.select() like the following. It makes the conditions and choices very readable.

conditions = [df['A'] > 10,
              df['A'] < 3]

choices = [10,3]

df['A1'] = np.select(conditions, choices, default = df['A'])
print(df)

    A     B   C     D  A1
0  12   7.0  20  14.0  10
1   4   2.0  16   3.0   4
2   5  54.0  11   NaN   5
3   3   3.0   3   2.0   3
4   1   NaN   8   6.0   3
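
If chaining is still the goal, np.select can probably be placed inside assign as well. This is a minimal sketch under that assumption; the name chained is hypothetical, and the lambda builds the conditions from the frame it receives rather than from an outer variable.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "A": [12, 4, 5, 3, 1],
    "B": [7, 2, 54, 3, None],
    "C": [20, 16, 11, 3, 8],
    "D": [14, 3, None, 2, 6],
})

# `chained` is an illustrative name. assign returns a new frame and
# leaves `df` itself unchanged.
chained = df.assign(
    A1=lambda x: np.select(
        [x["A"] > 10, x["A"] < 3],  # conditions, checked in order
        [10, 3],                    # value used where each condition holds
        default=x["A"],             # keep the original value otherwise
    )
)

print(chained)
```

Compared with clip, this form is more verbose for a simple lower/upper bound, but it extends more naturally when the conditions are not a plain range.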