Home > other >  Creating New Column with value of other columns based on condition on a third Column in Pandas DataF
Creating New Column with value of other columns based on condition on a third Column in Pandas DataF

Time:10-25

My dataframe looks like this

category value1 value2 
A.       20.      30.
B.       40.      50.
A.       60.      70. 
B.       80.      90.
C.       10.      10. 
D.       20.      20.  

I want to create a new column that has the value of either value1 or value2 based on a condition relative to category. For example, if category is A then store value1, if category is B then store value2,if else then store nan. I expect an output like this:

category value1 value2 new_col
A.       20.      30.   20.
B.       40.      50.   50.
A.       60.      70.   60.  
B.       80.      90.   90.
C.       10.      10.   nan
D.       20.      20.   nan
 

How can I do that?

CodePudding user response:

# np.select and define list of condition with corresponding values
df['value3']=(np.select([df['category'].eq('A.'), # condition #1
                         df['category'].eq('B.')],# condition #2
                        [df['value1'],            # value when #1 is true
                         df['value2']],           # value when #2 is true
                        np.nan))                  # default value
df
    category    value1  value2  value3
0          A.   20.0    30.0    20.0
1          B.   40.0    50.0    50.0
2          A.   60.0    70.0    60.0
3          B.   80.0    90.0    90.0
4          C.   10.0    10.0    NaN
5          D.   20.0    20.0    NaN

CodePudding user response:

Another possible solution, based on numpy.where:

df['new_col'] = np.where(df.category.eq('A.'), df.value1, np.where(
    df.category.eq('B.'), df.value2, np.nan))

Output:

  category  value1  value2  new_col
0       A.    20.0    30.0     20.0
1       B.    40.0    50.0     50.0
2       A.    60.0    70.0     60.0
3       B.    80.0    90.0     90.0
4       C.    10.0    10.0      NaN
5       D.    20.0    20.0      NaN

Yet another possible solution, based on pandas.DataFrame.update:

df['new_col'] = df.value1.loc[df.category.eq('A.')]
df['new_col'].update(df.value2.loc[df.category.eq('B.')])

Output:

# same

CodePudding user response:

One option is with case_when from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.case_when(
    df.category.eq('A.'), df.value1, # condition, result
    df.category.eq('B.'), df.value2,  
    np.nan, # default
    column_name = 'new_col')
)

  category  value1  value2  new_col
0       A.    20.0    30.0     20.0
1       B.    40.0    50.0     50.0
2       A.    60.0    70.0     60.0
3       B.    80.0    90.0     90.0
4       C.    10.0    10.0      NaN
5       D.    20.0    20.0      NaN
  • Related