My dataframe looks like this
category value1 value2
A. 20. 30.
B. 40. 50.
A. 60. 70.
B. 80. 90.
C. 10. 10.
D. 20. 20.
I want to create a new column that has the value of either value1
or value2
based on a condition relative to category
.
For example, if category
is A
then store value1
, if category
is B
then store value2
,if else
then store nan
. I expect an output like this:
category value1 value2 new_col
A. 20. 30. 20.
B. 40. 50. 50.
A. 60. 70. 60.
B. 80. 90. 90.
C. 10. 10. nan
D. 20. 20. nan
How can I do that?
CodePudding user response:
# np.select and define list of condition with corresponding values
df['value3']=(np.select([df['category'].eq('A.'), # condition #1
df['category'].eq('B.')],# condition #2
[df['value1'], # value when #1 is true
df['value2']], # value when #2 is true
np.nan)) # default value
df
category value1 value2 value3
0 A. 20.0 30.0 20.0
1 B. 40.0 50.0 50.0
2 A. 60.0 70.0 60.0
3 B. 80.0 90.0 90.0
4 C. 10.0 10.0 NaN
5 D. 20.0 20.0 NaN
CodePudding user response:
Another possible solution, based on numpy.where
:
df['new_col'] = np.where(df.category.eq('A.'), df.value1, np.where(
df.category.eq('B.'), df.value2, np.nan))
Output:
category value1 value2 new_col
0 A. 20.0 30.0 20.0
1 B. 40.0 50.0 50.0
2 A. 60.0 70.0 60.0
3 B. 80.0 90.0 90.0
4 C. 10.0 10.0 NaN
5 D. 20.0 20.0 NaN
Yet another possible solution, based on pandas.DataFrame.update
:
df['new_col'] = df.value1.loc[df.category.eq('A.')]
df['new_col'].update(df.value2.loc[df.category.eq('B.')])
Output:
# same
CodePudding user response:
One option is with case_when from pyjanitor:
# pip install pyjanitor
import pandas as pd
import janitor
(df
.case_when(
df.category.eq('A.'), df.value1, # condition, result
df.category.eq('B.'), df.value2,
np.nan, # default
column_name = 'new_col')
)
category value1 value2 new_col
0 A. 20.0 30.0 20.0
1 B. 40.0 50.0 50.0
2 A. 60.0 70.0 60.0
3 B. 80.0 90.0 90.0
4 C. 10.0 10.0 NaN
5 D. 20.0 20.0 NaN