I'm trying to compare two columns r1
and r2
in the pandas dataframe by row, essentially I want to return the 'red' when value in r1
is greater than that of r2
in the same row, while the outcome is that the comparison only returns yellow
in the else condition even two values are provided.
Below is the dataset for reproducing the issue.
import pandas as pd
import numpy as np
from numpy import nan
time_idx = pd.date_range('2021-01-05', periods=8, freq='3H')
r1 = np.concatenate([[nan,nan,nan],235,324,125,[nan,nan]], axis=None)
r2 = np.concatenate([114,nan,345,235,nan,125,178,324], axis=None)
df_r = pd.DataFrame({'r1': r1,'r2':r2},index = time_idx)
df_r["Light"]=['red' if df_r.loc[t,'r1'] > df_r.loc[t,'r2']
else 'yellow' for t in df_r.index]
print(df_r)
And here are the outputs:
r1 r2 Light
2021-01-05 00:00:00 NaN 114.0 yellow
2021-01-05 03:00:00 NaN NaN yellow
2021-01-05 06:00:00 NaN 345.0 yellow
2021-01-05 09:00:00 235.0 235.0 yellow
2021-01-05 12:00:00 324.0 NaN yellow
2021-01-05 15:00:00 125.0 125.0 yellow
2021-01-05 18:00:00 NaN 178.0 yellow
2021-01-05 21:00:00 NaN 324.0 yellow
My question goes to as follows:
- How can I achieve the goal intended while sticking at using list comprehension?
Thank you.
CodePudding user response:
To have red
where r1==324
, replace np.nan
by -np.inf
:
df_r['Light'] = df_r['r1'].gt(df_r['r2'].fillna(-np.inf)) \
.replace({True: 'red', False: 'yellow'})
print(df)
# Output:
r1 r2 Light
2021-01-05 00:00:00 NaN 114.0 yellow
2021-01-05 03:00:00 NaN NaN yellow
2021-01-05 06:00:00 NaN 345.0 yellow
2021-01-05 09:00:00 235.0 235.0 yellow
2021-01-05 12:00:00 324.0 NaN red
2021-01-05 15:00:00 125.0 125.0 yellow
2021-01-05 18:00:00 NaN 178.0 yellow
2021-01-05 21:00:00 NaN 324.0 yellow
.gt
is equivalent to the operator >
.