I have to replace the values from the first three columns with NaN if they are >=
than fence_high
or <=
than fence_low
.
I have a dataframe like this:
col1 col2 col3 fence_high fence_low
0 1 3 9 9 1.5
1 2 4 6 7 1
2 4 7 -1 6.5 0
This is what I would like to achieve:
col1 col2 col3 fence_high fence_low
0 NaN 3 NaN 9 1.5
1 2 4 6 7 1
2 4 NaN NaN 6.5 0
So far I tried df_new = df[(df < df["fence_high"]) & (df > df["fence_low"])]
, but this gives me all NaN.
CodePudding user response:
We can simply keep values where they fall between fence_low
and fence_high
using gt and lt to maintain index alignment:
df.loc[:, 'col1':'col3'] = df.loc[:, 'col1':'col3'].where(
lambda x: x.gt(df['fence_low'], axis=0) & x.lt(df['fence_high'], axis=0)
)
df
col1 col2 col3 fence_high fence_low
0 NaN 3.0 NaN 9.0 1.5
1 2.0 4.0 6.0 7.0 1.0
2 4.0 NaN NaN 6.5 0.0
If needing a new DataFrame we can join after where to restore the columns that were not considered:
new_df = df.loc[:, 'col1':'col3'].where(
lambda x: x.gt(df['fence_low'], axis=0) & x.lt(df['fence_high'], axis=0)
).join(df[['fence_high', 'fence_low']])
new_df
:
col1 col2 col3 fence_high fence_low
0 NaN 3.0 NaN 9.0 1.5
1 2.0 4.0 6.0 7.0 1.0
2 4.0 NaN NaN 6.5 0.0
CodePudding user response:
One of the ways is to use apply
See if this helps:
import pandas as pd
import numpy as np
cols_list = ["col1", "col2", "col3"]
def compare_val(val, high, low):
if val >= high or val <= low:
return np.nan
return val
def compare(row):
result = []
for i in cols_list:
result.append(
compare_val(val=row[i], high=row["fence_high"], low=row["fence_low"])
)
return pd.Series(result)
data = [[1, 3, 9, 9, 1.5], [2, 4, 6, 7, 1], [4, 7, -1, 6.5, 0]]
df = pd.DataFrame(data, columns=[*cols_list, "fence_high", "fence_low"])
print("Original:\n", df.head())
df[cols_list] = df.apply(compare, axis=1)
print("Transformed:\n", df.head())
Output:
Original:
col1 col2 col3 fence_high fence_low
0 1 3 9 9.0 1.5
1 2 4 6 7.0 1.0
2 4 7 -1 6.5 0.0
Transformed:
col1 col2 col3 fence_high fence_low
0 NaN 3.0 NaN 9.0 1.5
1 2.0 4.0 6.0 7.0 1.0
2 4.0 NaN NaN 6.5 0.0