I need to either normalize or replace some values in a DataFrame so that a value of 0 stays 0, a value of 1 becomes 0, and any value greater than 1 becomes 1. I tried min-max scaling, but it doesn't seem to allow this kind of modification. So I tried the code below to replace the values instead, but it prints the columns separately, and when I printed them together with the rest of the columns, none of the values had changed. This makes me unsure whether I should use normalization or value replacement here.
def min_max_scaling(df):
    # copy the dataframe
    df_min_max_scaled = df.copy()
    # define specific columns
    # apply filter of 0, 1, 1
    columns = df_min_max_scaled["cholesterol","gluc", "smoke", "alco", "active", "cardio"]
    df_min_max_scaled[columns] = (df_min_max_scaled[columns] - df_min_max_scaled[columns].min()) / (
        df_min_max_scaled[columns].max() - df_min_max_scaled[columns].min())
    if columns["cholesterol", "glue"].value() > 1:
        columns.replace(0, 1)
        print(columns)
    else:
        pass
    print(df_min_max_scaled)
Answer:
If I understand correctly (IIUC), your conditions are:
- If the value is 0, it stays 0
- If the value is 1, it becomes 0
- If the value is greater than 1, it becomes 1
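These conditions describe a threshold (value replacement), not a normalization. Min-max scaling, (x - min) / (max - min), keeps the order of the values and spreads them linearly between 0 and 1, so for the values 0, 1 and 2 it gives 0, 0.5 and 1 rather than the wanted 0, 0 and 1. A quick comparison, assuming a hypothetical column that holds exactly those three values:

```python
import pandas as pd

# Hypothetical column containing the three cases: 0, 1 and a value > 1
s = pd.Series([0, 1, 2], name="cholesterol")

# Min-max scaling: (x - min) / (max - min)
min_max = (s - s.min()) / (s.max() - s.min())

# Threshold rule from the question: 0 -> 0, 1 -> 0, >1 -> 1
thresholded = s.gt(1).astype(int)

print(min_max.tolist())      # [0.0, 0.5, 1.0]
print(thresholded.tolist())  # [0, 0, 1]
```

So replacing values with a comparison, as below, is the right tool rather than normalization.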
df['col2'] = df['col1'].gt(1).astype(int)
print(df)
# Output
   col1  col2
0   0.0     0
1   0.5     0
2   1.5     1
Details:
>>> df # .gt(1)? -> .astype(int)
col1
0 0.0 # 0.0 > 1? False -> int(False) -> 0
1 0.5 # 0.5 > 1? False -> int(False) -> 0
2 1.5 # 1.5 > 1? True -> int(True) -> 1
# For a scalar value, you can try:
>>> int(0.5 > 1)
0
>>> int(1.5 > 1)
1
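If you prefer the three conditions spelled out one by one, `numpy.select` can express them directly; this is an alternative technique, not part of the original answer. A minimal sketch on hypothetical sample values, where anything not matched (such as 0.5) falls back to 0, which makes it agree with `.gt(1).astype(int)`:

```python
import numpy as np
import pandas as pd

# Hypothetical sample values, including a fraction between 0 and 1
s = pd.Series([0.0, 0.5, 1.0, 1.5, 3.0])

# One condition per stated rule, with the matching replacement value
conditions = [s == 0, s == 1, s > 1]
choices = [0, 0, 1]
explicit = np.select(conditions, choices, default=0)

# Compare with the one-liner from the answer
one_liner = s.gt(1).astype(int)

print(explicit.tolist())                         # [0, 0, 0, 1, 1]
print(explicit.tolist() == one_liner.tolist())   # True
```

The one-liner is shorter; the explicit form is easier to adapt if one of the rules changes later.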
Update
You also asked how to replace the values in place instead of creating a new column. Assign the result back to the same columns:
cols = ['cholesterol', 'gluc', 'smoke', 'alco', 'active', 'cardio']
df[cols] = df[cols].gt(1).astype(int)
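Wrapped in a function like the one in the question, the same assignment can work on a copy so the original DataFrame is left untouched. A minimal sketch; the function name `binarize_columns` and the sample data (including the extra `age` column) are hypothetical:

```python
import pandas as pd

def binarize_columns(df: pd.DataFrame, cols: list[str]) -> pd.DataFrame:
    """Return a copy of df where the given columns follow the rule
    0 -> 0, 1 -> 0, >1 -> 1; all other columns are left unchanged."""
    out = df.copy()
    # Select several columns with a list, then assign the result back to them
    out[cols] = out[cols].gt(1).astype(int)
    return out

# Hypothetical sample: 'age' is not in cols, so it keeps its values
df = pd.DataFrame({
    "age": [50, 61, 47],
    "cholesterol": [1, 2, 3],
    "gluc": [1, 1, 2],
})

result = binarize_columns(df, ["cholesterol", "gluc"])
print(result)
```

Note that, by the stated rule, any listed column that already holds only 0 and 1 will become all 0, since every 1 is mapped to 0.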