I am experimenting with "flagging" some data with a 1 or 0 in a separate df column based on a condition, but could use some tips...
EDIT: this question is NOT about looking up data in a dataframe. It is about finding a way to modify values in the dataframe for each row based on row conditions.
Made up data:
import pandas as pd
import numpy as np
rows,cols = 8760,3
data = np.random.rand(rows,cols)
tidx = pd.date_range('2019-01-01', periods=rows, freq='min')
df = pd.DataFrame(data, columns=['cooling_sig','heating_sig','economizer_sig'], index=tidx)
These are some extra parameters and columns for my application:
# params for air handling unit (ahu)
ahu_min_oa = .2
# make columns out of params
df['ahu_min_oa'] = ahu_min_oa
df['heating_mode'] = 0
df['econ_mode'] = 0
df['econ mech_cooling'] = 0
df['mech_cooling'] = 0
Here is a function to process the data, but it doesn't work. Any better practice than hammering through each row of the dataframe would be greatly appreciated. I am trying to "flag" a mode with a value of 1 based on a condition. For example, for each row in the data, heating_mode would be True or 1 if heating_sig is greater than zero.
def data_preprocess(dataframe):
    for index, row in dataframe.iterrows():
        # OS1, the AHU is heating
        if row.heating_sig > 0:
            row['heating_mode'] = 1
        # OS2, the AHU is using free cooling only
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig == 0:
            row['econ_mode'] = 1
        # OS3, the AHU is using free and mechanical cooling
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig > 0:
            row['econ mech_cooling'] = 1
        # OS4, the AHU is using mechanical cooling only
        if row.economizer_sig <= row.ahu_min_oa and row.cooling_sig > 0:
            row['mech_cooling'] = 1
    return dataframe
Sorry, this is probably a somewhat strange application and question, but thanks for any tips. My attempt at flagging the data isn't working: value_counts() shows only zeros in every mode column.
df['heating_mode'].value_counts()
df['mech_cooling'].value_counts()
df['econ_mode'].value_counts()
df['econ mech_cooling'].value_counts()
CodePudding user response:
You don't need to (and shouldn't) iterate over your DataFrame.
Instead, try:
df.loc[df["heating_sig"].gt(0), "heating_mode"] = 1
df.loc[df["economizer_sig"].gt(df["ahu_min_oa"]) & df["cooling_sig"].eq(0), "econ_mode"] = 1
df.loc[df["economizer_sig"].gt(df["ahu_min_oa"]) & df["cooling_sig"].gt(0), "econ mech_cooling"] = 1
df.loc[df["economizer_sig"].le(df["ahu_min_oa"]) & df["cooling_sig"].gt(0), "mech_cooling"] = 1
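As a variation on the same idea, the flags can also be built directly from the boolean masks and cast to 0/1, which avoids pre-creating the columns with zeros. This is only a sketch: the helper name flag_modes and the small sample frame are made up for illustration, while the column names and conditions are taken from the question.

```python
import pandas as pd


def flag_modes(df: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of df with the four 0/1 mode columns filled in."""
    out = df.copy()

    # Reusable boolean masks, one value per row.
    econ_open = out["economizer_sig"] > out["ahu_min_oa"]
    econ_closed = out["economizer_sig"] <= out["ahu_min_oa"]
    cooling_on = out["cooling_sig"] > 0
    cooling_off = out["cooling_sig"] == 0

    # True/False becomes 1/0 after the cast.
    out["heating_mode"] = (out["heating_sig"] > 0).astype(int)
    out["econ_mode"] = (econ_open & cooling_off).astype(int)
    out["econ mech_cooling"] = (econ_open & cooling_on).astype(int)
    out["mech_cooling"] = (econ_closed & cooling_on).astype(int)
    return out


# Hypothetical hand-made rows, one per operating state (OS1 to OS4).
sample = pd.DataFrame(
    {
        "cooling_sig": [0.0, 0.0, 0.5, 0.7],
        "heating_sig": [0.3, 0.0, 0.0, 0.0],
        "economizer_sig": [0.1, 0.9, 0.6, 0.2],
    }
)
sample["ahu_min_oa"] = 0.2

flagged = flag_modes(sample)
print(flagged[["heating_mode", "econ_mode", "econ mech_cooling", "mech_cooling"]])
```

Note that with the made-up data from np.random.rand, cooling_sig is almost never exactly 0, so econ_mode will most likely stay 0 for every row no matter which approach is used.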
CodePudding user response:
There might be more efficient ways of doing the same thing, but if you really need to use iterrows(), write the new values through the DataFrame itself with .at instead of assigning to row:
def data_preprocess(dataframe):
    for index, row in dataframe.iterrows():
        # OS1, the AHU is heating
        if row.heating_sig > 0:
            dataframe.at[index, 'heating_mode'] = 1
        # OS2, the AHU is using free cooling only
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig == 0:
            dataframe.at[index, 'econ_mode'] = 1
        # OS3, the AHU is using free and mechanical cooling
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig > 0:
            dataframe.at[index, 'econ mech_cooling'] = 1
        # OS4, the AHU is using mechanical cooling only
        if row.economizer_sig <= row.ahu_min_oa and row.cooling_sig > 0:
            dataframe.at[index, 'mech_cooling'] = 1
    return dataframe
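The difference between the two loops can be checked on a tiny frame. The sketch below uses a made-up three-row DataFrame with mixed float and integer columns, as in the question. In that situation iterrows() hands back a separate Series for each row, so an assignment to row never reaches the DataFrame, whereas .at writes into it.

```python
import pandas as pd

# Hypothetical mini frame: one float signal column and one integer flag column.
df = pd.DataFrame({"heating_sig": [0.4, 0.0, 0.9], "heating_mode": 0})

# Writing to `row` changes only the temporary Series yielded by iterrows().
for index, row in df.iterrows():
    if row.heating_sig > 0:
        row["heating_mode"] = 1
untouched = df["heating_mode"].tolist()

# Writing through .at targets the cell in the DataFrame itself.
for index, row in df.iterrows():
    if row.heating_sig > 0:
        df.at[index, "heating_mode"] = 1
updated = df["heating_mode"].tolist()

print(untouched)
print(updated)
```

The first list still holds only zeros, which matches the symptom in the question. The second has a 1 wherever heating_sig is positive.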