pandas change values in dataframe with iterrows()?

I am experimenting with "flagging" some data with a 1 or 0 in a separate df column based on a condition, but could use some tips...

EDIT: this question is NOT about looking up data in a dataframe. It is about how to modify values in the dataframe for each row based on row conditions.

Made up data:

import pandas as pd
import numpy as np


rows,cols = 8760,3
data = np.random.rand(rows,cols) 
tidx = pd.date_range('2019-01-01', periods=rows, freq='1T') 
df = pd.DataFrame(data, columns=['cooling_sig','heating_sig','economizer_sig'], index=tidx)

These are some extra parameters and columns for my application:

# params for air handling unit (ahu)
ahu_min_oa = .2

# make columns out of params
df['ahu_min_oa'] = ahu_min_oa
df['heating_mode'] = 0
df['econ_mode'] = 0
df['econ mech_cooling'] = 0
df['mech_cooling'] = 0

Here is a function to process the data, but it doesn't work. Any better practices than hammering through each row of the dataframe would be greatly appreciated. I am trying to "flag" a mode with a value of 1 based on a condition. For example, for each row in the data, heating_mode would be True or 1 if heating_sig is greater than zero.

def data_preprocess(dataframe):
    
    for index, row in dataframe.iterrows():
        
        # OS1, the AHU is heating
        if row.heating_sig > 0:
            row['heating_mode'] = 1

        # OS2, the AHU is using free cooling only
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig == 0:
            row['econ_mode'] = 1

        # OS3, the AHU is using free and mechanical cooling
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig > 0:
            row['econ mech_cooling'] = 1

        # OS4, the AHU is using mechanical cooling only
        if row.economizer_sig <= row.ahu_min_oa and row.cooling_sig > 0:
            row['mech_cooling'] = 1

        return dataframe

Sorry, this is probably a strange application and question, but thanks for any tips. My attempt at flagging the data isn't working: value_counts() shows every flag column is still all zeros.

df['heating_mode'].value_counts()
df['mech_cooling'].value_counts()
df['econ_mode'].value_counts()
df['econ mech_cooling'].value_counts()

CodePudding user response：

You don't need to (and shouldn't) iterate over your DataFrame.

Instead, try:

df.loc[df["heating_sig"].gt(0), "heating_mode"] = 1
df.loc[df["economizer_sig"].gt(df["ahu_min_oa"]) & df["cooling_sig"].eq(0), "econ_mode"] = 1
df.loc[df["economizer_sig"].gt(df["ahu_min_oa"]) & df["cooling_sig"].gt(0), "econ mech_cooling"] = 1
df.loc[df["economizer_sig"].le(df["ahu_min_oa"]) & df["cooling_sig"].gt(0), "mech_cooling"] = 1
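
As a variation on the same idea, each flag column can be built directly from a boolean mask cast to int, so the zero-filled columns don't need to be created first. This is only a sketch: the four-row frame and its values are made up for illustration, and the helper names econ_open, econ_closed and cooling_on are not from the question.

```python
import pandas as pd

# Tiny illustrative frame using the question's column names (values are made up).
df = pd.DataFrame(
    {
        "cooling_sig": [0.0, 0.5, 0.3, 0.0],
        "heating_sig": [0.4, 0.0, 0.0, 0.0],
        "economizer_sig": [0.1, 0.6, 0.1, 0.9],
    }
)
df["ahu_min_oa"] = 0.2

# Reusable boolean masks (hypothetical helper names).
econ_open = df["economizer_sig"] > df["ahu_min_oa"]
econ_closed = df["economizer_sig"] <= df["ahu_min_oa"]
cooling_on = df["cooling_sig"] > 0

# True/False becomes 1/0 when cast to int.
df["heating_mode"] = (df["heating_sig"] > 0).astype(int)
df["econ_mode"] = (econ_open & (df["cooling_sig"] == 0)).astype(int)
df["econ mech_cooling"] = (econ_open & cooling_on).astype(int)
df["mech_cooling"] = (econ_closed & cooling_on).astype(int)

print(df[["heating_mode", "econ_mode", "econ mech_cooling", "mech_cooling"]])
```

Calling value_counts() on any of these columns should then show both 0 and 1, as long as the condition is met somewhere in the data.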

CodePudding user response：

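There are more efficient ways to do this, but if you really need to use iterrows(), write the value back to the DataFrame with .at instead of assigning to row, because row is only a copy of the data. Also move the return statement out of the for loop, otherwise the function returns after the first row: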

def data_preprocess(dataframe):
    for index, row in dataframe.iterrows():
        # OS1, the AHU is heating
        if row.heating_sig > 0:
            dataframe.at[index, 'heating_mode'] = 1

        # OS2, the AHU is using free cooling only
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig == 0:
            dataframe.at[index, 'econ_mode'] = 1

        # OS3, the AHU is using free and mechanical cooling
        if row.economizer_sig > row.ahu_min_oa and row.cooling_sig > 0:
            dataframe.at[index, 'econ mech_cooling'] = 1

        # OS4, the AHU is using mechanical cooling only
        if row.economizer_sig <= row.ahu_min_oa and row.cooling_sig > 0:
            dataframe.at[index, 'mech_cooling'] = 1

    return dataframe
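
To see the difference between the two kinds of assignment, here is a small sketch on a made-up three-row frame (the data is hypothetical; only the column names come from the question). Like the question's frame, it mixes float and int columns, so iterrows() hands back a copy of each row.

```python
import pandas as pd

# Made-up frame: one float signal column and one int flag column.
df = pd.DataFrame({"heating_sig": [0.0, 0.7, 0.2], "heating_mode": 0})

# Assigning to `row` only changes the temporary Series, not df.
for index, row in df.iterrows():
    if row.heating_sig > 0:
        row["heating_mode"] = 1
unchanged = df["heating_mode"].tolist()

# Assigning through df.at writes into the DataFrame itself.
for index, row in df.iterrows():
    if row.heating_sig > 0:
        df.at[index, "heating_mode"] = 1
updated = df["heating_mode"].tolist()

print(unchanged)
print(updated)
```

The first list stays all zeros, which matches what the question saw with value_counts(); the second has a 1 wherever heating_sig is positive.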