I tried to define a function to process a DataFrame (adding columns and converting all column headers to lower case) before doing the analysis. Every line works fine except the one that rearranges the column order.
The function looks like this:
import numpy as np
import pandas as pd

def cleanDf(df):
    # Normalise column names: spaces to underscores, then lower case
    df.columns = df.columns.str.replace(' ', '_')
    df.columns = df.columns.str.lower()
    # Combine the date and time columns into a single datetime column
    df['date1'] = pd.to_datetime(df['date'].astype(str) + ' ' + df['time'].astype(str))
    df['weekday'] = df['date1'].dt.day_name()
    business_hour_mask = (df['date1'].dt.hour >= 9) & (df['date1'].dt.hour <= 18)
    df['business_hour'] = np.where(business_hour_mask, "Yes", "No")
    # Series.dt.week was removed in pandas 2.0; isocalendar().week gives the ISO week number
    df['week_number'] = df['date1'].dt.isocalendar().week
    df = df.reindex(['date1','week_number','weekday','business_hour','changed_by','customer','field_name','new_value','old_value','new_value.1','old_value.1','date','time','company_code','sales_organization','distribution_channel','division'], axis=1)
    # Problem line: I've tried both with and without "df = " in front of this line
    return df
My current workaround is to run that line after calling the function, and then it works:
cleanDf(df)
df = df.reindex(['date1','week_number','weekday','business_hour','changed_by','customer','field_name','new_value','old_value','new_value.1','old_value.1','date','time','company_code','sales_organization','distribution_channel','division'], axis=1)
df.head()
I would appreciate it if you could advise why the line does not work inside the function but works when executed separately.
Thank you very much.
CodePudding user response:
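It's because you're reassigning the df variable inside the function, where it's just a local name for the parameter. reindex returns a new DataFrame, so df = df.reindex(...) rebinds only the local name, and the caller's df still refers to the original object. The earlier lines appear to work because they modify that original object in place. Since you're returning df though, the fix is simple. Just write df = cleanDf(df) instead of just cleanDf(df):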
df = cleanDf(df)
df.head()
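To see the difference between modifying the object in place and rebinding the local name, here is a minimal sketch on a toy DataFrame. The function name clean_and_reorder and the columns A, B and total are made up for illustration and are not from your data:

```python
import pandas as pd

def clean_and_reorder(df):
    # In-place changes: these modify the object the caller also holds.
    df.columns = df.columns.str.lower()
    df["total"] = df["a"] + df["b"]
    # Rebinding: reindex returns a NEW DataFrame. The local name df now
    # points at it, but the caller's variable is not affected.
    df = df.reindex(["total", "b", "a"], axis=1)
    return df

raw = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
result = clean_and_reorder(raw)

print(list(raw.columns))     # ['a', 'b', 'total']  (renamed and extended, not reordered)
print(list(result.columns))  # ['total', 'b', 'a']  (reordered)
```

The caller only gets the reordered columns through the return value, which is why the result has to be assigned back.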
Per @mozway's comment, you should also define your cleanDf function like so, so that it works on a copy and does not modify the caller's DataFrame as a side effect:
def cleanDf(df):
    df = df.copy()
    # ... do your stuff ...
    return df
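As a rough illustration of what the copy buys you, the sketch below uses the same kind of toy data (the name clean_copy and the columns A, B and total are hypothetical). The original DataFrame stays exactly as it was, and only the returned one is cleaned:

```python
import pandas as pd

def clean_copy(df):
    # Work on a private copy so the caller's DataFrame is left alone.
    df = df.copy()
    df.columns = df.columns.str.lower()
    df["total"] = df["a"] + df["b"]
    return df.reindex(["total", "a", "b"], axis=1)

raw = pd.DataFrame({"A": [1, 2], "B": [3, 4]})
cleaned = clean_copy(raw)

print(list(raw.columns))      # ['A', 'B']
print(list(cleaned.columns))  # ['total', 'a', 'b']
```

This also means calling the function twice on the same input gives the same result, which is handy when re-running notebook cells.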