Apply function to data frame and make output a separate df (pandas)

I have a data frame

cat input.csv
dwelling,wall,weather,occ,height,temp
5,2,Ldn,Pen,154.7,23.4
5,4,Ldn,Pen,172.4,28.7
3,4,Ldn,Pen,183.5,21.2
3,4,Ldn,Pen,190.2,30.3

I'm trying to apply the following function to it:

import pandas as pd

input_df = pd.read_csv('input.csv')


def folder_column(row):
    if row['dwelling'] == 5 and row['wall'] == 2:
        return 'folder1'
    elif row['dwelling'] == 3 and row['wall'] == 4:
        return 'folder2'
    else:
        return 0

I want to run the function on the input dataset and store the output in a separate data frame using something like this:

temp_df = pd.DataFrame()
temp_df = input_df['archetype_folder'] = input_df.apply(folder_column, axis=1)

But when I do this, temp_df contains only the newly created 'archetype_folder' column, whereas I would like it to also have all the original columns from input_df. Can anyone help? Note that I don't want to add the new column 'archetype_folder' to the original input_df. I've also tried this:

temp_df = input_df

temp_df['archetype_folder'] = temp_df.apply(folder_column, axis=1)

But when I run the second command, both input_df and temp_df end up with the new column.

Any help is appreciated!

CodePudding user response：

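Use DataFrame.copy. A plain assignment (temp_df = input_df) only binds a second name to the same object, so a column added through either name shows up in both. Copy first, then add the column to the copy: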

temp_df = input_df.copy()

temp_df['archetype_folder'] = temp_df.apply(folder_column, axis=1)
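
To see why the plain assignment in the question changed both frames, here is a minimal sketch. The small inline frame and the names alias_df and copy_df are made up for illustration; they are not from the question.

```python
import pandas as pd

# Small stand-in for input_df (illustrative data, not the full CSV)
input_df = pd.DataFrame({"dwelling": [5, 3], "wall": [2, 4]})

alias_df = input_df        # second name for the SAME DataFrame object
copy_df = input_df.copy()  # separate object with its own data

# Adding a column to the copy leaves the original untouched
copy_df["archetype_folder"] = ["folder1", "folder2"]

print(alias_df is input_df)                     # True
print(copy_df is input_df)                      # False
print("archetype_folder" in input_df.columns)   # False
```

Had the column been added through alias_df instead, it would have appeared in input_df as well, which is the behaviour described in the question.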

CodePudding user response：

You need to create a copy of the original DataFrame and then assign the return values of your function to that copy. Consider the following simple example:

import pandas as pd

def is_odd(row):
    # True when the row's "value" entry is odd
    return row["value"] % 2 == 1

df1 = pd.DataFrame({"value": [1, 2, 3], "name": ["uno", "dos", "tres"]})
df2 = df1.copy()  # independent copy; df1 stays unchanged
df2["odd"] = df1.apply(is_odd, axis=1)
print(df1)
print("=====")
print(df2)

This gives the output:

   value  name
0      1   uno
1      2   dos
2      3  tres
=====
   value  name    odd
0      1   uno   True
1      2   dos  False
2      3  tres   True
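
As a possible alternative to copy-then-assign, DataFrame.assign returns a new DataFrame with the extra column and does not modify the frame it is called on. The sketch below reuses folder_column from the question and builds a few of the question's columns inline instead of reading input.csv; that inline frame is an assumption made to keep the example self-contained.

```python
import pandas as pd

def folder_column(row):
    # Same logic as the function in the question
    if row["dwelling"] == 5 and row["wall"] == 2:
        return "folder1"
    elif row["dwelling"] == 3 and row["wall"] == 4:
        return "folder2"
    return 0

# Subset of the question's data, built inline for illustration
input_df = pd.DataFrame({
    "dwelling": [5, 5, 3, 3],
    "wall": [2, 4, 4, 4],
    "height": [154.7, 172.4, 183.5, 190.2],
})

# assign() builds a new frame; input_df keeps its original columns
temp_df = input_df.assign(
    archetype_folder=input_df.apply(folder_column, axis=1)
)

print(temp_df)
print(list(input_df.columns))
```

This does the copy and the column assignment in one expression, which can be convenient in method chains.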

CodePudding user response：

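You don't need apply here. Boolean masks with .loc are vectorized and usually faster than a row-wise apply. Note that rows matching neither mask are left as NaN rather than the 0 returned by your function.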

temp_df = input_df.copy()
m1 = (input_df['dwelling'] == 5) & (input_df['wall'] == 2)
m2 = (input_df['dwelling'] == 3) & (input_df['wall'] == 4)
temp_df.loc[m1, 'archetype_folder'] = 'folder1'
temp_df.loc[m2, 'archetype_folder'] = 'folder2'
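
If the fallback value 0 from the original function is needed, one way to keep it with the mask approach is to start from an object-dtype Series filled with 0 and overwrite the matching rows. This is a sketch under the assumption that mixing 0 with strings in one column is really what is wanted; the inline frame and the name folder are illustrative.

```python
import pandas as pd

# Subset of the question's data, built inline for illustration
input_df = pd.DataFrame({
    "dwelling": [5, 5, 3, 3],
    "wall": [2, 4, 4, 4],
})

m1 = (input_df["dwelling"] == 5) & (input_df["wall"] == 2)
m2 = (input_df["dwelling"] == 3) & (input_df["wall"] == 4)

# Start with the fallback value; object dtype lets 0 and strings coexist
folder = pd.Series(0, index=input_df.index, dtype=object)
folder[m1] = "folder1"
folder[m2] = "folder2"

temp_df = input_df.copy()
temp_df["archetype_folder"] = folder

print(temp_df)
```

With the four rows from the question, the second row (dwelling 5, wall 4) matches neither mask and keeps the value 0.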