I have a data frame:
cat input.csv
dwelling,wall,weather,occ,height,temp
5,2,Ldn,Pen,154.7,23.4
5,4,Ldn,Pen,172.4,28.7
3,4,Ldn,Pen,183.5,21.2
3,4,Ldn,Pen,190.2,30.3
To which I'm trying to apply the following function:
import pandas as pd

input_df = pd.read_csv('input.csv')

def folder_column(row):
    # Map each (dwelling, wall) combination to its folder name
    if row['dwelling'] == 5 and row['wall'] == 2:
        return 'folder1'
    elif row['dwelling'] == 3 and row['wall'] == 4:
        return 'folder2'
    else:
        return 0
I want to run the function on the input dataset and store the output in a separate data frame using something like this:
temp_df = pd.DataFrame()
temp_df = input_df['archetype_folder'] = input_df.apply(folder_column, axis=1)
But when I do this, temp_df contains only the newly created 'archetype_folder' column, whereas I would like it to contain all the original columns from input_df as well. Can anyone help? Note that I don't want to add the new column 'archetype_folder' to the original input_df. I've also tried this:
temp_df = input_df
temp_df['archetype_folder'] = temp_df.apply(folder_column, axis=1)
But when I run the second command, both input_df and temp_df end up with the new column.
Any help is appreciated!
CodePudding user response:
Use DataFrame.copy:
temp_df = input_df.copy()
temp_df['archetype_folder'] = temp_df.apply(folder_column, axis=1)
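The second attempt in the question changed both frames because `temp_df = input_df` only binds a second name to the same object; no data is copied. Below is a minimal sketch of the difference. It uses a small inline frame in place of input.csv (an assumption, so that the snippet runs on its own) and also shows `DataFrame.assign` as a one-step alternative that returns a new frame:

```python
import pandas as pd

# Inline stand-in for input.csv (first two columns only), assumed for illustration
input_df = pd.DataFrame({"dwelling": [5, 5, 3, 3], "wall": [2, 4, 4, 4]})

def folder_column(row):
    if row["dwelling"] == 5 and row["wall"] == 2:
        return "folder1"
    elif row["dwelling"] == 3 and row["wall"] == 4:
        return "folder2"
    else:
        return 0

alias = input_df            # same object under a second name, nothing is copied
copied = input_df.copy()    # independent DataFrame

# Adding the column to the copy leaves input_df unchanged
copied["archetype_folder"] = copied.apply(folder_column, axis=1)

# assign() returns a new DataFrame and also leaves input_df unchanged
assigned = input_df.assign(archetype_folder=input_df.apply(folder_column, axis=1))

print(alias is input_df)                        # True
print("archetype_folder" in input_df.columns)   # False
print(assigned["archetype_folder"].tolist())    # ['folder1', 0, 'folder2', 'folder2']
```

Either `copy()` followed by a column assignment or a single `assign()` call should give a frame with all the original columns plus the new one.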
CodePudding user response:
You need to create a copy of the original DataFrame, then assign the return values of your function to it. Consider the following simple example:
import pandas as pd
def is_odd(row):
    # True when the row's 'value' entry is odd
    return row["value"] % 2 == 1
df1 = pd.DataFrame({"value":[1,2,3],"name":["uno","dos","tres"]})
df2 = df1.copy()
df2["odd"] = df1.apply(is_odd,axis=1)
print(df1)
print("=====")
print(df2)
gives output
   value  name
0      1   uno
1      2   dos
2      3  tres
=====
   value  name    odd
0      1   uno   True
1      2   dos  False
2      3  tres   True
CodePudding user response:
You don't need apply. Use .loc to be more efficient.
temp_df = input_df.copy()
m1 = (input_df['dwelling'] == 5) & (input_df['wall'] == 2)
m2 = (input_df['dwelling'] == 3) & (input_df['wall'] == 4)
temp_df.loc[m1, 'archetype_folder'] = 'folder1'
temp_df.loc[m2, 'archetype_folder'] = 'folder2'
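One difference from the original function is worth noting: rows that match neither mask are left as missing values (NaN) here, whereas folder_column returns 0 for them. If that default matters, one option is to create the column with the default first and then overwrite the matching rows. This is a sketch, with an inline frame assumed in place of input.csv:

```python
import pandas as pd

# Inline stand-in for input.csv (first two columns only), assumed for illustration
input_df = pd.DataFrame({"dwelling": [5, 5, 3, 3], "wall": [2, 4, 4, 4]})

temp_df = input_df.copy()
m1 = (input_df["dwelling"] == 5) & (input_df["wall"] == 2)
m2 = (input_df["dwelling"] == 3) & (input_df["wall"] == 4)

# Start with the default 0 in an object column so strings and ints can coexist
temp_df["archetype_folder"] = pd.Series(0, index=temp_df.index, dtype=object)
temp_df.loc[m1, "archetype_folder"] = "folder1"
temp_df.loc[m2, "archetype_folder"] = "folder2"

print(temp_df["archetype_folder"].tolist())  # ['folder1', 0, 'folder2', 'folder2']
```

The object dtype is a deliberate choice here: mixing the integer 0 with folder names in one column rules out a numeric or pure string dtype.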