Overwriting dataframes in pandas

I have a given dataframe

new_df :

ID	summary	text_length
1	xxx	45
2	aaa	34

I am manipulating this dataframe by adding one column per keyword, with the keywords taken from a different dataframe (df), like this:

keywords = df["keyword"].to_list()
for key in keywords:
    new_df[key] = new_df["summary"].str.lower().str.count(key)
new_df

From here I need two separate dataframes so that I can perform a few actions on each of them (add some columns, do some calculations, etc.).

I need a dataframe with occurrence counts, as produced by the code above, and a binary dataframe.

WHAT I DID:

I assigned the dataframe for occurrences: df_freq = new_df (because it is already calculated and done).
I created another dataframe, the binary one, on top of new_df:

#select only numeric columns to change them to binary

numeric_cols = new_df.select_dtypes("number", exclude='float64').columns.tolist()

new_df_binary = new_df

new_df_binary['text_length'] = new_df_binary['text_length'].astype(int)

new_df_binary[numeric_cols] = (new_df_binary[numeric_cols] > 0).astype(int)

Everything works fine and I can perform the math I need, but when I come back to df_freq, it is no longer the dataframe with occurrences. It looks like it changed along with the binary one.

I need separate tables so that I can perform separate math on them. Do you know how I can avoid this overwriting issue?

CodePudding user response：

You may use pandas' copy method with the deep argument set to True:

df_freq = new_df.copy(deep=True)

Setting deep=True (which is the default value) ensures that modifications to the data or index of the copy do not affect the original dataframe, and vice versa.
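
To see why the original code behaves this way: a plain assignment such as df_freq = new_df does not create a new dataframe, it only gives the same object a second name, so any in-place change made through one name is visible through the other. The same applies to new_df_binary = new_df, so that one probably wants a copy as well. Here is a minimal sketch of the difference; the data and the keyword column "foo" are made up for illustration and are not taken from the question.

```python
import pandas as pd

# Hypothetical stand-in for the question's new_df (values are invented).
new_df = pd.DataFrame({
    "ID": [1, 2],
    "summary": ["foo bar foo", "baz"],
    "foo": [2, 0],  # occurrence counts for the made-up keyword "foo"
})

alias = new_df           # plain assignment: a second name for the same object
df_freq = new_df.copy()  # independent copy (deep=True is the default)

# Build the binary version on its own copy so the counts stay untouched.
new_df_binary = new_df.copy()
new_df_binary["foo"] = (new_df_binary["foo"] > 0).astype(int)

print(alias is new_df)                # True: same object
print(df_freq is new_df)              # False: separate object
print(df_freq["foo"].tolist())        # [2, 0]
print(new_df_binary["foo"].tolist())  # [1, 0]
```

With both df_freq and new_df_binary created through copy(), each table can get its own extra columns and calculations without touching the other.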