I have a given dataframe
new_df :
ID | summary | text_length |
---|---|---|
1 | xxx | 45 |
2 | aaa | 34 |
I am manipulating it by adding one column per keyword, with the keywords taken from a different dataframe (df), like this:
keywords = df["keyword"].to_list()
for key in keywords:
    new_df[key] = new_df["summary"].str.lower().str.count(key)
new_df
From here I need two separate dataframes so that I can perform a few actions on each of them (add some columns, do some calculations, etc.).
I need a dataframe with occurrences, as produced by the code above, and a binary dataframe.
WHAT I DID:
I assigned the dataframe for occurrences:
df_freq = new_df
(because it is already calculated and done). Then I created another dataframe, the binary one, on top of new_df:
#select only numeric columns to change them to binary
numeric_cols = new_df.select_dtypes("number", exclude='float64').columns.tolist()
new_df_binary = new_df
new_df_binary['text_length'] = new_df_binary['text_length'].astype(int)
new_df_binary[numeric_cols] = (new_df_binary[numeric_cols] > 0).astype(int)
Everything works fine and I can perform the math I need, but when I come back to df_freq, it is no longer the dataframe with occurrences. It looks like it changed along with the binary one.
I need separate tables so that I can perform separate math on them. Do you know how I can avoid this overwriting issue?
CodePudding user response:
You can use pandas' copy method with the deep argument set to True:
df_freq = new_df.copy(deep=True)
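Setting deep=True (which is the default) ensures that modifications to the data or index of the copy do not affect the original dataframe, and vice versa. By contrast, a plain assignment such as df_freq = new_df does not create a new dataframe; it only gives the same object a second name, which is why the changes made through new_df_binary also showed up in df_freq.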
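To make the difference concrete, here is a minimal sketch of how the workflow from the question might look with independent copies. The sample rows and the keyword list are made up for illustration, and only the keyword columns are converted to binary here, which is an assumption about what you want to keep as-is.

```python
import pandas as pd

# Toy data standing in for new_df; the values are invented for this example.
new_df = pd.DataFrame({
    "ID": [1, 2],
    "summary": ["Red apple and a green apple", "No fruit here"],
    "text_length": [27, 13],
})
keywords = ["apple", "fruit"]  # hypothetical stand-in for df["keyword"].to_list()

# Count keyword occurrences, as in the question.
for key in keywords:
    new_df[key] = new_df["summary"].str.lower().str.count(key)

# Plain assignment only binds a second name to the same object.
alias = new_df
print(alias is new_df)  # True

# copy() returns independent dataframes (deep=True is the default).
df_freq = new_df.copy()
df_binary = new_df.copy()

# Turn only the keyword columns into 0/1 flags on the binary copy.
df_binary[keywords] = (df_binary[keywords] > 0).astype(int)

print(df_freq["apple"].tolist())    # [2, 0]
print(df_binary["apple"].tolist())  # [1, 0]
```

Because df_freq and df_binary are separate copies, further columns or calculations added to one of them should not leak into the other or into new_df.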