overwriting dataframes in pandas

Time:12-02

I have a dataframe, new_df:

ID  summary  text_length
1   xxx      45
2   aaa      34

I add one column per keyword, with the keywords taken from a different dataframe (df), counting how often each keyword occurs in the summary:

keywords = df["keyword"].to_list()
for key in keywords:
    new_df[key] = new_df["summary"].str.lower().str.count(key)
new_df

From here I need two separate dataframes so that I can work on each of them independently (add some columns, do some calculations, etc.).

One should hold the occurrence counts produced by the code above, and the other should be a binary (0/1) version of it.

WHAT I DID:

  1. I assigned the dataframe for occurrences: df_freq = new_df (because it is already calculated and done)

  2. I created another dataframe, the binary one, on top of new_df:

    #select only numeric columns to change them to binary

    numeric_cols = new_df.select_dtypes("number", exclude='float64').columns.tolist()

    new_df_binary = new_df

    new_df_binary['text_length'] = new_df_binary['text_length'].astype(int)

    new_df_binary[numeric_cols] = (new_df_binary[numeric_cols] > 0).astype(int)

  3. Everything works fine and I can perform the math I need, but when I go back to df_freq, it no longer holds the occurrence counts. It looks like it was changed along with the binary dataframe.

I need separate tables so that I can perform separate math on them. Do you know how I can avoid this overwriting issue?

CodePudding user response:

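Plain assignment (df_freq = new_df, new_df_binary = new_df) does not create a new dataframe. It only gives the same object another name, so a change made through one name shows up under all of them. To get an independent dataframe, use pandas' copy method with the deep argument set to True: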

df_freq = new_df.copy(deep=True)

Setting deep=True (which is the default value) ensures that modifications to the data or indices of the copy do not affect the original dataframe, and vice versa.
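
To illustrate the difference between assignment and copy(), here is a minimal sketch. The sample rows and the keyword list are made up for the example (the question does not show them), and only the keyword columns are converted to binary here, whereas the question converts every non-float numeric column.

```python
import pandas as pd

# Hypothetical sample data standing in for the asker's new_df and keywords.
new_df = pd.DataFrame({
    "ID": [1, 2],
    "summary": ["Foo bar foo", "bar baz"],
    "text_length": [11, 7],
})
keywords = ["foo", "bar"]

# Same counting loop as in the question.
for key in keywords:
    new_df[key] = new_df["summary"].str.lower().str.count(key)

# Plain assignment only binds a second name to the same object.
alias = new_df

# copy() creates independent dataframes (deep=True is the default).
df_freq = new_df.copy(deep=True)
new_df_binary = new_df.copy(deep=True)

# Convert only the keyword columns of the binary copy to 0/1.
new_df_binary[keywords] = (new_df_binary[keywords] > 0).astype(int)

print(alias is new_df)                 # True: same object
print(df_freq is new_df)               # False: separate object
print(df_freq["foo"].tolist())         # [2, 0]: counts are preserved
print(new_df_binary["foo"].tolist())   # [1, 0]: only the copy is binary
```

With both dataframes created through copy(), later column additions or calculations on new_df_binary leave df_freq and new_df untouched.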
