I am writing code to read data from Google Sheets using the gspread module.
First I read the spreadsheet and store the values in a variable called df. Afterwards, I create a variable called df2 from df to apply some transformations (string to numeric) while keeping df, the original data, intact. However, the transformations made in df2 are carried over to df as well. It should not behave like that; the changes should occur only in df2.
Does anyone know why this is happening?
Please see the code below:
import gspread
import pandas as pd
sa = gspread.service_account(filename = "keys.json")
sheet = sa.open("chupacabra")
worksheet = sheet.worksheet("vaca_loca")
df = pd.DataFrame(worksheet.get("B2:I101"))
df
[df loaded](https://i.stack.imgur.com/lV3GJ.png)
df2 = df
df2["Taxa"] = df2["Taxa"].str.replace(",",".")
df2["Taxa"] = df2["Taxa"].str.replace("%","")
df2["Taxa"] = pd.to_numeric(df2["Taxa"])
df2["Taxa"] = df2["Taxa"]/100
df2
[df2 after string transformation](https://i.stack.imgur.com/cFWOg.png)
df
[df carrying the transformation changes made in df2](https://i.stack.imgur.com/KsSsa.png)
I was trying to apply the transformation only to df2, while df should remain intact.
CodePudding user response:
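In your script, I think the reason for your issue is that df2 = df does not create a new DataFrame. It only binds a second name to the same object, so every change made through df2 is also visible through df. If my understanding is correct, how about the following modification?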
From:
df2 = df
To:
df2 = df.copy()
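- With this modification, df2 is an independent copy of df (DataFrame.copy() makes a deep copy by default), so the transformations applied to df2 no longer change df.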
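The difference between plain assignment and copy() can be checked without a spreadsheet. The following is a minimal sketch, assuming a small hand-made DataFrame in place of the real sheet data; the sample values and the name alias are made up for illustration, while the "Taxa" column name comes from the question.

```python
import pandas as pd

# Hypothetical stand-in for the data returned by worksheet.get(...)
df = pd.DataFrame({"Taxa": ["1,5%", "2,25%", "10%"]})

alias = df        # plain assignment: a second name for the same object
df2 = df.copy()   # independent copy (deep=True is the default)

# Same transformation as in the question, applied to the copy only
df2["Taxa"] = df2["Taxa"].str.replace(",", ".", regex=False)
df2["Taxa"] = df2["Taxa"].str.replace("%", "", regex=False)
df2["Taxa"] = pd.to_numeric(df2["Taxa"]) / 100

print(alias is df)   # True: both names refer to one DataFrame
print(df2 is df)     # False: df2 is a separate object
print(df)            # still holds the original strings
print(df2)           # holds the numeric values
```

If alias had been transformed instead of df2, the printed df would show the numeric values too, which is the behavior described in the question.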