I'm trying to subtract rows, following some logic, in a DataFrame like this:
I want to subtract the second row from the first row in the 'Qtd' column. The result would then look like this:
I looked for a pandas function like diff(), but my DataFrame has non-numeric columns and the first row always comes out as NaN. Does anyone have a tip?
Answer:
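The diff() method calculates the difference between consecutive rows of a pandas Series or DataFrame. Calling it on the Qtd column alone avoids the non-numeric columns entirely, and diff(-1) subtracts the next row instead of the previous one, so the NaN ends up in the last row rather than the first.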
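A minimal sketch of where the NaN lands depending on the direction of diff(), assuming the same two-row sample data used in the example further down (only the columns needed here are included):

```python
import pandas as pd

# Sample data from the question (subset of columns); X and Y are strings
df = pd.DataFrame({"X": ["Gold", "Gold"], "Y": ["Dirty", "Clean"],
                   "ID": [1, 1], "Qtd": [11, 10]})

# diff() on the numeric column only, so the string columns are never involved
forward = df["Qtd"].diff()     # row i minus row i-1, so the first row is NaN
backward = df["Qtd"].diff(-1)  # row i minus row i+1, so the last row is NaN

print(forward.tolist())   # [nan, -1.0]
print(backward.tolist())  # [1.0, nan]
```

With the default periods=1 you get the "first row is NaN" behaviour described in the question; periods=-1 gives 11 - 10 = 1 in the first row, which is the value asked for.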
To handle the remaining NaN, chain .fillna(df['Qtd']), which replaces each NaN with the Qtd value from the same row. Because the NaN forces diff() to return floats, append .astype(int) to the end of the expression if you want integers, as in the expected output.
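The chained expression can be taken apart step by step. This is only an illustrative sketch with a one-column DataFrame holding the Qtd values from the example; the intermediate variable names are made up for readability:

```python
import pandas as pd

df = pd.DataFrame({"Qtd": [11, 10]})

step1 = df["Qtd"].diff(-1)       # [1.0, NaN]; dtype becomes float64 because of the NaN
step2 = step1.fillna(df["Qtd"])  # NaN replaced by the Qtd value in the same row: [1.0, 10.0]
step3 = step2.astype(int)        # cast back to integers: [1, 10]

print(step1.dtype, step2.tolist(), step3.tolist())
```

The order matters: casting to int while a NaN is still present raises an error, so fillna() has to come before astype(int).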
Example:
import pandas as pd
df = pd.DataFrame({"X": ["Gold", "Gold"], "Y": ["Dirty", "Clean"],
                   "ID": [1, 1], "Qtd": [11, 10],
                   "Day": [11, 11], "Month": [8, 8],
                   "Year": [2021, 2021]})
# Row i minus row i+1; the last row (NaN after diff) keeps its original Qtd value
df["Qtd"] = df["Qtd"].diff(-1).fillna(df["Qtd"]).astype(int)
print(df)
Output:
      X      Y  ID  Qtd  Day  Month  Year
0  Gold  Dirty   1    1   11      8  2021
1  Gold  Clean   1   10   11      8  2021
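If you prefer to spell out the subtraction, s.diff(-1) computes the same values as s - s.shift(-1). The sketch below wraps that in a small helper; the name subtract_next_row is hypothetical and not part of pandas, and it assumes the column holds whole numbers so the final astype(int) is lossless:

```python
import pandas as pd

def subtract_next_row(df: pd.DataFrame, col: str) -> pd.Series:
    """Hypothetical helper: row i minus row i+1 in `col`; the last row keeps its own value."""
    s = df[col]
    # Explicit form of s.diff(-1): subtract the value shifted up by one row
    result = s - s.shift(-1)
    # Fill the trailing NaN with the original value, then cast back to int
    return result.fillna(s).astype(int)

df = pd.DataFrame({"X": ["Gold", "Gold"], "Y": ["Dirty", "Clean"],
                   "ID": [1, 1], "Qtd": [11, 10],
                   "Day": [11, 11], "Month": [8, 8], "Year": [2021, 2021]})
df["Qtd"] = subtract_next_row(df, "Qtd")
print(df["Qtd"].tolist())  # [1, 10]
```

Both forms produce the same result on this data; the shift() version simply makes the "current minus next" direction explicit.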