Excel Table--pandas read_excel
Hi,
I am trying to calculate the difference between column C (unnamed: 2) and column D (unnamed: 3) and put the result in column E. I have attached an Excel screenshot and a screenshot of the pandas read_excel output.
Here is my code:
import pandas
# load excel file
df1 = pandas.read_excel("Trial.xlsx")
# column C minus column D, and store result in column E
df1["total"] = df1["unnamed: 2"] - df1["unnamed: 3"]
print(df1)
The code does not work because columns C and D contain strings in cells C4 and D4. Is there any way to skip row 4, or to do the calculation starting from row 5 (index 4)?
CodePudding user response:
First, I would use the skiprows parameter of the read_excel method (see the documentation):
df1 = pandas.read_excel("Trial.xlsx", skiprows=4)
You can also use pandas.to_numeric with the parameter errors='coerce' to turn all non-numeric values into NaN (see the documentation). If you use the above code, pandas should pick up the header names, and you can use the following:
df1["total"] = (
    pandas.to_numeric(df1["Debit"], errors='coerce')
    - pandas.to_numeric(df1["Credit"], errors='coerce')
)
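To see what errors='coerce' does without needing the workbook, here is a minimal sketch on a made-up DataFrame. The values are invented for illustration, and the column names are simply copied from the question; the real sheet may look different.

```python
import pandas as pd

# Made-up stand-in for the sheet as read without skiprows:
# a text row ("Debit"/"Credit") sits among the numeric cells.
df = pd.DataFrame({
    "unnamed: 2": ["Debit", 100, 250.5, None],
    "unnamed: 3": ["Credit", 40, 50.5, 10],
})

# Non-numeric cells (and missing ones) become NaN instead of raising.
debit = pd.to_numeric(df["unnamed: 2"], errors="coerce")
credit = pd.to_numeric(df["unnamed: 3"], errors="coerce")

# NaN propagates through the subtraction, so the text row gets NaN in "total".
df["total"] = debit - credit
print(df)
```

Rows where either side is not a number end up as NaN in total, so they can be dropped afterwards with dropna(subset=["total"]) if they are not wanted.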
CodePudding user response:
One way to accomplish this is to strip the non-numeric characters and convert the values in each column to float:
import numpy as np
# astype(str) comes first because .str.replace returns NaN for cells that are not strings
df1["unnamed: 2"] = df1["unnamed: 2"].astype(str).str.replace(r'[^\d.]+', '', regex=True).replace('', np.nan).astype(float)
df1["unnamed: 3"] = df1["unnamed: 3"].astype(str).str.replace(r'[^\d.]+', '', regex=True).replace('', np.nan).astype(float)
df1["total"] = df1["unnamed: 2"] - df1["unnamed: 3"]
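As a rough illustration of the cleaning step, here is a small sketch on an invented Series (the values are assumptions, not taken from the actual file). It shows each stage of the chain on mixed content.

```python
import numpy as np
import pandas as pd

# Invented column mixing a header string, a number, formatted text, and a gap.
s = pd.Series(["Debit", 100, "1,250.50", None])

cleaned = (
    s.astype(str)                              # numbers and None become strings too
     .str.replace(r"[^\d.]+", "", regex=True)  # drop everything except digits and dots
     .replace("", np.nan)                      # cells with nothing left become NaN
     .astype(float)
)
print(cleaned.tolist())
```

Note that this pattern also removes minus signs and exponent markers, so it is only suitable when the column holds plain non-negative amounts; pandas.to_numeric with errors='coerce' is the safer choice otherwise.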
If you post the Excel file as a download, I can run the solution and provide the result.