I am trying to manipulate excel sheet data to automate a process on excel(not a developer) in order to delete the value of the last row from the first then the value of the last -1 from the second and so on, my data is similar to the below
Code Col1 Col2
0 A 1.7653 56.2
1 B 1 Nan
2 C Nan 5
3 D 34.4 0
and i have to deduct the last last column from the first, then then last -1 from the second and so on until i meet them in the middle(assuming that we'll only be having even numbers of rows), i already solved the issue of getting rid of columns having strings so my output pandas looks like this
Col1 Col2
0 1.7653 56.2
1 1 Nan
2 Nan 5
3 34.4 0
now i need to deduct the values so the new panda frame to be created will look like this: the values below are found after the deductions
Col1 Col2
0 -32.2347 56.2
1 1 -5
I was able to delete it per 1 value but not iteratively no matter how many rows i have and create a pandas half the rows of the first with the same columns as output
Nan will be treated as 0 and the actual dataset has hundreds of columns and rows that can change
code:
import pandas as pd
import datetime
# Create a dataframe
df = pd.read_excel(r'file.xls', sheet_name='sheet1')
for col, dt in df.dtypes.items():
if dt == object:
df = df.drop(col, 1)
i=0
for col in df.dtypes.items():
while i < len(df)/2:
df[i] = df[i] - df[len(df) - i]
i
CodePudding user response:
An approach could be the following:
import pandas as pd
import numpy as np
df = pd.DataFrame([["A", 1.7653, 56.2], ["B", 1, np.nan], ["C", np.nan, 5], ["D", 34.4, 0]], columns=["Code", "Col1", "Col2"], )
del df["Code"]
df.fillna(0, inplace=True)
s = df.shape[0] // 2
differences = pd.DataFrame([df.iloc[i] - df.iloc[df.shape[0]-i-1] for i in range(s)])
print(differences)
OUTPUT
Col1 Col2
0 -32.6347 56.2
1 1.0000 -5.0
FOLLOW UP
Reading the comments, I understand that the subtraction logic you want to apply is the following:
- Normal subtraction if both numbers are not
nan
- If one of the numbers is
nan
, then swap thenan
with 0 - If both numbers are
nan
, a `nan is returned
I don't know if there is a function which works like that out of the box, hence I have created a custom_sub
.
In avoidance of doubt, this is the file I am using
grid.txt
,Code,Col1,Col2
0,A,1.7653,56.2
1,B,1,
2,C,,5
3,D,34.4,0
The code:
import pandas as pd
import numpy as np
df = pd.read_csv("grid.txt", sep=",",index_col=[0])
del df["Code"]
def custom_sub(x1, x2):
if np.isnan(x1) or np.isnan(x2):
if np.isnan(x1) and np.isnan(x2):
return np.nan
else:
return -x2 if np.isnan(x1) else x1
else:
return x1 - x2
s = df.shape[0] // 2
differences = pd.DataFrame([df.iloc[i].combine(df.iloc[df.shape[0]-i-1], custom_sub) for i in range(s)])
print(differences)