Home > Blockchain >  How to deduct the values from pandas (same column)?
How to deduct the values from pandas (same column)?

Time:12-18

I am trying to manipulate excel sheet data to automate a process on excel(not a developer) in order to delete the value of the last row from the first then the value of the last -1 from the second and so on, my data is similar to the below

    Code  Col1    Col2
0    A    1.7653  56.2 
1    B    1       Nan
2    C    Nan     5
3    D    34.4    0

and i have to deduct the last last column from the first, then then last -1 from the second and so on until i meet them in the middle(assuming that we'll only be having even numbers of rows), i already solved the issue of getting rid of columns having strings so my output pandas looks like this

       Col1    Col2
0      1.7653  56.2 
1      1       Nan
2      Nan     5
3      34.4    0 

now i need to deduct the values so the new panda frame to be created will look like this: the values below are found after the deductions

       Col1    Col2
0      -32.2347 56.2 
1      1       -5

I was able to delete it per 1 value but not iteratively no matter how many rows i have and create a pandas half the rows of the first with the same columns as output

Nan will be treated as 0 and the actual dataset has hundreds of columns and rows that can change

code:

import pandas as pd
import datetime

# Create a dataframe
df = pd.read_excel(r'file.xls', sheet_name='sheet1')

for col, dt in df.dtypes.items():
    if dt == object:
        df = df.drop(col, 1)

i=0

for col in df.dtypes.items():
    while i < len(df)/2:
        df[i] = df[i] - df[len(df) - i]
        i  

CodePudding user response:

An approach could be the following:

import pandas as pd
import numpy as np

df = pd.DataFrame([["A", 1.7653,  56.2], ["B", 1, np.nan], ["C", np.nan, 5], ["D", 34.4, 0]], columns=["Code", "Col1", "Col2"], )

del df["Code"]
df.fillna(0, inplace=True)

s = df.shape[0] // 2
differences = pd.DataFrame([df.iloc[i] - df.iloc[df.shape[0]-i-1] for i in range(s)])

print(differences)

OUTPUT

Col1  Col2
0 -32.6347  56.2
1   1.0000  -5.0

FOLLOW UP

Reading the comments, I understand that the subtraction logic you want to apply is the following:

  • Normal subtraction if both numbers are not nan
  • If one of the numbers is nan, then swap the nan with 0
  • If both numbers are nan, a `nan is returned

I don't know if there is a function which works like that out of the box, hence I have created a custom_sub. In avoidance of doubt, this is the file I am using

grid.txt

,Code,Col1,Col2
0,A,1.7653,56.2
1,B,1,
2,C,,5
3,D,34.4,0

The code:

import pandas as pd
import numpy as np

df = pd.read_csv("grid.txt", sep=",",index_col=[0])  

del df["Code"]

def custom_sub(x1, x2):
    if np.isnan(x1) or np.isnan(x2):
       if np.isnan(x1) and np.isnan(x2):
          return np.nan
       else:
          return -x2 if np.isnan(x1) else x1
    else:
       return x1 - x2


s = df.shape[0] // 2

differences = pd.DataFrame([df.iloc[i].combine(df.iloc[df.shape[0]-i-1], custom_sub) for i in range(s)])

print(differences)
  • Related