How to subtract X rows in a dataframe with first value from another dataframe?-CodePudding

I am using pandas for this work.

I have a 2 datasets. The first dataset has approximately 6 million rows and 6 columns. For example the first data set looks something like this:

Date	Time	U	V	W	T
2020-12-30	2:34	3	4	5	7
2020-12-30	2:35	2	3	6	5
2020-12-30	2:36	1	5	8	5
2020-12-30	2:37	2	3	0	8
2020-12-30	2:38	4	4	5	7
2020-12-30	2:39	5	6	5	9

this is just the raw data collected from the machine.

The second is the average values of three rows at a time from each column (U,V,W,T).

U	V	W	T
2	4	6.33	5.67
3.66	4.33	3.33	8

What I am trying to do is calculate the perturbation for each column per second.

U(perturbation)=U(raw)-U(avg)

U(raw)= dataset 1

U(avg)= dataset 2

Basically take the first three rows from the first column of the first dataset and individually subtract them by the first value in the first column of the second dataset, then take the next three values from the first column of the first data set and individually subtract them by second value in the first column of the second dataset. Do the same for all three columns.

The desired final output should be as the following:

Date	Time	U	V	W	T
2020-12-30	2:34	1	0	-1.33	1.33
2020-12-30	2:35	0	-1	-0.33	-0.67
2020-12-30	2:36	-1	1	1.67	-0.67
2020-12-30	2:37	-1.66	-1.33	-3.33	0
2020-12-30	2:38	0.34	-0.33	1.67	-1
2020-12-30	2:39	1.34	1.67	1.67	1

I am new to pandas and do not know how to approach this. I hope it makes sense.

CodePudding user response：

a = df1.assign(index = df1.index // 3).merge(df2.reset_index(), on='index')
b = a.filter(regex = '_x', axis=1) - a.filter(regex = '_y', axis = 1).to_numpy()
pd.concat([a.filter(regex='^[^_] $', axis = 1), b], axis = 1)

         Date  Time  index   U_x   V_x   W_x   T_x
0  2020-12-30  2:34      0  0.00  0.00 -1.33  1.33
1  2020-12-30  2:35      0 -1.00 -1.00 -0.33 -0.67
2  2020-12-30  2:36      0 -2.00  1.00  1.67 -0.67
3  2020-12-30  2:37      1 -1.66 -1.33 -3.33  0.00
4  2020-12-30  2:38      1  0.34 -0.33  1.67 -1.00
5  2020-12-30  2:39      1  1.34  1.67  1.67  1.00

CodePudding user response：

You can use numpy:

import numpy as np

df1[df2.columns] -= np.repeat(df2.to_numpy(), 3, axis=0)

NB. This modifies df1 in place, if you want you can make a copy first (df_final = df1.copy()) and apply the subtraction on this copy.