How to iterate two or more columns and perform analysis in pandas?

Time:10-12

I have two dataframes: one with 2 columns and 11 rows, and another with 2 columns and 2 rows.

print(df)

Output is :

    C1  C2
0   1   1
1   2   2
2   3   3
3   4   4
4   5   5
5   6   6
6   7   7
7   9   9
8   11  13
9   10  11
10  12  11

The second dataframe is:

print(df1)

Output is :

    Mean Dev
0   2    0.5
1   1    1.0

I'm trying to subtract the Mean value in the first row of df1 from every value in column C1 of df, and then divide the result by the Dev value in that same row. Below is the code:

for i in range(0, len(df)):
    print((df['C1'][i] - df1['Mean'][0]) / (df1['Dev'][0]))

Output is :

-2.0
0.0
2.0
4.0
6.0
8.0
10.0
14.0
18.0
16.0
20.0

My question is how to perform the subtraction and division for every column of df, using the matching row of the Mean and Dev columns. For example, I'm trying to write code like this:

for i in range(0, len(df)):
    print((df['C2'][i] - df1['Mean'][1]) / (df1['Dev'][1]))

Followed by

for i in range(0, len(df)):
    print((df['C3'][i] - df1['Mean'][2]) / (df1['Dev'][2]))

Followed by

for i in range(0, len(df)):
    print((df['C4'][i] - df1['Mean'][3]) / (df1['Dev'][3]))

In the code above, we are looping over the values of df. How can I loop over the values of df1 as well?

Can anyone help me with this?

CodePudding user response:

You can accomplish this without for loops by taking advantage of elementwise arithmetic, as follows:

import pandas as pd

# Example data
df = pd.DataFrame({'C1': list(range(1, 12)), 'C2': list(range(2, 13))})
# Example mean and standard deviation, one row per column of df
df1 = pd.DataFrame({'Mean': [2, 1], 'Dev': [0.5, 1]})

# Subtract the means, then divide by the deviations. Each 1-D array is
# broadcast across the rows of df, so element k is applied to column k.
# (The column is named 'Dev', not 'dev'; no transpose is needed for a 1-D array.)
df_out = (df - df1['Mean'].to_numpy()) / df1['Dev'].to_numpy()
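For comparison, the same calculation can be written as an explicit loop over the rows of df1, which is closer to what the question literally asks. This is only a minimal sketch: it uses the data from the question, the names scaled and df_loop are made up for illustration, and it assumes that row k of df1 belongs to column k of df.

```python
import pandas as pd

# Data as printed in the question
df = pd.DataFrame({
    'C1': [1, 2, 3, 4, 5, 6, 7, 9, 11, 10, 12],
    'C2': [1, 2, 3, 4, 5, 6, 7, 9, 13, 11, 11],
})
df1 = pd.DataFrame({'Mean': [2, 1], 'Dev': [0.5, 1.0]})

# Pair each column of df with the df1 row at the same position
# (assumption: row k of df1 describes column k of df)
scaled = {}
for col, mean, dev in zip(df.columns, df1['Mean'], df1['Dev']):
    # Whole-column arithmetic, so no inner loop over the rows of df is needed
    scaled[col] = (df[col] - mean) / dev

df_loop = pd.DataFrame(scaled)
print(df_loop)
```

The C1 column of df_loop should reproduce the values printed by the loop in the question. Note that zip stops at the shorter input, so extra columns or extra rows would be ignored silently.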

This assumes that the number of rows in df1 (the mean/standard deviation table) equals the number of columns in df, and that row k of df1 corresponds to column k of df.
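If you would rather make that positional assumption explicit, one option is to label the rows of df1 with the column names of df and let pandas align by label. The sketch below uses the question's data; the name stats and the length check are illustrative additions, not part of the original answer.

```python
import pandas as pd

# Data as printed in the question
df = pd.DataFrame({
    'C1': [1, 2, 3, 4, 5, 6, 7, 9, 11, 10, 12],
    'C2': [1, 2, 3, 4, 5, 6, 7, 9, 13, 11, 11],
})
df1 = pd.DataFrame({'Mean': [2, 1], 'Dev': [0.5, 1.0]})

# Fail early if the positional assumption cannot hold
if len(df1) != df.shape[1]:
    raise ValueError("df1 needs exactly one row per column of df")

# Assumption: row k of df1 describes column k of df,
# so reuse df's column names as the row labels of the statistics table
stats = df1.copy()
stats.index = df.columns

# With axis='columns', sub/div match the Series index against df's column labels
df_out = df.sub(stats['Mean'], axis='columns').div(stats['Dev'], axis='columns')
print(df_out)
```

Once the rows are labelled, the order of the columns in df no longer matters, because each column picks up the Mean and Dev stored under its own name.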
