Calculate Multiple Column Growth in Python Dataframe-CodePudding

The data I used look like this

data

Subject    2000_X1    2000_X2   2001_X1    2001_X2    2002_X1    2002_X2
1              100         50       120         45        110         50
2               95         40       100         45        105         50
3              110         45       100         45        110         40

I want to calculate each variable growth for each year so the result will look like this

Subject    2001_X1_gro      2001_X2_gro      2002_X1_gro      2002_X2_gro
1                  0.2             -0.1         -0.08333          0.11111
2             0.052632            0.125             0.05          0.11111
3             -0.09091                0              0.1         -0.11111

I already do it manually for each variable for each year with code like this

data[2001_X1_gro]= (data[2001_X1]-data[2000_X1])/data[2000_X1]
data[2002_X1_gro]= (data[2002_X1]-data[2001_X1])/data[2001_X1]
data[2001_X2_gro]= (data[2001_X2]-data[2000_X2])/data[2000_X2]
data[2002_X2_gro]= (data[2002_X2]-data[2001_X2])/data[2001_X2]

Is there a way to do it more efficient escpecially if I have more year and/or more variable?

CodePudding user response：

import pandas as pd

df = pd.read_csv('data.txt', sep=',', header=0)

Input

   Subject  2000_X1  2000_X2  2001_X1  2001_X2  2002_X1  2002_X2
0        1      100       50      120       45      110       50
1        2       95       40      100       45      105       50
2        3      110       45      100       45      110       40

Next, a loop is created and the columns are filled:

qqq = '_gro'
new_name = ''
year = ''
for i in range(1, len(df.columns) - 2):
    year = str(int(df.columns[i][:4])   1)   df.columns[i][4:]
    new_name = year   qqq
    df[new_name] = (df[year] - df[df.columns[i]])/df[df.columns[i]]

print(df)

Output

   Subject  2000_X1  2000_X2  2001_X1  2001_X2  2002_X1  2002_X2  2001_X1_gro  \
0        1      100       50      120       45      110       50     0.200000   
1        2       95       40      100       45      105       50     0.052632   
2        3      110       45      100       45      110       40    -0.090909   

   2001_X2_gro  2002_X1_gro  2002_X2_gro  
0       -0.100    -0.083333     0.111111  
1        0.125     0.050000     0.111111  
2        0.000     0.100000    -0.111111

In the loop, the year is extracted from the column name, converted to int, 1 is added to it. The value is again converted to a string, the prefix '_Xn' is added. A new_name variable is created, to which the string '_gro ' is also appended. A column is created and filled with calculated values.

If you want to count, for example, for three years, then you need to add not 1, but 3. This is with the condition that your data will be ordered. And note that the loop does not go through all the elements: for i in range(1, len(df.columns) - 2):. In this case, it skips the Subject column and stops short of the last two values. That is, you need to know where to stop it.