Home > other >  Pandas perform identical transformation on multiple dataframes where inline is not an option
Pandas perform identical transformation on multiple dataframes where inline is not an option

Time:12-28

I have a situation where I want to shift multiple dataframes by one line. A for loop is the first thing that comes to mind, but that does not actually store the dataframe. Other SO posts suggest using the inline=True option, which works for certain transformations but not shift as inline is not an option. Obviously I could just manually do it by writing several lines of df = df.shift(1) but in the spirit of learning the most pythonic method... Here's a MRE, showing both a standard for loop and a function-based method:

import pandas as pd
import numpy as np

def shifter(df):
    df = df.shift(1)
    return df

data1 = {'a':np.random.randint(0, 5, 20), 
        'b':np.random.randint(0, 5, 20), 
        'c':np.random.randint(0, 5, 20)}
data2 = {'foo':np.random.randint(0, 5, 20), 
        'bar':np.random.randint(0, 5, 20), 
        'again':np.random.randint(0, 5, 20)}

df1 = pd.DataFrame(data=data1)
df2 = pd.DataFrame(data=data2)

print(df1)      
for x in [df1, df2]:
    x = x.shift(1)

print(df1)

for x in [df1, df2]:
    x = shifter(x)
    
print(df1)

CodePudding user response:

You need to assign back the content, not create a new variable:

def shifter(df):
    return  df.shift(1)

for x in [df1, df2]:
    x[:] = shifter(x)

Or, in place within the function:

def shifter(df):
    df[:] = df.shift(1)

for x in [df1, df2]:
    shifter(x)

CodePudding user response:

Another approach would be:

import pandas as pd
import numpy as np

def shifter(df):
    a = df.shift(1)
    return a

data1 = {'a':np.random.randint(0, 5, 20), 
        'b':np.random.randint(0, 5, 20), 
        'c':np.random.randint(0, 5, 20)}
data2 = {'foo':np.random.randint(0, 5, 20), 
        'bar':np.random.randint(0, 5, 20), 
        'again':np.random.randint(0, 5, 20)}

df1 = pd.DataFrame(data=data1)
df2 = pd.DataFrame(data=data2)


Y = []

for x in [df1, df2]:
    x = shifter(x)
    Y.append(x)
    
Ydf = pd.concat(Y, axis = 1)    
    
print(Ydf)

which returns

    a    b    c  foo  bar  again
0   NaN  NaN  NaN  NaN  NaN    NaN
1   2.0  3.0  3.0  3.0  2.0    1.0
2   3.0  1.0  1.0  4.0  3.0    0.0
3   0.0  1.0  4.0  2.0  2.0    4.0
4   0.0  3.0  0.0  3.0  2.0    1.0
5   4.0  4.0  4.0  2.0  2.0    2.0
6   0.0  1.0  3.0  4.0  2.0    3.0
7   1.0  0.0  3.0  4.0  3.0    3.0
8   4.0  2.0  3.0  0.0  0.0    2.0
9   1.0  1.0  4.0  4.0  3.0    2.0
10  4.0  3.0  4.0  0.0  1.0    4.0
11  0.0  3.0  1.0  1.0  2.0    4.0
12  2.0  0.0  4.0  4.0  3.0    0.0
13  3.0  3.0  4.0  4.0  1.0    2.0
14  4.0  1.0  4.0  0.0  1.0    0.0
15  3.0  4.0  0.0  3.0  0.0    3.0
16  1.0  4.0  2.0  0.0  3.0    1.0
17  3.0  2.0  0.0  0.0  2.0    0.0
18  1.0  1.0  1.0  3.0  3.0    4.0
19  1.0  3.0  2.0  1.0  1.0    0.0
  • Related