Home > OS >  Performant Way of appending shifted columns in existing dataframe
Performant Way of appending shifted columns in existing dataframe

Time:01-08

I am trying to append the shifts in dataframe as per below code:

import pandas as pd 
# list of strings 
lst1 = range(10000000)
df = pd.DataFrame(list(zip(lst1)), columns =['mw']) 
print(df)
lagSize=365
for col in df.columns:
    for i in range(1, lagSize   1):
        df["%s_%s_%s" % (col, i, -1)] = df[col].shift(i)

I am getting following warning:

PerformanceWarning: DataFrame is highly fragmented.  This is usually the result of calling `frame.insert` many times, which has poor performance.  Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
  df["%s_%s_%s" % (col, i, -1)] = df[col].shift(i)

What is good way to do the following operation?

CodePudding user response:

Named columns

In case you want to also keep a key for each shift so you know what it is you can keep them in a dictionary and create a new dataframe from the dictionary. Here's an example:

import pandas as pd

# list of strings 
lst1 = range(10000000)
df = pd.DataFrame(list(zip(lst1)), columns =['mw']) 
print(df)
lagSize=365

shifts = {}
for col in df.columns:
  for i in range(1, lagSize   1):
    shifts['key'   i] = df[col].shift(i)

dd = pd.DataFrame.from_dict(shifts)
print(dd)

Unnamed columns

If you keep a list instead it is even easier to create the next dataframe.

import pandas as pd

# list of strings 
shiftsList = []
for col in df.columns:
  for i in range(1, lagSize   1):
    shiftsList.append(df[col].shift(i))

dl = pd.DataFrame(shiftsList)
print(dl)

After you've created the shifted Dataframe it's way faster to add the original 1 column rather than add each column as you go.

  • Related