I am trying to append the shifts in dataframe as per below code:
import pandas as pd
# list of strings
lst1 = range(10000000)
df = pd.DataFrame(list(zip(lst1)), columns =['mw'])
print(df)
lagSize=365
for col in df.columns:
for i in range(1, lagSize 1):
df["%s_%s_%s" % (col, i, -1)] = df[col].shift(i)
I am getting following warning:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
df["%s_%s_%s" % (col, i, -1)] = df[col].shift(i)
What is good way to do the following operation?
CodePudding user response:
Named columns
In case you want to also keep a key for each shift so you know what it is you can keep them in a dictionary and create a new dataframe from the dictionary. Here's an example:
import pandas as pd
# list of strings
lst1 = range(10000000)
df = pd.DataFrame(list(zip(lst1)), columns =['mw'])
print(df)
lagSize=365
shifts = {}
for col in df.columns:
for i in range(1, lagSize 1):
shifts['key' i] = df[col].shift(i)
dd = pd.DataFrame.from_dict(shifts)
print(dd)
Unnamed columns
If you keep a list instead it is even easier to create the next dataframe.
import pandas as pd
# list of strings
shiftsList = []
for col in df.columns:
for i in range(1, lagSize 1):
shiftsList.append(df[col].shift(i))
dl = pd.DataFrame(shiftsList)
print(dl)
After you've created the shifted Dataframe it's way faster to add the original 1 column rather than add each column as you go.