I need to create a new dataframe in which each column is computed by a function that takes two arguments. The problem is that the second argument is different for each column: it is the number of that column. The dataframe has about 6k rows and 200 columns.
Each column of the new dataframe is computed by this function:
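import pandas as pd

def phiNT(M, nT):
    # Keep only the first nT columns.
    M = M[M.columns[:nT]]
    # Repeat the nT-th column (position nT-1) nT times so it lines up with M.
    d = pd.concat([M.iloc[:, nT-1]] * nT, axis=1)
    d.columns = M.columns
    # Row-wise mean of the differences from that column.
    D = M - d
    D = D.mean(axis=1)
    return D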
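For reference, here is a minimal sketch of what phiNT returns for a single nT. The frame M_demo and its column names are made up for illustration, and the shortcut assumes there are no NaN values and that column names are unique. Under those assumptions the result is the row-wise mean of the first nT columns minus the nT-th column.

```python
import numpy as np
import pandas as pd

def phiNT(M, nT):
    M = M[M.columns[:nT]]
    d = pd.concat([M.iloc[:, nT - 1]] * nT, axis=1)
    d.columns = M.columns
    D = M - d
    return D.mean(axis=1)

# Hypothetical toy data: 5 rows, 4 columns.
rng = np.random.default_rng(0)
M_demo = pd.DataFrame(rng.normal(size=(5, 4)), columns=list("abcd"))

nT = 3
direct = phiNT(M_demo, nT)

# Same quantity written directly: mean of the first nT columns
# minus the nT-th column (assumes no NaN values).
shortcut = M_demo.iloc[:, :nT].mean(axis=1) - M_demo.iloc[:, nT - 1]

same = np.allclose(direct, shortcut)
print(same)
```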
I tried to create an empty dataframe and then add each column using a loop:
A = pd.DataFrame()
for i in range(1, len(M.columns)):
    A[i] = phiNT(M, i)
But this is what pops up:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
So I need a way to use pd.concat to create all the columns at once.
CodePudding user response:
You should build all the columns first (each call to phiNT returns a Series), in a list or a generator, and then call pd.concat once
on that list or generator to create the new dataframe with all the columns in it, instead of inserting one column at a time.
The following uses a generator, which avoids building an intermediate list yourself (pd.concat still has to collect all the columns before joining them):
results = (phiNT(M,i) for i in range(1,len(M.columns)))
A = pd.concat(results,axis=1)
This is how it would be done with a list:
A = pd.concat([phiNT(M,i) for i in range(1,len(M.columns))],axis=1)
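One difference from the loop in the question: the Series returned by phiNT has no name, so pd.concat labels the columns 0, 1, 2, ... instead of 1, 2, 3, .... The sketch below passes keys= to keep the original labels. It also shows an optional cumulative-sum variant that skips the per-column calls entirely. That variant is not part of the answer above and assumes no NaN values; M_demo and the variable names are hypothetical.

```python
import numpy as np
import pandas as pd

def phiNT(M, nT):
    M = M[M.columns[:nT]]
    d = pd.concat([M.iloc[:, nT - 1]] * nT, axis=1)
    d.columns = M.columns
    D = M - d
    return D.mean(axis=1)

# Hypothetical toy data: 6 rows, 5 columns.
rng = np.random.default_rng(0)
M_demo = pd.DataFrame(rng.normal(size=(6, 5)), columns=list("abcde"))

# Same range as the loop in the question (the last column is not used as nT).
n_values = list(range(1, len(M_demo.columns)))

# keys= gives each concatenated Series its column label.
A = pd.concat([phiNT(M_demo, i) for i in n_values], axis=1, keys=n_values)
print(list(A.columns))

# Optional variant (assumes no NaN): running mean across columns
# minus the column itself, computed for every nT in one pass.
counts = np.arange(1, M_demo.shape[1] + 1)
running_mean = M_demo.cumsum(axis=1).div(counts, axis=1)
A_fast = (running_mean - M_demo).iloc[:, :-1]
A_fast.columns = n_values

matches_fast = np.allclose(A.to_numpy(), A_fast.to_numpy())
print(matches_fast)
```

If the 0-based labels are fine for you, drop keys= and the call is identical to the one-liner above.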