I have built the following function, but .append will be removed from pandas in a future version, so I want to convert this code to use concat.
def MyDF(self,DF1,DF2):
    OutputDf = pd.DataFrame([]).reset_index(drop=True)
    for i in range(0,len(DF2)):
        OutputDf = OutputDf.append(DF2.loc[[i]])
        OutputDf = OutputDf.append(DF1.loc[(DF1['TheName'] == DF2['TheName'][i]) & (DF1['WGT'].apply(lambda x: float(x)) > 0) ])
        OutputDf = OutputDf.reset_index(drop=True)
    return OutputDf
I don't know how to use concat in this case, so how would I avoid .append there?
I am not sure whether this would work:
OutputDf = pd.Concat(OutputDf,DF2.loc[[i]])
CodePudding user response:
pandas.DataFrame.append and pandas.Series.append are deprecated since version 1.4.0 (see "Deprecated DataFrame.append and Series.append" in the release notes) and were removed in pandas 2.0. The alternative is pandas.concat.
In OP's case, .append() is used in two places:
OutputDf = OutputDf.append(DF2.loc[[i]])
OutputDf = OutputDf.append(DF1.loc[(DF1['TheName'] == DF2['TheName'][i]) & (DF1['WGT'].apply(lambda x: float(x)) > 0) ])
Case 1
One can change to the following
OutputDf = pd.concat([OutputDf, DF2.loc[[i]]], ignore_index=True)
Case 2
One can change to the following
OutputDf = pd.concat([OutputDf, DF1.loc[(DF1['TheName'] == DF2['TheName'][i]) & (DF1['WGT'].apply(lambda x: float(x)) > 0) ]], ignore_index=True)
Notes:
- As I do not have access to the dataframes and do not know the desired output, one might have to do some adjustments.
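- Calling pd.concat inside the loop copies the accumulated frame on every iteration. A common alternative is to collect the pieces in a list and concatenate once at the end. The sketch below assumes the same column names as in the question ('TheName', 'WGT'); the function name interleave_matches and the sample frames are hypothetical, and it uses .iloc rather than .loc so it does not depend on DF2 having a default 0..n-1 index.

```python
import pandas as pd


def interleave_matches(df1, df2):
    """For each row of df2, emit that row followed by the df1 rows that
    share its 'TheName' and have a positive 'WGT'."""
    # Convert WGT to float once, instead of on every loop iteration.
    positive = df1[df1["WGT"].astype(float) > 0]

    pieces = []
    for i in range(len(df2)):
        pieces.append(df2.iloc[[i]])  # one-row DataFrame
        matches = positive[positive["TheName"] == df2["TheName"].iloc[i]]
        if not matches.empty:
            pieces.append(matches)

    if not pieces:
        # pd.concat raises on an empty list, so handle an empty df2 here.
        return pd.DataFrame()
    return pd.concat(pieces, ignore_index=True)


# Hypothetical sample data, only for illustration.
df1 = pd.DataFrame({"TheName": ["a", "a", "b"], "WGT": ["1.5", "0", "2"]})
df2 = pd.DataFrame({"TheName": ["a", "b"], "Other": [10, 20]})
result = interleave_matches(df1, df2)
print(result)
```

Whether this matches the desired output depends on the real data, so treat it as a starting point rather than a drop-in replacement.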
CodePudding user response:
I think pandas.concat() is easy to understand, so you can say goodbye to append and keep up with pandas.
To begin with, pay attention to the objs, ignore_index and axis arguments. With axis=0 (the default), the DataFrame objects are stacked vertically, one under the other, like .append(). With axis=1, they are concatenated horizontally, as the documentation says:
axis : {0/’index’, 1/’columns’}, default 0
The axis to concatenate along.
Also, you can pass ignore_index=True to concat instead of calling reset_index afterwards. It renumbers the index of the result from 0.
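To make the effect of axis and ignore_index concrete, here is a small sketch. The frames top and bottom are made up for the example and are not from the question.

```python
import pandas as pd

# Two tiny frames, hypothetical data for illustration only.
top = pd.DataFrame({"x": [1, 2]})
bottom = pd.DataFrame({"x": [3, 4]})

# axis=0 (the default): rows of bottom go under rows of top.
# The original index labels are kept, so they repeat.
stacked = pd.concat([top, bottom], axis=0)

# ignore_index=True: same rows, but the index is renumbered from 0.
renumbered = pd.concat([top, bottom], ignore_index=True)

# axis=1: frames are placed side by side, aligned on the index.
side_by_side = pd.concat([top, bottom.rename(columns={"x": "y"})], axis=1)

print(list(stacked.index))
print(list(renumbered.index))
print(list(side_by_side.columns))
```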
In short, with two dataframes as in your question, you can use something like this:
def MyDF(self,DF1,DF2):
    OutputDf = pd.DataFrame([]).reset_index(drop=True)
    for i in range(0,len(DF2)):
        process1 = DF2.loc[[i]]
        process2 = DF1.loc[(DF1['TheName'] == DF2['TheName'][i]) & (DF1['WGT'].apply(lambda x: float(x)) > 0) ]
        # Keep OutputDf in the list so rows from earlier iterations are not lost
        OutputDf = pd.concat([OutputDf, process1, process2], ignore_index=True)
    return OutputDf
You can make this code shorter, but that reduces readability. You may want to use:
def MyDF(self,DF1,DF2):
    OutputDf = pd.DataFrame([]).reset_index(drop=True)
    for i in range(0,len(DF2)):
        OutputDf = pd.concat([OutputDf, DF2.loc[[i]], DF1.loc[(DF1['TheName'] == DF2['TheName'][i]) & (DF1['WGT'].apply(lambda x: float(x)) > 0) ]], ignore_index=True)
    return OutputDf
You could also put the pd.concat() call directly in the return statement, but that is harder to read, so it is your decision. Just remember that concat takes the dataframes as a single list, so do not forget the []:
pd.concat([process1, process2]) # use [] inside concat for dataframes
If you directly use pd.concat(process1, process2), it will give an error.
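A quick way to see this for yourself, using two throwaway frames (a and b are hypothetical names, not from the question):

```python
import pandas as pd

a = pd.DataFrame({"x": [1]})
b = pd.DataFrame({"x": [2]})

try:
    # Wrong: the frames are passed as separate positional arguments.
    pd.concat(a, b)
    raised = False
except TypeError as exc:
    raised = True
    print("TypeError:", exc)

# Right: the frames are wrapped in one list.
combined = pd.concat([a, b], ignore_index=True)
print(combined)
```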