I need to add new columns to a dataframe. Each new column has a header and a single value that is the same across all the rows.
Right now I'm doing something like this:
array_of_new_headers = [...]
for column in array_of_new_headers:
    df[column] = 0
As a result I'm getting this message:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
It tells me to use concat, but I don't really need to concatenate two dataframes. Should I use concat anyway, for better performance and cleaner code? It doesn't make much sense to me, unless I think of the arrays of new columns as dataframes themselves.
CodePudding user response:
You can pass an unpacked dictionary, with keys as the column names and values as the constant value for each column, to pandas.DataFrame.assign:
>>> array_of_new_headers = [...]
>>> df.assign(**{c:0 for c in array_of_new_headers})
Note that assign does not modify the dataframe in place; it returns a new DataFrame, so make sure to assign the result back to the required variable.
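A minimal runnable sketch of the approach above, using hypothetical data and made-up column names:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
array_of_new_headers = ["x", "y", "z"]

# Add all constant-valued columns in one call; assign returns a NEW
# DataFrame, so bind the result back to df.
df = df.assign(**{c: 0 for c in array_of_new_headers})

print(df.columns.tolist())  # ['a', 'x', 'y', 'z']
print(df["x"].tolist())     # [0, 0, 0]
```

Because all columns are added in a single operation, the fragmentation warning from repeated `frame.insert` calls does not apply.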
CodePudding user response:
should I use concat for better performance
Beware so-called premature optimization: if your code already runs fast enough for your needs, you may simply be wasting time trying to make it faster.
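That said, if the warning does matter for your workload, the concat route it suggests can be sketched like this (hypothetical data and column names):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
array_of_new_headers = ["x", "y", "z"]

# Build one DataFrame holding all the constant columns, aligned on
# df's index, then join everything in a single concat along axis=1.
new_cols = pd.DataFrame(0, index=df.index, columns=array_of_new_headers)
df = pd.concat([df, new_cols], axis=1)

print(df.shape)  # (3, 4)
```

Conceptually, the array of new headers does become a dataframe here, which is why the warning points you at pd.concat.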