I need to add new columns to a dataframe. Each new column has a header and a single value that is the same across all the rows.
Right now I'm doing something like this:
array_of_new_headers = [...]
for column in array_of_new_headers:
    df[column] = 0
As a result I'm getting this message:
PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling `frame.insert` many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use `newframe = frame.copy()`
It tells me to use concat, but I don't really need to concatenate two dataframes. Should I use concat anyway, for better performance and cleaner code? It doesn't make much sense to me, unless I think of the arrays of new columns as dataframes themselves.
CodePudding user response:
You can pass an unpacked dictionary, with keys as the column names and values as the constant value for each column, to pandas.DataFrame.assign:
>>> array_of_new_headers = [...]
>>> df.assign(**{c:0 for c in array_of_new_headers})
Note that assign does not modify the dataframe in place; it returns a new DataFrame, so make sure to assign the result back to the required variable.
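A minimal runnable sketch of the approach above, using hypothetical data and made-up column names:

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
array_of_new_headers = ["x", "y", "z"]

# Add all constant-valued columns in one call; assign returns a NEW
# DataFrame, so bind the result back to df.
df = df.assign(**{c: 0 for c in array_of_new_headers})

print(df.columns.tolist())  # ['a', 'x', 'y', 'z']
print(df["x"].tolist())     # [0, 0, 0]
```

Because all columns are added in a single operation, the fragmentation warning from repeated `frame.insert` calls does not apply.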
CodePudding user response:
should I use concat for better performance
Beware so-called premature optimization: if your code already runs fast enough for your needs, you may simply be wasting time trying to make it faster.
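That said, if the warning does matter for your workload, the concat route it suggests can be sketched like this (hypothetical data and column names):

```python
import pandas as pd

df = pd.DataFrame({"a": [1, 2, 3]})
array_of_new_headers = ["x", "y", "z"]

# Build one DataFrame holding all the constant columns, aligned on
# df's index, then join everything in a single concat along axis=1.
new_cols = pd.DataFrame(0, index=df.index, columns=array_of_new_headers)
df = pd.concat([df, new_cols], axis=1)

print(df.shape)  # (3, 4)
```

Conceptually, the array of new headers does become a dataframe here, which is why the warning points you at pd.concat.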