Correct way of adding new columns/headers to a dataframe

Time:03-05

I need to add new columns to a dataframe. Every column has a header and a value across all the rows (the value is the same for all the columns).

Right now I'm doing something like this:

array_of_new_headers = [...]
for column in array_of_new_headers:
    df[column] = 0

As a result I'm getting this message:

PerformanceWarning: DataFrame is highly fragmented. This is usually the result of calling frame.insert many times, which has poor performance. Consider joining all columns at once using pd.concat(axis=1) instead. To get a de-fragmented frame, use newframe = frame.copy()

It tells me to use concat, but I don't really need to concatenate two dataframes. Should I use concat anyway, for better performance and cleaner code? To me it doesn't make much sense, unless I think of the new columns as a dataframe of their own.

CodePudding user response:

You can pass an unpacked dictionary to pandas.DataFrame.assign, with the column names as keys and the value for each column as values:

>>> array_of_new_headers = [...]
>>> df.assign(**{c:0 for c in array_of_new_headers})

Note that assign does not modify df in place; it returns a new DataFrame, so make sure to assign the result back to the variable you need.
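As a minimal runnable sketch of the above: the sample frame and the column names "x", "y", "z" are made up for illustration, since the question does not show its actual data. Keyword unpacking with ** also assumes the column names are strings.

```python
import pandas as pd

# Hypothetical sample data; the original frame is not shown in the question.
df = pd.DataFrame({"a": [1, 2, 3]})
array_of_new_headers = ["x", "y", "z"]  # placeholder column names

# assign returns a new DataFrame, so the result is bound back to df.
df = df.assign(**{c: 0 for c in array_of_new_headers})

print(df.columns.tolist())  # ['a', 'x', 'y', 'z']
```

All new columns are added in a single call, which should avoid the repeated inserts the warning complains about.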

CodePudding user response:

should I use concat for better performance

Beware of so-called premature optimization. If your code already runs fast enough for your needs, you might simply end up wasting your time trying to make it faster.
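If the loop does turn out to be a bottleneck, the pd.concat(axis=1) route mentioned in the warning could look roughly like the sketch below. It treats the new constant columns as a DataFrame of their own, as the question guessed. The sample frame and column names are hypothetical, not taken from the question.

```python
import pandas as pd

# Hypothetical sample data; the original frame is not shown in the question.
df = pd.DataFrame({"a": [1, 2, 3]})
array_of_new_headers = ["x", "y", "z"]  # placeholder column names

# Build all constant columns as one frame, reusing df's index so rows align.
new_cols = pd.DataFrame(0, index=df.index, columns=array_of_new_headers)

# Join the two frames side by side in a single operation.
result = pd.concat([df, new_cols], axis=1)

print(result.shape)  # (3, 4)
```

Reusing df.index matters here: concat aligns rows on the index, so a frame built with a different index would introduce missing values.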
