I am getting the following FutureWarning in my Python code:
FutureWarning: The frame.append method is deprecated and will be removed from pandas in a future version. Use pandas.concat instead.
Until now I have been using the append function in various parts of my code to add rows to an existing DataFrame.
Example 1:
init_hour = pd.to_datetime('00:00:00')
orig_hour = init_hour + timedelta(days=1)
while init_hour < orig_hour:
    row = {'Hours': init_hour.time()}
    df = df.append(row, ignore_index=True)
    init_hour = init_hour + timedelta(minutes=60)
Example 2:
row2 = {'date': tmp_date, 'false_negatives': fn, 'total': total}
df2 = df2.append(row2, ignore_index=True)
How could I solve this in a simple way without modifying much of the code before the sections above?
CodePudding user response:
Use pd.concat instead of append, preferably on many rows at once. Also, use date_range and timedelta_range whenever possible.
Example 1:
# First transformation: collect rows in a list, then concat once
init_hour = pd.to_datetime('00:00:00')
orig_hour = init_hour + timedelta(days=1)
rows = []
while init_hour < orig_hour:
    rows.append({'Hours': init_hour.time()})
    init_hour = init_hour + timedelta(minutes=60)
df = pd.concat([df, pd.DataFrame(rows)], axis=0, ignore_index=True)
# Second transformation - just construct it without a loop
hours = pd.Series(pd.date_range("00:00:00", periods=24, freq="H").time, name="Hours")
# Then insert/concat hours into your dataframe.
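For instance, assuming `df` starts out empty (a toy sketch, not your actual setup), the `hours` Series can simply become the DataFrame itself:

```python
import pandas as pd

# Build the 24 hourly timestamps in one call instead of a loop
hours = pd.Series(pd.date_range("00:00:00", periods=24, freq="H").time, name="Hours")

# If df has no other columns yet, the Series converts directly
df = hours.to_frame()
print(df.shape)  # (24, 1)
```

If `df` already has other columns of the same length, `df['Hours'] = hours` or `pd.concat([df, hours], axis=1)` would attach it instead.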
Example 2:
Without the surrounding context it is hard to say which is most appropriate; either of these two should work:
# Alternative 1
row2 = {'date': tmp_date, 'false_negatives': fn, 'total': total}
row2df = pd.DataFrame.from_records([row2])
df2 = pd.concat([df2, row2df], ignore_index=True, axis=0) # how to handle index depends on context
# Alternative 2
# assuming integer monotonic index - assign a new row with loc
new_index = len(df2.index)
df2.loc[new_index] = row2
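As a self-contained illustration of Alternative 2 (the column names and values here are made up to match the question's shape):

```python
import pandas as pd

# Toy frame with the integer monotonic index that Alternative 2 assumes
df2 = pd.DataFrame({'date': ['2022-01-01'], 'false_negatives': [3], 'total': [100]})

row2 = {'date': '2022-01-02', 'false_negatives': 5, 'total': 120}

# Assigning a dict to a new label via .loc appends one row in place;
# keys are aligned to the existing columns
df2.loc[len(df2.index)] = row2
print(len(df2))  # 2
```

Note that unlike concat, this mutates `df2` in place, and it only behaves as an append while the index is the default RangeIndex.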
Note: there's a reason append is deprecated. Appending row by row is very slow, because each call copies the entire DataFrame. So it's worth doing more than a local translation of the code, to avoid wasting compute time when the analysis runs.
The pandas authors are trying to get us to build DataFrames by more efficient means, i.e. not row by row. Collecting many rows into one DataFrame and concatenating bigger DataFrames together is the way to go.
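The batch pattern described above can be sketched like this (column names are arbitrary):

```python
import pandas as pd

# Collect plain Python dicts first - list.append is cheap ...
rows = [{'x': i, 'sq': i * i} for i in range(5)]

# ... then build the DataFrame once, instead of appending row by row
df = pd.DataFrame(rows)

# Combining a few larger frames is likewise a single concat call
combined = pd.concat([df, df], ignore_index=True)
print(len(combined))  # 10
```

Building from a list of dicts is linear in the number of rows, whereas repeated DataFrame-level appends copy all previous rows on every iteration.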