I'm working with dataframes, and need to delete a few rows as I iterate through them.
A brief overview: I read a row (N), compare it with the next 20 rows (up to N+20), and delete a few rows between N and N+20 based on the comparison. I then move on to N+1 and compare that row with the next 20 rows, up to N+1+20. I do not want to compare N+1 with rows I have already deleted.
However, when I delete the rows, the deletion is not reflected in the dataframe, because I am still traversing its original copy. Any solutions for this?
df = pd.read_csv(r"C:\snip\test.csv")
index_to_delete = []
for index, row in df.iterrows():
    snip
    for i in range(20):
        if (index + i + 1) < len(df.index):
            if condition:
                index_to_delete.append(index + i + 1)  # storing indices of rows to delete between N and N+20
                df.loc[index, ['snip1', 'snip2']] = [snip, snip]  # updating values in row N
    df = df.drop(index_to_delete)
    index_to_delete.clear()
CodePudding user response:
You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
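A small sketch of both effects, using a made-up three-row frame (the column names `a` and `b` are only for illustration). It shows that writing to the `row` returned by `iterrows()` leaves the dataframe untouched here, and that rebinding `df` inside the loop does not change which rows the running iterator visits:

```python
import pandas as pd

# Illustrative data only; mixed int/float columns make iterrows() build
# each row from a converted copy of the values.
df = pd.DataFrame({"a": [1, 2, 3], "b": [1.5, 2.5, 3.5]})

# Writing to `row` changes the temporary Series, not the dataframe.
for index, row in df.iterrows():
    row["a"] = 0
unchanged = df["a"].tolist()

# `df = df.drop(...)` binds the name to a new frame, but the iterator
# created by the `for` statement still walks the original one.
visited = []
for index, row in df.iterrows():
    visited.append(index)
    if index == 0:
        df = df.drop([1])

print(unchanged)          # values of column "a" after the first loop
print(visited)            # labels the second loop actually visited
print(df.index.tolist())  # labels left in the rebound frame
```

This is why the row deleted in the question still shows up as row N+1 on the next pass of the loop.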
There are a few tricks to solve this problem:
1: Iterate over range(len(df)) instead of iterating over df itself, and skip the labels that no longer exist.
for inx in range(len(df)):
    try:
        row = df.loc[inx]
    except KeyError:  # this label has already been dropped
        continue
2: Store the indexes you have already deleted and skip them.
df = pd.read_csv(r"C:\snip\test.csv")
all_index_to_delete = []
index_to_delete = []
for index, row in df.iterrows():
    if index in all_index_to_delete:  # row was deleted earlier, so skip it
        continue
    snip
    for i in range(20):
        if (index + i + 1) < len(df.index):
            if condition:
                index_to_delete.append(index + i + 1)  # storing indices of rows to delete between N and N+20
                all_index_to_delete.append(index + i + 1)  # remembering every deleted index so it is skipped later
                df.loc[index, ['snip1', 'snip2']] = [snip, snip]  # updating values in row N
    df = df.drop(index_to_delete)
    index_to_delete.clear()
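The second trick can also be written as a self-contained function that never touches the dataframe while looping and drops everything in one step at the end. This is only a sketch under assumptions: `drop_within_window`, `should_drop`, and the sample `key`/`val` columns are hypothetical names, the comparison is a stand-in for the elided condition, and the window is counted over the original row positions.

```python
import pandas as pd

WINDOW = 20  # number of following rows compared with each row, as in the question


def drop_within_window(df, should_drop, window=WINDOW):
    """Return df without the rows flagged by should_drop(row_n, later_row)."""
    deleted = set()  # positions already marked for deletion
    n_rows = len(df)
    for pos in range(n_rows):
        if pos in deleted:  # a deleted row is never used as row N
            continue
        row_n = df.iloc[pos]
        for other in range(pos + 1, min(pos + 1 + window, n_rows)):
            if other in deleted:  # do not compare against deleted rows
                continue
            if should_drop(row_n, df.iloc[other]):
                deleted.add(other)
    keep = [pos for pos in range(n_rows) if pos not in deleted]
    return df.iloc[keep]  # single selection at the end, original frame untouched


# Hypothetical condition: a later row repeats the "key" value of row N.
sample = pd.DataFrame({"key": [1, 1, 2, 1, 2, 3], "val": list("abcdef")})
result = drop_within_window(sample, lambda a, b: a["key"] == b["key"], window=3)
print(result)
```

Using positions with `iloc` avoids relying on the index labels, and the set makes the "already deleted" check cheap even for large frames.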