Updating Dataframe during Traversal

I'm working with dataframes, and need to delete a few rows as I iterate through them.

A brief overview: I read a row (N), compare it with the next 20 rows (up to N+20), and delete a few rows between N and N+20 based on the comparison. I then move on to N+1 and compare that row with the next 20 rows, up to N+1+20. I do not want to compare N+1 with the rows I've previously deleted.

However, the deletions are not reflected while the loop runs, because I am still traversing the original copy of the dataframe. Any solutions for this?

import pandas as pd

df = pd.read_csv(r"C:\snip\test.csv")
index_to_delete = []

for index, row in df.iterrows():
    snip

    for i in range(20):
        if (index + i + 1) < len(df.index):
            if condition:
                index_to_delete.append(index + i + 1)  # storing indices of rows to delete between N and N+20

    df.loc[index, ['snip1', 'snip2']] = [snip, snip]  # updating values in row N
    df = df.drop(index_to_delete)
    index_to_delete.clear()

CodePudding user response：

From the pandas.DataFrame.iterrows() documentation:

You should never modify something you are iterating over. This is not guaranteed to work in all cases. Depending on the data types, the iterator returns a copy and not a view, and writing to it will have no effect.
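
To make the copy-versus-view point concrete, here is a minimal sketch. The tiny frame and its column names are made up for illustration; it uses mixed dtypes, which is the case where each row handed out by iterrows() is a freshly built Series.

```python
import pandas as pd

# Mixed dtypes (int and str), so each row is materialised as a new object Series.
df = pd.DataFrame({"a": [1, 2], "b": ["x", "y"]})

for index, row in df.iterrows():
    row["a"] = 100  # modifies the temporary Series only

unchanged = df["a"].tolist()

# Writing through the DataFrame itself does take effect.
for index in df.index:
    df.loc[index, "a"] = 100

changed = df["a"].tolist()
```

After the first loop the column still holds its original values; only the second loop, which assigns through df.loc, changes the frame.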

There are a few ways to work around this problem:

1: Iterate over range(len(df)) instead of over df itself, and skip any label that has already been dropped.

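for inx in range(len(df)):  # the range is fixed once, from the original length
    try:
        row = df.loc[inx]
    except KeyError:  # label inx was dropped in an earlier pass
        continue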
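
A small self-contained illustration of option 1, assuming the default integer index (0, 1, 2, ...) so that the loop counter matches the row labels. The column name and the labels dropped here are invented for the demo.

```python
import pandas as pd

df = pd.DataFrame({"value": [10, 20, 30, 40]})
visited = []

for inx in range(len(df)):  # range is computed once, before any rows are dropped
    try:
        row = df.loc[inx]
    except KeyError:
        continue  # this label was dropped earlier, so skip it
    visited.append(inx)
    if inx == 0:
        # Pretend the comparison for row 0 decided that rows 1 and 2 must go.
        df = df.drop([1, 2])
```

Rows 1 and 2 are never visited, because looking them up after the drop raises KeyError. With a non-default index this pattern would need the original labels saved up front instead of range(len(df)).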

2: Store the indices you have already deleted and skip them.

import pandas as pd

df = pd.read_csv(r"C:\snip\test.csv")
all_index_to_delete = set()  # every index deleted so far, for fast membership checks
index_to_delete = []

for index, row in df.iterrows():
    if index in all_index_to_delete:
        continue  # this row was deleted in an earlier pass
    snip

    for i in range(20):
        if (index + i + 1) < len(df.index):
            if condition:
                index_to_delete.append(index + i + 1)  # storing indices of rows to delete between N and N+20
                all_index_to_delete.add(index + i + 1)  # remembering them so later iterations skip these rows

    df.loc[index, ['snip1', 'snip2']] = [snip, snip]  # updating values in row N
    df = df.drop(index_to_delete)
    index_to_delete.clear()
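
A variation on the same idea, offered as a sketch rather than a drop-in fix: instead of dropping rows inside the loop, mark them in a boolean mask and filter once at the end. The function name drop_within_window, the should_drop callback, and the demo data are all hypothetical; the callback stands in for the comparison that is elided as "condition" in the question.

```python
import numpy as np
import pandas as pd


def drop_within_window(df, should_drop, window=20):
    """Return df without the rows flagged by should_drop(current_row, later_row)."""
    keep = np.ones(len(df), dtype=bool)  # one flag per row, by position
    for n in range(len(df)):
        if not keep[n]:
            continue  # row n was already deleted, so it is never used as row N
        current = df.iloc[n]
        # Look at the next `window` rows by original position.
        for m in range(n + 1, min(n + 1 + window, len(df))):
            if keep[m] and should_drop(current, df.iloc[m]):
                keep[m] = False
    return df[keep]


demo = pd.DataFrame(
    {"key": ["a", "b", "a", "c", "b", "a"], "value": [1, 2, 3, 4, 5, 6]}
)
# Hypothetical condition: delete a later row that repeats the current row's key.
result = drop_within_window(
    demo, lambda cur, later: cur["key"] == later["key"], window=2
)
```

Because the frame is never modified during the loop, positions stay stable and deleted rows are skipped by checking the mask. Updates to row N (the df.loc assignment in the question) can still be made inside the outer loop, since they change values and not the set of rows.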