How to adjust the iteration range when some rows are deleted

Time:08-25

I have a dataframe with 12,675 rows at the beginning.

Proyectonevera2.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 12689 entries, 3 to 12683
Data columns (total 60 columns):

But I need to go over the whole dataframe and delete some rows depending on the result of some operations on them:

for i in range(0,len(Proyectonevera2)-1):
    if Proyectonevera2.loc[i,'Clave'] == "P":
        CodIng = Proyectonevera2.loc[i,'Ingrediente']
        NuevoInv_Ing = Proyectonevera2.loc[i,'Nuevo Inventario']
        for j in range(0,len(Proyectonevera2)-1):
            if Proyectonevera2.loc[j,'Clave'] == "N":
                Codprod = Proyectonevera2.loc[j,'CodProducto']
                if Proyectonevera2.loc[j,'Ingrediente'] == CodIng:
                    Proyectonevera2.loc[j,'Inv_Ing'] = NuevoInv_Ing
                    if Proyectonevera2.loc[j,'Cant Ingr SUM'] !=0 and Proyectonevera2.loc[j,'Cant Prod SUM']!=0:
                        Proyectonevera2.loc[j,'Cantidad producir']=Proyectonevera2.loc[j,'Inv_Ing']*Proyectonevera2.loc[j,'Cant Prod SUM']/Proyectonevera2.loc[j,'Cant Ingr SUM']
                        dfc = Proyectonevera2.groupby('CodProducto')['Cantidad producir']
                        Proyectonevera2.assign(min=dfc.transform(min))
                        Proyectonevera2.loc[j,'Consumo Agosto'] = Proyectonevera2.loc[j,'min']*Proyectonevera2.loc[j,'Cant Ingr SUM']/Proyectonevera2.loc[j,'Cant Prod SUM']
                        Proyectonevera2.loc[j,'Nuevo Inventario'] = Proyectonevera2.loc[j,'Inv_Ing']-Proyectonevera2.loc[j,'Consumo Agosto']
                        Proyectonevera2= Proyectonevera2.drop(Proyectonevera2[Proyectonevera2['Clave']=="P"].index)
                        Proyectonevera2 = Proyectonevera2.reset_index(drop=True)

And the output I get is:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
C:\ProgramData\Anaconda3\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
    384                 try:
--> 385                     return self._range.index(new_key)
    386                 except ValueError as err:

ValueError: 12675 is not in range

This happens because the dataframe had 12,675 rows at the beginning, but once I delete some rows its length shrinks, so I need to adjust the length used in this part:

for i in range(0,len(Proyectonevera2)-1):

How can I handle this?

CodePudding user response:

Python is not C or another language where you have to work out the range before iterating. Instead, just iterate over the iterable directly. This lets you iterate even if you don't know the size, and it prevents many off-by-one errors and problems with an iterable changing size while it is processed.

So instead of:

for i in range(len(data)):
    item = data[i]
    # do something with item

Do the following:

for item in data:
    # do something with item

If you need the index as well, you can use enumerate as follows:

for index, item in enumerate(data):
    # do something with item and/or index

CodePudding user response:

The advice in Brian M. Sheldon's answer does not cover the case where rows/items are deleted from the iterable while it is being processed. In that case it leads to side effects: rows/items that are never visited and/or an index-out-of-range error being raised.

Generally it is a very bad idea to delete items/rows in a loop while processing the iterable data.

A better approach is to record, while processing, the indices of the items/rows that should be deleted later. Pandas can delete multiple rows at once when given a list of their indices.
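A minimal sketch of this two-phase approach follows. The toy data and the Valor column are hypothetical; only the column name Clave and the rule "drop the rows marked P" are taken from the question. It assumes pandas is installed and uses DataFrame.iterrows() to visit the rows and DataFrame.drop(index=...) to delete them in one call.

```python
import pandas as pd

# Hypothetical toy data; only the column name 'Clave' comes from the question.
df = pd.DataFrame({
    "Clave": ["P", "N", "P", "N", "N"],
    "Valor": [10, 20, 30, 40, 50],
})

to_drop = []  # index labels remembered for later deletion

# Phase 1: visit every row without changing the shape of the dataframe.
for label, row in df.iterrows():
    if row["Clave"] == "P":
        to_drop.append(label)

# Phase 2: delete all remembered rows in one call, then renumber the index.
result = df.drop(index=to_drop).reset_index(drop=True)
print(result)
```

Because nothing is deleted while the loop runs, every label the loop sees is still valid, and the index is renumbered only once at the end.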

A quick (not recommended) fix, until the code in the question is restructured to separate processing from deleting, is to loop over the dataframe backwards, from the end to the start:

for j in range(len(Proyectonevera2)-2, -1, -1):

This gets the error about unavailable indices out of the way.
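To see why looping backwards helps, here is a small sketch on a plain list with hypothetical data. It relies on one assumption that the code in the question does not satisfy as written: each step deletes at most the current position. Deleting position i only shifts the items after i, and those have already been visited. Unlike the line above, which mirrors the question's range and starts at len - 2, this sketch starts at the last position.

```python
values = [5, 12, 7, 20, 3]  # hypothetical data

# Walk the positions from last to first. Deleting position i only moves
# the items that come after i, and those were already visited.
for i in range(len(values) - 1, -1, -1):
    if values[i] > 10:
        del values[i]

print(values)
```

If a step can delete several other rows at once, as the drop call in the question does, a later position may still disappear, so this remains a stopgap rather than a real fix.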

But please put considerable effort into separating the deletion of items from their processing as cleanly as possible, bearing in mind the danger of side effects that are hard to detect and track down. And if possible, loop over items/rows rather than over indices (for a dataframe, with for indx, row in Proyectonevera2.iterrows():; note that a plain for row in Proyectonevera2: iterates over the column labels, not the rows).

Here is one bare minimal example of such a side effect. It uses a small list instead of a pandas dataframe so that the reason for separating deletion from processing in a loop is as easy to see as possible.

The example below demonstrates that when items are deleted inside it, a Python for item in iterable: loop silently fails to 'visit' all of the items of the iterable.

lst = [0,1,2,3,4]
for value in lst:
    print(f' > {value=}, {lst.index(value)=} {lst=}', end=' ')
    if value==1:
        lst.pop(lst.index(0)) # delete value 0 from list
        lst.pop(lst.index(1)) # delete value 1 from list
    if value==2:
        lst[lst.index(2)]='2' # change value from 2 to '2'
    if value==3:
        lst[lst.index(3)]='3' # change value from 3 to '3'
    try:
        lst_index_value = lst.index(value)
    except ValueError:
        lst_index_value = None
    print(f' >> {value=}, lst.index(value)={lst_index_value} {lst=}')
print(f'  FINALLY: {lst=} WHERE values 2 and 3 were not "visited" in the loop')

This prints:

 > value=0, lst.index(value)=0 lst=[0, 1, 2, 3, 4]  >> value=0, lst.index(value)=0 lst=[0, 1, 2, 3, 4]
 > value=1, lst.index(value)=1 lst=[0, 1, 2, 3, 4]  >> value=1, lst.index(value)=None lst=[2, 3, 4]
 > value=4, lst.index(value)=2 lst=[2, 3, 4]  >> value=4, lst.index(value)=2 lst=[2, 3, 4]
  FINALLY: lst=[2, 3, 4] WHERE values 2 and 3 were not "visited" in the loop
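For comparison, one possible way to make the same loop visit every value is to iterate over a snapshot of the list while deleting from the original. This is a sketch of one alternative, not the only fix; the visited list is a hypothetical helper added here just to record which values the loop saw.

```python
lst = [0, 1, 2, 3, 4]
visited = []  # hypothetical helper: records every value the loop sees

# list(lst) is a shallow copy, so deletions from lst cannot disturb
# the iteration, which runs over the copy.
for value in list(lst):
    visited.append(value)
    if value == 1:
        lst.remove(0)  # delete value 0 from the original list
        lst.remove(1)  # delete value 1 from the original list

print(visited)
print(lst)
```

The loop still sees the values that were deleted part-way through, so the loop body has to tolerate items that are no longer in the original list.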