How to delete a row while iterating over a dataframe?

Time:07-19

I'm trying to do the following with an SRT (subtitles) file:

  • while a row does not appear on the screen for at least 5s
  • add text from the next row to current row with a space between AND replace current End_Time with next row End_Time
  • delete next row
  • go to next row

I have to do that on the dataframe dfClean, which holds the edited timestamp fields, and then do the same on dfSRTForm, the dataframe with the original SRT time format, so that I can later export dfSRTForm as an SRT file.

My code to do that is this:

for i in dfClean.index:
    while dfClean.at[i, 'Difference'] < 5:
        dfClean.at[i, 'Text'] = dfClean.at[i, 'Text'] + ' ' + dfClean.at[i + 1, 'Text']
        dfSRTForm.at[i, 'Text'] = dfSRTForm.at[i, 'Text'] + ' ' + dfSRTForm.at[i + 1, 'Text']

        dfClean.at[i, 'End_Time'] = dfClean.at[i + 1, 'End_Time']
        dfSRTForm.at[i, 'End_Time'] = dfSRTForm.at[i + 1, 'End_Time']

        dfClean = dfClean.drop(i + 1)
        dfSRTForm = dfSRTForm.drop(i + 1)

But I get this error:

KeyError: 3

UPDATE (keeping previous if anyone else is having the same issue): I found a way to reset the index to avoid KeyError: 3

My current code is:

for i in dfClean.index:
    while dfClean.at[i, 'Difference'] < 5:
        dfClean.at[i, 'Text'] = dfClean.at[i, 'Text'] + ' ' + dfClean.at[i + 1, 'Text']
        dfSRTForm.at[i, 'Text'] = dfSRTForm.at[i, 'Text'] + ' ' + dfSRTForm.at[i + 1, 'Text']

        dfClean.at[i, 'End_Time'] = dfClean.at[i + 1, 'End_Time']
        dfSRTForm.at[i, 'End_Time'] = dfSRTForm.at[i + 1, 'End_Time']

        dfClean = dfClean.drop(i + 1)
        dfSRTForm = dfSRTForm.drop(i + 1)

        dfClean = dfClean.reset_index()
        dfClean = dfClean.drop(columns='index')

        dfSRTForm = dfSRTForm.reset_index()
        dfSRTForm = dfSRTForm.drop(columns='index')

        dfClean['Difference'] = (dfClean['End_Time'] - dfClean['Start_Time']).astype('timedelta64[s]')

But I get KeyError: 267 and I'm pretty sure it's because it condenses the rows to 266.

Is there a way to put "or end of index" or "or last row" in the while loop without hard-coding the 266 rows? I want to use it for other SRT files with varying numbers of rows.
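One way to express the "or last row" condition is to walk the rows by position and compare against len(df) on every pass, so the bound shrinks as rows are dropped. The following is only a sketch under assumptions: the function name merge_short_rows and the min_seconds parameter are made up for illustration, and Start_Time and End_Time are assumed to hold timedelta (or datetime) values.

```python
import pandas as pd

def merge_short_rows(df, min_seconds=5):
    """Merge each too-short row with the row after it (hypothetical helper)."""
    df = df.reset_index(drop=True)  # work with positions 0..n-1
    i = 0
    while i < len(df):
        # Keep merging while row i is too short AND a next row still exists.
        while (i < len(df) - 1 and
               (df.at[i, 'End_Time'] - df.at[i, 'Start_Time']).total_seconds() < min_seconds):
            df.at[i, 'Text'] = df.at[i, 'Text'] + ' ' + df.at[i + 1, 'Text']
            df.at[i, 'End_Time'] = df.at[i + 1, 'End_Time']
            # Remove the merged row and renumber so i + 1 is valid again.
            df = df.drop(i + 1).reset_index(drop=True)
        i += 1
    return df

# Made-up sample data, not taken from the original SRT file.
demo = pd.DataFrame({
    'Start_Time': pd.to_timedelta([0, 2, 4, 12, 14], unit='s'),
    'End_Time': pd.to_timedelta([2, 4, 11, 14, 15], unit='s'),
    'Text': ['a', 'b', 'c', 'd', 'e'],
})
merged = merge_short_rows(demo)
```

With this sample the five rows collapse into two. The final row can still be shorter than 5 seconds, because there is nothing left to merge it with.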

CodePudding user response:

You can define an empty list, then loop over the rows of your dataframe and, whenever a row doesn't fulfil your condition, save its index to that list.

After that do the following:

df = df.drop(index=your_indices)
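A minimal sketch of that idea, using made-up sample data and the hypothetical name to_drop for the list. Note that it only removes the short rows; it does not merge their text into a neighbouring row.

```python
import pandas as pd

# Made-up sample data for illustration.
df = pd.DataFrame({'Text': ['a', 'b', 'c', 'd'],
                   'Difference': [6.0, 2.0, 7.0, 1.0]})

# Collect the labels first; the dataframe is not modified while looping.
to_drop = []
for idx, row in df.iterrows():
    if row['Difference'] < 5:
        to_drop.append(idx)

# Drop everything in one call, then renumber the remaining rows.
result = df.drop(index=to_drop).reset_index(drop=True)
```

Because nothing is deleted during the loop, every label the loop visits still exists, which avoids the KeyError.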

CodePudding user response:

Without seeing your data I can't give a precise solution, but the code below should serve as an example of how to accomplish what you are trying to do:

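import pandas as pd

# Seconds each subtitle stays on screen; total_seconds() yields plain floats
dfClean['Difference'] = (dfClean['End_Time'] - dfClean['Start_Time']).dt.total_seconds()

tmp_diff = 0
tmp_txt = ''
tmp_start = None
new_data = []
for i, row in dfClean.iterrows():
    if tmp_start is None:
        tmp_start = row['Start_Time']  # first row of the current group
    # Accumulate the text and duration of the current row
    tmp_txt = ' '.join([tmp_txt, row['Text']]).strip()
    tmp_diff += row['Difference']
    # Emit the group once it is long enough, or when the last row is reached
    if tmp_diff >= 5 or i == dfClean.index[-1]:
        new_row = dict(row)
        new_row['Start_Time'] = tmp_start
        new_row['Text'] = tmp_txt
        new_row['End_Time'] = row['End_Time']
        new_row['Difference'] = tmp_diff
        new_data.append(new_row)

        tmp_txt = ''
        tmp_diff = 0
        tmp_start = None

new_df = pd.DataFrame(new_data)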
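Since the question needs the same merge applied to both dfClean and dfSRTForm, one possible variation is to compute a group id per row once and then collapse both dataframes with the same ids. This is a sketch, not a tested solution for the real data: assign_groups and merge_by_groups are hypothetical helper names, the sample frames are made up, and both dataframes are assumed to share the same index. Like the loop above, it sums the row durations and ignores any gaps between subtitles.

```python
import pandas as pd

def assign_groups(durations, min_seconds=5):
    """Give consecutive rows one group id until their summed duration reaches min_seconds."""
    groups, group_id, total = [], 0, 0.0
    for d in durations:
        groups.append(group_id)
        total += d
        if total >= min_seconds:
            group_id += 1
            total = 0.0
    return groups

def merge_by_groups(df, groups):
    """Collapse each group into one row: first start, last end, texts joined by spaces."""
    key = pd.Series(groups, index=df.index)
    return (df.groupby(key)
              .agg(Start_Time=('Start_Time', 'first'),
                   End_Time=('End_Time', 'last'),
                   Text=('Text', ' '.join))
              .reset_index(drop=True))

# Made-up stand-ins for dfClean and dfSRTForm.
clean = pd.DataFrame({
    'Start_Time': pd.to_timedelta([0, 2, 4, 12], unit='s'),
    'End_Time': pd.to_timedelta([2, 4, 11, 14], unit='s'),
    'Text': ['a', 'b', 'c', 'd'],
})
srt = pd.DataFrame({
    'Start_Time': ['00:00:00,000', '00:00:02,000', '00:00:04,000', '00:00:12,000'],
    'End_Time': ['00:00:02,000', '00:00:04,000', '00:00:11,000', '00:00:14,000'],
    'Text': ['a', 'b', 'c', 'd'],
})

durations = (clean['End_Time'] - clean['Start_Time']).dt.total_seconds()
groups = assign_groups(durations)
clean_merged = merge_by_groups(clean, groups)
srt_merged = merge_by_groups(srt, groups)
```

Computing the ids from the parsed timestamps and reusing them on the string-formatted frame keeps the two outputs row-for-row aligned without deleting anything mid-loop.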