How to delete a row while iterating over a dataframe?

I'm trying to do the following with an SRT (subtitles) file:

while a row does not appear on the screen for at least 5s
add text from the next row to current row with a space between AND replace current End_Time with next row End_Time
delete next row
go to next row

I have to do this on the dataframe dfClean, which has the edited timestamp fields, and then do the same on dfSRTForm, the dataframe with the original SRT time format, so that I can later export the latter as an SRT file.

My code to do that is this:

for i in dfClean.index:
    while dfClean.at[i, 'Difference'] < 5:
        dfClean.at[i, 'Text'] = dfClean.at[i, 'Text'] + ' ' + dfClean.at[i + 1, 'Text']
        dfSRTForm.at[i, 'Text'] = dfSRTForm.at[i, 'Text'] + ' ' + dfSRTForm.at[i + 1, 'Text']

        dfClean.at[i, 'End_Time'] = dfClean.at[i + 1, 'End_Time']
        dfSRTForm.at[i, 'End_Time'] = dfSRTForm.at[i + 1, 'End_Time']

        dfClean = dfClean.drop(i + 1)
        dfSRTForm = dfSRTForm.drop(i + 1)

But I get this error:

KeyError: 3

UPDATE (keeping the previous code in case anyone else has the same issue): I found a way to reset the index to avoid KeyError: 3.

My current code is:

for i in dfClean.index:
    while dfClean.at[i, 'Difference'] < 5:
        dfClean.at[i, 'Text'] = dfClean.at[i, 'Text'] + ' ' + dfClean.at[i + 1, 'Text']
        dfSRTForm.at[i, 'Text'] = dfSRTForm.at[i, 'Text'] + ' ' + dfSRTForm.at[i + 1, 'Text']

        dfClean.at[i, 'End_Time'] = dfClean.at[i + 1, 'End_Time']
        dfSRTForm.at[i, 'End_Time'] = dfSRTForm.at[i + 1, 'End_Time']

        dfClean = dfClean.drop(i + 1)
        dfSRTForm = dfSRTForm.drop(i + 1)

        dfClean = dfClean.reset_index()
        dfClean = dfClean.drop(columns='index')

        dfSRTForm = dfSRTForm.reset_index()
        dfSRTForm = dfSRTForm.drop(columns='index')

        dfClean['Difference'] = (dfClean['End_Time'] - dfClean['Start_Time']).astype('timedelta64[s]')

But now I get KeyError: 267, and I'm pretty sure it's because the loop condenses the dataframe down to 266 rows.

Is there a way to put "or end of index" or "or last row" in the while loop without hard-coding the 266 lines? I want to use it for other SRT files with varying numbers of rows.

CodePudding user response：

You can define an empty list, then loop over your dataframe rows and, whenever a row doesn't fulfil your condition, save its index to that list.

After that do the following:

df = df.drop(index=your_indices)
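
To make the idea above concrete, here is a minimal sketch. The sample data and the "under 5 seconds" condition are assumptions for illustration only; the point is that the dataframe is not modified while it is being iterated, so no label disappears mid-loop.

```python
import pandas as pd

# Hypothetical sample data; 'Difference' is the number of seconds on screen
df = pd.DataFrame({
    'Text': ['Hi', 'there', 'How are you today?', 'Fine'],
    'Difference': [2.0, 6.0, 7.0, 1.0],
})

your_indices = []
for i, row in df.iterrows():
    # Example condition (an assumption): flag rows shown for under 5 seconds
    if row['Difference'] < 5:
        your_indices.append(i)

# Drop all flagged rows in one call, after the loop has finished
df = df.drop(index=your_indices)
print(df)
```

Note that this only removes rows. If the text of a removed row has to be merged into a neighbour, as in the question, that merge still needs to happen before the drop.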

CodePudding user response：

Without looking at your data I can't give a precise solution, but the code below should serve as an example of how to accomplish what you are doing:

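import pandas as pd

# Seconds each subtitle stays on screen (total_seconds() gives plain floats)
dfClean['Difference'] = (dfClean['End_Time'] - dfClean['Start_Time']).dt.total_seconds()

new_data = []
current = None  # merged row currently being built
for _, row in dfClean.iterrows():
    if current is None:
        # Start a new merged row from this subtitle
        current = dict(row)
    else:
        # The merged row is still too short: absorb this subtitle into it
        current['Text'] = ' '.join([current['Text'], row['Text']])
        current['End_Time'] = row['End_Time']
        current['Difference'] += row['Difference']
    if current['Difference'] >= 5:
        new_data.append(current)
        current = None

# Keep a trailing group that never reached 5 seconds
if current is not None:
    new_data.append(current)

new_df = pd.DataFrame(new_data)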
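
Since the question needs the same merge applied to both dfClean and dfSRTForm, one possible approach is to compute a group label per row once and then aggregate both dataframes with it. The sketch below is only an illustration: the helper names merge_groups and collapse and the sample rows are made up, it assumes both dataframes have the same rows in the same order, and it sums per-row durations as the loop above does.

```python
import pandas as pd

def merge_groups(durations, min_seconds=5):
    """Return one group label per row; a group closes once it reaches min_seconds."""
    labels, group, total = [], 0, 0.0
    for d in durations:
        labels.append(group)
        total += d
        if total >= min_seconds:
            group += 1
            total = 0.0
    return labels

def collapse(df, labels):
    """Merge rows sharing a label: first start, last end, texts joined by spaces."""
    keys = pd.Series(labels, index=df.index)
    merged = df.groupby(keys).agg(
        {'Start_Time': 'first', 'End_Time': 'last', 'Text': ' '.join}
    )
    return merged.reset_index(drop=True)

# Hypothetical sample data standing in for the two dataframes in the question
texts = ['Hello', 'there,', 'how are you?', 'Fine.', 'Bye.']
dfClean = pd.DataFrame({
    'Start_Time': pd.to_timedelta([0, 2, 4, 10, 16], unit='s'),
    'End_Time': pd.to_timedelta([2, 4, 10, 16, 18], unit='s'),
    'Text': texts,
})
dfSRTForm = pd.DataFrame({
    'Start_Time': ['00:00:00,000', '00:00:02,000', '00:00:04,000',
                   '00:00:10,000', '00:00:16,000'],
    'End_Time': ['00:00:02,000', '00:00:04,000', '00:00:10,000',
                 '00:00:16,000', '00:00:18,000'],
    'Text': texts,
})

# Labels are derived from dfClean only, then reused for both dataframes
seconds = (dfClean['End_Time'] - dfClean['Start_Time']).dt.total_seconds()
labels = merge_groups(seconds)

merged_clean = collapse(dfClean, labels)
merged_srt = collapse(dfSRTForm, labels)
print(merged_srt)
```

Because nothing is dropped inside a loop, there is no index to run past, so this works for any number of rows without hard-coding the length.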