Restart for loop in dataframe

I have this dataframe:

   index turns conv
0      0  utt1  yes
1      1  utt2  yes
2      2  utt3   no
3      3  utt4  yes
4      0  utt5  yes
5      1  utt6   no
6      2  utt7  yes

I want to print each pair of consecutive elements of the 'turns' column together with the 'conv' value of the second element, but restart the pairing whenever the 'index' column goes back to 0, so that utt4 and utt5 are not paired. The code I have is this:

for i in range(len(df['turns'])):
    if i + 1 == len(df['turns']):
        break
    else:
        print(df['turns'][i], df['turns'][i + 1], df['conv'][i + 1])

But currently it outputs:

utt1 utt2 yes
utt2 utt3 no
utt3 utt4 yes
utt4 utt5 yes
utt5 utt6 no
utt6 utt7 yes

Whereas I need it to output:

utt1 utt2 yes
utt2 utt3 no
utt3 utt4 yes

utt5 utt6 no
utt6 utt7 yes

(The idea is a sliding window of size 2 that restarts at each new conversation, but I couldn't figure out a simpler way to do it.)

Answer:

If you just want to print, you could change your loop to:

for i in range(len(df['turns']) - 1):
    if df.loc[i + 1, 'index'] == 0:
        print()
    else:
        print(df['turns'][i], df['turns'][i + 1], df['conv'][i + 1])

output:

utt1 utt2 yes
utt2 utt3 no
utt3 utt4 yes

utt5 utt6 no
utt6 utt7 yes
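The same sliding-window idea can also be written without positional index arithmetic by splitting the frame into conversations first and pairing turns inside each one with `zip`. This is a minimal sketch, assuming pandas is installed and that a new conversation starts wherever the 'index' column is 0; `window_pairs` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

df = pd.DataFrame({
    'index': [0, 1, 2, 3, 0, 1, 2],
    'turns': ['utt1', 'utt2', 'utt3', 'utt4', 'utt5', 'utt6', 'utt7'],
    'conv': ['yes', 'yes', 'no', 'yes', 'yes', 'no', 'yes'],
})

def window_pairs(df):
    """Hypothetical helper: one list of (previous turn, turn, conv) tuples per conversation.

    Assumes a new conversation starts wherever df['index'] == 0.
    """
    # Conversation id that increments every time 'index' returns to 0
    group = df['index'].eq(0).cumsum()
    result = []
    for _, g in df.groupby(group, sort=False):
        turns = g['turns'].tolist()
        convs = g['conv'].tolist()
        # zip(turns, turns[1:]) is a size-2 sliding window inside one conversation,
        # and convs[1:] picks the 'conv' value of the second turn in each pair
        result.append(list(zip(turns, turns[1:], convs[1:])))
    return result

for conversation in window_pairs(df):
    for prev_turn, turn, conv in conversation:
        print(prev_turn, turn, conv)
    print()  # blank line after each conversation
```

Because the window never crosses a group boundary, utt4 and utt5 are never paired, and no explicit check on the last row is needed.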

A vectorized solution would be:

group = df['index'].eq(0).cumsum()
(df
 .assign(turns2=df.groupby(group)['turns'].shift())
 .dropna(subset=['turns2'])
 [['turns2', 'turns', 'conv']]
 .to_csv('out.csv', index=False, header=False, sep=' ')
)

out.csv:

utt1 utt2 yes
utt2 utt3 no
utt3 utt4 yes
utt5 utt6 no
utt6 utt7 yes
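To see why the vectorized version works, it may help to inspect the intermediate values. `eq(0).cumsum()` gives every row a conversation id that increases at each row where 'index' is 0, and `shift()` inside each group moves every turn down one row, leaving NaN on the first row of each conversation; those are exactly the rows `dropna` removes. This sketch rebuilds the sample frame and keeps the pairs in a DataFrame instead of writing out.csv (the names `prev`, `steps` and `pairs` are only illustrative).

```python
import pandas as pd

df = pd.DataFrame({
    'index': [0, 1, 2, 3, 0, 1, 2],
    'turns': ['utt1', 'utt2', 'utt3', 'utt4', 'utt5', 'utt6', 'utt7'],
    'conv': ['yes', 'yes', 'no', 'yes', 'yes', 'no', 'yes'],
})

# Conversation id: increments each time 'index' returns to 0
group = df['index'].eq(0).cumsum()

# Previous turn within the same conversation (NaN on each conversation's first row)
prev = df.groupby(group)['turns'].shift()

steps = df.assign(group=group, turns2=prev)
print(steps)

# Drop the rows without a previous turn and keep the columns in output order
pairs = steps.dropna(subset=['turns2'])[['turns2', 'turns', 'conv']]
print(pairs.to_string(index=False, header=False))
```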