I have this dataframe:
   index turns conv
0      0  utt1  yes
1      1  utt2  yes
2      2  utt3   no
3      3  utt4  yes
4      0  utt5  yes
5      1  utt6   no
6      2  utt7  yes
I want to print each pair of consecutive elements of the 'turns' column together with the 'conv' value of the second element, but start over whenever the 'index' column goes back to 0, so that utt4 and utt5 don't get connected. The code I have is this:
for i in range(len(df['turns'])):
    if i + 1 == len(df['turns']):
        break
    else:
        print(df['turns'][i], df['turns'][i + 1], df['conv'][i + 1])
But currently it outputs:
utt1 utt2 yes
utt2 utt3 no
utt3 utt4 yes
utt4 utt5 yes
utt5 utt6 no
utt6 utt7 yes
Whereas I need it to output:
utt1 utt2 yes
utt2 utt3 no
utt3 utt4 yes
utt5 utt6 no
utt6 utt7 yes
(The idea is that of a sliding window but I couldn't figure out how to do that in a simpler way)
CodePudding user response:
If you just want to print, you could change your loop to:
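for i in range(len(df['turns']) - 1):
    # the next row starts a new conversation: don't pair across the boundary
    if df.loc[i + 1, 'index'] == 0:
        print()  # prints an empty separator line
    else:
        print(df['turns'][i], df['turns'][i + 1], df['conv'][i + 1])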
output (the empty print() leaves a blank line where the second conversation starts):
utt1 utt2 yes
utt2 utt3 no
utt3 utt4 yes

utt5 utt6 no
utt6 utt7 yes
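If the blank separator line is not wanted, the boundary row can simply be skipped instead. Below is a minimal, self-contained sketch of that variant. It assumes the sample data from the question and a default 0..n-1 row index; the helper name `adjacent_pairs` is made up for illustration and is not part of pandas.

```python
import pandas as pd

# Sample data from the question; 'index' restarts at 0 for each conversation.
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 0, 1, 2],
    "turns": ["utt1", "utt2", "utt3", "utt4", "utt5", "utt6", "utt7"],
    "conv": ["yes", "yes", "no", "yes", "yes", "no", "yes"],
})


def adjacent_pairs(frame):
    """Collect (previous turn, current turn, current conv) within each conversation.

    Hypothetical helper; assumes the frame has a default RangeIndex.
    """
    rows = []
    for i in range(len(frame) - 1):
        # Row i + 1 opens a new conversation, so do not pair it with row i.
        if frame.loc[i + 1, "index"] == 0:
            continue
        rows.append(
            (frame.loc[i, "turns"], frame.loc[i + 1, "turns"], frame.loc[i + 1, "conv"])
        )
    return rows


pairs = adjacent_pairs(df)
for prev_turn, turn, conv in pairs:
    print(prev_turn, turn, conv)
```

Returning the pairs as a list before printing keeps the windowing logic separate from the output, so the same rows can be reused elsewhere.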
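A vectorized solution would be to label each conversation with a running count of the rows where 'index' is 0, then shift 'turns' within each group: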
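# conversation id: increases by one each time 'index' is 0
group = df['index'].eq(0).cumsum()
(df
 .assign(turns2=df.groupby(group)['turns'].shift())  # previous turn in the same conversation
 .dropna(subset=['turns2'])                          # first turn of each conversation has no predecessor
 [['turns2', 'turns', 'conv']]
 .to_csv('out.csv', index=False, header=False, sep=' ')
)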
out.csv:
utt1 utt2 yes
utt2 utt3 no
utt3 utt4 yes
utt5 utt6 no
utt6 utt7 yes
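To see the grouped shift at work without writing a file, the same pipeline can be run end to end and printed. This is only a sketch under the assumption that the data matches the sample in the question; the variable name `result` is introduced here for illustration.

```python
import pandas as pd

# Sample data from the question; 'index' restarts at 0 for each conversation.
df = pd.DataFrame({
    "index": [0, 1, 2, 3, 0, 1, 2],
    "turns": ["utt1", "utt2", "utt3", "utt4", "utt5", "utt6", "utt7"],
    "conv": ["yes", "yes", "no", "yes", "yes", "no", "yes"],
})

# Conversation id: the cumulative count of rows where 'index' equals 0.
group = df["index"].eq(0).cumsum()

result = (
    df.assign(turns2=df.groupby(group)["turns"].shift())  # previous turn per conversation
    .dropna(subset=["turns2"])                            # drop the first turn of each conversation
    [["turns2", "turns", "conv"]]
)

# Print one space-separated line per remaining row.
for row in result.itertuples(index=False):
    print(*row)
```

Because the shift happens inside each group, the first row of every conversation gets a missing value in `turns2` and is dropped, which is what keeps utt4 and utt5 from being paired.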