Rename/edit string if not first in index

I have a pandas DataFrame containing data in the following format:

sentence_num	sent_word	tag	word_char	word_index
0	foo	B-foo	f	1
0	foo	B-foo	o	1
0	foo	B-foo	o	1
0	[ ]	B-ws	[ ]	2
0	bar	B-bar	b	3
0	bar	B-bar	a	3
0	bar	B-bar	r	3
1	john	B-name	j	1
1	john	B-name	o	1
1	john	B-name	h	1
1	john	B-name	n	1
1	[ ]	B-ws	[ ]	2
1	doe	B-sur	d	3
1	doe	B-sur	o	3
1	doe	B-sur	e	3

I want to change the tag prefix from B- to I- for every character that is not the first character of its word:

sentence_num	sent_word	tag	word_char	word_index
0	foo	B-foo	f	1
0	foo	I-foo	o	1
0	foo	I-foo	o	1
0	[ ]	B-ws	[ ]	2
0	bar	B-bar	b	3
0	bar	I-bar	a	3
0	bar	I-bar	r	3
1	john	B-name	j	1
1	john	I-name	o	1
1	john	I-name	h	1
1	john	I-name	n	1
1	[ ]	B-ws	[ ]	2
1	doe	B-sur	d	3
1	doe	I-sur	o	3
1	doe	I-sur	e	3

Since word_index repeats across sentences and sentence_num alone does not help much, I am not sure how to group the data so that I can reach the rows I want to edit.

Answer:

Use boolean indexing:

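# True where word_char differs from the first letter of sent_word
# and the row is not the whitespace token "[ ]".
# Caveat: a later char equal to the first letter (e.g. the last "a" in "anna")
# is not caught by this test.
m = (df['sent_word'].str[0].ne(df['word_char'])
     & df['sent_word'].ne('[ ]'))

# for those rows, replace the leading B with I
df.loc[m, 'tag'] = 'I' + df.loc[m, 'tag'].str[1:]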

output:

    sentence_num sent_word     tag word_char  word_index
0              0       foo   B-foo         f           1
1              0       foo   I-foo         o           1
2              0       foo   I-foo         o           1
3              0       [ ]    B-ws       [ ]           2
4              0       bar   B-bar         b           3
5              0       bar   I-bar         a           3
6              0       bar   I-bar         r           3
7              1      john  B-name         j           1
8              1      john  I-name         o           1
9              1      john  I-name         h           1
10             1      john  I-name         n           1
11             1       [ ]    B-ws       [ ]           2
12             1       doe   B-sur         d           3
13             1       doe   I-sur         o           3
14             1       doe   I-sur         e           3
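An alternative that addresses the grouping question directly is sketched below. It assumes that the pair (sentence_num, word_index) uniquely identifies a word and that each word's characters appear in their original order in the rows. Under those assumptions, groupby(...).cumcount() numbers the characters of every word 0, 1, 2, ..., so "not the first character" is simply a position greater than 0. This also handles words whose first letter repeats (such as "anna") and needs no special case for "[ ]", which is a single-row word. The data-building part and the names words, rows and pos_in_word are illustrative only, added to make the sketch runnable.

```python
import pandas as pd

# Rebuild the sample data from the question (illustrative): one row per character.
# "[ ]" stands for the whitespace token, as in the question's table.
words = {
    0: [("foo", "foo"), ("[ ]", "ws"), ("bar", "bar")],
    1: [("john", "name"), ("[ ]", "ws"), ("doe", "sur")],
}
rows = []
for sent_num, sent_words in words.items():
    for word_index, (word, label) in enumerate(sent_words, start=1):
        # the whitespace token occupies a single row of its own
        chars = [word] if word == "[ ]" else list(word)
        for ch in chars:
            rows.append((sent_num, word, f"B-{label}", ch, word_index))

df = pd.DataFrame(
    rows, columns=["sentence_num", "sent_word", "tag", "word_char", "word_index"]
)

# Each word is identified by (sentence_num, word_index); cumcount numbers
# the rows of each word 0, 1, 2, ... in their order of appearance.
pos_in_word = df.groupby(["sentence_num", "word_index"]).cumcount()

# Every character after the first gets the I- prefix instead of B-.
m = pos_in_word > 0
df.loc[m, "tag"] = "I" + df.loc[m, "tag"].str[1:]
print(df)
```

Because the test is positional rather than a comparison of letters, it does not depend on the content of sent_word or word_char at all.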