Rename tag string if the character is not the first in its word

Time:10-19

I have a pandas DataFrame containing information in the following format:

sentence_num  sent_word  tag     word_char  word_index
0             foo        B-foo   f          1
0             foo        B-foo   o          1
0             foo        B-foo   o          1
0             [ ]        B-ws    [ ]        2
0             bar        B-bar   b          3
0             bar        B-bar   a          3
0             bar        B-bar   r          3
1             john       B-name  j          1
1             john       B-name  o          1
1             john       B-name  h          1
1             john       B-name  n          1
1             [ ]        B-ws    [ ]        2
1             doe        B-sur   d          3
1             doe        B-sur   o          3
1             doe        B-sur   e          3
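To experiment with this, the sample input can be rebuilt as a DataFrame. This is a sketch that assumes the columns hold plain strings and integers, with "[ ]" kept as a literal placeholder for the whitespace token, exactly as it appears in the table:

```python
import pandas as pd

# One row per character, copied from the sample input.
# "[ ]" is the whitespace token, kept as a literal string.
rows = [
    (0, "foo", "B-foo", "f", 1),
    (0, "foo", "B-foo", "o", 1),
    (0, "foo", "B-foo", "o", 1),
    (0, "[ ]", "B-ws", "[ ]", 2),
    (0, "bar", "B-bar", "b", 3),
    (0, "bar", "B-bar", "a", 3),
    (0, "bar", "B-bar", "r", 3),
    (1, "john", "B-name", "j", 1),
    (1, "john", "B-name", "o", 1),
    (1, "john", "B-name", "h", 1),
    (1, "john", "B-name", "n", 1),
    (1, "[ ]", "B-ws", "[ ]", 2),
    (1, "doe", "B-sur", "d", 3),
    (1, "doe", "B-sur", "o", 3),
    (1, "doe", "B-sur", "e", 3),
]
df = pd.DataFrame(
    rows,
    columns=["sentence_num", "sent_word", "tag", "word_char", "word_index"],
)
print(df)
```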

I want to rename the tag (change the B- prefix to I-) whenever the character is not the first one in its word:

sentence_num  sent_word  tag     word_char  word_index
0             foo        B-foo   f          1
0             foo        I-foo   o          1
0             foo        I-foo   o          1
0             [ ]        B-ws    [ ]        2
0             bar        B-bar   b          3
0             bar        I-bar   a          3
0             bar        I-bar   r          3
1             john       B-name  j          1
1             john       I-name  o          1
1             john       I-name  h          1
1             john       I-name  n          1
1             [ ]        B-ws    [ ]        2
1             doe        B-sur   d          3
1             doe        I-sur   o          3
1             doe        I-sur   e          3

Since word_index repeats in every sentence and sentence_num on its own does not help much, I am not sure how to group the data so that I can reach the rows I want to edit.
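One way to reach those rows is to group on both columns and number the rows inside each group with GroupBy.cumcount(). This is a sketch that assumes each word is identified by the pair (sentence_num, word_index) and that its characters appear in order; any row with position greater than 0 is not the first character of its word:

```python
import pandas as pd

# Sample data in the question's format (one row per character).
df = pd.DataFrame({
    "sentence_num": [0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1, 1, 1, 1, 1],
    "sent_word": ["foo"] * 3 + ["[ ]"] + ["bar"] * 3
                 + ["john"] * 4 + ["[ ]"] + ["doe"] * 3,
    "tag": ["B-foo"] * 3 + ["B-ws"] + ["B-bar"] * 3
           + ["B-name"] * 4 + ["B-ws"] + ["B-sur"] * 3,
    "word_char": list("foo") + ["[ ]"] + list("bar")
                 + list("john") + ["[ ]"] + list("doe"),
    "word_index": [1, 1, 1, 2, 3, 3, 3, 1, 1, 1, 1, 2, 3, 3, 3],
})

# word_index restarts in every sentence, so a word is the pair
# (sentence_num, word_index). cumcount() numbers the rows of each
# group 0, 1, 2, ... in row order, i.e. the character position.
pos = df.groupby(["sentence_num", "word_index"]).cumcount()

# Every character after the first gets the I- prefix.
not_first = pos > 0
df.loc[not_first, "tag"] = "I-" + df.loc[not_first, "tag"].str[2:]
print(df)
```

Because each whitespace token is a single row, its position is 0, so it keeps B-ws without a special case.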

Answer:

Use boolean indexing:

# is word_char different from the first letter of sent_word?
# and is sent_word not the whitespace token "[ ]"?
m = (df['sent_word'].str[0].ne(df['word_char'])
     & df['sent_word'].ne('[ ]'))

# for those rows, replace the leading B with I
df.loc[m, 'tag'] = 'I' + df.loc[m, 'tag'].str[1:]

output:

    sentence_num sent_word     tag word_char  word_index
0              0       foo   B-foo         f           1
1              0       foo   I-foo         o           1
2              0       foo   I-foo         o           1
3              0       [ ]    B-ws       [ ]           2
4              0       bar   B-bar         b           3
5              0       bar   I-bar         a           3
6              0       bar   I-bar         r           3
7              1      john  B-name         j           1
8              1      john  I-name         o           1
9              1      john  I-name         h           1
10             1      john  I-name         n           1
11             1       [ ]    B-ws       [ ]           2
12             1       doe   B-sur         d           3
13             1       doe   I-sur         o           3
14             1       doe   I-sur         e           3
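One caveat with the letter comparison: it checks whether a character equals the first letter, not whether it is the first character, so a later repeat of the first letter (as in the hypothetical word "anna") stays B-. A position-based mask avoids this. The sketch below assumes the rows of each word are contiguous and in order, and treats a row as a word start when its (sentence_num, word_index) pair differs from the previous row's:

```python
import pandas as pd

# Hypothetical word whose first letter repeats later on ("anna").
df = pd.DataFrame({
    "sentence_num": [0, 0, 0, 0],
    "sent_word": ["anna"] * 4,
    "tag": ["B-name"] * 4,
    "word_char": list("anna"),
    "word_index": [1, 1, 1, 1],
})

# The letter comparison: the final "a" equals the first letter,
# so it is not selected and would keep B-name.
by_letter = df["sent_word"].str[0].ne(df["word_char"]) & df["sent_word"].ne("[ ]")

# Position-based mask: compare each row's key with the previous row's.
# shift() puts NaN in the first row, which compares as "different".
keys = df[["sentence_num", "word_index"]]
starts_word = keys.ne(keys.shift()).any(axis=1)
not_first = ~starts_word

df.loc[not_first, "tag"] = "I-" + df.loc[not_first, "tag"].str[2:]
print(by_letter.tolist())  # [False, True, True, False]
print(df["tag"].tolist())  # ['B-name', 'I-name', 'I-name', 'I-name']
```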
