NLP : Find word position in string if I have only character position

Time:05-20

I have a dataframe with two columns, text and char_position. The first column holds a sentence and the second holds the position of the starting character of a specific word. For example:

text                       char_position
This is an example sentence.   11

Here char_position points to the starting character of the word example.

My question is: is there a way to create a new column named word_position that holds the position of the word that char_position points to? In this example it would be word_position = 3.

CodePudding user response:

Counting the spaces before the character position would work:

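text = "This is an example sentence."
char_pos = 11

# Count the whitespace characters that precede char_pos;
# each one marks the end of an earlier word.
word_position = 0
for i in range(char_pos):
    if text[i].isspace():
        word_position += 1

print(word_position)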

output:

3

You have to test the edge cases yourself (for example, consecutive spaces are each counted separately).
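A few of those edge cases can be covered by matching whole runs of non-whitespace instead of counting single spaces. This is only a minimal sketch: the helper name word_index_at is hypothetical, and it assumes that a "word" means any whitespace-delimited token and that positions are 0-based, as in the question.

```python
import re

def word_index_at(text, char_pos):
    """Return the 0-based index of the whitespace-delimited token
    that contains char_pos, or None if char_pos falls on whitespace
    or outside the string."""
    for i, m in enumerate(re.finditer(r"\S+", text)):
        # m.end() is exclusive, so the token covers start <= pos < end
        if m.start() <= char_pos < m.end():
            return i
    return None

print(word_index_at("This is an example sentence.", 11))  # 3
print(word_index_at("This  is an example", 12))           # 3 (double space)
print(word_index_at("This is an example sentence.", 14))  # 3 (mid-word)
print(word_index_at("This is an example sentence.", 4))   # None (a space)
```

Returning None for a position that lands on whitespace is a design choice here; you may prefer to raise an error or map it to the next word instead.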

CodePudding user response:

Using a custom function and a list comprehension:

import re

def get_word(string, pos):
    # Return the 0-based index of the first word that ends at or after pos
    for i, m in enumerate(re.finditer(r'\w+', string)):
        if m.end() >= pos:
            return i
            # to get the word itself, use instead:
            # return m.group()

df['word_position'] = [get_word(t, p) for t, p in
                       zip(df['text'], df['char_position'])]

output:

                           text  char_position  word_position
0  This is an example sentence.             11              3
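Note that counting spaces and matching \w+ are two different definitions of a word, so they can disagree once the text contains apostrophes or other punctuation inside a token. The sketch below compares the two; the function names index_by_spaces and index_by_regex are hypothetical wrappers around the two ideas in this thread, not part of either answer.

```python
import re

def index_by_spaces(text, char_pos):
    # Space-counting idea: number of whitespace characters before char_pos
    return sum(ch.isspace() for ch in text[:char_pos])

def index_by_regex(text, char_pos):
    # Regex idea: index of the first \w+ match ending at or after char_pos
    for i, m in enumerate(re.finditer(r"\w+", text)):
        if m.end() >= char_pos:
            return i
    return None

text = "I don't like example sentences."
char_pos = text.index("example")  # 13

print(index_by_spaces(text, char_pos))  # 3
print(index_by_regex(text, char_pos))   # 4, because "don't" splits into "don" and "t"
```

Which result is right depends on how the word positions will be used later, so it is worth choosing one tokenization and applying it consistently.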