I have a dataframe with 2 columns: text and char_position. The first column holds a sentence and the second holds the position of the starting character of a specific word. For example:
text                          char_position
This is an example sentence.  11
Here the char_position points to the starting character of the word example.
My question is: is there a way to create a new column named word_position that holds the position of the word that char_position refers to? I.e., in this example it would be word_position = 3.
CodePudding user response:
Counting the spaces before char_position would work:
text = "This is an example sentence."
char_pos = 11
word_position = 0
# count the whitespace characters that precede char_pos
for i in range(char_pos):
    if text[i].isspace():
        word_position += 1
print(word_position)
output:
3
You will have to test the edge cases yourself (for example, consecutive spaces or leading whitespace would inflate the count).
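As a rough sketch of how some of those edge cases could be handled, the variant below counts whole words before the position with str.split() instead of counting individual spaces, so runs of whitespace are not counted twice. The function name word_index is hypothetical (it is not from the answer above), and the result is 0-based, matching the output shown.

```python
def word_index(text, char_pos):
    """Return the 0-based index of the word at char_pos (hypothetical helper)."""
    # split() with no arguments collapses runs of whitespace and
    # ignores leading whitespace, so each word is counted once.
    words_before = len(text[:char_pos].split())
    # If the previous character is not whitespace, char_pos is inside
    # (or just past) a word that was already counted; do not count it.
    if char_pos > 0 and not text[char_pos - 1].isspace():
        words_before -= 1
    return words_before


print(word_index("This is an example sentence.", 11))
print(word_index("This  is", 6))
```

This assumes char_pos is a valid 0-based offset into text; it has not been checked against tabs, newlines, or positions past the end of the string.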
CodePudding user response:
Using a custom function and a list comprehension:
import re

def get_word(string, pos):
    # walk over the words in order and return the index of the
    # first one that ends at or after pos
    for i, m in enumerate(re.finditer(r'\w+', string)):
        if m.end() >= pos:
            return i
            # to get the word use
            # return m.group()

df['word_position'] = [get_word(t, p) for t, p in
                       zip(df['text'], df['char_position'])]
output:
                           text  char_position  word_position
0  This is an example sentence.             11              3
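One thing worth checking with the regex approach is what counts as a word. The sketch below is a hypothetical variant (get_word_index and its pattern parameter are not part of the answer above) that makes the pattern configurable, to show that \w+ splits "don't" into two matches while \S+ treats any whitespace-delimited run as one word. It assumes pos points at the first character of a word.

```python
import re


def get_word_index(string, pos, pattern=r"\w+"):
    """Return the 0-based index of the first match ending at or after pos."""
    for i, m in enumerate(re.finditer(pattern, string)):
        if m.end() >= pos:
            return i
    return None  # pos lies beyond the last match


text = "I don't know this example."
pos = text.index("know")

# With \w+ the apostrophe separates "don" and "t", so "know" is match 3.
print(get_word_index(text, pos, r"\w+"))
# With \S+ "don't" is a single match, so "know" is match 2.
print(get_word_index(text, pos, r"\S+"))
```

Which pattern is appropriate depends on how the char_position values were produced, so it is worth comparing a few rows by hand before relying on either.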