Fill in the word that a letter is located in

I am processing keystroke data and need to find the word that each keystroke falls within. Because there can be invisible keystrokes (like Shift), I can't simply use the keystroke's index to look up the word. Instead, I need to find the space-delimited word in which the keystroke was produced. I have both the full (final) text and the existing text at each keystroke available, which I should be able to leverage. I've tried solutions using fill(), lag(), and cumsum(), but none have worked.

I have a data frame like the one below, which I group by experiment_id:

library(dplyr)  # re-exports tibble() and %>%

x <- tibble(
  experiment_id = rep(c('1a','1b'),each=10),
  keystroke = rep(c('a','SPACE','SHIFT','b','a','d','SPACE','m','a','n'),2),
  existing_text = rep(c('a','a ','a ','a B','a Ba','a Bad','a Bad ',
                    'a Bad m','a Bad ma','a Bad man'),2),
  final_text = 'a Bad man'
)

The additional column should look like this within each experiment_id:

within_word = c('a',NA,'Bad','Bad','Bad','Bad',NA,'man','man','man')

Is there a way to derive this?

CodePudding user response:

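One approach is to compare each existing_text with the previous one to recover the character each keystroke added (ww), start a new group at every space and again right after it, and then paste together the characters in each group. Doing this within experiment_id keeps lag() from reaching across experiments:

library(dplyr)
library(stringr)

x %>%
  group_by(experiment_id) %>%
  mutate(ww = str_remove(existing_text, fixed(lag(existing_text, default = "."))),  # text added by this keystroke
         grp = cumsum(ww == ' ' | lag(ww == ' ', default = FALSE))) %>%             # new group at and after each space
  group_by(experiment_id, grp) %>%
  mutate(within_word = str_c(ww, collapse = ''),                                    # rebuild the word per group
         within_word = na_if(within_word, ' '))                                     # spaces belong to no word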

# Output for a single experiment (the experiment_id column was not part of this run):
# A tibble: 10 x 6
# Groups:   grp [5]
   keystroke existing_text final_text ww      grp within_word
   <chr>     <chr>         <chr>      <chr> <int> <chr>      
 1 a         "a"           a Bad man  "a"       0 a          
 2 SPACE     "a "          a Bad man  " "       1 NA         
 3 SHIFT     "a "          a Bad man  ""        2 Bad        
 4 b         "a B"         a Bad man  "B"       2 Bad        
 5 a         "a Ba"        a Bad man  "a"       2 Bad        
 6 d         "a Bad"       a Bad man  "d"       2 Bad        
 7 SPACE     "a Bad "      a Bad man  " "       3 NA         
 8 m         "a Bad m"     a Bad man  "m"       4 man        
 9 a         "a Bad ma"    a Bad man  "a"       4 man        
10 n         "a Bad man"   a Bad man  "n"       4 man
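
Since the question mentions that final_text is available, here is an alternative sketch that uses it directly instead of rebuilding words from keystroke diffs. It counts the spaces in existing_text to get the position of the current word, then picks that word out of final_text. This rests on several assumptions that are not stated in the question: words are separated by single spaces, text is only ever appended at the end (no deletions or cursor moves), and the space keystroke is always recorded as 'SPACE'. The helper columns n_spaces and word_idx are names made up for illustration.

```r
library(dplyr)
library(stringr)

# Example data from the question
x <- tibble(
  experiment_id = rep(c("1a", "1b"), each = 10),
  keystroke = rep(c("a", "SPACE", "SHIFT", "b", "a", "d", "SPACE", "m", "a", "n"), 2),
  existing_text = rep(c("a", "a ", "a ", "a B", "a Ba", "a Bad", "a Bad ",
                        "a Bad m", "a Bad ma", "a Bad man"), 2),
  final_text = "a Bad man"
)

result <- x %>%
  mutate(
    # Spaces typed so far = number of words already completed
    # (assumes single-space separators and append-only typing)
    n_spaces = str_count(existing_text, fixed(" ")),
    # 1-based position of the word the current keystroke belongs to
    word_idx = n_spaces + 1L,
    # Look that word up in final_text; SPACE keystrokes belong to no word
    within_word = if_else(keystroke == "SPACE",
                          NA_character_,
                          word(final_text, word_idx))
  )

result$within_word
```

Because each row is handled on its own, this version needs no group_by() or lag(), and an invisible keystroke such as SHIFT typed right after a space is assigned to the upcoming word. If the data can contain deletions or repeated spaces, the diff-based approach above (or something more elaborate) is likely the safer choice.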