Conditional loop for a dataframe in R

Time:09-26

I have a dataframe (tibble) containing information about sentences. The dataframe has the following structure:

word   position  category  related_word  sentence
a      1         det       2             1
man    2         noun      3             1
sees   3         verb      0             1
a      4         det       5             1
horse  5         noun      3             1
and    6         conj      7             1
a      7         det       8             1
dog    8         noun      3             1
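To make the question reproducible, the table above can be built as a tibble. This is a sketch: the integer column types are an assumption based on the printed output in the answer, and the name `df` is the one the answer's code uses.

```r
library(tibble)

# The example sentence; positions and related_word values are integers,
# and related_word = 0 means "no related word" (the verb).
df <- tibble(
  word         = c("a", "man", "sees", "a", "horse", "and", "a", "dog"),
  position     = 1:8,
  category     = c("det", "noun", "verb", "det", "noun", "conj", "det", "noun"),
  related_word = c(2L, 3L, 0L, 5L, 3L, 7L, 8L, 3L),
  sentence     = rep(1L, 8)
)

df
```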

I would like to create a loop that goes through every sentence in the dataframe (the sentence number is in the last column) and, for each noun in that sentence (category == "noun"), finds its related word using the value of related_word in the noun's row. That value is the position of the related word within the same sentence. The loop would then write both words (the noun followed by its related word) into a new column, in the format "word word" (e.g. "man sees").

For the dataframe I provided above, there are three nouns in the first sentence. The loop would first take the first noun ("man") and look up its related word using its related_word value (3). Position 3 holds "sees", so the loop would write the complete pair, "man sees", into a new column called "pair" in the same row as "man".

For the remaining two nouns ("horse" and "dog"), the new column would hold "horse sees" and "dog sees".
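The procedure described above can be written as a literal loop in base R. This is a sketch, not the accepted answer's method: it assumes positions are unique within a sentence, leaves non-noun rows as `NA`, and uses `pair` as the new column name, as in the question.

```r
df <- data.frame(
  word         = c("a", "man", "sees", "a", "horse", "and", "a", "dog"),
  position     = 1:8,
  category     = c("det", "noun", "verb", "det", "noun", "conj", "det", "noun"),
  related_word = c(2L, 3L, 0L, 5L, 3L, 7L, 8L, 3L),
  sentence     = rep(1L, 8),
  stringsAsFactors = FALSE
)

# Start with an empty "pair" column; rows that are not nouns stay NA.
df$pair <- NA_character_

for (i in seq_len(nrow(df))) {
  if (df$category[i] != "noun") next
  # Find the row in the same sentence whose position equals this
  # noun's related_word value.
  hit <- which(df$sentence == df$sentence[i] & df$position == df$related_word[i])
  if (length(hit) == 1) {
    df$pair[i] <- paste(df$word[i], df$word[hit])
  }
}

df$pair
```

A loop is easy to follow but scans the whole table once per noun; the join in the answer does the same lookup in one pass.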

How could I approach this? There are a few problems here, but the main one is how to use the value of related_word to find the values of other variables, i.e. how can I get from "man" to "sees"?
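The core step, going from a related_word value to the word at that position, is a lookup that base R's `match()` can do for a whole column at once. A sketch, assuming positions are unique within each sentence; pasting sentence and position into one key is just one way of keeping the lookup inside the same sentence, and the helper names `own_key`, `target_key` and `target_row` are illustrative.

```r
df <- data.frame(
  word         = c("a", "man", "sees", "a", "horse", "and", "a", "dog"),
  position     = 1:8,
  category     = c("det", "noun", "verb", "det", "noun", "conj", "det", "noun"),
  related_word = c(2L, 3L, 0L, 5L, 3L, 7L, 8L, 3L),
  sentence     = rep(1L, 8),
  stringsAsFactors = FALSE
)

# Key identifying each row, and the key each row points to via related_word.
own_key    <- paste(df$sentence, df$position, sep = "_")
target_key <- paste(df$sentence, df$related_word, sep = "_")

# For each target key, the index of the row holding that key (NA if none).
target_row <- match(target_key, own_key)

# Only nouns get a pair; everything else is NA.
df$pair <- ifelse(df$category == "noun",
                  paste(df$word, df$word[target_row]),
                  NA_character_)

df$pair
```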

Answer:

You can join the table on itself (join on sentence, and on position equaling related_word). Here is a start; perhaps give us more information about what you want the output to look like?

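library(dplyr)

# Self-join: match each word (x) to the nouns (y) whose related_word is its position
df %>%
  inner_join(filter(df, category == "noun"), by = c("sentence" = "sentence", "position" = "related_word")) %>%
  mutate(newcol = paste(word.y, word.x)) %>%   # word.y = noun, word.x = related word
  select(sentence, newcol)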

Output:

# A tibble: 3 × 2
  sentence newcol    
     <int> <chr>     
1        1 man sees  
2        1 horse sees
3        1 dog sees  

If you want to keep every row of the original table, wrap the above in a left_join(). Notice that in this version I retain position.y (the noun's position) in the final select() so it can be used as the join key:

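df %>% left_join(
  # Rebuild the noun/related-word pairs, keeping the noun's position (position.y)
  df %>%
    inner_join(filter(df, category == "noun"), by = c("sentence" = "sentence", "position" = "related_word")) %>%
    mutate(newcol = paste(word.y, word.x)) %>%
    select(sentence, position.y, newcol),
  # Attach each pair to the noun's own row; all other rows get NA
  by = c("sentence" = "sentence", "position" = "position.y")
)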

Output:

# A tibble: 8 × 6
  word  position category related_word sentence newcol    
  <chr>    <int> <chr>           <int>    <int> <chr>     
1 a            1 det                 2        1 NA        
2 man          2 noun                3        1 man sees  
3 sees         3 verb                0        1 NA        
4 a            4 det                 5        1 NA        
5 horse        5 noun                3        1 horse sees
6 and          6 conj                7        1 NA        
7 a            7 det                 8        1 NA        
8 dog          8 noun                3        1 dog sees  
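The nested join can arguably be collapsed into a single left_join(): build a small lookup table that renames position to related_word, join it on (sentence, related_word), and paste only for nouns. A sketch assuming dplyr 1.0 or later and unique positions per sentence; the names `lookup`, `related` and `result` are illustrative, and the column is called `pair` as in the question.

```r
library(dplyr)

df <- tibble(
  word         = c("a", "man", "sees", "a", "horse", "and", "a", "dog"),
  position     = 1:8,
  category     = c("det", "noun", "verb", "det", "noun", "conj", "det", "noun"),
  related_word = c(2L, 3L, 0L, 5L, 3L, 7L, 8L, 3L),
  sentence     = rep(1L, 8)
)

# One row per word, keyed by (sentence, position) renamed so that
# other rows can find it through their related_word value.
lookup <- df %>%
  select(sentence, related_word = position, related = word)

result <- df %>%
  left_join(lookup, by = c("sentence", "related_word")) %>%
  mutate(pair = if_else(category == "noun", paste(word, related), NA_character_)) %>%
  select(-related)

result
```

Because each (sentence, position) appears once in `lookup`, the left join keeps exactly one row per word, in the original order.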