Using this dataframe:
## doc_id paragraph_id sentence_id token_id token lemma upos xpos
## 1 doc1 1 1 1 Linguistics Linguistic NOUN NNS
## 2 doc1 1 1 2 also also ADV RB
## 3 doc1 1 1 3 deals deal NOUN NNS
## 4 doc1 1 2 1 Something something NOUN NNS
## 5 doc1 1 2 2 Else else NOUN NNS
I'd like to have something like this in a .txt file:
Linguistic_NNS also_R deal_NN
something_NN else_NN
Except that using this code:
paste(text_anndf$lemma, "_", text_anndf$xpos, collapse = " ", sep = "")
I have this:
Linguistic_NN also_R deal_NN something_NN else_NN
That's because it doesn't take the "sentence_id" values into account. Do I need to use an if statement or something similar? Thanks
CodePudding user response:
Something like this?
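library(dplyr)
library(tidyr)  # unite() comes from tidyr, not dplyr

# df stands for your data frame (text_anndf in the question)
df %>%
  # merge lemma and xpos into a single "lemma_xpos" column
  unite(col = lemma_and_position,
        lemma, xpos, sep = '_') %>%
  # one output row per sentence
  group_by(sentence_id) %>%
  summarise(lemma_and_position = paste(lemma_and_position, collapse = ' '))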
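To get the result into a .txt file with one sentence per line, a minimal sketch could look like this. The data frame is rebuilt from the sample rows in the question, grouping by doc_id as well is my own addition in case there are several documents, and the file name sentences.txt is just an assumption:

```r
library(dplyr)
library(tidyr)

# Sample rows copied from the question (only the columns used below)
text_anndf <- data.frame(
  doc_id      = "doc1",
  sentence_id = c(1, 1, 1, 2, 2),
  lemma       = c("Linguistic", "also", "deal", "something", "else"),
  xpos        = c("NNS", "RB", "NNS", "NNS", "NNS"),
  stringsAsFactors = FALSE
)

sentences <- text_anndf %>%
  # build "lemma_xpos" tokens
  unite(col = lemma_and_position, lemma, xpos, sep = "_") %>%
  # assumption: sentence_id restarts in each document, so group by both
  group_by(doc_id, sentence_id) %>%
  summarise(lemma_and_position = paste(lemma_and_position, collapse = " "),
            .groups = "drop")

# Hypothetical output path; writeLines() writes one vector element per line
out_file <- "sentences.txt"
writeLines(sentences$lemma_and_position, out_file)
```

writeLines() puts each element of the character vector on its own line, so every sentence ends up on a separate line of the file.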
Re @stompers advice: providing test data with dput(your_data) is always helpful!
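For illustration, this is roughly what that looks like with the rows shown in the question. The data frame below is typed in by hand from the printed output, so the column types are an assumption:

```r
# Hand-typed copy of the question's sample rows (column types assumed)
text_anndf <- data.frame(
  doc_id       = "doc1",
  paragraph_id = 1L,
  sentence_id  = c(1L, 1L, 1L, 2L, 2L),
  token_id     = c(1L, 2L, 3L, 1L, 2L),
  token        = c("Linguistics", "also", "deals", "Something", "Else"),
  lemma        = c("Linguistic", "also", "deal", "something", "else"),
  stringsAsFactors = FALSE
)

# dput() prints R code that recreates the object exactly;
# paste that output into the question so others can run it directly
dput(text_anndf)
```

For a large data frame, dput(head(your_data, 10)) keeps the pasted output short.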