In R, chop off column after n words

Time:06-04

I have a df with a text column, and a column with a wordcount value.

How can I delete the last n words of the text (specified in the 'wc' column) and save the output to a third column?

In other words, I need the "introductory" part of a bunch of texts, and I know when the intro ends, so I want to cut the text off at that point and save the intro in a new column.

df <- data.frame(text = c("this is a long text","this is also a long text", "another long text"),wc=c('1','2','1'))

Desired output:

text                      wc  chopped_off_text
this is a long text       1   this is a long
this is also a long text  2   this is also a
another long text         1   another long

CodePudding user response:

require(quanteda)
#> Package version: 3.2.1
#> Unicode version: 13.0
#> ICU version: 69.1
#> Parallel computing: 8 of 8 threads used.
#> See https://quanteda.io for tutorials and examples.

df <- data.frame(text = c("this is a long text", 
                          "this is also a long text", 
                          "another long text"), 
                 wc = c(1, 2, 1))
corp <- corpus(df)
toks <- tokens(corp)
tokens_select(toks, startpos = rep(1, ndoc(toks)), endpos = ntoken(toks) - toks$wc)
#> Tokens consisting of 3 documents and 1 docvar.
#> text1 :
#> [1] "this" "is"   "a"    "long"
#> 
#> text2 :
#> [1] "this" "is"   "also" "a"   
#> 
#> text3 :
#> [1] "another" "long"

Created on 2022-06-04 by the reprex package (v2.0.1)

CodePudding user response:

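You can use the word() function from the stringr package to extract words from a sentence. str_count(text, "\\s") + 1 counts the number of words in the sentence, assuming the words are separated by single whitespace characters. Subtracting wc from that count gives the position of the last word to keep.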

library(stringr)
library(dplyr)

df %>% 
  mutate(chopped_off_text = 
           word(text, 1, end = str_count(text, "\\s") + 1 - as.integer(wc)))

                      text wc chopped_off_text
1      this is a long text  1   this is a long
2 this is also a long text  2   this is also a
3        another long text  1     another long
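
For comparison, here is a minimal base R sketch of the same idea, without stringr or dplyr. It is only an illustration under a couple of assumptions: words are separated by whitespace, and wc may be stored as character (as in the question), so it is converted with as.integer(). The helper name drop_last_words is hypothetical and does not come from either answer.

```r
# Sample data from the question; stringsAsFactors = FALSE keeps text as
# character on R versions older than 4.0.
df <- data.frame(
  text = c("this is a long text",
           "this is also a long text",
           "another long text"),
  wc = c("1", "2", "1"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: drop the last n whitespace-separated words of one string.
drop_last_words <- function(text, n) {
  words <- strsplit(text, "\\s+")[[1]]
  keep <- max(length(words) - n, 0)   # never ask for a negative number of words
  paste(head(words, keep), collapse = " ")
}

# Apply the helper row by row and store the result in a third column.
df$chopped_off_text <- mapply(drop_last_words, df$text, as.integer(df$wc),
                              USE.NAMES = FALSE)

print(df)
```

If wc is larger than the number of words in a row, this sketch returns an empty string for that row rather than failing.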