Searching for text and storing results in new columns within the data frame

Time:12-13

I have a data frame (df1) with one column, with each entry/row/observation consisting of a long string of text (df1$text). In a separate data frame (df2) I have one column, with each entry/row/observation consisting of a single name (df2$name).

I would like to note, for each row in df1, which of the names in df2$name appear in the text. Ideally, I'd like to record whether each name appears in df1$text as a 1/0 value in a new column of df1 (i.e. a dummy variable) named after that name:

> df1
  text
1 ...
2 ...
3 ...
4 ...

> df2
   name
1  John
2  James
3  Jerry
4  Jackson

After code is executed:

> df1
  text John James Jerry Jackson 
1 ...   1    1     0        1
2 ...   0    0     0        1 
3 ...   1    1     0        1
4 ...   1    0     0        1

Is there a way to do this without using a for loop? My text fields are long, and I have many observations in both df1 and df2.

CodePudding user response:

You did not provide a reproducible example, so I made a dummy df1 myself:

df1 <- data.frame(
  text = c("John James John Jakson",
           "Jackson abcd zxcv",
           "John Jackson James Jerr aa",
           "John Jackson JAJAJAJA")
)

                        text
1     John James John Jakson
2          Jackson abcd zxcv
3 John Jackson James Jerr aa
4      John Jackson JAJAJAJA

Then you can try dplyr:

library(dplyr)

df1 %>%
  mutate(John = as.numeric(grepl("John", text)),
         James = as.numeric(grepl("James", text)),
         Jerry = as.numeric(grepl("Jerry", text)),
         Jackson = as.numeric(grepl("Jackson", text))
         )

                        text John James Jerry Jackson
1     John James John Jakson    1     1     0       0
2          Jackson abcd zxcv    0     0     0       1
3 John Jackson James Jerr aa    1     1     0       1
4      John Jackson JAJAJAJA    1     0     0       1
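
One caveat with plain grepl patterns is that they match substrings, so "John" would also flag a text that only contains "Johnson". The question does not say whether that is wanted. As a minimal sketch, assuming whole-word matches are the goal, the pattern can be wrapped in \\b word boundaries. The sample text below is hypothetical and not taken from the question:

```r
# Hypothetical sample text, only to illustrate substring vs. whole-word matching
text <- c("John Johnson", "Johnson only", "Jackson and John")

# Plain pattern: also matches the "John" inside "Johnson"
plain <- as.integer(grepl("John", text))

# \\b is a word boundary, so only the standalone word "John" matches
whole <- as.integer(grepl("\\bJohn\\b", text, perl = TRUE))

data.frame(text, plain, whole)
```

This assumes the names contain no regex metacharacters; names such as "O'Neil (Jr.)" would need escaping first.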

CodePudding user response:

A base R option using lapply -

df1[df2$name] <- lapply(df2$name, function(x) as.integer(grepl(x, df1$text)))

If you want the match to be case insensitive then add ignore.case = TRUE in grepl.
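
For reference, here is a self-contained sketch of the lapply approach with ignore.case = TRUE, storing the result as 1/0 integers as the question asks. The text values are made up for illustration, and stringsAsFactors = FALSE is set only so that the sketch behaves the same on R versions before 4.0:

```r
# Hypothetical data; the names match df2 from the question
df1 <- data.frame(
  text = c("john met James", "JACKSON left", "nobody here"),
  stringsAsFactors = FALSE
)
df2 <- data.frame(
  name = c("John", "James", "Jerry", "Jackson"),
  stringsAsFactors = FALSE
)

# One new column per name; grepl is vectorised over df1$text,
# and as.integer turns TRUE/FALSE into 1/0
df1[df2$name] <- lapply(df2$name, function(x) {
  as.integer(grepl(x, df1$text, ignore.case = TRUE))
})

df1
```

lapply still iterates over the names internally, but each grepl call processes the whole text column at once, so no explicit loop over the rows of df1 is needed.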
