Grep in two or more columns of a dataframe

Time:05-04

I have a data frame and want to know whether certain strings appear within certain columns, and then get the row numbers of the matches. I'm currently using:

keywords <- c("knowledge management", "gestión del conocimiento")
npox <- grep(paste(keywords, collapse = "|"), full[,c(7)], ignore.case = TRUE)

However, this only works with a single column (full[,c(7)]); it does not work with two or more columns. Does anyone know what I can do?
Sample data (CSV): https://tempsend.com/rxucj

CodePudding user response:

Instead of using grep on individual columns, use grepl on the whole data frame converted to a character matrix. This returns a logical vector with one element per cell, in column-major order. Reshape that vector into a matrix with the same number of rows as the original data frame, then call which with arr.ind = TRUE. This gives you the row and column of every match for your regex.

keywords <- c("knowledge management", "gestión del conocimiento")
npox <- grepl(paste(keywords, collapse = "|"), as.matrix(full), ignore.case = TRUE)

which(matrix(npox, nrow = nrow(full)), arr.ind = TRUE)
#>      row col
#> [1,]  16   8
#> [2,]  15   9
#> [3,]  15  10
#> [4,]  16  15
#> [5,]  16  23

For example, there is a match in the 16th row of the 8th column. We can confirm this with:

full[16, 8]
#> [1] "The Impact of Human Resource Management Practices, Organisational 
#> Culture, Organisational Innovation and Knowledge Management on Organisational
#> Performance in Large Saudi Organisations: Structural Equation Modeling With 
#> Conceptual Framework"

Here we can see that "knowledge management" is present in this cell.
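If you want to try the approach without the original CSV, here is a minimal sketch on a made-up data frame. The name toy and its contents are invented purely for illustration and are not taken from the sample data; only the grepl / matrix / which steps mirror the code above.

```r
# Hypothetical toy data, invented for illustration (not the asker's CSV)
toy <- data.frame(
  id = 1:3,
  title = c("Knowledge Management in firms", "Supply chains", "Other"),
  abstract = c("n/a", "gestión del conocimiento aplicada",
               "more on knowledge management"),
  stringsAsFactors = FALSE
)

keywords <- c("knowledge management", "gestión del conocimiento")
pattern <- paste(keywords, collapse = "|")

# grepl() returns one logical per cell, in column-major order
hits <- grepl(pattern, as.matrix(toy), ignore.case = TRUE)

# Reshape to the data frame's layout, then ask for row/column positions
toy_matches <- which(matrix(hits, nrow = nrow(toy)), arr.ind = TRUE)
print(toy_matches)
```

Because matrix() also fills column by column, the reshaped logical matrix lines up cell for cell with toy, so each row/col pair can be checked directly with toy[row, col].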

If you want to limit your results to certain columns, it is probably easiest to filter the results afterwards. For example, suppose I store all the matches found in full in a variable called matches:

matches <- which(matrix(npox, nrow = nrow(full)), arr.ind = TRUE)

If I am only interested in matches in columns 7, 8 and 9, I can do:

# drop = FALSE keeps the result a matrix even when only one row is left
matches[matches[,'col'] %in% c(7, 8, 9), , drop = FALSE]
#>      row col
#> [1,]  16   8
#> [2,]  15   9
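As an alternative to filtering afterwards, you could restrict the search to the columns of interest up front and then collapse the result to row numbers, which is what the question originally asked for. This is only a sketch under assumptions: search_columns is a hypothetical helper written for this example (not an existing function), and toy is made-up data standing in for full.

```r
# Hypothetical helper: search only the given column indices of a data frame
search_columns <- function(df, pattern, cols) {
  sub <- as.matrix(df[, cols, drop = FALSE])
  hits <- matrix(grepl(pattern, sub, ignore.case = TRUE), nrow = nrow(df))
  found <- which(hits, arr.ind = TRUE)
  # Map positions within the subset back to column indices of df
  found[, "col"] <- cols[found[, "col"]]
  found
}

# Made-up data for illustration only
toy <- data.frame(
  id = 1:4,
  title = c("Knowledge Management in firms", "Supply chains",
            "Logistics", "Pricing"),
  abstract = c("n/a", "gestión del conocimiento aplicada", "n/a", "n/a"),
  notes = c("n/a", "n/a", "n/a", "see knowledge management"),
  stringsAsFactors = FALSE
)

pattern <- paste(c("knowledge management", "gestión del conocimiento"),
                 collapse = "|")

found <- search_columns(toy, pattern, cols = c(2, 3))

# Row numbers with at least one match in the selected columns
rows <- sort(unique(found[, "row"]))
print(rows)
```

The match in the notes column (row 4) is ignored here because column 4 is not among the searched columns. With the real data, the equivalent call would be search_columns(full, pattern, cols = c(7, 8, 9)).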