Home > Blockchain >  How to search for a value in multiple columns in R?
How to search for a value in multiple columns in R?

Time:06-27

I need to search through multiple columns. Then, if any of the values in the columns equals a condition, then I need to extract the value in the column to the left of the column searched.

Below is a sample of my code. I am searching the congress_1 and congress_2 columns for the value 108. If there are any such matches, then I need to extract the value in position_1 and position_2 respectively.

a2 <- data.frame(position_1 = c("Senator", "Rep", "Senator", "Rep"),
                 congress_1 = c(107, 107, 108, 109),
                 position_2 = c("Senator", "Rep", "Senator", "Rep"),
                 congress_2 = c(108, 108, 109, 110))

I would like the output to look something like this:

row position_108
1   "Senator"
2   "Senator"
3   "Rep"
4   "none"

I've been trying to use the which() function, but the issue is that I have a large data frame and there are many columns that I need to searched through. Because of the nature of my question, I do think that I need to use the ifelse() function. The challenge I've been having is how to extract the data to the left of the column if there is a match.

Any help is greatly appreciated!

CodePudding user response:

Does transmute with case_when do what you need?

library(dplyr)
a2 %>%
  transmute(answer = case_when(congress_1 == 108 ~ position_1,
                               congress_2 == 108 ~ position_2))

   answer
1 Senator
2     Rep
3 Senator
4    <NA>

CodePudding user response:

You can use sapply with if

position_108 <- sapply(1:nrow(a2) , \(x) if(a2$congress_1[x] == 108)
    a2$position_1[x] else if(a2$congress_2[x] == 108)
        a2$position_1[x] else "none")

ans <- data.frame(position_108)

ans

#>  position_108
#> 1      Senator
#> 2          Rep
#> 3      Senator
#> 4         none

CodePudding user response:

Assuming that 108 cannot be present several times in the same row across the columns that are to be searched, we can use the approach below. The solution is generalized in the sense that you get around having to type the variable names (in case you have very many).

library(dplyr)
library(tidyr)

a2 %>%
  pivot_longer(everything(), values_transform = ~ as.character(.)) %>%
  mutate(row = str_sub(name, -1) < lag(str_sub(name, -1)),
         row = replace_na(row, F),
         row = cumsum(row),
         position_108 = if_else(value == 108, lag(value), NA_character_),
         id = row_number()) %>%
  group_by(row) %>%
  filter((! F %in% is.na(position_108) & id == min(id)) |
          ! is.na(position_108)) %>%
  ungroup() %>%
  select(position_108)

# # A tibble: 4 × 1
#   position_108
#   <chr>       
# 1 Senator     
# 2 Rep         
# 3 Senator     
# 4 NA

I can't help thinking that I have dug myself into a hole on this one, though. Views are more than welcome.

  • Related