I'd like to figure out a way to compare columns in the SAME data frame, but in such a way that I create a new column called STATUS
for the output. I have 3 columns 1)SNPs
, 2)gained
, and 3)lost
. I want to know if the data in each cell in column 1 is present in either column 2 or 3. If the data from column 1 is present in column 2 then I would want the output to say GAINED
, and if its present in column 3 then the output would be LOST
. If it's present in either then the output will be NEUTRAL
.
Here is what I would like:
SNPs GAINED LOST STATUS
1_752566 1_949654 6_30022061 NEUTRAL
1_776546 1_1045331 6_30314321 NEUTRAL
1_832918 1_832918 13_95612033 GAINED
1_914852 1_1247494 1_914852 LOST
I've tried this:
data_frame$status <- data.frame(lapply(data_frame[1], `%in%`, data_frame[2:3]))
but it produces 2 columns that all say NEUTRAL
. I believe it's reading per row to see if it matches, but my data isn't organized in that manner such that it will find every match per row. Instead I'd like to search the entire column and have R find the matches in each cell instead of searching per row.
CodePudding user response:
You don't need lapply
or anything fancy like that.
data_frame$STATUS = with(data_frame,
ifelse(SNPs %in% GAINED, "GAINED",
ifelse(SNPs %in% LOST, "LOST", "NEUTRAL")
)
)
Note that the way this is written the GAINED condition is checked first so if it is present in both GAINED and LOST the result will be "GAINED".
CodePudding user response:
Using a nested ifelse
should work, and be fairly understandable if indented properly:
tbl$status <- ifelse(tbl$SNPs %in% tbl$GAINED, "GAINED",
ifelse(tbl$SNPs %in% tbl$LOST, "LOST", "NEUTRAL") )
> tbl
SNPs GAINED LOST STATUS status
1 1_752566 1_949654 6_30022061 NEUTRAL NEUTRAL
2 1_776546 1_1045331 6_30314321 NEUTRAL NEUTRAL
3 1_832918 1_832918 13_95612033 GAINED GAINED
4 1_914852 1_1247494 1_914852 LOST LOST
CodePudding user response:
A Tidyverse approach with case_when
library(tidyverse)
df <-
structure(
list(
SNPs = c("1_752566", "1_776546", "1_832918", "1_914852"),
GAINED = c("1_949654", "1_1045331", "1_832918", "1_1247494"),
LOST = c("6_30022061", "6_30314321", "13_95612033", "1_914852")
),
row.names = c(NA,-4L),
spec = structure(list(
cols = list(
SNPs = structure(list(), class = c("collector_character",
"collector")),
GAINED = structure(list(), class = c("collector_character",
"collector")),
LOST = structure(list(), class = c("collector_character",
"collector"))
),
default = structure(list(), class = c("collector_guess",
"collector")),
delim = ","
), class = "col_spec"),
class = c("spec_tbl_df",
"tbl_df", "tbl", "data.frame")
)
df %>%
mutate(STATUS = case_when(
SNPs %in% GAINED ~ 'GAINED',
SNPs %in% LOST ~ 'LOST',
TRUE ~ 'NEUTRAL'
))
#> # A tibble: 4 × 4
#> SNPs GAINED LOST STATUS
#> <chr> <chr> <chr> <chr>
#> 1 1_752566 1_949654 6_30022061 NEUTRAL
#> 2 1_776546 1_1045331 6_30314321 NEUTRAL
#> 3 1_832918 1_832918 13_95612033 GAINED
#> 4 1_914852 1_1247494 1_914852 LOST
Created on 2022-12-01 with reprex v2.0.2