I have a data frame with count information (df1):
rownames | sample1 | sample2 | sample3 |
---|---|---|---|
m1 | 0 | 5 | 1 |
m2 | 1 | 7 | 5 |
m3 | 6 | 2 | 0 |
m4 | 3 | 1 | 0 |
and a second with sample information (df2):
rownames | batch | total count |
---|---|---|
sample1 | a | 10 |
sample2 | b | 15 |
sample3 | a | 6 |
I also have two lists with information about the m values (these could easily be turned into another data frame if necessary, but I would rather not add to the count information as it is quite large). There is no pattern such as even and odd; this is just a very simplistic example.
x <- c("m1", "m3")
y <- c("m2", "m4")
What I would like to do is add another two columns to the sample information. Each is a count, per sample, of the m values that are 5 or above and appear in list x or y:
rownames | batch | total count | x | y |
---|---|---|---|---|
sample1 | a | 10 | 1 | 0 |
sample2 | b | 15 | 1 | 1 |
sample3 | a | 6 | 0 | 1 |
My current strategy is to make a list of values for both x and y and then append them to df2. Here are my attempts so far:
numX <- colSums(df1[sum(rownames(df1)>10 %in% x),])
and numX <- colSums(df1[sum(rownames(df1)>10 %in% x),])
both return a list of 0s
numX <- colSums(df1[rownames(df1)>10 %in% x,])
returns a list of the sum of count values meeting the conditions for each column
numX <- length(df1[rownames(df1)>10 %in% novel,])
returns the number of times the condition is met (in this example 2L)
I am not really sure how to approach this so I have just been throwing around attempts. I've tried looking for answers but maybe I am just struggling to find the proper wording.
CodePudding user response:
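How about using dplyr and reshape2::melt?
library(dplyr)
library(reshape2)
df3 <- df1 %>%
  melt(id.vars = "rownames") %>%   # long format: rownames, variable, value
  filter(value >= 5) %>%           # keep counts of 5 or more
  mutate(x = as.numeric(rownames %in% c("m1", "m3")),
         y = as.numeric(rownames %in% c("m2", "m4")),
         variable = as.character(variable)) %>%   # factor -> character for the join
  select(-rownames, -value) %>%
  group_by(variable) %>%
  summarise(x = sum(x), y = sum(y))
df2 %>% left_join(df3, by = c("rownames" = "variable"))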
rownames batch total_count x y
1 sample1 a 10 1 0
2 sample2 b 15 1 1
3 sample3 a 6 0 1
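One thing to watch when filtering before the join: a sample with no count of 5 or more drops out of df3, so the left join gives it NA rather than 0. Below is a minimal base R sketch of the same long-format idea that keeps every sample. It assumes the m names sit in a `rownames` column, as in the data at the end of the thread; the names `long`, `hit` and `counts` are only for illustration.

```r
df1 <- data.frame(rownames = c("m1", "m2", "m3", "m4"),
                  sample1 = c(0L, 1L, 6L, 3L),
                  sample2 = c(5L, 7L, 2L, 1L),
                  sample3 = c(1L, 5L, 0L, 0L))
x <- c("m1", "m3")
y <- c("m2", "m4")

# Long format: one row per (m, sample) pair; stack() names its columns values/ind
long <- data.frame(id = rep(df1$rownames, times = ncol(df1) - 1),
                   stack(df1[-1]))

# Label each m with the list it belongs to (NA if it is in neither list);
# a factor keeps both groups in the table even when one has no hits
grp <- rep(NA_character_, nrow(long))
grp[long$id %in% x] <- "x"
grp[long$id %in% y] <- "y"
long$grp <- factor(grp, levels = c("x", "y"))

# Cross-tabulate only the entries that are 5 or more; samples with no hits stay as 0
hit <- long$values >= 5
counts <- table(sample = long$ind[hit], grp = long$grp[hit])
counts
```

Because `ind` and `grp` are factors, `table()` reports a 0 for any sample and list combination with no qualifying values instead of omitting it.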
CodePudding user response:
You can create a named list of vectors and, for each value in rownames, count how many values of x and y in the respective sample are >= 5.
Base R option -
list_vec <- list(x = x, y = y)
# For each sample, count the values >= 5 among the rows named in each list
cbind(df2, do.call(rbind, lapply(df2$rownames, function(smp)
  sapply(list_vec, function(ids) {
    sum(df1[[smp]][df1$rownames %in% ids] >= 5)
  }))))
# rownames batch total.count x y
#1 sample1 a 10 1 0
#2 sample2 b 15 1 1
#3 sample3 a 6 0 1
Using tidyverse -
library(dplyr)
library(purrr)
list_vec <- lst(x, y)
df2 %>%
bind_cols(map_df(df2$rownames, function(x)
map(list_vec, ~sum(df1[[x]][df1$rownames %in% .x] >= 5))))
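The per-sample loop can also be expressed with colSums, which is close to what the question was attempting. The following is a minimal sketch, not a replacement for the options above; it assumes df1 and df2 are laid out as in the data at the end of the thread, and the names `hits`, `counts` and `res` are only for illustration.

```r
df1 <- data.frame(rownames = c("m1", "m2", "m3", "m4"),
                  sample1 = c(0L, 1L, 6L, 3L),
                  sample2 = c(5L, 7L, 2L, 1L),
                  sample3 = c(1L, 5L, 0L, 0L))
df2 <- data.frame(rownames = c("sample1", "sample2", "sample3"),
                  batch = c("a", "b", "a"),
                  totalcount = c(10L, 15L, 6L))
x <- c("m1", "m3")
y <- c("m2", "m4")
list_vec <- list(x = x, y = y)

# Logical matrix: TRUE where a count is 5 or more (first column holds the m names)
hits <- df1[-1] >= 5

# For each list, keep only its rows and count the TRUE entries per sample
counts <- sapply(list_vec, function(ids)
  colSums(hits[df1$rownames %in% ids, , drop = FALSE]))

# Align the counts with the sample order in df2 before binding
res <- cbind(df2, counts[df2$rownames, , drop = FALSE])
res
```

The comparison is done once on the whole count table, and the row selection with `%in%` happens on the logical matrix, which avoids the operator-precedence trap of writing `>` and `%in%` in the same index expression.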
CodePudding user response:
We may do this with rowwise
library(dplyr)
df2 %>%
  rowwise %>%
  # count the values >= 5 among the rows listed in x (and likewise in y)
  mutate(x = sum(df1[[rownames]][df1$rownames %in% x] >= 5),
         y = sum(df1[[rownames]][df1$rownames %in% y] >= 5)) %>%
  ungroup
-output
# A tibble: 3 × 5
rownames batch totalcount x y
<chr> <chr> <int> <int> <int>
1 sample1 a 10 1 0
2 sample2 b 15 1 1
3 sample3 a 6 0 1
Or, based on the data, a base R option would be
# assumes every row of df1 belongs to either x or y
out <- aggregate(. ~ grp, FUN = function(v) sum(v >= 5),
     transform(df1, grp = c('x', 'y')[1 + (rownames %in% y)] )[-1])
df2[out$grp] <- t(out[-1])
-output
> df2
rownames batch totalcount x y
1 sample1 a 10 1 0
2 sample2 b 15 1 1
3 sample3 a 6 0 1
data
df1 <- structure(list(rownames = c("m1", "m2", "m3", "m4"), sample1 = c(0L,
1L, 6L, 3L), sample2 = c(5L, 7L, 2L, 1L), sample3 = c(1L, 5L,
0L, 0L)), class = "data.frame", row.names = c(NA, -4L))
df2 <- structure(list(rownames = c("sample1", "sample2", "sample3"),
batch = c("a", "b", "a"), totalcount = c(10L, 15L, 6L)),
class = "data.frame", row.names = c(NA,
-3L))