I would like to count how many times each of a set of values occurs in each column of a data frame. I have applied the following code:
dat <- data.frame()
vector <- c(1, 2, 3)
for (i in names(data)) {
  for (j in vector) {
    # count the rows of column i that equal j
    dat[j, i] <- length(which(data[, i] == j))
  }
}
print(dat)
That returns exactly the output I am looking for. Does this code contain any redundancies? Could you please suggest a more efficient alternative, either iterative (using a for loop) or with the dplyr package?
Thanks
Here is a short extract of the dataset I am working on.
structure(list(run_set_1 = c(3, 3, 3, 3, 3, 3), run_set_2 = c(1,
1, 1, 1, 1, 1), run_set_3 = c(2, 2, 2, 2, 2, 2)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
You could first match() each column to get the index in vector that the column values correspond to, if any. Then tabulate() those to get the counts, including 0s:
lapply(data, match, vector) |>
  sapply(tabulate, length(vector))
#>      run_set_1 run_set_2 run_set_3
#> [1,]         0         6         0
#> [2,]         0         0         6
#> [3,]         6         0         0
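To see what each step contributes, here is a minimal sketch on a single column of the sample data, followed by a base R variant that uses table() on a factor with fixed levels. The variant is an alternative and not the method described above, and the names idx and counts are only illustrative.

```r
# Rebuild the sample data from the question as a plain data frame
data <- data.frame(
  run_set_1 = rep(3, 6),
  run_set_2 = rep(1, 6),
  run_set_3 = rep(2, 6)
)
vector <- c(1, 2, 3)

# Step 1: position of each value of one column within `vector` (NA if absent)
idx <- match(data$run_set_1, vector)

# Step 2: count how often each position occurs, padded to length(vector)
tabulate(idx, nbins = length(vector))

# Variant (assumption: not part of the answer above): a factor with fixed
# levels makes table() report zero counts as well
counts <- sapply(data, function(col) table(factor(col, levels = vector)))
counts
```

Values that are not in vector become NA in the match() step, and tabulate() ignores them, so they should not affect the counts.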
CodePudding user response:
Here is the tidyverse version. It could probably be made shorter, but I don't know how yet.
library(dplyr)
library(tidyr)
data %>% pivot_longer(cols = everything()) %>%
  group_by(name, value) %>% count() %>% ungroup() %>%
  pivot_wider(names_from = name, values_from = n, values_fill = 0) %>%
  arrange(value) %>% select(-value)
# last line only to remove the value column and fit your example
# # A tibble: 3 × 3
#   run_set_1 run_set_2 run_set_3
#       <int>     <int>     <int>
# 1         0         6         0
# 2         0         0         6
# 3         6         0         0
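As for a shorter version, one possible sketch skips the reshaping and counts each value of vector directly with summarise() and across(). It assumes dplyr 1.0 or later, and the helper name count_value is hypothetical. Unlike the pivot approach, it returns a row for every value in vector, including values that do not occur in any column.

```r
library(dplyr)

# Sample data from the question, rebuilt as a plain data frame
data <- data.frame(
  run_set_1 = rep(3, 6),
  run_set_2 = rep(1, 6),
  run_set_3 = rep(2, 6)
)
vector <- c(1, 2, 3)

# Hypothetical helper: one summary row holding the count of `v` per column
count_value <- function(v) {
  summarise(data, across(everything(), ~ sum(.x == v, na.rm = TRUE)))
}

# One row per value of `vector`, in the same order as `vector`
res <- bind_rows(lapply(vector, count_value))
res
```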