I have a data frame like so:
row_id stn_1 stn_2 stn_3 stn_4 stn_5
1 1 0 1 0 1
2 0 1 0 0 0
3 1 0 0 0 0
4 1 0 1 0 0
5 0 0 0 1 0
I want the percentage of rows in which each stn appears, i.e. the proportion of 1s in each column except row_id.
Expected output:
stn percentage
stn_1 .60
stn_2 .20
stn_3 .40
stn_4 .20
stn_5 .20
How can I do this in dplyr?
CodePudding user response:
Using dplyr and tidyr, you can do:
library(dplyr)
library(tidyr)
# dd is the data frame from the question
dd %>%
  summarize(across(-row_id, mean)) %>%
  pivot_longer(everything(), names_to = "stn", values_to = "percentage")
# stn percentage
# <chr> <dbl>
# 1 stn_1 0.6
# 2 stn_2 0.2
# 3 stn_3 0.4
# 4 stn_4 0.2
# 5 stn_5 0.2
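The summarize() step does the calculation: the mean of a 0/1 column is the proportion of 1s in it. The pivot_longer() step reshapes the resulting one-row data frame into one row per stn.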
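To see why mean gives the right answer here, this is a minimal base-R check using the stn_1 values from the question. The standalone vector and the variable names prop_by_count and prop_by_mean are only for illustration.

```r
# stn_1 column from the question's data
stn_1 <- c(1, 0, 1, 1, 0)

# Proportion of 1s, computed by counting them explicitly
prop_by_count <- sum(stn_1 == 1) / length(stn_1)

# The same quantity via the mean of the 0/1 values
prop_by_mean <- mean(stn_1)

# Both approaches agree
isTRUE(all.equal(prop_by_count, prop_by_mean))

# Multiply by 100 if a value on the 0-100 scale is wanted instead
prop_by_mean * 100
```

Note that this only holds when the columns contain nothing but 0s and 1s; with missing values you would probably need to decide how to handle them first (for example via na.rm = TRUE).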
CodePudding user response:
What about colMeans() with a bit of tibble enframe()-ing? (Not dplyr, but maybe close enough.)
library(tibble)
library(dplyr)
df |>
  select(-row_id) |>
  colMeans() |>
  enframe(name = "stn", value = "percentage")
Output:
# A tibble: 5 × 2
stn percentage
<chr> <dbl>
1 stn_1 0.6
2 stn_2 0.2
3 stn_3 0.4
4 stn_4 0.2
5 stn_5 0.2
Data:
library(readr)
df <- read_table("row_id stn_1 stn_2 stn_3 stn_4 stn_5
1 1 0 1 0 1
2 0 1 0 0 0
3 1 0 0 0 0
4 1 0 1 0 0
5 0 0 0 1 0")
CodePudding user response:
Update: As stated by @akrun, we can also use plyr::numcolwise(mean)(df[-1]) %>% gather()
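Spelled out, that one-liner might look like the sketch below. It assumes the plyr and tidyr packages are installed and R 4.1 or later for the native pipe; the key and value names passed to gather() are added here so the columns match the expected output, and the data frame is rebuilt inline so the snippet runs on its own.

```r
library(tidyr)

# Same data as in the question
df <- data.frame(
  row_id = 1:5,
  stn_1 = c(1, 0, 1, 1, 0),
  stn_2 = c(0, 1, 0, 0, 0),
  stn_3 = c(1, 0, 0, 0, 0),
  stn_4 = c(0, 0, 0, 0, 1),
  stn_5 = c(1, 0, 0, 0, 0)
)
df$stn_3[4] <- 1  # row 4 has stn_3 = 1, as in the question

# numcolwise(mean) builds a function that takes the mean of every numeric
# column; df[-1] drops row_id; gather() reshapes the one-row result to long form
res <- plyr::numcolwise(mean)(df[-1]) |>
  gather(key = "stn", value = "percentage")

res
```

gather() still works but has been superseded by pivot_longer() in tidyr, so the pivot_longer() form is probably the safer choice for new code.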
First answer: Here is one more. Honestly @MrFlick the idea with the mean was fantastic!!!
library(dplyr)
library(tibble)
df %>%
  mutate(across(-row_id, ~sum(.)/nrow(df))) %>%
  t() %>%
  data.frame() %>%
  slice(-1) %>%
  rownames_to_column("stn") %>%
  select(stn, percentage=X1)
stn percentage
1 stn_1 0.6
2 stn_2 0.2
3 stn_3 0.4
4 stn_4 0.2
5 stn_5 0.2