Getting percentage of 1's in each column in R dplyr

I have a data frame like so:

row_id   stn_1 stn_2 stn_3 stn_4 stn_5
1        1     0     1     0     1
2        0     1     0     0     0
3        1     0     0     0     0
4        1     0     1     0     0
5        0     0     0     1     0

I want to get the share of rows in which each stn appears, that is, the proportion of 1's in each column except row_id.

expected output:

stn    percentage
stn_1  .60
stn_2  .20
stn_3  .40
stn_4  .20
stn_5  .20

How can I do this in dplyr?

CodePudding user response：

Using dplyr and tidyr, you can do

library(dplyr)
library(tidyr)

# dd is the data frame from the question
dd %>% 
  summarize(across(-row_id, mean)) %>% 
  pivot_longer(everything(), names_to = "stn", values_to = "percentage")
#   stn   percentage
#   <chr>      <dbl>
# 1 stn_1        0.6
# 2 stn_2        0.2
# 3 stn_3        0.4
# 4 stn_4        0.2
# 5 stn_5        0.2

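summarize() does the calculation: for a column that holds only 0s and 1s, the mean is the proportion of 1's. pivot_longer() then reshapes the one-row result into one row per column.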
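To see why taking the mean works here, the following is a minimal base R sketch that compares the mean of one column with an explicit count of its 1's. The data frame is typed in by hand from the question, and the names prop_by_count and prop_by_mean are made up for illustration.

```r
# Data copied by hand from the question (dd is the name used in this answer).
dd <- data.frame(
  row_id = 1:5,
  stn_1  = c(1, 0, 1, 1, 0),
  stn_2  = c(0, 1, 0, 0, 0),
  stn_3  = c(1, 0, 0, 1, 0),
  stn_4  = c(0, 0, 0, 0, 1),
  stn_5  = c(1, 0, 0, 0, 0)
)

x <- dd$stn_1

# Explicit version: count the 1's and divide by the number of rows.
prop_by_count <- sum(x == 1) / length(x)

# Shortcut: for a 0/1 column the mean is the same quantity.
prop_by_mean <- mean(x)

print(c(by_count = prop_by_count, by_mean = prop_by_mean))

# The same shortcut applied to every column except row_id.
all_props <- sapply(dd[-1], mean)
print(all_props)
```

If the columns could contain NA (they do not in the question's data), mean() would return NA unless na.rm = TRUE is passed, so that case would need a decision about how missing rows should count.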

CodePudding user response：

What about colMeans with a bit of tibble enframe-ing? (not dplyr but maybe close enough)

library(tibble)
library(dplyr)

df |>
  select(-row_id) |>
  colMeans() |>
  enframe(name = "stn", value = "percentage")

Output:

# A tibble: 5 × 2
  stn     percentage
  <chr>   <dbl>
1 stn_1   0.6
2 stn_2   0.2
3 stn_3   0.4
4 stn_4   0.2
5 stn_5   0.2

Data:

library(readr)

df <- read_table("row_id   stn_1 stn_2 stn_3 stn_4 stn_5
1        1     0     1     0     1
2        0     1     0     0     0
3        1     0     0     0     0
4        1     0     1     0     0
5        0     0     0     1     0")

CodePudding user response：

Update: As noted by @akrun, we can also use plyr::numcolwise(mean)(df[-1]) %>% gather()
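The plyr one-liner can be spelled out as a small script, shown here as a sketch that assumes the plyr and tidyr packages are installed. The data frame is typed in from the question, the intermediate names means and result are made up for illustration, and the key and value arguments are added only so the columns match the expected output.

```r
library(tidyr)  # provides gather(); plyr is called via plyr:: and is not attached

# Data copied by hand from the question.
df <- data.frame(
  row_id = 1:5,
  stn_1  = c(1, 0, 1, 1, 0),
  stn_2  = c(0, 1, 0, 0, 0),
  stn_3  = c(1, 0, 0, 1, 0),
  stn_4  = c(0, 0, 0, 0, 1),
  stn_5  = c(1, 0, 0, 0, 0)
)

# numcolwise(mean) builds a function that averages every numeric column.
# df[-1] drops row_id first; the result is a one-row data frame.
means <- plyr::numcolwise(mean)(df[-1])

# gather() stacks that single row into name/value pairs, one row per column.
result <- gather(means, key = "stn", value = "percentage")
print(result)
```

gather() still works but is superseded in tidyr by pivot_longer(), so this variant is mostly useful when plyr is already in use.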

First answer: Here is one more. Honestly @MrFlick the idea with the mean was fantastic!!!

library(dplyr)
library(tibble)

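df %>% 
  mutate(across(-row_id, ~sum(.)/nrow(df))) %>%  # every row now holds the column's proportion of 1's
  t() %>%                                         # transpose: one row per original column
  data.frame() %>%                                # matrix back to a data frame (columns X1..X5)
  slice(-1) %>%                                   # drop the row that came from row_id
  rownames_to_column("stn") %>%                   # station names move from row names to a column
  select(stn, percentage=X1)                      # all X columns are identical, so keep X1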

    stn percentage
1 stn_1        0.6
2 stn_2        0.2
3 stn_3        0.4
4 stn_4        0.2
5 stn_5        0.2