Home > Mobile >  Why doesn't ungroup() work when counting number of different columns containing a value by vari
Why doesn't ungroup() work when counting number of different columns containing a value by vari

Time:10-07

I have in my dataset a variable called APP_ID which values can be repeated, and several columns which names start by “NICE_”. I’m trying to count, by APP_ID, the number of different “NICE_” columns that contains a value different than zero. For instance, for the following APP_ID the value should be 7 because there are ones in 7 different columns.:

APP_ID NICE_1 NICE_2 NICE_3 NICE_4 NICE_5 NICE_6 NICE_7 NICE_8 NICE_9 NICE_10 NICE_11 NICE_12 NICE_13 NICE_14 NICE_15 NICE_16 NICE_17 NICE_18 NICE_19 NICE_20 NICE_21 NICE_22 NICE_23 NICE_24 NICE_25 NICE_26 NICE_27 NICE_28 NICE_29 NICE_30 NICE_31 NICE_32 NICE_33 NICE_34 NICE_35 NICE_36 NICE_37 NICE_38 NICE_39 NICE_40 NICE_41 NICE_42 NICE_43 NICE_44 NICE_45
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
38 0 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 0 0
38 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0

The point is that I managed to do so, by using:

data_QI %>%
  arrange(APP_ID) %>% # Just to have them in order
  select(APP_ID, starts_with("NICE_")) %>% # Select only the variables I’ll work with
  group_by(APP_ID) %>%
  summarize_all(list(sum)) %>%
  mutate(APPNICENUM = rowSums(. != 0) - 1 ) # -1 to not count the column APP_ID.

The problem comes when I try to ungroup the observations. I mean, I need the repeated APP_ID to have the same APPNICENUM. However, adding ungroup() to the pipe doesn’t seem to work.

I wonder if there is a way to ungroup the data frame once all those operations are done. Or maybe there is a different (and simplier?) approach to my problem.

CodePudding user response:

summarise_all will automatically remove the grouping, so there is no need for ungroup afterwards:

library(tidyverse)
data <- tribble(
  ~APP_ID, ~NICE_1, ~NICE_2,
  38, 0, 0,
  38, 0, 0,
  38, 1, 1
)
data
#> # A tibble: 3 x 3
#>   APP_ID NICE_1 NICE_2
#>    <dbl>  <dbl>  <dbl>
#> 1     38      0      0
#> 2     38      0      0
#> 3     38      1      1

result <-
  data %>%
  group_by(APP_ID) %>%
  summarize_all(list(sum)) %>%
  # -1 to not count APP_ID
  mutate(APPNICENUM = rowSums(. != 0) - 1)
result
#> # A tibble: 1 x 4
#>   APP_ID NICE_1 NICE_2 APPNICENUM
#>    <dbl>  <dbl>  <dbl>      <dbl>
#> 1     38      1      1          2

result %>% groups()
#> list()

Created on 2021-10-06 by the reprex package (v2.0.1)

CodePudding user response:

Do you want to keep the same number of rows as your original data but just add a new column which has count of columns that have 1 in it for each APP_ID ?

Try this answer -

library(dplyr)

result <- data_QI %>%
  group_by(APP_ID) %>%
  mutate(res = sum(colSums(select(cur_data(), starts_with('NICE'))) > 0)) %>%
  ungroup

result

CodePudding user response:

We may do

library(dplyr)
data_QI %>%
    group_by(APP_ID) %>% 
    summarise(res = sum(across(everything(), sum)), .groups = 'drop')
  • Related