Why doesn't ungroup() work when counting number of different columns containing a value by vari-CodePudding

I have in my dataset a variable called APP_ID which values can be repeated, and several columns which names start by “NICE_”. I’m trying to count, by APP_ID, the number of different “NICE_” columns that contains a value different than zero. For instance, for the following APP_ID the value should be 7 because there are ones in 7 different columns.:

APP_ID	NICE_2	NICE_3	NICE_15	NICE_28	NICE_32	NICE_33	NICE_35
38	0	0	0	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	0	0	1	0	0
38	1	1	0	1	1	0	1
38	0	0	0	0	0	1	0

The point is that I managed to do so, by using:

data_QI %>%
  arrange(APP_ID) %>% # Just to have them in order
  select(APP_ID, starts_with("NICE_")) %>% # Select only the variables I’ll work with
  group_by(APP_ID) %>%
  summarize_all(list(sum)) %>%
  mutate(APPNICENUM = rowSums(. != 0) - 1 ) # -1 to not count the column APP_ID.

The problem comes when I try to ungroup the observations. I mean, I need the repeated APP_ID to have the same APPNICENUM. However, adding ungroup() to the pipe doesn’t seem to work.

I wonder if there is a way to ungroup the data frame once all those operations are done. Or maybe there is a different (and simplier?) approach to my problem.

CodePudding user response：

summarise_all will automatically remove the grouping, so there is no need for ungroup afterwards:

library(tidyverse)
data <- tribble(
  ~APP_ID, ~NICE_1, ~NICE_2,
  38, 0, 0,
  38, 0, 0,
  38, 1, 1
)
data
#> # A tibble: 3 x 3
#>   APP_ID NICE_1 NICE_2
#>    <dbl>  <dbl>  <dbl>
#> 1     38      0      0
#> 2     38      0      0
#> 3     38      1      1

result <-
  data %>%
  group_by(APP_ID) %>%
  summarize_all(list(sum)) %>%
  # -1 to not count APP_ID
  mutate(APPNICENUM = rowSums(. != 0) - 1)
result
#> # A tibble: 1 x 4
#>   APP_ID NICE_1 NICE_2 APPNICENUM
#>    <dbl>  <dbl>  <dbl>      <dbl>
#> 1     38      1      1          2

result %>% groups()
#> list()

^{Created on 2021-10-06 by the reprex package (v2.0.1)}

CodePudding user response：

Do you want to keep the same number of rows as your original data but just add a new column which has count of columns that have 1 in it for each APP_ID ?

Try this answer -

library(dplyr)

result <- data_QI %>%
  group_by(APP_ID) %>%
  mutate(res = sum(colSums(select(cur_data(), starts_with('NICE'))) > 0)) %>%
  ungroup

result

CodePudding user response：

We may do

library(dplyr)
data_QI %>%
    group_by(APP_ID) %>% 
    summarise(res = sum(across(everything(), sum)), .groups = 'drop')

APP_ID	NICE_2	NICE_3	NICE_15	NICE_28	NICE_32	NICE_33	NICE_35
38	0	0	0	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	0	0	1	0	0
38	1	1	0	1	1	0	1
38	0	0	0	0	0	1	0

APP_ID	NICE_2	NICE_3	NICE_15	NICE_28	NICE_32	NICE_33	NICE_35
38	0	0	0	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	0	0	1	0	0
38	1	1	0	1	1	0	1
38	0	0	0	0	0	1	0

APP_ID	NICE_2	NICE_3	NICE_15	NICE_28	NICE_32	NICE_33	NICE_35
38	0	0	0	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	1	0	0	1	0
38	0	0	0	0	1	0	0
38	1	1	0	1	1	0	1
38	0	0	0	0	0	1	0