I need to group my dataset by 4 variables: a, b, c, and d.
I want to use the group_by function to group the 4 columns like this:
group_by(col1, col2, col3, col4)
but it doesn't work; it only takes the first 3 columns.
CodePudding user response:
My guess, since you shared no code: "it only takes the first 3 columns" suggests that, of the four arguments, the first is being interpreted as the data frame. That is, a call like
group_by(col1, col2, col3, col4)
with nothing piped into it treats col1 as the data. If you had
mydata # with or without this
group_by(col1, col2, col3, col4) %>%
...
then change it to
mydata %>%
group_by(col1, col2, col3, col4) %>%
...
or
group_by(mydata, col1, col2, col3, col4) %>%
...
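The two corrected forms above can be checked side by side. This is a minimal sketch, assuming dplyr is installed; it uses the built-in mtcars data as a stand-in for mydata, and the four columns cyl, vs, am, and gear are illustrative choices, not the asker's actual columns.

```r
library(dplyr)

# Form 1: pipe the data frame into group_by().
piped <- mtcars %>%
  group_by(cyl, vs, am, gear)

# Form 2: pass the data frame explicitly as the first argument.
direct <- group_by(mtcars, cyl, vs, am, gear)

# Both forms register all four grouping columns.
group_vars(piped)
group_vars(direct)
identical(group_vars(piped), group_vars(direct))
```

In both forms the data frame occupies the first argument slot, so every column name that follows is treated as a grouping variable.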
CodePudding user response:
Using only group_by does not make any visual change in the dataframe. You need to do something more after group_by.
Here is an example with the mtcars dataset:
library(dplyr)  # provides %>%, group_by() and summarise()
df <- mtcars[1:10, c(1, 2, 8, 9)]
df
#                   mpg cyl vs am
#Mazda RX4         21.0   6  0  1
#Mazda RX4 Wag     21.0   6  0  1
#Datsun 710        22.8   4  1  1
#Hornet 4 Drive    21.4   6  1  0
#Hornet Sportabout 18.7   8  0  0
#Valiant           18.1   6  1  0
#Duster 360        14.3   8  0  0
#Merc 240D         24.4   4  1  0
#Merc 230          22.8   4  1  0
#Merc 280          19.2   6  1  0
Using only group_by:
df %>% group_by(cyl, vs, am)
# A tibble: 10 × 4
# Groups:   cyl, vs, am [5]
#     mpg   cyl    vs    am
#   <dbl> <dbl> <dbl> <dbl>
# 1  21       6     0     1
# 2  21       6     0     1
# 3  22.8     4     1     1
# 4  21.4     6     1     0
# 5  18.7     8     0     0
# 6  18.1     6     1     0
# 7  14.3     8     0     0
# 8  24.4     4     1     0
# 9  22.8     4     1     0
#10  19.2     6     1     0
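Although the rows look unchanged, the grouping is stored as metadata on the tibble. A minimal sketch of how to inspect it, assuming dplyr is installed (the variable name grouped is only for illustration):

```r
library(dplyr)

df <- mtcars[1:10, c(1, 2, 8, 9)]
grouped <- df %>% group_by(cyl, vs, am)

# TRUE when grouping metadata is attached to the data frame.
is_grouped_df(grouped)

# Names of the grouping columns.
group_vars(grouped)

# Number of distinct (cyl, vs, am) combinations; this matches the
# count shown in the "# Groups:" line of the printed tibble.
n_groups(grouped)
```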
You need to "tell" dplyr what you want to do after group_by. For example, you can sum the mpg values:
df %>% group_by(cyl, vs, am) %>% summarise(sum_mpg = sum(mpg), .groups = 'drop')
#    cyl    vs    am sum_mpg
#  <dbl> <dbl> <dbl>   <dbl>
#1     4     1     0    47.2
#2     4     1     1    22.8
#3     6     0     1    42
#4     6     1     0    58.7
#5     8     0     0    33
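Summing is only one option; several per-group statistics can be computed in the same summarise call. A minimal sketch, assuming dplyr is installed (the output column names n_cars and mean_mpg are made up for this example):

```r
library(dplyr)

df <- mtcars[1:10, c(1, 2, 8, 9)]

stats <- df %>%
  group_by(cyl, vs, am) %>%
  summarise(
    n_cars   = n(),        # number of rows in each group
    mean_mpg = mean(mpg),  # average mpg within each group
    .groups  = "drop"      # return an ungrouped tibble
  )

stats
```

The result has one row per (cyl, vs, am) combination, and the n_cars column adds up to the 10 rows of df.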