I'm computing frequencies by group with dplyr, but the output is not saved as a data frame and only the first 10 rows are shown. How can I keep the full result? I need all rows for further analyses. Thanks!
library(dplyr)
data01 %>%
  group_by(Country, relsta) %>%
  summarize(Freq = n()) %>%
  mutate(married = Freq / sum(Freq))
Output
Country relsta Freq married
<int> <chr> <int> <dbl>
1 1 1 15 0.176
2 1 3 1 0.0118
3 1 4 28 0.329
4 1 5 6 0.0706
5 1 6 22 0.259
6 1 7 1 0.0118
7 1 99 12 0.141
8 2 NA 273 1
9 3 NA 129 1
10 4 2 9 0.0796
# ... with 115 more rows
Answer:
summarize() returns one row per group when each summary (such as n()) yields a single value, whereas mutate() keeps all the original rows. Assign the result so it is saved. Try:
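library(dplyr)
data02 <- data01 %>%
  group_by(Country, relsta) %>%
  mutate(Freq = n()) %>%           # rows in each Country/relsta group
  group_by(Country) %>%
  mutate(married = Freq / n()) %>% # share of that Country's rows
  ungroup()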
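data01 from the question is not available, so this sketch uses the built-in iris data as a stand-in to show the row-count difference between the two verbs. It assumes dplyr >= 1.0 for the .groups argument.

```r
library(dplyr)

# summarize(): collapses each (Sepal.Length, Species) group to a single row
per_group <- iris %>%
  group_by(Sepal.Length, Species) %>%
  summarize(Freq = n(), .groups = "drop")

# mutate(): keeps every original row and attaches its group's count
per_row <- iris %>%
  group_by(Sepal.Length, Species) %>%
  mutate(Freq = n()) %>%
  ungroup()

nrow(per_group)  # one row per Sepal.Length/Species combination
nrow(per_row)    # same as nrow(iris)
```

Both results are assigned, so nothing is lost; the console just truncates what it prints.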
Answer:
dplyr returns tibbles; the result is complete, only its printed display is truncated. Here is an example using iris:
library(dplyr)
res1 <- iris %>%
group_by(Sepal.Length, Species) %>%
summarize(Freq=n()) %>%
mutate(foo = Freq/sum(Freq))
res1
# Sepal.Length Species Freq foo
# <dbl> <fct> <int> <dbl>
# 1 4.3 setosa 1 1
# 2 4.4 setosa 3 1
# 3 4.5 setosa 1 1
# 4 4.6 setosa 4 1
# 5 4.7 setosa 2 1
# 6 4.8 setosa 5 1
# 7 4.9 setosa 4 0.667
# 8 4.9 versicolor 1 0.167
# 9 4.9 virginica 1 0.167
# 10 5 setosa 8 0.8
# # … with 47 more rows
Notice the "… with 47 more rows" line. You can also check the dimensions:
dim(res1)
# [1] 57 4
Also, the result already is a data frame (a tibble is a subclass of data.frame), as its class shows:
class(res1)
# [1] "grouped_df" "tbl_df" "tbl" "data.frame"
whereas:
class(iris)
# [1] "data.frame"
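Because a tibble inherits from data.frame, res1 can go straight into further analyses. A quick way to confirm that no rows were lost is to count them or look at the end of the object. This sketch rebuilds res1 from above; .groups = "drop_last" reproduces the default grouping while silencing the summarize() message.

```r
library(dplyr)

res1 <- iris %>%
  group_by(Sepal.Length, Species) %>%
  summarize(Freq = n(), .groups = "drop_last") %>%
  mutate(foo = Freq / sum(Freq))

is.data.frame(res1)  # TRUE: a tibble inherits from data.frame
nrow(res1)           # all rows, even though printing shows only 10
tail(res1, 3)        # the last rows, which the default print hides
sum(res1$Freq)       # every iris row is accounted for
```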
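To see more rows, convert with as.data.frame(). Large data frames are truncated too: print() stops after getOption("max.print") entries (cells, not rows). You can raise that limit with e.g. options(max.print = 3000); plain R defaults to 99999, while RStudio sets it to 1000.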
as.data.frame(res1)
# Sepal.Length Species Freq foo
# 1 4.3 setosa 1 1.0000000
# 2 4.4 setosa 3 1.0000000
# 3 4.5 setosa 1 1.0000000
# [...]
# 55 7.6 virginica 1 1.0000000
# 56 7.7 virginica 4 1.0000000
# 57 7.9 virginica 1 1.0000000
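If you would rather keep the tibble, its print method accepts an n argument, so converting is not needed just to look at every row. A minimal sketch, again rebuilding res1:

```r
library(dplyr)

res1 <- iris %>%
  group_by(Sepal.Length, Species) %>%
  summarize(Freq = n(), .groups = "drop_last") %>%
  mutate(foo = Freq / sum(Freq))

# Print every row of the tibble instead of the default first 10
print(res1, n = Inf)

# Base R's cap on printed entries (not rows), used when printing data frames
getOption("max.print")
```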
You could also use base R. Since the following line already gives you the Freq column,
as.data.frame.table(with(iris, table(Sepal.Length, Species)))
you could do this:
res2 <- with(iris, table(Sepal.Length, Species)) |>
as.data.frame.table() |>
transform(foo=ave(Freq, Sepal.Length, FUN=\(x) x/sum(x))) |>
subset(Freq > 0)
res2
# Sepal.Length Species Freq foo
# 1 4.3 setosa 1 1.0000000
# 2 4.4 setosa 3 1.0000000
# 3 4.5 setosa 1 1.0000000
# [...]
# 103 7.6 virginica 1 1.0000000
# 104 7.7 virginica 4 1.0000000
# 105 7.9 virginica 1 1.0000000
Where:
dim(res2)
# [1] 57 4
class(res2)
# [1] "data.frame"
Note: the native pipe |> and the \(x) lambda shorthand require R >= 4.1.
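As a sanity check, the sketch below rebuilds both results and confirms that they agree. It also shows why subset(Freq > 0) is needed: table() lists every Sepal.Length/Species combination, including those that never occur. It assumes R >= 4.1 and dplyr installed.

```r
library(dplyr)

# dplyr version (res1 above), converted to a plain data frame
res1 <- iris %>%
  group_by(Sepal.Length, Species) %>%
  summarize(Freq = n(), .groups = "drop_last") %>%
  mutate(foo = Freq / sum(Freq)) %>%
  ungroup() %>%
  as.data.frame()

# Base R version (res2 above), before and after dropping empty cells
full <- with(iris, table(Sepal.Length, Species)) |>
  as.data.frame.table()
res2 <- full |>
  transform(foo = ave(Freq, Sepal.Length, FUN = \(x) x / sum(x))) |>
  subset(Freq > 0)

nrow(full)  # every length/species combination, zero counts included
nrow(res2)  # only combinations that occur, same as nrow(res1)

# table() turns Sepal.Length into a factor; convert back before comparing
res2$Sepal.Length <- as.numeric(as.character(res2$Sepal.Length))

# Sort both the same way, then compare the proportions
a <- res1[order(res1$Sepal.Length, res1$Species), ]
b <- res2[order(res2$Sepal.Length, res2$Species), ]
all.equal(a$foo, b$foo)
```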