Here's my data frame "fulldays" (the column header Plastic refers to the second column of numbers; the first column is just the row numbers that R adds, which I didn't know how to remove):
Plastic Age Ones Zeros Nonzeros CellsCounted AllDaysAvail
1 2 10 2 5 5 10 TRUE
2 57 8 4 2 8 10 TRUE
3 3 9 2 4 6 10 TRUE
4 81 9 3 1 9 10 TRUE
5 131 20 8 1 9 10 TRUE
6 5 8 5 5 5 10 TRUE
7 26 10 4 4 6 10 TRUE
8 76 12 2 6 4 10 TRUE
9 9 9 8 2 8 10 TRUE
10 36 14 2 5 5 10 TRUE
11 64 12 3 4 6 10 TRUE
12 74 22 5 4 6 10 TRUE
13 10 10 1 4 6 10 TRUE
14 21 9 7 3 7 10 TRUE
15 16 9 5 3 7 10 TRUE
17 18 8 4 3 7 10 TRUE
18 23 22 6 4 6 10 TRUE
19 106 11 2 1 9 10 TRUE
20 113 9 1 4 6 10 TRUE
21 24 11 2 5 5 10 TRUE
22 29 9 3 2 8 10 TRUE
23 85 9 6 4 6 10 TRUE
24 403 19 1 6 4 10 TRUE
25 25 19 1 2 8 10 TRUE
26 27 10 3 3 7 10 TRUE
27 121 7 7 3 7 10 TRUE
29 35 12 1 4 6 10 TRUE
30 39 18 2 6 4 10 TRUE
31 37 8 5 1 9 10 TRUE
32 63 7 8 2 8 10 TRUE
33 122 11 3 2 8 10 TRUE
34 148 9 4 4 6 10 TRUE
37 42 13 2 3 7 10 TRUE
38 144 12 0 9 1 10 TRUE
39 43 12 1 2 8 10 TRUE
40 47 20 6 4 6 10 TRUE
41 90 12 2 5 5 10 TRUE
42 119 12 2 4 6 10 TRUE
43 138 7 7 3 7 10 TRUE
44 56 4 7 3 7 10 TRUE
45 58 12 2 5 5 10 TRUE
46 60 22 3 4 6 10 TRUE
47 71 9 2 5 5 10 TRUE
48 288 18 0 10 0 10 TRUE
49 66 22 1 5 5 10 TRUE
50 67 9 0 8 2 10 TRUE
51 149 12 0 5 5 10 TRUE
52 70 14 5 4 6 10 TRUE
53 72 12 1 4 6 10 TRUE
54 78 12 0 4 6 10 TRUE
59 79 12 4 3 7 10 TRUE
60 83 11 4 4 6 10 TRUE
61 87 8 6 4 6 10 TRUE
63 92 11 1 4 6 10 TRUE
64 96 8 0 5 5 10 TRUE
65 125 7 7 3 7 10 TRUE
66 98 9 3 4 6 10 TRUE
67 107 6 2 3 7 10 TRUE
68 102 11 5 3 7 10 TRUE
69 103 10 0 1 9 10 TRUE
72 108 12 3 3 7 10 TRUE
73 153 12 4 3 7 10 TRUE
74 109 12 3 4 6 10 TRUE
75 118 10 4 5 5 10 TRUE
77 133 12 0 4 6 10 TRUE
79 157 8 0 10 0 10 TRUE
81 318 14 2 5 5 10 TRUE
I have this code:
library(dplyr)

new_data <- fulldays %>%
  group_by(Age) %>%
  summarize(OnesMean = mean(Ones), ZerosMean = mean(Zeros), NonZeroMean = mean(Nonzeros))
This is the output "new_data" (again, Age starts on the second column, not the first):
Age OnesMean ZerosMean NonZeroMean
<int> <dbl> <dbl> <dbl>
1 4 7 3 7
2 6 2 3 7
3 7 7.25 2.75 7.25
4 8 3.43 4.29 5.71
5 9 3.67 3.67 6.33
6 10 2.33 3.67 6.33
7 11 2.83 3.17 6.83
8 12 1.75 4.31 5.69
9 13 2 3 7
10 14 3 4.67 5.33
11 18 1 8 2
12 19 1 4 6
13 20 7 2.5 7.5
14 22 3.75 4.25 5.75
I have three questions:
- Why is group_by creating a tibble and not a data frame?
- Why, when I click on "new_data" as an object in the "Data" section, does it display the values with so many decimal places (see image below)? [1]: https://i.stack.imgur.com/724G9.png
- How can I add/bind a column to new_data that shows the sample size for each age? In other words, I want to know how many individuals contribute to the mean score in each column (ideally I would add this column between "Age" and "OnesMean").
Thank you so much, and please let me know if there are any questions!
CodePudding user response:
- `tidyverse` works with `tibble`s, which are still `data.frame`s with a couple of differences. So when you use `dplyr` functions on a `data.frame`, it will become a `tibble` `data.frame`.
- The `mean` function doesn't round the result. If you want them to be rounded, you need to ask R to do that, with `OnesMean = round(mean(Ones), 2)`, for example.
- You add `n()` as one of the arguments of `summarise()`. That is, `summarise(OnesMean = mean(Ones), <...>, n())`.
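Putting the three points together, here is a minimal sketch rather than a definitive solution. The data frame `fulldays_demo` is a small made-up stand-in for `fulldays` (its values are illustrative only), and the column name `n` for the count is my own choice. Listing `n = n()` first inside `summarise()` should place the count right after `Age`, and the final `as.data.frame()` is only needed if you want a plain data frame back instead of a tibble.

```r
library(dplyr)

# Hypothetical stand-in for `fulldays`; the values are made up for illustration
fulldays_demo <- data.frame(
  Age      = c(7, 7, 8, 8, 8),
  Ones     = c(7, 8, 4, 3, 5),
  Zeros    = c(3, 2, 4, 5, 3),
  Nonzeros = c(7, 8, 6, 5, 7)
)

new_data <- fulldays_demo %>%
  group_by(Age) %>%
  summarise(
    n           = n(),                      # rows (sample size) per Age, listed first so it follows Age
    OnesMean    = round(mean(Ones), 2),     # round explicitly; mean() itself does not round
    ZerosMean   = round(mean(Zeros), 2),
    NonZeroMean = round(mean(Nonzeros), 2)
  ) %>%
  as.data.frame()                           # optional: convert the tibble to a plain data.frame

print(new_data)
```

Naming the count (`n = n()`) also avoids ending up with a column literally called `n()`, which is what an unnamed `n()` inside `summarise()` would give you.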