Group by in R questions: Tibble, decimal places, and show sample size?

Time:03-21

Here's my dataframe "fulldays" (The column header Plastic refers to the second column of numbers, the first column is just a numbered list of the rows that R puts in that I didn't know how to remove):

   Plastic Age Ones Zeros Nonzeros CellsCounted AllDaysAvail
1        2  10    2     5        5           10         TRUE
2       57   8    4     2        8           10         TRUE
3        3   9    2     4        6           10         TRUE
4       81   9    3     1        9           10         TRUE
5      131  20    8     1        9           10         TRUE
6        5   8    5     5        5           10         TRUE
7       26  10    4     4        6           10         TRUE
8       76  12    2     6        4           10         TRUE
9        9   9    8     2        8           10         TRUE
10      36  14    2     5        5           10         TRUE
11      64  12    3     4        6           10         TRUE
12      74  22    5     4        6           10         TRUE
13      10  10    1     4        6           10         TRUE
14      21   9    7     3        7           10         TRUE
15      16   9    5     3        7           10         TRUE
17      18   8    4     3        7           10         TRUE
18      23  22    6     4        6           10         TRUE
19     106  11    2     1        9           10         TRUE
20     113   9    1     4        6           10         TRUE
21      24  11    2     5        5           10         TRUE
22      29   9    3     2        8           10         TRUE
23      85   9    6     4        6           10         TRUE
24     403  19    1     6        4           10         TRUE
25      25  19    1     2        8           10         TRUE
26      27  10    3     3        7           10         TRUE
27     121   7    7     3        7           10         TRUE
29      35  12    1     4        6           10         TRUE
30      39  18    2     6        4           10         TRUE
31      37   8    5     1        9           10         TRUE
32      63   7    8     2        8           10         TRUE
33     122  11    3     2        8           10         TRUE
34     148   9    4     4        6           10         TRUE
37      42  13    2     3        7           10         TRUE
38     144  12    0     9        1           10         TRUE
39      43  12    1     2        8           10         TRUE
40      47  20    6     4        6           10         TRUE
41      90  12    2     5        5           10         TRUE
42     119  12    2     4        6           10         TRUE
43     138   7    7     3        7           10         TRUE
44      56   4    7     3        7           10         TRUE
45      58  12    2     5        5           10         TRUE
46      60  22    3     4        6           10         TRUE
47      71   9    2     5        5           10         TRUE
48     288  18    0    10        0           10         TRUE
49      66  22    1     5        5           10         TRUE
50      67   9    0     8        2           10         TRUE
51     149  12    0     5        5           10         TRUE
52      70  14    5     4        6           10         TRUE
53      72  12    1     4        6           10         TRUE
54      78  12    0     4        6           10         TRUE
59      79  12    4     3        7           10         TRUE
60      83  11    4     4        6           10         TRUE
61      87   8    6     4        6           10         TRUE
63      92  11    1     4        6           10         TRUE
64      96   8    0     5        5           10         TRUE
65     125   7    7     3        7           10         TRUE
66      98   9    3     4        6           10         TRUE
67     107   6    2     3        7           10         TRUE
68     102  11    5     3        7           10         TRUE
69     103  10    0     1        9           10         TRUE
72     108  12    3     3        7           10         TRUE
73     153  12    4     3        7           10         TRUE
74     109  12    3     4        6           10         TRUE
75     118  10    4     5        5           10         TRUE
77     133  12    0     4        6           10         TRUE
79     157   8    0    10        0           10         TRUE
81     318  14    2     5        5           10         TRUE

I have this code:

library(dplyr)  # provides %>%, group_by() and summarize()

new_data <- fulldays %>%
              group_by(Age) %>%
              summarize(OnesMean=mean(Ones), ZerosMean=mean(Zeros), NonZeroMean=mean(Nonzeros))

This is the output "new_data" (again, Age starts on the second column, not the first):

     Age OnesMean ZerosMean NonZeroMean
   <int>    <dbl>     <dbl>       <dbl>
 1     4     7         3           7   
 2     6     2         3           7   
 3     7     7.25      2.75        7.25
 4     8     3.43      4.29        5.71
 5     9     3.67      3.67        6.33
 6    10     2.33      3.67        6.33
 7    11     2.83      3.17        6.83
 8    12     1.75      4.31        5.69
 9    13     2         3           7   
10    14     3         4.67        5.33
11    18     1         8           2   
12    19     1         4           6   
13    20     7         2.5         7.5 
14    22     3.75      4.25        5.75

I have three questions:

  1. Why is group_by creating a tibble and not a dataframe?
  2. Why, when I click on "new_data" as an object in the "Data" section, does it display the values with so many decimal places (see image below)? [1]: https://i.stack.imgur.com/724G9.png
  3. How can I add/bind a column to new_data that shows the sample size for each age? In other words, I want to know how many individuals contribute to the mean score in each column (ideally I would add this column between "Age" and "OnesMean").

Thank you so much, and please let me know if there are any questions!

CodePudding user response:

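  1. The tidyverse works with tibbles, which are still data.frames with a few differences (mainly in printing and subsetting). group_by() turns your data.frame into a grouped tibble, and summarise() then returns a tibble. If you need a plain data.frame, wrap the result in as.data.frame().
  2. The mean function doesn't round the result. When a tibble is printed in the console it only shows about three significant digits, while the viewer in the "Data" section shows the stored values at full precision. If you want the stored values rounded, you need to ask R to do that, with OnesMean = round(mean(Ones), 2), for example.
  3. Add n() as one of the arguments of summarise() and give it a name, e.g. n = n(). Columns come out in the order you list them, so putting it first, as in summarise(n = n(), OnesMean = mean(Ones), <...>), places it between Age and OnesMean.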
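
Putting the three points together, here is a minimal sketch. The data frame fulldays_demo and its values are made up for illustration (it is not the asker's data); only the column names follow the question. It assumes dplyr is installed.

```r
library(dplyr)

# Made-up stand-in for `fulldays`; values are illustrative only
fulldays_demo <- data.frame(
  Age      = c(8, 8, 9, 9, 9, 10),
  Ones     = c(4, 5, 2, 8, 1, 3),
  Zeros    = c(2, 5, 4, 2, 3, 4),
  Nonzeros = c(8, 5, 6, 8, 7, 6)
)

new_data <- fulldays_demo %>%
  group_by(Age) %>%
  summarise(
    n           = n(),                       # rows per Age; listed first so it follows Age
    OnesMean    = round(mean(Ones), 2),      # round the stored value to 2 decimals
    ZerosMean   = round(mean(Zeros), 2),
    NonZeroMean = round(mean(Nonzeros), 2)
  ) %>%
  as.data.frame()                            # convert the tibble to a plain data.frame

class(new_data)
new_data
```

If you only need the counts, count(fulldays_demo, Age) should give the same n column without the means.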