Add sample size to data frame after aggregating using R

Time:01-26

I have a data frame with plot numbers and independently taken data for 4 test subjects, as shown below:

data <- data.frame(
  plot     = c(101, 101, 101, 101, 101, 101, 101, 101,
               102, 102, 102, 102, 102, 102, 102, 102),
  subject1 = c(3, 4, 2, 3, 6, 5, 4, 2, 3, 6, 2, 2, 3, 2, 5, 2),
  subject2 = c(2, 3, 2, 1, 5, 2, 23, 2, 5, 2, 3, 2, 1, 2, 5, 4),
  subject3 = c(3, 2, 1, 2, 52, 5, 2, 2, 5, 2, 2, 3, 2, 2, 2, 2),
  subject4 = c(2, 2, 2, 2, 23, 3, 2, 21, 5, 5, 3, 2, 1, 4, 2, 3)
)

My next task is to aggregate the data to find the mean score of each subject within each plot, so I did the following:

library(dplyr)
library(tibble)

#Aggregate by mean
mean <- aggregate(data, by=list(data$plot), mean)

#Drop the unwanted grouping column
mean <- select(mean, -Group.1)

#Add new column for the next part of the question
mean <- mean%>%
  add_column(sample_size = "sample_size")

What I need is a column giving the sample size for each plot, that is, the number of rows that went into each mean. For instance, "101" occurs 8 times in this dataset, so I need that value listed in the last column of my aggregated data frame. It would look like:

mean_data <- data.frame(plot=c(101, 102),
                        subject1=c(3.625, 3.125),
                        subject2=c(5, 3),
                        subject3=c(8.625, 2.5),
                        subject4=c(7.125, 3.125),
                        sample_size=c(8, 8))

How can I do this?

Answer:

After grouping by 'plot', across() inside summarise() applies mean to every remaining column, and n() adds the number of rows in each group as sample_size in the same call:

library(dplyr)
data %>% 
 group_by(plot) %>% 
 summarise(across(everything(), mean), sample_size = n())

Output:

# A tibble: 2 × 6
   plot subject1 subject2 subject3 subject4 sample_size
  <dbl>    <dbl>    <dbl>    <dbl>    <dbl>       <int>
1   101     3.62        5     8.62     7.12           8
2   102     3.12        3     2.5      3.12           8
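
If you would rather stay with aggregate() as in the question, the same result can probably be reached in base R by computing the means and the counts separately and merging them on 'plot'. This is only a sketch: the names plot_means and plot_n are made up for illustration, and the data frame is rebuilt here so the snippet runs on its own.

```r
# Same values as the question's data frame, rebuilt so this runs standalone
data <- data.frame(
  plot     = rep(c(101, 102), each = 8),
  subject1 = c(3, 4, 2, 3, 6, 5, 4, 2, 3, 6, 2, 2, 3, 2, 5, 2),
  subject2 = c(2, 3, 2, 1, 5, 2, 23, 2, 5, 2, 3, 2, 1, 2, 5, 4),
  subject3 = c(3, 2, 1, 2, 52, 5, 2, 2, 5, 2, 2, 3, 2, 2, 2, 2),
  subject4 = c(2, 2, 2, 2, 23, 3, 2, 21, 5, 5, 3, 2, 1, 4, 2, 3)
)

# Per-plot mean of every subject column; the formula interface keeps
# the grouping column named 'plot' instead of 'Group.1'
plot_means <- aggregate(. ~ plot, data = data, FUN = mean)

# Per-plot row count: length() of any one column gives the group size
plot_n <- aggregate(subject1 ~ plot, data = data, FUN = length)
names(plot_n)[2] <- "sample_size"

# Join the two summaries on 'plot'
mean_data <- merge(plot_means, plot_n, by = "plot")
mean_data
```

One caveat: the formula interface of aggregate() drops rows containing NA by default, so with missing values the counts would reflect complete rows only. The example data has no NAs, so that does not matter here.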