I have a data frame with plot numbers and independently taken data for 4 test subjects, as shown below:
data <- data.frame(
  plot     = c(101, 101, 101, 101, 101, 101, 101, 101, 102, 102, 102, 102, 102, 102, 102, 102),
  subject1 = c(3, 4, 2, 3, 6, 5, 4, 2, 3, 6, 2, 2, 3, 2, 5, 2),
  subject2 = c(2, 3, 2, 1, 5, 2, 23, 2, 5, 2, 3, 2, 1, 2, 5, 4),
  subject3 = c(3, 2, 1, 2, 52, 5, 2, 2, 5, 2, 2, 3, 2, 2, 2, 2),
  subject4 = c(2, 2, 2, 2, 23, 3, 2, 21, 5, 5, 3, 2, 1, 4, 2, 3)
)
My next task is to aggregate the data to find the mean score of each subject within each plot, so I did the following:
library(dplyr)
library(tibble)
# Aggregate by mean within each plot
mean <- aggregate(data, by = list(data$plot), mean)
# Drop the unwanted grouping column
mean <- select(mean, -Group.1)
# Add a placeholder column for the next part of the question
mean <- mean %>%
  add_column(sample_size = "sample_size")
What I need to do is fill that column with the sample size of each plot, that is, the number of rows belonging to it. For instance, "101" occurs 8 times in this dataset, so I need that value listed at the end of the row for plot 101 in my aggregated data frame. It would look like:
mean_data <- data.frame(plot=c(101, 102),
subject1=c(3.625, 3.125),
subject2=c(5, 3),
subject3=c(8.625, 2.5),
subject4=c(7.125, 3.125),
sample_size=c(8, 8))
How can I do this?
CodePudding user response:
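With across() inside summarise(), we can apply several functions flexibly after grouping by 'plot': across(everything(), mean) takes the mean of every non-grouping column, and n() gives the number of rows in each group.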
library(dplyr)
data %>%
group_by(plot) %>%
summarise(across(everything(), mean), sample_size = n())
Output:
# A tibble: 2 × 6
   plot subject1 subject2 subject3 subject4 sample_size
  <dbl>    <dbl>    <dbl>    <dbl>    <dbl>       <int>
1   101     3.62        5     8.62     7.12           8
2   102     3.12        3     2.5      3.12           8
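For comparison, here is a rough base R sketch of the same idea without dplyr, in case that is closer to the aggregate() attempt in the question. The names `means` and `counts` are my own, and the plot column is built with rep() only to keep the example short; treat it as one possible approach rather than the definitive one.

```r
# Same data as in the question, written compactly
# (rep() is an assumption made here only to shorten the plot column).
data <- data.frame(
  plot     = rep(c(101, 102), each = 8),
  subject1 = c(3, 4, 2, 3, 6, 5, 4, 2, 3, 6, 2, 2, 3, 2, 5, 2),
  subject2 = c(2, 3, 2, 1, 5, 2, 23, 2, 5, 2, 3, 2, 1, 2, 5, 4),
  subject3 = c(3, 2, 1, 2, 52, 5, 2, 2, 5, 2, 2, 3, 2, 2, 2, 2),
  subject4 = c(2, 2, 2, 2, 23, 3, 2, 21, 5, 5, 3, 2, 1, 4, 2, 3)
)

# Mean of every other column within each plot (formula interface)
means <- aggregate(. ~ plot, data = data, FUN = mean)

# Number of rows per plot, matched to the aggregated rows by plot value
counts <- table(data$plot)
means$sample_size <- as.vector(counts[as.character(means$plot)])

means
```

Looking the counts up by plot value, instead of relying on row order, keeps sample_size aligned with the right plot even if the aggregated rows were sorted differently.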