I am very new to R and this is probably not a difficult problem to solve, but I have been going around and around and can't get what I need, so I would be very grateful if someone could give me some advice. I've also never asked a question on one of these forums before, so I apologize if I am not following all of the normal conventions for posting.
I have multiple output files from another program that I am trying to do some analysis with using R. The number of output files will not be known in advance. I read them into my R code and store them in the variable listFinal.data.
I am trying to loop through the output files, group by the values in the column Entity.Type, and count the number of occurrences of each entity type. I then need the average number of occurrences of each entity type across all of the output files.
Here is a snippet of the column I need to work with in the output files:
ID | Entity.Type |
---|---|
1 | Ground |
2 | Ground |
3 | Air |
4 | Air |
5 | Sea |
6 | Ground |
7 | Sea |
8 | Ground |
9 | Air |
10 | Ground |
The result I am looking for from this single file would be:
Ground | Air | Sea |
---|---|---|
5 | 3 | 2 |
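For reference, the single-file counts above can be reproduced with a minimal base R sketch. The data frame name `single` is only an illustration and simply rebuilds the ten sample rows by hand:

```r
# Sample column from the snippet above, rebuilt by hand (hypothetical name: single)
single <- data.frame(
  ID = 1:10,
  Entity.Type = c("Ground", "Ground", "Air", "Air", "Sea",
                  "Ground", "Sea", "Ground", "Air", "Ground")
)

# table() counts how often each distinct value occurs
counts <- table(single$Entity.Type)
counts
```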
I can do this successfully for just one file, but when I use the code that I have written and I have multiple files, I get a result like above for each file when what I really want is a single result like above that is the average across all files.
Here is the code that I am using:
library(dplyr)
listVeh.data <- vector("list", length(listFinal.data)) # pre-allocate one slot per file
for (h in seq_along(listFinal.data)) { # listFinal.data is all the output files from another program
  listVeh.data[[h]] <- listFinal.data[[h]] %>%
    filter(Entity.Type != "Lifeform") %>% # remove people, just count vehicles
    group_by(Entity.Type) %>%
    summarize(n = n())
}
Answer:
Here's a toy example in which the output data are stored in a named list:
set.seed(4)
d1 <- data.frame(ID = 1:30,
Entity.Type = sample(c("Ground", "Air", "Sea"), 30, replace=TRUE))
d2 <- data.frame(ID = 1:30,
Entity.Type = sample(c("Ground", "Air", "Sea"), 30, replace=TRUE))
datlist <- list(d1, d2)
names(datlist) <- c("d1", "d2")
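With real output files, a named list like this could be built straight from disk. The following is only a sketch under assumptions: it supposes the files are CSVs, and the directory `entity_outputs`, the file names `run1.csv` and `run2.csv`, and the variable `filelist` are all made up for illustration. The two tiny files are written first so that the example runs on its own.

```r
# Hypothetical setup: write two small CSV files to a temporary directory
out_dir <- file.path(tempdir(), "entity_outputs")
dir.create(out_dir, showWarnings = FALSE)
write.csv(data.frame(ID = 1:3, Entity.Type = c("Ground", "Air", "Sea")),
          file.path(out_dir, "run1.csv"), row.names = FALSE)
write.csv(data.frame(ID = 1:2, Entity.Type = c("Ground", "Ground")),
          file.path(out_dir, "run2.csv"), row.names = FALSE)

# Find every CSV file, however many there are
files <- list.files(out_dir, pattern = "\\.csv$", full.names = TRUE)

# Read each file; the result is a list of data frames
filelist <- lapply(files, read.csv)

# Name each element after its file (extension dropped)
names(filelist) <- tools::file_path_sans_ext(basename(files))
```

Because the list is named, any later step that records the list names will label each row with the file it came from.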
I prefer plyr::ldply over do.call(rbind, lapply(...)) because, for a named list, it adds the name of each list element directly as an .id column.
library(dplyr) # provides %>%, group_by(), summarise() and n()
output <- plyr::ldply(datlist, function(x) x %>% group_by(Entity.Type) %>% summarise(n = n()))
output
  .id Entity.Type  n
1  d1         Air  9
2  d1      Ground  9
3  d1         Sea 12
4  d2         Air 14
5  d2      Ground  9
6  d2         Sea  7
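If you would rather not depend on plyr, the same per-file table can probably be had from dplyr alone. This is a sketch, not the method used above: it relies on `bind_rows()` with its `.id` argument and on `count()`, and the column name `file` and the small stand-in list are my own choices for illustration.

```r
library(dplyr)

# Small stand-in for the named list of data frames
datlist <- list(
  d1 = data.frame(ID = 1:4, Entity.Type = c("Ground", "Air", "Air", "Sea")),
  d2 = data.frame(ID = 1:3, Entity.Type = c("Ground", "Ground", "Sea"))
)

# Stack all data frames; .id stores each list name in a new "file" column
stacked <- bind_rows(datlist, .id = "file")

# One row per file and entity type, with the count in column n
per_file <- count(stacked, file, Entity.Type)
per_file
```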
Averaging the counts for each entity type across the whole list is then straightforward:
output %>% group_by(Entity.Type) %>% summarise(mean(n))
# A tibble: 3 x 2
  Entity.Type `mean(n)`
  <chr>           <dbl>
1 Air              11.5
2 Ground            9
3 Sea               9.5
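One caveat: mean(n) averages only over the files in which an entity type actually appears. If a type is absent from some file and should count as 0 there, dividing the total by the number of files may be closer to what you want. The sketch below uses made-up counts (not the toy data above) in which "Sea" is missing from d2; the names `n_files`, `mean_present`, `mean_all` and `wide` are illustrative.

```r
library(dplyr)

# Hypothetical per-file counts; "Sea" never appears in d2
output <- data.frame(
  .id = c("d1", "d1", "d1", "d2", "d2"),
  Entity.Type = c("Air", "Ground", "Sea", "Air", "Ground"),
  n = c(3, 5, 2, 1, 7)
)

# Number of files that contributed at least one row
n_files <- n_distinct(output$.id)

avg <- output %>%
  group_by(Entity.Type) %>%
  summarise(mean_present = mean(n),       # averages only files containing the type
            mean_all = sum(n) / n_files)  # treats a missing type as a count of 0
avg

# Named vector, which prints in the wide one-row layout asked for in the question
wide <- setNames(avg$mean_all, avg$Entity.Type)
wide
```

Here "Sea" gets a mean of 2 over the files where it is present but 1 over all files. If a file could contain no rows at all after filtering, it would be safer to take the file count from the list itself, e.g. length(datlist), rather than from the .id column.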