I have 4 datasets that each contain the same variable, "siteid_public". My ultimate goal is to see how many unique "siteid_public" values there are across these four datasets. My plan is to combine them and then use length(unique()) to get the number.
I currently do this in a very clumsy way; the code looks like this:
site1 <- dflist[[1]] %>%
  select(siteid_public)
site2 <- dflist[[2]] %>%
  select(siteid_public)
site3 <- dflist[[3]] %>%
  select(siteid_public)
site4 <- dflist[[4]] %>%
  select(siteid_public)
site <- c(site1$siteid_public, site2$siteid_public, site3$siteid_public, site4$siteid_public)
length(unique(site))
But now I want to make this less repetitive.
So, first, I use the code below to create a list called "sitelist" with 4 elements, one from each of the four datasets. (dflist[[i]] in the code is where I store these 4 datasets.) After I run it, each element contains the same single variable, siteid_public. The code is here:
sitelist <- list()
for (i in 1:4) {
  sitelist[[i]] <- dflist[[i]] %>%
    select(siteid_public)
}
Now I want to combine all 4 elements of sitelist into one, and then use unique to see how many unique siteid_public values there are in the combined result. Could someone help me continue this code to achieve that? Thanks a lot!
CodePudding user response:
You can use lapply to iterate over a list of frames, either the whole list or just as easily a subset (including one or zero).
Your site1 through site4 can be created as a list with
sites <- lapply(dflist[1:4], function(z) select(z, siteid_public))
and you can do your unique-counting with
length(unique(unlist(sites)))
This works as well with
sites <- lapply(dflist, ...) # all of it
sites <- lapply(dflist[3], ...) # singleton, note not the `[[` index operator
indices <- ... # integer or logical of indices to include
sites <- lapply(dflist[indices], ...)
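To make the whole flow concrete, here is a minimal sketch in base R. The toy dflist below is made up purely for illustration (your real frames will have other columns and values), and it swaps select() for `[[` so that each element is a plain vector and the example runs without dplyr.

```r
# Hypothetical stand-in for dflist: four small frames that share siteid_public
dflist <- list(
  data.frame(siteid_public = c("A", "B"), x = 1:2),
  data.frame(siteid_public = c("B", "C"), x = 3:4),
  data.frame(siteid_public = c("C", "D"), x = 5:6),
  data.frame(siteid_public = c("A", "D", "E"), x = 7:9)
)

# Extract the column from every frame as a plain vector
site_ids <- lapply(dflist, function(z) z[["siteid_public"]])

# Flatten the list into one vector and count the distinct ids
n_unique <- length(unique(unlist(site_ids)))
n_unique  # 5 for this toy data: A, B, C, D, E

# Same idea on a subset of the frames (here the 1st and 3rd)
idx <- c(1, 3)
subset_ids <- lapply(dflist[idx], function(z) z[["siteid_public"]])
n_subset <- length(unique(unlist(subset_ids)))
n_subset  # 4 for this toy data: A, B, C, D
```

If you prefer to keep select(), unlist() will still flatten the one-column frames, so the count comes out the same; extracting vectors just keeps the intermediate list simpler.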