I have a dataset like this:
library(tibble)
data <- tibble(year = c(2010, 2010, 2012, 2010, 2011, 2011, 2013, 2013, 2010, 2011, 2012, 2013),
               state = c("ca", "ca", "ca", "ny", "ny", "ny", "ny", "ny", "wa", "wa", "wa", "wa"),
               variable2 = c("a", "b", "c", "b", "c", "a", "d", "a", "b", "b", "c", "b"),
               value = c(6, 5, 2, 6, 3, 1, 7, 8, 3, 2, 5, 7))
I would like to select only the data for states with at least 3 unique years. In this data, that would be ny and wa. I would like to retain all the data for those states. Because of variable2, some states have multiple data points for the same year, but I'm only interested in states with at least 3 unique years, regardless of the value of variable2. Thanks.
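For reference, the per-state count of distinct years can be checked directly in base R. This is a minimal sketch that rebuilds the question's data as a plain data.frame so it runs on its own; the names state_years, years_per_state and qualifying are illustrative. Only ny (3 years) and wa (4 years) reach the threshold, while ca has just 2.

```r
data <- data.frame(
  year = c(2010, 2010, 2012, 2010, 2011, 2011, 2013, 2013, 2010, 2011, 2012, 2013),
  state = c("ca", "ca", "ca", "ny", "ny", "ny", "ny", "ny", "wa", "wa", "wa", "wa"),
  variable2 = c("a", "b", "c", "b", "c", "a", "d", "a", "b", "b", "c", "b"),
  value = c(6, 5, 2, 6, 3, 1, 7, 8, 3, 2, 5, 7)
)

# Drop repeated state-year pairs (variable2 plays no role), then count
# how many distinct years are left for each state.
state_years <- unique(data[, c("state", "year")])
years_per_state <- table(state_years$state)
print(years_per_state)  # ca: 2, ny: 3, wa: 4

# States meeting the "at least 3 unique years" requirement
qualifying <- names(years_per_state)[years_per_state >= 3]
print(qualifying)  # "ny" "wa"
```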
Answer:
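You may try the following. The pipeline counts the distinct years per state and pulls the names of the states with at least three; the last line then keeps every row for those states:
library(dplyr)
keep <- data %>%
  group_by(state) %>%
  summarise(n = n_distinct(year)) %>%  # distinct years per state
  filter(n >= 3) %>%
  pull(state)
data %>% filter(state %in% keep)  # all rows for the qualifying states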
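A shorter dplyr variant, sketched here as an alternative rather than the answerer's code, filters inside each group so that no intermediate vector of state names is needed: n_distinct() counts the unique years of each state, and a grouped filter() keeps or drops whole states at once. It assumes dplyr is installed; the data frame is rebuilt so the snippet is self-contained.

```r
library(dplyr)

data <- data.frame(
  year = c(2010, 2010, 2012, 2010, 2011, 2011, 2013, 2013, 2010, 2011, 2012, 2013),
  state = c("ca", "ca", "ca", "ny", "ny", "ny", "ny", "ny", "wa", "wa", "wa", "wa"),
  variable2 = c("a", "b", "c", "b", "c", "a", "d", "a", "b", "b", "c", "b"),
  value = c(6, 5, 2, 6, 3, 1, 7, 8, 3, 2, 5, 7)
)

# Within each state, keep all rows only if that state spans at least
# three distinct years; ungroup() removes the grouping afterwards.
result <- data %>%
  group_by(state) %>%
  filter(n_distinct(year) >= 3) %>%
  ungroup()

print(result)  # 9 rows: the 5 ny rows and the 4 wa rows
```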
Answer:
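You could define a function ulen (for "unique length") and use it in ave(). For every row, ave() returns the number of unique years in that row's state, so keeping rows where this count is greater than 2 keeps all the data for states with at least 3 unique years.
ulen <- \(x) length(unique(x))  # the \(x) shorthand requires R >= 4.1.0
data[with(data, ave(year, state, FUN=ulen)) > 2, ]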
# year state variable2 value
# 4 2010 ny b 6
# 5 2011 ny c 3
# 6 2011 ny a 1
# 7 2013 ny d 7
# 8 2013 ny a 8
# 9 2010 wa b 3
# 10 2011 wa b 2
# 11 2012 wa c 5
# 12 2013 wa b 7
Data:
data <- structure(list(year = c(2010, 2010, 2012, 2010, 2011, 2011, 2013,
2013, 2010, 2011, 2012, 2013), state = c("ca", "ca", "ca", "ny",
"ny", "ny", "ny", "ny", "wa", "wa", "wa", "wa"), variable2 = c("a",
"b", "c", "b", "c", "a", "d", "a", "b", "b", "c", "b"), value = c(6,
5, 2, 6, 3, 1, 7, 8, 3, 2, 5, 7)), class = "data.frame", row.names = c(NA,
-12L))
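To see why the ave() call works, it may help to look at the intermediate vector. ave() returns one value per row: the number of unique years of that row's state. Comparing it with 2 therefore gives a row-level logical index. This sketch uses function(x) instead of the \(x) shorthand so it also runs on R versions before 4.1; n_years and subset_rows are illustrative names.

```r
data <- data.frame(
  year = c(2010, 2010, 2012, 2010, 2011, 2011, 2013, 2013, 2010, 2011, 2012, 2013),
  state = c("ca", "ca", "ca", "ny", "ny", "ny", "ny", "ny", "wa", "wa", "wa", "wa"),
  variable2 = c("a", "b", "c", "b", "c", "a", "d", "a", "b", "b", "c", "b"),
  value = c(6, 5, 2, 6, 3, 1, 7, 8, 3, 2, 5, 7)
)

# Number of unique values in a vector ("unique length")
ulen <- function(x) length(unique(x))

# ave() applies ulen to the years of each state and fills every row of
# that state with the result, so n_years has one entry per row.
n_years <- with(data, ave(year, state, FUN = ulen))
print(n_years)  # 2 2 2 3 3 3 3 3 4 4 4 4

# Rows whose state has more than 2 (i.e. at least 3) unique years
subset_rows <- data[n_years > 2, ]
```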
Answer:
Try this. The loop removes all rows of every state that has fewer than three unique years.
n <- levels(factor(data$state))  # every state in the data
for (i in n) {
  data_group <- data[data$state == i, ]
  length_year <- length(unique(data_group$year))  # unique years for state i
  if (length_year < 3) {
    data <- data[data$state != i, ]  # drop all rows of state i
  }
}
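The same logic can also be written without an explicit loop or repeated reassignment of data. A minimal base-R sketch (an alternative, not the answerer's code): split() groups the years by state, Filter() keeps the groups with at least three unique years, and their names are used to subset the original rows, which preserves the original row order and row names. The names years_by_state, keep_states and result are illustrative.

```r
data <- data.frame(
  year = c(2010, 2010, 2012, 2010, 2011, 2011, 2013, 2013, 2010, 2011, 2012, 2013),
  state = c("ca", "ca", "ca", "ny", "ny", "ny", "ny", "ny", "wa", "wa", "wa", "wa"),
  variable2 = c("a", "b", "c", "b", "c", "a", "d", "a", "b", "b", "c", "b"),
  value = c(6, 5, 2, 6, 3, 1, 7, 8, 3, 2, 5, 7)
)

# Group the years by state: a named list with one numeric vector per state
years_by_state <- split(data$year, data$state)

# Keep only the states whose years contain at least three distinct values
keep_states <- names(Filter(function(y) length(unique(y)) >= 3, years_by_state))

# Subset the original rows; row order and row names are preserved
result <- data[data$state %in% keep_states, ]
print(keep_states)  # "ny" "wa"
```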