I have a dataframe that basically looks like this:
df <- data.frame(yearseason = c("1999 1", "1999 1", "1999 1", "1999 3", "1999 3", "1999 3", "2000 1", "2000 1", "2000 1"),
                 species = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
                 count = c(1, 6, 3, 7, 2, 9, 4, 5, 7))
I want to add a "next_yearseason" column and fill it with the next unique yearseason value for each row. I.e. "1999 3" for rows 1-3, "2000 1" for rows 4-6, etc.
Is there a simple way to write a for loop that will do this?
I tried this:
for (i in unique(df$yearseason)){
(unique(df$next_yearseason))[i]<-(unique(df$yearseason))[i+1]
}
...but that did not work; I got this error: Error in i + 1 : non-numeric argument to binary operator
I have a workaround to get the results without a loop, I'm just wondering if a loop can do this.
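For reference, the error comes from how R's for loop binds its variable: for (i in unique(df$yearseason)) sets i to each value (a string such as "1999 1"), not to its position, so i + 1 tries to add a number to a character string. A minimal illustration, using a small hypothetical vector ys as a stand-in for unique(df$yearseason):

```r
# Hypothetical stand-in for unique(df$yearseason)
ys <- c("1999 1", "1999 3", "2000 1")

# Looping over the vector binds i to each value (a character string)
loop_values <- character(0)
for (i in ys) loop_values <- c(loop_values, i)

# Looping over seq_along() binds i to each position (an integer)
loop_positions <- integer(0)
for (i in seq_along(ys)) loop_positions <- c(loop_positions, i)

# Adding 1 to a string raises the same kind of error as in the question
err <- tryCatch("1999 1" + 1, error = function(e) e)

print(loop_values)
print(loop_positions)
print(conditionMessage(err))
```

Inside a position-based loop, ys[i + 1] is then the next unique season.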
Answer:
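With dplyr, you can build a lookup table that pairs each distinct yearseason with the one after it (lead() shifts the column up by one and pads the last entry with NA), then join it back onto df:
library(dplyr)
inner_join(
  df,
  df %>% distinct(yearseason) %>% mutate(next_yearseason = lead(yearseason)),
  by = "yearseason"
)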
Output:
yearseason species count next_yearseason
1 1999 1 a 1 1999 3
2 1999 1 b 6 1999 3
3 1999 1 c 3 1999 3
4 1999 3 a 7 2000 1
5 1999 3 b 2 2000 1
6 1999 3 c 9 2000 1
7 2000 1 a 4 <NA>
8 2000 1 b 5 <NA>
9 2000 1 c 7 <NA>
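If you would rather not load dplyr, the same lookup-table join can be sketched in base R with unique() and merge(). This is a swapped-in base R equivalent, not the dplyr code itself; the names ys, lookup and out are illustrative. Note that merge() sorts the result by the key column by default, so for unsorted input the row order may differ from the original.

```r
# stringsAsFactors = FALSE is the default since R 4.0; set explicitly for older R
df <- data.frame(
  yearseason = c("1999 1", "1999 1", "1999 1", "1999 3", "1999 3", "1999 3",
                 "2000 1", "2000 1", "2000 1"),
  species = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  count = c(1, 6, 3, 7, 2, 9, 4, 5, 7),
  stringsAsFactors = FALSE
)

# Distinct seasons, in order of first appearance
ys <- unique(df$yearseason)

# Lookup table: each season paired with the next one (NA for the last)
lookup <- data.frame(yearseason = ys, next_yearseason = c(ys[-1], NA),
                     stringsAsFactors = FALSE)

# Inner join on the shared key, like dplyr::inner_join(..., by = "yearseason")
out <- merge(df, lookup, by = "yearseason")
print(out)
```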
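You could also do it in a loop like this, iterating over positions in the vector of unique values rather than over the values themselves: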
ys <- unique(df$yearseason)
# seq_len() avoids looping over 1:0 when there is only one season
for (i in seq_len(length(ys) - 1)) {
  df[df$yearseason == ys[i], "next_yearseason"] <- ys[i + 1]
}
Output:
yearseason species count next_yearseason
1 1999 1 a 1 1999 3
2 1999 1 b 6 1999 3
3 1999 1 c 3 1999 3
4 1999 3 a 7 2000 1
5 1999 3 b 2 2000 1
6 1999 3 c 9 2000 1
7 2000 1 a 4 <NA>
8 2000 1 b 5 <NA>
9 2000 1 c 7 <NA>
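The loop can also be replaced by a single vectorized lookup with match(): for every row, match() returns the position of that row's season in the vector of unique seasons, and adding 1 to that position indexes the following season. Indexing one past the end of ys yields NA, so the last season gets NA automatically. This is an alternative sketch, not part of the answer above; it does not assume that every season has the same number of rows or that rows are sorted, only that the order of first appearance is chronological.

```r
# stringsAsFactors = FALSE is the default since R 4.0; set explicitly for older R
df <- data.frame(
  yearseason = c("1999 1", "1999 1", "1999 1", "1999 3", "1999 3", "1999 3",
                 "2000 1", "2000 1", "2000 1"),
  species = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  count = c(1, 6, 3, 7, 2, 9, 4, 5, 7),
  stringsAsFactors = FALSE
)

ys <- unique(df$yearseason)

# Position of each row's season within ys, plus one for the following season
next_pos <- match(df$yearseason, ys) + 1

# ys[length(ys) + 1] is NA, so rows of the final season get NA
df$next_yearseason <- ys[next_pos]
print(df)
```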
Answer:
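The base R way would of course not use a loop at all, but rather tail() with a negative count, padding the end with NAs (or making up the next value). With a negative second argument -n, tail() drops the first n items, so tail(df$yearseason, -3) drops the three rows of the first season and moves every later value up by one season. This relies on every yearseason having exactly three rows, in sorted order:
df$next_yrseas <- c(tail(df$yearseason, -3), rep(NA, 3))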
OR
df$next_yrseas <- c(tail(df$yearseason, -3), rep("2000 3", 3))
> df
yearseason species count next_yrseas
1 1999 1 a 1 1999 3
2 1999 1 b 6 1999 3
3 1999 1 c 3 1999 3
4 1999 3 a 7 2000 1
5 1999 3 b 2 2000 1
6 1999 3 c 9 2000 1
7 2000 1 a 4 2000 3
8 2000 1 b 5 2000 3
9 2000 1 c 7 2000 3
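The hard-coded 3 only works because every season has exactly three rows. A sketch that drops that assumption (it still requires the rows of each season to be contiguous) uses rle() to get each season's run length and rep() to expand the shifted seasons back to one value per row. The data frame df2 below is hypothetical, with unequal group sizes to show the difference:

```r
# Hypothetical data with unequal group sizes (2, 1 and 3 rows)
df2 <- data.frame(
  yearseason = c("1999 1", "1999 1", "1999 3", "2000 1", "2000 1", "2000 1"),
  count = c(1, 6, 7, 4, 5, 7),
  stringsAsFactors = FALSE
)

# Run-length encoding: r$values are the seasons in order, r$lengths the row counts
r <- rle(df2$yearseason)

# Shift the seasons up by one, pad the last with NA, then repeat each per row
df2$next_yrseas <- rep(c(r$values[-1], NA), times = r$lengths)
print(df2)
```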
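I think the tidyverse counterparts of tail are dplyr's lead() and lag(), which are designed to do the padding automagically: lead(x, 3) shifts x up by three positions and fills the vacated end with NA by default. (There is also a lag() in base R, stats::lag(), which does not behave like this: it shifts the time base of a time series rather than the values themselves, and seems to be there mainly to confuse newbies; at least I was confused early on.)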
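To illustrate the difference, here is a small hypothetical helper, lead_base(), that generalizes the tail-and-pad idea (shift values forward by n and pad the end), next to stats::lag(), which leaves the values untouched and only shifts the vector's "tsp" time attribute. If dplyr is installed, dplyr::lead(x, n) should give the same result as lead_base(x, n) for a plain vector; that comparison is left out so the sketch runs without extra packages.

```r
# Hypothetical base-R helper: shift x forward by n positions, padding the end
lead_base <- function(x, n = 1L, fill = NA) {
  n <- min(n, length(x))   # never pad past the original length
  if (n <= 0) return(x)    # tail(x, -0) would return an empty vector
  c(tail(x, -n), rep(fill, n))
}

x <- 1:6
shifted <- lead_base(x, 2)     # values move up; NA fills the last two slots
lagged <- stats::lag(x, 1)     # same values; only the "tsp" attribute changes

print(shifted)
print(as.vector(lagged))
print(attr(lagged, "tsp"))
```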