R--Looping through a dataframe and referring to the next unique value

Time:05-10

I have a dataframe that basically looks like this:

df <- data.frame(yearseason = c("1999 1", "1999 1", "1999 1", "1999 3", "1999 3", "1999 3", "2000 1", "2000 1", "2000 1"),
                 species = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
                 count = c(1, 6, 3, 7, 2, 9, 4, 5, 7))

I want to add a "next_yearseason" column and fill it with the next unique yearseason value for each row. I.e. "1999 3" for rows 1-3, "2000 1" for rows 4-6, etc.

Is there a simple way to write a for loop that will do this?

I tried this:

for (i in unique(df$yearseason)){
  (unique(df$next_yearseason))[i]<-(unique(df$yearseason))[i+1]
}

...but that did not work. I got an error: Error in i + 1 : non-numeric argument to binary operator
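The error comes from the loop header: `for (i in unique(df$yearseason))` makes `i` take the season strings themselves ("1999 1", "1999 3", ...), not their positions, so `i + 1` tries to add a number to a character string. A minimal sketch of the difference, assuming the seasons are stored as character (not factor), looping over positions with `seq_along()` instead:

```r
ys <- c("1999 1", "1999 3", "2000 1")   # the unique seasons, as character

# Looping over the values: i would be a string, and string + 1 fails
err <- tryCatch("1999 1" + 1, error = conditionMessage)
print(err)  # "non-numeric argument to binary operator"

# Looping over positions: i is an integer, so ys[i + 1] is the next unique value
nxt <- character(length(ys))
for (i in seq_along(ys)) {
  nxt[i] <- ys[i + 1]   # indexing past the end gives NA for the last season
}
print(nxt)
```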

I have a workaround to get the results without a loop, I'm just wondering if a loop can do this.

Answer:

With dplyr, you can do the following:

library(dplyr)
# Lookup table: one row per season, paired with the season that follows it
inner_join(
  df,
  df %>% distinct(yearseason) %>% mutate(next_yearseason = lead(yearseason)),
  by = "yearseason"
)

Output:

  yearseason species count next_yearseason
1     1999 1       a     1          1999 3
2     1999 1       b     6          1999 3
3     1999 1       c     3          1999 3
4     1999 3       a     7          2000 1
5     1999 3       b     2          2000 1
6     1999 3       c     9          2000 1
7     2000 1       a     4            <NA>
8     2000 1       b     5            <NA>
9     2000 1       c     7            <NA>
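For comparison, the same lookup-table idea can be sketched in base R without a join: take the unique seasons, shift them by one, and use `match()` to find each row's position in the unique list. Like the answer above, this assumes that the order of first appearance is the chronological order.

```r
df <- data.frame(
  yearseason = c("1999 1", "1999 1", "1999 1", "1999 3", "1999 3", "1999 3",
                 "2000 1", "2000 1", "2000 1"),
  species = c("a", "b", "c", "a", "b", "c", "a", "b", "c"),
  count = c(1, 6, 3, 7, 2, 9, 4, 5, 7),
  stringsAsFactors = FALSE
)

ys <- unique(df$yearseason)   # unique seasons in order of first appearance
next_ys <- c(ys[-1], NA)      # each season's successor; the last one has none
# match() gives each row's position in ys, which then indexes into next_ys
df$next_yearseason <- next_ys[match(df$yearseason, ys)]
df
```

Because it looks seasons up by value rather than by row position, this does not require the rows of a season to be adjacent.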

You could do it in a loop like this:

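ys <- unique(df$yearseason)
df$next_yearseason <- NA_character_   # the last season has no successor
# seq_len() avoids iterating over 1:0 when there is only one season
for (i in seq_len(length(ys) - 1)) {
  df[df$yearseason == ys[i], "next_yearseason"] <- ys[i + 1]
}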

Output:

  yearseason species count next_yearseason
1     1999 1       a     1          1999 3
2     1999 1       b     6          1999 3
3     1999 1       c     3          1999 3
4     1999 3       a     7          2000 1
5     1999 3       b     2          2000 1
6     1999 3       c     9          2000 1
7     2000 1       a     4            <NA>
8     2000 1       b     5            <NA>
9     2000 1       c     7            <NA>

Answer:

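The base R way would of course not use a loop: instead, drop the first 3 items with `tail()` and pad the end with NAs (or make up the next season). With a negative second argument, `tail(x, -n)` returns all but the first n items. Note that shifting by 3 only works here because every season has exactly 3 rows, stored consecutively: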

df$next_yrseas <- c( tail(df$yearseason, -3), rep(NA, 3))

OR

df$next_yrseas <- c( tail(df$yearseason, -3), rep("2000 3", 3))

> df

  yearseason species count next_yrseas
1     1999 1       a     1      1999 3
2     1999 1       b     6      1999 3
3     1999 1       c     3      1999 3
4     1999 3       a     7      2000 1
5     1999 3       b     2      2000 1
6     1999 3       c     9      2000 1
7     2000 1       a     4      2000 3
8     2000 1       b     5      2000 3
9     2000 1       c     7      2000 3 
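The fixed `-3` above relies on every season having exactly three rows. A sketch that drops the fixed block size, assuming only that rows of the same season are adjacent, uses `rle()` to get each run's length and then repeats each run's successor that many times:

```r
yearseason <- c("1999 1", "1999 1", "1999 1", "1999 3", "1999 3", "1999 3",
                "2000 1", "2000 1", "2000 1")

# tail() with a negative n returns everything except the first |n| elements
tail(1:5, -2)  # 3 4 5

runs <- rle(yearseason)               # run values and run lengths
next_vals <- c(runs$values[-1], NA)   # successor of each run; NA for the last
next_yrseas <- rep(next_vals, runs$lengths)
next_yrseas
```

To pad with a made-up season instead of NA, replace the `NA` in `next_vals` with, for example, `"2000 3"`.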

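The tidyverse counterparts of this `tail()` trick are dplyr's `lead()` and `lag()`, which shift a vector and do the padding automagically (with NA, or with a value given via `default`). (There is also a `lag()` in base R, `stats::lag()`, which does not behave similarly: it shifts the time base of a time series rather than the values, and seems to be there mainly to confuse newbies; at least I was confused early on.)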
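To see the difference, here is a small base-R sketch: `stats::lag()` on a time series leaves the values untouched and only moves the time index, whereas a lead-style shift moves the values and pads the end. `shift_lead()` is a hypothetical helper written for this illustration, not a function from base R or dplyr.

```r
x <- ts(1:3)                  # a series observed at times 1, 2, 3
lagged <- stats::lag(x, 1)    # shifts the time base back by one step

as.numeric(lagged)            # values unchanged: 1 2 3
as.numeric(time(lagged))      # times moved: 0 1 2

# Hypothetical helper: move values forward by n positions, padding with fill
shift_lead <- function(v, n = 1, fill = NA) {
  if (n == 0) return(v)                               # nothing to shift
  if (n >= length(v)) return(rep(fill, length(v)))    # everything shifted out
  c(tail(v, -n), rep(fill, n))
}

shift_lead(1:3)               # 2 3 NA
```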
