I have a dataframe like this:
ID <- c("AB1","AB1","CD2","AB3","KK4","AB3","AB3","AB1","AB1","CD2")
year <- c(2005,2008,2005,2010,2007,2009,2009,2007,2000,2010)
df <- data.frame(ID, year)
df
ID year
1 AB1 2005
2 AB1 2008
3 CD2 2005
4 AB3 2010
5 KK4 2007
6 AB3 2009
7 AB3 2009
8 AB1 2007
9 AB1 2000
10 CD2 2010
I want to add a column xp with the number of rows that match ID and have a smaller value of year than the current row. I am looking for something like that:
df$xp <- nrow( ID == "ID in current row" & year < "year in current row" )
The result should be:
ID year xp
1 AB1 2005 1
2 AB1 2008 3
3 CD2 2005 0
4 AB3 2010 2
5 KK4 2007 0
6 AB3 2009 0
7 AB3 2009 0
8 AB1 2007 2
9 AB1 2000 0
10 CD2 2010 1
CodePudding user response:
I'm sure there are more concise base R or data.table approaches, but here's an approach using dplyr and tidyr. This approach relies on a "non-equi join", which dplyr does not currently include (but which data.table
and sqldf
do), so instead I'm doing a cartesian join and then filtering, which will be less efficient for large data.
library(dplyr);library(tidyr)
left_join( # join...
df, # each row of df...
df %>% # with each matching row of a table where...
left_join(df, by = "ID") %>% # each row of df is joined to all the rows with same ID
filter(year.y < year.x) %>% # and we only keep preceding years
count(ID, year = year.x), # and we count how many there are per ID
by = c("ID", "year")) %>%
replace_na(list(n=0)) # and we replace the NA's with zeroes
CodePudding user response:
Here is an approach using dplyr
and purrr
:
library(dplyr)
library(purrr)
df %>%
group_by(ID) %>%
mutate(xp = map_int(year, function(x) sum(cur_data()$year < x)))
purrr::map_int
runs the anonymous function for all elements of the year
column. dplyr::cur_data()
returns the data of the current group as a data frame.
CodePudding user response:
Here is a data.table
solution:
library(data.table)
setDT(df)
df[, xp:=sapply(1:.N, \(x) sum(year < year[x])), by=ID][]
#> ID year xp
#> 1: AB1 2005 1
#> 2: AB1 2008 3
#> 3: CD2 2005 0
#> 4: AB3 2010 2
#> 5: KK4 2007 0
#> 6: AB3 2009 0
#> 7: AB3 2009 0
#> 8: AB1 2007 2
#> 9: AB1 2000 0
#> 10: CD2 2010 1