Home > database >  How to identify the addition/deletion across years in R
How to identify the addition/deletion across years in R

Time:09-06

Hi I would like to identify which row is an addition/deletion compared to data in the previous year. Here is the data frame.

a <- data.frame(Ticker=c("AA","AA","BBX","AA","AAAAX","BBX"), Year=c(2009,2010,2010,2011,2011,2011))

I want to create a dummy variable I=1 if the Ticker has not been shown in the previous year. I want to create another dummy variable A=1 if Ticker has not been shown in any of the previous years. In this sample, there are only two years. But in the real data, there are around 10 years in total. The expected output is as follows,

b <- data.frame(Ticker=c("AA","AA","BBX","AA","AAAAX","BBX"), Year=c(2009,2010,2010,2011,2011,2011),I=c(0,0,1,0,1,0),A=c(0,0,1,0,1,0))

Thank you!

CodePudding user response:

data.table version based on self-joins, just for fun.

The code sets both the A and I flags to 1, then changes the rows to 0 when the previous year, or any prior year matches on Ticker:

I am joining the a dataset, back on itself via the data.table syntax a[a] (table[jointable]), joining on= the Ticker value, and also comparing the Years within each matching Ticker group.
When there is a match, the I or A value is over-written := with a 0

library(data.table)
setDT(a)

a[, c("A","I","Yearp1") := .(1, 1, Year - 1)]
a[a, on=.(Ticker, Yearp1 = Year), I := 0]
a[a, on=.(Ticker, Year > Year), A := 0]
a[, Yearp1 := NULL]
a

#   Ticker Year A I
#1:     AA 2009 1 1
#2:     AA 2010 0 0
#3:    BBX 2010 1 1
#4:     AA 2011 0 0
#5:  AAAAX 2011 1 1
#6:    BBX 2011 0 0

CodePudding user response:

This solution makes two extra columns, one with only the previous year's tickers and one with all the previous tickers, and mutates based on inclusion in these groups.

a %>%
  mutate(prevs = map(Year, function(x) filter(., x - 1 == Year)$Ticker),
         I = map2_dbl(Ticker, prevs, function(x, y) if_else(x %in% y, 0, 1)),
         all_prevs = map(Year, function(x) filter(., x > Year)$Ticker),
         A = map2_dbl(Ticker, all_prevs, function(x, y) if_else(x %in% y, 0, 1))) %>%
  select(Ticker, Year, I, A)

#   Ticker Year I A
# 1     AA 2009 1 1
# 2     AA 2010 0 0
# 3    BBX 2010 1 1
# 4     AA 2011 0 0
# 5  AAAAX 2011 1 1
# 6    BBX 2011 0 0

The first row is off, but it makes more sense this way as it technically has not been included in any previous years.

  • Related