How to detect missing year in time series data in R?

Time:07-01

Let's say we have a column with the following years:

2012, 2013, 2014, 2015, 2017, 2018, 2019, 2020, 2021, 2022

Now I need code that identifies which years are missing (2016 in this case).

CodePudding user response:

You could use setdiff().

x <- c(2012,2013,2014,2015,2017,2018,2019,2020,2021,2022)
setdiff(seq(min(x), max(x)), x)
# [1] 2016

Update

To meet the additional request, the code can be extended as follows:

yr <- setdiff(seq(min(x), max(x)), x)
if( !length(yr) ) yr <- "no year is missing"
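
One caveat with the line above: it replaces a numeric result with a character string, and min()/max() return NA if the column contains missing values. A minimal sketch of a wrapper that deals with both, assuming NA entries should simply be ignored (the name missing_years is hypothetical, not part of the answer above):

```r
# Hypothetical helper: returns the years absent from the span min(x)..max(x),
# ignoring NA values and duplicates.
missing_years <- function(x) {
  x <- x[!is.na(x)]                        # drop NAs so min()/max() are defined
  if (length(x) == 0) return(integer(0))   # nothing to compare against
  setdiff(seq(min(x), max(x)), x)          # full range minus the observed years
}

x <- c(2012, 2013, 2014, 2015, NA, 2017, 2018, 2019, 2020, 2021, 2022)
yr <- missing_years(x)
yr
# [1] 2016

# Keep the result numeric and report the empty case separately
if (length(yr) == 0) message("no year is missing")
```

Returning an empty vector instead of a string keeps the result usable in later numeric code; the message is only for display.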

CodePudding user response:

Do you just need to know which year is missing?

If so, you can try:

all_years <- seq(2012, 2022, 1)
years_in_column <- c(2012,2013,2014,2015,2017,2018,2019,2020,2021,2022)

all_years[!all_years %in% years_in_column]
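
Fixing the expected range by hand, as above, behaves differently from deriving it with min() and max(): a fixed range also reports years missing at the start or end of the series. A short sketch of the difference, assuming the expected span 2012 to 2022 is known in advance (the variable names are illustrative):

```r
# Example column in which 2012, 2016 and 2022 are all absent
years_in_column <- c(2013, 2014, 2015, 2017, 2018, 2019, 2020, 2021)

# Expected range fixed in advance: gaps at the edges are reported too
expected <- seq(2012, 2022)
fixed_gaps <- expected[!expected %in% years_in_column]
fixed_gaps
# [1] 2012 2016 2022

# Range derived from the data: only interior gaps are reported
observed <- seq(min(years_in_column), max(years_in_column))
derived_gaps <- observed[!observed %in% years_in_column]
derived_gaps
# [1] 2016
```

Which variant fits depends on whether the first and last years of the series are known independently of the data.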

CodePudding user response:

You can use setdiff:

years <- c(2012,2013,2014,2015,2017,2018,2019,2020,2021,2022)
all_years <- seq(min(years), max(years))
setdiff(all_years, years)
#> [1] 2016

CodePudding user response:

You can do this in base R:

df <- data.frame(year = c(2012,2013,2014,2015,2017,2018,2019,2020,2021,2022))
all_years <- seq(min(df$year), max(df$year))
result <- all_years[!all_years %in% df$year]
result
# [1] 2016

if (length(result) == 0) result <- "no year is missing"