Dynamically create a variable within a loop and add to dataframe

Time:11-05

In Stata, creating a variable dynamically within a loop is easy because the quotes `' identify the iterator. This example creates binary variables Y2005 through Y2020, each taking the value 1 if Year is less than the corresponding year:

set obs 10
gen Year = 2005
replace Year = 2010 if _n > 4

forvalues y = 2005(1)2020 {
    gen byte Y`y' = 0
    replace Y`y' = 1 if Year < `y' 
    }

In R, the iterator cannot be used directly to build the variable name. The best I found was to create the variables in the loop first, then assemble them into the data frame outside the loop:

Year <- c(2005,2010,1996,1994,2001,2006,2019,2021, 2018,1987)
ls.output <- as.data.frame(Year)

for(y in 2005:2020) {
  assign(paste0("Y",y), ifelse(ls.output$Year < y, 1, 0))
}
ls.output <- cbind(ls.output, Y2005, Y2006, Y2007, Y2008, Y2009, Y2010)

Is there a better way to do this directly in the loop?

CodePudding user response:

Column names can be built with paste0(). Skip the step of creating separate variables and then binding them to the data frame; assign the new columns directly instead:

for(y in 2005:2020) {
  ls.output[, paste0("Y", y)] <- ifelse(ls.output$Year < y, 1, 0)
}

ls.output
#    Year Y2005 Y2006 Y2007 Y2008 Y2009 Y2010 Y2011 Y2012 Y2013 Y2014 Y2015 Y2016 Y2017 Y2018 Y2019 Y2020
# 1  2005     0     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
# 2  2010     0     0     0     0     0     0     1     1     1     1     1     1     1     1     1     1
# 3  1996     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
# 4  1994     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
# 5  2001     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
# 6  2006     0     0     1     1     1     1     1     1     1     1     1     1     1     1     1     1
# 7  2019     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     1
# 8  2021     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0     0
# 9  2018     0     0     0     0     0     0     0     0     0     0     0     0     0     0     1     1
# 10 1987     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1     1
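
If the explicit loop is not needed, the same columns can probably be added in a single assignment. This is only a sketch of one alternative, not part of the answer above: the helper vector `years` is a name introduced here for illustration, and the columns are stored as integers via as.integer() rather than as the doubles that ifelse(..., 1, 0) returns.

```r
# Same data as in the question
Year <- c(2005, 2010, 1996, 1994, 2001, 2006, 2019, 2021, 2018, 1987)
ls.output <- data.frame(Year)

# Cutoff years; `years` is a helper name assumed for this sketch
years <- 2005:2020

# lapply() returns one 0/1 integer vector per cutoff year;
# assigning a list to a character vector of new names adds all columns at once
ls.output[paste0("Y", years)] <- lapply(years, function(y) as.integer(ls.output$Year < y))
```

The right-hand side is evaluated before any column is added, so each comparison sees the original Year column.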

CodePudding user response:

Use outer(), with Year as defined in the question. The comparison returns a logical matrix; the unary + converts it to 0/1:

data.frame(Year, +outer(Year, setNames(2005:2010, paste0("Y", 2005:2010)), `<`))

giving:

   Year Y2005 Y2006 Y2007 Y2008 Y2009 Y2010
1  2005     0     1     1     1     1     1
2  2010     0     0     0     0     0     0
3  1996     1     1     1     1     1     1
4  1994     1     1     1     1     1     1
5  2001     1     1     1     1     1     1
6  2006     0     0     1     1     1     1
7  2019     0     0     0     0     0     0
8  2021     0     0     0     0     0     0
9  2018     0     0     0     0     0     0
10 1987     1     1     1     1     1     1
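
To cover the full 2005 to 2020 range from the question, the same idea should extend by swapping in a longer vector of cutoffs. The sketch below assumes that, and checks the result against a plain loop; `years`, `wide` and `looped` are names made up for this illustration.

```r
# Same data as in the question
Year <- c(2005, 2010, 1996, 1994, 2001, 2006, 2019, 2021, 2018, 1987)
years <- 2005:2020

# outer() compares every Year with every cutoff; the names on the second
# argument become the column names, and unary + turns TRUE/FALSE into 1/0
wide <- data.frame(Year, +outer(Year, setNames(years, paste0("Y", years)), `<`))

# Reference result built column by column in a loop
looped <- data.frame(Year)
for (y in years) {
  looped[[paste0("Y", y)]] <- as.integer(Year < y)
}

# TRUE when both approaches agree cell by cell
all(wide == looped)
```

The loop version is easier to read for a handful of columns; the outer() version avoids the loop and scales to any vector of cutoffs.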