I want to calculate whether an individual survived from one year to the next or not. 0 means it died and 1 means it survived. The dataset consists of different years (2007 to 2020), and the calculation should start with the year 2008. I only want R to use a portion of the data I have.
My data set looks like the following (an excerpt from the first 17 rows of my data set):
ID 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
 1    0    1    0    0    0    0    0    0    0    0    0    0    0    0
 3    0    1    1    1    0    0    0    0    0    0    0    0    0    0
 4    0    1    1    1    0    0    0    0    0    0    0    0    0    0
 9    0    1    0    0    0    0    0    0    0    0    0    0    0    0
24    0    0    1    1    1    1    1    1    1    1    1    1    1    0
...
In total I have 1,121 entries and 16 columns.
I want R to start in the first row at year 2008 and check whether there is a 1. If there is a 1, R should look at the next column (2009): if that is also a 1, the output should be 1; if it is a 0, the output should be 0. If there is no 1 in 2008, R should check the following columns until it finds a year with a 1, and then check the next column as described above. Once it has found a 1 and done the check, it should ignore the remaining columns, move to the next row, and repeat the process. The output should be saved in a new column.
I tried a for loop with if/else statements as well as ifelse, ...
The closest I was able to get to my goal is with the following code:
for(x in foal_fates_2) {
if (foal_fates_2$`2008`=="1" && foal_fates_2$`2009` =="1") {
print("1")
} else if (foal_fates_2$`2008`== "1" && foal_fates_2$`2009` =="0") {
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="1" && foal_fates_2$`2010` == "1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="1" && foal_fates_2$`2010`== "0") {
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="1" &&
foal_fates_2$`2011`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="1" &&
foal_fates_2$`2011`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="1" && foal_fates_2$`2012`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="1" && foal_fates_2$`2012`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="1" && foal_fates_2$`2013`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="1" && foal_fates_2$`2013`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="1" &&
foal_fates_2$`2014`== "1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="1" &&
foal_fates_2$`2014`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "1" && foal_fates_2$`2015`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "1" && foal_fates_2$`2015`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="1" && foal_fates_2$`2016` =="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="1" && foal_fates_2$`2016` =="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="1" &&
foal_fates_2$`2017`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="1" &&
foal_fates_2$`2017`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="1" && foal_fates_2$`2018`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="1" && foal_fates_2$`2018`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="1" && foal_fates_2$`2019`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="1" && foal_fates_2$`2019`=="0"){
print("0")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="0" && foal_fates_2$`2019`=="1" &&
foal_fates_2$`2020`=="1"){
print("1")
} else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" &&
foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="0" && foal_fates_2$`2019`=="1" &&
foal_fates_2$`2020`=="0"){
print("0")
}
}
With this code R at least does something, and the result has the correct number of entities but the output is not correct. R gives me 0 and 1 but not at the correct place. Meaning e.g. for the first five rows R gave me the results "0" "0" "0" "1" "0" but it should be "0" "1" "1" "1" "0". At least if I understand it correctly. I am new to R so maybe for loop and if else are not the right tools for what I want to do. So, the question is how can I get to my goal. I would really appreciate any help.
Answer:
I would write a function to be applied on each row. Something like the following one (which could of course be more elaborate, but should do the job):
numberAfterFirstOne <- function(myRow){
  x <- which(myRow == 1)[1]            # index of the first 1; NA if the row has none
  if (!is.na(x) && x < length(myRow))  # is there a value after that first 1?
    return(myRow[x + 1])
  else
    return(NA)
}
Explanation:
- Find which indices are equal to 1 and select just the first one; if no value is 1, x will be NA.
- If there is a value after that first 1, return it.
- Otherwise return NA (this could also be 0 or whatever 'key value' you wish).
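To make the three cases in the explanation concrete, here is a small sketch on hand-made vectors. The vectors and the helper name first_one are made up for illustration and are not part of the data in the question.

```r
# Hypothetical example rows (not from the question's data)
row_with_one <- c(0, 0, 1, 1, 0)
row_all_zero <- c(0, 0, 0, 0, 0)
row_one_last <- c(0, 0, 0, 0, 1)

# Index of the first 1 in a vector; NA if there is none
first_one <- function(v) which(v == 1)[1]

first_one(row_with_one)                      # 3
first_one(row_all_zero)                      # NA, because integer(0)[1] is NA

# Value that follows the first 1
row_with_one[first_one(row_with_one) + 1]    # 1
row_one_last[first_one(row_one_last) + 1]    # NA, index 6 is past the end
row_all_zero[first_one(row_all_zero) + 1]    # NA, indexing with NA gives NA
```

In both edge cases (no 1 at all, or the first 1 in the last column) plain indexing already yields NA.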
For testing here is an example dataset:
n <- 5
m <- 16
set.seed(1562) # for reproducibility
dataset <- as.data.frame(matrix(ncol = m, nrow = n, data = round(runif(m * n, 0, 0.7))))
dataset <- rbind(dataset, rep(0, 16))
V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
1 1 0 0 1 0 0 0 1 0 1 0 1 0 0 1 0
2 1 1 0 0 0 1 1 0 0 1 1 0 0 0 1 0
3 0 0 0 0 0 1 0 0 0 0 0 1 0 0 0 1
4 1 0 0 0 0 0 0 0 0 1 0 0 1 0 1 0
5 0 1 1 0 0 1 0 1 0 1 0 1 0 0 1 0
6 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
Then apply the function numberAfterFirstOne on each row (apply is similar to a for-loop, but neater to write and read).
apply(dataset, 1, numberAfterFirstOne)
[1] 0 1 0 0 1 NA
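This is similar to the more clumsy construction with a for-loop:
result <- numeric(nrow(dataset))
for (i in seq_len(nrow(dataset))){
  # unlist() turns the one-row data frame into a plain vector, as apply() does
  result[i] <- numberAfterFirstOne(unlist(dataset[i, ]))
}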
You could now tweak the function to return what you want. At the moment it can return 0, 1, or NA; maybe you just want 1 and 0, or 1 and NA. The if check in the function is not strictly necessary, because myRow[x + 1] already returns NA when the index is out of bounds (or when x is NA), which would make the function even simpler.
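As a sketch of how this could be applied to the data in the question: the code below rebuilds the five rows shown there, restricts the search to the columns from 2008 onward, and stores the result in a new column. The names year_cols, survived_after_first_one, and survived are assumptions chosen for illustration; only foal_fates_2 and the year column names come from the question.

```r
# The five rows shown in the question (years 2007 to 2020)
m <- rbind(
  c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  c(0, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  c(0, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0),
  c(0, 0, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 0)
)
colnames(m) <- as.character(2007:2020)
foal_fates_2 <- data.frame(ID = c(1, 3, 4, 9, 24), m, check.names = FALSE)

# Only search from 2008 onward, as requested in the question
year_cols <- as.character(2008:2020)

# Simplified version of the function: out-of-range or NA indices give NA
survived_after_first_one <- function(myRow) {
  x <- which(myRow == 1)[1]   # first column (from 2008 on) containing a 1
  unname(myRow[x + 1])        # value in the following year
}

# Save the result in a new column (hypothetical name "survived")
foal_fates_2$survived <- apply(foal_fates_2[, year_cols], 1,
                               survived_after_first_one)
```

For the five rows shown, and under the rule as described in the question, this gives 0, 1, 1, 0, 1. Rows without any 1 from 2008 onward, or with the first 1 in 2020, get NA.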
You could also modify the code, so that the year is also returned:
colnames(dataset) <- 2007:2020 # name the columns of the example dataset (it has 16 columns, so the last two names become NA)
numberAfterFirstOne <- function(myRow){
  x <- which(myRow == 1)[1]
  return(c(x, myRow[x + 1])) # return the column index and the value
}
result <- apply(dataset, 1, numberAfterFirstOne) # save the result
result[1, ] <- names(dataset)[result[1, ]] # set column index to name of dataset column
[,1] [,2] [,3] [,4] [,5] [,6]
[1,] "2007" "2007" "2012" "2007" "2008" NA
[2,] "0" "1" "0" "0" "1" NA
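Because the assignment above turns the whole matrix into character strings, it may be more convenient to collect the year and the value in a data frame instead. The following is only a sketch on a small made-up matrix; the names first_one_summary, summary_df, first_year, and survived are assumptions for illustration.

```r
# Small hypothetical example with year-named columns
m <- rbind(
  c(0, 1, 0, 0),
  c(0, 0, 1, 1),
  c(0, 0, 0, 0)
)
colnames(m) <- as.character(2007:2010)

# Return the column index of the first 1 and the value right after it
first_one_summary <- function(myRow) {
  x <- unname(which(myRow == 1)[1])   # unname() keeps the result names clean
  c(first_col = x, next_value = unname(myRow[x + 1]))
}

# apply() returns one column per row of m, so transpose the result
res <- t(apply(m, 1, first_one_summary))

summary_df <- data.frame(
  first_year = as.integer(colnames(m)[res[, "first_col"]]),
  survived   = res[, "next_value"]
)
summary_df
```

Here first_year stays an integer and survived stays numeric, with NA in both columns for rows that contain no 1.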