Analyse binary dataset row-wise regarding value after a one

Time:03-06

I want to calculate whether an individual survived from one year to the next. 0 means it died and 1 means it survived. The dataset covers the years 2007 to 2020, and the calculation should start with the year 2008. I only want R to use a portion of the data I have.

My data set looks like the following:

the first 17 rows of my data set

 ID 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
  1    0    1    0    0    0    0    0    0    0    0    0    0    0    0
  3    0    1    1    1    0    0    0    0    0    0    0    0    0    0
  4    0    1    1    1    0    0    0    0    0    0    0    0    0    0
  9    0    1    0    0    0    0    0    0    0    0    0    0    0    0
 24    0    0    1    1    1    1    1    1    1    1    1    1    1    0
  ...

In total I have 1,121 rows and 16 columns.

I want R to start in the first row at the year 2008 and check whether there is a 1. If there is a 1, R should look at the next column (2009) and check whether it also contains a 1 (output 1) or a 0 (output 0). If there is no 1, R should check the following columns until it finds a year with a 1, and then check the next column as described above. Once it has found a 1 and done this check, it should ignore the remaining columns, move to the next row and repeat the process. The output should be saved in a new column.
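The procedure described above can be sketched as an explicit row-by-row loop, where `break` plays the role of "ignore the remaining columns". This is a minimal sketch, assuming the data look like the five rows shown (an ID column plus one column per year, with names such as `2008`); `foals`, `years` and `survived` are hypothetical names, and with the real data `foal_fates_2` would take the place of `foals`.

```r
# The five example rows from the question; check.names = FALSE keeps "2008" etc. as column names
foals <- read.table(header = TRUE, check.names = FALSE, text = "
ID 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020
 1    0    1    0    0    0    0    0    0    0    0    0    0    0    0
 3    0    1    1    1    0    0    0    0    0    0    0    0    0    0
 4    0    1    1    1    0    0    0    0    0    0    0    0    0    0
 9    0    1    0    0    0    0    0    0    0    0    0    0    0    0
24    0    0    1    1    1    1    1    1    1    1    1    1    1    0
")

years <- as.character(2008:2020)   # the calculation starts in 2008
foals$survived <- NA_integer_      # new column; stays NA if no usable 1 is found

for (i in seq_len(nrow(foals))) {
  values <- unlist(foals[i, years])           # this individual's values for 2008-2020
  for (j in seq_along(values)) {
    if (values[j] == 1) {                     # first year with a 1 ...
      if (j < length(values)) {
        foals$survived[i] <- values[[j + 1]]  # ... store the value of the following year
      }
      break                                   # ignore the remaining columns
    }
  }
}

foals$survived
```

For these five rows, `survived` comes out as 0, 1, 1, 0, 1. An individual whose first 1 is in 2020 keeps NA, because there is no following year to check.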

I tried a for loop and if/else statements, as well as ifelse(), if ...

The closest I got to my goal was with the following code:

for (x in foal_fates_2) {
  if (foal_fates_2$`2008`=="1" && foal_fates_2$`2009` =="1") {
    print("1")
  } else if (foal_fates_2$`2008`== "1" && foal_fates_2$`2009` =="0") {
    print("0")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="1" && foal_fates_2$`2010` == "1"){
    print("1")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="1" && foal_fates_2$`2010`== "0") {
    print("0")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="1" && 
             foal_fates_2$`2011`=="1"){
    print("1")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="1" && 
             foal_fates_2$`2011`=="0"){
    print("0")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="1" && foal_fates_2$`2012`=="1"){
    print("1")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="1" && foal_fates_2$`2012`=="0"){
    print("0")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="1" && foal_fates_2$`2013`=="1"){
    print("1")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="1" && foal_fates_2$`2013`=="0"){
    print("0")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="1" &&
             foal_fates_2$`2014`== "1"){
    print("1")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="1" &&
             foal_fates_2$`2014`=="0"){
    print("0")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
             foal_fates_2$`2014`== "1" && foal_fates_2$`2015`=="1"){
    print("1")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
            foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
            foal_fates_2$`2014`== "1" && foal_fates_2$`2015`=="0"){
    print("0")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
             foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="1" && foal_fates_2$`2016` =="1"){
    print("1")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
             foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="1" && foal_fates_2$`2016` =="0"){
    print("0")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
             foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="1" &&
             foal_fates_2$`2017`=="1"){
    print("1")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
             foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="1" &&
             foal_fates_2$`2017`=="0"){
    print("0")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
             foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
             foal_fates_2$`2017`=="1" && foal_fates_2$`2018`=="1"){
    print("1")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
             foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
             foal_fates_2$`2017`=="1" && foal_fates_2$`2018`=="0"){
    print("0")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
             foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
             foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="1" && foal_fates_2$`2019`=="1"){
    print("1")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
             foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
             foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="1" && foal_fates_2$`2019`=="0"){
    print("0")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
             foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
             foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="0" && foal_fates_2$`2019`=="1" &&
             foal_fates_2$`2020`=="1"){
    print("1")
  } else if (foal_fates_2$`2008`== "0" && foal_fates_2$`2009` =="0" && foal_fates_2$`2010` =="0" && 
             foal_fates_2$`2011`=="0" && foal_fates_2$`2012`=="0" && foal_fates_2$`2013`=="0" &&
             foal_fates_2$`2014`== "0" && foal_fates_2$`2015`=="0" && foal_fates_2$`2016` =="0" &&
             foal_fates_2$`2017`=="0" && foal_fates_2$`2018`=="0" && foal_fates_2$`2019`=="1" &&
             foal_fates_2$`2020`=="0"){
    print("0")
  } 

}

With this code R at least does something, and the result has the correct number of entries, but the output is not correct: R gives me 0s and 1s, but not in the right places. For example, for the first five rows R gave me "0" "0" "0" "1" "0", but it should be "0" "1" "1" "1" "0" (at least if I understand it correctly). I am new to R, so maybe a for loop and if/else are not the right tools for what I want to do. So the question is: how can I reach my goal? I would really appreciate any help.
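One thing that likely contributes to the problem: `for (x in foal_fates_2)` iterates over the *columns* of a data frame, not its rows, and `x` is never used inside the loop body. Each condition such as ``foal_fates_2$`2008` == "1"`` compares a whole column at once; recent R versions (4.3.0 onward) stop with an error when `&&` gets operands longer than one, while older versions silently looked only at the first element. A small demonstration with a hypothetical data frame `df`:

```r
# A tiny hypothetical data frame: 3 rows, 2 columns
df <- data.frame(a = c(1, 2, 3), b = c(4, 5, 6))

n_iter <- 0
for (x in df) {
  n_iter <- n_iter + 1
  print(x)   # x is a whole column: first 1 2 3, then 4 5 6
}
n_iter       # 2: one iteration per column, not one per row
```

To visit rows instead, loop over `seq_len(nrow(df))` and index with `df[i, ]`.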

Answer:

I would write a function to be applied to each row, something like the following one (which could of course be more elaborate, but should do the job):

numberAfterFirstOne <- function(myRow){
  x <- which(myRow == 1)[1]             # index of the first 1 (NA if there is none)
  if (!is.na(x) && x < length(myRow))   # is there a column after that first 1?
    return(myRow[x + 1])
  else 
    return(NA)
}

Explanation:

  1. Find the indices that are equal to 1 and keep only the first one; if no value is 1, x will be NA.
  2. If there is a value after that first 1, return it.
  3. Otherwise return NA (this could also be 0 or whatever 'key value' you wish).
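The steps above rely on two base R behaviours that are easy to check on a small vector: `which()` returns the positions of all matches, so `[1]` keeps the first (and gives NA when there is none), and indexing a vector with NA or past its end yields NA. A quick check on a hypothetical vector `v`:

```r
v <- c(0, 0, 1, 0, 1)

which(v == 1)            # positions of all 1s: 3 5
which(v == 1)[1]         # only the first of them: 3
which(c(0, 0) == 1)[1]   # no 1 at all: which() is empty, so [1] gives NA
v[which(v == 1)[1] + 1]  # the value right after the first 1: 0
v[length(v) + 1]         # indexing past the end also gives NA
```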

For testing here is an example dataset:

n <- 5
m <- 16
set.seed(1562) # for reproducibility
dataset <- as.data.frame(matrix(ncol = m, nrow = n, data = round(runif(m * n, 0, 0.7))))
dataset <- rbind(dataset, rep(0, m)) # add a row without any 1

  V1 V2 V3 V4 V5 V6 V7 V8 V9 V10 V11 V12 V13 V14 V15 V16
1  1  0  0  1  0  0  0  1  0   1   0   1   0   0   1   0
2  1  1  0  0  0  1  1  0  0   1   1   0   0   0   1   0
3  0  0  0  0  0  1  0  0  0   0   0   1   0   0   0   1
4  1  0  0  0  0  0  0  0  0   1   0   0   1   0   1   0
5  0  1  1  0  0  1  0  1  0   1   0   1   0   0   1   0
6  0  0  0  0  0  0  0  0  0   0   0   0   0   0   0   0

Then apply the function numberAfterFirstOne to each row (apply is similar to a for loop, but neater to write and read).

apply(dataset, 1, numberAfterFirstOne)
[1]  0  1  0  0  1 NA
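For a larger data set, the same result can also be computed without calling a function once per row. This is a sketch, assuming the data can be converted to a numeric 0/1 matrix; `firstOneNext` and `toy` are hypothetical names, and `max.col()` with `ties.method = "first"` is used to find the column of the first 1 in every row.

```r
firstOneNext <- function(data) {
  mat <- as.matrix(data)
  is_one <- (mat == 1) * 1                          # 1 where the value is 1, else 0
  has_one <- rowSums(is_one) > 0                    # rows containing at least one 1
  first <- max.col(is_one, ties.method = "first")   # column of the first 1 (1 if the row has none)
  nxt <- first + 1                                  # column right after the first 1
  ok <- has_one & nxt <= ncol(mat)                  # rows where that next column exists
  out <- rep(NA_real_, nrow(mat))
  out[ok] <- mat[cbind(which(ok), nxt[ok])]         # matrix indexing: one (row, column) pair per row
  out
}

toy <- rbind(c(1, 0, 0),   # first 1 in column 1, next value 0
             c(0, 1, 1),   # first 1 in column 2, next value 1
             c(0, 0, 0),   # no 1 at all
             c(0, 0, 1))   # first 1 in the last column
firstOneNext(toy)          # 0 1 NA NA
```

On the example dataset above, `firstOneNext(dataset)` should return the same 0/1/NA values as `apply(dataset, 1, numberAfterFirstOne)`, since both give NA for rows without a 1 and for rows whose first 1 is in the last column.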

This is similar to the more clumsy construction with a for loop:

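result <- numeric(nrow(dataset))
for (i in seq_len(nrow(dataset))){
  # unlist() turns the one-row data frame into a plain named vector
  result[i] <- numberAfterFirstOne(unlist(dataset[i, ]))
}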

You could now tweak the function to return what you want. At the moment it can return 0, 1, or NA; maybe you only want 1 and 0, or 1 and NA. The check in the if statement is not strictly necessary: if x is NA or x + 1 is out of bounds, myRow[x + 1] returns NA anyway, which would make the function even simpler.
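A sketch of both tweaks, assuming numeric 0/1 rows as produced by apply(): the simplified version drops the if/else and relies on NA indexing, and `numberAfterFirstOneOr0` is a hypothetical variant that reports 0 instead of NA.

```r
# Simplified variant: no if/else, because indexing with NA (no 1 found)
# or past the end of the vector (1 in the last column) already yields NA
numberAfterFirstOne <- function(myRow) {
  unname(myRow[which(myRow == 1)[1] + 1])
}

# Hypothetical variant that returns 0 instead of NA when nothing follows a 1
numberAfterFirstOneOr0 <- function(myRow) {
  value <- numberAfterFirstOne(myRow)
  if (is.na(value)) 0 else value
}

numberAfterFirstOne(c(0, 1, 1))     # 1
numberAfterFirstOne(c(0, 0, 0))     # NA (no 1 at all)
numberAfterFirstOne(c(0, 0, 1))     # NA (nothing after the 1)
numberAfterFirstOneOr0(c(0, 0, 0))  # 0
```

Whether NA or 0 is the better choice depends on the analysis: NA keeps "no observation" distinguishable from "died".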

You could also modify the code, so that the year is also returned:

colnames(dataset) <- 2007:(2006 + m) # name the columns of the example dataset (one year per column)

numberAfterFirstOne <- function(myRow){
  x <- which(myRow == 1)[1]
  return(c(x, myRow[x + 1])) # return the column index and the value after it
}

result <- apply(dataset, 1, numberAfterFirstOne) # save the result
result[1, ] <- names(dataset)[result[1, ]] # replace the column index by the column name (the year)
result
     [,1]   [,2]   [,3]   [,4]   [,5]   [,6]
[1,] "2007" "2007" "2012" "2007" "2008" NA  
[2,] "0"    "1"    "0"    "0"    "1"    NA  