How to sum values from two adjacent columns in a data.frame in R but keep 0s as such?

I have a data.frame with absence/presence data (0/1) for a group of animals, with columns as years and rows as individuals.

My data:

df <- data.frame(Year1 = c('1','0','0','0','0','0'),
                 Year2 = c('1','1','1','0','0','0'),
                 Year3 = c('1','1','1','1','1','0'),
                 Year4 = c('0','1','1','1','1','1'),
                 Year5 = c('0','0','1','1','1','1'),
                 Year6 = c('0','0','0','0','0','0'))

df
     Year1 Year2 Year3 Year4 Year5 Year6
1:     1     1     1     0     0     0
2:     0     1     1     1     0     0
3:     0     1     1     1     1     0
4:     0     0     1     1     1     0
5:     0     0     1     1     1     0
6:     0     0     0     1     1     0

What I would like to do is calculate the age per individual per year, meaning I would like to add col1 to col2, then add that sum to col3, and so on, so that the above data frame becomes:

df
     Year1 Year2 Year3 Year4 Year5 Year6
1:     1     2     3     0     0     0
2:     0     1     2     3     0     0
3:     0     1     2     3     4     0
4:     0     0     1     2     3     0
5:     0     0     1     2     3     0
6:     0     0     0     1     2     0

Importantly, zeros should remain zeros: once there is a column with a 0 after a sequence of non-zero values, the value should be 0 again, as the animal has died and does not continue in the population.

I have browsed many Stack Overflow questions, e.g.:

sum adjacent columns for each column in a matrix in R

However, I could not find a solution that does the cut-off part after the individual has passed away (a 0 after 4 years of living means the animal has left the population and the age should no longer be recorded for that year).

Thank you in advance for your advice! :)

User response:

Here's a pretty simple way. We take a cumulative sum along each row and multiply the result by the original data frame: multiplying by 0 zeros out the 0 entries, and multiplying by 1 keeps the summed entries as-is. Since the quotes around your numbers make the columns character class, we start by converting all your columns to numeric:

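# Convert every column from character to numeric; df[] keeps the data.frame structure
df[] <- lapply(df, as.numeric)
# Row-wise cumulative sum (apply() returns it transposed, hence t()), masked by the original 0/1 values
result <- t(apply(df, 1, cumsum)) * df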
result
#   Year1 Year2 Year3 Year4 Year5 Year6
# 1     1     2     3     0     0     0
# 2     0     1     2     3     0     0
# 3     0     1     2     3     4     0
# 4     0     0     1     2     3     0
# 5     0     0     1     2     3     0
# 6     0     0     0     1     2     0
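
To see why the multiplication does the cut-off, it can help to look at the intermediate cumulative sums before they are masked. The following is a minimal sketch, assuming the question's data entered directly as numeric 0/1 values; the helper name `age_by_year` is hypothetical and not part of the answer above.

```r
# The question's data, entered as numeric 0/1 values (no conversion needed)
df <- data.frame(Year1 = c(1, 0, 0, 0, 0, 0),
                 Year2 = c(1, 1, 1, 0, 0, 0),
                 Year3 = c(1, 1, 1, 1, 1, 0),
                 Year4 = c(0, 1, 1, 1, 1, 1),
                 Year5 = c(0, 0, 1, 1, 1, 1),
                 Year6 = c(0, 0, 0, 0, 0, 0))

# Hypothetical helper (the name is an assumption made for this sketch)
age_by_year <- function(presence) {
  m <- as.matrix(presence)            # numeric matrix, keeps the column names
  running <- t(apply(m, 1, cumsum))   # apply() lays rows out as columns, so transpose back
  as.data.frame(running * m)          # 0 * x = 0 masks absent years, 1 * x = x keeps the age
}

# Intermediate step: the unmasked running total for the first individual
running <- t(apply(as.matrix(df), 1, cumsum))
running[1, ]   # the total stays at 3 for Year4 to Year6, after the animal is gone

# Masked result: the same row now drops back to 0 once the animal is absent
ages <- age_by_year(df)
ages[1, ]
```

Note that this relies on each individual being present in one unbroken run of years, as in the question's data. A 0 between two 1s would be masked to 0, but the count would carry on from the earlier total afterwards rather than restart.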