I have a data.frame with absence/presence data (0/1) for a group of animals, with columns as years and rows as individuals.
My data:
df <- data.frame(Year1 = c('1','0','0','0','0','0'),
                 Year2 = c('1','1','1','0','0','0'),
                 Year3 = c('1','1','1','1','1','0'),
                 Year4 = c('0','1','0','0','0','1'),
                 Year5 = c('0','0','1','1','0','1'),
                 Year6 = c('0','0','0','1','1','1'))
df
Year1 Year2 Year3 Year4 Year5 Year6
1: 1 1 1 0 0 0
2: 0 1 1 1 0 0
3: 0 1 1 0 1 0
4: 0 0 1 0 1 1
5: 0 0 1 0 0 1
6: 0 0 0 1 1 1
Some individuals have sighting gaps (seen one year (1), then not seen the next year (0), but spotted again in the third year (1)). In total there are 400 rows (=individuals).
What I would like to do is fill the gaps (0s between 1s) with 1s, so that the above data frame becomes:
df
Year1 Year2 Year3 Year4 Year5 Year6
1: 1 1 1 0 0 0
2: 0 1 1 1 0 0
3: 0 1 1 1 1 0
4: 0 0 1 1 1 1
5: 0 0 1 1 1 1
6: 0 0 0 1 1 1
Zeros before the first 1 and after the last 1 should not be affected.
I have browsed many stackoverflow questions, e.g.:
find and replace numeric sequence in r
Replace a sequence of values by group depending on preceeding values
However, I could not find a solution that works across all columns at once, on a row-by-row basis.
Thank you in advance for your advice! :)
CodePudding user response:
We may do this by row. An efficient option is dapply from collapse. Loop over the rows, find the positions of the 1s, build the sequence from the first to the last position, and replace those elements with 1.
library(collapse)
dapply(df, MARGIN = 1, FUN = function(x)
replace(x, do.call(`:`, as.list(range(which(x == 1)))), 1 ))
-output
Year1 Year2 Year3 Year4 Year5 Year6
1 1 1 1 0 0 0
2 0 1 1 1 0 0
3 0 1 1 1 1 0
4 0 0 1 1 1 1
5 0 0 1 1 1 1
6 0 0 0 1 1 1
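Another option is to get the row/column index with which and arr.ind = TRUE, then create the sequence of columns for each row, and use the resulting row/column index to do the assignment, which is vectorized.
ind <- which(df == 1, arr.ind = TRUE)
# Expand first:last column per row, then build a (row, column) index matrix.
# as.character() keeps the real row numbers even if some row has no 1 at all.
m1 <- as.matrix(transform(
  stack(lapply(split(ind[, 2], ind[, 1]), function(x) x[1]:x[length(x)]))[2:1],
  ind = as.integer(as.character(ind))))
df[m1] <- 1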
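The same idea can be spelled out in smaller steps. This is only a sketch: it assumes the 0/1 values are stored as numbers rather than as the strings used in the question, and cols_by_row and idx are names made up for illustration.

```r
# Numeric copy of the question's data (an assumption; the question uses strings).
df <- data.frame(Year1 = c(1, 0, 0, 0, 0, 0),
                 Year2 = c(1, 1, 1, 0, 0, 0),
                 Year3 = c(1, 1, 1, 1, 1, 0),
                 Year4 = c(0, 1, 0, 0, 0, 1),
                 Year5 = c(0, 0, 1, 1, 0, 1),
                 Year6 = c(0, 0, 0, 1, 1, 1))

# (row, col) position of every 1.
ind <- which(df == 1, arr.ind = TRUE)

# Columns holding a 1, grouped by row; the list names are the row numbers.
cols_by_row <- split(ind[, "col"], ind[, "row"])

# Pair each row number with every column from its first 1 to its last 1.
idx <- do.call(rbind, Map(function(r, cols) cbind(r, min(cols):max(cols)),
                          as.integer(names(cols_by_row)), cols_by_row))

# Matrix indexing: a single assignment covers all selected cells.
df[idx] <- 1
df
```

Rows without any 1 never show up in ind, so they are simply left as they are.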
CodePudding user response:
Use max.col to find the "first" and "last" 1 in each row, and then compare to the col()umn number:
df[col(df) >= max.col(df, "first") & col(df) <= max.col(df, "last")] <- 1
df
# Year1 Year2 Year3 Year4 Year5 Year6
#1 1 1 1 0 0 0
#2 0 1 1 1 0 0
#3 0 1 1 1 1 0
#4 0 0 1 1 1 1
#5 0 0 1 1 1 1
#6 0 0 0 1 1 1
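One case to watch with this approach: in a row that contains no 1 at all, every value ties for the maximum, so "first" returns column 1 and "last" returns the final column, and the whole row would be filled with 1s. A possible guard is sketched below on a small made-up matrix; it assumes numeric 0/1 values, and has_one is a name chosen for illustration.

```r
# Small made-up example; the second row has no sighting at all.
m <- rbind(c(0, 1, 0, 1, 0),
           c(0, 0, 0, 0, 0),
           c(1, 0, 0, 0, 1))

# TRUE for rows that contain at least one 1.
has_one <- rowSums(m == 1) > 0

first <- max.col(m, ties.method = "first")
last  <- max.col(m, ties.method = "last")

# Fill only between first and last, and only in rows that have a 1.
# The per-row vectors recycle down the columns, so they line up with rows.
fill <- has_one & col(m) >= first & col(m) <= last
m[fill] <- 1
m
```

Without the has_one term, the all-zero second row would come back as all 1s.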
CodePudding user response:
For a single vector (named col here), you may use the following:
col[min(which(col == 1)):max(which(col == 1))] <- 1
This expression may be wrapped in a function and applied to each row with apply.
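A minimal sketch of that wrapping, assuming the 0/1 values are numeric rather than the strings used in the question. fill_gaps is a hypothetical helper name, and the early return for rows without any 1 is an addition that the one-liner above does not have.

```r
# Numeric copy of the question's data (an assumption; the question uses strings).
df <- data.frame(Year1 = c(1, 0, 0, 0, 0, 0),
                 Year2 = c(1, 1, 1, 0, 0, 0),
                 Year3 = c(1, 1, 1, 1, 1, 0),
                 Year4 = c(0, 1, 0, 0, 0, 1),
                 Year5 = c(0, 0, 1, 1, 0, 1),
                 Year6 = c(0, 0, 0, 1, 1, 1))

# Hypothetical helper: fill everything between the first and the last 1.
fill_gaps <- function(x) {
  pos <- which(x == 1)
  if (length(pos) == 0) return(x)  # never sighted: leave the row unchanged
  x[min(pos):max(pos)] <- 1
  x
}

# apply() over rows returns one column per row, so transpose back.
filled <- as.data.frame(t(apply(df, 1, fill_gaps)))
filled
```

apply converts the data frame to a matrix first, which is fine here because all columns share one type.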