I would like to generate a new database from df1 that is checked row by row and excludes the columns that hold zeros in a row when no non-zero number comes after them.
Example: in row 1, DR08, DR09 and DR010 are zero, but DR011 equals 4, so I need to keep these DR columns because a later number (the 4) follows them. However, I exclude DR012, DR013 and DR014, because no value comes after them. In row 2 the data only goes up to DR07, because the rest is 0. I hope the idea is clear.
df1 <- structure(
list(date= c("2021-06-28","2021-06-28","2021-06-28","2021-06-28","2021-06-28",
"2021-06-28","2021-06-28","2021-06-28"),
DR01 = c(4,1,4,3,3,4,3,6), DR02= c(4,2,6,7,3,2,7,4),DR03= c(9,5,4,3,3,2,1,5),
DR04 = c(5,4,3,3,6,2,1,9),DR05 = c(5,4,5,3,6,2,1,9),
DR06 = c(2,4,3,3,5,6,7,8),DR07 = c(2,5,4,4,9,4,7,8),
DR08 = c(0,0,0,1,2,0,0,0),DR09 = c(0,0,0,0,0,0,0,0),DR010 = c(0,0,0,0,0,0,0,0),DR011 = c(4,0,0,0,0,0,0,0),
DR012 = c(0,0,0,3,0,0,0,5),DR013 = c(0,0,1,0,0,0,2,0),DR014 = c(0,0,0,1,0,2,0,0)),
class = "data.frame", row.names = c(NA, -8L))
        date DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR09 DR010 DR011 DR012 DR013 DR014
1 2021-06-28    4    4    9    5    5    2    2    0    0     0     4     0     0     0
2 2021-06-28    1    2    5    4    4    4    5    0    0     0     0     0     0     0
3 2021-06-28    4    6    4    3    5    3    4    0    0     0     0     0     1     0
4 2021-06-28    3    7    3    3    3    3    4    1    0     0     0     3     0     1
5 2021-06-28    3    3    3    6    6    5    9    2    0     0     0     0     0     0
6 2021-06-28    4    2    2    2    2    6    4    0    0     0     0     0     0     2
7 2021-06-28    3    7    1    1    1    7    7    0    0     0     0     0     2     0
8 2021-06-28    6    4    5    9    9    8    8    0    0     0     0     5     0     0
CodePudding user response:
I am not entirely sure what the OP wants.
If the goal is to replace the row-wise trailing zeroes in the selected columns with NAs, we can pivot_longer, group the data by original row, and use data.table::rleid to replace the values in the run where value==0 and rleid==last(rleid). Then pivot_wider restores the original layout.
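library(dplyr)
library(tidyr)
library(data.table)
df1 %>% mutate(index=row_number()) %>%   # remember the original row
  pivot_longer(starts_with('DR')) %>%    # one line per (row, DR column) pair
  mutate(rleid=rleid(value==0)) %>%      # id for each run of zero / non-zero values
  group_by(index) %>%
  # blank the zeros that belong to the last run of each row
  mutate(value=replace(value, value==0 & rleid==last(rleid), NA)) %>%
  # index is the grouping column, so select() keeps it in the result
  select(-index, -rleid) %>%
  pivot_wider(names_from = name, values_from = value)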
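To see why the value==0 & rleid==last(rleid) test only touches the trailing zeros, here is a minimal sketch on a single made-up row. The vector x and the name run_id are hypothetical, and the run ids are built with base R's rle() as a stand-in that should give the same ids as data.table::rleid(x == 0).

```r
# Hypothetical single row: zeros in the middle and zeros at the end
x <- c(4, 0, 0, 4, 0, 0)

# Number each run of consecutive equal values of (x == 0)
r <- rle(x == 0)
run_id <- rep(seq_along(r$lengths), r$lengths)

# Only zeros that sit in the last run are trailing zeros
x[x == 0 & run_id == max(run_id)] <- NA
x
```

The zeros in positions 2 and 3 belong to an earlier run, so they are kept; only the final run of zeros becomes NA.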
We can also use purrr::pmap and purrr::accumulate to mutate the columns selected with num_range, collecting each row's values with c(...):
library(dplyr)
library(purrr)
library(tidyr)
df1 %>%
  mutate(pmap(across(num_range('DR0', 14:1)), ~{vec<-c(...)
    accumulate(vec, \(x,y) ifelse(is.na(x) && y==0, NA, y), .init = NA)}),
    .keep = 'unused')%>%
  unnest_wider(`pmap(...)`)%>%
  select(date, num_range('DR0', 1:14))
Output:
# A tibble: 8 x 16
# Groups:   index [8]
  index date        DR01  DR02  DR03  DR04  DR05  DR06  DR07  DR08  DR09 DR010 DR011 DR012 DR013 DR014
  <int> <chr>      <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1     1 2021-06-28     4     4     9     5     5     2     2     0     0     0     4    NA    NA    NA
2     2 2021-06-28     1     2     5     4     4     4     5    NA    NA    NA    NA    NA    NA    NA
3     3 2021-06-28     4     6     4     3     5     3     4     0     0     0     0     0     1    NA
4     4 2021-06-28     3     7     3     3     3     3     4     1     0     0     0     3     0     1
5     5 2021-06-28     3     3     3     6     6     5     9     2    NA    NA    NA    NA    NA    NA
6     6 2021-06-28     4     2     2     2     2     6     4     0     0     0     0     0     0     2
7     7 2021-06-28     3     7     1     1     1     7     7     0     0     0     0     0     2    NA
8     8 2021-06-28     6     4     5     9     9     8     8     0     0     0     0     5    NA    NA
CodePudding user response:
If by excluding you mean turning them into NAs, you can use this base R method with apply.
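df1[] <- t(apply(df1, 1, function(x) {
  # position just past the last non-zero entry of the row
  inds <- max(which(x != 0)) + 1
  # only overwrite when there is at least one trailing zero
  if(length(inds) && inds <= length(x))
    x[inds:length(x)] <- NA
  x
}))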
df1
#        date DR01 DR02 DR03 DR04 DR05 DR06 DR07 DR08 DR09 DR010 DR011 DR012 DR013 DR014
#1 2021-06-28    4    4    9    5    5    2    2    0    0     0     4  <NA>  <NA>  <NA>
#2 2021-06-28    1    2    5    4    4    4    5 <NA> <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#3 2021-06-28    4    6    4    3    5    3    4    0    0     0     0     0     1  <NA>
#4 2021-06-28    3    7    3    3    3    3    4    1    0     0     0     3     0     1
#5 2021-06-28    3    3    3    6    6    5    9    2 <NA>  <NA>  <NA>  <NA>  <NA>  <NA>
#6 2021-06-28    4    2    2    2    2    6    4    0    0     0     0     0     0     2
#7 2021-06-28    3    7    1    1    1    7    7    0    0     0     0     0     2  <NA>
#8 2021-06-28    6    4    5    9    9    8    8    0    0     0     0     5  <NA>  <NA>
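The <NA> entries above suggest that apply() turned every column into character, since it works on a matrix that also contains the date strings. A possible variant, sketched here under the assumption that only the DR columns should be touched, restricts the work to those columns so they stay numeric. The names dr_cols, last_nz and res are hypothetical, not part of the answer above.

```r
# Example data from the question
df1 <- data.frame(
  date = rep("2021-06-28", 8),
  DR01 = c(4,1,4,3,3,4,3,6), DR02 = c(4,2,6,7,3,2,7,4), DR03 = c(9,5,4,3,3,2,1,5),
  DR04 = c(5,4,3,3,6,2,1,9), DR05 = c(5,4,5,3,6,2,1,9),
  DR06 = c(2,4,3,3,5,6,7,8), DR07 = c(2,5,4,4,9,4,7,8),
  DR08 = c(0,0,0,1,2,0,0,0), DR09 = c(0,0,0,0,0,0,0,0), DR010 = c(0,0,0,0,0,0,0,0),
  DR011 = c(4,0,0,0,0,0,0,0), DR012 = c(0,0,0,3,0,0,0,5),
  DR013 = c(0,0,1,0,0,0,2,0), DR014 = c(0,0,0,1,0,2,0,0)
)

# Work on the DR columns only, as a numeric matrix
dr_cols <- grep("^DR", names(df1), value = TRUE)
m <- as.matrix(df1[dr_cols])

# Column position of the last non-zero value in each row (0 if the row is all zeros)
last_nz <- apply(m != 0, 1, function(z) if (any(z)) max(which(z)) else 0L)

# Cells to the right of that position are trailing zeros; last_nz is recycled by row
m[col(m) > last_nz] <- NA

res <- df1
res[dr_cols] <- as.data.frame(m)
res
```

With this sketch the date column is left untouched and the DR columns keep their numeric type, at the cost of assuming the columns to process can be picked out by the "DR" prefix.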