I have several sheets that I import from excel. While these sheets are similar there are some differences due to manual entry. I am trying to filter out the rows that has "Total" and anything beyond that row. The logic I have works for df1 and df3 but I am not sure how to get it to work for df2. Could someone please help?
df1<-structure(list(...1 = structure(c(1630022400, 1630108800, 1630195200,
1630281600, 1630368000, NA), tzone = "UTC", class = c("POSIXct",
"POSIXt")), `Vinayak Trading` = c(1984.31, NA, NA, NA, NA, 2916.17
)), row.names = c(NA, -6L), class = c("tbl_df", "tbl", "data.frame"
))
df2<-structure(list(...1 = c("44526", "44527", "44528", "44529", "44530",
"Total"), `Vinayak Trading` = c(NA, NA, NA, NA, NA, 0)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
df3<-structure(list(...1 = c("44680", "44681", NA, "Total", NA, NA
), `Vinayak Trading` = c(NA, NA, NA, 2736.42, NA, NA)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
transform <- function(df) {
names(df)[1] <- "Date"
df <- df %>%
filter(row_number() < which(is.na(Date))) %>% #To tackle sheets where Total is not present
filter(row_number() < which(Date=="Total")) %>% #To remove Total in sheets where it is present
select(-Total) #To remove the Total column
}
df1 <- transform(df1)# Desired reults
df2 <- transform(df2)# Error due to no NAs - don't know how to handle
df3 <- transform(df3)#Desired result with warning
CodePudding user response:
We can use dplyr::cumany
to remove "Total"
(or NA) and anything beyond.
transform2 <- function(df) {
df %>%
rename(Date = 1) %>%
filter(!cumany(Date %in% c("Total", NA))) %>%
select(-any_of("Total"))
}
transform2(df1)
# # A tibble: 5 × 2
# Date `Vinayak Trading`
# <dttm> <dbl>
# 1 2021-08-27 00:00:00 1984.
# 2 2021-08-28 00:00:00 NA
# 3 2021-08-29 00:00:00 NA
# 4 2021-08-30 00:00:00 NA
# 5 2021-08-31 00:00:00 NA
transform2(df2)
# # A tibble: 5 × 2
# Date `Vinayak Trading`
# <chr> <dbl>
# 1 44526 NA
# 2 44527 NA
# 3 44528 NA
# 4 44529 NA
# 5 44530 NA
transform2(df3)
# # A tibble: 2 × 2
# Date `Vinayak Trading`
# <chr> <dbl>
# 1 44680 NA
# 2 44681 NA
- We can use
rename(Date = 1)
as an inline replacement fornames(df)[1] <- "Date"
, it seems a bit more pipe-esque; == NA
doesn't return true/false, but%in% NA
does; we can useis.na(Date) | Date == "Total"
, or we can useDate %in% c("Total", NA)
with the results one would expect;cumany
is a "cumulative any", meaning that when a value returns true, then all subsequent values will be true as well, seecumany(c(F,T,F))
; for its opposite, seecumall(c(T,F,T))
; for a base-R equivalent, usecumsum(cond) > 0
for cumany andcumsum(!cond) == 0
for cumall; and- I use
select(-any_of("Total"))
since it will remove the column if it exists and do nothing otherwise (none of your sample data included it, so I thought it better to be safe).