Home > Software design >  R how to squish a data frame so there aren't holes throughout
R how to squish a data frame so there aren't holes throughout

Time:12-22

Here is my data frame.

structure(list(INVOICE_DATE = structure(c(19205, 19205, 19205, 
19206, 19206, 19206, 19207, 19207, 19207), class = "Date"), CATEGORY = c("Accessory", 
"Concentrate", "Edible", "Accessory", "Concentrate", "Edible", 
"Accessory", "Concentrate", "Edible"), Crumble = c(NA, 47, NA, 
NA, 65, NA, NA, 85, NA), Tincture = c(NA, NA, 567, NA, NA, 1028, 
NA, NA, 830), Other = c(NA_real_, NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), Battery = c(1079, 
NA, NA, 1027, NA, NA, 1148, NA, NA)), row.names = c(NA, -9L), class = c("tbl_df", 
"tbl", "data.frame"))

enter image description here

I can't find the right verb or set of verbs to do this in R.

How can I remove 'NA' values here in a way the squishes the data frame into a nicely formatted table? The CATEGORY column could be deleted. Then all the rest of the columns should fit on one row without any holes.

I can't do df %>% na.omit() because I end up with a data frame with no observations. The same goes if I try this kind of treatment: filter(is.na()).

CodePudding user response:

library(tidyr)
library(dplyr)

select(df, -CATEGORY) %>%
  pivot_longer(-INVOICE_DATE) %>%
  filter(!is.na(value)) %>%
  pivot_wider()

# A tibble: 3 × 4
  INVOICE_DATE Battery Crumble Tincture
  <date>         <dbl>   <dbl>    <dbl>
1 2022-08-01      1079      47      567
2 2022-08-02      1027      65     1028
3 2022-08-03      1148      85      830

CodePudding user response:

Here’s a solution using a grouped summarize().

library(dplyr)

dat %>% 
  group_by(INVOICE_DATE) %>%
  summarize(across(
    Crumble:Battery, 
    ~ ifelse(sum(!is.na(.x)) > 0, .x[!is.na(.x)], NA)
  ))
# A tibble: 3 × 5
  INVOICE_DATE Crumble Tincture Other Battery
  <date>         <dbl>    <dbl> <lgl>   <dbl>
1 2022-08-01        47      567 NA       1079
2 2022-08-02        65     1028 NA       1027
3 2022-08-03        85      830 NA       1148

CodePudding user response:

Using colSums by INVOICE_DATE.

by(df[-2], df$INVOICE_DATE, \(x) 
   data.frame(INVOICE_DATE=x[1, 1], t(colSums(x[2:5], na.rm=TRUE)))) |>
  do.call(what=rbind)
#            INVOICE_DATE Crumble Tincture Other Battery
# 2022-08-01   2022-08-01      47      567     0    1079
# 2022-08-02   2022-08-02      65     1028     0    1027
# 2022-08-03   2022-08-03      85      830     0    1148
  •  Tags:  
  • r
  • Related