Home > Software design >  bind_rows repeating outputs
bind_rows repeating outputs

Time:12-06

I'm trying to extract data from xls-files, which have always the same structure (see below). After extracting the ranges of interest with:

all_data            <- (file.path(wdir, 'data_input.xlsx'))
excel_sheets(path = all_results)
tab_names           <- excel_sheets(path = all_results)

extracted_data      <- lapply(tab_names, function(x) read_excel(all_data, 
                                                                sheet = x, col_names=FALSE, range = "B2:B4"))

I would like to bring everything into a wide format. Yet, when I use:

extracted_data_wide <- extracted_data %>% bind_rows(extracted_data, .id=NULL)

it seems to repeat all inputs once, which is not plausible for me. Is there anyone who can help me understand where the problem is or who could probably offer a different solution for my question? I had also thought on pivot_wider but it doesn't work with the output extracted_data the way I thought.

Example data: exemplary data arrangement in xls-file called data_input.xlsx

CodePudding user response:

Still not really sure about your desired result, but I give it a try

  1. As you want two columns foo1 and foo2 I would guess that you want to extract only the first two rows of your data.
  2. Instead of just getting the values in column B of your excel sheets extract the names in column A too, using range="A2:B3".
  3. Afterwards use bind_rows. One issue with your code was that you passed the list of dataframe two times to bind_rows.
  4. When using bind_rows add an .id for the sheets
  5. Besides renaming you could then use pivot_wider to convert your data to wide format with rows corresponding to sheets and columns corresponding to the rows of your excel sheets
# Make example data
all_results <- tempfile(fileext = ".xlsx")

d1 <- data.frame(
  name = paste0("foo", 1:10),
  value = paste0("bar", 1:10)
)

d2 <- data.frame(
  name = paste0("foo", 1:10),
  value = paste0("bar", 11:20)
)

writexl::write_xlsx(list(d1, d2), path = all_results)

library(readxl)
library(dplyr)
library(tidyr)

tab_names <- excel_sheets(path = all_results)
extracted_data <- lapply(tab_names, function(x) read_excel(all_results, sheet = x, col_names=FALSE, range = "A2:B3"))
#> New names:
#> * `` -> ...1
#> * `` -> ...2
#> New names:
#> * `` -> ...1
#> * `` -> ...2

extracted_data_wide <- extracted_data %>% bind_rows(.id = "sheet")

extracted_data_wide %>% 
  rename(foo = 2, value = 3) %>% 
  pivot_wider(names_from = foo, values_from = value)
#> # A tibble: 2 × 3
#>   sheet foo1  foo2 
#>   <chr> <chr> <chr>
#> 1 1     bar1  bar2 
#> 2 2     bar11 bar12
  • Related