Clean Column Names in R

I have the following R code that combines 177 CSV files. However, in many of the files some column names use spaces as separators and others use underscores, e.g. 'Article Number' and 'Article_Number'. I have tried janitor::make_clean_names, make.names, etc. within the code, but I cannot figure out the correct way to do it.

Any help is much appreciated.

library(tidyverse)  # loads readr (read_csv, cols), purrr (map_dfr, set_names) and dplyr

df <- list_of_files %>%
  set_names() %>% 
  map_dfr(
    ~read_csv(.x, col_types = cols(.default = "c", 'TY Stock Value' = "c"), col_names = TRUE),
    .id = "file_name"  
  )
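To see the problem in isolation, here is a minimal sketch with two small, hypothetical files (shop_a.csv and shop_b.csv, written to a temporary directory) standing in for the real 177. It assumes readr, purrr and dplyr are installed. Because map_dfr() binds rows by column name, the two spellings end up as two separate, half-empty columns.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical sample data: the same column spelled two different ways
dir <- tempfile("csvs_")
dir.create(dir)
write_csv(tibble(`Article Number` = "A1", `TY Stock Value` = "10"),
          file.path(dir, "shop_a.csv"))
write_csv(tibble(Article_Number = "A2", `TY Stock Value` = "20"),
          file.path(dir, "shop_b.csv"))

list_of_files <- list.files(dir, full.names = TRUE)

# Same pattern as in the question, without any name cleaning
df_raw <- list_of_files %>%
  set_names() %>%
  map_dfr(~ read_csv(.x, col_types = cols(.default = "c")), .id = "file_name")

# "Article Number" and "Article_Number" are kept as two distinct columns,
# each filled with NA for the rows coming from the other file
names(df_raw)
```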

Answer:

You can add it inside the map_dfr() call so that each file's column names are harmonized before the data frames are bound together.

df <- list_of_files %>%
  set_names() %>%
  map_dfr(
    ~ .x %>%
      read_csv(.,
        col_types = cols(.default = "c", "TY Stock Value" = "c"),
        col_names = TRUE
      ) %>%
      janitor::clean_names(),
    .id = "file_name"
  )
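A self-contained way to check the answer is to run it on two small, hypothetical files (written to a temporary directory) whose headers differ only by space versus underscore. This sketch assumes readr, purrr, dplyr and janitor are installed; it drops the 'TY Stock Value' override from cols() because .default = "c" already reads every column as character.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical sample data with inconsistent separators in the header
dir <- tempfile("csvs_")
dir.create(dir)
write_csv(tibble(`Article Number` = "A1", `TY Stock Value` = "10"),
          file.path(dir, "shop_a.csv"))
write_csv(tibble(Article_Number = "A2", `TY Stock Value` = "20"),
          file.path(dir, "shop_b.csv"))
list_of_files <- list.files(dir, full.names = TRUE)

# Clean each file's names right after reading it, then bind the rows
df <- list_of_files %>%
  set_names() %>%
  map_dfr(
    ~ read_csv(.x, col_types = cols(.default = "c")) %>%
      janitor::clean_names(),
    .id = "file_name"
  )

# Both spellings now land in a single snake_case article_number column
names(df)
```

Since clean_names() converts every header to snake_case, any code downstream must use the cleaned names (for example ty_stock_value instead of `TY Stock Value`).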

EDIT: Step-by-step

There are several ways to tell map_dfr() which function to apply. The ~ creates a formula, which purrr turns into an anonymous function, i.e. a shortcut for writing function(.x) { ... }. The argument of that function is .x, which in your case is one CSV file name. The file name is piped into read_csv(), where the placeholder . tells the pipe where to put it. read_csv() reads the data into R, and the result is piped into clean_names() to harmonize the names (note that clean_names() converts every column name to snake_case, so 'TY Stock Value' also becomes ty_stock_value). Finally, the .id argument of map_dfr() adds a file_name column; because set_names() names each element of the vector after itself, that column holds the path of the file each row came from. That's all the purrr magic :)
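For comparison, here is a sketch of the same step written as an ordinary named function instead of the ~ shortcut, plus a variant that uses janitor::make_clean_names (the function the question mentions) via read_csv()'s name_repair argument, so names are cleaned while the file is read. The helper names read_clean and read_clean_at_read are hypothetical, the sample files are made up, and the name_repair variant assumes readr 2.0 or later, where name_repair accepts a function.

```r
library(readr)
library(purrr)
library(dplyr)

# Hypothetical sample data, as in the earlier examples
dir <- tempfile("csvs_")
dir.create(dir)
write_csv(tibble(`Article Number` = "A1", `TY Stock Value` = "10"),
          file.path(dir, "shop_a.csv"))
write_csv(tibble(Article_Number = "A2", `TY Stock Value` = "20"),
          file.path(dir, "shop_b.csv"))
list_of_files <- list.files(dir, full.names = TRUE)

# Explicit equivalent of the `~ .x %>% read_csv(.) %>% clean_names()` formula
read_clean <- function(path) {
  read_csv(path, col_types = cols(.default = "c")) %>%
    janitor::clean_names()
}

# Alternative: make_clean_names works on a character vector of names,
# so it can be handed to readr as the name-repair function
read_clean_at_read <- function(path) {
  read_csv(path, col_types = cols(.default = "c"),
           name_repair = janitor::make_clean_names)
}

df1 <- list_of_files %>% set_names() %>% map_dfr(read_clean, .id = "file_name")
df2 <- list_of_files %>% set_names() %>% map_dfr(read_clean_at_read, .id = "file_name")
```

make_clean_names() cannot simply be called on the combined data frame's names afterwards: once "Article Number" and "Article_Number" are separate columns, cleaning them would produce a duplicate, which janitor de-duplicates (e.g. article_number and article_number_2) instead of merging. The cleaning has to happen per file, before binding.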