R purrr how to rename column of nested df

Time:12-21

I have a list of data frames, each with two columns named "place" and "data". "place" is a character and "data" is a nested data frame with one numeric column named "value".

For each data frame in the list, I'd like to rename the "value" column of the nested data frame to the value of the "place" column.

library(tidyverse)
some_dt = tibble(place = c("a","a", "b","b","c","c"), 
                 value = c(1,2,1,4,5,6))

# here is a list of data frames...
ls_df <- 
  some_dt %>% 
  group_by(place) %>% 
  nest() %>% 
  split(.$place)

I tried:

map2(ls_df$data, 
     ls_df$place,
     ~rename(.x, .y = "value"))

or:

map2(ls_df$data, 
     ls_df$place,
     ~rename_with(.x, ~ .y, "value"))

but I'm getting an empty list as a result.

How can I rename the "value" column with the content of the outer data frame column?

CodePudding user response:

We may loop over the list ('ls_df') with map, extract the 'place' value from each element, and then rename the 'value' column of the nested 'data' tibble to that 'place' value.

library(dplyr)
library(purrr)
ls_df2 <- map(ls_df, ~ {
  # name to use for the nested column
  nm <- .x$place
  .x$data[[1]] <- .x$data[[1]] %>%
    rename_with(~ nm, "value")
  .x
})

Checking:

> map(ls_df2, ~ .x$data)
$a
$a[[1]]
# A tibble: 2 × 1
      a
  <dbl>
1     1
2     2


$b
$b[[1]]
# A tibble: 2 × 1
      b
  <dbl>
1     1
2     4


$c
$c[[1]]
# A tibble: 2 × 1
      c
  <dbl>
1     5
2     6

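Note that split() returns a list of tibbles rather than a single tibble. Therefore, the 'data' and 'place' columns cannot be accessed directly from 'ls_df', which is why the map2 calls in the question return an empty list, i.e.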

> ls_df$data
NULL
> ls_df$place
NULL

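Another option is to use nest_by(), which returns a rowwise tibble, so 'data' and 'place' refer to the current row inside mutate: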

some_dt %>%
  nest_by(place) %>%
  mutate(data = data %>% 
          rename_with(~ place, value) %>%
          list(.)) %>% 
  ungroup
# A tibble: 3 × 2
  place data            
  <chr> <list>          
1 a     <tibble [2 × 1]>
2 b     <tibble [2 × 1]>
3 c     <tibble [2 × 1]>
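
To make the NULL results noted above concrete, here is a minimal sketch of why the map2 calls from the question produce an empty list, and how a similar map2 call can work on the unsplit nested tibble. The names `nested` and `renamed` are made up for this illustration, and `nest(data = value)` is used in place of group_by() plus nest() only to keep the sketch short.

```r
library(dplyr)
library(purrr)
library(tidyr)

some_dt <- tibble(place = c("a", "a", "b", "b", "c", "c"),
                  value = c(1, 2, 1, 4, 5, 6))

# One row per place, with the "value" column nested into "data"
nested <- some_dt %>% nest(data = value)

# After split(), the result is a plain list named a, b, c,
# so `$data` and `$place` are both NULL
ls_df <- split(nested, nested$place)
is.null(ls_df$data)

# map2() over two NULL inputs has nothing to iterate over
length(map2(ls_df$data, ls_df$place, ~ .x))

# On the unsplit tibble the columns exist. `!!.y :=` injects the value
# of .y as the new name; a bare `.y = "value"` would instead create a
# column literally named ".y"
renamed <- map2(nested$data, nested$place, ~ rename(.x, !!.y := value))
map_chr(renamed, names)
```

Note that `!!` is used here in a map2 call made outside of mutate; inside mutate the injection would be evaluated too early, before `.y` exists.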

CodePudding user response:

You could also try something like this:

library(tidyverse)

map(ls_df,
    ~ map2(.x$place, 
           .x$data,
           ~rename(.y, 
                   !!sym(.x) := value)
           )
    )
#> $a
#> $a[[1]]
#> # A tibble: 2 x 1
#>       a
#>   <dbl>
#> 1     1
#> 2     2
#> 
#> 
#> $b
#> $b[[1]]
#> # A tibble: 2 x 1
#>       b
#>   <dbl>
#> 1     1
#> 2     4
#> 
#> 
#> $c
#> $c[[1]]
#> # A tibble: 2 x 1
#>       c
#>   <dbl>
#> 1     5
#> 2     6

CodePudding user response:

You can create a function which renames using base colnames() then map that over all the list elements as follows:

# The fn:
rnm <- function(x) {
  colnames(x$data[[1]]) <- x$place
  x
}

# Result:
res <- ls_df |> purrr::map(.f = rnm)

# Check if it's the desired output:
res$a$data

# [[1]]
#  A tibble: 2 × 1
#      a
#   <dbl>
# 1     1
# 2     2

CodePudding user response:

You can also iterate over each list element and then use mutate to rename the nested data frame using the place.

ls_df %>% 
  modify(~ mutate(.x,
                  data = map(data,
                             ~ set_names(.x, first(place)))))

In this case, you can actually simplify this further.

ls_df %>% 
  modify(~ mutate(.x,
                  data = map2(data, place, set_names)))

# which can collapse down to as simple as this
ls_df %>% 
  modify(mutate, data = map2(data, place, set_names))

With that approach, it is worth considering whether you need the list at all. The nested tibble may be easier to work with directly.

ls_df %>% 
  bind_rows() %>% 
  mutate(data = map2(data, place, set_names))
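
One caveat worth keeping in mind: set_names() (like colnames<-) replaces all column names at once, so it suits this example only because each nested tibble has a single column. If the nested tibbles had more columns, renaming just "value" would need rename() instead. The following is a rough sketch under that assumption; `some_dt2`, the extra column `n`, and the helper `rename_value()` are hypothetical and not part of the question.

```r
library(dplyr)
library(purrr)
library(tidyr)

# Hypothetical variant of the example data with an extra column `n`
some_dt2 <- tibble(place = c("a", "a", "b", "b"),
                   value = c(1, 2, 1, 4),
                   n     = c(10L, 20L, 30L, 40L))

# Hypothetical helper: rename only "value", leaving other columns alone.
# all_of() with a named character vector renames old = value to new = name
rename_value <- function(df, new_name) {
  rename(df, all_of(setNames("value", new_name)))
}

res <- some_dt2 %>%
  nest(data = c(value, n)) %>%
  mutate(data = map2(data, place, rename_value))

# Inspect the column names of each nested tibble
map(res$data, names)
```

A named helper is used here rather than a `~` lambda with `!!`, since `!!` inside mutate would be evaluated before the lambda's arguments exist.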