Rename variable names with tidyverse

Time:03-18

I've been struggling with a problem for a while so I was hoping to find some help here ;)

In R, I want to build a table from two columns of a data set, recode the values of one of those columns, and count the number of rows in each resulting group.

I want to extract these two columns:

  • REF_YEAR : a column with the years : '2000' '2001' '2002' ...
  • NAVLC9_COD : a column for my ship sizes : '[0-6[ m' '[10-12[ m' '[12-15[ m' ...

I want to modify NAVLC9_COD by collapsing all the categories from 0 to 12 meters into a single value, and all the categories over 12 meters into another single value.

STEP 1. To keep my script reproducible, I stored the category labels in two vectors:

key_segment_size <- c("[0-6[ m","[6-10[ m","[10-12[ m")
other_size_segment <- c("[12-15[ m", "[15-18[ m", "[18-24[ m", "[24-40[ m", "[40-80[ m",">= 80 m")

STEP 2. Then I create my data table:

data_all_size <- data %>%
    dplyr::select(REF_YEAR,NAVLC9_COD) %>%
    str_replace(key_segment_size, "Inf. 12 m") %>% 
    str_replace(other_segment_size, "Sup. 12 m") %>% 
    droplevels() %>%
    group_by(REF_YEAR,NAVLC9_COD) %>% 
    summarize(
      number = n() 
    )

Error in type(pattern) : object 'other_segment_size' not found

If I pass the values directly as vectors instead, I get:

data_all_size <- data %>%
    dplyr::select(REF_YEAR,NAVLC9_COD) %>%
    str_replace(c("[0-6[ m","[6-10[ m","[10-12[ m"),"Inf. 12 m") %>% 
    str_replace(c("[12-15[ m", "[15-18[ m", "[18-24[ m", "[24-40[ m", "[40-80[ m",">= 80 m"),"Sup. 12 m") %>% 
    droplevels() %>%
    group_by(REF_YEAR,NAVLC9_COD) %>% 
    summarise(
      effectif = n() 
    )

Error in stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
  Missing closing bracket on a bracket expression. (U_REGEX_MISSING_CLOSE_BRACKET, context=[0-6[ m)
In addition: Warning messages:
1: In stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
  argument is not an atomic vector; coercing
2: In stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
  longer object length is not a multiple of shorter object length
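
A note on this error: str_replace() interprets its pattern as a regular expression, and "[0-6[ m" opens a bracket expression that is never closed. It also expects a character vector as its first argument, whereas the pipe hands it the whole data frame (hence the "not an atomic vector" warning). A minimal sketch, assuming stringr is installed and using a made-up vector called sizes, that matches one label literally with fixed():

```r
library(stringr)

# Made-up sample values, for illustration only
sizes <- c("[0-6[ m", "[6-10[ m", "[12-15[ m")

# fixed() makes stringr compare the pattern literally instead of as a regex
out <- str_replace(sizes, fixed("[0-6[ m"), "Inf. 12 m")
print(out)
```

This only handles one label per call, so for whole groups of labels a membership test with %in% is usually simpler.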

I also tried with mutate and recode:

data_all_size <- data %>%
    dplyr::select(REF_YEAR,NAVLC9_COD) %>%
    mutate(Taille = recode("Inf. 12 m" = key_segment_size)) %>%
    mutate(Taille = recode("Inf. 12 m" = other_segment_size)) %>% 
    droplevels() %>%
    group_by(REF_YEAR,NAVLC9_COD) %>% 
    summarise(
      effectif = n() 
    )

Error in mutate():
! Problem while computing Taille = recode(Inf. 12 m = key_segment_size).
Caused by error in recode.character():
! argument ".x" is missing, with no default
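
The message says that recode() never received the vector to recode: its first argument (.x) must be the column itself, followed by old = new pairs. A hedged sketch on a made-up vector called sizes, assuming dplyr is installed; the lookup vector is an illustrative helper, spliced in with !!!:

```r
library(dplyr)

key_segment_size <- c("[0-6[ m", "[6-10[ m", "[10-12[ m")

# Made-up sample values, for illustration only
sizes <- c("[0-6[ m", "[10-12[ m", "[12-15[ m")

# Named vector: names are the old labels, values are the new label
lookup <- setNames(rep("Inf. 12 m", length(key_segment_size)), key_segment_size)

# First argument is the vector to recode; unmatched values fall back to .default
taille <- recode(sizes, !!!lookup, .default = "Sup. 12 m")
print(taille)
```

Inside a pipeline the same call would go in mutate(), with the column name in place of sizes.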

Can anyone tell me what I'm doing wrong/what I should do instead? I'm using tidyverse, idk if that helps at all. Thank you for any help, I'm frustrated to tears.

CodePudding user response:

Thanks for your answers. I'm sorry I didn't show any data; the dataset is very large.

@Gregor Thomas From the dataset, I selected the two columns and applied head() to show you the columns I am interested in: (image: data input)

As shown, the columns contain:

  • REF_YEAR : 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020
  • NAVLC9_COD : '[0-6[ m', '[10-12[ m', '[12-15[ m', '[15-18[ m', '[18-24[ m', '[24-40[ m', '[6-10[ m'

@pacomet For the vectors (if they are key_segment_size and other_segment_size), I used c()

I didn't use rename because, as I understand it, it only renames columns, not values. For the record, this is what happens when I try it:

data_all_size <- data %>%
    dplyr::select(REF_YEAR,NAVLC9_COD) %>%
    mutate(Size = recode("Inf. 12 m" = "[0-6[ m")) %>%
    rename(NAVLC9_COD, "Inf. 12 m" = key_segment_size) %>%
    rename(NAVLC9_COD, "Sup. 12 m" = other_segment_size) %>%
    droplevels() %>%
    group_by(REF_YEAR,NAVLC9_COD) %>% 
    summarize(
      number = n() 
    )

Note: Using an external vector in selections is ambiguous.
i Use all_of(key_segment_size) instead of key_segment_size to silence this message.
i See https://tidyselect.r-lib.org/reference/faq-external-vector.html.
This message is displayed once per session.
Error in stop_subscript():
! Can't rename columns that don't exist.
x Columns [0-6[ m, [6-10[ m, and [10-12[ m don't exist.

@Gregor Thomas I tried your suggestion. It doesn't throw an error, BUT the resulting table still shows the original category labels, so nothing was renamed:

data_all_size <- data %>%
    dplyr::select(REF_YEAR,NAVLC9_COD) %>%
    mutate(Size = case_when(NAVLC9_COD %in% key_segment_size ~ "Inf. 12 m", 
                              NAVLC9_COD %in% other_size_segment ~ "Sup. 12 m", 
                              TRUE ~ NA_character_)) %>%
    droplevels() %>%
    group_by(REF_YEAR,NAVLC9_COD) %>% 
    summarize(
      number = n() 
    )

data output

I would like an output where the values in the NAVLC9_COD column are "Inf. 12 m" or "Sup. 12 m".

Thanks in advance!

CodePudding user response:

I finally got it working thanks to @Gregor Thomas's comment, with:

data_all_size <- data_sacrois %>%
 dplyr::select(REF_YEAR,NAVLC9_COD) %>%
 mutate(Taille = case_when(NAVLC9_COD %in% key_segment_size ~ "Inf. 12 m", 
                           NAVLC9_COD %in% other_size_segment ~ "Sup. 12 m", 
                           TRUE ~  "other")) %>%
 droplevels() %>%
 group_by(REF_YEAR,Taille) %>% 
 summarise(
      effectif = n() 
)
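
For anyone who wants to try this without the original data, here is a small self-contained sketch on a made-up tibble called toy (the rows are invented for illustration). It uses count(), which is shorthand for group_by() followed by summarise(n()); the key point is grouping by the new Taille column rather than by NAVLC9_COD:

```r
library(dplyr)

key_segment_size   <- c("[0-6[ m", "[6-10[ m", "[10-12[ m")
other_size_segment <- c("[12-15[ m", "[15-18[ m", "[18-24[ m",
                        "[24-40[ m", "[40-80[ m", ">= 80 m")

# Invented rows, for illustration only
toy <- tibble(
  REF_YEAR   = c(2000, 2000, 2000, 2001, 2001),
  NAVLC9_COD = c("[0-6[ m", "[6-10[ m", "[12-15[ m", "[10-12[ m", ">= 80 m")
)

result <- toy %>%
  # Collapse the detailed size classes into two broad labels
  mutate(Taille = case_when(
    NAVLC9_COD %in% key_segment_size   ~ "Inf. 12 m",
    NAVLC9_COD %in% other_size_segment ~ "Sup. 12 m",
    TRUE                               ~ "other"
  )) %>%
  # Count rows per year and per broad label
  count(REF_YEAR, Taille, name = "effectif")

print(result)
```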

Tags: r