I've been struggling with a problem for a while so I was hoping to find some help here ;)
In R, I want to build a data table with two columns from a data set, recode the values of one of the two columns, and count the number of rows in each resulting group.
I want to extract these two columns:
- REF_YEAR : a column with the years : '2000' '2001' '2002' ...
- NAVLC9_COD : a column for my ship sizes : '[0-6[ m' '[10-12[ m' '[12-15[ m' ...
I want to modify NAVLC9_COD by replacing all the size classes from 0 to 12 meters with a single value, and all the size classes over 12 meters with another single value.
STEP 1. To keep my script reproducible, I created these vectors:
key_segment_size <- c("[0-6[ m","[6-10[ m","[10-12[ m")
other_size_segment <- c("[12-15[ m", "[15-18[ m", "[18-24[ m", "[24-40[ m", "[40-80[ m",">= 80 m")
STEP 2. Then I create my data table:
data_all_size <- data %>%
  dplyr::select(REF_YEAR, NAVLC9_COD) %>%
  str_replace(key_segment_size, "Inf. 12 m") %>%
  str_replace(other_segment_size, "Sup. 12 m") %>%
  droplevels() %>%
  group_by(REF_YEAR, NAVLC9_COD) %>%
  summarize(
    number = n()
  )
Error in type(pattern) : object 'other_segment_size' not found
If I pass the values directly as vectors instead, I get:
data_all_size <- data %>%
  dplyr::select(REF_YEAR, NAVLC9_COD) %>%
  str_replace(c("[0-6[ m", "[6-10[ m", "[10-12[ m"), "Inf. 12 m") %>%
  str_replace(c("[12-15[ m", "[15-18[ m", "[18-24[ m", "[24-40[ m", "[40-80[ m", ">= 80 m"), "Sup. 12 m") %>%
  droplevels() %>%
  group_by(REF_YEAR, NAVLC9_COD) %>%
  summarise(
    effectif = n()
  )
Error in stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
  Missing closing bracket on a bracket expression. (U_REGEX_MISSING_CLOSE_BRACKET, context=`[0-6[ m`)
In addition: Warning messages:
1: In stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
  argument is not an atomic vector; coercing
2: In stri_replace_first_regex(string, pattern, fix_replacement(replacement), :
  longer object length is not a multiple of shorter object length
I also tried the mutate and recode functions:
data_all_size <- data %>%
  dplyr::select(REF_YEAR, NAVLC9_COD) %>%
  mutate(Taille = recode("Inf. 12 m" = key_segment_size)) %>%
  mutate(Taille = recode("Inf. 12 m" = other_segment_size)) %>%
  droplevels() %>%
  group_by(REF_YEAR, NAVLC9_COD) %>%
  summarise(
    effectif = n()
  )
Error in `mutate()`:
! Problem while computing `Taille = recode(`Inf. 12 m` = key_segment_size)`.
Caused by error in `recode.character()`:
! argument ".x" is missing, with no default
Can anyone tell me what I'm doing wrong/what I should do instead? I'm using tidyverse, idk if that helps at all. Thank you for any help, I'm frustrated to tears.
CodePudding user response:
Thanks for your answers. I'm sorry I didn't show any data; the dataset is very large.
@Gregor Thomas From the dataset, I selected the two columns and applied head to show you the columns I am interested in: data input
As shown, the columns contain:
- REF_YEAR: 2000, 2001, 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009, 2010, 2011, 2012, 2013, 2014, 2015, 2016, 2017, 2018, 2019, 2020
- NAVLC9_COD: '[0-6[ m', '[10-12[ m', '[12-15[ m', '[15-18[ m', '[18-24[ m', '[24-40[ m', '[6-10[ m'
@pacomet For the vectors (if you mean key_segment_size and other_segment_size), I used c().
I didn't use rename because I understood that it applies to column names, not to values. For what it's worth, if I try it anyway, I get this:
data_all_size <- data %>%
  dplyr::select(REF_YEAR, NAVLC9_COD) %>%
  mutate(Size = recode("Inf. 12 m" = "[0-6[ m")) %>%
  rename(NAVLC9_COD, "Inf. 12 m" = key_segment_size) %>%
  rename(NAVLC9_COD, "Sup. 12 m" = other_segment_size) %>%
  droplevels() %>%
  group_by(REF_YEAR, NAVLC9_COD) %>%
  summarize(
    number = n()
  )
Note: Using an external vector in selections is ambiguous.
i Use `all_of(key_segment_size)` instead of `key_segment_size` to silence this message.
i See https://tidyselect.r-lib.org/reference/faq-external-vector.html.
This message is displayed once per session.
Error in `stop_subscript()`:
! Can't rename columns that don't exist.
x Columns `[0-6[ m`, `[6-10[ m`, and `[10-12[ m` don't exist.
@Gregor Thomas I tried your suggestion. It doesn't throw an error, BUT the resulting table still shows the original size classes, so nothing was recoded:
data_all_size <- data %>%
  dplyr::select(REF_YEAR, NAVLC9_COD) %>%
  mutate(Size = case_when(NAVLC9_COD %in% key_segment_size ~ "Inf. 12 m",
                          NAVLC9_COD %in% other_size_segment ~ "Sup. 12 m",
                          TRUE ~ NA_character_)) %>%
  droplevels() %>%
  group_by(REF_YEAR, NAVLC9_COD) %>%
  summarize(
    number = n()
  )
I would like an output where the values in the NAVLC9_COD column are "Inf. 12 m" or "Sup. 12 m".
Thanks in advance!
CodePudding user response:
I finally got it working thanks to @Gregor Thomas's comment. The fix was to group by the new Taille column instead of the original NAVLC9_COD:
data_all_size <- data_sacrois %>%
  dplyr::select(REF_YEAR, NAVLC9_COD) %>%
  # Recode the size classes into a new column, Taille
  mutate(Taille = case_when(NAVLC9_COD %in% key_segment_size ~ "Inf. 12 m",
                            NAVLC9_COD %in% other_size_segment ~ "Sup. 12 m",
                            TRUE ~ "other")) %>%
  droplevels() %>%
  # Group by the recoded column, not by NAVLC9_COD
  group_by(REF_YEAR, Taille) %>%
  summarise(
    effectif = n()
  )
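For anyone who wants to try this without the real dataset, here is a minimal sketch of the same approach on toy data. The `toy` tibble and its values are made up for illustration (my real data is data_sacrois), and I'm assuming dplyr and stringr are installed. It uses count(), which as far as I understand is shorthand for group_by() + summarise(n()). The last line shows what I believe caused my earlier str_replace error: "[" opens a character class in a regular expression, so a literal match needs fixed().

```r
library(dplyr)
library(stringr)

# Toy data standing in for the real dataset (values invented for illustration).
toy <- tibble(
  REF_YEAR   = c(2000, 2000, 2000, 2001, 2001),
  NAVLC9_COD = factor(c("[0-6[ m", "[6-10[ m", "[12-15[ m", "[10-12[ m", ">= 80 m"))
)

key_segment_size   <- c("[0-6[ m", "[6-10[ m", "[10-12[ m")
other_size_segment <- c("[12-15[ m", "[15-18[ m", "[18-24[ m",
                        "[24-40[ m", "[40-80[ m", ">= 80 m")

# Recode the size classes with %in%, then count rows per year and recoded class.
toy_counts <- toy %>%
  mutate(Taille = case_when(
    NAVLC9_COD %in% key_segment_size   ~ "Inf. 12 m",
    NAVLC9_COD %in% other_size_segment ~ "Sup. 12 m",
    TRUE                               ~ "other"
  )) %>%
  count(REF_YEAR, Taille, name = "effectif")

print(toy_counts)

# Matching the label literally instead of as a regex pattern.
is_literal_match <- str_detect("[0-6[ m", fixed("[0-6[ m"))
```

Since case_when() with %in% compares whole values, no regular expression is involved, which is probably why it avoids the bracket problem entirely.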