I am working in R, and I am not sure how to split a number into a series of fields. For example, I want to subdivide the number 20102168056 like this:
- 2010 -> year
- 2 -> semester
- 168 -> university career
- 056 -> unique number
I tried to do it with an if statement, but I kept getting more errors. I am new to this and would like to know if you can help. (By the way, it has to work for any number, such as 20211888070, so I did not use the if statement I came up with.)
CodePudding user response:
You can use tidyr::separate().
library(tidyverse)
df <- tibble(original = c(20102168056, 20141152013, 20182008006))
df %>%
  # sep gives the split positions; it needs one fewer entry than `into`
  separate(original, into = c("year", "semester", "university_career", "unique_number"), sep = c(4, 5, 8))
# A tibble: 3 × 4
  year  semester university_career unique_number
  <chr> <chr>    <chr>             <chr>
1 2010  2        168               056
2 2014  1        152               013
3 2018  2        008               006
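As a hedged alternative to the call above: recent tidyr releases mark separate() as superseded in favour of separate_wider_position(). The sketch below assumes tidyr 1.3.0 or later is installed, and it converts the column to text with sprintf("%.0f", ...) first, because as.character() can produce scientific notation for some round values. The name `parts` is only illustrative.

```r
library(tidyr)

df <- tibble::tibble(original = c(20102168056, 20141152013, 20182008006))

# Convert to text explicitly; "%.0f" never switches to scientific notation
df$original <- sprintf("%.0f", df$original)

# Each name becomes a column; each value is that field's width in characters
parts <- separate_wider_position(
  df,
  original,
  widths = c(year = 4, semester = 1, university_career = 3, unique_number = 3)
)
print(parts)
```

The widths must add up to the length of every code (11 here); by default the function raises an error for strings that are too short or too long.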
You may want to convert some of the columns to an integer:
df %>%
  # sep gives the split positions; it needs one fewer entry than `into`
  separate(original, into = c("year", "semester", "university_career", "unique_number"), sep = c(4, 5, 8)) %>%
  mutate(across(year:unique_number, as.integer))
# A tibble: 3 × 4
   year semester university_career unique_number
  <int>    <int>             <int>         <int>
1  2010        2               168            56
2  2014        1               152            13
3  2018        2                 8             6
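If numeric fields are the goal anyway, the same split can be sketched with plain arithmetic and no packages, using integer division (%/%) and remainder (%%). This is only an illustration that assumes every code has exactly 11 digits laid out as 4 + 1 + 3 + 3; the names `codes` and `parts` are made up for the example. As in the integer output above, leading zeros are lost (056 becomes 56).

```r
codes <- c(20102168056, 20141152013, 20182008006)

parts <- data.frame(
  year              = codes %/% 1e7,          # drop the last 7 digits
  semester          = (codes %/% 1e6) %% 10,  # 5th digit
  university_career = (codes %/% 1e3) %% 1e3, # digits 6 to 8
  unique_number     = codes %% 1e3            # last 3 digits
)
print(parts)
```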
CodePudding user response:
You can use the substr() function if you first convert the number to a character string with as.character().
test <- as.character(20102168056)
data <- list()
data$year <- substr(test, 1, 4)
data$semester <- substr(test, 5, 5)
data$uni_career <- substr(test, 6, 8)
data$unique_num <- substr(test, 9, 11)
print(data)
#> $year
#> [1] "2010"
#>
#> $semester
#> [1] "2"
#>
#> $uni_career
#> [1] "168"
#>
#> $unique_num
#> [1] "056"
Created on 2021-12-08 by the reprex package (v2.0.1)
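Since the question asks for something that works for any number, the substr() calls above can be wrapped in a small vectorised helper. This is a minimal sketch: `split_code()` is a hypothetical name, and it assumes every code has exactly 11 digits. It uses sprintf("%.0f", ...) rather than as.character() for numeric input because as.character() can return scientific notation for some round values (for example, as.character(2e10) gives "2e+10").

```r
# Hypothetical helper: split 11-digit codes into their four fields
split_code <- function(x) {
  # Keep character input as is; format numbers without scientific notation
  s <- if (is.character(x)) x else sprintf("%.0f", x)
  data.frame(
    year       = substr(s, 1, 4),
    semester   = substr(s, 5, 5),
    uni_career = substr(s, 6, 8),
    unique_num = substr(s, 9, 11),
    stringsAsFactors = FALSE
  )
}

codes <- split_code(c(20102168056, 20211888070))
print(codes)
```

Because substr() is vectorised, the same function handles a single code or a whole column of them.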
CodePudding user response:
We can use stringr::str_match().
library(tidyverse)
data <- c(20102168056, 20102168356)
str_match(data, '^(\\d{4})(\\d{1})(\\d{3})(\\d{3})') %>%
  as.data.frame() %>%
  set_names(c('value', 'year', 'semester', 'university_career', 'unique_number'))
#> value year semester university_career unique_number
#> 1 20102168056 2010 2 168 056
#> 2 20102168356 2010 2 168 356
Created on 2021-12-08 by the reprex package (v2.0.1)
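For readers who would rather not load the tidyverse, the same capture-group idea can be sketched in base R with regexec() and regmatches(). This is an assumption-laden illustration rather than a drop-in replacement: it expects every code to match the 4 + 1 + 3 + 3 digit layout (non-matching codes would be silently dropped by rbind), and the names `codes`, `pattern`, `m` and `parts` are made up for the example.

```r
# Convert to text without risking scientific notation
codes <- sprintf("%.0f", c(20102168056, 20102168356))

# One capture group per field: year, semester, career, unique number
pattern <- "^([0-9]{4})([0-9])([0-9]{3})([0-9]{3})$"

# regmatches() returns, per code, the full match followed by the groups
m <- do.call(rbind, regmatches(codes, regexec(pattern, codes)))
colnames(m) <- c("value", "year", "semester", "university_career", "unique_number")

parts <- as.data.frame(m, stringsAsFactors = FALSE)
print(parts)
```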