Home > Blockchain >  split values of a column and store only numeric values in a different column
split values of a column and store only numeric values in a different column

Time:12-10

I have a dataframe in long table format like this one below:

mydf <- data.frame(id = c(123, 123, 123, 123, 123),
                    name =c("test_2001", "test_2002", "test_2003", "test_2004", "test_2005"),
                    value = c(15, 20, 25, 30, 35))
mydf
#>    id      name value
#> 1 123 test_2001    15
#> 2 123 test_2002    20
#> 3 123 test_2003    25
#> 4 123 test_2004    30
#> 5 123 test_2005    35

Now, I want to split attributes from column name in such a way that I could remove those numeric values and store them in another column named year, output like this:

desired <- data.frame(id = c(123, 123, 123, 123, 123),
                      name =c("test", "test", "test", "test", "test"),
                      year =c(2001:2005),
                      value = c(15, 20, 25, 30, 35))
desired
#>    id name year value
#> 1 123 test 2001    15
#> 2 123 test 2002    20
#> 3 123 test 2003    25
#> 4 123 test 2004    30
#> 5 123 test 2005    35

How can I automate this step considering I have to apply this change to a lot of data?

CodePudding user response:

base R

cbind(mydf, strcapture("(.*)_(.*)", mydf$name, list(name="", year=1L)))
#    id      name value name year
# 1 123 test_2001    15 test 2001
# 2 123 test_2002    20 test 2002
# 3 123 test_2003    25 test 2003
# 4 123 test_2004    30 test 2004
# 5 123 test_2005    35 test 2005

dplyr/tidyr

library(dplyr)
library(tidyr)
mydf %>%
  separate(name, into = c("name", "year"), sep = "_")
#    id name year value
# 1 123 test 2001    15
# 2 123 test 2002    20
# 3 123 test 2003    25
# 4 123 test 2004    30
# 5 123 test 2005    35

(Note that with this, year is a string, can be converted to ints with as.integer.)

data.table

library(data.table)
as.data.table(mydf)[, c("name", "year") := tstrsplit(name, "_")][]
#       id   name value   year
#    <num> <char> <num> <char>
# 1:   123   test    15   2001
# 2:   123   test    20   2002
# 3:   123   test    25   2003
# 4:   123   test    30   2004
# 5:   123   test    35   2005

(Again, year here is still a string.)

  • Related