I have a dataframe in long table format like this one below:
mydf <- data.frame(id = c(123, 123, 123, 123, 123),
name =c("test_2001", "test_2002", "test_2003", "test_2004", "test_2005"),
value = c(15, 20, 25, 30, 35))
mydf
#> id name value
#> 1 123 test_2001 15
#> 2 123 test_2002 20
#> 3 123 test_2003 25
#> 4 123 test_2004 30
#> 5 123 test_2005 35
Now, I want to split attributes from column name
in such a way that I could remove those numeric values and store them in another column named year
, output like this:
desired <- data.frame(id = c(123, 123, 123, 123, 123),
name =c("test", "test", "test", "test", "test"),
year =c(2001:2005),
value = c(15, 20, 25, 30, 35))
desired
#> id name year value
#> 1 123 test 2001 15
#> 2 123 test 2002 20
#> 3 123 test 2003 25
#> 4 123 test 2004 30
#> 5 123 test 2005 35
How can I automate this step considering I have to apply this change to a lot of data?
CodePudding user response:
base R
cbind(mydf, strcapture("(.*)_(.*)", mydf$name, list(name="", year=1L)))
# id name value name year
# 1 123 test_2001 15 test 2001
# 2 123 test_2002 20 test 2002
# 3 123 test_2003 25 test 2003
# 4 123 test_2004 30 test 2004
# 5 123 test_2005 35 test 2005
dplyr/tidyr
library(dplyr)
library(tidyr)
mydf %>%
separate(name, into = c("name", "year"), sep = "_")
# id name year value
# 1 123 test 2001 15
# 2 123 test 2002 20
# 3 123 test 2003 25
# 4 123 test 2004 30
# 5 123 test 2005 35
(Note that with this, year
is a string, can be converted to ints with as.integer
.)
data.table
library(data.table)
as.data.table(mydf)[, c("name", "year") := tstrsplit(name, "_")][]
# id name value year
# <num> <char> <num> <char>
# 1: 123 test 15 2001
# 2: 123 test 20 2002
# 3: 123 test 25 2003
# 4: 123 test 30 2004
# 5: 123 test 35 2005
(Again, year
here is still a string.)