I've tried pivot_longer()
and gather()
functions from dplyr
and tidyr
but I can't seem to figure it out.
example <- data.frame("country"="Central African Republic",
"year"=c(2017,2018,2019,2020,2021),
"life_expectancy.x" = c(51,52,53,54,55),
"sex.x"= "M",
"life_expectancy.y"= c(51.5,52.5,53.5,54.5,55.5),
"sex.y" = "F" )
At the moment I have 2 columns for sex and life expectancy corresponding to each sex. How do I make one column for sex with both male (M) and female (F) values and the same for life_expectancy, but still have the life expectancy values correspond to the correct sex? I would like to end up with 1 column for country, 1 column for year, 1 column for sex, and 1 column for life expectancy.
CodePudding user response:
You can select
and rename
the columns, then rbind
:
rbind(
example %>%
select(1:4) %>%
rename(life_expectancy = life_expectancy.x,
sex = sex.x),
example %>%
select(c(1, 2, 5, 6)) %>%
rename(life_expectancy = life_expectancy.y,
sex = sex.y)
)
Output:
country year life_expectancy sex
1 Central African Republic 2017 51.0 M
2 Central African Republic 2018 52.0 M
3 Central African Republic 2019 53.0 M
4 Central African Republic 2020 54.0 M
5 Central African Republic 2021 55.0 M
6 Central African Republic 2017 51.5 F
7 Central African Republic 2018 52.5 F
8 Central African Republic 2019 53.5 F
9 Central African Republic 2020 54.5 F
10 Central African Republic 2021 55.5 F
And in case you have more than just this two columns ending with .x
and .y
:
library(tidyverse)
rbind(
example %>%
select(1:4) %>%
rename_with(~ str_remove(.,".x$"), .cols = ends_with(".x")),
example %>%
select(c(1, 2, 5, 6)) %>%
rename_with(~ str_remove(.,".y$"), .cols = ends_with(".y"))
)
CodePudding user response:
In base r, try
df<-data.frame("country"=rep(example$country,2),"year"=rep(example$year,2),"life expectancy"=c(example$life_expectancy.x,example$life_expectancy.y),"sex"=c(example$sex.x,example$sex.y))
CodePudding user response:
Try reshape
. Works for any number of rows, no manual renaming.
You don't have to set the new row names but I thought it looks more elegant with just row numbers.
reshape(example, varying=c("life_expectancy.x","life_expectancy.y", "sex.x", "sex.y"),
idvar=c("country", "year"), timevar=NULL,
new.row.names=with( example, 1:length(c(sex.x,sex.y)) ), dir="long" )
country year life_expectancy sex
1 Central African Republic 2017 51.0 M
2 Central African Republic 2018 52.0 M
3 Central African Republic 2019 53.0 M
4 Central African Republic 2020 54.0 M
5 Central African Republic 2021 55.0 M
6 Central African Republic 2017 51.5 F
7 Central African Republic 2018 52.5 F
8 Central African Republic 2019 53.5 F
9 Central African Republic 2020 54.5 F
10 Central African Republic 2021 55.5 F
CodePudding user response:
Something like this using pivot_longer
?
example <- data.frame("country"="Central African Republic",
"year"=c(2017,2018,2019,2020,2021),
"life_expectancy.x" = c(51,52,53,54,55),
"sex.x"= "M",
"life_expectancy.y"= c(51.5,52.5,53.5,54.5,55.5),
"sex.y" = "F" )
library(tidyverse)
example %>%
pivot_longer(c(sex.x, sex.y), names_to = "names_sex", values_to = "sex") %>%
pivot_longer(c(life_expectancy.x, life_expectancy.y), names_to = "names_life",
values_to = "life_expectancy")
#> # A tibble: 20 × 6
#> country year names_sex sex names_life life_expectancy
#> <chr> <dbl> <chr> <chr> <chr> <dbl>
#> 1 Central African Republic 2017 sex.x M life_expectancy.x 51
#> 2 Central African Republic 2017 sex.x M life_expectancy.y 51.5
#> 3 Central African Republic 2017 sex.y F life_expectancy.x 51
#> 4 Central African Republic 2017 sex.y F life_expectancy.y 51.5
#> 5 Central African Republic 2018 sex.x M life_expectancy.x 52
#> 6 Central African Republic 2018 sex.x M life_expectancy.y 52.5
#> 7 Central African Republic 2018 sex.y F life_expectancy.x 52
#> 8 Central African Republic 2018 sex.y F life_expectancy.y 52.5
#> 9 Central African Republic 2019 sex.x M life_expectancy.x 53
#> 10 Central African Republic 2019 sex.x M life_expectancy.y 53.5
#> 11 Central African Republic 2019 sex.y F life_expectancy.x 53
#> 12 Central African Republic 2019 sex.y F life_expectancy.y 53.5
#> 13 Central African Republic 2020 sex.x M life_expectancy.x 54
#> 14 Central African Republic 2020 sex.x M life_expectancy.y 54.5
#> 15 Central African Republic 2020 sex.y F life_expectancy.x 54
#> 16 Central African Republic 2020 sex.y F life_expectancy.y 54.5
#> 17 Central African Republic 2021 sex.x M life_expectancy.x 55
#> 18 Central African Republic 2021 sex.x M life_expectancy.y 55.5
#> 19 Central African Republic 2021 sex.y F life_expectancy.x 55
#> 20 Central African Republic 2021 sex.y F life_expectancy.y 55.5
Created on 2021-12-14 by the reprex package (v0.3.0)
CodePudding user response:
example <- data.frame("country"="Central African Republic",
"year"=c(2017,2018,2019,2020,2021),
"life_expectancy.x" = c(51,52,53,54,55),
"sex.x"= "M",
"life_expectancy.y"= c(51.5,52.5,53.5,54.5,55.5),
"sex.y" = "F" )
library(data.table)
setDT(example)
output <- melt(example, measure = patterns("^life_expectancy", "^sex"), value.name = c("life_expectancy", "sex"))
output[, variable := NULL]
output
# country year life_expectancy sex
# 1: Central African Republic 2017 51.0 M
# 2: Central African Republic 2018 52.0 M
# 3: Central African Republic 2019 53.0 M
# 4: Central African Republic 2020 54.0 M
# 5: Central African Republic 2021 55.0 M
# 6: Central African Republic 2017 51.5 F
# 7: Central African Republic 2018 52.5 F
# 8: Central African Republic 2019 53.5 F
# 9: Central African Republic 2020 54.5 F
# 10: Central African Republic 2021 55.5 F