Home > database >  How do I combine two sets of 2 columns in a dataframe into 2 columns in R?
How do I combine two sets of 2 columns in a dataframe into 2 columns in R?

Time:12-15

I've tried pivot_longer() and gather() functions from dplyr and tidyr but I can't seem to figure it out.

example <- data.frame("country"="Central African Republic",
                   "year"=c(2017,2018,2019,2020,2021),
                   "life_expectancy.x" = c(51,52,53,54,55),
                   "sex.x"= "M",
                   "life_expectancy.y"= c(51.5,52.5,53.5,54.5,55.5),
                   "sex.y" = "F" )

At the moment I have 2 columns for sex and life expectancy corresponding to each sex. How do I make one column for sex with both male (M) and female (F) values and the same for life_expectancy, but still have the life expectancy values correspond to the correct sex? I would like to end up with 1 column for country, 1 column for year, 1 column for sex, and 1 column for life expectancy.

CodePudding user response:

You can select and rename the columns, then rbind:

rbind(
  example %>%
    select(1:4) %>%
    rename(life_expectancy = life_expectancy.x,
           sex = sex.x),
  example %>%
    select(c(1, 2, 5, 6)) %>%
    rename(life_expectancy = life_expectancy.y,
           sex = sex.y)
)

Output:

                    country year life_expectancy sex
1  Central African Republic 2017            51.0   M
2  Central African Republic 2018            52.0   M
3  Central African Republic 2019            53.0   M
4  Central African Republic 2020            54.0   M
5  Central African Republic 2021            55.0   M
6  Central African Republic 2017            51.5   F
7  Central African Republic 2018            52.5   F
8  Central African Republic 2019            53.5   F
9  Central African Republic 2020            54.5   F
10 Central African Republic 2021            55.5   F

And in case you have more than just this two columns ending with .x and .y:

library(tidyverse)
rbind(
  example %>%
    select(1:4) %>%
    rename_with(~ str_remove(.,".x$"), .cols = ends_with(".x")),
  example %>%
    select(c(1, 2, 5, 6)) %>%
    rename_with(~ str_remove(.,".y$"), .cols = ends_with(".y"))
)

CodePudding user response:

In base r, try

df<-data.frame("country"=rep(example$country,2),"year"=rep(example$year,2),"life expectancy"=c(example$life_expectancy.x,example$life_expectancy.y),"sex"=c(example$sex.x,example$sex.y))

CodePudding user response:

Try reshape. Works for any number of rows, no manual renaming.
You don't have to set the new row names but I thought it looks more elegant with just row numbers.

reshape(example, varying=c("life_expectancy.x","life_expectancy.y", "sex.x", "sex.y"), 
  idvar=c("country", "year"), timevar=NULL,
  new.row.names=with( example, 1:length(c(sex.x,sex.y)) ), dir="long" )

                    country year life_expectancy sex
1  Central African Republic 2017            51.0   M
2  Central African Republic 2018            52.0   M
3  Central African Republic 2019            53.0   M
4  Central African Republic 2020            54.0   M
5  Central African Republic 2021            55.0   M
6  Central African Republic 2017            51.5   F
7  Central African Republic 2018            52.5   F
8  Central African Republic 2019            53.5   F
9  Central African Republic 2020            54.5   F
10 Central African Republic 2021            55.5   F

CodePudding user response:

Something like this using pivot_longer?

example <- data.frame("country"="Central African Republic",
                      "year"=c(2017,2018,2019,2020,2021),
                      "life_expectancy.x" = c(51,52,53,54,55),
                      "sex.x"= "M",
                      "life_expectancy.y"= c(51.5,52.5,53.5,54.5,55.5),
                      "sex.y" = "F" )

library(tidyverse)

example %>% 
    pivot_longer(c(sex.x, sex.y), names_to = "names_sex", values_to = "sex") %>% 
    pivot_longer(c(life_expectancy.x, life_expectancy.y), names_to = "names_life", 
                 values_to = "life_expectancy")
#> # A tibble: 20 × 6
#>    country                   year names_sex sex   names_life        life_expectancy
#>    <chr>                    <dbl> <chr>     <chr> <chr>                       <dbl>
#>  1 Central African Republic  2017 sex.x     M     life_expectancy.x            51  
#>  2 Central African Republic  2017 sex.x     M     life_expectancy.y            51.5
#>  3 Central African Republic  2017 sex.y     F     life_expectancy.x            51  
#>  4 Central African Republic  2017 sex.y     F     life_expectancy.y            51.5
#>  5 Central African Republic  2018 sex.x     M     life_expectancy.x            52  
#>  6 Central African Republic  2018 sex.x     M     life_expectancy.y            52.5
#>  7 Central African Republic  2018 sex.y     F     life_expectancy.x            52  
#>  8 Central African Republic  2018 sex.y     F     life_expectancy.y            52.5
#>  9 Central African Republic  2019 sex.x     M     life_expectancy.x            53  
#> 10 Central African Republic  2019 sex.x     M     life_expectancy.y            53.5
#> 11 Central African Republic  2019 sex.y     F     life_expectancy.x            53  
#> 12 Central African Republic  2019 sex.y     F     life_expectancy.y            53.5
#> 13 Central African Republic  2020 sex.x     M     life_expectancy.x            54  
#> 14 Central African Republic  2020 sex.x     M     life_expectancy.y            54.5
#> 15 Central African Republic  2020 sex.y     F     life_expectancy.x            54  
#> 16 Central African Republic  2020 sex.y     F     life_expectancy.y            54.5
#> 17 Central African Republic  2021 sex.x     M     life_expectancy.x            55  
#> 18 Central African Republic  2021 sex.x     M     life_expectancy.y            55.5
#> 19 Central African Republic  2021 sex.y     F     life_expectancy.x            55  
#> 20 Central African Republic  2021 sex.y     F     life_expectancy.y            55.5

Created on 2021-12-14 by the reprex package (v0.3.0)

CodePudding user response:

example <- data.frame("country"="Central African Republic",
                   "year"=c(2017,2018,2019,2020,2021),
                   "life_expectancy.x" = c(51,52,53,54,55),
                   "sex.x"= "M",
                   "life_expectancy.y"= c(51.5,52.5,53.5,54.5,55.5),
                   "sex.y" = "F" )

library(data.table)

setDT(example)

output <- melt(example, measure = patterns("^life_expectancy", "^sex"), value.name = c("life_expectancy", "sex"))
output[, variable := NULL]
output

#                      country year life_expectancy sex
#  1: Central African Republic 2017            51.0   M
#  2: Central African Republic 2018            52.0   M
#  3: Central African Republic 2019            53.0   M
#  4: Central African Republic 2020            54.0   M
#  5: Central African Republic 2021            55.0   M
#  6: Central African Republic 2017            51.5   F
#  7: Central African Republic 2018            52.5   F
#  8: Central African Republic 2019            53.5   F
#  9: Central African Republic 2020            54.5   F
# 10: Central African Republic 2021            55.5   F
  •  Tags:  
  • r
  • Related