How can I combine several columns into one variable, tacking each onto the end of the other and grou

Time:12-13

I have a data frame with multiple columns pertaining to the same variable, which I'd like to combine into a single column. However, most of the answers I can find here are about concatenating columns (e.g. Merge 2 columns into one in dataframe), whereas I want to preserve each individual cell of data in my data frame and simply stack them into one column.

For clarity, here's a sample of what my input data approximately look like.

a b c ID
string1 string11 string21 1111
string2 string12 string22 2222

Here is what I would like these data to look like:

newvar ID
string1 1111
string11 1111
string21 1111
string2 2222
string12 2222
string22 2222

So far, I've been trying to use "pivot_longer()" to accomplish this, like so:

pivot_longer(df, c("a", "b", "c"), "newvar")

but I think I must misunderstand the purpose of pivot_longer(), because the data frame it returns has cells populated with the column names a, b, and c rather than with the row values from those columns. I'm also not sure whether pivot_longer() can group by the ID column as I'd like, except perhaps through piping. Any help is much appreciated.

Edit: I've realized that my issue with pivot_longer() seems to be that I need to pass "newvar" explicitly as the "values_to" argument.

pivot_longer(df, c("a", "b", "c"), values_to = "newvar")

This code mostly accomplishes what I need.

CodePudding user response:

Make sure you set the cols and values_to arguments of pivot_longer() correctly. cols = ... defines the columns you are taking the values from. values_to = ... defines the name of the new column that receives those values. You were actually on the right track: pivot_longer() always returns an additional column holding the names of the source columns unless you tell it otherwise, so you just need to drop that column afterwards.

library(tidyverse)

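df <- data.frame(
  a = c("string1", "string2"),
  b = c("string11", "string12"),
  c = c("string21", "string22"),
  ID = c("1111", "2222")
)

# Stack columns a, b, and c into "newvar", then keep only newvar and ID
# (this drops the default "name" column that holds the source column names)
df %>%
  pivot_longer(cols = names(df)[1:3],
               values_to = "newvar") %>%
  select(newvar, ID)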

Output:

# A tibble: 6 x 2
  newvar   ID   
  <chr>    <chr>
1 string1  1111 
2 string11 1111 
3 string21 1111 
4 string2  2222 
5 string12 2222 
6 string22 2222
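
To make the difference between names and values more explicit, here is a minimal sketch that names both output columns. It reuses the sample data from above; the column name "source_col" is only an illustrative choice and does not appear in the question. Passing "newvar" as names_to is what fills a column with a, b, and c, while values_to names the column that receives the actual cell contents.

```r
library(tidyr)

# Same sample data as above
df <- data.frame(
  a = c("string1", "string2"),
  b = c("string11", "string12"),
  c = c("string21", "string22"),
  ID = c("1111", "2222")
)

# names_to receives the old column names (a, b, c);
# values_to receives the cell contents.
# "source_col" is a hypothetical name chosen for this example.
long <- pivot_longer(
  df,
  cols = c("a", "b", "c"),
  names_to = "source_col",
  values_to = "newvar"
)

print(long)
```

Each original row contributes three consecutive rows, so the result is already grouped by ID without an explicit group_by() step.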

CodePudding user response:

Or with data.table.

library(data.table)

df <- data.table(
  a = c("string1", "string2"),
  b = c("string11", "string12"),
  c = c("string21", "string22"),
  ID = c(1111, 2222)
)

# melt() stacks a, b, and c into "newvar"; then sort by ID and keep two columns
df_final <- melt(df,
                 id.vars = "ID",
                 measure.vars = c("a", "b", "c"),
                 value.name = "newvar")[order(ID), .(ID, newvar)]

Output:

> df_final
     ID   newvar
1: 1111  string1
2: 1111 string11
3: 1111 string21
4: 2222  string2
5: 2222 string12
6: 2222 string22