I have a dataframe with multiple columns pertaining to the same variable that I'd like to combine into a single column. However, most of the answers I can find on here deal with concatenating columns (e.g. Merge 2 columns into one in dataframe), whereas I want to preserve each individual cell of data in my dataframe and just assemble them into one single column.
For clarity, here's a sample of what my input data approximately look like.
a | b | c | ID |
---|---|---|---|
string1 | string11 | string21 | 1111 |
string2 | string12 | string22 | 2222 |
Here is what I would like these data to look like:
newvar | ID |
---|---|
string1 | 1111 |
string11 | 1111 |
string21 | 1111 |
string2 | 2222 |
string12 | 2222 |
string22 | 2222 |
So far, I've been trying to use "pivot_longer()" to accomplish this, like so:
pivot_longer(df, c("a", "b", "c"), "newvar")
but I think I must misunderstand the purpose of pivot_longer(), because the df it returns has cells populated with the values a, b, and c rather than with the row values from those columns. I'm also not sure that pivot_longer() can group by the ID column as I'd like, except maybe through piping. Any help is much appreciated.
Edit: I've realized that my issue with pivot_longer() seems to be that "newvar" needs to be passed explicitly to the "values_to" argument.
pivot_longer(df, c("a", "b", "c"), values_to = "newvar")
This code mostly accomplishes what I need.
CodePudding user response:
Try setting the inputs of pivot_longer() explicitly as cols and values_to. cols = ... defines the columns you are taking the values from. values_to = ... defines the name of the new column that those values are written to. Actually, I think you were on the right track; it's just that pivot_longer() also returns the names of the columns the values came from, unless you use some of its trickier options.
library(tidyverse)

# Sample data from the question
df = data.frame(
  a = c("string1", "string2"),
  b = c("string11", "string12"),
  c = c("string21", "string22"),
  ID = c("1111", "2222")
)

# Stack the first three columns into newvar, then keep only newvar and ID
df %>%
  pivot_longer(cols = names(df)[1:3],
               values_to = "newvar") %>%
  select(newvar, ID)
Output:
# A tibble: 6 x 2
newvar ID
<chr> <chr>
1 string1 1111
2 string11 1111
3 string21 1111
4 string2 2222
5 string12 2222
6 string22 2222
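As a possible shortcut, and assuming a reasonably recent tidyr (I haven't checked in which release this was added), the column of source names can be suppressed inside pivot_longer() itself with names_to = NULL, so the select() step is only needed for column order. This is a sketch rather than a definitive version; `long` is just an illustrative name, and all_of() is used to select columns from a character vector.

```r
library(tidyr)

# Same sample data as in the question
df <- data.frame(
  a  = c("string1", "string2"),
  b  = c("string11", "string12"),
  c  = c("string21", "string22"),
  ID = c("1111", "2222")
)

# names_to = NULL: do not create a column holding the source column names
long <- pivot_longer(df,
                     cols      = all_of(c("a", "b", "c")),
                     names_to  = NULL,
                     values_to = "newvar")

# Put newvar first to match the layout asked for in the question
long <- long[, c("newvar", "ID")]
print(long)
```

Since ID is never listed in cols, it is carried along and repeated for each stacked value, so no group_by() should be needed.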
CodePudding user response:
Or with data.table.
library(data.table)
df = data.table(a=c("string1", "string2"), b=c("string11", "string12"),
                c=c("string21", "string22"), ID=c(1111, 2222))
# Stack a, b and c into newvar, sort by ID, then keep the two columns of interest
df_final = melt(df,
                id.vars="ID",
                measure.vars=c("a", "b", "c"),
                value.name="newvar")[order(ID)][, c("ID", "newvar")]
Output:
> df_final
ID newvar
1: 1111 string1
2: 1111 string11
3: 1111 string21
4: 2222 string2
5: 2222 string12
6: 2222 string22
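For comparison, the same reshaping can be sketched in base R without any packages. This assumes the columns being stacked all have the same type (character here); `value_cols` and `long` are illustrative names, not anything from the question.

```r
# Same sample data as in the question
df <- data.frame(
  a  = c("string1", "string2"),
  b  = c("string11", "string12"),
  c  = c("string21", "string22"),
  ID = c("1111", "2222")
)

value_cols <- c("a", "b", "c")

long <- data.frame(
  # t() gives a matrix with one column per original row; as.vector() reads it
  # column by column, so values come out row by row of df
  newvar = as.vector(t(df[value_cols])),
  # Repeat each ID once per stacked column
  ID     = rep(df$ID, each = length(value_cols))
)
print(long)
```

The transpose is what keeps each ID's values together; without it the values would come out column by column instead.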