I have a table which has 5 columns (ID, var, state, loc and position). The var column contains a description of a certain variant e.g. var1. Within the table there are multiple rows which include var 1 but they have a different state and position. What I want to do is make a new table where each var is included only once and the position is included in two columns based on its state.
For example, say I have four var1 rows; two with the state H and two with the state h. In the new table I need the columns to be: sample - var - loc - position if H and position if h - such that all the information for var 1 is in one row. I would need to be able to do this for every single variant in my original data set.
structure(list(ID = c(1234L, 1234L, 1234L, 1234L, 5678L, 5678L,
NA, NA, NA, NA), var = c("var1", "var1", "var1", "var1", "var2",
"var2", NA, NA, NA, NA), state = c("H", "H", "h", "h", "H", "h",
NA, NA, NA, NA), loc = c(4L, 4L, 4L, 4L, 12L, 12L, NA, NA, NA,
NA), position = c(6000L, 6002L, 6004L, 6006L, 3002L, 3004L, NA,
NA, NA, NA)), row.names = c("1", "2", "3", "4", "5", "6", "NA",
"NA.1", "NA.2", "NA.3"), class = "data.frame")
structure(list(V1 = c("ID", "1234", "5678", NA, NA, NA, NA, NA,
NA, NA), V2 = c("var1", "var1", "var2", NA, NA, NA, NA, NA, NA,
NA), V3 = c("loc", "4", "12", NA, NA, NA, NA, NA, NA, NA), V4 = c("state H",
"6000 6002", "3002", NA, NA, NA, NA, NA, NA, NA), V5 = c("state h",
"6004 6006", "3004", NA, NA, NA, NA, NA, NA, NA)), row.names = c("1",
"2", "3", "NA", "NA.1", "NA.2", "NA.3", "NA.4", "NA.5", "NA.6"
), class = "data.frame")
Any guidance would be appreciate
CodePudding user response:
The answer to your question is likely revolving around tidyr::pivot_wider
I changed the example data because I believe yours was inconsistent.
Data
df<-structure(list(ID = c(1234L, 1234L, 1234L, 1234L, 5678L, 5678L
), var = c("var1", "var1", "var1", "var1", "var2", "var2"), state = c("H",
"H", "h", "h", "H", "h"), loc = c(4L, 4L, 4L, 4L, 12L, 12L),
position = c(6000L, 6002L, 6004L, 6006L, 3002L, 3004L)), row.names = c("1",
"2", "3", "4", "5", "6"), class = "data.frame")
df
ID var state loc position
1 1234 var1 H 4 6000
2 1234 var1 H 4 6002
3 1234 var1 h 4 6004
4 1234 var1 h 4 6006
5 5678 var2 H 12 3002
6 5678 var2 h 12 3004
Answer
library(tidyr)
df %>% pivot_wider(names_from = state,
values_from = position,
values_fn = toString)
# A tibble: 2 × 5
ID var loc H h
<int> <chr> <int> <chr> <chr>
1 1234 var1 4 6000, 6002 6004, 6006
2 5678 var2 12 3002 3004