Home > Blockchain >  Using R to combine information into one row
Using R to combine information into one row

Time:12-08

I have a table which has 5 columns (ID, var, state, loc and position). The var column contains a description of a certain variant e.g. var1. Within the table there are multiple rows which include var 1 but they have a different state and position. What I want to do is make a new table where each var is included only once and the position is included in two columns based on its state.

For example, say I have four var1 rows; two with the state H and two with the state h. In the new table I need the columns to be: sample - var - loc - position if H and position if h - such that all the information for var 1 is in one row. I would need to be able to do this for every single variant in my original data set.

Current data example

structure(list(ID = c(1234L, 1234L, 1234L, 1234L, 5678L, 5678L, 
NA, NA, NA, NA), var = c("var1", "var1", "var1", "var1", "var2", 
"var2", NA, NA, NA, NA), state = c("H", "H", "h", "h", "H", "h", 
NA, NA, NA, NA), loc = c(4L, 4L, 4L, 4L, 12L, 12L, NA, NA, NA, 
NA), position = c(6000L, 6002L, 6004L, 6006L, 3002L, 3004L, NA, 
NA, NA, NA)), row.names = c("1", "2", "3", "4", "5", "6", "NA", 
"NA.1", "NA.2", "NA.3"), class = "data.frame")

wanted format

structure(list(V1 = c("ID", "1234", "5678", NA, NA, NA, NA, NA, 
NA, NA), V2 = c("var1", "var1", "var2", NA, NA, NA, NA, NA, NA, 
NA), V3 = c("loc", "4", "12", NA, NA, NA, NA, NA, NA, NA), V4 = c("state H", 
"6000 6002", "3002", NA, NA, NA, NA, NA, NA, NA), V5 = c("state h", 
"6004 6006", "3004", NA, NA, NA, NA, NA, NA, NA)), row.names = c("1", 
"2", "3", "NA", "NA.1", "NA.2", "NA.3", "NA.4", "NA.5", "NA.6"
), class = "data.frame")

Any guidance would be appreciate

CodePudding user response:

The answer to your question is likely revolving around tidyr::pivot_wider

I changed the example data because I believe yours was inconsistent.

Data

df<-structure(list(ID = c(1234L, 1234L, 1234L, 1234L, 5678L, 5678L
), var = c("var1", "var1", "var1", "var1", "var2", "var2"), state = c("H", 
"H", "h", "h", "H", "h"), loc = c(4L, 4L, 4L, 4L, 12L, 12L), 
    position = c(6000L, 6002L, 6004L, 6006L, 3002L, 3004L)), row.names = c("1", 
"2", "3", "4", "5", "6"), class = "data.frame")

df

    ID  var state loc position
1 1234 var1     H   4     6000
2 1234 var1     H   4     6002
3 1234 var1     h   4     6004
4 1234 var1     h   4     6006
5 5678 var2     H  12     3002
6 5678 var2     h  12     3004

Answer

library(tidyr)

df %>% pivot_wider(names_from = state,
                   values_from = position,
                   values_fn = toString)

# A tibble: 2 × 5
     ID var     loc H          h         
  <int> <chr> <int> <chr>      <chr>     
1  1234 var1      4 6000, 6002 6004, 6006
2  5678 var2     12 3002       3004    
  •  Tags:  
  • r
  • Related