Home > Mobile >  Transform duplicate rows to columns
Transform duplicate rows to columns

Time:12-18

I'm working on a database with hundreds of variables, however, as its origin is JSON, it's making me very difficult to organize it. For example, instead of the file bringing the information in the columns, it creates new lines. See the example.

df1 <- data_frame(ID = c(111,111,111,111,111,111,222,222,333),
                  NAME = c('JOHN','JOHN','MARY','MARY','JAMES','JAMES','WILL','WILL','MARK'),
                  ADRESS = c('NY','NY','NY','NY','ROMA','ROMA','LONDON','TOKYO',''),
                  COLOR = c('GREEN','GREEN','RED','RED','YELLOW','YELLOW','BLUE','BLUE','ORANGE'),
                  CAR = c('','','BMW','BMW','TRUCK','TRUCK','FORD','FORD','FERRARI'),
                  COUNTRY = c('USA','USA','USA','USA','USA','USA','USA','USA','USA'))

I would like to organize the file in a way that it is grouped by ID, as in the example below:

df2 <- data_frame(ID = c(111,222,333),
                  NAME1 = c('JOHN','WILL','MARK'),
                  NAME2 = c('MARY','',''),
                  NAME3 = c('JAMES','',''),
                  ADRESS1 = c('NY','LONDON',''),
                  ADRESS2 = c('NY','TOKYO',''),
                  ADRESS3 = c('ROMA','',''),
                  COLOR1 = c('GREEN','BLUE','ORANGE'),
                  COLOR2 = c('RED','',''),
                  COLOR3 = c('YELLOW','',''),
                  CAR1 = c('','FORD','FERRARI'),
                  CAR2 = c('BMW','',''),
                  CAR3 = c('TRUCK','',''),
                  COUNTRY = c('USA','USA','USA'))

However, note that the COUNTRY variable does not need to have numerous columns (COUNTRY1, COUNTRY2, COUNTRY3) as the results are repeated. In my original file, I will find numerous situations like this. How would I arrange the data evenly in df2?

CodePudding user response:

Perhaps we can try the following base R code using reshape

u <- reshape(
  transform(
    unique(df1),
    GRP = ave(seq_along(ID), ID, FUN = seq_along)
  ),
  direction = "wide",
  idvar = "ID",
  timevar = "GRP"
)

u[order(match(gsub("\\.\\d ", "", names(u)), names(df1)))]

which gives

> u[order(match(gsub("\\.\\d ", "", names(u)), names(df1)))]
   ID NAME.1 NAME.2 NAME.3 ADRESS.1 ADRESS.2 ADRESS.3 COLOR.1 COLOR.2 COLOR.3
1 111   JOHN   MARY  JAMES       NY       NY     ROMA   GREEN     RED  YELLOW
7 222   WILL   WILL   <NA>   LONDON    TOKYO     <NA>    BLUE    BLUE    <NA>
9 333   MARK   <NA>   <NA>              <NA>     <NA>  ORANGE    <NA>    <NA>
    CAR.1 CAR.2 CAR.3 COUNTRY.1 COUNTRY.2 COUNTRY.3
1           BMW TRUCK       USA       USA       USA
7    FORD  FORD  <NA>       USA       USA      <NA>
9 FERRARI  <NA>  <NA>       USA      <NA>      <NA>

CodePudding user response:

An option is also with pivot_wider

library(dplyr)
library(tidyr)
library(data.table)
distinct(df1) %>% 
  mutate(rn = rowid(ID)) %>%
  pivot_wider(names_from = rn, values_from = NAME:CAR, 
    names_sep = "", values_fill = "") %>%
  select(-COUNTRY, COUNTRY)

-output

# A tibble: 3 × 14
     ID NAME1 NAME2  NAME3   ADRESS1  ADRESS2 ADRESS3 COLOR1 COLOR2 COLOR3   CAR1      CAR2   CAR3    COUNTRY
  <dbl> <chr> <chr>  <chr>   <chr>    <chr>   <chr>   <chr>  <chr>  <chr>    <chr>     <chr>  <chr>   <chr>  
1   111 JOHN  "MARY" "JAMES" "NY"     "NY"    "ROMA"  GREEN  "RED"  "YELLOW" ""        "BMW"  "TRUCK" USA    
2   222 WILL  "WILL" ""      "LONDON" "TOKYO" ""      BLUE   "BLUE" ""       "FORD"    "FORD" ""      USA    
3   333 MARK  ""     ""      ""       ""      ""      ORANGE ""     ""       "FERRARI" ""     ""      USA    
  • Related