Home > Mobile >  Quicker way? Remove rows in book1, take row 4 values as column name, set some column name same as bo
Quicker way? Remove rows in book1, take row 4 values as column name, set some column name same as bo

Time:06-26

Below is the first dataframe where I want to remove the first 3 rows:

book1 <- structure(list(Instructions..xyz = c("Note: abc", "", "Set1", 
                                              "id", "632592651", "633322173", "634703802", "634927873", "635812953", 
                                              "636004739", "636101211", "636157799", "636263106", "636752420"
), X = c("", "", "", "title", "asdf", "cat", "dog", "mouse", 
         "elephant", "goose", "rat", "mice", "kitty", "kitten"), X.1 = c("", 
                                                                         "", "", "hazard", "y", "y", "y", "n", "n", "y", "y", "n", "n", 
                                                                         "y"), X.2 = c("", "", "Set2", "id", "632592651", "633322173", 
                                                                                       "634703802", "634927873", "635812953", "636004739", "636101211", 
                                                                                       "636157799", "636263106", "636752420"), X.3 = c("", "", "", "title", 
                                                                                                                                       "asdf2", "cat2", "dog2", "mouse2", "elephant2", "goose2", "rat2", 
                                                                                                                                       "mice2", "kitty2", "kitten2"), X.4 = c("", "", "", "index", "0.664883807", 
                                                                                                                                                                              "0.20089779", "0.752228086", "0.124729276", "0.626285086", "0.134537909", 
                                                                                                                                                                              "0.612526768", "0.769622463", "0.682532524", "0.819015658")), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                                                                                                                                -14L))

I did book1 <- book1[-c(1:3),] but I'm not sure how to make id, title, hazard, id, title, index as the column name instead of Instructions..xyz, etc. See image below for desired output

enter image description here

Then for the second dataframe,

book2 <- structure(list(identity = c(632592651L, 633322173L, 634703802L, 
                                     634927873L, 635812953L, 636004739L, 636101211L, 636157799L, 636263106L, 
                                     636752420L, 636809222L, 2004722036L, 2004894388L, 2005045755L, 
                                     2005535472L, 2005630542L, 2005788781L, 2005809679L, 2005838317L, 
                                     2005866692L), text = c("asdf_xyz", "cat", "dog", "mouse", "elephant", 
                                                            "goose", "rat", "mice", "kitty", "kitten", "tiger_xyz", "lion", 
                                                            "leopard", "ostrich", "kangaroo", "platypus", "fish", "reptile", 
                                                            "mammals", "amphibians_xyz"), volume = c(1234L, 432L, 324L, 333L, 
                                                                                                     2223L, 412346L, 7456L, 3456L, 2345L, 2345L, 6L, 345L, 23L, 2L, 
                                                                                                     4778L, 234L, 8675L, 3459L, 8L, 9L)), class = "data.frame", row.names = c(NA, 
                                                                                                                                                                              -20L))

I then rename column 1 and 2 in book2 so that it matches that of book1 by names(book2)[1:2] <- c('id','title') where I can later do inner_join. The desired output is shown in the image below by

library(dplyr)
book1 %>%
  inner_join(book2, by = c("id", "title")) 

enter image description here

This is taking quite a few steps and wondering if there's a simplified version to this?

CodePudding user response:

Something like this?

# split the data by columns
book2a <- book1[-(1:4), 1:3]
book2b <- book1[-(1:4), 4:6]

# take care of names
names(book2a) <- book1[4, 1:3, drop = TRUE]
names(book2b) <- book1[4, 4:6, drop = TRUE]

# book2b needs processing
book2b$title <- sub("2", "", book2b$title)
book2b$index <- as.numeric(book2b$index)

# join both data sets and clean-up
book2 <- merge(book2a, book2b, all = TRUE)
rm(book2a, book2b)

book2
#>           id    title hazard     index
#> 1  632592651     asdf      y 0.6648838
#> 2  633322173      cat      y 0.2008978
#> 3  634703802      dog      y 0.7522281
#> 4  634927873    mouse      n 0.1247293
#> 5  635812953 elephant      n 0.6262851
#> 6  636004739    goose      y 0.1345379
#> 7  636101211      rat      y 0.6125268
#> 8  636157799     mice      n 0.7696225
#> 9  636263106    kitty      n 0.6825325
#> 10 636752420   kitten      y 0.8190157

Created on 2022-06-25 by the reprex package (v2.0.1)

CodePudding user response:

Found the solution to the first question

library(janitor)
book1 <- row_to_names(dat=book1, row_number=4, remove_row = TRUE, remove_rows_above = TRUE)

I applied

names(book1)[4:5] <- c('id1','title1')

to obtain unique column name, then tried inner_join as proposed earlier but with error and found that book1$id is character where book2$id is int and so I did

book1$id <- as.integer(book1$id)

and finally it works with

Yeah <- book1 %>%
  inner_join(book2, by = c("id", "title")) 

Output below:

 id    title hazard       id1    title1       index volume
1 633322173      cat      y 633322173      cat2  0.20089779    432
2 634703802      dog      y 634703802      dog2 0.752228086    324
3 634927873    mouse      n 634927873    mouse2 0.124729276    333
4 635812953 elephant      n 635812953 elephant2 0.626285086   2223
5 636004739    goose      y 636004739    goose2 0.134537909 412346
6 636101211      rat      y 636101211      rat2 0.612526768   7456
7 636157799     mice      n 636157799     mice2 0.769622463   3456
8 636263106    kitty      n 636263106    kitty2 0.682532524   2345
9 636752420   kitten      y 636752420   kitten2 0.819015658   2345

Still wondering if there's quicker way?

  • Related