Below is the first dataframe where I want to remove the first 3 rows:
book1 <- structure(list(Instructions..xyz = c("Note: abc", "", "Set1",
"id", "632592651", "633322173", "634703802", "634927873", "635812953",
"636004739", "636101211", "636157799", "636263106", "636752420"
), X = c("", "", "", "title", "asdf", "cat", "dog", "mouse",
"elephant", "goose", "rat", "mice", "kitty", "kitten"), X.1 = c("",
"", "", "hazard", "y", "y", "y", "n", "n", "y", "y", "n", "n",
"y"), X.2 = c("", "", "Set2", "id", "632592651", "633322173",
"634703802", "634927873", "635812953", "636004739", "636101211",
"636157799", "636263106", "636752420"), X.3 = c("", "", "", "title",
"asdf2", "cat2", "dog2", "mouse2", "elephant2", "goose2", "rat2",
"mice2", "kitty2", "kitten2"), X.4 = c("", "", "", "index", "0.664883807",
"0.20089779", "0.752228086", "0.124729276", "0.626285086", "0.134537909",
"0.612526768", "0.769622463", "0.682532524", "0.819015658")), class = "data.frame", row.names = c(NA,
-14L))
I did book1 <- book1[-c(1:3),]
but I'm not sure how to make id, title, hazard, id, title, index as the column name instead of Instructions..xyz, etc. See image below for desired output
Then for the second dataframe,
book2 <- structure(list(identity = c(632592651L, 633322173L, 634703802L,
634927873L, 635812953L, 636004739L, 636101211L, 636157799L, 636263106L,
636752420L, 636809222L, 2004722036L, 2004894388L, 2005045755L,
2005535472L, 2005630542L, 2005788781L, 2005809679L, 2005838317L,
2005866692L), text = c("asdf_xyz", "cat", "dog", "mouse", "elephant",
"goose", "rat", "mice", "kitty", "kitten", "tiger_xyz", "lion",
"leopard", "ostrich", "kangaroo", "platypus", "fish", "reptile",
"mammals", "amphibians_xyz"), volume = c(1234L, 432L, 324L, 333L,
2223L, 412346L, 7456L, 3456L, 2345L, 2345L, 6L, 345L, 23L, 2L,
4778L, 234L, 8675L, 3459L, 8L, 9L)), class = "data.frame", row.names = c(NA,
-20L))
I then rename column 1 and 2 in book2 so that it matches that of book1 by names(book2)[1:2] <- c('id','title')
where I can later do inner_join. The desired output is shown in the image below by
library(dplyr)
book1 %>%
inner_join(book2, by = c("id", "title"))
This is taking quite a few steps and wondering if there's a simplified version to this?
CodePudding user response:
Something like this?
# split the data by columns
book2a <- book1[-(1:4), 1:3]
book2b <- book1[-(1:4), 4:6]
# take care of names
names(book2a) <- book1[4, 1:3, drop = TRUE]
names(book2b) <- book1[4, 4:6, drop = TRUE]
# book2b needs processing
book2b$title <- sub("2", "", book2b$title)
book2b$index <- as.numeric(book2b$index)
# join both data sets and clean-up
book2 <- merge(book2a, book2b, all = TRUE)
rm(book2a, book2b)
book2
#> id title hazard index
#> 1 632592651 asdf y 0.6648838
#> 2 633322173 cat y 0.2008978
#> 3 634703802 dog y 0.7522281
#> 4 634927873 mouse n 0.1247293
#> 5 635812953 elephant n 0.6262851
#> 6 636004739 goose y 0.1345379
#> 7 636101211 rat y 0.6125268
#> 8 636157799 mice n 0.7696225
#> 9 636263106 kitty n 0.6825325
#> 10 636752420 kitten y 0.8190157
Created on 2022-06-25 by the reprex package (v2.0.1)
CodePudding user response:
Found the solution to the first question
library(janitor)
book1 <- row_to_names(dat=book1, row_number=4, remove_row = TRUE, remove_rows_above = TRUE)
I applied
names(book1)[4:5] <- c('id1','title1')
to obtain unique column name, then tried inner_join as proposed earlier but with error and found that book1$id is character where book2$id is int and so I did
book1$id <- as.integer(book1$id)
and finally it works with
Yeah <- book1 %>%
inner_join(book2, by = c("id", "title"))
Output below:
id title hazard id1 title1 index volume
1 633322173 cat y 633322173 cat2 0.20089779 432
2 634703802 dog y 634703802 dog2 0.752228086 324
3 634927873 mouse n 634927873 mouse2 0.124729276 333
4 635812953 elephant n 635812953 elephant2 0.626285086 2223
5 636004739 goose y 636004739 goose2 0.134537909 412346
6 636101211 rat y 636101211 rat2 0.612526768 7456
7 636157799 mice n 636157799 mice2 0.769622463 3456
8 636263106 kitty n 636263106 kitty2 0.682532524 2345
9 636752420 kitten y 636752420 kitten2 0.819015658 2345
Still wondering if there's quicker way?