Beginner here: I have a dataframe that looks like this:
Name Col_1 Col_2 Col_3 Col_4
abc Website https://LINK1 Twitter https://LINK4
def Email https://LINK2 Facebook https://LINK5
ghj Document https://LINK3 Website https://LINK6
I want to create columns for the values in Col_1 and Col_3 (aka Website, Email, Document, Twitter and Facebook) move the LINKS in Col_2 and Col_4 into the respective columns. The output should look like this:
Names Website Email Document Twitter Facebook
abc https://LINK1 NA NA https://LINK4 NA
def NA https://LINK2 NA NA https://LINK5
ghj https://LINK6 NA https://LINK3 NA NA
How can I achieve this? Thank you!
CodePudding user response:
this output format is sooo way against all tidy data-conventions, that it took me a lot of wrangling to get it right..
Not the most pretty, but it gets the job done
library(data.table)
mydata <- read.table(text = "Col_1 Col_2 Col_3 Col_4
Website https://LINK1 Twitter https://LINK4
Email https://LINK2 Facebook https://LINK5
Document https://LINK3 Website https://LINK6", header = TRUE)
#split in chunks of 2 columns
L <- split.default(mydata, f = 2:(ncol(mydata) 1) %/% 2)
# bind rows together
DT <- data.table::rbindlist(L, use.names = FALSE)
# split to list by Col_1
L2 <- split(DT, by = "Col_1", keep.by = FALSE)
# make vectors
L3 <- unlist(lapply(L2, c), recursive = FALSE)
# make lengts equal
L4 <- lapply(L3, `length<-`, max(lengths(L3)))
# colbind
ans <- Reduce(cbind, L4)
# set colnames
colnames(ans) <- names(L2)
# make data.frame
as.data.frame(ans)
# Website Email Document Twitter Facebook
# 1 https://LINK1 https://LINK2 https://LINK3 https://LINK4 https://LINK5
# 2 https://LINK6 <NA> <NA> <NA> <NA>
CodePudding user response:
Here is an option using two times pivot_wider
and coalesce
once the duplicated website columns like this:
library(dplyr)
library(tidyr)
df %>%
pivot_wider(names_from = Col_1, values_from = Col_2) %>%
pivot_wider(names_from = Col_3, values_from = Col_4, names_repair = "unique") %>%
mutate(Website = coalesce(`Website...2`, `Website...7`)) %>%
select(-c(`Website...2`, `Website...7`))
#> New names:
#> • `Website` -> `Website...2`
#> • `Website` -> `Website...7`
#> # A tibble: 3 × 6
#> Name Email Document Twitter Facebook Website
#> <chr> <chr> <chr> <chr> <chr> <chr>
#> 1 abc <NA> <NA> https://LINK4 <NA> https://LINK1
#> 2 def https://LINK2 <NA> <NA> https://LINK5 <NA>
#> 3 ghj <NA> https://LINK3 <NA> <NA> https://LINK6
Created on 2023-01-02 with reprex v2.0.2
CodePudding user response:
Firstly, you need to bind the columns on top each other to make it look like a long format data;
df_new <- data.frame(Names=df[,1],data.table::rbindlist(list(df[,2:3],df[,4:5])))
# Names Col_1 Col_2
# 1 abc Website https://LINK1
# 2 def Email https://LINK2
# 3 fgh Document https://LINK3
# 4 abc Twitter https://LINK4
# 5 def Facebook https://LINK5
# 6 fgh Website https://LINK6
After that, you can use any long to wide format option to end up with your output. In here, I chose dcast
from reshape2
library.
reshape2::dcast(df_new, Names ~ Col_1)
gives,
# Names Document Email Website Facebook Twitter
#1 abc <NA> <NA> https://LINK1 <NA> https://LINK4
#2 def <NA> https://LINK2 <NA> https://LINK5 <NA>
#3 fgh https://LINK3 <NA> https://LINK6 <NA> <NA>
CodePudding user response:
Here is another way with group_split
and map_dfr
in tidyverse. First, split on Name
column, and then create tibbles for each, specifying the sources and links. Then, using pivot_wider
put into final format.
library(tidyverse)
df |>
group_split(Name) |>
map_dfr(
~tibble(
Name = .x$Name,
Source = c(.x$Col_1, .x$Col_3),
Link = c(.x$Col_2, .x$Col_4)
)
) |>
pivot_wider(id_cols = Name, names_from = Source, values_from = Link)
Output
Name Website Twitter Email Facebook Document
<chr> <chr> <chr> <chr> <chr> <chr>
1 abc https://LINK1 https://LINK4 NA NA NA
2 def NA NA https://LINK2 https://LINK5 NA
3 ghj https://LINK6 NA NA NA https://LINK3