Beginner here, I have a large dataframe with multiple columns in which some values are misplaced but at least have the right column name in front of the value. Imagine a dataframe like this:
Country <- c("Spain", "Time:16 Mar 2018 - 23 Apr 2018", "USA")
Platform <- c("Twitter", "Country:Germany", "Cap:200")
Start_Time <- c("10 Jun 2018 - 2 Jul 2018", "Platform:Facebook", "Platform:Instagram")
Cap <- c("300", "500", "Time:10 Jun 2018 - 2 Jul 2018")
dat <- data.frame(Country, Platform, Start_Time, Cap)
Output:
Country Platform Start_Time Cap
Spain Twitter 10 Jun 2018 - 2 Jul 2018 300
Time:16 Mar 2018 - 23 Apr 2018 Country:Germany Platform:Facebook 500
USA Cap:200 Platform:Instagram Time:10 Jun 2018 - 2 Jul 2018
As you can see, if the values are misplaced, the correct column name (or at least an indication like with Start_Time and Time:) is set in front of the value.
How can I switch the values into their respective columns? My original dataframe has 729 rows and 45 columns, so the less manual work, the better. The correct output should look like this:
Output:
Country Platform Start_Time Cap
Spain Twitter 10 Jun 2018 - 2 Jul 2018 300
Germany Facebook 16 Mar 2018 - 23 Apr 2018 500
USA Instagram 10 Jun 2018 - 2 Jul 2018 200
Thank you very much.
CodePudding user response:
library(tidyverse)
# Supplied sample data
Country <- c("Spain", "Time:16 Mar 2018 - 23 Apr 2018", "USA")
Platform <- c("Twitter", "Country:Germany", "Cap:200")
Start_Time <- c("10 Jun 2018 - 2 Jul 2018", "Platform:Facebook", "Platform:Instagram")
Cap <- c("300", "500", "Time:10 Jun 2018 - 2 Jul 2018")
dat <- data.frame(Country, Platform, Start_Time, Cap)
# Code
dat2 <- dat |>
pivot_longer(everything()) |>
separate(value, into = c("name2", "value"), sep = ":") |>
mutate(name = if_else(!is.na(value), name2, name),
value = if_else(is.na(value), name2, value),
name = if_else(str_detect(name, "Time"), "Start_Time", name)) |>
select(-name2) |>
mutate(group = rep(1:nrow(dat), each = 4)) |>
pivot_wider(names_from = "name", values_from = "value")
dat2
#> # A tibble: 3 × 5
#> group Country Platform Start_Time Cap
#> <int> <chr> <chr> <chr> <chr>
#> 1 1 Spain Twitter 10 Jun 2018 - 2 Jul 2018 300
#> 2 2 Germany Facebook 16 Mar 2018 - 23 Apr 2018 500
#> 3 3 USA Instagram 10 Jun 2018 - 2 Jul 2018 200
Created on 2022-07-13 by the reprex package (v2.0.1)
CodePudding user response:
Ok, I do not know if you are familiarized with tidyverse and dplyr functions, so I used classic functions in my solution:
# Input
Country <- c("Spain", "Time:16 Mar 2018 - 23 Apr 2018", "USA")
Platform <- c("Twitter", "Country:Germany", "Cap:200")
Start_Time <- c("10 Jun 2018 - 2 Jul 2018", "Platform:Facebook", "Platform:Instagram")
Cap <- c("300", "500", "Time:10 Jun 2018 - 2 Jul 2018")
dat <- data.frame(Country, Platform, Start_Time, Cap)
# Define a vector with (columns') categories
groupNames <- c("country", "platform", "time", "cap")
# Define a function for correcting missplacing
correcting <- function(x, groupNames){
# Coerce input vector as character
x <- as.character(x)
# Find the missplaced positions
index <- do.call(c, sapply(groupNames, grep, x = x, ignore.case = TRUE))
# If there is any missplaced value...
if(length(index) > 0){
# Correct misplacing and remove text used as clue
x[match(names(index), groupNames)] <- gsub(x = x[index],
pattern = "^[[:alpha:]]{1,}:",
replacement = "")
}
return(x)
}
# Apply function by row, transposing and coerce output as data frame
out <- as.data.frame(t(apply(dat, 1, correcting, groupNames = groupNames)))
# Replace names of columns
colnames(out) <- colnames(dat)
# Coerce cap as numeric
out$Cap <- as.numeric(out$Cap)