Home > OS >  R - Moving misplaced values in correct column
R - Moving misplaced values in correct column

Time:07-13

Beginner here, I have a large dataframe with multiple columns in which some values are misplaced but at least have the right column name in front of the value. Imagine a dataframe like this:

Country <- c("Spain", "Time:16 Mar 2018 - 23 Apr 2018", "USA")
Platform <- c("Twitter", "Country:Germany", "Cap:200")
Start_Time <- c("10 Jun 2018 - 2 Jul 2018", "Platform:Facebook", "Platform:Instagram")
Cap <- c("300", "500", "Time:10 Jun 2018 - 2 Jul 2018")

dat <- data.frame(Country, Platform, Start_Time, Cap) 
Output:
Country                          Platform          Start_Time                       Cap

Spain                            Twitter           10 Jun 2018 - 2 Jul 2018         300
Time:16 Mar 2018 - 23 Apr 2018   Country:Germany   Platform:Facebook                500
USA                              Cap:200           Platform:Instagram               Time:10 Jun 2018 - 2 Jul 2018

As you can see, if the values are misplaced, the correct column name (or at least an indication like with Start_Time and Time:) is set in front of the value.

How can I switch the values into their respective columns? My original dataframe has 729 rows and 45 columns, so the less manual work, the better. The correct output should look like this:

Output:
Country    Platform          Start_Time                       Cap

Spain      Twitter           10 Jun 2018 - 2 Jul 2018         300
Germany    Facebook          16 Mar 2018 - 23 Apr 2018        500
USA        Instagram         10 Jun 2018 - 2 Jul 2018         200

Thank you very much.

CodePudding user response:

library(tidyverse)

# Supplied sample data
Country <- c("Spain", "Time:16 Mar 2018 - 23 Apr 2018", "USA")
Platform <- c("Twitter", "Country:Germany", "Cap:200")
Start_Time <- c("10 Jun 2018 - 2 Jul 2018", "Platform:Facebook", "Platform:Instagram")
Cap <- c("300", "500", "Time:10 Jun 2018 - 2 Jul 2018")

dat <- data.frame(Country, Platform, Start_Time, Cap) 

# Code
dat2 <- dat |> 
  pivot_longer(everything()) |> 
  separate(value, into = c("name2", "value"), sep = ":") |> 
  mutate(name = if_else(!is.na(value), name2, name),
         value = if_else(is.na(value), name2, value),
         name = if_else(str_detect(name, "Time"), "Start_Time", name)) |> 
  select(-name2) |> 
  mutate(group = rep(1:nrow(dat), each = 4)) |> 
  pivot_wider(names_from = "name", values_from = "value")

dat2
#> # A tibble: 3 × 5
#>   group Country Platform  Start_Time                Cap  
#>   <int> <chr>   <chr>     <chr>                     <chr>
#> 1     1 Spain   Twitter   10 Jun 2018 - 2 Jul 2018  300  
#> 2     2 Germany Facebook  16 Mar 2018 - 23 Apr 2018 500  
#> 3     3 USA     Instagram 10 Jun 2018 - 2 Jul 2018  200

Created on 2022-07-13 by the reprex package (v2.0.1)

CodePudding user response:

Ok, I do not know if you are familiarized with tidyverse and dplyr functions, so I used classic functions in my solution:

# Input
Country <- c("Spain", "Time:16 Mar 2018 - 23 Apr 2018", "USA")
Platform <- c("Twitter", "Country:Germany", "Cap:200")
Start_Time <- c("10 Jun 2018 - 2 Jul 2018", "Platform:Facebook", "Platform:Instagram")
Cap <- c("300", "500", "Time:10 Jun 2018 - 2 Jul 2018")

dat <- data.frame(Country, Platform, Start_Time, Cap) 

# Define a vector with (columns') categories
groupNames <- c("country", "platform", "time", "cap")

# Define a function for correcting missplacing
correcting <- function(x, groupNames){
  
  # Coerce input vector as character
  x <- as.character(x)
  
  # Find the missplaced positions
  index <- do.call(c, sapply(groupNames, grep, x = x, ignore.case = TRUE))
  
  # If there is any missplaced value...
  if(length(index) > 0){
    # Correct misplacing and remove text used as clue
    x[match(names(index), groupNames)] <- gsub(x = x[index], 
                                               pattern = "^[[:alpha:]]{1,}:", 
                                               replacement = "")
  }
  
  return(x)
}

# Apply function by row, transposing and coerce output as data frame
out <- as.data.frame(t(apply(dat, 1, correcting, groupNames = groupNames)))

# Replace names of columns
colnames(out) <- colnames(dat)

# Coerce cap as numeric
out$Cap <- as.numeric(out$Cap)
  • Related