Home > Enterprise >  Clean content in R to desired tibble
Clean content in R to desired tibble

Time:07-30

I have downloaded a CSV file that looks rather messy named Henry and shows the following content:

Name;Number;Year;Drink;Country
Henry;252;2020;Fanta;France

Name of persen;Part of a larger test;Food drink type; Value of gear type, Phonetype;Country of fabrication

From period;Period from time;Country;Latitud;Longitud
2020-02-01 00:00:00;France;142.63;45.5232;19.5653

Date;Time;Food;M/F
2020-08-15;09:00:00;Hotdog;M
2020-08-15;10:00:00;Hamburger;M
2020-08-15;11:00:00;Hamburger;F
2020-08-15;12:00:00;Hotdog;M
2020-08-15;13:00:00;Hamburger;F
2020-08-15;14:00:00;Hotdog;M
2020-08-15;15:00:00;Hamburger;F
#With 250 more rows

Removed to make it easier to help: Down below is a dataset ## 4

I want to clean it and extract the following tibble. So taking the firs two columns Name/Number, then remove the rest of the content until the data and then remove M/F column. Like this:

Name Number  Date                   Time           Food
Henry   252      2020-08-15     09:00:00     Hotdog
Henry   252      2020-08-15     10:00:00     Hamburger
Henry   252      2020-08-15     11:00:00     Hamburger
Henry   252      2020-08-15     12:00:00     Hotdog
Henry   252      2020-08-15     13:00:00     Hamburger
#With 252 more rows

I suppose it has to do with the delim function?

CodePudding user response:

Here is a version how we could form a data frame after reading in with read_delim. This is hard coded for this example only. The problem is that we do not have the full dataset:

library(tidyverse)

test <- read_delim("test.csv", delim = ";", 
                   escape_double = FALSE, trim_ws = TRUE)

test %>% 
  slice(7:n()) %>% 
  mutate(Date = Name, 
         Time = Number,
         Food = Year,
         Name = NA_character_, 
         Number= NA_character_) %>% 
  select(-c(Year, Drink, Country)) %>% 
  {. ->>intermediateResult} %>%  # this saves intermediate 
  bind_rows(test) %>% 
  select(Name, Number, Date, Time, Food) %>% 
  filter(row_number()==9) %>% 
  bind_rows(intermediateResult) %>% 
  fill(c(Name, Number), .direction = "down") %>% 
  na.omit()
  Name  Number Date       Time     Food     
  <chr> <chr>  <chr>      <chr>    <chr>    
1 Henry 252    2020-08-15 09:00:00 Hotdog   
2 Henry 252    2020-08-15 10:00:00 Hamburger
3 Henry 252    2020-08-15 11:00:00 Hamburger
4 Henry 252    2020-08-15 12:00:00 Hotdog   
5 Henry 252    2020-08-15 13:00:00 Hamburger
6 Henry 252    2020-08-15 14:00:00 Hotdog   
7 Henry 252    2020-08-15 15:00:00 Hamburger
  •  Tags:  
  • r
  • Related