I have downloaded a CSV file that looks rather messy named Henry
and shows the following content:
Name;Number;Year;Drink;Country
Henry;252;2020;Fanta;France
Name of persen;Part of a larger test;Food drink type; Value of gear type, Phonetype;Country of fabrication
From period;Period from time;Country;Latitud;Longitud
2020-02-01 00:00:00;France;142.63;45.5232;19.5653
Date;Time;Food;M/F
2020-08-15;09:00:00;Hotdog;M
2020-08-15;10:00:00;Hamburger;M
2020-08-15;11:00:00;Hamburger;F
2020-08-15;12:00:00;Hotdog;M
2020-08-15;13:00:00;Hamburger;F
2020-08-15;14:00:00;Hotdog;M
2020-08-15;15:00:00;Hamburger;F
#With 250 more rows
Removed to make it easier to help:
Down below is a dataset ## 4
I want to clean it and extract the following tibble. So taking the firs two columns Name/Number, then remove the rest of the content until the data and then remove M/F column. Like this:
Name Number Date Time Food
Henry 252 2020-08-15 09:00:00 Hotdog
Henry 252 2020-08-15 10:00:00 Hamburger
Henry 252 2020-08-15 11:00:00 Hamburger
Henry 252 2020-08-15 12:00:00 Hotdog
Henry 252 2020-08-15 13:00:00 Hamburger
#With 252 more rows
I suppose it has to do with the delim
function?
CodePudding user response:
Here is a version how we could form a data frame after reading in with read_delim
. This is hard coded for this example only. The problem is that we do not have the full dataset:
library(tidyverse)
test <- read_delim("test.csv", delim = ";",
escape_double = FALSE, trim_ws = TRUE)
test %>%
slice(7:n()) %>%
mutate(Date = Name,
Time = Number,
Food = Year,
Name = NA_character_,
Number= NA_character_) %>%
select(-c(Year, Drink, Country)) %>%
{. ->>intermediateResult} %>% # this saves intermediate
bind_rows(test) %>%
select(Name, Number, Date, Time, Food) %>%
filter(row_number()==9) %>%
bind_rows(intermediateResult) %>%
fill(c(Name, Number), .direction = "down") %>%
na.omit()
Name Number Date Time Food
<chr> <chr> <chr> <chr> <chr>
1 Henry 252 2020-08-15 09:00:00 Hotdog
2 Henry 252 2020-08-15 10:00:00 Hamburger
3 Henry 252 2020-08-15 11:00:00 Hamburger
4 Henry 252 2020-08-15 12:00:00 Hotdog
5 Henry 252 2020-08-15 13:00:00 Hamburger
6 Henry 252 2020-08-15 14:00:00 Hotdog
7 Henry 252 2020-08-15 15:00:00 Hamburger