Home > Software engineering >  Reorganise Data in r
Reorganise Data in r

Time:11-24

Hi i do have a Table for example:

House,Name1,Email1@xyz.com
Flat,Name2;Name3,Email2@xyz.com;Email3@xyz.com
Mobile Home,Name4,Email4@xyz.com
Camper-Van,Name5;Name6;Name7;Name8,Email5@xyz.com;Email6@xyz.com;Email7@xyz.com;Email8@xyz.com

and i do need:

House,Name1,Email1@xyz.com
Flat,Name2,Email2@xyz.com
Flat,Name3,Email3@xyz.com
Mobile Home,Name4,Email4@xyz.com
Camper-Van,Name5,Emil5@xyz.com
Camper-Van,Name6,Email6@xyz.com
Camper-Van,Name7,Email7@xyz.com
Camper-Van,Name8,Email8@xyz.com

the problem is, the ammount of names and emails for one kind of housing is unknown. i did generate three list:

Housing:      
House
Flat
Campervan 

Names:
Name1
Name2
Name3
Name4
Name5
Name6
Name7
Name8

Email:
Email1@xyz.com
Email2@xyz.com
...
Email8@xyz.com

But i am stuck how to repeat House and Flat and Campervan as much as there are names or emails (both always excact the same ammount) for each category in Column 1. This would make all List match each other in length. If i was abled to this i could just generate the Information i do need. any Help is appreciated.

ATTENTION: names and Email adress are not the same so for example Name1 is hans his email might be [email protected] by numbering names and emails i did try to show that emails and names are kind of sorted and can not be enlistetd randomly

CodePudding user response:

library(tidyverse)

example_text <-"House,Name1,Email@1
Flat,Name2;Name3,Email@2;Email@3
Mobile Home,Name4,Email@4
Camper-Van,Name5;Name6;Name7;Name8,Email@5;Email@6;Email@7;Email@8
"
example_text %>%
  read_lines() %>%
  map(~ {
    # the first words until a delimiter
    house <- .x %>% str_extract("^[^;,] ")
    elements <- .x %>% str_remove(house) %>% str_split("[,;]") %>% simplify() %>% discard(~ .x == "")
    # Everything with an @ symbol betwwen two demiliters (, or ;)
    Emails <- elements %>% keep(~ .x %>% str_detect("@"))
    # Everything which is not one of the above
    Names <- elements %>% setdiff(Emails)
    
    tibble(
      House = house,
      Emails = Emails,
      Names = Names
    )
  }) %>%
  reduce(bind_rows)
#> # A tibble: 8 x 3
#>   House       Emails  Names
#>   <chr>       <chr>   <chr>
#> 1 House       Email@1 Name1
#> 2 Flat        Email@2 Name2
#> 3 Flat        Email@3 Name3
#> 4 Mobile Home Email@4 Name4
#> 5 Camper-Van  Email@5 Name5
#> 6 Camper-Van  Email@6 Name6
#> 7 Camper-Van  Email@7 Name7
#> 8 Camper-Van  Email@8 Name8

Created on 2021-11-24 by the reprex package (v2.0.1)

CodePudding user response:

With the data in a data.table (convert using setDT()), using data.table joins and the data.table tstrsplit() function:

library(data.table)
# Data for the demo (please provide this yourself in future questions)
dt1 <-
  data.table(type = c("House", "Flat", "Mobile", "Camper-van"),
             name = c("Name1", "Name2;Name3", "Name4", "Name5;Name6;Name7;Name8"),
             mail = c("Email1", "Email2;Email3", "Email4", "Email5;Email6;Email7;Email8"))

# solution
dt1[, c("type" = list(type), tstrsplit(name, ";"))][, melt(.SD, id.vars="type")][!is.na(value), .(.I, type, "name" = value)][
  dt1[, c("type" = list(type), tstrsplit(mail, ";"))][, melt(.SD, id.vars="type")][!is.na(value), .(.I, "mail" = value)], on="I"][, -c("I")]
  • Related