Conditional reshaping data from wide to long format in R-CodePudding

I have a dataset that contains information about patients and tumor diagnoses (similar to Table 1). Each row contains several attributes about the patient and up to seven tumor diagnoses (including morphology and topography for every tumor).

I would like to reshape the dataset so that each row contains one single tumor diagnosis. The respective patient data should be transferred along with the tumor data (Table 2).

I have already tried some codes, but since the tumor count is between one and seven [Tumor 1-7 Morpho/ Topo], I am looking for a code (R studio) that "recognizes" and rehsapes only filled data fields.

Table 1:

Animal	Breed	Sex	Tumor 1 Morpho	Tumor 1 Topo	Tumor 2 Morpho	Tumor 2 Topo
Animal A	Breed 1	M	Tumor a	Topo I	Tumor aa	Topo Ia
Animal B	Breed 2	F	Tumor b	Topo II
Animal C	Breed 3	M	Tumor c	Topo III	Tumor cc	Topo IIIc

Table 2:

Animal	Breed	Sex	Tumor Morpho	Tumor Topo
Animal A	Breed 1	M	Tumor a	Topo I
Animal A	Breed 1	M	Tumor aa	Topo Ia
Animal B	Breed 2	F	Tumor b	Topo II
Animal C	Breed 3	M	Tumor c	Topo III
Animal C	Breed 3	M	Tumor cc	Topo IIIc

Any help on this is highly appreciated.

CodePudding user response：

Try option A.

I've added an option B if it's necessary to retain the variable names in the data too.

library(tidyverse)

tribble(
  ~Animal, ~Breed, ~Sex, ~"Tumor 1 Morpho", ~"Tumor 1 Topo", ~"Tumor 2 Morpho", ~"Tumor 2 Topo",
  "Animal A", "Breed 1", "M", "Tumor a", "Topo I", "Tumor aa", "Topo Ia",
  "Animal B", "Breed 2", "F", "Tumor b", "Topo II", NA, NA,
  "Animal C", "Breed 3", "M", "Tumor c", "Topo III", "Tumor cc", "Topo IIIc"
) |> 
  pivot_longer(-c(Animal:Sex), names_pattern = ".*(\\d).*", values_drop_na = TRUE) |> 
  separate(value, into = c("prefix", "suffix")) |> 
  pivot_wider(names_from = prefix, values_from = suffix) |> 
  select(-name)
#> # A tibble: 5 × 5
#>   Animal   Breed   Sex   Tumor Topo 
#>   <chr>    <chr>   <chr> <chr> <chr>
#> 1 Animal A Breed 1 M     a     I    
#> 2 Animal A Breed 1 M     aa    Ia   
#> 3 Animal B Breed 2 F     b     II   
#> 4 Animal C Breed 3 M     c     III  
#> 5 Animal C Breed 3 M     cc    IIIc

^{Created on 2022-05-16 by the reprex package (v2.0.1)}

library(tidyverse)

tribble(
  ~Animal, ~Breed, ~Sex, ~"Tumor 1 Morpho", ~"Tumor 1 Topo", ~"Tumor 2 Morpho", ~"Tumor 2 Topo",
  "Animal A", "Breed 1", "M", "Tumor a", "Topo I", "Tumor aa", "Topo Ia",
  "Animal B", "Breed 2", "F", "Tumor b", "Topo II", NA, NA,
  "Animal C", "Breed 3", "M", "Tumor c", "Topo III", "Tumor cc", "Topo IIIc"
) |> 
  pivot_longer(-c(Animal:Sex), names_pattern = ".*(\\d).*", values_drop_na = TRUE) |> 
  separate(value, into = c("prefix", "suffix")) |> 
  mutate(suffix = str_c(prefix, " ", suffix)) |> 
  pivot_wider(names_from = prefix, values_from = suffix) |> 
  select(-name)
#> # A tibble: 5 × 5
#>   Animal   Breed   Sex   Tumor    Topo     
#>   <chr>    <chr>   <chr> <chr>    <chr>    
#> 1 Animal A Breed 1 M     Tumor a  Topo I   
#> 2 Animal A Breed 1 M     Tumor aa Topo Ia  
#> 3 Animal B Breed 2 F     Tumor b  Topo II  
#> 4 Animal C Breed 3 M     Tumor c  Topo III 
#> 5 Animal C Breed 3 M     Tumor cc Topo IIIc

^{Created on 2022-05-16 by the reprex package (v2.0.1)}

CodePudding user response：

First, split the table into 2 separate dataframes.

One with Animal, Breed, Sex and columns ending with Morpho and other with Topo

library(dplyr)

df1 <- df %>% select(Animal, Breed, Sex, ends_with("Morpho"))
df2 <-  df %>% select(Animal, Breed, Sex, ends_with("Topo"))

Then, convert them from wide to long using tidyr::pivot_longer() individually and then merge them

Sorry, away from laptop, typing this from phone.