I have a dataset that contains information about patients and tumor diagnoses (similar to Table 1). Each row contains several attributes about the patient and up to seven tumor diagnoses (including morphology and topography for every tumor).
I would like to reshape the dataset so that each row contains one single tumor diagnosis. The respective patient data should be transferred along with the tumor data (Table 2).
I have already tried some codes, but since the tumor count is between one and seven [Tumor 1-7 Morpho/ Topo], I am looking for a code (R studio) that "recognizes" and rehsapes only filled data fields.
Table 1:
Animal | Breed | Sex | Tumor 1 Morpho | Tumor 1 Topo | Tumor 2 Morpho | Tumor 2 Topo |
---|---|---|---|---|---|---|
Animal A | Breed 1 | M | Tumor a | Topo I | Tumor aa | Topo Ia |
Animal B | Breed 2 | F | Tumor b | Topo II | ||
Animal C | Breed 3 | M | Tumor c | Topo III | Tumor cc | Topo IIIc |
Table 2:
Animal | Breed | Sex | Tumor Morpho | Tumor Topo |
---|---|---|---|---|
Animal A | Breed 1 | M | Tumor a | Topo I |
Animal A | Breed 1 | M | Tumor aa | Topo Ia |
Animal B | Breed 2 | F | Tumor b | Topo II |
Animal C | Breed 3 | M | Tumor c | Topo III |
Animal C | Breed 3 | M | Tumor cc | Topo IIIc |
Any help on this is highly appreciated.
CodePudding user response:
Try option A.
I've added an option B if it's necessary to retain the variable names in the data too.
library(tidyverse)
tribble(
~Animal, ~Breed, ~Sex, ~"Tumor 1 Morpho", ~"Tumor 1 Topo", ~"Tumor 2 Morpho", ~"Tumor 2 Topo",
"Animal A", "Breed 1", "M", "Tumor a", "Topo I", "Tumor aa", "Topo Ia",
"Animal B", "Breed 2", "F", "Tumor b", "Topo II", NA, NA,
"Animal C", "Breed 3", "M", "Tumor c", "Topo III", "Tumor cc", "Topo IIIc"
) |>
pivot_longer(-c(Animal:Sex), names_pattern = ".*(\\d).*", values_drop_na = TRUE) |>
separate(value, into = c("prefix", "suffix")) |>
pivot_wider(names_from = prefix, values_from = suffix) |>
select(-name)
#> # A tibble: 5 × 5
#> Animal Breed Sex Tumor Topo
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Animal A Breed 1 M a I
#> 2 Animal A Breed 1 M aa Ia
#> 3 Animal B Breed 2 F b II
#> 4 Animal C Breed 3 M c III
#> 5 Animal C Breed 3 M cc IIIc
Created on 2022-05-16 by the reprex package (v2.0.1)
library(tidyverse)
tribble(
~Animal, ~Breed, ~Sex, ~"Tumor 1 Morpho", ~"Tumor 1 Topo", ~"Tumor 2 Morpho", ~"Tumor 2 Topo",
"Animal A", "Breed 1", "M", "Tumor a", "Topo I", "Tumor aa", "Topo Ia",
"Animal B", "Breed 2", "F", "Tumor b", "Topo II", NA, NA,
"Animal C", "Breed 3", "M", "Tumor c", "Topo III", "Tumor cc", "Topo IIIc"
) |>
pivot_longer(-c(Animal:Sex), names_pattern = ".*(\\d).*", values_drop_na = TRUE) |>
separate(value, into = c("prefix", "suffix")) |>
mutate(suffix = str_c(prefix, " ", suffix)) |>
pivot_wider(names_from = prefix, values_from = suffix) |>
select(-name)
#> # A tibble: 5 × 5
#> Animal Breed Sex Tumor Topo
#> <chr> <chr> <chr> <chr> <chr>
#> 1 Animal A Breed 1 M Tumor a Topo I
#> 2 Animal A Breed 1 M Tumor aa Topo Ia
#> 3 Animal B Breed 2 F Tumor b Topo II
#> 4 Animal C Breed 3 M Tumor c Topo III
#> 5 Animal C Breed 3 M Tumor cc Topo IIIc
Created on 2022-05-16 by the reprex package (v2.0.1)
CodePudding user response:
First, split the table into 2 separate dataframes.
One with Animal, Breed, Sex and columns ending with Morpho and other with Topo
library(dplyr)
df1 <- df %>% select(Animal, Breed, Sex, ends_with("Morpho"))
df2 <- df %>% select(Animal, Breed, Sex, ends_with("Topo"))
Then, convert them from wide to long using tidyr::pivot_longer()
individually and then merge them
Sorry, away from laptop, typing this from phone.