I have data that has information about each visit. Data is ordered in terms of each aspect, not each visit. I want to put together the columns including each visit information together. I provided sample data below.
I obtained what I desired somehow but, I want to see if there is a tidier way to get the same result. Here is what I tried.
library(tidyverse)
col_of_interest <- c("Total", "InstantDate", "VisitType")
col_orders <- paste0(col_of_interest, "_", rep(c(1:11), each = 3))
data %>%
select(all_of(col_orders))
Here is sample data. You can ignore the "NULL"
s and NA
s in the data since I will deal with "NULL"
somehow. NA
s says that some people have up to 11 visit information but, some don't. Therefore, there are NA
s.
data <- structure(list(Total_1 = c("NULL", "NULL", "NULL", "NULL", "NULL",
"NULL"), Total_2 = c("17", "5", "3", "13", "NULL", "0"), Total_3 = c("15",
"NULL", NA, "2", "6", NA), Total_4 = c("9", NA, NA, "8", NA,
NA), Total_5 = c("15", NA, NA, "14", NA, NA), Total_6 = c("NULL",
NA, NA, NA, NA, NA), Total_7 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_),
Total_8 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), Total_9 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), Total_10 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), Total_11 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), InstantDate_1 = structure(c(18327,
18330, 18332, 18332, 18332, 18333), class = "Date"), InstantDate_2 = structure(c(18673,
18858, 18794, 18527, 18516, 18533), class = "Date"), InstantDate_3 = structure(c(18703,
19044, NA, 18673, 18726, NA), class = "Date"), InstantDate_4 = structure(c(18786,
NA, NA, 18905, NA, NA), class = "Date"), InstantDate_5 = structure(c(18855,
NA, NA, 19006, NA, NA), class = "Date"), InstantDate_6 = structure(c(19229,
NA, NA, NA, NA, NA), class = "Date"), InstantDate_7 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"),
InstantDate_8 = structure(c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), class = "Date"), InstantDate_9 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"),
InstantDate_10 = structure(c(NA_real_, NA_real_, NA_real_,
NA_real_, NA_real_, NA_real_), class = "Date"), InstantDate_11 = structure(c(NA_real_,
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"),
VisitType_1 = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL"
), VisitType_2 = c("FOLLOW UP", "FOLLOW UP", "VIRTUAL VISIT",
"OFFICE VISIT", "NULL", "VIRTUAL VISIT"), VisitType_3 = c("FOLLOW UP",
"FOLLOW UP", NA, "VIRTUAL VISIT", "VIRTUAL VISIT", NA), VisitType_4 = c("FOLLOW UP",
NA, NA, "VIRTUAL VISIT", NA, NA), VisitType_5 = c("FOLLOW UP",
NA, NA, "FOLLOW UP", NA, NA), VisitType_6 = c("FOLLOW UP",
NA, NA, NA, NA, NA), VisitType_7 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), VisitType_8 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_), VisitType_9 = c(NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_,
NA_character_), VisitType_10 = c(NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_, NA_character_
), VisitType_11 = c(NA_character_, NA_character_, NA_character_,
NA_character_, NA_character_, NA_character_)), row.names = c(NA,
-6L), class = c("tbl_df", "tbl", "data.frame"))
CodePudding user response:
We may order on the numeric part of the column names and select
library(dplyr)
data %>%
select(order(readr::parse_number(names(.))))
-output
# A tibble: 6 × 33
Total_1 InstantDate_1 VisitTy…¹ Total_2 InstantD…² Visit…³ Total_3 InstantD…⁴ Visit…⁵ Total_4 InstantD…⁶ Visit…⁷ Total_5 InstantD…⁸ Visit…⁹
<chr> <date> <chr> <chr> <date> <chr> <chr> <date> <chr> <chr> <date> <chr> <chr> <date> <chr>
1 NULL 2020-03-06 NULL 17 2021-02-15 FOLLOW… 15 2021-03-17 FOLLOW… 9 2021-06-08 FOLLOW… 15 2021-08-16 FOLLOW…
2 NULL 2020-03-09 NULL 5 2021-08-19 FOLLOW… NULL 2022-02-21 FOLLOW… <NA> NA <NA> <NA> NA <NA>
3 NULL 2020-03-11 NULL 3 2021-06-16 VIRTUA… <NA> NA <NA> <NA> NA <NA> <NA> NA <NA>
4 NULL 2020-03-11 NULL 13 2020-09-22 OFFICE… 2 2021-02-15 VIRTUA… 8 2021-10-05 VIRTUA… 14 2022-01-14 FOLLOW…
5 NULL 2020-03-11 NULL NULL 2020-09-11 NULL 6 2021-04-09 VIRTUA… <NA> NA <NA> <NA> NA <NA>
6 NULL 2020-03-12 NULL 0 2020-09-28 VIRTUA… <NA> NA <NA> <NA> NA <NA> <NA> NA <NA>
# … with 18 more variables: Total_6 <chr>, InstantDate_6 <date>, VisitType_6 <chr>, Total_7 <chr>, InstantDate_7 <date>, VisitType_7 <chr>,
# Total_8 <chr>, InstantDate_8 <date>, VisitType_8 <chr>, Total_9 <chr>, InstantDate_9 <date>, VisitType_9 <chr>, Total_10 <chr>,
# InstantDate_10 <date>, VisitType_10 <chr>, Total_11 <chr>, InstantDate_11 <date>, VisitType_11 <chr>, and abbreviated variable names
# ¹VisitType_1, ²InstantDate_2, ³VisitType_2, ⁴InstantDate_3, ⁵VisitType_3, ⁶InstantDate_4, ⁷VisitType_4, ⁸InstantDate_5, ⁹VisitType_5
CodePudding user response:
Here's an approach building on @r2evans (since deleted) answer, specifically their use of strcapture()
. It will group columns together by their suffix number, with columns sorted within suffixes by their first appearance in the dataset, without having to hard code column prefixes or suffixes.
col_orders <- strcapture("(.*)_(.*)", names(data), list(nm="", num=0L))
col_orders <- col_orders[order(col_orders$num),]
col_orders <- with(col_orders, paste(nm, num, sep = "_"))
data[col_orders]
# A tibble: 6 × 33
Total_1 InstantD…¹ Visit…² Total_2 InstantD…³ Visit…⁴ Total_3 InstantD…⁵ Visit…⁶
<chr> <date> <chr> <chr> <date> <chr> <chr> <date> <chr>
1 NULL 2020-03-06 NULL 17 2021-02-15 FOLLOW… 15 2021-03-17 FOLLOW…
2 NULL 2020-03-09 NULL 5 2021-08-19 FOLLOW… NULL 2022-02-21 FOLLOW…
3 NULL 2020-03-11 NULL 3 2021-06-16 VIRTUA… NA NA NA
4 NULL 2020-03-11 NULL 13 2020-09-22 OFFICE… 2 2021-02-15 VIRTUA…
5 NULL 2020-03-11 NULL NULL 2020-09-11 NULL 6 2021-04-09 VIRTUA…
6 NULL 2020-03-12 NULL 0 2020-09-28 VIRTUA… NA NA NA
# … with 24 more variables: Total_4 <chr>, InstantDate_4 <date>,
# VisitType_4 <chr>, Total_5 <chr>, InstantDate_5 <date>, VisitType_5 <chr>,
# Total_6 <chr>, InstantDate_6 <date>, VisitType_6 <chr>, Total_7 <chr>,
# InstantDate_7 <date>, VisitType_7 <chr>, Total_8 <chr>, InstantDate_8 <date>,
# VisitType_8 <chr>, Total_9 <chr>, InstantDate_9 <date>, VisitType_9 <chr>,
# Total_10 <chr>, InstantDate_10 <date>, VisitType_10 <chr>, Total_11 <chr>,
# InstantDate_11 <date>, VisitType_11 <chr>, and abbreviated variable names …
CodePudding user response:
We could do it this way:
Notice: values_transform = list(val = as.character))
library(tidyverse)
data %>%
pivot_longer(everything(),
names_to = "key",
values_to = "val",
values_transform = list(val = as.character)) %>%
arrange(parse_number(key)) %>%
pivot_wider(names_from = key, values_from = val, values_fn = list) %>%
unnest(everything())
Total_1 InstantDate_1 Visit…¹ Total_2 Insta…² Visit…³ Total_3 Insta…⁴ Visit…⁵ Total_4 Insta…⁶ Visit…⁷ Total_5
<chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr> <chr>
1 NULL 2020-03-06 NULL 17 2021-0… FOLLOW… 15 2021-0… FOLLOW… 9 2021-0… FOLLOW… 15
2 NULL 2020-03-09 NULL 5 2021-0… FOLLOW… NULL 2022-0… FOLLOW… NA NA NA NA
3 NULL 2020-03-11 NULL 3 2021-0… VIRTUA… NA NA NA NA NA NA NA
4 NULL 2020-03-11 NULL 13 2020-0… OFFICE… 2 2021-0… VIRTUA… 8 2021-1… VIRTUA… 14
5 NULL 2020-03-11 NULL NULL 2020-0… NULL 6 2021-0… VIRTUA… NA NA NA NA
6 NULL 2020-03-12 NULL 0 2020-0… VIRTUA… NA NA NA NA NA NA NA
# … with 20 more variables: InstantDate_5 <chr>, VisitType_5 <chr>, Total_6 <chr>, InstantDate_6 <chr>,
# VisitType_6 <chr>, Total_7 <chr>, InstantDate_7 <chr>, VisitType_7 <chr>, Total_8 <chr>,
# InstantDate_8 <chr>, VisitType_8 <chr>, Total_9 <chr>, InstantDate_9 <chr>, VisitType_9 <chr>,
# Total_10 <chr>, InstantDate_10 <chr>, VisitType_10 <chr>, Total_11 <chr>, InstantDate_11 <chr>,
# VisitType_11 <chr>, and abbreviated variable names ¹VisitType_1, ²InstantDate_2, ³VisitType_2,
# ⁴InstantDate_3, ⁵VisitType_3, ⁶InstantDate_4, ⁷VisitType_4
# ℹ Use `colnames()` to see all variable names