How to reorder columns systematically in r?-CodePudding

I have data that has information about each visit. Data is ordered in terms of each aspect, not each visit. I want to put together the columns including each visit information together. I provided sample data below.

I obtained what I desired somehow but, I want to see if there is a tidier way to get the same result. Here is what I tried.

library(tidyverse)

col_of_interest <- c("Total", "InstantDate", "VisitType")
col_orders <- paste0(col_of_interest, "_", rep(c(1:11), each = 3))

data %>%
    select(all_of(col_orders))

Here is sample data. You can ignore the "NULL"s and NAs in the data since I will deal with "NULL" somehow. NAs says that some people have up to 11 visit information but, some don't. Therefore, there are NAs.

data <- structure(list(Total_1 = c("NULL", "NULL", "NULL", "NULL", "NULL", 
"NULL"), Total_2 = c("17", "5", "3", "13", "NULL", "0"), Total_3 = c("15", 
"NULL", NA, "2", "6", NA), Total_4 = c("9", NA, NA, "8", NA, 
NA), Total_5 = c("15", NA, NA, "14", NA, NA), Total_6 = c("NULL", 
NA, NA, NA, NA, NA), Total_7 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_), 
Total_8 = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), Total_9 = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_), Total_10 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_
), Total_11 = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), InstantDate_1 = structure(c(18327, 
18330, 18332, 18332, 18332, 18333), class = "Date"), InstantDate_2 = structure(c(18673, 
18858, 18794, 18527, 18516, 18533), class = "Date"), InstantDate_3 = structure(c(18703, 
19044, NA, 18673, 18726, NA), class = "Date"), InstantDate_4 = structure(c(18786, 
NA, NA, 18905, NA, NA), class = "Date"), InstantDate_5 = structure(c(18855, 
NA, NA, 19006, NA, NA), class = "Date"), InstantDate_6 = structure(c(19229, 
NA, NA, NA, NA, NA), class = "Date"), InstantDate_7 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), 
InstantDate_8 = structure(c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), class = "Date"), InstantDate_9 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), 
InstantDate_10 = structure(c(NA_real_, NA_real_, NA_real_, 
NA_real_, NA_real_, NA_real_), class = "Date"), InstantDate_11 = structure(c(NA_real_, 
NA_real_, NA_real_, NA_real_, NA_real_, NA_real_), class = "Date"), 
VisitType_1 = c("NULL", "NULL", "NULL", "NULL", "NULL", "NULL"
), VisitType_2 = c("FOLLOW UP", "FOLLOW UP", "VIRTUAL VISIT", 
"OFFICE VISIT", "NULL", "VIRTUAL VISIT"), VisitType_3 = c("FOLLOW UP", 
"FOLLOW UP", NA, "VIRTUAL VISIT", "VIRTUAL VISIT", NA), VisitType_4 = c("FOLLOW UP", 
NA, NA, "VIRTUAL VISIT", NA, NA), VisitType_5 = c("FOLLOW UP", 
NA, NA, "FOLLOW UP", NA, NA), VisitType_6 = c("FOLLOW UP", 
NA, NA, NA, NA, NA), VisitType_7 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_
), VisitType_8 = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_), VisitType_9 = c(NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_, 
NA_character_), VisitType_10 = c(NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_, NA_character_
), VisitType_11 = c(NA_character_, NA_character_, NA_character_, 
NA_character_, NA_character_, NA_character_)), row.names = c(NA, 
-6L), class = c("tbl_df", "tbl", "data.frame"))

CodePudding user response：

We may order on the numeric part of the column names and select

library(dplyr)
data %>%
   select(order(readr::parse_number(names(.))))

-output

# A tibble: 6 × 33
  Total_1 InstantDate_1 VisitTy…¹ Total_2 InstantD…² Visit…³ Total_3 InstantD…⁴ Visit…⁵ Total_4 InstantD…⁶ Visit…⁷ Total_5 InstantD…⁸ Visit…⁹
  <chr>   <date>        <chr>     <chr>   <date>     <chr>   <chr>   <date>     <chr>   <chr>   <date>     <chr>   <chr>   <date>     <chr>  
1 NULL    2020-03-06    NULL      17      2021-02-15 FOLLOW… 15      2021-03-17 FOLLOW… 9       2021-06-08 FOLLOW… 15      2021-08-16 FOLLOW…
2 NULL    2020-03-09    NULL      5       2021-08-19 FOLLOW… NULL    2022-02-21 FOLLOW… <NA>    NA         <NA>    <NA>    NA         <NA>   
3 NULL    2020-03-11    NULL      3       2021-06-16 VIRTUA… <NA>    NA         <NA>    <NA>    NA         <NA>    <NA>    NA         <NA>   
4 NULL    2020-03-11    NULL      13      2020-09-22 OFFICE… 2       2021-02-15 VIRTUA… 8       2021-10-05 VIRTUA… 14      2022-01-14 FOLLOW…
5 NULL    2020-03-11    NULL      NULL    2020-09-11 NULL    6       2021-04-09 VIRTUA… <NA>    NA         <NA>    <NA>    NA         <NA>   
6 NULL    2020-03-12    NULL      0       2020-09-28 VIRTUA… <NA>    NA         <NA>    <NA>    NA         <NA>    <NA>    NA         <NA>   
# … with 18 more variables: Total_6 <chr>, InstantDate_6 <date>, VisitType_6 <chr>, Total_7 <chr>, InstantDate_7 <date>, VisitType_7 <chr>,
#   Total_8 <chr>, InstantDate_8 <date>, VisitType_8 <chr>, Total_9 <chr>, InstantDate_9 <date>, VisitType_9 <chr>, Total_10 <chr>,
#   InstantDate_10 <date>, VisitType_10 <chr>, Total_11 <chr>, InstantDate_11 <date>, VisitType_11 <chr>, and abbreviated variable names
#   ¹VisitType_1, ²InstantDate_2, ³VisitType_2, ⁴InstantDate_3, ⁵VisitType_3, ⁶InstantDate_4, ⁷VisitType_4, ⁸InstantDate_5, ⁹VisitType_5

CodePudding user response：

Here's an approach building on @r2evans (since deleted) answer, specifically their use of strcapture(). It will group columns together by their suffix number, with columns sorted within suffixes by their first appearance in the dataset, without having to hard code column prefixes or suffixes.

col_orders <- strcapture("(.*)_(.*)", names(data), list(nm="", num=0L))
col_orders <- col_orders[order(col_orders$num),]
col_orders <- with(col_orders, paste(nm, num, sep = "_"))
data[col_orders]

# A tibble: 6 × 33
  Total_1 InstantD…¹ Visit…² Total_2 InstantD…³ Visit…⁴ Total_3 InstantD…⁵ Visit…⁶
  <chr>   <date>     <chr>   <chr>   <date>     <chr>   <chr>   <date>     <chr>  
1 NULL    2020-03-06 NULL    17      2021-02-15 FOLLOW… 15      2021-03-17 FOLLOW…
2 NULL    2020-03-09 NULL    5       2021-08-19 FOLLOW… NULL    2022-02-21 FOLLOW…
3 NULL    2020-03-11 NULL    3       2021-06-16 VIRTUA… NA      NA         NA     
4 NULL    2020-03-11 NULL    13      2020-09-22 OFFICE… 2       2021-02-15 VIRTUA…
5 NULL    2020-03-11 NULL    NULL    2020-09-11 NULL    6       2021-04-09 VIRTUA…
6 NULL    2020-03-12 NULL    0       2020-09-28 VIRTUA… NA      NA         NA     
# … with 24 more variables: Total_4 <chr>, InstantDate_4 <date>,
#   VisitType_4 <chr>, Total_5 <chr>, InstantDate_5 <date>, VisitType_5 <chr>,
#   Total_6 <chr>, InstantDate_6 <date>, VisitType_6 <chr>, Total_7 <chr>,
#   InstantDate_7 <date>, VisitType_7 <chr>, Total_8 <chr>, InstantDate_8 <date>,
#   VisitType_8 <chr>, Total_9 <chr>, InstantDate_9 <date>, VisitType_9 <chr>,
#   Total_10 <chr>, InstantDate_10 <date>, VisitType_10 <chr>, Total_11 <chr>,
#   InstantDate_11 <date>, VisitType_11 <chr>, and abbreviated variable names …

CodePudding user response：

We could do it this way:

Notice: values_transform = list(val = as.character))

library(tidyverse)

data %>% 
  pivot_longer(everything(),
               names_to = "key",
               values_to = "val", 
               values_transform = list(val = as.character)) %>% 
  arrange(parse_number(key)) %>% 
  pivot_wider(names_from = key, values_from = val, values_fn = list) %>% 
  unnest(everything())

Total_1 InstantDate_1 Visit…¹ Total_2 Insta…² Visit…³ Total_3 Insta…⁴ Visit…⁵ Total_4 Insta…⁶ Visit…⁷ Total_5
  <chr>   <chr>         <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>   <chr>  
1 NULL    2020-03-06    NULL    17      2021-0… FOLLOW… 15      2021-0… FOLLOW… 9       2021-0… FOLLOW… 15     
2 NULL    2020-03-09    NULL    5       2021-0… FOLLOW… NULL    2022-0… FOLLOW… NA      NA      NA      NA     
3 NULL    2020-03-11    NULL    3       2021-0… VIRTUA… NA      NA      NA      NA      NA      NA      NA     
4 NULL    2020-03-11    NULL    13      2020-0… OFFICE… 2       2021-0… VIRTUA… 8       2021-1… VIRTUA… 14     
5 NULL    2020-03-11    NULL    NULL    2020-0… NULL    6       2021-0… VIRTUA… NA      NA      NA      NA     
6 NULL    2020-03-12    NULL    0       2020-0… VIRTUA… NA      NA      NA      NA      NA      NA      NA     
# … with 20 more variables: InstantDate_5 <chr>, VisitType_5 <chr>, Total_6 <chr>, InstantDate_6 <chr>,
#   VisitType_6 <chr>, Total_7 <chr>, InstantDate_7 <chr>, VisitType_7 <chr>, Total_8 <chr>,
#   InstantDate_8 <chr>, VisitType_8 <chr>, Total_9 <chr>, InstantDate_9 <chr>, VisitType_9 <chr>,
#   Total_10 <chr>, InstantDate_10 <chr>, VisitType_10 <chr>, Total_11 <chr>, InstantDate_11 <chr>,
#   VisitType_11 <chr>, and abbreviated variable names ¹VisitType_1, ²InstantDate_2, ³VisitType_2,
#   ⁴InstantDate_3, ⁵VisitType_3, ⁶InstantDate_4, ⁷VisitType_4
# ℹ Use `colnames()` to see all variable names