Home > Software engineering >  How do you arrange similar names in a particular order by group?
How do you arrange similar names in a particular order by group?

Time:03-01

I want to arrange a list of names in a particular order.

For example, I have the following df:

structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D", 
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate M. Smith", 
"Kate Marie Smith", "Kate Smith", "Ben Frederick Jones", "Ben Jones", 
"Ben F. Jones", "Charles Lane", "Renee Perez", "Renee G. Perez", 
"Henry Paul Poss", "Henry Poss")), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -11L))

   group order name               
   <chr> <dbl> <chr>              
 1 A         1 Kate M. Smith      
 2 A         2 Kate Marie Smith   
 3 A         3 Kate Smith         
 4 B         1 Ben Frederick Jones
 5 B         2 Ben Jones          
 6 B         3 Ben F. Jones       
 7 C         1 Charles Lane       
 8 D         1 Renee Perez        
 9 D         2 Renee G. Perez     
10 E         1 Henry Paul Poss    
11 E         2 Henry Poss 

I want to rearrange the order for each group to "First Name, Last Name", "First Name, Middle Initial, Last Name", and "First Name, Middle Name, Last Name". The end result would look like this:

structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D", 
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate Smith", 
"Kate M. Smith", "Kate Marie Smith", "Ben Jones", "Ben F. Jones", 
"Ben Frederick Jones", "Charles Lane", "Renee Perez", "Renee G. Perez", 
"Henry Poss", "Henry Paul Poss")), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -11L))

   group order name               
   <chr> <dbl> <chr>              
 1 A         1 Kate Smith         
 2 A         2 Kate M. Smith      
 3 A         3 Kate Marie Smith   
 4 B         1 Ben Jones          
 5 B         2 Ben F. Jones       
 6 B         3 Ben Frederick Jones
 7 C         1 Charles Lane       
 8 D         1 Renee Perez        
 9 D         2 Renee G. Perez     
10 E         1 Henry Poss         
11 E         2 Henry Paul Poss  

Notice that Group A went from:

  1. Kate M. Smith
  2. Kate Marie Smith
  3. Kate Smith

To:

  1. Kate Smith
  2. Kate M. Smith
  3. Kate Marie Smith

I've tried using arrange but it doesn't seem like it always captures the exact order.

Any guidance would be appreciated!

CodePudding user response:

We may have to do this by counting the number of words and number of characters in arrange and then change the 'order' column values by with row_number() after grouping by 'group'

library(dplyr)
library(stringr)
df %>% 
    arrange(group, str_count(name, "\\w "), nchar(name)) %>%
    group_by(group) %>%
    mutate(order = row_number()) %>%
    ungroup

-output

# A tibble: 11 × 3
   group order name               
   <chr> <int> <chr>              
 1 A         1 Kate Smith         
 2 A         2 Kate M. Smith      
 3 A         3 Kate Marie Smith   
 4 B         1 Ben Jones          
 5 B         2 Ben F. Jones       
 6 B         3 Ben Frederick Jones
 7 C         1 Charles Lane       
 8 D         1 Renee Perez        
 9 D         2 Renee G. Perez     
10 E         1 Henry Poss         
11 E         2 Henry Paul Poss    

CodePudding user response:

Ordering on the number of characters in the name string within each group should give the desired results.

Using data.table:

library(data.table)
dt <- structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D", 
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate M. Smith", 
"Kate Marie Smith", "Kate Smith", "Ben Frederick Jones", "Ben Jones", 
"Ben F. Jones", "Charles Lane", "Renee Perez", "Renee G. Perez", 
"Henry Paul Poss", "Henry Poss")), class = c("tbl_df", "tbl", 
"data.frame"), row.names = c(NA, -11L))
setDT(dt)

dt[order(group, nchar(name))]

With the result:

    group order                name
 1:     A     3          Kate Smith
 2:     A     1       Kate M. Smith
 3:     A     2    Kate Marie Smith
 4:     B     2           Ben Jones
 5:     B     3        Ben F. Jones
 6:     B     1 Ben Frederick Jones
 7:     C     1        Charles Lane
 8:     D     1         Renee Perez
 9:     D     2      Renee G. Perez
10:     E     2          Henry Poss
11:     E     1     Henry Paul Poss
  • Related