I want to arrange a list of names in a particular order.
For example, I have the following df:
structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D",
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate M. Smith",
"Kate Marie Smith", "Kate Smith", "Ben Frederick Jones", "Ben Jones",
"Ben F. Jones", "Charles Lane", "Renee Perez", "Renee G. Perez",
"Henry Paul Poss", "Henry Poss")), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -11L))
   group order name
   <chr> <dbl> <chr>
 1 A         1 Kate M. Smith
 2 A         2 Kate Marie Smith
 3 A         3 Kate Smith
 4 B         1 Ben Frederick Jones
 5 B         2 Ben Jones
 6 B         3 Ben F. Jones
 7 C         1 Charles Lane
 8 D         1 Renee Perez
 9 D         2 Renee G. Perez
10 E         1 Henry Paul Poss
11 E         2 Henry Poss
I want to rearrange the order for each group to "First Name, Last Name", "First Name, Middle Initial, Last Name", and "First Name, Middle Name, Last Name". The end result would look like this:
structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D",
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate Smith",
"Kate M. Smith", "Kate Marie Smith", "Ben Jones", "Ben F. Jones",
"Ben Frederick Jones", "Charles Lane", "Renee Perez", "Renee G. Perez",
"Henry Poss", "Henry Paul Poss")), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -11L))
   group order name
   <chr> <dbl> <chr>
 1 A         1 Kate Smith
 2 A         2 Kate M. Smith
 3 A         3 Kate Marie Smith
 4 B         1 Ben Jones
 5 B         2 Ben F. Jones
 6 B         3 Ben Frederick Jones
 7 C         1 Charles Lane
 8 D         1 Renee Perez
 9 D         2 Renee G. Perez
10 E         1 Henry Poss
11 E         2 Henry Paul Poss
Notice that Group A went from:
- Kate M. Smith
- Kate Marie Smith
- Kate Smith
To:
- Kate Smith
- Kate M. Smith
- Kate Marie Smith
I've tried using arrange(), but it doesn't always produce this exact order.
Any guidance would be appreciated!
CodePudding user response:
One option is to sort inside arrange() by the number of words and then by the number of characters in the name, and then renumber the 'order' column with row_number() after grouping by 'group'.
library(dplyr)
library(stringr)
df %>%
  # sort by group, then by word count, then by name length
  arrange(group, str_count(name, "\\w+"), nchar(name)) %>%
  group_by(group) %>%
  # renumber 'order' within each group
  mutate(order = row_number()) %>%
  ungroup()
Output:
# A tibble: 11 × 3
   group order name
   <chr> <int> <chr>
 1 A         1 Kate Smith
 2 A         2 Kate M. Smith
 3 A         3 Kate Marie Smith
 4 B         1 Ben Jones
 5 B         2 Ben F. Jones
 6 B         3 Ben Frederick Jones
 7 C         1 Charles Lane
 8 D         1 Renee Perez
 9 D         2 Renee G. Perez
10 E         1 Henry Poss
11 E         2 Henry Paul Poss
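If you'd rather not rely on word count and string length as a proxy, the rule from the question (no middle name, then middle initial, then full middle name) can be spelled out as an explicit sort key. This is only a sketch: middle_rank() is a hypothetical helper, not part of dplyr or stringr, and it assumes names have the shape "First [Middle] Last" with a middle initial written as a single letter, optionally followed by a period.

```r
library(dplyr)
library(stringr)

# Same data as in the question
df <- data.frame(
  group = c("A", "A", "A", "B", "B", "B", "C", "D", "D", "E", "E"),
  order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2),
  name = c("Kate M. Smith", "Kate Marie Smith", "Kate Smith",
           "Ben Frederick Jones", "Ben Jones", "Ben F. Jones",
           "Charles Lane", "Renee Perez", "Renee G. Perez",
           "Henry Paul Poss", "Henry Poss")
)

# Hypothetical helper (an assumption, not a library function):
# 0 = no middle name, 1 = middle initial, 2 = full middle name
middle_rank <- function(name) {
  n_words <- str_count(name, "\\S+")
  # single letter, optional period, sitting between two other tokens
  has_initial <- str_detect(name, "^\\S+\\s+[A-Za-z]\\.?\\s+\\S+$")
  case_when(
    n_words <= 2 ~ 0L,
    has_initial  ~ 1L,
    TRUE         ~ 2L
  )
}

result <- df %>%
  # name length is kept only as a tie-breaker
  arrange(group, middle_rank(name), nchar(name)) %>%
  group_by(group) %>%
  mutate(order = row_number()) %>%
  ungroup()

result
```

For the sample data this should give the same ordering as the word-count approach; the difference is that the sort key states the rule directly, so it is easier to adjust if names with suffixes or multiple middle names show up.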
CodePudding user response:
Ordering on the number of characters in the name string within each group should give the desired results.
Using data.table:
library(data.table)
dt <- structure(list(group = c("A", "A", "A", "B", "B", "B", "C", "D",
"D", "E", "E"), order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2), name = c("Kate M. Smith",
"Kate Marie Smith", "Kate Smith", "Ben Frederick Jones", "Ben Jones",
"Ben F. Jones", "Charles Lane", "Renee Perez", "Renee G. Perez",
"Henry Paul Poss", "Henry Poss")), class = c("tbl_df", "tbl",
"data.frame"), row.names = c(NA, -11L))
setDT(dt)
dt[order(group, nchar(name))]
With the result (note that the rows are reordered, but the 'order' column keeps its original values):
    group order                name
 1:     A     3          Kate Smith
 2:     A     1       Kate M. Smith
 3:     A     2    Kate Marie Smith
 4:     B     2           Ben Jones
 5:     B     3        Ben F. Jones
 6:     B     1 Ben Frederick Jones
 7:     C     1        Charles Lane
 8:     D     1         Renee Perez
 9:     D     2      Renee G. Perez
10:     E     2          Henry Poss
11:     E     1     Henry Paul Poss
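The expected output in the question also has 'order' running 1, 2, 3 within each group. A possible way to get that with data.table is to renumber the column after sorting. This is a minimal sketch; the variable name sorted is just for illustration.

```r
library(data.table)

# Same data as in the question
dt <- data.table(
  group = c("A", "A", "A", "B", "B", "B", "C", "D", "D", "E", "E"),
  order = c(1, 2, 3, 1, 2, 3, 1, 1, 2, 1, 2),
  name = c("Kate M. Smith", "Kate Marie Smith", "Kate Smith",
           "Ben Frederick Jones", "Ben Jones", "Ben F. Jones",
           "Charles Lane", "Renee Perez", "Renee G. Perez",
           "Henry Paul Poss", "Henry Poss")
)

# Sort by group, then by name length (returns a new data.table)
sorted <- dt[order(group, nchar(name))]

# Overwrite 'order' with the row position inside each group
sorted[, order := seq_len(.N), by = group]

sorted
```

Because := modifies sorted by reference, no reassignment is needed after the second step.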