I have an R df where one column, Author, looks like this (the names themselves are replaced by 'Last' and 'First' in this post for anonymity):
id | Author |
---|---|
1 | Last, First & Last, First |
2 | Last, First & Last, First & Last, First & Last, First |
3 | Last, First & Last, First & Last, First |
I need to add a new column for the first and last name of every author. It should look like this:
id | First1 | Last1 | First2 | Last2 | First3 | Last3 | First4 | Last4 |
---|---|---|---|---|---|---|---|---|
1 | First | Last | First | Last | ||||
2 | First | Last | First | Last | First | Last | First | Last |
3 | First | Last | First | Last | First | Last |
Of course, not all readings have the same number of authors, so I'm unable to append a specific number of columns for all rows.
** NOTE: I've done this for columns that only have one author using
data <- data %>%
mutate(FirstName=unlist(lapply(strsplit(Author,", "),function(x) x[2])),
LastName=gsub(",.*","",Author))
How can I do this?
CodePudding user response:
One option can be:
df %>%
mutate(map_dfr(.x = str_split(Author, ", | & "),
~ set_names(.x, ave(.x, .x, FUN = function(y) paste0(y, cumsum(duplicated(y)) 1)))))
id Author Last1 First1 Last2 First2 Last3 First3 Last4 First4
1 1 Last, First & Last, First Last First Last First <NA> <NA> <NA> <NA>
2 2 Last, First & Last, First & Last, First & Last, First Last First Last First Last First Last First
3 3 Last, First & Last, First & Last, First Last First Last First Last First <NA> <NA>
CodePudding user response:
Here I use a small helper function to return a list of last and first names
get_names <- function(x) {
authors = strsplit(x, " & ")[[1]]
lapply(authors, \(a) setNames(strsplit(a, ", ")[[1]], c("Last", "First")))
}
Then I apply it to each id
, unnest, and pivot wide
df %>%
group_by(id) %>%
summarize(names=list(get_names(Author))) %>%
unnest(names) %>%
unnest_wider(names) %>%
group_by(id) %>%
mutate(rn=row_number()) %>%
pivot_wider(id, names_from =rn, names_sep="",values_from = c(First,Last),names_vary="slowest")
CodePudding user response:
Here's a version using tidyr::separate
:
library(tidyr)
library(stringr)
n_auth = max(str_count(df$Author, pattern = "&")) 1
df %>%
separate(
Author,
sep = " *[,&] ",
into = c(outer(c("First", "Last"), 1:n_auth, FUN = paste0)),
fill = "right"
)
# id First1 Last1 First2 Last2 First3 Last3 First4 Last4
# 1 1 Last First Last First <NA> <NA> <NA> <NA>
# 2 2 Last First Last First Last First Last First
# 3 3 Last First Last First Last First <NA> <NA>
CodePudding user response:
In base R you will do:
(df1 <- read.table(text=gsub('[,&]', '', df$Author), fill=TRUE))
V1 V2 V3 V4 V5 V6 V7 V8
1 Last First Last First
2 Last First Last First Last First Last First
3 Last First Last First Last First
You can then add names:
names(df1)<-paste0(c("Last", "First"), gl(ncol(df1), 2,ncol(df1)))
df1
Last1 First1 Last2 First2 Last3 First3 Last4 First4
1 Last First Last First
2 Last First Last First Last First Last First
3 Last First Last First Last First
of course you can rearrange the table to have first
before last
:
df1[order(sub("\\D ", '',names(df1)), sub("\\d ", '', names(df1)))]
First1 Last1 First2 Last2 First3 Last3 First4 Last4
1 First Last First Last
2 First Last First Last First Last First Last
3 First Last First Last First Last