I have this type of data:
df <- data.frame(name = c("Acer laurinum", "Acer laurinum Hassk.", "Acmella paniculata",
"Adinandra cf. integerrima", "Adinandra cf. integerrima T.Anderson"),
value1 = c(1,2,3,4,5),
value2 = c(2,3,4,5,6))
I want to summarise columns value1 and value2 based on the matched parts of column name, and keep the unique values of the new column author. This code only does the summarising part, but author is gone:
df %>%
mutate(author = str_extract(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"),
name1 = trimws(str_remove(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"))) %>%
group_by(name1) %>%
summarise(across(c(value1, value2), sum))
# A tibble: 3 x 3
  name1                     value1 value2
* <chr>                      <dbl>  <dbl>
1 Acer laurinum                  3      5
2 Acmella paniculata             3      4
3 Adinandra cf. integerrima      9     11
Expected output:
# A tibble: 3 x 4
  name1                     value1 value2 author
* <chr>                      <dbl>  <dbl> <chr>
1 Acer laurinum                  3      5 Hassk.
2 Acmella paniculata             3      4 <NA>
3 Adinandra cf. integerrima      9     11 T.Anderson
Answer:
You may use na.omit(author)[1] to get the first non-NA value of author in the group. If a group has no author at all, the result is NA.
library(dplyr)
library(stringr)
df %>%
mutate(author = str_extract(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"),
name1 = trimws(str_remove(name, "(?<=\\s)(?=.*\\.)[.\\w]+$"))) %>%
group_by(name1) %>%
summarise(across(c(value1, value2), sum),
author = na.omit(author)[1])
#  name1                     value1 value2 author
#  <chr>                      <dbl>  <dbl> <chr>
#1 Acer laurinum                  3      5 Hassk.
#2 Acmella paniculata             3      4 NA
#3 Adinandra cf. integerrima      9     11 T.Anderson
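Note that na.omit(author)[1] keeps only the first author found in each group. The question asks for the unique values, so if a name could ever appear with more than one distinct author, something like the sketch below may be closer to what is wanted. It uses base R only; first_author and collapse_authors are hypothetical helper names introduced here for illustration, and the "; " separator is an arbitrary choice.

```r
# What the answer's expression does, wrapped in a function for illustration.
first_author <- function(x) na.omit(x)[1]

# Hypothetical helper (not part of the original answer): join all distinct
# non-NA authors of a group into one string, or return NA if there are none.
collapse_authors <- function(x) {
  authors <- unique(x[!is.na(x)])
  if (length(authors) == 0) {
    NA_character_
  } else {
    paste(authors, collapse = "; ")
  }
}

first_author(c(NA, "Hassk."))                # "Hassk."
first_author(NA_character_)                  # NA (no author in the group)
collapse_authors(c(NA, "Hassk.", "Hassk."))  # "Hassk."
collapse_authors(c("Hassk.", "L."))          # "Hassk.; L."
collapse_authors(NA_character_)              # NA
```

Inside the pipeline this would replace the last argument of summarise, i.e. author = collapse_authors(author). For the sample data both versions give the same result, since each group has at most one author.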