I have a df
as below
df <- data.frame(col = c("Acer laurinum", "Acer laurinum"),
author = c("", "Hassk."),
value = c(1,1))
df
col author value1 value2
1 Acer laurinum 1 2
2 Acer laurinum Hassk. 1 3
3 Acronychia pedunculata (L.) Miq. 0 1
4 Acronychia pedunculata 0 0
I would like to update the author
column saying that if I found any duplicated rows in col
column, then the blank cell of author
associated with the value (i.e. Acer laurinum
) will be updated to Hassk.
And the value
must be retained.
- Desired output
col author value1 value2
1 Acer laurinum Hassk. 1 2
2 Acer laurinum Hassk. 1 3
3 Acronychia pedunculata (L.) Miq. 0 1
4 Acronychia pedunculata (L.) Miq. 0 0
Any suggestions for this?
CodePudding user response:
You should use dplyr::group_by
and dplyr::mutate
with max
:
df %>% group_by(col) %>%
mutate(author=max(author))
Output:
col author value1 value2
<chr> <chr> <dbl> <dbl>
1 Acer laurinum Hassk. 1 2
2 Acer laurinum Hassk. 1 3
3 Acronychia pedunculata (L.) Miq. 0 1
4 Acronychia pedunculata (L.) Miq. 0 0
>
CodePudding user response:
# Base option 1:
df$author <- with(df, ave(author, col, FUN = max))
# Base option 2:
transform(
with(
df,
replace(df, df == "", NA_character_)[
order(col, author, decreasing = TRUE),
]
),
author = na.omit(author)[cumsum(!(is.na(author)))]
)[row.names(df),]
Data:
df <- structure(list(col = c("Acer laurinum", "Acer laurinum", "Acronychia pedunculata",
"Acronychia pedunculata"), author = c("", "Hassk.", "(L.) Miq.",
""), value1 = c(1, 1, 0, 0), value2 = c(2, 3, 1, 0)), class = "data.frame", row.names = c(NA,
-4L))