Following this topic Get the latest updated value in R.
I would like to add a new problem appeared when I ran the solution from @TarJae
I made a new example
df <- data.frame(name1 = c("Acacia pinnata", "Acer laurinum", "Acmella paniculata", "Aglaia lawii", NA, NA),
name2 = c(NA, NA, NA, NA, "Acer laurinum Hassk.", "Aglaia lawii (Wight)"),
name3 = c("Senegalia rugata (Lam.) Britton & Rose", "Acer laurinum", "Acmella paniculata", "Aglaia lawii", "Acer laurinum Hassk.", "Aglaia lawii (Wight)"))
name1 name2 name3
1 Acacia pinnata <NA> Senegalia rugata (Lam.) Britton & Rose
2 Acer laurinum <NA> Acer laurinum
3 Acmella paniculata <NA> Acmella paniculata
4 Aglaia lawii <NA> Aglaia lawii
5 <NA> Acer laurinum Hassk. Acer laurinum Hassk.
6 <NA> Aglaia lawii (Wight) Aglaia lawii (Wight)
Now using this code from @TarJae
we gonna have the output like this (seem to be the rows being raised)
df %>%
mutate(id = row_number()) %>%
pivot_longer(
cols = -id
) %>%
mutate(helper= word(value, 1)) %>%
group_by(helper) %>%
mutate(value= last(value)) %>%
pivot_wider(
names_from = name,
values_from = value
) %>%
ungroup() %>%
select(-id, -helper) %>%
filter(if_any(everything(), ~ !is.na(.)))
name1 name2 name3
<chr> <chr> <chr>
1 Acacia pinnata NA NA
2 NA NA Senegalia rugata (Lam.) Britton & Rose
3 Acer laurinum Hassk. NA Acer laurinum Hassk.
4 Acmella paniculata NA Acmella paniculata
5 Aglaia lawii (Wight) NA Aglaia lawii (Wight)
6 NA Acer laurinum Hassk. Acer laurinum Hassk.
7 NA Aglaia lawii (Wight) Aglaia lawii (Wight)
We will see our first row already splited to two row. I am thinking now is to merge 2 this row into one. However, what if I honestly have many cases like that.
Any suggestions for this?
Updated code for example* Since another problem raised I would like to make another table here.
df <- data.frame(name1 = c("Acacia pinnata", "Acer laurinum", "Acmella paniculata", "Aglaia lawii", NA, NA, "Alangium javanicum", "Alangium longiflorum", NA),
name2 = c(NA, NA, NA, NA, "Acer laurinum Hassk.", "Aglaia lawii (Wight)", NA,NA, "Alangium javanicum (Blume) Wangerin"),
name3 = c("Senegalia rugata (Lam.) Britton & Rose", "Acer laurinum", "Acmella paniculata", "Aglaia lawii", "Acer laurinum Hassk.", "Aglaia lawii (Wight)",
"Alangium javanicum", "Celtis cf. rigescens (Miq.) Planch.", "Alangium javanicum (Blume) Wangerin"))
Using the code from @TarJae, we gonna see now Celtis cf. rigescens (Miq.) Planch.
is matched with Alangium javanicum (Blume) Wangerin``not
Alangium longiflorum`.
Desired output
name1 name2 name3
1 Acacia pinnata NA Senegalia rugata (Lam.) Britton & ~
2 Acer laurinum Hassk. NA Acer laurinum Hassk.
3 Acmella paniculata NA Acmella paniculata
4 Aglaia lawii (Wight) NA Aglaia lawii (Wight)
5 NA Acer laurinum Hassk. Acer laurinum Hassk.
6 NA Aglaia lawii (Wight) Aglaia lawii (Wight)
7 Alangium javanicum (Blume) Wangerin NA Alangium javanicum (Blume) Wangerin
8 Alangium longiflorum NA Celtis cf. rigescens (Miq.) Planch.
9 NA Alangium javanicum (Blume) Wangerin Alangium javanicum (Blume) Wangerin
CodePudding user response:
update:
The key is here: mutate(helper= word(value, 1,2))
We group by the first two words and then fill the group with the last entry. In case this is not enough you could group by the first 3 words etc...
df %>%
pivot_longer(
cols = everything(),
) %>%
mutate(helper= word(value, 1,2)) %>%
group_by(helper) %>%
mutate(value= last(value)) %>%
ungroup() %>%
select(-helper) %>%
pivot_wider(
names_from = name,
values_from = value,
) %>%
unnest(cols = c(name1, name2, name3))
name1 name2 name3
<chr> <chr> <chr>
1 Acacia pinnata NA Senegalia rugata (Lam.) Britton & ~
2 Acer laurinum Hassk. NA Acer laurinum Hassk.
3 Acmella paniculata NA Acmella paniculata
4 Aglaia lawii (Wight) NA Aglaia lawii (Wight)
5 NA Acer laurinum Hassk. Acer laurinum Hassk.
6 NA Aglaia lawii (Wight) Aglaia lawii (Wight)
7 Alangium javanicum (Blume) Wangerin NA Alangium javanicum (Blume) Wangerin
8 Alangium longiflorum NA Celtis cf. rigescens (Miq.) Planch.
9 NA Alangium javanicum (Blume) Wangerin Alangium javanicum (Blume) Wangerin
- We don't need
id
- We have to
ungroup()
aftermutate(value= last(value))
and - remove
helper
beforepivot_wider
to keep the rows constant.
library(dplyr)
library(tidyr)
library(stringr)
df %>%
pivot_longer(
cols = everything(),
) %>%
mutate(helper= word(value, 1)) %>%
group_by(helper) %>%
mutate(value= last(value)) %>%
ungroup() %>%
select(-helper) %>%
pivot_wider(
names_from = name,
values_from = value,
) %>%
unnest(cols = c(name1, name2, name3))
output:
name1 name2 name3
<chr> <chr> <chr>
1 Acacia pinnata NA Senegalia rugata (Lam.) Britton & Rose
2 Acer laurinum Hassk. NA Acer laurinum Hassk.
3 Acmella paniculata NA Acmella paniculata
4 Aglaia lawii (Wight) NA Aglaia lawii (Wight)
5 NA Acer laurinum Hassk. Acer laurinum Hassk.
6 NA Aglaia lawii (Wight) Aglaia lawii (Wight)