I currently have a .csv data file with contents like:
Name Fruit Qty
Tauwen Apple 32
Tauewn Pear 09
Wikfae Apple 09
Moigaw Pear 16
I want to use R to insert the missing "Pear" for "Wikfae" with corresponding Qty as 0, and missing "Apple" for "Moigaw" with corresponding Qty as 0. So the output table I want is:
Name Fruit Qty
Tauwen Apple 32
Tauewn Pear 09
Wikfae Apple 09
Wikfae Pear 0 #Added line
Moigaw Pear 16
Moigaw Apple 0 #Added line
I don't even know where to begin. A tidyverse-based solution would be most useful for me.
CodePudding user response:
We could use complete():
library(tidyr)
df |>
complete(Name, Fruit, fill = list(Qty = "0"))
Output:
# A tibble: 6 × 3
Name Fruit Qty
<chr> <chr> <chr>
1 Moigaw Apple 0
2 Moigaw Pear 16
3 Tauwen Apple 32
4 Tauwen Pear 09
5 Wikfae Apple 09
6 Wikfae Pear 0
You might, however, consider turning Qty into a numeric:
library(dplyr) # for mutate()
df |>
  mutate(Qty = as.numeric(Qty)) |>
  complete(Name, Fruit, fill = list(Qty = 0))
Output:
# A tibble: 6 × 3
Name Fruit Qty
<chr> <chr> <dbl>
1 Moigaw Apple 0
2 Moigaw Pear 16
3 Tauwen Apple 32
4 Tauwen Pear 9
5 Wikfae Apple 9
6 Wikfae Pear 0
Data without typos:
library(readr)
df <-
read_table("Name Fruit Qty
Tauwen Apple 32
Tauwen Pear 09
Wikfae Apple 09
Moigaw Pear 16")
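For intuition about what complete() does: the tidyr documentation describes it as a wrapper around expand(), dplyr::full_join() and replace_na(). Below is a minimal sketch of those steps written out by hand. The names df_num, all_pairs and manual are made up for illustration, and the data are retyped with Qty already numeric. A left join from the full grid is used, which gives the same rows as the full join as long as every observed row appears in the grid.

```r
library(dplyr)
library(tidyr)

# Same data as above, retyped with Qty as a number (an assumption for this sketch)
df_num <- tribble(
  ~Name,    ~Fruit,  ~Qty,
  "Tauwen", "Apple", 32,
  "Tauwen", "Pear",   9,
  "Wikfae", "Apple",  9,
  "Moigaw", "Pear",  16
)

# 1. Every Name x Fruit combination that could exist
all_pairs <- expand(df_num, Name, Fruit)

# 2. Join the observed rows back on; unobserved pairs get Qty = NA
# 3. Replace those NAs with 0
manual <- all_pairs |>
  left_join(df_num, by = c("Name", "Fruit")) |>
  mutate(Qty = replace_na(Qty, 0))

manual
```

This is only meant to show the moving parts; in practice the one-line complete() call is shorter and less error-prone.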
CodePudding user response:
I'm not sure I understand your question correctly, but it sounds like you want to surgically add rows to your data. This can be done simply with add_row(). First I simulated your data:
#### Simulate Data ####
library(tibble) # tibble(), add_row()
library(dplyr)  # %>%
Name <- c("Tauwen","Tauwen","Wikfae","Moigaw")
Fruit <- c("Apple","Pear","Apple","Pear")
Qty <- c(32,9,9,16)
tib <- tibble(
  Name,
  Fruit,
  Qty
)
Then I added rows, first at a specific position (.before here says which row to place the new one before), then by appending a row at the end.
#### Add Rows ####
tib %>%
  add_row(Name = "Wikfae",
          Fruit = "Pear",
          Qty = 0,
          .before = 4) %>%
  add_row(Name = "Moigaw",
          Fruit = "Apple",
          Qty = 0)
Giving you this:
# A tibble: 6 × 3
Name Fruit Qty
<chr> <chr> <dbl>
1 Tauwen Apple 32
2 Tauwen Pear 9
3 Wikfae Apple 9
4 Wikfae Pear 0
5 Moigaw Pear 16
6 Moigaw Apple 0
If your goal is simply to complete the observations in a data frame from values that already exist in it, then the other answer by Harre is likely a better choice.
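If you would rather not work out by eye which rows to add, one possible middle ground is to list the absent Name/Fruit pairs first and then append them. This is a sketch only: missing_pairs and filled are hypothetical names, and it assumes every Name should have every Fruit that appears anywhere in the data.

```r
library(dplyr)
library(tidyr)

tib <- tibble(
  Name  = c("Tauwen", "Tauwen", "Wikfae", "Moigaw"),
  Fruit = c("Apple", "Pear", "Apple", "Pear"),
  Qty   = c(32, 9, 9, 16)
)

# Name/Fruit pairs that could exist but are absent from tib
missing_pairs <- tib |>
  expand(Name, Fruit) |>
  anti_join(tib, by = c("Name", "Fruit"))

# Append them with Qty = 0, then sort so each Name's rows sit together
filled <- tib |>
  bind_rows(mutate(missing_pairs, Qty = 0)) |>
  arrange(Name, Fruit)

missing_pairs
filled
```

Unlike add_row() with .before, this does not preserve the original row order; the arrange() call sorts alphabetically instead.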
CodePudding user response:
Here is the long way of @harre's preferable answer:
library(dplyr)
library(tidyr)
df %>%
  mutate(Qty = as.numeric(Qty)) %>%
  group_by(Name) %>%
  # pad each Name to two rows; the padding row is all NA
  # (cur_data() is deprecated in favour of pick() from dplyr 1.1.0)
  summarise(cur_data()[seq(2), ], .groups = "keep") %>%
  # still grouped by Name: fill in whichever Fruit this group lacks
  mutate(Fruit = replace(Fruit, is.na(Fruit), setdiff(c("Apple", "Pear"), Fruit))) %>%
  ungroup() %>%
  mutate(Qty = replace_na(Qty, 0)) %>%
  select(Name, Fruit, Qty)
Name Fruit Qty
<chr> <chr> <dbl>
1 Moigaw Pear 16
2 Moigaw Apple 0
3 Tauwen Apple 32
4 Tauwen Pear 9
5 Wikfae Apple 9
6 Wikfae Pear 0