Insert rows of missing subcategory with corresponding value as zero in R

Time:11-16

I currently have a .csv data file with contents like:

Name Fruit Qty
Tauwen Apple 32
Tauewn Pear 09
Wikfae Apple 09
Moigaw Pear 16

I want to use R to insert the missing "Pear" for "Wikfae" with corresponding Qty as 0, and missing "Apple" for "Moigaw" with corresponding Qty as 0. So the output table I want is:

Name Fruit Qty
Tauwen Apple 32
Tauewn Pear 09
Wikfae Apple 09
Wikfae Pear 0 #Added line
Moigaw Pear 16
Moigaw Apple 0 #Added line

I don't even know where to begin. A tidyverse-based solution would be most useful for me.

CodePudding user response:

We could use complete:

library(tidyr)

df |>
  complete(Name, Fruit, fill = list(Qty = "0"))

Output:

# A tibble: 6 × 3
  Name   Fruit Qty  
  <chr>  <chr> <chr>
1 Moigaw Apple 0    
2 Moigaw Pear  16   
3 Tauwen Apple 32   
4 Tauwen Pear  09   
5 Wikfae Apple 09   
6 Wikfae Pear  0    

You might, however, consider converting Qty to numeric first:

library(dplyr) # needed for mutate()

df |>
  mutate(Qty = as.numeric(Qty)) |>
  complete(Name, Fruit, fill = list(Qty = 0))

Output:

# A tibble: 6 × 3
  Name   Fruit   Qty
  <chr>  <chr> <dbl>
1 Moigaw Apple     0
2 Moigaw Pear     16
3 Tauwen Apple    32
4 Tauwen Pear      9
5 Wikfae Apple     9
6 Wikfae Pear      0

Data without typos:

library(readr)

df <- 
  read_table("Name Fruit Qty
              Tauwen Apple 32
              Tauwen Pear 09
              Wikfae Apple 09
              Moigaw Pear 16") 
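
If the data really does come from a .csv file, another option is to set the column type while reading, so that no separate conversion step is needed. This is only a sketch: it assumes a comma-delimited file, the inline csv_text string stands in for that file, and the names df_num and completed are made up for illustration.

```r
library(readr)
library(tidyr)

# Inline stand-in for the .csv file; the comma delimiter is an assumption.
csv_text <- "Name,Fruit,Qty
Tauwen,Apple,32
Tauwen,Pear,09
Wikfae,Apple,09
Moigaw,Pear,16"

# Declare Qty as a double so that "09" is read as 9 instead of staying text.
df_num <- read_csv(
  csv_text,
  col_types = cols(
    Name  = col_character(),
    Fruit = col_character(),
    Qty   = col_double()
  )
)

# Add every missing Name/Fruit combination with Qty = 0.
completed <- complete(df_num, Name, Fruit, fill = list(Qty = 0))
completed
```

With Qty already numeric, the fill value can be a plain 0 rather than the string "0".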

CodePudding user response:

I'm not sure I understand your question correctly, but it sounds like you want to surgically add specific rows to your data. This can be done with add_row. First I simulated your data:

#### Simulate Data ####
library(tibble) # tibble(), add_row()
library(dplyr)  # %>%

Name <- c("Tauwen","Tauwen","Wikfae","Moigaw")
Fruit <- c("Apple","Pear","Apple","Pear")
Qty <- c(32,09,09,16)
tib <- tibble(
  Name,
  Fruit,
  Qty
)

Then I added the rows: the first at a specific position (.before gives the row number to insert before), the second at the end.

#### Add Rows ####
tib %>% 
  add_row(Name = "Wikfae", 
          Fruit = "Pear",
          Qty = 0,
          .before = 4) %>% 
  add_row(Name = "Moigaw", 
          Fruit = "Apple",
          Qty = 0)

Giving you this:

# A tibble: 6 × 3
  Name   Fruit   Qty
  <chr>  <chr> <dbl>
1 Tauwen Apple    32
2 Tauwen Pear      9
3 Wikfae Apple     9
4 Wikfae Pear      0
5 Moigaw Pear     16
6 Moigaw Apple     0

If your goal is simply to fill in every Name/Fruit combination that is missing, using values already present in the data, then harre's answer with complete is likely the better choice.
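
If the row order matters (the desired output in the question keeps the names in their original order), one possible approach is to turn Name into a factor before completing, since complete expands factors in level order instead of alphabetically. This is a sketch only; df_ord and in_order are names invented here for illustration.

```r
library(dplyr)
library(tidyr)

# Same data as in the question, with Qty already numeric.
df_ord <- tibble(
  Name  = c("Tauwen", "Tauwen", "Wikfae", "Moigaw"),
  Fruit = c("Apple", "Pear", "Apple", "Pear"),
  Qty   = c(32, 9, 9, 16)
)

in_order <- df_ord %>%
  # Levels follow first appearance: Tauwen, Wikfae, Moigaw.
  mutate(Name = factor(Name, levels = unique(Name))) %>%
  # Missing Name/Fruit combinations are added with Qty = 0.
  complete(Name, Fruit, fill = list(Qty = 0)) %>%
  # Back to character once the ordering has been applied.
  mutate(Name = as.character(Name))
in_order
```

The names then appear as Tauwen, Wikfae, Moigaw. Within each name the fruits are still sorted alphabetically, so the added Apple row for Moigaw comes before its Pear row.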

CodePudding user response:

Here is the long way of @harre's preferable answer:

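library(dplyr)
library(tidyr)

# Assumes Qty is numeric. cur_data() and multi-row summarise() still run
# but are deprecated as of dplyr 1.1.0.
df %>% 
  group_by(Name) %>% 
  # pad every Name to two rows; .groups = "keep" keeps the data grouped
  summarise(cur_data()[seq(2),], .groups = "keep") %>% 
  # evaluated per Name: the missing fruit is whichever one is not present
  mutate(Fruit = if(first(Fruit) == "Pear") replace(Fruit, is.na(Fruit), "Apple") 
         else replace(Fruit, is.na(Fruit), "Pear")) %>% 
  ungroup() %>% 
  mutate(Qty = replace_na(Qty, 0)) %>% 
  select(Name, Fruit, Qty)

# A tibble: 6 × 3
  Name   Fruit   Qty
  <chr>  <chr> <int>
1 Moigaw Pear     16
2 Moigaw Apple     0
3 Tauwen Apple    32
4 Tauwen Pear      9
5 Wikfae Apple     9
6 Wikfae Pear      0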
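
A long way that does not hard-code the fruit names could build the full grid of combinations and join the data onto it, which is roughly the idea behind complete. This is a sketch under the assumption that Qty is numeric; df_long, grid and filled are illustrative names.

```r
library(dplyr)
library(tidyr)

df_long <- tibble(
  Name  = c("Tauwen", "Tauwen", "Wikfae", "Moigaw"),
  Fruit = c("Apple", "Pear", "Apple", "Pear"),
  Qty   = c(32, 9, 9, 16)
)

# Every Name x Fruit combination; crossing() de-duplicates and sorts its inputs.
grid <- crossing(Name = df_long$Name, Fruit = df_long$Fruit)

filled <- grid %>%
  # Combinations absent from the data get NA in Qty after the join...
  left_join(df_long, by = c("Name", "Fruit")) %>%
  # ...and those NAs are then replaced with 0.
  mutate(Qty = replace_na(Qty, 0))
filled
```

This works for any number of names and fruits, because nothing about "Apple" or "Pear" is written into the code.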