I have an R query that I'm having trouble wrapping my head around. This is the table I have:
House | ownername | type | no. of babies |
---|---|---|---|
1 | Pete | cat | 1 |
1 | Tom | dog | 3 |
2 | Chrissa | cat | 4 |
2 | Mary | dog | 2 |
3 | Pete | cat | 6 |
3 | Pete | dog | 3 |
4 | Jamie | cat | 2 |
all | _ | cat | 13 |
all | _ | dog | 8 |
We can ignore ownername
, but House
, type
and no. of babies
are important. Essentially, each house has a cat and dog and its corresponding amount of offspring. Note that House 4 is missing a row of dogs as it has no dogs, but I want it to be there for the sake of the issue.
I tried the following code:
df %>% select(House, type, no. of babies)
Of course, this gets rid of the ownername
and everything remains the same, but I would like this to be the following output:
House | type | no. of babies |
---|---|---|
1 | cat | 1 |
1 | dog | 3 |
2 | cat | 4 |
2 | dog | 2 |
3 | cat | 6 |
3 | dog | 3 |
4 | cat | 2 |
4 | dog | 0 |
all | cat | 13 |
all | dog | 8 |
I'd like to have the row of the dog in House 4 showing, even if it's no. of babies row does not exist. I would like to figure out how to make it happen. Note, I don't want to synthetically create a row within the dataframe that adds a row of dogs in House 4 (I don't want it hard coded, plus ownername won't be consistent). I was hoping to achieve this through the select function I had copied above, with some additional mutation of some sort, but I'm having considerable troubles with this.
My best interpretation of what I'm trying to do is create a function that looks at the output table, iterates through houses 1-4 (and all) to ensure each have a value for animal1 and animal2 and if they don't, they add one and have no. of babies output as 0. In addition, if this can be done without hard-coding 'cat' and 'dog' (for example, later in the dataframe there is horse and mouse in the same format), that would really help too. If not, that is totally fine.
Please let me know of any solutions that you can find, I would appreciate it immeasurably.
CodePudding user response:
You need the complete()
function from the package tidyr
, which fills in implicit missing values.
Code
library(dplyr)
library(tidyr)
df %>%
select(House, type, `no. of babies`) %>%
complete(House, type, fill = list(`no. of babies` = 0))
# A tibble: 10 × 3
House type `no. of babies`
<chr> <chr> <int>
1 1 cat 1
2 1 dog 3
3 2 cat 4
4 2 dog 2
5 3 cat 6
6 3 dog 3
7 4 cat 2
8 4 dog 0
9 all cat 13
10 all dog 8
Input data
df <- read.table(header = T, check.names = F, sep = "\t", text = "
House ownername type no. of babies
1 Pete cat 1
1 Tom dog 3
2 Chrissa cat 4
2 Mary dog 2
3 Pete cat 6
3 Pete dog 3
4 Jamie cat 2
all _ cat 13
all _ dog 8")