I am trying to figure out how to create new columns based on other columns and insert another column's value as observations in the newly created columns. Not sure how to clearly explain this in words so I will show an example.
Basically my data looks like this:
code | spec | grid | month | day | depth | number |
---|---|---|---|---|---|---|
01 | 31 | 1 | 6 | 17 | 5 | 1 |
01 | 33 | 1 | 6 | 17 | 5 | 2 |
01 | 45 | 1 | 6 | 17 | 10 | 15 |
02 | 45 | 12 | 6 | 17 | 10 | 34 |
02 | 45 | 12 | 7 | 19 | 15 | 1 |
03 | 31 | 15 | 8 | 27 | 15 | 30 |
03 | 33 | 16 | 9 | 13 | 20 | 34 |
03 | 31 | 18 | 10 | 17 | 25 | 100 |
My dataset has 2,514 rows.
The 'spec' column is the species code. I would like to create new columns with the actual species name (for each species code) and its respective number as observations(rows).
For example (simplified/mock data for privacy purposes). Let's say the species code: 31 == mosquito, 33 == lion, 45 == fish....etc. I would like my data to end up like this:
code | mosquito | lion | fish | grid | month | day | depth |
---|---|---|---|---|---|---|---|
01 | 1 | 0 | 0 | 1 | 6 | 17 | 5 |
01 | 0 | 2 | 0 | 1 | 6 | 17 | 5 |
01 | 0 | 0 | 15 | 1 | 6 | 17 | 10 |
02 | 0 | 0 | 34 | 12 | 6 | 17 | 10 |
02 | 0 | 0 | 1 | 12 | 7 | 19 | 15 |
03 | 30 | 0 | 0 | 15 | 8 | 27 | 15 |
03 | 0 | 34 | 0 | 16 | 9 | 13 | 20 |
03 | 100 | 0 | 0 | 18 | 10 | 17 | 25 |
- New column with the name of the species for each unique species code (there's probably about 15 species in my data so should have ~15 new columns)
- Each species should have its respective 'number', if the species was not counted it should have a '0'
- The 'number' and 'spec' columns can get deleted after this is all set and done. *Of course, the data needs to align to its respective day, month, code, grid, etc. so nothing else is changed/modified from original data
I have looked online for a few hours but I haven't been able to find a clear answer... I have found simpler solutions but not something that meets my question.
I have played around with case_when, if_else, and sapply but can't seem to make it work. I hope this question is clear. Happy to clarify further. Any suggestions? Thank you!
CodePudding user response:
You could do:
library(tidyverse)
df %>%
mutate(spec = case_when(
spec == 31 ~ 'mosquito',
spec == 33 ~ 'lion',
spec == 45 ~ 'fish',
TRUE ~ NA_character_),
id = row_number()) %>%
pivot_wider(names_from = spec, values_from = number) %>%
mutate(across(everything(), ~ replace_na(.x, 0)))
#> # A tibble: 8 x 9
#> code grid month day depth id mosquito lion fish
#> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#> 1 1 1 6 17 5 1 1 0 0
#> 2 1 1 6 17 5 2 0 2 0
#> 3 1 1 6 17 10 3 0 0 15
#> 4 2 12 6 17 10 4 0 0 34
#> 5 2 12 7 19 15 5 0 0 1
#> 6 3 15 8 27 15 6 30 0 0
#> 7 3 16 9 13 20 7 0 34 0
#> 8 3 18 10 17 25 8 100 0 0
Note that you need to map your other species to the spec
number inside case_when
Created on 2022-04-10 by the reprex package (v2.0.1)
Data from question in reproducible format:
``` r
df <- structure(list(code = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L), spec = c(31L,
33L, 45L, 45L, 45L, 31L, 33L, 31L), grid = c(1L, 1L, 1L, 12L,
12L, 15L, 16L, 18L), month = c(6L, 6L, 6L, 6L, 7L, 8L, 9L, 10L
), day = c(17L, 17L, 17L, 17L, 19L, 27L, 13L, 17L), depth = c(5L,
5L, 10L, 10L, 15L, 15L, 20L, 25L), number = c(1L, 2L, 15L, 34L,
1L, 30L, 34L, 100L)), class = "data.frame", row.names = c(NA,
-8L))
df
#> code spec grid month day depth number
#> 1 1 31 1 6 17 5 1
#> 2 1 33 1 6 17 5 2
#> 3 1 45 1 6 17 10 15
#> 4 2 45 12 6 17 10 34
#> 5 2 45 12 7 19 15 1
#> 6 3 31 15 8 27 15 30
#> 7 3 33 16 9 13 20 34
#> 8 3 31 18 10 17 25 100
CodePudding user response:
You could use
library(dplyr)
library(tidyr)
df1 %>%
left_join(df2, by = "spec") %>%
pivot_wider(values_from = "number", values_fill = 0) %>%
select(code, mosquito, lion, fish, grid, month, day, depth)
This returns
# A tibble: 8 x 8
code mosquito lion fish grid month day depth
<chr> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 01 1 0 0 1 6 17 5
2 01 0 2 0 1 6 17 5
3 01 0 0 15 1 6 17 10
4 02 0 0 34 12 6 17 10
5 02 0 0 1 12 7 19 15
6 03 30 0 0 15 8 27 15
7 03 0 34 0 16 9 13 20
8 03 100 0 0 18 10 17 25
I used two data frames: Your shown data
structure(list(code = c("01", "01", "01", "02", "02", "03", "03",
"03"), spec = c(31, 33, 45, 45, 45, 31, 33, 31), grid = c(1,
1, 1, 12, 12, 15, 16, 18), month = c(6, 6, 6, 6, 7, 8, 9, 10),
day = c(17, 17, 17, 17, 19, 27, 13, 17), depth = c(5, 5,
10, 10, 15, 15, 20, 25), number = c(1, 2, 15, 34, 1, 30,
34, 100)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -8L), spec = structure(list(cols = list(
code = structure(list(), class = c("collector_character",
"collector")), spec = structure(list(), class = c("collector_double",
"collector")), grid = structure(list(), class = c("collector_double",
"collector")), month = structure(list(), class = c("collector_double",
"collector")), day = structure(list(), class = c("collector_double",
"collector")), depth = structure(list(), class = c("collector_double",
"collector")), number = structure(list(), class = c("collector_double",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))
and a data.frame for converting spec
into the real species name:
structure(list(spec = c(31, 33, 45), name = c("mosquito", "lion",
"fish")), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), spec = structure(list(cols = list(
spec = structure(list(), class = c("collector_double", "collector"
)), name = structure(list(), class = c("collector_character",
"collector"))), default = structure(list(), class = c("collector_guess",
"collector")), skip = 1L), class = "col_spec"))