Home > OS >  How to create multiple new columns based on multiple conditions of other columns in R?
How to create multiple new columns based on multiple conditions of other columns in R?

Time:04-11

I am trying to figure out how to create new columns based on other columns and insert another column's value as observations in the newly created columns. Not sure how to clearly explain this in words so I will show an example.

Basically my data looks like this:

code spec grid month day depth number
01 31 1 6 17 5 1
01 33 1 6 17 5 2
01 45 1 6 17 10 15
02 45 12 6 17 10 34
02 45 12 7 19 15 1
03 31 15 8 27 15 30
03 33 16 9 13 20 34
03 31 18 10 17 25 100

My dataset has 2,514 rows.

The 'spec' column is the species code. I would like to create new columns with the actual species name (for each species code) and its respective number as observations(rows).

For example (simplified/mock data for privacy purposes). Let's say the species code: 31 == mosquito, 33 == lion, 45 == fish....etc. I would like my data to end up like this:

code mosquito lion fish grid month day depth
01 1 0 0 1 6 17 5
01 0 2 0 1 6 17 5
01 0 0 15 1 6 17 10
02 0 0 34 12 6 17 10
02 0 0 1 12 7 19 15
03 30 0 0 15 8 27 15
03 0 34 0 16 9 13 20
03 100 0 0 18 10 17 25
  1. New column with the name of the species for each unique species code (there's probably about 15 species in my data so should have ~15 new columns)
  2. Each species should have its respective 'number', if the species was not counted it should have a '0'
  3. The 'number' and 'spec' columns can get deleted after this is all set and done. *Of course, the data needs to align to its respective day, month, code, grid, etc. so nothing else is changed/modified from original data

I have looked online for a few hours but I haven't been able to find a clear answer... I have found simpler solutions but not something that meets my question.

I have played around with case_when, if_else, and sapply but can't seem to make it work. I hope this question is clear. Happy to clarify further. Any suggestions? Thank you!

CodePudding user response:

You could do:

library(tidyverse)

df %>% 
  mutate(spec = case_when(
      spec == 31 ~ 'mosquito', 
      spec == 33 ~ 'lion', 
      spec == 45 ~ 'fish',
      TRUE ~ NA_character_),
      id = row_number()) %>%
  pivot_wider(names_from = spec, values_from = number) %>%
  mutate(across(everything(), ~ replace_na(.x, 0)))
#> # A tibble: 8 x 9
#>    code  grid month   day depth    id mosquito  lion  fish
#>   <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>    <dbl> <dbl> <dbl>
#> 1     1     1     6    17     5     1        1     0     0
#> 2     1     1     6    17     5     2        0     2     0
#> 3     1     1     6    17    10     3        0     0    15
#> 4     2    12     6    17    10     4        0     0    34
#> 5     2    12     7    19    15     5        0     0     1
#> 6     3    15     8    27    15     6       30     0     0
#> 7     3    16     9    13    20     7        0    34     0
#> 8     3    18    10    17    25     8      100     0     0

Note that you need to map your other species to the spec number inside case_when

Created on 2022-04-10 by the reprex package (v2.0.1)


Data from question in reproducible format:

``` r
df <- structure(list(code = c(1L, 1L, 1L, 2L, 2L, 3L, 3L, 3L), spec = c(31L, 
33L, 45L, 45L, 45L, 31L, 33L, 31L), grid = c(1L, 1L, 1L, 12L, 
12L, 15L, 16L, 18L), month = c(6L, 6L, 6L, 6L, 7L, 8L, 9L, 10L
), day = c(17L, 17L, 17L, 17L, 19L, 27L, 13L, 17L), depth = c(5L, 
5L, 10L, 10L, 15L, 15L, 20L, 25L), number = c(1L, 2L, 15L, 34L, 
1L, 30L, 34L, 100L)), class = "data.frame", row.names = c(NA, 
-8L))

df
#>   code spec grid month day depth number
#> 1    1   31    1     6  17     5      1
#> 2    1   33    1     6  17     5      2
#> 3    1   45    1     6  17    10     15
#> 4    2   45   12     6  17    10     34
#> 5    2   45   12     7  19    15      1
#> 6    3   31   15     8  27    15     30
#> 7    3   33   16     9  13    20     34
#> 8    3   31   18    10  17    25    100

CodePudding user response:

You could use

library(dplyr)
library(tidyr)

df1 %>% 
  left_join(df2, by = "spec") %>% 
  pivot_wider(values_from = "number", values_fill = 0) %>% 
  select(code, mosquito, lion, fish, grid, month, day, depth)

This returns

# A tibble: 8 x 8
  code  mosquito  lion  fish  grid month   day depth
  <chr>    <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
1 01           1     0     0     1     6    17     5
2 01           0     2     0     1     6    17     5
3 01           0     0    15     1     6    17    10
4 02           0     0    34    12     6    17    10
5 02           0     0     1    12     7    19    15
6 03          30     0     0    15     8    27    15
7 03           0    34     0    16     9    13    20
8 03         100     0     0    18    10    17    25

I used two data frames: Your shown data

structure(list(code = c("01", "01", "01", "02", "02", "03", "03", 
"03"), spec = c(31, 33, 45, 45, 45, 31, 33, 31), grid = c(1, 
1, 1, 12, 12, 15, 16, 18), month = c(6, 6, 6, 6, 7, 8, 9, 10), 
    day = c(17, 17, 17, 17, 19, 27, 13, 17), depth = c(5, 5, 
    10, 10, 15, 15, 20, 25), number = c(1, 2, 15, 34, 1, 30, 
    34, 100)), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -8L), spec = structure(list(cols = list(
    code = structure(list(), class = c("collector_character", 
    "collector")), spec = structure(list(), class = c("collector_double", 
    "collector")), grid = structure(list(), class = c("collector_double", 
    "collector")), month = structure(list(), class = c("collector_double", 
    "collector")), day = structure(list(), class = c("collector_double", 
    "collector")), depth = structure(list(), class = c("collector_double", 
    "collector")), number = structure(list(), class = c("collector_double", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))

and a data.frame for converting spec into the real species name:

structure(list(spec = c(31, 33, 45), name = c("mosquito", "lion", 
"fish")), class = c("spec_tbl_df", "tbl_df", "tbl", "data.frame"
), row.names = c(NA, -3L), spec = structure(list(cols = list(
    spec = structure(list(), class = c("collector_double", "collector"
    )), name = structure(list(), class = c("collector_character", 
    "collector"))), default = structure(list(), class = c("collector_guess", 
"collector")), skip = 1L), class = "col_spec"))
  • Related