I have a dataframe containing many observations of different taxa. I need to add a column holding an ID for each taxon, numbered from 1 to the number of distinct taxa. To illustrate, here is an example of my dataframe:
taxa; station_nom; x; y; density_m²;
Anax; station_1; x1; y1; 26;
Anax; station_2; x2; y2; 38;
Anopheles; station_1; x1; y1; 3;
Anopheles; station_2; x2; y2; 12;
Atrichopogon; station_3; x3; y3; 89;
[...]
I would like to add a new column named "CODE" that gives each taxon a fictional ID number, from 1 to the number of taxa:
taxa; station_nom; x; y; density_m²; CODE;
Anax; station_1; x1; y1; 26; 1;
Anax; station_2; x2; y2; 38; 1;
Anopheles; station_1; x1; y1; 3; 2;
Anopheles; station_2; x2; y2; 12; 2;
Atrichopogon; station_3; x3; y3; 89; 3;
I need all the "Anax" rows to have the same CODE (here 1), all the "Anopheles" rows to have [Anax CODE] + 1 (here 2), and so on.
I tried different things, and the closest is probably the function mutate() from the tidyverse. Here is one of my attempts, which works fine on other dataframes (where I have one observation per taxon). In my actual case, I have several observations for the same taxon.
Obs_emb <- BDD %>%
  group_by(embranchement_phylum_2, station_nom, x, y) %>%
  summarise(densite_m2 = round(mean(densite_par_m2)))
Obs_emb <- dplyr::mutate(Obs_emb, CODE = row_number())
This code adds a new column named "CODE", but the values do not increment from one taxon to the next.
I think it could be interesting to try a loop based on the change in name between taxa, but that is where my knowledge stops.
Can anyone help?
CodePudding user response:
With dplyr >= 1.0.0, you can group by the taxon column and use cur_group_id():
library(dplyr)
df <- tibble(val = c(1, 2, 3, 4),
             group = c("a", "a", "b", "b"))
gf <- group_by(df, group)
mutate(gf, ID = cur_group_id())
#> # A tibble: 4 x 3
#> # Groups: group [2]
#> val group ID
#> <dbl> <chr> <int>
#> 1 1 a 1
#> 2 2 a 1
#> 3 3 b 2
#> 4 4 b 2
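Applied to data shaped like the question's, a minimal sketch might look like the following. The data frame `obs` and its values are made up for illustration; only the column names `taxa`, `station_nom` and `CODE` come from the question. Two things to keep in mind: `summarise()` drops only the last grouping variable by default, so a later `mutate(CODE = row_number())` restarts within each remaining group (which would explain the missing incrementation), and `group_by()` sorts the group keys, so the IDs follow the alphabetical order of the taxa rather than their order of appearance.

```r
library(dplyr)

# Hypothetical data mirroring the question's layout (values are illustrative)
obs <- tibble(
  taxa        = c("Anax", "Anax", "Anopheles", "Anopheles", "Atrichopogon"),
  station_nom = c("station_1", "station_2", "station_1", "station_2", "station_3"),
  density_m2  = c(26, 38, 3, 12, 89)
)

coded <- obs %>%
  group_by(taxa) %>%               # one group per taxon
  mutate(CODE = cur_group_id()) %>% # same integer for every row of a group
  ungroup()                        # drop grouping so later steps see all rows

coded$CODE
```

If the table has already been summarised, calling `ungroup()` before adding the code column should avoid the per-group restart.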
CodePudding user response:
Try nesting by taxa, numbering the nested rows, then unnesting:
library(tidyverse)
tibble(taxa = c("a", "a", "b", "c"), value = 1:4) |>
  nest(data = -taxa) |>
  mutate(code = row_number()) |>
  unnest(cols = c(data))
#> # A tibble: 4 × 3
#> taxa value code
#> <chr> <int> <int>
#> 1 a 1 1
#> 2 a 2 1
#> 3 b 3 2
#> 4 c 4 3
Created on 2022-04-27 by the reprex package (v2.0.1)
The same approach also works when the taxa have different numbers of observations:
library(tidyverse)
coded <- tibble(taxa = c(rep("a", 100), rep("b", 10), rep("c", 10)), value = 1:120) |>
  nest(data = -taxa) |>
  mutate(code = row_number()) |>
  unnest(cols = c(data))
coded |> count(code)
#> # A tibble: 3 × 2
#> code n
#> <int> <int>
#> 1 1 100
#> 2 2 10
#> 3 3 10
CodePudding user response:
# Packages
library("tibble")
library("dplyr")
# Data
aa <- tibble(val = c(1,2,3,4), group = c("a", "a", "b", "b"))
# Use either Base R Pipe or Magrittr
aa |> mutate(x = match(group, unique(group)))
# A tibble: 4 x 3
# val group x
# <dbl> <chr> <int>
#1 1 a 1
#2 2 a 1
#3 3 b 2
#4 4 b 2
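One point that may matter when choosing between these approaches is the order in which IDs are handed out. The base R sketch below, using a made-up vector `taxa`, compares `match(x, unique(x))`, which numbers values in order of first appearance, with `as.integer(factor(x))`, which numbers them by sorted level (as an alternative, not something proposed in the answers above).

```r
# Hypothetical taxon labels, deliberately not in alphabetical order
taxa <- c("b", "a", "b", "c", "a")

# IDs in order of first appearance: "b" is seen first, so it gets 1
by_appearance <- match(taxa, unique(taxa))

# IDs by sorted factor level: "a" sorts first, so it gets 1
by_sorted <- as.integer(factor(taxa))

by_appearance  # 1 2 1 3 2
by_sorted      # 2 1 2 3 1
```

When the data are already sorted by taxon, as in the question's example, both give the same codes.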