RStudio: Create an ID column in which integers are incremented gradually (from 1 to the number of taxa)

I have a data frame with many observations of different taxa. I need to add a column to it that holds an ID number from 1 to n(taxa), where n(taxa) is the number of distinct taxa. Here's an example of my data frame:


taxa;          station_nom;     x;        y;        density_m²;

Anax;          station_1;       x1;       y1;          26;
Anax;          station_2;       x2;       y2;          38; 
Anopheles;     station_1;       x1;       y1;          3; 
Anopheles;     station_2;       x2;       y2;          12;
Atrichopogon;  station_3;       x3;       y3;          89;
[...]

I would like to add a new column named "CODE" that holds an arbitrary ID number for each taxon, from 1 to the number of taxa:

taxa;          station_nom;     x;        y;        density_m²;       CODE;

Anax;          station_1;       x1;       y1;          26;             1;
Anax;          station_2;       x2;       y2;          38;             1;
Anopheles;     station_1;       x1;       y1;          3;              2;
Anopheles;     station_2;       x2;       y2;          12;             2;  
Atrichopogon;  station_3;       x3;       y3;          89;             3;

All the "Anax" rows must share the same CODE (here 1), all the "Anopheles" rows must have the Anax CODE + 1 (here 2), and so on.
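For reference, here is a minimal base R sketch of this requirement. It assumes a small toy data frame built from the example above, with x and y left out and density_m² renamed to the hypothetical density_m2 so it needs no backticks. Converting taxa to a factor whose levels are the taxa in order of first appearance makes each level's integer code the CODE:

```r
# Toy version of the example data (x and y omitted; density_m2 is a hypothetical name)
df <- data.frame(
  taxa        = c("Anax", "Anax", "Anopheles", "Anopheles", "Atrichopogon"),
  station_nom = c("station_1", "station_2", "station_1", "station_2", "station_3"),
  density_m2  = c(26, 38, 3, 12, 89)
)

# Each distinct taxon becomes one factor level, ordered by first appearance;
# the integer code of that level is used as the taxon's CODE
df$CODE <- as.integer(factor(df$taxa, levels = unique(df$taxa)))

df
```

Using `levels = sort(unique(df$taxa))` instead would number the taxa alphabetically rather than by first appearance.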

I tried different things, but the most promising seems to be the mutate() function from the tidyverse. Here's one of my attempts. It works fine on other data frames where I have one observation per taxon, but here I have several observations for the same taxon.

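library(dplyr)

# Mean density per phylum, station and coordinates
Obs_emb <- BDD %>%
  group_by(embranchement_phylum_2, station_nom, x, y) %>%
  summarise(densite_m2 = round(mean(densite_par_m2)))
# Number the rows (evaluated within whatever groups Obs_emb still has)
Obs_emb <- dplyr::mutate(Obs_emb, CODE = row_number())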

This code adds a new column named "CODE", but the values are not incremented.
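The likely reason is that summarise() only removes the last grouping variable (y), so Obs_emb stays grouped by phylum, station and x, and row_number() restarts at 1 inside each of those groups. The sketch below reproduces this with a small made-up BDD (all values are hypothetical) and then regroups by the taxon column alone before numbering:

```r
library(dplyr)

# Hypothetical raw data: two samples per taxon/station combination
BDD <- tibble(
  embranchement_phylum_2 = c("Anax", "Anax", "Anax", "Anax", "Anopheles", "Anopheles"),
  station_nom = c("station_1", "station_1", "station_2", "station_2", "station_1", "station_1"),
  x = c(1, 1, 2, 2, 1, 1),
  y = c(1, 1, 2, 2, 1, 1),
  densite_par_m2 = c(20, 32, 30, 46, 2, 4)
)

# .groups = "drop_last" is the default here; it is spelled out to show that
# only y is dropped from the grouping
Obs_emb <- BDD %>%
  group_by(embranchement_phylum_2, station_nom, x, y) %>%
  summarise(densite_m2 = round(mean(densite_par_m2)), .groups = "drop_last")

group_vars(Obs_emb)
# Each remaining group holds a single row, so row_number() is 1 everywhere
mutate(Obs_emb, CODE = row_number())$CODE

# Fix: regroup by the taxon column only (group_by() replaces the old groups),
# then give every row the ID of its group
Obs_coded <- Obs_emb %>%
  group_by(embranchement_phylum_2) %>%
  mutate(CODE = cur_group_id()) %>%
  ungroup()

Obs_coded
```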

I think I could try a loop that compares the taxon names from one row to the next... but my knowledge stops there.
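A loop along those lines can work, as long as all rows of a taxon sit next to each other (for example after sorting by taxon); if a taxon reappears further down in a separate block, it would get a second code. A minimal base R sketch, using a hypothetical taxa vector taken from the example: start at 1 and add 1 whenever the name differs from the previous row. The cumsum() line expresses the same idea without an explicit loop.

```r
# Hypothetical taxa column, already sorted so that each taxon is contiguous
taxa <- c("Anax", "Anax", "Anopheles", "Anopheles", "Atrichopogon")

# Loop version: copy the previous code and add 1 when the name changes
CODE <- integer(length(taxa))
CODE[1] <- 1L
for (i in seq_along(taxa)[-1]) {
  CODE[i] <- CODE[i - 1] + (taxa[i] != taxa[i - 1])
}

# Vectorised version: TRUE marks the first row of each new taxon,
# and the running sum of those TRUEs is the code
CODE_vec <- cumsum(c(TRUE, taxa[-1] != taxa[-length(taxa)]))

CODE
CODE_vec
```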

Can anyone help me?

Answer:

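With dplyr >= 1.0.0, you can group by the column that defines the IDs and use cur_group_id(), which gives every row the integer ID of its group: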

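library(dplyr)

df <- tibble(val = c(1, 2, 3, 4),
             group = c("a", "a", "b", "b"))

# Group by the column whose values should share an ID
gf <- group_by(df, group)

# cur_group_id() returns the same integer for every row of a group
mutate(gf, ID = cur_group_id())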

#> # A tibble: 4 x 3
#> # Groups:   group [2]
#>     val group    ID
#>   <dbl> <chr> <int>
#> 1     1 a         1
#> 2     2 a         1
#> 3     3 b         2
#> 4     4 b         2
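One detail worth knowing: cur_group_id() numbers groups in the sorted order of the grouping keys, not in the order in which the groups first appear in the data. A small sketch with hypothetical data where "b" comes first:

```r
library(dplyr)

df <- tibble(val = 1:4, group = c("b", "b", "a", "a"))

res <- df %>%
  group_by(group) %>%
  mutate(ID = cur_group_id()) %>%
  ungroup()

# "a" sorts before "b", so the "a" rows get ID 1 and the "b" rows get ID 2
res
```

When the taxa are already in alphabetical order, as in the question, both orders give the same codes.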

Answer:

Try this: nest the rows by taxon, number the nested rows with row_number(), then unnest them:

library(tidyverse)

tibble(taxa = c("a", "a", "b", "c"), value = 1:4) |> 
  nest(data = -taxa) |> 
  mutate(code = row_number()) |> 
  unnest(cols = c(data))
#> # A tibble: 4 × 3
#>   taxa  value  code
#>   <chr> <int> <int>
#> 1 a         1     1
#> 2 a         2     1
#> 3 b         3     2
#> 4 c         4     3

Created on 2022-04-27 by the reprex package (v2.0.1)

The same approach also works when the taxa have very different numbers of rows:

library(tidyverse)

coded <- tibble(taxa = c(rep("a", 100), rep("b", 10), rep("c", 10)), value = 1:120) |> 
  nest(data = -taxa) |> 
  mutate(code = row_number()) |> 
  unnest(cols = c(data))

coded |> count(code)
#> # A tibble: 3 × 2
#>    code     n
#>   <int> <int>
#> 1     1   100
#> 2     2    10
#> 3     3    10

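A lighter variant of the same idea, without list-columns: build a lookup table with one row per taxon using distinct(), number it with row_number(), and join it back onto the observations. A sketch with hypothetical data (obs, codes and obs_coded are illustrative names):

```r
library(dplyr)

obs <- tibble(taxa = c("b", "a", "b", "c", "a"), value = 1:5)

# One row per taxon, in order of first appearance, numbered 1..n
codes <- obs %>%
  distinct(taxa) %>%
  mutate(code = row_number())

# Attach the code to every observation; left_join() keeps obs' row order
obs_coded <- left_join(obs, codes, by = "taxa")

obs_coded
```

The codes match what the nest()/unnest() version assigns (b = 1, a = 2, c = 3), while the rows stay in their original order instead of being regrouped by taxon.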

Answer:

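# Packages
library("tibble")
library("dplyr")
# Data
aa <- tibble(val = c(1,2,3,4), group = c("a", "a", "b", "b"))

# Works with the native pipe |> (R >= 4.1) or magrittr's %>%
# match() returns, for each value, its position in unique(group),
# so groups are numbered in order of first appearance
aa |> mutate(x = match(group, unique(group)))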
# A tibble: 4 x 3
#    val group     x
#  <dbl> <chr> <int>
#1     1 a         1
#2     2 a         1
#3     3 b         2
#4     4 b         2
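Because match() looks each value up in unique(group), the IDs follow the order of first appearance. Matching against sort(unique(group)) instead gives alphabetical IDs, which for simple keys like these is the same order cur_group_id() uses. A base R sketch with a hypothetical vector:

```r
group <- c("b", "b", "a", "c", "a")

# IDs in order of first appearance: b = 1, a = 2, c = 3
id_appearance <- match(group, unique(group))

# IDs in alphabetical order: a = 1, b = 2, c = 3
id_sorted <- match(group, sort(unique(group)))

id_appearance
id_sorted
```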