How to create unique ID in R starting from a specific number?-CodePudding

I am trying to create a new column with a unique id that counts incrementally which is based on each client ID starting from a specific number i.e. 1102535 in R.

Here is my current data: enter image description here

Desired output:

enter image description here

CodePudding user response：

It's usually better to post a question with a reproducible sample of your data rather than screenshots. You can see some details about that here.

But to answer your question, the following code should work using the dplyr package:

# LOAD PACKAGE

library(dplyr) 

# CREATE SAMPLE DATA
df <- tribble(~Client, ~Timepoint, ~Status,
              100001,111,"Positive",100001,222,"Positive",100001,111,"Positive",100002,333,"Negative",100002,333,"Negative",100002,444,"Negative",100002,444,"Positive", 100004,555,"Positive",100004,555,"Negative",100004,666,"Positive",100004,666,"Positive",100005,777,"Negative",100005,777,"Positive",100005,777,"Positive",100006,888,"Negative",100006,999,"Negative")

# ADD ROW NUMBERS TO EACH DISTINCT CLIENT (PLUS YOUR CHOICE OF STARTING NUMBER)
# JOIN TO ORIGINAL DF

df |>
  distinct(Client) |> 
  mutate(ID = row_number() 1102534, .before = client) |>
  inner_join(df)

This should produce the following:

  # A tibble: 16 × 4
        ID Client Timepoint Status  
     <dbl>  <dbl>     <dbl> <chr>   
 1 1102535 100001       111 Positive
 2 1102535 100001       222 Positive
 3 1102535 100001       111 Positive
 4 1102536 100002       333 Negative
 5 1102536 100002       333 Negative
 6 1102536 100002       444 Negative
 7 1102536 100002       444 Positive
 8 1102537 100004       555 Positive
 9 1102537 100004       555 Negative
10 1102537 100004       666 Positive
11 1102537 100004       666 Positive
12 1102538 100005       777 Negative
13 1102538 100005       777 Positive
14 1102538 100005       777 Positive
15 1102539 100006       888 Negative
16 1102539 100006       999 Negative

CodePudding user response：

data.table solution

df <- df[
  j = ID := base::as.numeric(base::interaction(Client,drop=TRUE))   1102534
]

interaction() makes it possible to create an unique ID based on multiple variables if needed. Here, you only need Client.

CodePudding user response：

We can use cur_group_id to assign a group ID, then we can adjust the number based on the starting number. You have to subtract 1 as cur_group_id will start at 1.

library(tidyverse)

start_num <- 1102535

df %>%
  group_by(Client) %>%
  mutate(ID = cur_group_id() start_num-1)

Output

   Client  Time      ID
    <dbl> <dbl>   <dbl>
1 1000001  69.0 1102535
2 1000001  39.0 1102535
3 1000001  77.2 1102535
4 1000002  50.3 1102536
5 1000002  72.0 1102536
6 1000003  99.2 1102537

Data

df <- structure(list(Client = c(1000001, 1000001, 1000001, 1000002, 
1000002, 1000003), Time = c(69.0152618191205, 39.02626810316, 
77.2143005798571, 50.2722249664366, 72.0442323181778, 99.1987033882178
)), class = "data.frame", row.names = c(NA, -6L))