I am trying to create a new column with a unique id that counts incrementally which is based on each client ID starting from a specific number i.e. 1102535 in R.
Here is my current data: enter image description here
Desired output:
CodePudding user response:
It's usually better to post a question with a reproducible sample of your data rather than screenshots. You can see some details about that here.
But to answer your question, the following code should work using the dplyr package:
# LOAD PACKAGE
library(dplyr)
# CREATE SAMPLE DATA
df <- tribble(~Client, ~Timepoint, ~Status,
100001,111,"Positive",100001,222,"Positive",100001,111,"Positive",100002,333,"Negative",100002,333,"Negative",100002,444,"Negative",100002,444,"Positive", 100004,555,"Positive",100004,555,"Negative",100004,666,"Positive",100004,666,"Positive",100005,777,"Negative",100005,777,"Positive",100005,777,"Positive",100006,888,"Negative",100006,999,"Negative")
# ADD ROW NUMBERS TO EACH DISTINCT CLIENT (PLUS YOUR CHOICE OF STARTING NUMBER)
# JOIN TO ORIGINAL DF
df |>
distinct(Client) |>
mutate(ID = row_number() 1102534, .before = client) |>
inner_join(df)
This should produce the following:
# A tibble: 16 × 4
ID Client Timepoint Status
<dbl> <dbl> <dbl> <chr>
1 1102535 100001 111 Positive
2 1102535 100001 222 Positive
3 1102535 100001 111 Positive
4 1102536 100002 333 Negative
5 1102536 100002 333 Negative
6 1102536 100002 444 Negative
7 1102536 100002 444 Positive
8 1102537 100004 555 Positive
9 1102537 100004 555 Negative
10 1102537 100004 666 Positive
11 1102537 100004 666 Positive
12 1102538 100005 777 Negative
13 1102538 100005 777 Positive
14 1102538 100005 777 Positive
15 1102539 100006 888 Negative
16 1102539 100006 999 Negative
CodePudding user response:
data.table solution
df <- df[
j = ID := base::as.numeric(base::interaction(Client,drop=TRUE)) 1102534
]
interaction()
makes it possible to create an unique ID
based on multiple variables if needed. Here, you only need Client
.
CodePudding user response:
We can use cur_group_id
to assign a group ID, then we can adjust the number based on the starting number. You have to subtract 1 as cur_group_id
will start at 1.
library(tidyverse)
start_num <- 1102535
df %>%
group_by(Client) %>%
mutate(ID = cur_group_id() start_num-1)
Output
Client Time ID
<dbl> <dbl> <dbl>
1 1000001 69.0 1102535
2 1000001 39.0 1102535
3 1000001 77.2 1102535
4 1000002 50.3 1102536
5 1000002 72.0 1102536
6 1000003 99.2 1102537
Data
df <- structure(list(Client = c(1000001, 1000001, 1000001, 1000002,
1000002, 1000003), Time = c(69.0152618191205, 39.02626810316,
77.2143005798571, 50.2722249664366, 72.0442323181778, 99.1987033882178
)), class = "data.frame", row.names = c(NA, -6L))