Is there a way I can code a data set utilising only the first number assigned to an individual?

Time:06-27

Basically, I am using ringing data for a project I'm working on, and I have been asked to assign a unique ID to each individual. The idea is to use the first ring number the individual was given. Unfortunately, the data has been supplied in a format that makes this first number hard to get at. It looks like this:

ID Ring Number    Previous ring number
7829              3340
5689              0
2543              6789
3340              2799

So, for example, despite the latest ring number being 7829, I would want the value 2799 to be used as this individual's unique ID. I originally tried to set it up so that if there was a non-zero number in the "Previous ring number" column, the unique ID would use that. But that doesn't account for an individual changing rings multiple times. Is there a way I can use the first ring number as the unique ID? Selecting it manually isn't an option, as there are 50,000 entries. I'm still not very experienced with R, so apologies if this is an easy question.

Edit: I have realised I made a mistake when explaining what I want the outcome to look like. Essentially, I want a new column holding the unique IDs added to this table, so it would look like this:

ID Ring Number    Previous ring number    Unique ID
7829              3340                    2799
5689              0                       5689
2543              6789                    6789
3340              2799                    2799

CodePudding user response:

Thanks to @stefan, we perhaps have a solution.

Your data

# A tibble: 4 × 2
     ID Previous
  <dbl>    <dbl>
1  7829     3340
2  5689        0
3  2543     6789
4  3340     2799

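library(dplyr)

df <- tibble(ID = c(7829, 5689, 2543, 3340),
             Previous = c(3340, 0, 6789, 2799))

df %>%
  mutate(result =
           case_when(Previous %in% ID ~ Previous[match(Previous, ID)],  # previous ring was itself a replacement: step back once
                     Previous == 0 ~ ID,                                 # never re-ringed: keep its own number
                     TRUE ~ Previous))                                   # previous ring is the first ring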

Output

# A tibble: 4 × 3
     ID Previous result
  <dbl>    <dbl>  <dbl>
1  7829     3340   2799
2  5689        0   5689
3  2543     6789   6789
4  3340     2799   2799
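The `case_when()` call steps back through one ring change per row, which is enough for the sample data but not for a bird that was re-ringed three or more times. Below is a minimal base R sketch of one way to follow a chain of any length. The helper name `first_ring` and the column name `UniqueID` are made up for illustration, and the sketch assumes each ring number appears at most once in the `ID` column and that 0 always means "no previous ring".

```r
# Hypothetical helper (not part of the answer above): follow each ring back
# through `previous` until a ring with no earlier record is reached.
first_ring <- function(id, previous) {
  # With no previous ring (0), the first ring is the current one.
  current <- ifelse(previous == 0, id, previous)
  # A chain cannot be longer than the number of rows, so this loop always
  # stops, even if the data accidentally contains a cycle.
  for (i in seq_along(id)) {
    # Look each candidate up among the IDs; NA means it never replaced another ring.
    older <- previous[match(current, id)]
    step <- !is.na(older) & older != 0
    if (!any(step)) break
    current[step] <- older[step]
  }
  current
}

df <- data.frame(
  ID       = c(7829, 5689, 2543, 3340),
  Previous = c(3340, 0, 6789, 2799)
)
df$UniqueID <- first_ring(df$ID, df$Previous)
df
```

Because every pass is vectorised over all rows, the loop runs only as many times as the longest chain of ring changes, so 50,000 rows should not be a problem.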

CodePudding user response:

You could use the neighborhood function from igraph in mode = 'out' to find the ancestor of each ID:

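library(igraph)

data <- read.table(text = "
'ID Ring Number'    'Previous ring number'
7829    3340
5689    0
2543    6789
3340    2799", header = TRUE)

# Each row becomes an edge from a ring number to the ring it replaced.
g <- graph_from_data_frame(data)

# For every ID, collect all vertices reachable by following edges outwards.
neighbors <- neighborhood(g,
                          order = vcount(g),
                          nodes = V(g)[match(data$ID.Ring.Number, names(V(g)))],
                          mode = "out")

# The last vertex reached is the oldest ring in the chain.
ancestors <- vapply(neighbors, function(x) x[[length(x)]]$name, character(1))

data$`Unique ID` <- as.numeric(ancestors)

# 0 means "no previous ring", so those individuals keep their own ring number.
no_prev <- data$Previous.ring.number == 0
data$`Unique ID`[no_prev] <- data$ID.Ring.Number[no_prev]

data
#>   ID.Ring.Number Previous.ring.number Unique ID
#> 1           7829                 3340      2799
#> 2           5689                    0      5689
#> 3           2543                 6789      6789
#> 4           3340                 2799      2799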