How to create an ID column for duplicate rows based on data from another column?

Time:06-22

I have a dataset that looks like this:

  Study_ID   ear
1      100  Left
2      100 Right
3      200  Left
4      200 Right
5      300  Left
6      300 Right

Every patient appears twice (each Study_ID occurs in two rows), once for each ear (left and right). I want to create a new variable that identifies which row is for the left ear and which is for the right.

My desired output would look like this:

  Study_ID   ear ear_ID
1      100  Left  100_L
2      100 Right  100_R
3      200  Left  200_L
4      200 Right  200_R
5      300  Left  300_L
6      300 Right  300_R

The first part of the new variable is the study ID, and the second part is 'L' or 'R' for the left or right ear.

How can I go about doing this?

Reproducible Data:

data <- data.frame(
  Study_ID = c("100", "100", "200", "200", "300", "300"),
  ear = c("Left", "Right", "Left", "Right", "Left", "Right")
)

Answer:

# Paste Study_ID and the first letter of ear, separated by an underscore
transform(data, ear_ID = paste(Study_ID, substr(ear, 1, 1), sep = "_"))

  Study_ID   ear ear_ID
1      100  Left  100_L
2      100 Right  100_R
3      200  Left  200_L
4      200 Right  200_R
5      300  Left  300_L
6      300 Right  300_R
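
To see why this works, the one-liner can be broken into its two steps. This is only an illustrative sketch using the reproducible data from the question; the intermediate name `side` is made up here and is not part of the original answer.

```r
data <- data.frame(
  Study_ID = c("100", "100", "200", "200", "300", "300"),
  ear = c("Left", "Right", "Left", "Right", "Left", "Right")
)

# Step 1: take the first character of each ear label ("Left" -> "L", "Right" -> "R")
side <- substr(data$ear, 1, 1)

# Step 2: join Study_ID and that letter with an underscore
data$ear_ID <- paste(data$Study_ID, side, sep = "_")

data
```

Unlike transform(), which returns a modified copy, this version assigns the new column back into data, so the result persists.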

Note that in the tidyverse you can simply group by both columns (Study_ID and ear); each combination of the two then acts as a unique identifier for an ear, so a separate ID column may not be needed.
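
A minimal sketch of the tidyverse route, assuming the dplyr package (version 1.0.0 or later, for the `.groups` argument) is installed. The first part builds the same ear_ID column with mutate(); the second groups by both columns, where each group holds exactly one row for this data. The names `with_id`, `per_ear`, and `n_rows` are made up for illustration.

```r
library(dplyr)

data <- data.frame(
  Study_ID = c("100", "100", "200", "200", "300", "300"),
  ear = c("Left", "Right", "Left", "Right", "Left", "Right")
)

# Same ear_ID column as the base R answer, built with mutate()
with_id <- data %>%
  mutate(ear_ID = paste(Study_ID, substr(ear, 1, 1), sep = "_"))

# Grouping by both columns: every (Study_ID, ear) pair forms its own group
per_ear <- data %>%
  group_by(Study_ID, ear) %>%
  summarise(n_rows = n(), .groups = "drop")
```

If every value of n_rows is 1, the pair of columns already identifies each ear uniquely.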
