How can I generate unique ID numbers for each observation to match across panel data-CodePudding

I have a several dataframes containing different information about the same individuals which I would like to merge.

N = 1000
district <- rnorm(N)
region <- rnorm(N)
household_id <- rnorm(N) 
works <- rbinom(n = N, size = 1, prob = 0.05)
total_income <- rnorm(N) 
total_expenditure <- rnorm(N) 

df1 <- data.frame(district, region, household_id, works)
df2 <- data.frame(district, region, household_id, total_income)
df2 <- data.frame(district, region, household_id, total_expenditure)

Is there a way I can use the group_indices function, or any other way to generate unique ID number for each observation in order to match individuals in df1 with those in df2 and df3?

CodePudding user response：

To match the same rows with the same distric, region, and household_id, just make another variable, id for example, with:

df1$id <- paste0(df1$district, df1$region, df$household_id)

for each df and then merge them by this new id.

CodePudding user response：

I modified slightly your example: I guess your identifiers are the combination of district, region, and household_id. I give these variable categorical values:

district <- sample(letters,N,replace = T)
region <- sample(letters,N,replace = T)
household_id <- sample(letters,N,replace = T)

The easiest way: you define the table of all unique identifiers:

df <- unique(do.call(rbind,
               list(df1[,c("district", "region", "household_id")],
                    df2[,c("district", "region", "household_id")],
                    df3[,c("district", "region", "household_id")]))
       )

You can then create a unique ID:

df_corresp <- df %>% 
  mutate(ID = as.numeric(as.factor(paste0(district,region,household_id))))

That you can merge back in your original data frames:

df1 <- left_join(df1,df_corresp,by = c("district", "region", "household_id"))
df2 <- left_join(df2,df_corresp,by = c("district", "region", "household_id"))
df3 <- left_join(df3,df_corresp,by = c("district", "region", "household_id"))