Home > OS >  R: How do I add columns from one dataframe to another?
R: How do I add columns from one dataframe to another?

Time:03-21

The 1st column of the subjects_153_ped dataframe corresponds to the Database_ID column of the ann dataframe. The 5th column of the subjects_153_ped dataframe corresponds to the sex column of the ann dataframe. The 6th column of the subjects_153_ped dataframecorresponds to theProfilecolumn of theann` dataframe.

Here, I pheno is the subset where: (1) 1st column (FID): Database_ID column (from ann clinical table) (2) 2nd column (IID): 1 (hardcoded) (3) 3rd column (PAT): 1 (hardcoded) (4) 4th column (MAT): 1 (hardcoded) (5) 5th column (SEX): sex column (from ann clinical table) (6) 6th column (PHENOTYPE): Profile column (from ann clinical table) (7) Column 7 onwards are info from the original subjects_153_ped dataframe

Desired file formatting: PED file format https://plink.readthedocs.io/en/latest/plink_fmt/ https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format

Expected output

FID IID PAT MAT SEX PHENOTYPE ALL OTHER COLUMNS
AC10 1 1 1 M Schiz. ALL OTHER COLUMNS
AC11 1 1 1 M Schiz. ALL OTHER COLUMNS
AC12 1 1 1 M Schiz. ALL OTHER COLUMNS
AC13 1 1 1 F BP ALL OTHER COLUMNS
> dput(ann[1:6,])
structure(list(Database_ID = c("AC10", "AC11", "AC12", "AC13",
"AC14", "AC15"), Collection_type = c("Array", "Array", "Array",
"Array", "Array", "Array"), Age = c(40L, 51L, 19L, 49L, 48L,
44L), Round_Age = c(40L, 50L, 20L, 50L, 50L, 45L), sex = c("M",
"M", "M", "F", "F", "F"), race = c("White", "White", "White",
"White", "White", "White"), Profile = c("Schiz.", "Schiz.", "Schiz.",
"BP", "BP", "Unaffected control"), DSM_IV = c(295.9, 295.9, 295.9,
296.44, 296.7, NA), Axis_I_Primary_Dx = c("SCHIZ UNDIFF", "SCHIZ UNDIFF",
"SCHIZ UNDIFF", "BP I", "BP I", "CONTROL"), PMI = c(34L, 43L,
28L, 19L, 18L, 28L), Brain_PH = c(6.18, 6.63, 6.73, 5.87, 6.5,
6.59), Left_Brain = c("Fixed", "Frozen", "Frozen", "Fixed", "Frozen",
"Fixed"), Suicide_Status = c("No", "No", "No", "Yes", "No", "No"
), Psychotic_Feature = c("Yes", "Yes", "Yes", "Yes", "Unknown",
"No"), Rate_Of_Death = c("Possible anoxia at death (e.g., CO poisoning or pneumonia)",
"Sudden", "Sudden", "Sudden", "Sudden", "Sudden"), Exacerbation = c("No evidence of exacerbation",
"No evidence of exacerbation", "No evidence of exacerbation",
"Definite", "No evidence of exacerbation", "Unknown"), Smoking_At_TOD = c("Yes",
"Yes", "No", "Yes", "Yes", "Unknown"), Lifetime_Alcohol_Use = c("Moderate drinking in present (same as above, for recent period)",
"Moderate drinking in present (same as above, for recent period)",
"Social (1?2 drinks/day)", "Little or none (less than 1 drink/day)",
"Social (1?2 drinks/day)", "Moderate drinking in present (same as above, for recent period)"
), Lifetime_Drug_Use = c("Moderate drug use in present", "Social",
"Heavy drug use in present", "Little or none", "Moderate drug use in present",
"Little or none"), Lifetime_Antipsychotics = c(75000L, 130000L,
2500L, 4000L, 0L, 0L)), row.names = c("1", "2", "3", "4", "5",
"6"), class = "data.frame")


> dput(subjects_153_ped[1:10,1:10])
structure(list(V1 = c("AC10", "AC11", "AC12", "AC13", "AC14",
"AC15", "AC17", "AC18", "AC19", "AC1"), V2 = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), V3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), V4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V5 = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), V6 = c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), V7 = c(2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L), V8 = c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L), V9 = c(1L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L), V10 = c(2L, 2L, 2L, 2L,
2L, 2L, 1L, 2L, 2L, 2L)), row.names = c(NA, 10L), class = "data.frame")

CodePudding user response:

Perhaps this helps - select and create new columns in ann with transmute, and left_join with the 'subjects_153_ped' using the 'FID' and 'V1' as by

library(dplyr)
ann %>% 
   transmute(FID = Database_ID, IID = 1, PAT = 1, MAT = 1, 
    SEX = sex, PHENOTYPE = Profile) %>%
   left_join(subjects_153_ped, by = c("FID" = "V1"))
   FID IID PAT MAT SEX          PHENOTYPE V2 V3 V4 V5 V6 V7 V8 V9 V10
1 AC10   1   1   1   M             Schiz.  1  0  0  1  1  2  2  1   2
2 AC11   1   1   1   M             Schiz.  1  0  0  1  1  2  2  1   2
3 AC12   1   1   1   M             Schiz.  1  0  0  1  1  1  1  1   2
4 AC13   1   1   1   F                 BP  1  0  0  1  1  1  2  1   2
5 AC14   1   1   1   F                 BP  1  0  0  1  1  2  2  2   2
6 AC15   1   1   1   F Unaffected control  1  0  0  1  1  2  2  2   2
  • Related