The 1st column of the subjects_153_ped
dataframe corresponds to the Database_ID
column of the ann
dataframe.
The 5th column of the subjects_153_ped
dataframe corresponds to the sex
column of the ann
dataframe.
The 6th column of the subjects_153_ped
dataframecorresponds to the
Profilecolumn of the
ann` dataframe.
Here, I pheno
is the subset where:
(1) 1st column (FID): Database_ID
column (from ann
clinical table)
(2) 2nd column (IID): 1 (hardcoded)
(3) 3rd column (PAT): 1 (hardcoded)
(4) 4th column (MAT): 1 (hardcoded)
(5) 5th column (SEX): sex
column (from ann
clinical table)
(6) 6th column (PHENOTYPE): Profile
column (from ann
clinical table)
(7) Column 7 onwards are info from the original subjects_153_ped
dataframe
Desired file formatting: PED file format https://plink.readthedocs.io/en/latest/plink_fmt/ https://gatk.broadinstitute.org/hc/en-us/articles/360035531972-PED-Pedigree-format
Expected output
FID | IID | PAT | MAT | SEX | PHENOTYPE | ALL | OTHER | COLUMNS |
---|---|---|---|---|---|---|---|---|
AC10 | 1 | 1 | 1 | M | Schiz. | ALL | OTHER | COLUMNS |
AC11 | 1 | 1 | 1 | M | Schiz. | ALL | OTHER | COLUMNS |
AC12 | 1 | 1 | 1 | M | Schiz. | ALL | OTHER | COLUMNS |
AC13 | 1 | 1 | 1 | F | BP | ALL | OTHER | COLUMNS |
> dput(ann[1:6,])
structure(list(Database_ID = c("AC10", "AC11", "AC12", "AC13",
"AC14", "AC15"), Collection_type = c("Array", "Array", "Array",
"Array", "Array", "Array"), Age = c(40L, 51L, 19L, 49L, 48L,
44L), Round_Age = c(40L, 50L, 20L, 50L, 50L, 45L), sex = c("M",
"M", "M", "F", "F", "F"), race = c("White", "White", "White",
"White", "White", "White"), Profile = c("Schiz.", "Schiz.", "Schiz.",
"BP", "BP", "Unaffected control"), DSM_IV = c(295.9, 295.9, 295.9,
296.44, 296.7, NA), Axis_I_Primary_Dx = c("SCHIZ UNDIFF", "SCHIZ UNDIFF",
"SCHIZ UNDIFF", "BP I", "BP I", "CONTROL"), PMI = c(34L, 43L,
28L, 19L, 18L, 28L), Brain_PH = c(6.18, 6.63, 6.73, 5.87, 6.5,
6.59), Left_Brain = c("Fixed", "Frozen", "Frozen", "Fixed", "Frozen",
"Fixed"), Suicide_Status = c("No", "No", "No", "Yes", "No", "No"
), Psychotic_Feature = c("Yes", "Yes", "Yes", "Yes", "Unknown",
"No"), Rate_Of_Death = c("Possible anoxia at death (e.g., CO poisoning or pneumonia)",
"Sudden", "Sudden", "Sudden", "Sudden", "Sudden"), Exacerbation = c("No evidence of exacerbation",
"No evidence of exacerbation", "No evidence of exacerbation",
"Definite", "No evidence of exacerbation", "Unknown"), Smoking_At_TOD = c("Yes",
"Yes", "No", "Yes", "Yes", "Unknown"), Lifetime_Alcohol_Use = c("Moderate drinking in present (same as above, for recent period)",
"Moderate drinking in present (same as above, for recent period)",
"Social (1?2 drinks/day)", "Little or none (less than 1 drink/day)",
"Social (1?2 drinks/day)", "Moderate drinking in present (same as above, for recent period)"
), Lifetime_Drug_Use = c("Moderate drug use in present", "Social",
"Heavy drug use in present", "Little or none", "Moderate drug use in present",
"Little or none"), Lifetime_Antipsychotics = c(75000L, 130000L,
2500L, 4000L, 0L, 0L)), row.names = c("1", "2", "3", "4", "5",
"6"), class = "data.frame")
> dput(subjects_153_ped[1:10,1:10])
structure(list(V1 = c("AC10", "AC11", "AC12", "AC13", "AC14",
"AC15", "AC17", "AC18", "AC19", "AC1"), V2 = c(1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L, 1L), V3 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L,
0L, 0L), V4 = c(0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L, 0L), V5 = c(1L,
1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), V6 = c(1L, 1L, 1L, 1L, 1L,
1L, 1L, 1L, 1L, 1L), V7 = c(2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L,
2L), V8 = c(2L, 2L, 1L, 2L, 2L, 2L, 2L, 1L, 2L, 2L), V9 = c(1L,
1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L, 2L), V10 = c(2L, 2L, 2L, 2L,
2L, 2L, 1L, 2L, 2L, 2L)), row.names = c(NA, 10L), class = "data.frame")
CodePudding user response:
Perhaps this helps - select and create new columns in ann
with transmute
, and left_join
with the 'subjects_153_ped' using the 'FID' and 'V1' as by
library(dplyr)
ann %>%
transmute(FID = Database_ID, IID = 1, PAT = 1, MAT = 1,
SEX = sex, PHENOTYPE = Profile) %>%
left_join(subjects_153_ped, by = c("FID" = "V1"))
FID IID PAT MAT SEX PHENOTYPE V2 V3 V4 V5 V6 V7 V8 V9 V10
1 AC10 1 1 1 M Schiz. 1 0 0 1 1 2 2 1 2
2 AC11 1 1 1 M Schiz. 1 0 0 1 1 2 2 1 2
3 AC12 1 1 1 M Schiz. 1 0 0 1 1 1 1 1 2
4 AC13 1 1 1 F BP 1 0 0 1 1 1 2 1 2
5 AC14 1 1 1 F BP 1 0 0 1 1 2 2 2 2
6 AC15 1 1 1 F Unaffected control 1 0 0 1 1 2 2 2 2