Convert a dataframe into a ped object in R (pedtools)

Time:11-09

I have this dataframe in R. It has the structure of a pedigree dataframe, with the id, fid, mid and sex columns.

pedigree <- structure(list(id = c(212, 214, 263, 266, 273, 274, 275, 279, 
280, 281, 286, 287, 312, 313, 314, 315, 316, 317, 318, 319, 320, 
321, 322, 323, 324, 325, 326, 327, 332, 333, 334, 335, 336, 337, 
338, 339, 340, 341, 346, 347, 348, 349, 389, 390, 391, 392, 413, 
414, 415, 416, 466, 475, 476, 477, 478, 479, 480, 483, 486, 487, 
491, 492, 493, 494, 498, 501, 502, 506, 507, 508, 509, 510, 511, 
512, 513, 514, 518, 519, 542, 543, 544, 545, 546, 547, 551, 552, 
553, 554, 555, 556, 564, 565, 568, 569, 570, 575, 576, 579, 580, 
584, 585, 586, 589, 590, 593, 595, 596, 597, 598, 599, 614, 615, 
616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 653, 654, 662, 
663, 671, 672, 673, 674, 675, 676, 681, 682, 683, 684, 688, 689, 
693, 694, 695, 696, 697, 698, 701, 702, 703, 704, 709, 710, 715, 
716, 718, 720, 721, 722, 723, 724, 725, 726, 727, 730, 731, 736, 
737, 738, 739, 740, 744, 745, 842, 843, 874, 875, 884, 885, 886, 
887, 889, 890, 894, 895, 896, 897, 898, 903, 905, 906, 907, 908, 
909, 910, 911, 912, 913, 914, 915, 917, 925, 926, 927, 928, 929, 
931, 932, 936, 965, 999, 1000, 1006, 1007, 1041, 1043, 1044, 
1046, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1099, 1100, 
1101, 1321, 1322, 1368, 1551, 1552, 1553, 1554, 1555), fid = c(0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 326, 326, 326, 326, 279, 320, 320, 320, 320, 320, 320, 
320, 320, 320, 324, 324, 324, 324, 322, 322, 322, 324, 324, 324, 
324, 324, 324, 324, 324, 324, 318, 318, 326, 326, 326, 326, 326, 
326, 326, 326, 326, 326, 326, 326, 332, 332, 287, 287, 287, 287, 
287, 286, 286, 346, 346, 346, 348, 348, 348, 326, 326, 326, 326, 
326, 332, 332, 320, 320, 320, 320, 320, 287, 346, 346, 346, 346, 
273, 273, 273, 273, 266, 334, 334, 334, 334, 334, 336, 336, 336, 
336, 336, 336, 334, 334, 334, 334, 334, 334, 338, 338, 338, 338, 
340, 340, 340, 338, 338, 334, 334, 334, 334, 334, 334, 334, 334, 
314, 314, 314, 314, 314, 314, 314, 312, 312, 0, 0, 286, 286, 
314, 314, 314, 314, 314, 314, 334, 334, 334, 334, 334, 389, 389, 
389, 389, 389, 389, 389, 389, 389, 389, 389, 389, 338, 332, 332, 
332, 332, 332, 332, 332, 346, 274, 391, 391, 391, 391, 0, 0, 
0, 0, 316, 316, 316, 316, 316, 316, 316, 316, 842, 842, 842, 
1041, 1041, 1041, 1043, 1043, 1043, 1043, 1043), mid = c(0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 
0, 0, 327, 327, 327, 327, 275, 321, 321, 321, 321, 321, 321, 
321, 321, 321, 325, 325, 325, 325, 323, 323, 323, 325, 325, 325, 
325, 325, 325, 325, 325, 325, 319, 319, 327, 327, 327, 327, 327, 
327, 327, 327, 327, 327, 327, 327, 333, 333, 212, 212, 212, 212, 
212, 214, 214, 347, 347, 347, 349, 349, 349, 327, 327, 327, 327, 
327, 333, 333, 321, 321, 321, 321, 321, 212, 347, 347, 347, 347, 
281, 281, 281, 281, 263, 335, 335, 335, 335, 335, 337, 337, 337, 
337, 337, 337, 335, 335, 335, 335, 335, 335, 339, 339, 339, 339, 
341, 341, 341, 339, 339, 335, 335, 335, 335, 335, 335, 335, 335, 
315, 315, 315, 315, 315, 315, 315, 313, 313, 0, 0, 214, 214, 
315, 315, 315, 315, 315, 315, 335, 335, 335, 335, 335, 390, 390, 
390, 390, 390, 390, 390, 390, 390, 390, 390, 390, 339, 333, 333, 
333, 333, 333, 333, 333, 347, 280, 392, 392, 392, 392, 0, 0, 
0, 0, 317, 317, 317, 317, 317, 317, 317, 317, 843, 843, 843, 
1044, 1044, 1044, 1046, 1046, 1046, 1046, 1046), sex = structure(c(1L, 
1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L, 
2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L, 
1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L, 
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 
2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L), levels = c("1", "2"), class = "factor")), row.names = c(NA, 
-234L), class = c("tbl_df", "tbl", "data.frame"))

I am trying to do a pedigree analysis by using pedtools.

To convert this dataframe into a ped object, I call as.ped(pedigree).

However, I get this malformed pedigree error:

as.ped(pedigree)
Error: Malformed pedigree.
 Individual 287 is female, but appear as the father of 568
 Individual 212 is male, but appear as the mother of 568

I checked the ids 568, 287 and 212, but everything is properly assigned. This means that 287 is the mother of 568 (it is included in fid) and similarly with 212.

As a convention, 1 refers to males and 2 to females.

What might be happening?

CodePudding user response:

"I checked the ids 568, 287 and 212, but everything is properly assigned. This means that 287 is the mother of 568 (it is included in fid) and similarly with 212."

Looking at your dataset, the record for 568 is:

# A tibble: 1 x 4
     id   fid   mid sex  
  <dbl> <dbl> <dbl> <fct>
1   568   287   212 1 

287 is in the fid (father) column, not the mid (mother) column as you state, and its sex is recorded as 2 (female). There is an error somewhere in the data: either fid and mid have been switched here, or the sex values of 287 and 212 have been swapped.
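One way to see this kind of conflict directly is to look up the recorded sex of each row's father and mother and flag the rows where they disagree with the 1 = male, 2 = female convention. The following is a minimal base R sketch, not part of pedtools; the toy data frame `ped_df` and the helper variable names are assumptions for illustration and only mirror the three individuals discussed here.

```r
# Toy data mirroring individuals 212, 287 and 568 (not the full dataset).
ped_df <- data.frame(
  id  = c(212, 287, 568),
  fid = c(0, 0, 287),
  mid = c(0, 0, 212),
  sex = factor(c("1", "2", "1"), levels = c("1", "2"))
)

# Recorded sex of each row's father and mother.
# Founders have fid/mid = 0, which matches no id and therefore yields NA.
sex_chr    <- as.character(ped_df$sex)
father_sex <- sex_chr[match(ped_df$fid, ped_df$id)]
mother_sex <- sex_chr[match(ped_df$mid, ped_df$id)]

# Flag rows whose father is not coded male (1) or whose mother is not coded female (2).
bad_father <- !is.na(father_sex) & father_sex != "1"
bad_mother <- !is.na(mother_sex) & mother_sex != "2"
conflicts  <- ped_df[bad_father | bad_mother, ]

print(conflicts)
```

Running the same lookup on the full `pedigree` data frame should list every child record affected, not only 568.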

Edit: On further inspection, several records indicate 287 as the father and 212 as the mother, specifically:

# A tibble: 6 x 4
     id   fid   mid sex  
  <dbl> <dbl> <dbl> <fct>
1   568   287   212 1    
2   569   287   212 1    
3   570   287   212 2    
4   575   287   212 1    
5   576   287   212 2    
6   621   287   212 2   

This may indicate that the sex values for 287 and 212 are incorrect (rather than fid and mid being swapped across several records), but you will need to examine your data source (or processing pipeline) to confirm.
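To help decide between those two explanations, it can be useful to tabulate the recorded sex of everyone who appears as a father and of everyone who appears as a mother. If only a couple of parents are miscoded, their sex values are the likely culprit; if every father is coded 2 and every mother is coded 1, the problem is more likely systematic (swapped columns or a reversed sex coding). A rough base R sketch, using a hypothetical toy data frame `toy` rather than the real data:

```r
# Toy data: one couple (212, 287) and three of their children (not the full dataset).
toy <- data.frame(
  id  = c(212, 287, 568, 569, 570),
  fid = c(0, 0, 287, 287, 287),
  mid = c(0, 0, 212, 212, 212),
  sex = factor(c("1", "2", "1", "1", "2"), levels = c("1", "2"))
)

# Distinct individuals used as fathers and as mothers (0 means "no parent recorded").
fathers <- unique(toy$fid[toy$fid != 0])
mothers <- unique(toy$mid[toy$mid != 0])

# Counts of recorded sex among fathers and among mothers.
father_sex <- table(toy$sex[toy$id %in% fathers])
mother_sex <- table(toy$sex[toy$id %in% mothers])

print(father_sex)
print(mother_sex)
```

With the 1 = male, 2 = female convention, a clean pedigree would show all fathers under "1" and all mothers under "2".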

CodePudding user response:

The problem is that individuals recorded as male (1) are assigned as mothers, and individuals recorded as female (2) are assigned as fathers. The error message only reports the first case it encounters.

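You can swap the fid and mid column names using colnames and then run the code:

library(pedtools)
# Swap the fid and mid names (relies on the column order id, fid, mid, sex)
colnames(pedigree) <- c("id", "mid", "fid", "sex")
as.ped(pedigree)

You can also rename the columns in the data frame directly.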
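A positional rename like the one above depends on the columns being in exactly that order. A slightly more defensive variant, sketched below in base R, swaps the two names by looking them up rather than by position. The toy data frame `toy` and the name `swapped` are assumptions for illustration. Note that this swaps father and mother for every row, so it only makes sense if the mismatch is systematic across the whole dataset.

```r
# Toy data with fid and mid the "wrong way round" (not the full dataset).
toy <- data.frame(
  id  = c(212, 287, 568),
  fid = c(0, 0, 287),
  mid = c(0, 0, 212),
  sex = c(1L, 2L, 1L)
)

# Locate the two columns by name, then exchange their names.
swapped <- toy
idx <- match(c("fid", "mid"), names(swapped))
names(swapped)[idx] <- c("mid", "fid")

print(swapped)
# After the swap, 212 (male) is the father and 287 (female) is the mother of 568.
```

The result can then be passed to as.ped() in the same way as in the answer above.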
