I have this data frame in R. It has the structure of a pedigree data frame, with the id, fid, mid and sex columns:
pedigree <- structure(list(id = c(212, 214, 263, 266, 273, 274, 275, 279,
280, 281, 286, 287, 312, 313, 314, 315, 316, 317, 318, 319, 320,
321, 322, 323, 324, 325, 326, 327, 332, 333, 334, 335, 336, 337,
338, 339, 340, 341, 346, 347, 348, 349, 389, 390, 391, 392, 413,
414, 415, 416, 466, 475, 476, 477, 478, 479, 480, 483, 486, 487,
491, 492, 493, 494, 498, 501, 502, 506, 507, 508, 509, 510, 511,
512, 513, 514, 518, 519, 542, 543, 544, 545, 546, 547, 551, 552,
553, 554, 555, 556, 564, 565, 568, 569, 570, 575, 576, 579, 580,
584, 585, 586, 589, 590, 593, 595, 596, 597, 598, 599, 614, 615,
616, 617, 618, 619, 620, 621, 622, 623, 624, 625, 653, 654, 662,
663, 671, 672, 673, 674, 675, 676, 681, 682, 683, 684, 688, 689,
693, 694, 695, 696, 697, 698, 701, 702, 703, 704, 709, 710, 715,
716, 718, 720, 721, 722, 723, 724, 725, 726, 727, 730, 731, 736,
737, 738, 739, 740, 744, 745, 842, 843, 874, 875, 884, 885, 886,
887, 889, 890, 894, 895, 896, 897, 898, 903, 905, 906, 907, 908,
909, 910, 911, 912, 913, 914, 915, 917, 925, 926, 927, 928, 929,
931, 932, 936, 965, 999, 1000, 1006, 1007, 1041, 1043, 1044,
1046, 1068, 1069, 1070, 1071, 1072, 1073, 1074, 1075, 1099, 1100,
1101, 1321, 1322, 1368, 1551, 1552, 1553, 1554, 1555), fid = c(0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 326, 326, 326, 326, 279, 320, 320, 320, 320, 320, 320,
320, 320, 320, 324, 324, 324, 324, 322, 322, 322, 324, 324, 324,
324, 324, 324, 324, 324, 324, 318, 318, 326, 326, 326, 326, 326,
326, 326, 326, 326, 326, 326, 326, 332, 332, 287, 287, 287, 287,
287, 286, 286, 346, 346, 346, 348, 348, 348, 326, 326, 326, 326,
326, 332, 332, 320, 320, 320, 320, 320, 287, 346, 346, 346, 346,
273, 273, 273, 273, 266, 334, 334, 334, 334, 334, 336, 336, 336,
336, 336, 336, 334, 334, 334, 334, 334, 334, 338, 338, 338, 338,
340, 340, 340, 338, 338, 334, 334, 334, 334, 334, 334, 334, 334,
314, 314, 314, 314, 314, 314, 314, 312, 312, 0, 0, 286, 286,
314, 314, 314, 314, 314, 314, 334, 334, 334, 334, 334, 389, 389,
389, 389, 389, 389, 389, 389, 389, 389, 389, 389, 338, 332, 332,
332, 332, 332, 332, 332, 346, 274, 391, 391, 391, 391, 0, 0,
0, 0, 316, 316, 316, 316, 316, 316, 316, 316, 842, 842, 842,
1041, 1041, 1041, 1043, 1043, 1043, 1043, 1043), mid = c(0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 327, 327, 327, 327, 275, 321, 321, 321, 321, 321, 321,
321, 321, 321, 325, 325, 325, 325, 323, 323, 323, 325, 325, 325,
325, 325, 325, 325, 325, 325, 319, 319, 327, 327, 327, 327, 327,
327, 327, 327, 327, 327, 327, 327, 333, 333, 212, 212, 212, 212,
212, 214, 214, 347, 347, 347, 349, 349, 349, 327, 327, 327, 327,
327, 333, 333, 321, 321, 321, 321, 321, 212, 347, 347, 347, 347,
281, 281, 281, 281, 263, 335, 335, 335, 335, 335, 337, 337, 337,
337, 337, 337, 335, 335, 335, 335, 335, 335, 339, 339, 339, 339,
341, 341, 341, 339, 339, 335, 335, 335, 335, 335, 335, 335, 335,
315, 315, 315, 315, 315, 315, 315, 313, 313, 0, 0, 214, 214,
315, 315, 315, 315, 315, 315, 335, 335, 335, 335, 335, 390, 390,
390, 390, 390, 390, 390, 390, 390, 390, 390, 390, 339, 333, 333,
333, 333, 333, 333, 333, 347, 280, 392, 392, 392, 392, 0, 0,
0, 0, 317, 317, 317, 317, 317, 317, 317, 317, 843, 843, 843,
1044, 1044, 1044, 1046, 1046, 1046, 1046, 1046), sex = structure(c(1L,
1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L,
1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 1L, 2L, 1L,
2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L,
2L, 2L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 2L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 1L, 2L,
2L, 2L, 2L, 2L, 2L, 1L, 2L, 1L, 1L, 1L, 2L, 2L, 2L, 1L, 1L, 1L,
1L, 1L, 2L, 2L, 2L, 1L, 1L, 2L, 2L, 1L, 1L, 2L, 2L, 2L, 1L, 2L,
2L, 2L, 2L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L,
2L, 2L, 2L, 1L, 2L, 1L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L,
1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L,
1L, 2L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 1L, 2L, 2L, 2L,
1L, 1L, 2L, 2L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 1L, 1L,
2L, 1L, 1L, 1L, 2L, 2L, 1L, 1L, 1L), levels = c("1", "2"), class = "factor")), row.names = c(NA,
-234L), class = c("tbl_df", "tbl", "data.frame"))
I am trying to do a pedigree analysis using pedtools. To convert this data frame into a ped object, I use as.ped(pedigree).
However, I get this malformed pedigree error:
as.ped(pedigree)
Error: Malformed pedigree.
Individual 287 is female, but appear as the father of 568
Individual 212 is male, but appear as the mother of 568
I checked the ids 568, 287 and 212, and everything seems properly assigned: 287 is the mother of 568 (it is included in fid), and similarly with 212.
As a convention, 1 refers to males and 2 to females.
What might be happening?
Answer:
You wrote: "I checked the ids 568, 287 and 212, and everything seems properly assigned: 287 is the mother of 568 (it is included in fid), and similarly with 212."
Looking at your dataset, the record for 568 is:
# A tibble: 1 x 4
id fid mid sex
<dbl> <dbl> <dbl> <fct>
1 568 287 212 1
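287 is in the fid column, which holds the father's ID, so it is read as the father of 568, not the mother. Since 287 is coded as female (2) and 212 as male (1), there is an error somewhere in the data: either fid and mid have been switched for this record, or the sex values of 287 and 212 have been swapped.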
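To check this directly, you can look up the record for 568 and then the sex codes of its listed parents. This is a minimal base R sketch on a hypothetical three-row excerpt typed in from the question's data, not the full pedigree:

```r
# Hypothetical excerpt of the question's data: 568 and its two listed parents
ped <- data.frame(
  id  = c(212, 287, 568),
  fid = c(0, 0, 287),
  mid = c(0, 0, 212),
  sex = factor(c("1", "2", "1"), levels = c("1", "2"))
)

# The record for 568: fid = 287, mid = 212
child <- ped[ped$id == 568, ]

# Sex codes of the listed father and mother (1 = male, 2 = female)
father_sex <- as.character(ped$sex[ped$id == child$fid])
mother_sex <- as.character(ped$sex[ped$id == child$mid])

father_sex  # "2": the listed father is coded female
mother_sex  # "1": the listed mother is coded male
```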
Edit: On further inspection, several records indicate 287 as the father and 212 as the mother, specifically:
# A tibble: 6 x 4
id fid mid sex
<dbl> <dbl> <dbl> <fct>
1 568 287 212 1
2 569 287 212 1
3 570 287 212 2
4 575 287 212 1
5 576 287 212 2
6 621 287 212 2
This may indicate that the sex values for 287 and 212 are incorrect (rather than fid and mid being swapped across several records), but you will need to examine your data source (or processing pipeline) to confirm.
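To list every record with this problem, not only the one shown in the error message, you can compare each listed parent's sex code with its role. A base R sketch; find_sex_conflicts() is a hypothetical helper name, and the toy data combines the 287/212 family from the question with a made-up, correctly coded family (ids 1 to 3):

```r
# Hypothetical toy data: 568-570 are children of 287 (fid) and 212 (mid),
# as in the question; ids 1-3 are a made-up family coded correctly
ped <- data.frame(
  id  = c(212, 287, 568, 569, 570, 1, 2, 3),
  fid = c(0, 0, 287, 287, 287, 0, 0, 1),
  mid = c(0, 0, 212, 212, 212, 0, 0, 2),
  sex = factor(c("1", "2", "1", "1", "2", "1", "2", "2"), levels = c("1", "2"))
)

# Return the rows whose listed father is not coded 1 (male)
# or whose listed mother is not coded 2 (female); 0 means "unknown parent"
find_sex_conflicts <- function(ped) {
  sex_of <- setNames(as.character(ped$sex), ped$id)  # lookup: id -> sex code
  father_sex <- sex_of[as.character(ped$fid)]
  mother_sex <- sex_of[as.character(ped$mid)]
  bad <- (ped$fid != 0 & father_sex != "1") | (ped$mid != 0 & mother_sex != "2")
  ped[which(bad), ]  # which() drops NA, e.g. parents missing from the id column
}

find_sex_conflicts(ped)
```

Run on the full pedigree, this shows how widespread the problem is before you decide how to fix it.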
Answer:
The problem is that individuals coded as males (1) are listed as mothers (in mid), and individuals coded as females (2) are listed as fathers (in fid). as.ped() only reports the first case it finds, which is why the error mentions only 568.
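To confirm that the reversal is systematic rather than limited to 287 and 212, you can tabulate the sex codes of everyone who appears in fid and in mid. If the coding is reversed throughout, fathers will all be coded 2 and mothers 1. A base R sketch on a small hypothetical pedigree with that pattern; parent_sex_table() is a hypothetical helper:

```r
# Hypothetical toy pedigree where every father is coded 2 and every mother 1
ped <- data.frame(
  id  = c(1, 2, 3, 4, 5, 6),
  fid = c(0, 0, 0, 0, 1, 3),
  mid = c(0, 0, 0, 0, 2, 4),
  sex = factor(c("2", "1", "2", "1", "1", "2"), levels = c("1", "2"))
)

# Count the sex codes of the distinct individuals listed as fathers and as mothers
parent_sex_table <- function(ped) {
  sex_of  <- setNames(as.character(ped$sex), ped$id)
  fathers <- unique(ped$fid[ped$fid != 0])
  mothers <- unique(ped$mid[ped$mid != 0])
  list(
    fathers = table(factor(sex_of[as.character(fathers)], levels = c("1", "2"))),
    mothers = table(factor(sex_of[as.character(mothers)], levels = c("1", "2")))
  )
}

tab <- parent_sex_table(ped)
tab$fathers  # both fathers are coded 2
tab$mothers  # both mothers are coded 1
```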
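You can rename the columns using colnames() and then run the code:
library(pedtools)
# Swap the names of columns 2 and 3: the old fid column becomes mid and vice versa
colnames(pedigree) = c("id", "mid", "fid", "sex")
as.ped(pedigree)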
You can also rename the columns in the data frame directly.
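Renaming the columns trusts the sex column and treats fid and mid as swapped; the alternative is to trust fid and mid and recode sex instead. Which one is right depends on your data source. Here is a base R sketch of both options on a hypothetical toy excerpt, with a hypothetical roles_consistent() check; as.ped() is not called, so it runs without pedtools. Swapping the values of the two columns has the same effect on the data as swapping their names.

```r
# Hypothetical toy excerpt with the father coded 2 and the mother coded 1
ped <- data.frame(
  id  = c(212, 287, 568),
  fid = c(0, 0, 287),
  mid = c(0, 0, 212),
  sex = factor(c("1", "2", "1"), levels = c("1", "2"))
)

# TRUE if every non-zero fid is coded 1 and every non-zero mid is coded 2
roles_consistent <- function(ped) {
  sex_of <- setNames(as.character(ped$sex), ped$id)
  fa_ok <- all(sex_of[as.character(ped$fid[ped$fid != 0])] == "1")
  mo_ok <- all(sex_of[as.character(ped$mid[ped$mid != 0])] == "2")
  isTRUE(fa_ok && mo_ok)  # NA (unknown parent id) counts as inconsistent
}

# Option 1: swap the parent columns (assignment is positional)
fix_swap_parents <- ped
fix_swap_parents[c("fid", "mid")] <- ped[c("mid", "fid")]

# Option 2: keep the parent columns, recode sex so that 1 <-> 2
fix_recode_sex <- ped
fix_recode_sex$sex <- factor(ifelse(ped$sex == "1", "2", "1"), levels = c("1", "2"))

roles_consistent(ped)               # FALSE
roles_consistent(fix_swap_parents)  # TRUE
roles_consistent(fix_recode_sex)    # TRUE
```

Both options make the pedigree internally consistent, but they give 568 a different sex, so only the data source can tell you which one is correct.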