been wracking my head trying to replicate a solution to a sql exercise on pandas. the exercise contains creating 1 percentile bins using the NTILE function in order to calculate some metrics.
source Dset looks like this and the percentile i want to divide by is the measure_value column :
this is what i want to get to (first create the "Ntile" equivalent and then group by it, agging with min max and count).
tried using qcut but encountered an error because of similar values.
code attempted :
cuts = 100
weightdf = healthdf[healthdf["measure"] == "weight"].copy()
weightdf["Ntile"] = pd.qcut(weightdf["measure_value"].sort_values(), cuts , labels = range(1,
cuts 1))
this created a " Bin edges must be unique" error, changing the duplicates argument didnt work. hopefully someone can help, i'm certain there is a way to solve it. thanks!
edit target column output in response to david kaftan's request
{0: 46.03959,
7: 129.060012817,
84: 114.758776,
97: 39.916096,
132: 114.8494944,
133: 69.39963261,
139: 73.93555631,
143: 73.93555631,
148: 75.069537235,
149: 68.0388,
177: 74.389088,
181: 34.926582,
188: 115.1670088,
189: 75.29633342,
202: 115.9381152,
212: 34.0194,
214: 89.819007874,
216: 91.295005798,
218: 65.317248,
220: 80.28584949,
222: 123.1,
223: 123.6,
224: 123.0,
225: 122.7,
229: 122.7,
230: 121.0,
231: 121.7,
243: 64.9543744,
244: 65.0450928,
257: 38.55532,
261: 64.6822192,
265: 79.151868565,
267: 79.151868565,
271: 93.0,
272: 91.4,
279: 63.50288,
282: 128.820128,
301: 36.28736,
304: 63.049288,
346: 65.77084,
369: 119.8390064,
388: 64.863656,
396: 78.698276195,
400: 79.151868565,
401: 79.151868565,
415: 78.698276195,
416: 78.698276195,
419: 78.92507238,
420: 78.47148001,
424: 78.698276195,
463: 63.50288,
478: 88.7225952,
491: 116.6638624,
499: 119.9,
501: 115.0309312,
505: 102.96547699,
512: 65.1358112,
518: 78.698276195,
519: 78.698276195,
520: 78.698280334,
532: 70.6696336,
534: 69.39963261,
540: 75.8,
558: 63.50288,
574: 94.0,
590: 79.151868565,
610: 94.5,
614: 63.50288,
631: 57.6,
659: 63.50288,
674: 38.101727,
676: 108.86208,
717: 70.0,
756: 125.917145,
780: 118.6596672,
865: 79.83225712,
872: 80.73944186,
925: 42.63765,
946: 68.12952,
1007: 68.0388,
1066: 74.389088,
1067: 74.389088,
1068: 74.389088,
1069: 74.389088,
1070: 74.389088,
1131: 69.5,
1164: 92.532768,
1168: 78.698212,
1172: 64.410064,
1255: 79.832192,
1267: 65.77084,
1271: 65.317248,
1272: 65.77084,
1273: 65.77084,
1282: 65.77084,
1291: 65.317248,
1300: 66.224432,
1314: 121.750007629,
1315: 119.1,
1318: 122.900009155,
1385: 69.5,
1387: 110.5,
1496: 40.36969,
1536: 119.1,
1541: 122.750007629,
1543: 122.850006104,
1642: 91.1,
1643: 90.4,
1671: 57.6,
1699: 74.615884,
1700: 75.069476,
1779: 66.678024,
1784: 52.7527496,
1789: 105.46014,
1790: 68.9913432,
1791: 129.071,
1792: 128.935012817,
1793: 68.49244787,
1794: 68.49244787,
1799: 65.77084,
1852: 68.8099064,
1905: 59.42055,
1981: 64.863656,
1995: 92.079176,
1998: 74.0262144,
1999: 73.3458264,
2000: 73.481904,
2018: 31.3885664,
2019: 31.0256928,
2022: 65.317248,
2027: 136.036010742,
2030: 65.77084,
2040: 84.368112,
2108: 42.184055,
2139: 38.1,
2287: 124.100006104,
2293: 123.350006104,
2297: 119.1,
2298: 119.1,
2341: 119.1,
2356: 97.965065785,
2358: 98.127905446,
2438: 85.27536556,
2494: 69.12742,
2496: 68.40167,
2497: 67.67593,
2540: 96.61517481,
2542: 96.433737862,
2546: 96.705893284,
2599: 173.725736,
2607: 75.0,
2654: 80.0,
2696: 53.0,
2700: 69.2181392,
2703: 68.5831104,
2873: 70.30676,
2879: 73.481904,
2881: 31.3432072,
2882: 65.77084,
2886: 66.224432,
2892: 65.77084,
2914: 38.101728,
2930: 134.263232,
2947: 72.0,
2951: 99.79024,
2981: 70.0,
2986: 109.769264,
2987: 29.029888,
2994: 15.422128,
2995: 27.21552,
3005: 29.48348,
3006: 58.96696,
3010: 32.658623,
3032: 101.198958525,
3037: 136.531192,
3039: 32.205032,
3050: 25.854744,
3064: 85.0,
3065: 84.0,
3069: 96.070863966,
3071: 96.433737862,
3072: 95.889427018,
3076: 95.70799007,
3079: 97.159485654,
3087: 100.243832,
3088: 82.100152,
3093: 81.64656,
3097: 102.798948211,
3107: 20.865232,
3109: 93.439952,
3110: 78.4,
3111: 102.421074,
3149: 128.504013062,
3150: 129.27382,
3151: 128.50272,
3152: 130.40192,
3153: 130.402008057,
3154: 129.288009644,
3164: 102.398950789,
3172: 97.0,
3179: 109.4063904,
3185: 83.91452,
3214: 68.492392,
3216: 108.0909736,
3221: 66.224432,
3226: 100.243832,
3257: 42.184055,
3286: 75.6137864,
3310: 45.812794,
3313: 44.452015,
3319: 40.82328,
3321: 97.0,
3326: 97.25,
3330: 41.27687,
3332: 42.63765,
3376: 44.90561,
3397: 68.49244787,
3398: 68.49244787,
3406: 68.49244787,
3408: 68.49244787,
3409: 68.49244787,
3411: 68.49244787,
3413: 68.49244787,
3415: 122.4,
3479: 112.490816,
3515: 34.926582,
3527: 76.430314345,
3539: 119.0,
3541: 118.0,
3547: 76.20351816,
3556: 64.2286272,
3573: 68.49244787,
3579: 130.38014,
3580: 131.43384,
3581: 129.844,
3586: 128.302001953,
3588: 131.434005737,
3589: 130.542007446,
3591: 130.380004883,
3594: 129.762008667,
3595: 130.54207,
3597: 129.86485,
3598: 130.08621,
3599: 130.086013794,
3600: 130.539001465,
3601: 129.865005493,
3604: 130.5389,
3605: 131.562011719,
3606: 131.42204,
3607: 133.11394,
3608: 133.114013672,
3613: 130.89587,
3614: 131.154006958,
3616: 134.708007812,
3617: 134.350006104,
3618: 134.35,
3622: 76.430314345,
3624: 64.863656,
3625: 76.883906715,
3636: 65.0450928,
3637: 76.883906715,
3638: 76.883906715,
3648: 78.017824,
3658: 94.982239509,
3670: 133.53804,
3672: 67.58526313,
3673: 67.58526313,
3676: 67.58526313,
3678: 133.76485,
3681: 87.3,
3685: 132.736,
3686: 133.222000122,
3687: 132.73600769,
3690: 132.866,
3691: 131.011,
3692: 132.1251,
3693: 132.78418,
3694: 131.93187,
3695: 132.125,
3696: 131.998001099,
3697: 131.523,
3698: 131.992,
3706: 64.1379088,
3723: 122.197685,
3768: 64.9543744,
3770: 67.58526313,
3786: 62.142104,
3788: 62.595696,
3792: 62.142104,
3794: 62.595696,
3798: 62.142104,
3801: 62.142104,
3802: 38.55532,
3809: 96.524456336,
3810: 96.343016619,
3811: 96.343016619,
3812: 96.343019388,
3813: 96.343016619,
3838: 64.5915008,
3846: 98.610978469,
3848: 118.387512,
3858: 65.1358112,
3859: 64.863656,
3860: 65.0450928,
3861: 65.0450928,
3862: 64.6822192,
3868: 67.58526313,
3875: 98.610978469,
3876: 98.610978469,
3877: 124.5,
3878: 123.4,
3881: 64.863656,
3889: 96.52445772,
3891: 79.151804,
3915: 122.6,
3916: 122.6,
3922: 31.297848,
3923: 31.3885664,
3924: 31.3432072,
3926: 65.0450928,
3935: 79.151868565,
3936: 79.151868565,
3939: 56.3,
3952: 62.595696,
3957: 62.142104,
3962: 61.688512,
3965: 63.049288,
3966: 62.595696,
3968: 62.595696,
3972: 61.688512,
3973: 62.142104,
3977: 64.5007824,
3978: 64.5915008,
3980: 64.863656,
3990: 38.55532,
4004: 121.9,
4056: 64.863656,
4106: 35.380177,
4113: 38.55532,
4115: 92.98636,
4119: 62.142104,
4120: 63.049288,
4121: 123.2863056,
4138: 64.6822192,
4141: 64.9543744,
4179: 56.2,
4182: 95.85,
4888: 34.926582,
4891: 131.995272,
4897: 74.84268,
4899: 36.74095,
4901: 37.194546,
4915: 37.194546,
4919: 99.427447504,
4959: 64.5915008,
4960: 65.0450928,
4966: 65.0450928,
4967: 80.28584949,
4971: 65.317248,
4973: 64.6822192,
4975: 121.4719376,
4990: 54.099960447,
4992: 35.833767,
4995: 36.74095,
4997: 34.0194,
4998: 62.595696,
5006: 65.317248,
5008: 87.6340445,
5009: 87.724761589,
5060: 75.1601944,
5083: 74.38914868,
5102: 114.985572,
5157: 69.39963261,
5158: 74.162292,
5160: 75.29633342,
5189: 115.439164,
5199: 73.935496,
5209: 67.222389234,
5229: 47.62716,
5234: 69.39963261,
5427: 66.5,
5479: 62.595696,
5497: 98.157388868,
5506: 80.28584949,
5508: 65.1358112,
5515: 80.28584949,
5533: 64.863656,
5548: 78.698212,
5551: 78.698276195,
5572: 65.317248,
5606: 57.6,
5607: 65.317248,
5636: 63.956472,
5641: 65.317248,
5642: 65.77084,
5685: 85.0,
5712: 128.739013672,
5714: 130.042999268,
5730: 73.481904,
5757: 64.6822192,
5760: 87.089664,
5792: 64.6822192,
5796: 89.6297792,
5797: 32.4771872,
5798: 64.410064,
5804: 125.463554,
5816: 87.0,
5824: 65.1358112,
5865: 84.6,
5874: 61.6,
5907: 75.5,
5930: 39.00891,
5936: 33.112216,
5939: 63.50288,
5943: 64.410064,
5946: 64.410064,
5951: 63.049288,
5971: 88.722670341,
6019: 130.430999756,
6029: 89.176262711,
6030: 89.176262711,
6031: 90.3555264,
6032: 90.35559872,
6033: 89.176262711,
6035: 89.176262711,
6037: 38.101727,
6054: 69.8,
6055: 68.7,
6062: 69.39963261,
6074: 67.222389234,
6137: 65.317248,
6139: 69.4,
6141: 89.1761872,
6164: 69.5,
6205: 97.29549,
6225: 88.994822994,
6269: 103.237632751,
6286: 103.963378906,
6301: 117.5710464,
6376: 79.832260132,
6377: 79.83225712,
6378: 79.83225712,
6380: 79.83225712,
6381: 79.832192,
6383: 79.83225712,
6386: 86.4,
6392: 79.83225712,
6395: 79.832192,
6411: 79.83225712,
6413: 79.83225712,
6416: 80.059053305,
6418: 80.059053305,
6421: 80.059053305,
6422: 80.059053305,
6433: 79.605460935,
6437: 79.83225712,
6443: 79.605460935,
6444: 79.151868565,
6445: 79.151868565,
6446: 79.151868565,
6448: 80.73944186,
6472: 41.730465,
6519: 74.389088,
6520: 73.481904,
6561: 86.6,
6648: 65.317248,
6652: 65.317248,
6653: 65.77084,
6682: 103.056190491,
6699: 84.0,
6740: 112.0,
6743: 110.500087415,
6798: 79.151868565,
6801: 120.4740352,
6805: 40.36969,
6808: 134.889007568,
6817: 79.605396,
6838: 75.3,
6845: 53.98,
6911: 65.77084,
6958: 38.55532,
6980: 73.481904,
6983: 99.79024,
7021: 80.013691299,
7024: 52.163082,
7027: 48.534344,
7383: 8.0,
7411: 39.916096,
7477: 170.5506,
7489: 37.648136,
7510: 116.119552,
7517: 65.317248,
7522: 81.0,
7525: 65.77084,
7528: 65.997636,
7535: 65.77084,
7537: 65.317248,
7573: 123.450004578,
7578: 124.850006104,
7579: 122.250007629,
7581: 121.350006104,
7582: 119.1,
7583: 119.1,
7758: 77.11064,
7759: 77.11064,
7765: 39.00891,
7787: 80.013691299,
7788: 80.013691299,
7793: 31.75144,
7796: 63.50288,
7801: 105.8230136,
7806: 108.0,
7808: 52.163082,
7818: 52.16308,
7822: 70.0,
7824: 93.893544,
7836: 98.338825816,
7837: 97.703796498,
7838: 76.657048,
7847: 119.9297248,
7850: 94.0,
7852: 96.705893284,
7860: 102.6,
7863: 70.0,
7866: 66.0,
7871: 37.194546,
7873: 62.142104,
7874: 77.9,
7875: 86.2,
7876: 86.2,
7878: 86.3,
7885: 29.48348,
7886: 99.79024,
7898: 117.026736,
7912: 84.0,
7916: 99.79024,
7918: 58.96696,
7919: 76.20345,
7920: 72.57472,
7923: 78.0,
7932: 103.0,
7941: 99.79024,
7942: 71.213944,
7944: 70.7,
7950: 121.0183456,
7955: 54.0,
7958: 43.998425,
7975: 75.296272,
7992: 29.48348,
7998: 101.604608,
7999: 59.0,
8003: 99.79024,
8009: 42.63765,
8022: 83.007336,
8027: 72.57472,
8035: 106.4126832,
8044: 113.85168487,
8049: 82.0,
8052: 107.501305,
8055: 107.047712,
8057: 52.16308,
8064: 34.0194,
8065: 83.0,
8066: 69.5810128,
8072: 157.5778608,
8083: 91.625584,
8094: 84.821704,
8095: 107.3198672,
8123: 57.424794042,
8128: 120.20188,
8131: 99.0,
8137: 93.893544,
8139: 79.3786,
8140: 49.89512,
8143: 108.4992064,
8144: 108.9074392,
8154: 108.9074392,
8160: 86.18248,
8163: 92.079176,
8168: 65.77084,
8197: 101.698953395,
8199: 79.83219,
8234: 130.52802,
8235: 120.4,
8244: 75.29633342,
8248: 75.29633342,
8249: 75.523068,
8252: 75.976721975,
8253: 99.335006714,
8276: 64.0471904,
8277: 64.2286272,
8281: 64.5915008,
8284: 64.410064,
8285: 65.1358112,
8377: 75.29633342,
8378: 79.832192,
8379: 68.49244787,
8384: 68.49244787,
8385: 68.49244787,
8391: 68.49244787,
8396: 75.29633342,
8398: 75.29633342,
8399: 74.84274105,
8401: 74.84274105,
8402: 73.93555631,
8412: 75.296272,
8429: 48.080753,
8445: 42.63765,
8449: 129.926010132,
8455: 130.01701355,
8468: 73.93555631,
8478: 68.49244787,
8480: 68.49244787,
8484: 68.49244787,
8485: 68.49244787,
8507: 121.3,
8509: 124.2,
8510: 124.6,
8515: 75.296272,
8516: 75.29633342,
8519: 75.069476,
8523: 74.615944865,
8524: 74.162352495,
8532: 74.162352495,
8533: 73.93555631,
8534: 74.162352495,
8535: 74.162292,
8536: 75.29633342,
8538: 74.162292,
8539: 74.84274105,
8556: 68.49244787,
8557: 68.49244787,
8560: 68.49244787,
8574: 75.74992579,
8591: 98.361051399,
8609: 76.20351816,
8610: 64.863656,
8611: 100.697426,
8613: 68.49244787,
8630: 68.49244787,
8631: 68.49244787,
8634: 68.49244787,
8637: 68.49244787,
8638: 90.352005005,
8639: 91.86100769,
8645: 64.1379088,
8651: 68.49244787,
8654: 68.49244787,
8656: 68.49244787,
8657: 68.49244787,
8659: 68.49244787,
8660: 68.49244787,
8666: 68.49244787,
8671: 68.49244787,
8673: 68.49244787,
8675: 68.49244787,
8677: 68.49244787,
8679: 68.49244787,
8687: 118.8,
8695: 119.9,
8711: 44.452015,
8718: 63.956472,
8720: 64.183268,
8722: 64.2286272,
8723: 76.20351816,
8725: 76.20351816,
8734: 88.178353959,
8735: 87.18045213,
8737: 87.724761589,
8738: 87.724761589,
8740: 87.724761589,
8746: 76.430314345,
8771: 68.49244787,
8774: 76.883906715,
8796: 74.5705248,
8797: 74.5705248,
8799: 68.49244787,
8800: 68.49244787,
8809: 68.49244787,
8818: 94.0,
8824: 76.20351816,
8835: 97.522475529,
8836: 65.77084,
8843: 63.50288,
8845: 75.523129605,
8848: 75.749864,
8852: 77.791091455,
8873: 77.337499085,
8874: 77.337499085,
8886: 64.5915008,
8887: 64.5915008,
8888: 64.1379088,
8891: 133.135009766,
8892: 96.25229953,
8894: 95.980145492,
8895: 112.490816,
8897: 62.595696,
8915: 133.765,
8922: 62.595696,
8946: 63.50288,
8947: 63.50288,
8957: 34.472992,
8959: 39.00891,
8973: 87.54332741,
8978: 67.58526313,
8980: 67.58526313,
8983: 67.58526313,
8984: 67.58526313,
8987: 121.109064,
8994: 77.337499085,
9005: 131.523010254,
9016: 40.36969,
9017: 78.01788764,
9025: 77.791091455,
9037: 126.0,
9038: 126.0,
9039: 124.6,
9042: 123.4,
9054: 122.6,
9057: 78.47148001,
9063: 98.610978469,
9064: 98.610978469,
9066: 98.610978469,
9067: 98.610978469,
9068: 98.610978469,
9069: 98.610978469,
9078: 64.5915008,
9088: 78.92507238,
9089: 79.83225712,
9109: 112.490816,
9110: 113.398,
9113: 38.101727,
9119: 62.142104,
9120: 78.698276195,
9132: 118.387512,
9138: 67.58526313,
9142: 67.58526313,
9151: 36.74095,
9153: 62.595696,
9158: 121.3,
9159: 121.9,
9167: 79.605460935,
9172: 79.37866475,
9181: 65.0450928,
9221: 64.5915008,
9223: 38.101727,
9242: 56.0,
9255: 65.317248,
9259: 122.9,
9260: 37.648136,
9261: 79.3786,
9264: 82.55374,
9283: 39.00891,
9289: 101.199996948,
9313: 90.89100647,
9314: 90.73500061,
9323: 64.5007824,
9362: 85.0,
9368: 54.099960447,
9371: 79.151868565,
9372: 80.285784,
9383: 90.303001404,
9386: 39.916096,
9388: 89.759002686,
9390: 90.211006165,
9391: 35.380177,
9408: 39.916096,
9421: 36.74095,
9450: 39.462505,
9452: 123.0,
9453: 123.6,
9454: 124.0,
9458: 123.1,
9459: 121.0,
9462: 122.2,
9473: 67.7,
9475: 75.296272,
9501: 68.719244055,
9571: 66.224432,
9572: 127.136009216,
9587: 67.222389234,
9594: 74.84274105,
9615: 68.719244055,
9616: 118.6,
9633: 114.985572,
9665: 87.6340445,
9667: 87.724761589,
9681: 73.93555631,
9702: 41.27687,
9707: 39.00891,
9718: 54.0,
9720: 54.099960447,
9731: 80.28584949,
9739: 54.099960447,
9741: 64.5007824,
9744: 34.926582,
9748: 64.5007824,
9749: 121.5,
9758: 68.719244055,
9801: 63.50288,
9803: 63.50288,
9809: 78.92507238,
9813: 114.7134168,
9816: 69.39963261,
9820: 75.069476,
9821: 74.615944865,
9826: 74.615944865,
9829: 75.29633342,
9837: 70.6696336,
9838: 36.28736,
9867: 65.317248,
9915: 64.410064,
9918: 63.50288,
9919: 63.50288,
9923: 65.317248,
9933: 79.151868565,
9949: 63.50288,
9950: 131.898010254,
9974: 68.9,
9985: 38.101727,
9990: 121.1997824,
9992: 130.081008911,
10054: 91.625584,
10062: 131.570999146,
10084: 87.99685,
10100: 63.956472,
10130: 95.0,
10132: 78.698276195,
10133: 89.176262711,
10134: 89.1761872,
10136: 88.722670341,
10140: 39.00891,
10154: 69.0,
10156: 75.069476,
10157: 74.162352495,
10165: 40.82328,
10174: 65.77084,
10197: 63.50288,
10203: 64.410064,
10204: 65.317248,
10211: 64.410064,
10225: 131.854003906,
10226: 63.50288,
10231: 78.92507238,
10235: 62.595696,
10239: 130.62600708,
10251: 131.478012085,
10273: 63.50288,
10275: 130.477005005,
10299: 62.595696,
10303: 41.27687,
10364: 57.7,
10374: 78.698276195,
10389: 40.36969,
10392: 67.94808,
10613: 97.431641076,
10704: 70.1,
10712: 64.410064,
10715: 65.77084,
10716: 65.77084,
10791: 65.317248,
10842: 59.0,
10860: 42.184055,
10911: 38.101727,
10973: 59.0,
11127: 107.682825869,
11148: 107.6,
11149: 107.6,
11150: 109.95,
11151: 38.101727,
11201: 79.605460935,
11202: 79.605460935,
11205: 79.605460935,
11208: 66.224432,
11249: 64.410064,
11252: 65.77084,
11296: 64.410064,
11298: 65.77084,
11304: 65.317248,
11308: 88.99482161,
11318: 65.77084,
11320: 65.317248,
11398: 186.8799,
11420: 37.648136,
11488: 77.0,
14712: 121.7,
14721: 43.09124,
14740: 131.54168,
14741: 68.2202368,
14743: 68.492392,
14780: 132.23500061,
14783: 62.595695,
14784: 97.06876718,
14797: 77.564232,
14798: 75.000076424,
14825: 99.79024,
14843: 43.09124,
14867: 134.645004272,
14994: 69.8985272,
15064: 37.648136,
15135: 69.9,
15137: 69.9,
15176: 78.0,
15211: 105.5508584,
15230: 96.61517481,
15232: 97.159485654,
15237: 71.213944,
15292: 72.57472,
15305: 72.0,
15306: 25.854744,
15311: 40.36969,
15323: 68.0,
15337: 70.079964,
15344: 80.0,
15349: 92.98636,
15361: 97.159485654,
15362: 96.705893284,
15390: 96.887330232,
15391: 96.887330232,
15392: 96.887330232,
15394: 47.17357,
15395: 135.624008,
15397: 72.57472,
15412: 100.0,
15413: 100.0,
15414: 70.4,
15415: 105.0,
15430: 113.398,
15435: 102.511792,
15458: 94.0,
15489: 40.82328,
15490: 127.00576,
15491: 131.54168,
15494: 93.439952,
15502: 68.49244787,
15521: 75.0,
15524: 144.242256,
15527: 108.1363328,
15529: 113.398,
15546: 80.0,
15552: 39.00891,
15553: 86.18248,
15554: 107.0930712,
15557: 93.439952,
15565: 56.699,
15577: 105.5508584,
15584: 107.1,
15599: 98.338825816,
15610: 200.487664,
15618: 36.28736,
15619: 101.60461,
15621: 75.0,
15623: 53.8867296,
15624: 63.50288,
15641: 118.6596672,
15642: 102.6,
15644: 71.667536,
15647: 84.821704,
15652: 96.161504,
15654: 94.347136,
15667: 107.4105856,
15669: 89.357624,
15681: 90.7184,
15683: 34.0194,
15685: 141.1578304,
15701: 136.531192,
15705: 135.62401,
15710: 76.203456,
15711: 23.586784,
15725: 92.532768,
15726: 102.79755496,
15729: 136.0776,
15761: 101.60461,
15772: 109.4063904,
15775: 63.3,
15787: 35.833767,
15820: 91.3534288,
15836: 130.180904,
15857: 122.5,
15891: 35.380177,
15895: 68.49244787,
15897: 75.29633342,
15903: 102.511795,
15912: 68.49244787,
15917: 75.29633342,
15952: 43.09124,
15954: 64.410064,
15955: 65.0450928,
15964: 129.17401123,
15966: 129.45526,
15983: 74.615944865,
15985: 75.976721975,
15988: 73.48196394,
15990: 73.48196394,
15991: 74.162352495,
15992: 75.29633342,
15994: 75.29633342,
16043: 68.49244787,
16066: 74.615884,
16070: 73.93555631,
...}
CodePudding user response:
If the imgur you posted is truely your dataframe, then the problem is that you are performing qcut on a single-rowed dataframe. How can you split up a single row into 100 quantiles? Pandas is trying to create a bunch of bins of different quantile values, but since you only have one observation, all of those bins start and end in the same place. I can prove this is the case by recreating your error using a single row df:
weightdf = pd.DataFrame(dict(weights=[10]))
pd.qcut(weightdf["weights"].sort_values(), cuts )
returns
ValueError: Bin edges must be unique:
TLDR: You need more than one row in your df if you want to split it into quantiles.
CodePudding user response:
(New answer after OP posted data. First answer is still valid for the same error message with different data)
The problem appears to be that there are several repeated weights in the dataset. To understand why this is a problem, see this simple example: [1,2,3,4,5,5,5,5,9,10]
. Lets try to bin that by decile. Of course, we notice that the 5th, 6th, 7th, and 8th decile are 5. So what bin do we put 5 into? We could arbitrarily bin the different fives into different deciles as described in this answer, however, that generally is misleading (hence why I have a separate answer). I think the most logical thing to do is to "round up", and say that no element fits into the 5th, 6th, or 7th decile, and all of the fives fit into the 8th decile. To do that, use the following code (the code is for quantile on the original data, not decile on the example data):
edges = np.quantile(weightdf.measure_value,np.arange(0.01,1.01,0.01))
unique_edges = np.unique(edges)
labels = [ 1 np.where(unique_edge == edges)[0][-1] for unique_edge in unique_edges ]
weightdf["Ntile"] = pd.cut(weightdf.measure_value, np.insert(unique_edges,0,0), labels=labels)