Home > Blockchain >  Ntile equivalent on pandas
Ntile equivalent on pandas

Time:10-07

been wracking my head trying to replicate a solution to a sql exercise on pandas. the exercise contains creating 1 percentile bins using the NTILE function in order to calculate some metrics.

source Dset looks like this and the percentile i want to divide by is the measure_value column :

source df

this is what i want to get to (first create the "Ntile" equivalent and then group by it, agging with min max and count).

destination df

tried using qcut but encountered an error because of similar values.

code attempted :

cuts = 100
weightdf = healthdf[healthdf["measure"] == "weight"].copy()
weightdf["Ntile"] = pd.qcut(weightdf["measure_value"].sort_values(), cuts , labels = range(1, 
cuts   1))

this created a " Bin edges must be unique" error, changing the duplicates argument didnt work. hopefully someone can help, i'm certain there is a way to solve it. thanks!

edit target column output in response to david kaftan's request

{0: 46.03959,
 7: 129.060012817,
 84: 114.758776,
 97: 39.916096,
 132: 114.8494944,
 133: 69.39963261,
 139: 73.93555631,
 143: 73.93555631,
 148: 75.069537235,
 149: 68.0388,
 177: 74.389088,
 181: 34.926582,
 188: 115.1670088,
 189: 75.29633342,
 202: 115.9381152,
 212: 34.0194,
 214: 89.819007874,
 216: 91.295005798,
 218: 65.317248,
 220: 80.28584949,
 222: 123.1,
 223: 123.6,
 224: 123.0,
 225: 122.7,
 229: 122.7,
 230: 121.0,
 231: 121.7,
 243: 64.9543744,
 244: 65.0450928,
 257: 38.55532,
 261: 64.6822192,
 265: 79.151868565,
 267: 79.151868565,
 271: 93.0,
 272: 91.4,
 279: 63.50288,
 282: 128.820128,
 301: 36.28736,
 304: 63.049288,
 346: 65.77084,
 369: 119.8390064,
 388: 64.863656,
 396: 78.698276195,
 400: 79.151868565,
 401: 79.151868565,
 415: 78.698276195,
 416: 78.698276195,
 419: 78.92507238,
 420: 78.47148001,
 424: 78.698276195,
 463: 63.50288,
 478: 88.7225952,
 491: 116.6638624,
 499: 119.9,
 501: 115.0309312,
 505: 102.96547699,
 512: 65.1358112,
 518: 78.698276195,
 519: 78.698276195,
 520: 78.698280334,
 532: 70.6696336,
 534: 69.39963261,
 540: 75.8,
 558: 63.50288,
 574: 94.0,
 590: 79.151868565,
 610: 94.5,
 614: 63.50288,
 631: 57.6,
 659: 63.50288,
 674: 38.101727,
 676: 108.86208,
 717: 70.0,
 756: 125.917145,
 780: 118.6596672,
 865: 79.83225712,
 872: 80.73944186,
 925: 42.63765,
 946: 68.12952,
 1007: 68.0388,
 1066: 74.389088,
 1067: 74.389088,
 1068: 74.389088,
 1069: 74.389088,
 1070: 74.389088,
 1131: 69.5,
 1164: 92.532768,
 1168: 78.698212,
 1172: 64.410064,
 1255: 79.832192,
 1267: 65.77084,
 1271: 65.317248,
 1272: 65.77084,
 1273: 65.77084,
 1282: 65.77084,
 1291: 65.317248,
 1300: 66.224432,
 1314: 121.750007629,
 1315: 119.1,
 1318: 122.900009155,
 1385: 69.5,
 1387: 110.5,
 1496: 40.36969,
 1536: 119.1,
 1541: 122.750007629,
 1543: 122.850006104,
 1642: 91.1,
 1643: 90.4,
 1671: 57.6,
 1699: 74.615884,
 1700: 75.069476,
 1779: 66.678024,
 1784: 52.7527496,
 1789: 105.46014,
 1790: 68.9913432,
 1791: 129.071,
 1792: 128.935012817,
 1793: 68.49244787,
 1794: 68.49244787,
 1799: 65.77084,
 1852: 68.8099064,
 1905: 59.42055,
 1981: 64.863656,
 1995: 92.079176,
 1998: 74.0262144,
 1999: 73.3458264,
 2000: 73.481904,
 2018: 31.3885664,
 2019: 31.0256928,
 2022: 65.317248,
 2027: 136.036010742,
 2030: 65.77084,
 2040: 84.368112,
 2108: 42.184055,
 2139: 38.1,
 2287: 124.100006104,
 2293: 123.350006104,
 2297: 119.1,
 2298: 119.1,
 2341: 119.1,
 2356: 97.965065785,
 2358: 98.127905446,
 2438: 85.27536556,
 2494: 69.12742,
 2496: 68.40167,
 2497: 67.67593,
 2540: 96.61517481,
 2542: 96.433737862,
 2546: 96.705893284,
 2599: 173.725736,
 2607: 75.0,
 2654: 80.0,
 2696: 53.0,
 2700: 69.2181392,
 2703: 68.5831104,
 2873: 70.30676,
 2879: 73.481904,
 2881: 31.3432072,
 2882: 65.77084,
 2886: 66.224432,
 2892: 65.77084,
 2914: 38.101728,
 2930: 134.263232,
 2947: 72.0,
 2951: 99.79024,
 2981: 70.0,
 2986: 109.769264,
 2987: 29.029888,
 2994: 15.422128,
 2995: 27.21552,
 3005: 29.48348,
 3006: 58.96696,
 3010: 32.658623,
 3032: 101.198958525,
 3037: 136.531192,
 3039: 32.205032,
 3050: 25.854744,
 3064: 85.0,
 3065: 84.0,
 3069: 96.070863966,
 3071: 96.433737862,
 3072: 95.889427018,
 3076: 95.70799007,
 3079: 97.159485654,
 3087: 100.243832,
 3088: 82.100152,
 3093: 81.64656,
 3097: 102.798948211,
 3107: 20.865232,
 3109: 93.439952,
 3110: 78.4,
 3111: 102.421074,
 3149: 128.504013062,
 3150: 129.27382,
 3151: 128.50272,
 3152: 130.40192,
 3153: 130.402008057,
 3154: 129.288009644,
 3164: 102.398950789,
 3172: 97.0,
 3179: 109.4063904,
 3185: 83.91452,
 3214: 68.492392,
 3216: 108.0909736,
 3221: 66.224432,
 3226: 100.243832,
 3257: 42.184055,
 3286: 75.6137864,
 3310: 45.812794,
 3313: 44.452015,
 3319: 40.82328,
 3321: 97.0,
 3326: 97.25,
 3330: 41.27687,
 3332: 42.63765,
 3376: 44.90561,
 3397: 68.49244787,
 3398: 68.49244787,
 3406: 68.49244787,
 3408: 68.49244787,
 3409: 68.49244787,
 3411: 68.49244787,
 3413: 68.49244787,
 3415: 122.4,
 3479: 112.490816,
 3515: 34.926582,
 3527: 76.430314345,
 3539: 119.0,
 3541: 118.0,
 3547: 76.20351816,
 3556: 64.2286272,
 3573: 68.49244787,
 3579: 130.38014,
 3580: 131.43384,
 3581: 129.844,
 3586: 128.302001953,
 3588: 131.434005737,
 3589: 130.542007446,
 3591: 130.380004883,
 3594: 129.762008667,
 3595: 130.54207,
 3597: 129.86485,
 3598: 130.08621,
 3599: 130.086013794,
 3600: 130.539001465,
 3601: 129.865005493,
 3604: 130.5389,
 3605: 131.562011719,
 3606: 131.42204,
 3607: 133.11394,
 3608: 133.114013672,
 3613: 130.89587,
 3614: 131.154006958,
 3616: 134.708007812,
 3617: 134.350006104,
 3618: 134.35,
 3622: 76.430314345,
 3624: 64.863656,
 3625: 76.883906715,
 3636: 65.0450928,
 3637: 76.883906715,
 3638: 76.883906715,
 3648: 78.017824,
 3658: 94.982239509,
 3670: 133.53804,
 3672: 67.58526313,
 3673: 67.58526313,
 3676: 67.58526313,
 3678: 133.76485,
 3681: 87.3,
 3685: 132.736,
 3686: 133.222000122,
 3687: 132.73600769,
 3690: 132.866,
 3691: 131.011,
 3692: 132.1251,
 3693: 132.78418,
 3694: 131.93187,
 3695: 132.125,
 3696: 131.998001099,
 3697: 131.523,
 3698: 131.992,
 3706: 64.1379088,
 3723: 122.197685,
 3768: 64.9543744,
 3770: 67.58526313,
 3786: 62.142104,
 3788: 62.595696,
 3792: 62.142104,
 3794: 62.595696,
 3798: 62.142104,
 3801: 62.142104,
 3802: 38.55532,
 3809: 96.524456336,
 3810: 96.343016619,
 3811: 96.343016619,
 3812: 96.343019388,
 3813: 96.343016619,
 3838: 64.5915008,
 3846: 98.610978469,
 3848: 118.387512,
 3858: 65.1358112,
 3859: 64.863656,
 3860: 65.0450928,
 3861: 65.0450928,
 3862: 64.6822192,
 3868: 67.58526313,
 3875: 98.610978469,
 3876: 98.610978469,
 3877: 124.5,
 3878: 123.4,
 3881: 64.863656,
 3889: 96.52445772,
 3891: 79.151804,
 3915: 122.6,
 3916: 122.6,
 3922: 31.297848,
 3923: 31.3885664,
 3924: 31.3432072,
 3926: 65.0450928,
 3935: 79.151868565,
 3936: 79.151868565,
 3939: 56.3,
 3952: 62.595696,
 3957: 62.142104,
 3962: 61.688512,
 3965: 63.049288,
 3966: 62.595696,
 3968: 62.595696,
 3972: 61.688512,
 3973: 62.142104,
 3977: 64.5007824,
 3978: 64.5915008,
 3980: 64.863656,
 3990: 38.55532,
 4004: 121.9,
 4056: 64.863656,
 4106: 35.380177,
 4113: 38.55532,
 4115: 92.98636,
 4119: 62.142104,
 4120: 63.049288,
 4121: 123.2863056,
 4138: 64.6822192,
 4141: 64.9543744,
 4179: 56.2,
 4182: 95.85,
 4888: 34.926582,
 4891: 131.995272,
 4897: 74.84268,
 4899: 36.74095,
 4901: 37.194546,
 4915: 37.194546,
 4919: 99.427447504,
 4959: 64.5915008,
 4960: 65.0450928,
 4966: 65.0450928,
 4967: 80.28584949,
 4971: 65.317248,
 4973: 64.6822192,
 4975: 121.4719376,
 4990: 54.099960447,
 4992: 35.833767,
 4995: 36.74095,
 4997: 34.0194,
 4998: 62.595696,
 5006: 65.317248,
 5008: 87.6340445,
 5009: 87.724761589,
 5060: 75.1601944,
 5083: 74.38914868,
 5102: 114.985572,
 5157: 69.39963261,
 5158: 74.162292,
 5160: 75.29633342,
 5189: 115.439164,
 5199: 73.935496,
 5209: 67.222389234,
 5229: 47.62716,
 5234: 69.39963261,
 5427: 66.5,
 5479: 62.595696,
 5497: 98.157388868,
 5506: 80.28584949,
 5508: 65.1358112,
 5515: 80.28584949,
 5533: 64.863656,
 5548: 78.698212,
 5551: 78.698276195,
 5572: 65.317248,
 5606: 57.6,
 5607: 65.317248,
 5636: 63.956472,
 5641: 65.317248,
 5642: 65.77084,
 5685: 85.0,
 5712: 128.739013672,
 5714: 130.042999268,
 5730: 73.481904,
 5757: 64.6822192,
 5760: 87.089664,
 5792: 64.6822192,
 5796: 89.6297792,
 5797: 32.4771872,
 5798: 64.410064,
 5804: 125.463554,
 5816: 87.0,
 5824: 65.1358112,
 5865: 84.6,
 5874: 61.6,
 5907: 75.5,
 5930: 39.00891,
 5936: 33.112216,
 5939: 63.50288,
 5943: 64.410064,
 5946: 64.410064,
 5951: 63.049288,
 5971: 88.722670341,
 6019: 130.430999756,
 6029: 89.176262711,
 6030: 89.176262711,
 6031: 90.3555264,
 6032: 90.35559872,
 6033: 89.176262711,
 6035: 89.176262711,
 6037: 38.101727,
 6054: 69.8,
 6055: 68.7,
 6062: 69.39963261,
 6074: 67.222389234,
 6137: 65.317248,
 6139: 69.4,
 6141: 89.1761872,
 6164: 69.5,
 6205: 97.29549,
 6225: 88.994822994,
 6269: 103.237632751,
 6286: 103.963378906,
 6301: 117.5710464,
 6376: 79.832260132,
 6377: 79.83225712,
 6378: 79.83225712,
 6380: 79.83225712,
 6381: 79.832192,
 6383: 79.83225712,
 6386: 86.4,
 6392: 79.83225712,
 6395: 79.832192,
 6411: 79.83225712,
 6413: 79.83225712,
 6416: 80.059053305,
 6418: 80.059053305,
 6421: 80.059053305,
 6422: 80.059053305,
 6433: 79.605460935,
 6437: 79.83225712,
 6443: 79.605460935,
 6444: 79.151868565,
 6445: 79.151868565,
 6446: 79.151868565,
 6448: 80.73944186,
 6472: 41.730465,
 6519: 74.389088,
 6520: 73.481904,
 6561: 86.6,
 6648: 65.317248,
 6652: 65.317248,
 6653: 65.77084,
 6682: 103.056190491,
 6699: 84.0,
 6740: 112.0,
 6743: 110.500087415,
 6798: 79.151868565,
 6801: 120.4740352,
 6805: 40.36969,
 6808: 134.889007568,
 6817: 79.605396,
 6838: 75.3,
 6845: 53.98,
 6911: 65.77084,
 6958: 38.55532,
 6980: 73.481904,
 6983: 99.79024,
 7021: 80.013691299,
 7024: 52.163082,
 7027: 48.534344,
 7383: 8.0,
 7411: 39.916096,
 7477: 170.5506,
 7489: 37.648136,
 7510: 116.119552,
 7517: 65.317248,
 7522: 81.0,
 7525: 65.77084,
 7528: 65.997636,
 7535: 65.77084,
 7537: 65.317248,
 7573: 123.450004578,
 7578: 124.850006104,
 7579: 122.250007629,
 7581: 121.350006104,
 7582: 119.1,
 7583: 119.1,
 7758: 77.11064,
 7759: 77.11064,
 7765: 39.00891,
 7787: 80.013691299,
 7788: 80.013691299,
 7793: 31.75144,
 7796: 63.50288,
 7801: 105.8230136,
 7806: 108.0,
 7808: 52.163082,
 7818: 52.16308,
 7822: 70.0,
 7824: 93.893544,
 7836: 98.338825816,
 7837: 97.703796498,
 7838: 76.657048,
 7847: 119.9297248,
 7850: 94.0,
 7852: 96.705893284,
 7860: 102.6,
 7863: 70.0,
 7866: 66.0,
 7871: 37.194546,
 7873: 62.142104,
 7874: 77.9,
 7875: 86.2,
 7876: 86.2,
 7878: 86.3,
 7885: 29.48348,
 7886: 99.79024,
 7898: 117.026736,
 7912: 84.0,
 7916: 99.79024,
 7918: 58.96696,
 7919: 76.20345,
 7920: 72.57472,
 7923: 78.0,
 7932: 103.0,
 7941: 99.79024,
 7942: 71.213944,
 7944: 70.7,
 7950: 121.0183456,
 7955: 54.0,
 7958: 43.998425,
 7975: 75.296272,
 7992: 29.48348,
 7998: 101.604608,
 7999: 59.0,
 8003: 99.79024,
 8009: 42.63765,
 8022: 83.007336,
 8027: 72.57472,
 8035: 106.4126832,
 8044: 113.85168487,
 8049: 82.0,
 8052: 107.501305,
 8055: 107.047712,
 8057: 52.16308,
 8064: 34.0194,
 8065: 83.0,
 8066: 69.5810128,
 8072: 157.5778608,
 8083: 91.625584,
 8094: 84.821704,
 8095: 107.3198672,
 8123: 57.424794042,
 8128: 120.20188,
 8131: 99.0,
 8137: 93.893544,
 8139: 79.3786,
 8140: 49.89512,
 8143: 108.4992064,
 8144: 108.9074392,
 8154: 108.9074392,
 8160: 86.18248,
 8163: 92.079176,
 8168: 65.77084,
 8197: 101.698953395,
 8199: 79.83219,
 8234: 130.52802,
 8235: 120.4,
 8244: 75.29633342,
 8248: 75.29633342,
 8249: 75.523068,
 8252: 75.976721975,
 8253: 99.335006714,
 8276: 64.0471904,
 8277: 64.2286272,
 8281: 64.5915008,
 8284: 64.410064,
 8285: 65.1358112,
 8377: 75.29633342,
 8378: 79.832192,
 8379: 68.49244787,
 8384: 68.49244787,
 8385: 68.49244787,
 8391: 68.49244787,
 8396: 75.29633342,
 8398: 75.29633342,
 8399: 74.84274105,
 8401: 74.84274105,
 8402: 73.93555631,
 8412: 75.296272,
 8429: 48.080753,
 8445: 42.63765,
 8449: 129.926010132,
 8455: 130.01701355,
 8468: 73.93555631,
 8478: 68.49244787,
 8480: 68.49244787,
 8484: 68.49244787,
 8485: 68.49244787,
 8507: 121.3,
 8509: 124.2,
 8510: 124.6,
 8515: 75.296272,
 8516: 75.29633342,
 8519: 75.069476,
 8523: 74.615944865,
 8524: 74.162352495,
 8532: 74.162352495,
 8533: 73.93555631,
 8534: 74.162352495,
 8535: 74.162292,
 8536: 75.29633342,
 8538: 74.162292,
 8539: 74.84274105,
 8556: 68.49244787,
 8557: 68.49244787,
 8560: 68.49244787,
 8574: 75.74992579,
 8591: 98.361051399,
 8609: 76.20351816,
 8610: 64.863656,
 8611: 100.697426,
 8613: 68.49244787,
 8630: 68.49244787,
 8631: 68.49244787,
 8634: 68.49244787,
 8637: 68.49244787,
 8638: 90.352005005,
 8639: 91.86100769,
 8645: 64.1379088,
 8651: 68.49244787,
 8654: 68.49244787,
 8656: 68.49244787,
 8657: 68.49244787,
 8659: 68.49244787,
 8660: 68.49244787,
 8666: 68.49244787,
 8671: 68.49244787,
 8673: 68.49244787,
 8675: 68.49244787,
 8677: 68.49244787,
 8679: 68.49244787,
 8687: 118.8,
 8695: 119.9,
 8711: 44.452015,
 8718: 63.956472,
 8720: 64.183268,
 8722: 64.2286272,
 8723: 76.20351816,
 8725: 76.20351816,
 8734: 88.178353959,
 8735: 87.18045213,
 8737: 87.724761589,
 8738: 87.724761589,
 8740: 87.724761589,
 8746: 76.430314345,
 8771: 68.49244787,
 8774: 76.883906715,
 8796: 74.5705248,
 8797: 74.5705248,
 8799: 68.49244787,
 8800: 68.49244787,
 8809: 68.49244787,
 8818: 94.0,
 8824: 76.20351816,
 8835: 97.522475529,
 8836: 65.77084,
 8843: 63.50288,
 8845: 75.523129605,
 8848: 75.749864,
 8852: 77.791091455,
 8873: 77.337499085,
 8874: 77.337499085,
 8886: 64.5915008,
 8887: 64.5915008,
 8888: 64.1379088,
 8891: 133.135009766,
 8892: 96.25229953,
 8894: 95.980145492,
 8895: 112.490816,
 8897: 62.595696,
 8915: 133.765,
 8922: 62.595696,
 8946: 63.50288,
 8947: 63.50288,
 8957: 34.472992,
 8959: 39.00891,
 8973: 87.54332741,
 8978: 67.58526313,
 8980: 67.58526313,
 8983: 67.58526313,
 8984: 67.58526313,
 8987: 121.109064,
 8994: 77.337499085,
 9005: 131.523010254,
 9016: 40.36969,
 9017: 78.01788764,
 9025: 77.791091455,
 9037: 126.0,
 9038: 126.0,
 9039: 124.6,
 9042: 123.4,
 9054: 122.6,
 9057: 78.47148001,
 9063: 98.610978469,
 9064: 98.610978469,
 9066: 98.610978469,
 9067: 98.610978469,
 9068: 98.610978469,
 9069: 98.610978469,
 9078: 64.5915008,
 9088: 78.92507238,
 9089: 79.83225712,
 9109: 112.490816,
 9110: 113.398,
 9113: 38.101727,
 9119: 62.142104,
 9120: 78.698276195,
 9132: 118.387512,
 9138: 67.58526313,
 9142: 67.58526313,
 9151: 36.74095,
 9153: 62.595696,
 9158: 121.3,
 9159: 121.9,
 9167: 79.605460935,
 9172: 79.37866475,
 9181: 65.0450928,
 9221: 64.5915008,
 9223: 38.101727,
 9242: 56.0,
 9255: 65.317248,
 9259: 122.9,
 9260: 37.648136,
 9261: 79.3786,
 9264: 82.55374,
 9283: 39.00891,
 9289: 101.199996948,
 9313: 90.89100647,
 9314: 90.73500061,
 9323: 64.5007824,
 9362: 85.0,
 9368: 54.099960447,
 9371: 79.151868565,
 9372: 80.285784,
 9383: 90.303001404,
 9386: 39.916096,
 9388: 89.759002686,
 9390: 90.211006165,
 9391: 35.380177,
 9408: 39.916096,
 9421: 36.74095,
 9450: 39.462505,
 9452: 123.0,
 9453: 123.6,
 9454: 124.0,
 9458: 123.1,
 9459: 121.0,
 9462: 122.2,
 9473: 67.7,
 9475: 75.296272,
 9501: 68.719244055,
 9571: 66.224432,
 9572: 127.136009216,
 9587: 67.222389234,
 9594: 74.84274105,
 9615: 68.719244055,
 9616: 118.6,
 9633: 114.985572,
 9665: 87.6340445,
 9667: 87.724761589,
 9681: 73.93555631,
 9702: 41.27687,
 9707: 39.00891,
 9718: 54.0,
 9720: 54.099960447,
 9731: 80.28584949,
 9739: 54.099960447,
 9741: 64.5007824,
 9744: 34.926582,
 9748: 64.5007824,
 9749: 121.5,
 9758: 68.719244055,
 9801: 63.50288,
 9803: 63.50288,
 9809: 78.92507238,
 9813: 114.7134168,
 9816: 69.39963261,
 9820: 75.069476,
 9821: 74.615944865,
 9826: 74.615944865,
 9829: 75.29633342,
 9837: 70.6696336,
 9838: 36.28736,
 9867: 65.317248,
 9915: 64.410064,
 9918: 63.50288,
 9919: 63.50288,
 9923: 65.317248,
 9933: 79.151868565,
 9949: 63.50288,
 9950: 131.898010254,
 9974: 68.9,
 9985: 38.101727,
 9990: 121.1997824,
 9992: 130.081008911,
 10054: 91.625584,
 10062: 131.570999146,
 10084: 87.99685,
 10100: 63.956472,
 10130: 95.0,
 10132: 78.698276195,
 10133: 89.176262711,
 10134: 89.1761872,
 10136: 88.722670341,
 10140: 39.00891,
 10154: 69.0,
 10156: 75.069476,
 10157: 74.162352495,
 10165: 40.82328,
 10174: 65.77084,
 10197: 63.50288,
 10203: 64.410064,
 10204: 65.317248,
 10211: 64.410064,
 10225: 131.854003906,
 10226: 63.50288,
 10231: 78.92507238,
 10235: 62.595696,
 10239: 130.62600708,
 10251: 131.478012085,
 10273: 63.50288,
 10275: 130.477005005,
 10299: 62.595696,
 10303: 41.27687,
 10364: 57.7,
 10374: 78.698276195,
 10389: 40.36969,
 10392: 67.94808,
 10613: 97.431641076,
 10704: 70.1,
 10712: 64.410064,
 10715: 65.77084,
 10716: 65.77084,
 10791: 65.317248,
 10842: 59.0,
 10860: 42.184055,
 10911: 38.101727,
 10973: 59.0,
 11127: 107.682825869,
 11148: 107.6,
 11149: 107.6,
 11150: 109.95,
 11151: 38.101727,
 11201: 79.605460935,
 11202: 79.605460935,
 11205: 79.605460935,
 11208: 66.224432,
 11249: 64.410064,
 11252: 65.77084,
 11296: 64.410064,
 11298: 65.77084,
 11304: 65.317248,
 11308: 88.99482161,
 11318: 65.77084,
 11320: 65.317248,
 11398: 186.8799,
 11420: 37.648136,
 11488: 77.0,
 14712: 121.7,
 14721: 43.09124,
 14740: 131.54168,
 14741: 68.2202368,
 14743: 68.492392,
 14780: 132.23500061,
 14783: 62.595695,
 14784: 97.06876718,
 14797: 77.564232,
 14798: 75.000076424,
 14825: 99.79024,
 14843: 43.09124,
 14867: 134.645004272,
 14994: 69.8985272,
 15064: 37.648136,
 15135: 69.9,
 15137: 69.9,
 15176: 78.0,
 15211: 105.5508584,
 15230: 96.61517481,
 15232: 97.159485654,
 15237: 71.213944,
 15292: 72.57472,
 15305: 72.0,
 15306: 25.854744,
 15311: 40.36969,
 15323: 68.0,
 15337: 70.079964,
 15344: 80.0,
 15349: 92.98636,
 15361: 97.159485654,
 15362: 96.705893284,
 15390: 96.887330232,
 15391: 96.887330232,
 15392: 96.887330232,
 15394: 47.17357,
 15395: 135.624008,
 15397: 72.57472,
 15412: 100.0,
 15413: 100.0,
 15414: 70.4,
 15415: 105.0,
 15430: 113.398,
 15435: 102.511792,
 15458: 94.0,
 15489: 40.82328,
 15490: 127.00576,
 15491: 131.54168,
 15494: 93.439952,
 15502: 68.49244787,
 15521: 75.0,
 15524: 144.242256,
 15527: 108.1363328,
 15529: 113.398,
 15546: 80.0,
 15552: 39.00891,
 15553: 86.18248,
 15554: 107.0930712,
 15557: 93.439952,
 15565: 56.699,
 15577: 105.5508584,
 15584: 107.1,
 15599: 98.338825816,
 15610: 200.487664,
 15618: 36.28736,
 15619: 101.60461,
 15621: 75.0,
 15623: 53.8867296,
 15624: 63.50288,
 15641: 118.6596672,
 15642: 102.6,
 15644: 71.667536,
 15647: 84.821704,
 15652: 96.161504,
 15654: 94.347136,
 15667: 107.4105856,
 15669: 89.357624,
 15681: 90.7184,
 15683: 34.0194,
 15685: 141.1578304,
 15701: 136.531192,
 15705: 135.62401,
 15710: 76.203456,
 15711: 23.586784,
 15725: 92.532768,
 15726: 102.79755496,
 15729: 136.0776,
 15761: 101.60461,
 15772: 109.4063904,
 15775: 63.3,
 15787: 35.833767,
 15820: 91.3534288,
 15836: 130.180904,
 15857: 122.5,
 15891: 35.380177,
 15895: 68.49244787,
 15897: 75.29633342,
 15903: 102.511795,
 15912: 68.49244787,
 15917: 75.29633342,
 15952: 43.09124,
 15954: 64.410064,
 15955: 65.0450928,
 15964: 129.17401123,
 15966: 129.45526,
 15983: 74.615944865,
 15985: 75.976721975,
 15988: 73.48196394,
 15990: 73.48196394,
 15991: 74.162352495,
 15992: 75.29633342,
 15994: 75.29633342,
 16043: 68.49244787,
 16066: 74.615884,
 16070: 73.93555631,
 ...}

CodePudding user response:

If the imgur you posted is truely your dataframe, then the problem is that you are performing qcut on a single-rowed dataframe. How can you split up a single row into 100 quantiles? Pandas is trying to create a bunch of bins of different quantile values, but since you only have one observation, all of those bins start and end in the same place. I can prove this is the case by recreating your error using a single row df:

weightdf = pd.DataFrame(dict(weights=[10])) 
pd.qcut(weightdf["weights"].sort_values(), cuts )

returns

ValueError: Bin edges must be unique:

TLDR: You need more than one row in your df if you want to split it into quantiles.

CodePudding user response:

(New answer after OP posted data. First answer is still valid for the same error message with different data)

The problem appears to be that there are several repeated weights in the dataset. To understand why this is a problem, see this simple example: [1,2,3,4,5,5,5,5,9,10]. Lets try to bin that by decile. Of course, we notice that the 5th, 6th, 7th, and 8th decile are 5. So what bin do we put 5 into? We could arbitrarily bin the different fives into different deciles as described in this answer, however, that generally is misleading (hence why I have a separate answer). I think the most logical thing to do is to "round up", and say that no element fits into the 5th, 6th, or 7th decile, and all of the fives fit into the 8th decile. To do that, use the following code (the code is for quantile on the original data, not decile on the example data):

edges = np.quantile(weightdf.measure_value,np.arange(0.01,1.01,0.01))                                                                                                                                                                    
unique_edges = np.unique(edges)                                                                                                                                                                                                           
labels = [ 1   np.where(unique_edge == edges)[0][-1] for unique_edge in unique_edges ]                                                                                                                                                   
weightdf["Ntile"] = pd.cut(weightdf.measure_value, np.insert(unique_edges,0,0), labels=labels) 
  • Related