If have lists of natural numbers which I would like to group. I am looking for an algorithm in JavaScript but I am happy with any suggestions.
The examples series are just examples, what the dataset looks like. The resulting groups don't have to be exactly like in the examples, just to give a General idea what I try to achieve.
I have tested hierachical clustering but it does not lead to the desired grouping.
| separates groups:
- 3 4 5 | 10 12 13 14 | 100 200 300
- 100 110 111 112 118 | 200 300 400 | 1000 1001 1004 | 1020 1030 1032 1040
- 1 3 4 7 8 9 | 10 15 20 30 50 60 80 | 100 200 3001
- 1 2 3 | 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 30 31 40 41 50 | 100 101 102 103 104 105 106 107 108 | 200 201 202 203 204 205 206 209 210 211 212 | 900 901 | 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 | 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 | 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125|3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223
CodePudding user response:
You are placing a separator when either of two conditions is met:
the number of digits increases
the gap between adjacent numbers increases significantly relative to what was established early in the group.
3 4 5 | 10 12 13 14 | 100 200 300
Here your first separator has both a big increase in gap and a new digit, as does the second.
100 110 111 112 118 | 200 300 400 | 1000 1001 1004 | 1020 1030 1032 1040
Here your first separator has a big increase in gap, the second has both a big increase in gap and a new digit, and the third has a big increase in gap.
1 3 4 7 8 9 | 10 15 20 30 50 60 80 | 100 200 3001
Here my theory fails; I'd expect a separator between 200 and 3001. Maybe there's a third rule about the minimum size of a group. If that's the case, then if we added a 3002 at the end of this and the min size is 2 then we'd have a gap between 200 and 3001.
1 2 3 | 10 11 12 13 14 15 16 17 18 19 21 22 23 24 25 26 27 28 29 30 31 40 41 50 | 100 101 102 103 104 105 106 107 108 | 200 201 202 203 204 205 206 209 210 211 212 | 900 901 | 1001 1002 1003 1004 1005 1006 1007 1008 1009 1010 1011 1012 1013 1014 1015 1016 1017 1018 1019 1020 1021 1022 1023 1024 1025 | 2001 2002 2003 2004 2005 2006 2007 2008 2009 2010 2011 2012 2013 2014 2015 2016 2017 2018 2019 2020 2021 2022 2023 2024 2025 | 3101 3102 3103 3104 3105 3106 3107 3108 3109 3110 3111 3112 3113 3114 3115 3116 3117 3118 3119 3120 3121 3122 3123 3124 3125|3201 3202 3203 3204 3205 3206 3207 3208 3209 3210 3211 3212 3213 3214 3215 3216 3217 3218 3219 3220 3221 3222 3223
This seems to fit the pattern.
CodePudding user response:
import itertools
nums = [1, 6, 9, 100, 102, 105, 109, 134, 139]
for k, g in itertools.groupby(nums, key=lambda n: n//10):
print(k, list(g))
#Result:
0 [1, 6, 9]
10 [100, 102, 105, 109]
13 [134, 139]
See this for more detail.