My input is:
d = {'x': [1479903, 1479879, 1479926, 1479736, 1479760, 1479784, 1479808, 1479831, 1480331, 1480355, 1480949, 1480973, 1480997, 1481021, 1481877, 1481901, 1481924, 1481948, 1481996, 1482020], 'y': [7195293, 7195293, 7195293, 7194829, 7194829, 7194829, 7194829, 7194829, 7194829, 7194829, 7194384, 7194384, 7194384, 7194384, 7194384, 7194384, 7194384, 7194384, 7193920, 7193920], 'group':[1, 1, 1, 1, 1, 1, 1, 1, 2, 2, 2, 2, 2, 2, 3, 3, 3, 3, 3, 3]}
df = pd.DataFrame(data=d)
df = df.sort_values(by = ['group', 'y','x'], ascending = [True, False, True])
The dataframe is already sorted. The desired output is to create a column called 'subgroup' that starts at 1 and then only counts up when the y-value changes and starts over at 1 when the group number changes. Also create another column called 'count' that counts up until the 'subgroup' changes, when 'subgroup' changes then the 'count' starts over at 1. The desired output should look like:
group | subgroup | count | x | y |
---|---|---|---|---|
1 | 1 | 1 | 1479879 | 7195293 |
1 | 1 | 2 | 1479903 | 7195293 |
1 | 1 | 3 | 1479926 | 7195293 |
1 | 2 | 1 | 1479736 | 7194829 |
1 | 2 | 2 | 1479760 | 7194829 |
1 | 2 | 3 | 1479784 | 7194829 |
1 | 2 | 4 | 1479808 | 7194829 |
1 | 2 | 5 | 1479831 | 7194829 |
2 | 1 | 1 | 1480331 | 7194829 |
2 | 1 | 2 | 1480355 | 7194829 |
2 | 2 | 1 | 1480949 | 7194384 |
2 | 2 | 2 | 1480973 | 7194384 |
2 | 2 | 3 | 1480997 | 7194384 |
2 | 2 | 4 | 1481021 | 7194384 |
3 | 1 | 1 | 1481877 | 7194384 |
3 | 1 | 2 | 1481901 | 7194384 |
3 | 1 | 3 | 1481924 | 7194384 |
3 | 1 | 4 | 1481948 | 7194384 |
3 | 2 | 1 | 1481996 | 7193920 |
3 | 2 | 2 | 1482020 | 7193920 |
Thanks in advance!
CodePudding user response:
Try:
df['subgroup new'] = df.groupby('group').apply(lambda x: (x['y'] != x['y'].shift()).cumsum()).values
df['count new'] = df.groupby(['group', 'subgroup new']).cumcount() 1
print(df)
Prints:
group subgroup count x y subgroup new count new
0 1 1 1 1479879 7195293 1 1
1 1 1 2 1479903 7195293 1 2
2 1 1 3 1479926 7195293 1 3
3 1 2 1 1479736 7194829 2 1
4 1 2 2 1479760 7194829 2 2
5 1 2 3 1479784 7194829 2 3
6 1 2 4 1479808 7194829 2 4
7 1 2 5 1479831 7194829 2 5
8 2 1 1 1480331 7194829 1 1
9 2 1 2 1480355 7194829 1 2
10 2 2 1 1480949 7194384 2 1
11 2 2 2 1480973 7194384 2 2
12 2 2 3 1480997 7194384 2 3
13 2 2 4 1481021 7194384 2 4
14 3 1 1 1481877 7194384 1 1
15 3 1 2 1481901 7194384 1 2
16 3 1 3 1481924 7194384 1 3
17 3 1 4 1481948 7194384 1 4
18 3 2 1 1481996 7193920 2 1
19 3 2 2 1482020 7193920 2 2