I have a dataframe:
sqm antal
30 1254
29 982
28 1570
27 1219
26 1332
25 1183
24 1177
23 679
22 615
21 296
20 369
19 306
18 301
It represents the squaremeter of an apt, and how many apt ("antal") there are in that range.
Now, I want to group the sqm together in groups so that the max "antal" is 2500
The output should look like this:
sqm_from sqm_to antal
30 29 2236
28 28 1570
27 27 1219
26 26 1332
24 25 2371
19 23 2265
18 18 301
Is there a best practice to do this or do I need to write my own function, and iterate over the rows?
CodePudding user response:
There is no way to do it as vectorized
all = []
i = 0
l = []
for x in df.index:
i = i df.loc[x,'antal']
if i < 2500 :
l.append(x)
else :
l = [x]
i = df.loc[x,'antal']
all.append(l[0])
out = df.groupby(all).agg(sqm_from = ('sqm','last'),
sqm_to = ('sqm','first'),
antal = ('antal','sum'))
out
Out[784]:
sqm_from sqm_to antal
0 29 30 2236
2 28 28 1570
3 27 27 1219
4 26 26 1332
5 24 25 2360
7 19 23 2265
12 18 18 301
CodePudding user response:
You need to write your own function that iterates over the rows of the input dataframe.
Here is a way to do it:
outList = []
f, t, antal = None, None, 0
for k in df.index:
cur = df.antal[k]
if cur > 2500:
if f != t:
outList.append((df.sqm[f], df.sqm[t], antal))
outList.append((df.sqm[k], df.sqm[k], cur))
f, t, antal = None, None, 0
elif antal cur > 2500:
outList.append((df.sqm[f], df.sqm[t], antal))
f, t, antal = k, k, cur
else:
f, t, antal = k if f is None else f, k, antal cur
if outList[-1][1] != df.sqm[df.index[-1]]:
outList.append((df.sqm[f], df.sqm[t], antal))
out = pd.DataFrame(outList, columns=['sqm_from','sqm_to','antal'])
Output:
sqm_from sqm_to antal
0 30 29 2236
1 28 28 1570
2 27 27 1219
3 26 26 1332
4 25 24 2360
5 23 19 2265
6 18 18 301