Pandas iterate over rows or groupby and sum conditionally-CodePudding

I have a dataframe:

It represents the squaremeter of an apt, and how many apt ("antal") there are in that range.
Now, I want to group the sqm together in groups so that the max "antal" is 2500

The output should look like this:

sqm_from  sqm_to    antal
      30      29     2236
      28      28     1570
      27      27     1219
      26      26     1332
      24      25     2371
      19      23     2265
      18      18      301

Is there a best practice to do this or do I need to write my own function, and iterate over the rows?

CodePudding user response：

There is no way to do it as vectorized

all = []
i = 0
l = []
for x in df.index:

    i = i   df.loc[x,'antal']
    if i < 2500 :
        l.append(x)
    else :
        l = [x]
        i = df.loc[x,'antal']
    all.append(l[0])
out = df.groupby(all).agg(sqm_from = ('sqm','last'),
                           sqm_to = ('sqm','first'),
                           antal = ('antal','sum'))
out
Out[784]: 
    sqm_from  sqm_to  antal
0         29      30   2236
2         28      28   1570
3         27      27   1219
4         26      26   1332
5         24      25   2360
7         19      23   2265
12        18      18    301

CodePudding user response：

You need to write your own function that iterates over the rows of the input dataframe.

Here is a way to do it:

outList = []
f, t, antal = None, None, 0
for k in df.index:
    cur = df.antal[k]
    if cur > 2500:
        if f != t:
            outList.append((df.sqm[f], df.sqm[t], antal))
        outList.append((df.sqm[k], df.sqm[k], cur))
        f, t, antal = None, None, 0
    elif antal   cur > 2500:
        outList.append((df.sqm[f], df.sqm[t], antal))
        f, t, antal = k, k, cur
    else:
        f, t, antal = k if f is None else f, k, antal   cur
if outList[-1][1] != df.sqm[df.index[-1]]:
    outList.append((df.sqm[f], df.sqm[t], antal))
out = pd.DataFrame(outList, columns=['sqm_from','sqm_to','antal'])

Output:

   sqm_from  sqm_to  antal
0        30      29   2236
1        28      28   1570
2        27      27   1219
3        26      26   1332
4        25      24   2360
5        23      19   2265
6        18      18    301