Home > front end >  Pandas iterate over rows or groupby and sum conditionally
Pandas iterate over rows or groupby and sum conditionally

Time:06-19

I have a dataframe:

sqm  antal
 30    1254
 29     982
 28    1570
 27    1219
 26    1332
 25    1183
 24    1177
 23     679
 22     615
 21     296
 20     369
 19     306
 18     301

It represents the squaremeter of an apt, and how many apt ("antal") there are in that range.
Now, I want to group the sqm together in groups so that the max "antal" is 2500

The output should look like this:

sqm_from  sqm_to    antal
      30      29     2236
      28      28     1570
      27      27     1219
      26      26     1332
      24      25     2371
      19      23     2265
      18      18      301

Is there a best practice to do this or do I need to write my own function, and iterate over the rows?

CodePudding user response:

There is no way to do it as vectorized

all = []
i = 0
l = []
for x in df.index:

    i = i   df.loc[x,'antal']
    if i < 2500 :
        l.append(x)
    else :
        l = [x]
        i = df.loc[x,'antal']
    all.append(l[0])
out = df.groupby(all).agg(sqm_from = ('sqm','last'),
                           sqm_to = ('sqm','first'),
                           antal = ('antal','sum'))
out
Out[784]: 
    sqm_from  sqm_to  antal
0         29      30   2236
2         28      28   1570
3         27      27   1219
4         26      26   1332
5         24      25   2360
7         19      23   2265
12        18      18    301

CodePudding user response:

You need to write your own function that iterates over the rows of the input dataframe.

Here is a way to do it:

outList = []
f, t, antal = None, None, 0
for k in df.index:
    cur = df.antal[k]
    if cur > 2500:
        if f != t:
            outList.append((df.sqm[f], df.sqm[t], antal))
        outList.append((df.sqm[k], df.sqm[k], cur))
        f, t, antal = None, None, 0
    elif antal   cur > 2500:
        outList.append((df.sqm[f], df.sqm[t], antal))
        f, t, antal = k, k, cur
    else:
        f, t, antal = k if f is None else f, k, antal   cur
if outList[-1][1] != df.sqm[df.index[-1]]:
    outList.append((df.sqm[f], df.sqm[t], antal))
out = pd.DataFrame(outList, columns=['sqm_from','sqm_to','antal'])

Output:

   sqm_from  sqm_to  antal
0        30      29   2236
1        28      28   1570
2        27      27   1219
3        26      26   1332
4        25      24   2360
5        23      19   2265
6        18      18    301
  • Related