Pandas Dataframe aggregating function to count also nan values

Time:01-26

I have the following dataframe

print(A)

   Index  1or0
0      1     0
1      2     0
2      3     0
3      4     1
4      5     1
5      6     1
6      7     1
7      8     0
8      9     1
9     10     1

I also have the following code (from "Pandas Dataframe count occurrences that only happen immediately"), which counts runs of identical values that occur directly one after another.

ser = A["1or0"].ne(A["1or0"].shift().bfill()).cumsum()

B = (
        A.groupby(ser, as_index=False)
            .agg({"Index": ["first", "last", "count"],
                  "1or0": "unique"})
            .set_axis(["StartNum", "EndNum", "Size", "Value"], axis=1)
            .assign(Value= lambda d: d["Value"].astype(str).str.strip("[]"))
    )

print(B)

   StartNum  EndNum  Size Value
0         1       3     3     0
1         4       7     4     1
2         8       8     1     0
3         9      10     2     1

The issue is that when NaN values occur, the code doesn't group them into one interval. It always counts each NaN as its own interval of size 1 instead of, e.g., a single interval of size 3.

print(A2)

   Index  1or0
0      1     0
1      2     0
2      3     0
3      4     1
4      5     1
5      6     1
6      7     1
7      8     0
8      9     1
9     10     1
10    11   NaN
11    12   NaN
12    13   NaN

print(B2)

   StartNum  EndNum  Size Value
0         1       3     3     0
1         4       7     4     1
2         8       8     1     0
3         9      10     2     1
4        11      11     1   NaN
5        12      12     1   NaN
6        13      13     1   NaN

But I want B2 to be the following

print(B2Wanted)

   StartNum  EndNum  Size Value
0         1       3     3     0
1         4       7     4     1
2         8       8     1     0
3         9      10     2     1
4        11      13     3   NaN

What do I need to change so that it also works with NaN?

Answer:

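First, fill the NaNs with a value that cannot occur in the data (here -1) before creating your grouper. NaN never compares equal to NaN, so without this step every NaN row starts a new group: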

group = A['1or0'].fillna(-1).diff().ne(0).cumsum()

# or
# s = A['1or0'].fillna(-1)
# group = s.ne(s.shift()).cumsum()

B = (A.groupby(group, as_index=False)
      .agg(**{'StartNum': ('Index', 'first'),
              'EndNum': ('Index', 'last'),
              'Size': ('1or0', 'size'),
              'Value': ('1or0', 'first')
             })
    )

Output:

   StartNum  EndNum  Size  Value
0         1       3     3    0.0
1         4       7     4    1.0
2         8       8     1    0.0
3         9      10     2    1.0
4        11      13     3    NaN
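
For anyone who wants to reproduce this end to end, here is a minimal self-contained sketch. It rebuilds A2 from the values in the question, shows why a plain "compare with the previous row" grouper splits the NaN run, and then applies the sentinel fix. The names naive_group, filled and run_id are made up for illustration, and -1 is only safe on the assumption that the column never contains -1. It also uses reset_index(drop=True) rather than as_index=False, which is just a stylistic choice to drop the group labels.

```python
import numpy as np
import pandas as pd

# Rebuild the A2 frame from the question (values copied from the post).
A2 = pd.DataFrame({
    "Index": range(1, 14),
    "1or0": [0, 0, 0, 1, 1, 1, 1, 0, 1, 1, np.nan, np.nan, np.nan],
})

s = A2["1or0"]

# Simplified form of the original grouper: flag a new run whenever a value
# differs from the previous row. NaN != NaN is always True, so every NaN
# row is flagged and ends up in its own group.
naive_group = s.ne(s.shift()).cumsum()

# Fix: replace NaN with a sentinel that cannot occur in the data.
# (-1 is an assumption; it works here because the column only holds 0 and 1.)
filled = s.fillna(-1)
group = filled.ne(filled.shift()).cumsum().rename("run_id")  # hypothetical name

B2 = (
    A2.groupby(group)
      .agg(
          StartNum=("Index", "first"),
          EndNum=("Index", "last"),
          Size=("1or0", "size"),    # "size" counts NaN rows; "count" would skip them
          Value=("1or0", "first"),  # aggregates the original column, so NaN runs stay NaN
      )
      .reset_index(drop=True)
)

print(naive_group.nunique(), "groups without the fix")
print(B2)
```

The sentinel is only used to build the grouping key. The aggregation still runs on the original column, which is why the last row reports NaN rather than -1.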