Update a DataFrame to internally sum its elements

Time:10-23

I have a DataFrame (df) of float64 values that I want to essentially "fold on itself." It is currently indexed 0-30, and I need it indexed 0-9, combining the rows whose index ends in the same digit (0 with 10, 20, and 30; 1 with 11 and 21; etc.). So something like:

for i in df:
    for x in df[i]:
        df[i][x] = df[i][x] + df[i][x + 10] + df[i][x + 20] + df[i][x + 30]

For some reason, I get a KeyError at the first df[i][x].

CodePudding user response:

EDIT: The same technique can be applied to any column, not just the index. I originally read the question as asking about grouping by the index, but I may have misunderstood that part. OP, if you want to group by the values in column x instead, use df.groupby(df["x"] % 10).sum().
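As a minimal sketch of the column-based variant, assuming a hypothetical frame with an integer key column "x" and a float column "value" (neither name comes from the question). Note that grouping by the Series df["x"] % 10 leaves "x" itself among the summed columns, so this sketch selects "value" explicitly:

```python
import pandas as pd

# Hypothetical frame: "x" holds integer keys, "value" holds the numbers to sum.
df = pd.DataFrame({
    "x": [3, 13, 23, 4, 14, 5],
    "value": [1.0, 2.0, 3.0, 10.0, 20.0, 100.0],
})

# Group rows by the last digit of "x" and sum only the "value" column.
by_digit = df.groupby(df["x"] % 10)["value"].sum()
print(by_digit)
```

Rows with x equal to 3, 13, and 23 end up in the same group because they all leave remainder 3 when divided by 10.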

Solution

You can use groupby and modular arithmetic:

df.groupby(df.index % 10).sum()
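For a frame shaped like the one in the question, a self-contained sketch might look like the following. The random data and the column names "a" and "b" are placeholders, not the asker's actual values:

```python
import numpy as np
import pandas as pd

# Placeholder data: 31 rows indexed 0-30, two float64 columns.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((31, 2)), columns=["a", "b"])

# Rows whose index labels share a last digit fall into the same group,
# so labels 0, 10, 20, 30 are summed together, then 1, 11, 21, and so on.
folded = df.groupby(df.index % 10).sum()

print(folded.shape)
```

The result is a new frame indexed 0-9; df itself is left unchanged, so assign the result back (df = folded) if you want to replace it.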

Explanation

To show how this is working, take this sample dataframe:

In [3]: df
Out[3]:
       a     b
0   0.92  0.21
1   0.17  0.91
2   0.52  0.34
3   0.11  0.50
4   0.23  0.15
5   0.14  0.34
6   0.41  0.83
7   0.11  0.79
8   0.13  0.01
9   0.23  0.25
10  0.24  0.00
11  0.59  0.64
12  0.58  0.16
13  0.20  0.02
14  0.37  0.65
15  0.06  0.21
16  0.26  0.47
17  0.30  0.98
18  0.79  0.45
19  0.70  0.55

df.groupby(df.index % 10) groups the rows whose index ends in the same digit. Printing each group shows which rows end up together:

In [4]: df.groupby(df.index % 10).apply(print)
       a     b
0   0.92  0.21
10  0.24  0.00
       a     b
1   0.17  0.91
11  0.59  0.64
       a     b
2   0.52  0.34
12  0.58  0.16
       a     b
3   0.11  0.50
13  0.20  0.02
       a     b
4   0.23  0.15
14  0.37  0.65
       a     b
5   0.14  0.34
15  0.06  0.21
       a     b
6   0.41  0.83
16  0.26  0.47
       a     b
7   0.11  0.79
17  0.30  0.98
       a     b
8   0.13  0.01
18  0.79  0.45
       a     b
9   0.23  0.25
19  0.70  0.55

Now replace .apply(print) with .sum() to get the sum of each group:

In [5]: df.groupby(df.index % 10).sum()
Out[5]:
      a     b
0  1.16  0.21
1  0.76  1.55
2  1.10  0.50
3  0.31  0.52
4  0.60  0.80
5  0.20  0.55
6  0.67  1.30
7  0.41  1.77
8  0.92  0.46
9  0.93  0.80
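To cross-check the result, here is a sketch of the explicit loop the question was aiming for, run on the sample frame above and compared against the groupby version. One likely cause of the original KeyError is that iterating over a column (for x in df[i]) yields the float values rather than the index labels, so df[i][x] tries to look up a value such as 0.92 as a label; the sketch iterates over labels instead. The variable names manual and grouped are illustrative.

```python
import pandas as pd

# Sample frame from the explanation above (index 0-19).
df = pd.DataFrame({
    "a": [0.92, 0.17, 0.52, 0.11, 0.23, 0.14, 0.41, 0.11, 0.13, 0.23,
          0.24, 0.59, 0.58, 0.20, 0.37, 0.06, 0.26, 0.30, 0.79, 0.70],
    "b": [0.21, 0.91, 0.34, 0.50, 0.15, 0.34, 0.83, 0.79, 0.01, 0.25,
          0.00, 0.64, 0.16, 0.02, 0.65, 0.21, 0.47, 0.98, 0.45, 0.55],
})

# Explicit fold: for each last digit, sum the rows whose labels end in it.
rows = {}
for digit in range(10):
    labels = [label for label in df.index if label % 10 == digit]
    rows[digit] = df.loc[labels].sum()
manual = pd.DataFrame(rows).T  # one row per digit, columns "a" and "b"

grouped = df.groupby(df.index % 10).sum()

# Both approaches should agree up to floating-point rounding.
max_diff = (manual - grouped).abs().max().max()
print(max_diff < 1e-12)
```

The loop only touches labels that actually exist, which also avoids the problem of x + 30 being out of range for most rows.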