I have a data frame (df) of float64 values that I want to essentially "fold on itself." It is currently indexed 0-30 and I need it indexed 0-9, combining rows whose index ends in the same digit (0 with 10, 20, and 30; 1 with 11 and 21; etc.). So something like:
for i in df:
    for x in df[i]:
        df[i][x] = df[i][x] + df[i][x + 10] + df[i][x + 20] + df[i][x + 30]
For some reason, I get a KeyError at the first [i][x].
CodePudding user response:
EDIT: This same technique can be applied to any column, not just the index. I originally interpreted the question as asking about grouping by the index, but I may have misunderstood that bit. OP, if you want to do this for the values in column x, then do df.groupby(df["x"] % 10).sum().
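As a rough sketch of the column-based variant, here is a small made-up frame (the columns x and y and their values are hypothetical, not the OP's data). It selects y before summing so that only that column is aggregated:

```python
import pandas as pd

# Hypothetical data: "x" holds integers, "y" holds the floats to combine.
df = pd.DataFrame({"x": [3, 13, 25, 5], "y": [0.5, 0.25, 1.0, 2.0]})

# Group rows by the last digit of "x", then sum "y" within each group.
by_digit = df.groupby(df["x"] % 10)["y"].sum()
print(by_digit)
```

Rows with x = 3 and x = 13 land in group 3, and rows with x = 25 and x = 5 land in group 5.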
Solution
You can use groupby and modular arithmetic:
df.groupby(df.index % 10).sum()
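For a self-contained version to try, here is a minimal sketch. It assumes a default integer index running 0-30 as described in the question; the random values and the column names a and b are stand-ins for the real data:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the OP's frame: 31 rows (index 0-30) of float64.
rng = np.random.default_rng(0)
df = pd.DataFrame(rng.random((31, 2)), columns=["a", "b"])

# index % 10 maps 0, 10, 20, 30 -> 0; 1, 11, 21 -> 1; and so on.
folded = df.groupby(df.index % 10).sum()

print(folded.shape)  # ten rows, one per final digit
```

Row 0 of the result combines four source rows (0, 10, 20, 30), while rows 1-9 each combine three.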
Explanation
To show how this is working, take this sample dataframe:
In [3]: df
Out[3]:
       a     b
0   0.92  0.21
1   0.17  0.91
2   0.52  0.34
3   0.11  0.50
4   0.23  0.15
5   0.14  0.34
6   0.41  0.83
7   0.11  0.79
8   0.13  0.01
9   0.23  0.25
10  0.24  0.00
11  0.59  0.64
12  0.58  0.16
13  0.20  0.02
14  0.37  0.65
15  0.06  0.21
16  0.26  0.47
17  0.30  0.98
18  0.79  0.45
19  0.70  0.55
Use df.groupby(df.index % 10) to group the rows whose indices end in the same digit, then print the groups:
In [4]: df.groupby(df.index % 10).apply(print)
       a     b
0   0.92  0.21
10  0.24  0.00
       a     b
1   0.17  0.91
11  0.59  0.64
       a     b
2   0.52  0.34
12  0.58  0.16
       a     b
3   0.11  0.50
13  0.20  0.02
       a     b
4   0.23  0.15
14  0.37  0.65
       a     b
5   0.14  0.34
15  0.06  0.21
       a     b
6   0.41  0.83
16  0.26  0.47
       a     b
7   0.11  0.79
17  0.30  0.98
       a     b
8   0.13  0.01
18  0.79  0.45
       a     b
9   0.23  0.25
19  0.70  0.55
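Another way to look at the groups, if you would rather not go through .apply, is to iterate over the GroupBy object, which yields (key, sub-frame) pairs. A small sketch using four rows of column a from the sample above (the variable name groups is just for illustration):

```python
import pandas as pd

# Rows 0, 1, 10 and 11 of column "a" from the sample frame.
df = pd.DataFrame({"a": [0.92, 0.17, 0.24, 0.59]}, index=[0, 1, 10, 11])

# Each iteration gives the group key (the last digit) and the matching rows.
groups = {key: group for key, group in df.groupby(df.index % 10)}

for key, group in groups.items():
    print(f"last digit {key}:")
    print(group)
```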
Now, instead of .apply(print), simply use .sum() to get the sum of each group:
In [5]: df.groupby(df.index % 10).sum()
Out[5]:
      a     b
0  1.16  0.21
1  0.76  1.55
2  1.10  0.50
3  0.31  0.52
4  0.60  0.80
5  0.20  0.55
6  0.67  1.30
7  0.41  1.77
8  0.92  0.46
9  0.93  0.80
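As for the KeyError in the question, the likely cause is that iterating over a column (for x in df[i]) yields the column's values, not its index labels, so df[i][x] ends up looking for a label like 0.92. A small sketch with the first three a values from the sample (the names iterated, labels and raised are just for illustration):

```python
import pandas as pd

df = pd.DataFrame({"a": [0.92, 0.17, 0.52]})

iterated = list(df["a"])      # iterating a Series gives its values
labels = list(df["a"].index)  # the labels live on the index

try:
    df["a"][iterated[0]]      # looks up the label 0.92, which does not exist
    raised = False
except KeyError:
    raised = True

print(iterated, labels, raised)
```

Looping over df.index instead would avoid that particular error, but the groupby approach above is shorter and does not modify the frame while iterating over it.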