I have a DataFrame
in which I have already defined rows to be summed up and store the results in a new row.
For example in Year 1990:
Category | A | B | C | D | Year |
---|---|---|---|---|---|
E | 147 | 78 | 476 | 531 | 1990 |
F | 914 | 356 | 337 | 781 | 1990 |
G | 117 | 874 | 15 | 69 | 1990 |
H | 45 | 682 | 247 | 65 | 1990 |
I | 20 | 255 | 465 | 19 | 1990 |
Here, the rows G - H should be summed up and the results stored in a new row. The same categories repeat every year from 1990 - 2019
I have already tried it with .iloc
e.g. [4:8], [50:54] [96:100] and so on, but with iloc I can not specify multiple index. I can't manage to make a loop over the single years.
Is there a way to sum the values in categories (G-H) for each year (1990 -2019)?
CodePudding user response:
If you are only interested in G/H, you can slice with isin
combined with boolean indexing, then sum
:
df[df['Category'].isin(['G', 'H'])].sum()
output:
Category GH
A 162
B 1556
C 262
D 134
Year 3980
dtype: object
NB. note here the side effect of sum
that combines the two "G"/"H" strings into one "GH".
Or, better, set Category as index and slice with loc
:
df.set_index('Category').loc[['G', 'H']].sum()
output:
A 162
B 1556
C 262
D 134
Year 3980
dtype: int64
CodePudding user response:
I'm not sure the multiple index
what you mean.
It usually appear after some group
and aggregate
function.
At your table, it looks just multiple column
So, if I understand correctly.
Here a complete code to show how to use the multiple condition of DataFrame
import io
import pandas as pd
data = """Category A B C D Year
E 147 78 476 531 1990
F 914 356 337 781 1990
G 117 874 15 69 1990
H 45 682 247 65 1990
I 20 255 465 19 1990"""
table = pd.read_csv(io.StringIO(data), delimiter="\t")
years = table["Year"].unique()
for year in years:
row = table[((table["Category"] == "G") | (table["Category"] == "H")) & (table["Year"] == year)]
row = row[["A", "B", "C", "D"]].sum()
row["Category"], row["Year"] = "sum", year
table = table.append(row, ignore_index=True)