Iterate over specific rows, sum results and store in new row-CodePudding

I have a DataFrame in which I have already defined rows to be summed up and store the results in a new row.

For example in Year 1990:

Category	A	B	C	D	Year
E	147	78	476	531	1990
F	914	356	337	781	1990
G	117	874	15	69	1990
H	45	682	247	65	1990
I	20	255	465	19	1990

Here, the rows G - H should be summed up and the results stored in a new row. The same categories repeat every year from 1990 - 2019

I have already tried it with .iloc e.g. [4:8], [50:54] [96:100] and so on, but with iloc I can not specify multiple index. I can't manage to make a loop over the single years. Is there a way to sum the values in categories (G-H) for each year (1990 -2019)?

CodePudding user response：

If you are only interested in G/H, you can slice with isin combined with boolean indexing, then sum:

df[df['Category'].isin(['G', 'H'])].sum()

output:

Category      GH
A            162
B           1556
C            262
D            134
Year        3980
dtype: object

NB. note here the side effect of sum that combines the two "G"/"H" strings into one "GH".

Or, better, set Category as index and slice with loc:

df.set_index('Category').loc[['G', 'H']].sum()

output:

A        162
B       1556
C        262
D        134
Year    3980
dtype: int64

CodePudding user response：

I'm not sure the multiple index what you mean.

It usually appear after some group and aggregate function.

At your table, it looks just multiple column

So, if I understand correctly.

Here a complete code to show how to use the multiple condition of DataFrame

import io

import pandas as pd

data = """Category  A   B   C   D   Year
E   147 78  476 531 1990
F   914 356 337 781 1990
G   117 874 15  69  1990
H   45  682 247 65  1990
I   20  255 465 19  1990"""

table = pd.read_csv(io.StringIO(data), delimiter="\t")

years = table["Year"].unique()
for year in years:
    row = table[((table["Category"] == "G") | (table["Category"] == "H")) & (table["Year"] == year)]
    row = row[["A", "B", "C", "D"]].sum()
    row["Category"], row["Year"] = "sum", year
    table = table.append(row, ignore_index=True)