I have the following df, which contains two types of information. The first is the characteristics of the item (some are strings and others are integers). The second is the emission values of that item (as floats).
Charact. 1 | Charact. 2 | Charact. 3 | Emission 1 | Emission 2 |
---|---|---|---|---|
1998 | AB | C | 1 | 2 |
1998 | AB | C | 3 | 4 |
2000 | AB | C | 1 | 2 |
2001 | DE | F | 1 | 2 |
2001 | DE | F | 3 | 4 |
I would like to combine the rows that share the same three characteristics and take the mean of the two emission columns, to get the following df:
Charact. 1 | Charact. 2 | Charact. 3 | Emission 1 | Emission 2 |
---|---|---|---|---|
1998 | AB | C | 2 | 3 |
2000 | AB | C | 1 | 2 |
2001 | DE | F | 2 | 3 |
I tried this line of code, but it gives me an error:
df.groupby(['Charact. 1', 'Charact. 2', 'Charact. 3'], as_index=False).agg({'Emission 1': 'mean', 'Emission 2': 'mean',})
The specific error is: ValueError: Length of values (10345) does not match length of index (10687600)
CodePudding user response:
df.groupby(['Charact. 1','Charact. 2', 'Charact. 3'])[['Emission 1','Emission 2']].mean()
                                  Emission 1  Emission 2
Charact. 1 Charact. 2 Charact. 3
1998       AB         C                  2.0         3.0
2000       AB         C                  1.0         2.0
2001       DE         F                  2.0         3.0
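The result above keeps the three characteristics as a MultiIndex rather than as ordinary columns. If you want a flat df like the one in the question, a minimal sketch is to call reset_index() on it. The sample frame here is rebuilt from the table in the question, and the names `keys` and `flat` are only for illustration:

```python
import pandas as pd

# Sample data copied from the table in the question
df = pd.DataFrame({
    'Charact. 1': [1998, 1998, 2000, 2001, 2001],
    'Charact. 2': ['AB', 'AB', 'AB', 'DE', 'DE'],
    'Charact. 3': ['C', 'C', 'C', 'F', 'F'],
    'Emission 1': [1.0, 3.0, 1.0, 1.0, 3.0],
    'Emission 2': [2.0, 4.0, 2.0, 2.0, 4.0],
})

keys = ['Charact. 1', 'Charact. 2', 'Charact. 3']

# Average the emission columns per group, then move the group keys
# from the index back into regular columns
flat = df.groupby(keys)[['Emission 1', 'Emission 2']].mean().reset_index()
print(flat)
```

This should be equivalent to passing as_index=False to groupby, which is what the question's own attempt does.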
CodePudding user response:
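import pandas as pd

columns = ["1", "2", "3", "E1", "E2"]
row1 = ["1998", "1998", "2000", "2001", "2001"]
row2 = ["AB", "AB", "AB", "DE", "DE"]
row3 = ["C", "C", "C", "F", "F"]
row4 = [1, 3, 1, 1, 3]
row5 = [2, 4, 2, 2, 4]
# Each list holds one column of the table, so transpose after building the frame
df = pd.DataFrame([row1, row2, row3, row4, row5]).T
df.columns = columns
# The transpose leaves every column as object dtype; make the emissions numeric before averaging
df = df.astype({"E1": float, "E2": float})
df.groupby(["1", "2", "3"]).agg('mean').reset_index()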
This gives the result you want.
CodePudding user response:
This worked for me:
import pandas as pd

df = pd.DataFrame({'c1': [1998, 1998, 2000, 2001, 2001],
'c2': ['AB', 'AB', 'AB', 'DE', 'DE'],
'c3': ['C', 'C', 'C', 'F', 'F'],
'e1': [1, 3, 1, 1, 3],
'e2': [2, 4, 2, 2, 4]})
print(df.groupby(['c1','c2','c3'], as_index=False).mean())
# Output:
#      c1  c2  c3   e1   e2
# 0  1998  AB   C  2.0  3.0
# 1  2000  AB   C  1.0  2.0
# 2  2001  DE   F  2.0  3.0
Edit: This also worked for me, so I'm not sure where exactly the problem lies in your code. Perhaps the DataFrame is structured somewhat differently from what your question implies?
df = pd.DataFrame({'c1': [1998, 1998, 2000, 2001, 2001],
'c2': ['AB', 'AB', 'AB', 'DE', 'DE'],
'c3': ['C', 'C', 'C', 'F', 'F'],
'e1': [1, 3, 1, 1, 3],
'e2': [2, 4, 2, 2, 4]})
print(df.groupby(['c1','c2','c3'], as_index=False).agg({'e1': 'mean', 'e2': 'mean',}))
# Output:
#      c1  c2  c3   e1   e2
# 0  1998  AB   C  2.0  3.0
# 1  2000  AB   C  1.0  2.0
# 2  2001  DE   F  2.0  3.0
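If the real DataFrame does differ from the sample, a couple of quick checks might help narrow it down. This is only a sketch under an assumption I can't confirm from the question: that some of the grouping columns are stored as category dtype, in which case groupby may build one group per combination of categories rather than per combination that actually occurs. The categorical conversion below is hypothetical and only there to reproduce that situation on the sample data:

```python
import pandas as pd

df = pd.DataFrame({'c1': [1998, 1998, 2000, 2001, 2001],
                   'c2': ['AB', 'AB', 'AB', 'DE', 'DE'],
                   'c3': ['C', 'C', 'C', 'F', 'F'],
                   'e1': [1, 3, 1, 1, 3],
                   'e2': [2, 4, 2, 2, 4]})

# Hypothetical: pretend the string keys were loaded as categoricals
df['c2'] = df['c2'].astype('category')
df['c3'] = df['c3'].astype('category')

# Things worth inspecting on the real frame
print(df.dtypes)             # are any grouping columns 'category' or 'object'?
print(df.columns.is_unique)  # duplicated column names can also confuse groupby

# observed=True restricts the result to key combinations present in the data
out = df.groupby(['c1', 'c2', 'c3'], as_index=False, observed=True).agg(
    {'e1': 'mean', 'e2': 'mean'})
print(out)
```

If the dtypes on your real data turn out to be plain int/object/float and the column names are unique, this is probably not the cause and the problem lies elsewhere.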