I'm new to Python and had a question, would appreciate if someone could please help:
I have a dataset that looks like this:
Chem1 | ChemValue1 | Chem2 | ChemValue2 | Chem3 | ChemValue3 |
---|---|---|---|---|---|
Carbon Monoxide | 32 | Carbon Dioxide | 32 | Carbon | 45 |
Sulfur | 32 | Sulfur | 32 | Sulfur | 45 |
Carbon | 32 | Carbon | 32 | Iron | 45 |
Carbon | 32 | Iron | 32 | Iron | 45 |
If Chem1 equals Chem2 and/or Chem 3, I want to add up the corresponding ChemValue. So I get a dataset like this:
Chem1 | ChemValue1 | Chem2 | ChemValue2 | Chem3 | ChemValue3 |
---|---|---|---|---|---|
Carbon Monoxide | 32 | Carbon Dioxide | 32 | Carbon | 45 |
Sulfur | 109 | - | - | - | - |
Carbon | 64 | - | - | Iron | 45 |
Carbon | 32 | Iron | 77 | - | - |
Then I want to just get the Chem with the highest corresponding ChemValue (it's fine if I don't get the above table as an intemediary, just fine with getting the max aggregate Chem):
Highest Chem |
---|
Carbon Monoxide |
Sulfur |
Carbon |
Iron |
How can I do this in Python?
#here is the first dataset
data = {'Chem1': ['Carbon Monoxide','Sulfur','Carbon','Carbon'], 'ChemValue1': [32,32,32,32],'Chem2': ['Carbon Dioxide','Sulfur','Carbon','Iron'],'ChemValue2': [32,32,32,32],'Chem3': ['Carbon','Sulfur','Iron','Iron'],'ChemValue3': [45,45,45,45]}
df = pd.DataFrame(data)
CodePudding user response:
This do need more work than we think , the 2nd df can be done while we get the 1stdf
s = pd.wide_to_long(df.reset_index(),stubnames=['Chem','ChemValue'],i = 'index',j='v').reset_index()
#use pd.wide_to_long melt the df from row to column
secdf = s.groupby(['index','Chem'])['ChemValue'].sum().reset_index(level=0).groupby('index')['ChemValue'].idxmax()
# then we can do groupby combine the same value and get the max id of index
Out[274]:
index
0 Carbon
1 Sulfur
2 Carbon
3 Iron
Name: ChemValue, dtype: object
s = s.groupby(['index','Chem']).agg({'ChemValue':'sum','v':'first'}).reset_index(level=1).set_index('v',append=True).unstack().sort_index(axis=1,level=1)
s.columns = s.columns.map('{0[0]}{0[1]}'.format)
s
Out[288]:
Chem1 ChemValue1 ... Chem3 ChemValue3
index ...
0 Carbon Monoxide 32.0 ... Carbon 45.0
1 Sulfur 109.0 ... NaN NaN
2 Carbon 64.0 ... Iron 45.0
3 Carbon 32.0 ... NaN NaN
[4 rows x 6 columns]
CodePudding user response:
This should get the final result that the question asks for:
import pandas as pd
data = {'Chem1': ['Carbon Monoxide','Sulfur','Carbon','Carbon'], 'ChemValue1': [32,32,32,32],'Chem2': ['Carbon Dioxide','Sulfur','Carbon','Iron'],'ChemValue2': [32,32,32,32],'Chem3': ['Carbon','Sulfur','Iron','Iron'],'ChemValue3': [45,45,45,45]}
df = pd.DataFrame(data)
print(df)
from collections import defaultdict
highestChem, highestChemValue = [], []
def aggregateChems(x):
dct = defaultdict(int)
for k, v in (('Chem1', 'ChemValue1'), ('Chem2', 'ChemValue2'), ('Chem3', 'ChemValue3')):
dct[x[k]] = x[v]
highestChem.append(max(dct, key=lambda x: dct[x]))
highestChemValue.append(dct[highestChem[-1]])
df.apply(aggregateChems, axis = 1)
df2 = pd.DataFrame({'HighestChem' : highestChem, 'ChemValue' : highestChemValue})
print(f"\n{df2}")
df3 = df2['HighestChem'].to_frame()
print(f"\n{df3}")
Output:
Chem1 ChemValue1 Chem2 ChemValue2 Chem3 ChemValue3
0 Carbon Monoxide 32 Carbon Dioxide 32 Carbon 45
1 Sulfur 32 Sulfur 32 Sulfur 45
2 Carbon 32 Carbon 32 Iron 45
3 Carbon 32 Iron 32 Iron 45
HighestChem ChemValue
0 Carbon 45
1 Sulfur 109
2 Carbon 64
3 Iron 77
HighestChem
0 Carbon
1 Sulfur
2 Carbon
3 Iron
CodePudding user response:
I took a more manual approach that works fine for 3 Chem but won't scale well
c12 = df['Chem1'].eq(df['Chem2'])
c13 = df['Chem1'].eq(df['Chem3'])
df.loc[c12, 'ChemValue1'] = df['ChemValue1'] df['ChemValue2']
df.loc[c13, 'ChemValue1'] = df['ChemValue1'] df['ChemValue3']
c23 = df['Chem2'].eq(df['Chem3'])
df.loc[c23, 'ChemValue2'] = df['ChemValue2'] df['ChemValue3']
pd.wide_to_long(df.reset_index(),stubnames=['Chem','ChemValue'],i = 'index',j='k').reset_index(drop=True)
Chem ChemValue
0 Carbon Monoxide 32
1 Sulfur 109
2 Carbon 64
3 Carbon 32
4 Carbon Dioxide 32
5 Sulfur 77
6 Carbon 32
7 Iron 77
8 Carbon 45
9 Sulfur 45
10 Iron 45
11 Iron 45