I have two dataframes:
df1 = pd.DataFrame({'Code' : ['10', '100', '1010'],
'Value' : [25, 50, 75]})
df2 = pd.DataFrame({'ID' : ['A', 'B', 'C'],
'Codes' : ['10', '100;1010', '100'],
'Value' : [25, 125, 50]})
Column "Codes" in df2 can contain multiple codes separated by ";". If this is the case, I need to sum up their values from df1.
I tried .map(), but this did not work for rows with multiple codes in a row. Also, I end up converting code '1010' to value '2525'.
How do I specify a perfect match and the summation for ";" separated values?
CodePudding user response:
explode()
the list of Codesmerge()
with df1 and calculate total, grouping on the index of df2- have created a new column with this calculated
df1 = pd.DataFrame({"Code": ["10", "100", "1010"], "Value": [25, 50, 75]})
df2 = pd.DataFrame(
{"ID": ["A", "B", "C"], "Codes": ["10", "100;1010", "100"], "Value": [25, 125, 50]}
)
df2.join(
df2["Codes"]
.str.split(";")
.explode()
.reset_index()
.merge(df1, left_on="Codes", right_on="Code")
.groupby("index")
.agg({"Value": "sum"}),
rsuffix="_calc",
)
ID | Codes | Value | Value_calc | |
---|---|---|---|---|
0 | A | 10 | 25 | 25 |
1 | B | 100;1010 | 125 | 125 |
2 | C | 100 | 50 | 50 |
CodePudding user response:
def sum(df1, df2):
df1['sum'] = df1['Value'] df2['Value']
print(df1)
df1.loc[df2['Codes'].isin(df1['Code'])].apply(sum(df1, df2))
If the code in df2 is in df1 theen add values
CodePudding user response:
We can make a lookup table of Code
to Value
mapping from df1
, then use .map()
on df2
to map the expanded list of Codes
to the mapping. Finally, sum up the mapped values for the same ID
to arrive at the desired value, as follows:
1. Make a lookup table of Code
to Value
mapping from df1
:
mapping = df1.set_index('Code')['Value']
2. Use .map()
on df2
to map the expanded list of Codes
to the mapping. Sum up the mapped values for the same ID
to arrive at the desired value:
df2a = df2.set_index('ID') # set `ID` as index
df2a['value_map'] = (
df2a['Codes'].str.split(';') # split by semicolon
.explode() # expand splitted values into rows
.map(mapping) # map Code from mapping
.groupby('ID').sum() # group sum by ID
)
df2 = df2a.reset_index() # reset `ID` from index back to data column
Result:
print(df2)
ID Codes Value value_map
0 A 10 25 25
1 B 100;1010 125 125
2 C 100 50 50