Home > Enterprise >  Lookup Values and sum values in cell pandas
Lookup Values and sum values in cell pandas

Time:09-23

I have two dataframes:

df1 = pd.DataFrame({'Code' : ['10', '100', '1010'],
                     'Value' : [25, 50, 75]})

df2 = pd.DataFrame({'ID' : ['A', 'B', 'C'],
                     'Codes' : ['10', '100;1010', '100'],
                     'Value' : [25, 125, 50]})

Column "Codes" in df2 can contain multiple codes separated by ";". If this is the case, I need to sum up their values from df1.

I tried .map(), but this did not work for rows with multiple codes in a row. Also, I end up converting code '1010' to value '2525'.

How do I specify a perfect match and the summation for ";" separated values?

CodePudding user response:

  • explode() the list of Codes
  • merge() with df1 and calculate total, grouping on the index of df2
  • have created a new column with this calculated
df1 = pd.DataFrame({"Code": ["10", "100", "1010"], "Value": [25, 50, 75]})

df2 = pd.DataFrame(
    {"ID": ["A", "B", "C"], "Codes": ["10", "100;1010", "100"], "Value": [25, 125, 50]}
)

df2.join(
    df2["Codes"]
    .str.split(";")
    .explode()
    .reset_index()
    .merge(df1, left_on="Codes", right_on="Code")
    .groupby("index")
    .agg({"Value": "sum"}),
    rsuffix="_calc",
)
ID Codes Value Value_calc
0 A 10 25 25
1 B 100;1010 125 125
2 C 100 50 50

CodePudding user response:

def sum(df1, df2):
    df1['sum'] = df1['Value']   df2['Value']
    print(df1)


df1.loc[df2['Codes'].isin(df1['Code'])].apply(sum(df1, df2))

If the code in df2 is in df1 theen add values

enter image description here

CodePudding user response:

We can make a lookup table of Code to Value mapping from df1, then use .map() on df2 to map the expanded list of Codes to the mapping. Finally, sum up the mapped values for the same ID to arrive at the desired value, as follows:

1. Make a lookup table of Code to Value mapping from df1:

mapping = df1.set_index('Code')['Value']

2. Use .map() on df2 to map the expanded list of Codes to the mapping. Sum up the mapped values for the same ID to arrive at the desired value:

df2a = df2.set_index('ID')             # set `ID` as index

df2a['value_map'] = (
    df2a['Codes'].str.split(';')       # split by semicolon 
                 .explode()            # expand splitted values into rows
                 .map(mapping)         # map Code from mapping                
                 .groupby('ID').sum()  # group sum by ID
    )

df2 = df2a.reset_index()               # reset `ID` from index back to data column

Result:

print(df2)

  ID     Codes  Value  value_map
0  A        10     25         25
1  B  100;1010    125        125
2  C       100     50         50
  • Related