Calculation between python pandas DataFrames, and store the result to dictionary-CodePudding

I'm trying some mapping and calculation between dataframes.

Is there any examples or anyone can help how to use python code to do this?

I've 2 dataframes:

products    components          
    c1  c2  c3  c4
p1  1   0   1   0
p2  0   1   1   0
p3  1   0   0   1
p4  0   1   0   1

        items cost          
components  i1  i2  i3  i4
c1      0   10  30  0
c2      20  10  0   0
c3      0   0   10  15
c4      20  0   0   30

The end results should be a dictionary contains the sum of the cost for each components and find the maximum:

{p1: [c1,c3] }  ->  {p1: [i2 i3,i3 i4] }    ->  {p1: [40,25] } -> {p1: 40 }
{p2: [c2,c3] }  ->  {p2: [i1 i2,i3 i4] }    ->  {p2: [30,25] } -> {p2: 30 }
{p3: [c1,c4] }  ->  {p3: [i2 i3,i1 i4] }    ->  {p3: [40,50] } -> {p3: 50 }
{p4: [c2,c4] }  ->  {p4: [i1 i2,i1 i4] }    ->  {p4: [30,50] } -> {p4: 50 }

CodePudding user response：

Try (df1 is your first DataFrame, df2 the second):

print(df1.apply(lambda x: df2.loc[x[x.eq(1)].index].sum(axis=1).max(), axis=1))

Prints:

p1    40
p2    30
p3    50
p4    50
dtype: int64

To store the result to dictionary:

out = dict(
    df1.apply(lambda x: df2.loc[x[x.eq(1)].index].sum(axis=1).max(), axis=1)
)
print(out)

Prints:

{'p1': 40, 'p2': 30, 'p3': 50, 'p4': 50}

CodePudding user response：

You can use itertools.compress() on products data to find components required:

prod_df["comps"] = prod_df.apply(lambda x: list(itertools.compress(x.index[1:], x.values[1:])), axis=1)

[Out]:
   product  c1  c2  c3  c4            comps
0       p1   1   0   1   0  [c1, c3, comps]
1       p2   0   1   1   0  [c2, c3, comps]
2       p3   1   0   0   1  [c1, c4, comps]
3       p4   0   1   0   1  [c2, c4, comps]

Then select respective component rows from components data and sum each row and filter max row:

prod_df["max_cost"] = prod_df.apply(lambda x: comp_df[comp_df["component"].isin(x["comps"])].iloc[:,1:].sum(axis=1).max(), axis=1)

[Out]:
  product  max_cost
0      p1        40
1      p2        30
2      p3        50
3      p4        50

Datasets used:

prod_data = [
    ("p1",1,0,1,0),
    ("p2",0,1,1,0),
    ("p3",1,0,0,1),
    ("p4",0,1,0,1),
]

prod_columns = ["product","c1","c2","c3","c4"]

prod_df = pd.DataFrame(data=prod_data, columns=prod_columns)


comp_data = [
    ("c1",0,10,30,0),
    ("c2",20,10,0,0),
    ("c3",0,0,10,15),
    ("c4",20,0,0,30),
]

comp_columns = ["component","i1","i2","i3","i4"]

comp_df = pd.DataFrame(data=comp_data, columns=comp_columns)