Given df1 and df2:
df1:
id | item | sub_item |
---|---|---|
1 | 52 | AAA |
2 | 52 | AAA |
3 | 55 | BAA |
4 | 77 | CAA |
5 | 77 | CAA |
6 | 77 | CAA |
7 | 77 | CAB |
8 | 77 | CAC |
df2:
id | color |
---|---|
1 | #11 |
2 | #12 |
3 | #13 |
4 | #14 |
5 | #17 |
6 | #18 |
7 | #19 |
8 | #20 |
9 | #21 |
10 | #25 |
produce df_result:
id | item | sub_item | color |
---|---|---|---|
1 | 52 | AAA | #11 |
2 | 52 | AAA | #11 |
3 | 55 | BAA | #11 |
4 | 77 | CAA | #11 |
5 | 77 | CAA | #11 |
6 | 77 | CAA | #11 |
7 | 77 | CAB | #12 |
8 | 77 | CAC | #13 |
id
is unimportant. Each new sub-item, within an item group, gets new serial color number. Colors start re-assignment back to #11 for each new item. df2 will always be long enough to cover unique sub-items within an item group.
My intuition says to use groupby
on df1
, then merge
each group with df2
, but I need help with the code to make this happen.
CodePudding user response:
I'm not sure, but I think that's what you want
merged_df = pd.merge(
df1,df2, how="left", on="Id"
)
CodePudding user response:
Try:
df1["color"] = (df2.merge(df1["sub-item"].ne(df1.groupby("item")["sub-item"].transform("first"))
.cumsum()
.add(1)
.rename("id"),
how="right")["color"]
)
>>> df1
id item sub-item color
0 1 52 AAA #11
1 2 52 AAA #11
2 3 55 BAA #11
3 4 77 CAA #11
4 5 77 CAA #11
5 6 77 CAA #11
6 7 77 CAB #12
7 8 77 CAC #13