Have got two dataframe like below with index values: tab_df
row table_no table_bit
83 45 1 1
84 45 1 2
85 45 1 3
86 45 2 1
87 45 2 2
88 45 2 3
91 46 1 1
92 46 1 2
93 46 1 3
item_df
count Item
15 2 Apple
16 2 Apple
17 1 Lemon
18 2 Apricot
19 2 Apricot
35 2 Melon
36 2 Melon
37 1 Orange
38 1 Potato
Condition to be considered:
tab_df
rows along with index value should not be jumbled. It's fixed.item_df
rows to be concat/joined withtab_df
to attainoutput_df
considering below rules: (item_df
data will be in row-table_no-table_bit order. Each row can have any number of table_no. Each table_no will have only 3 table_bit.) 2.1.item_df
rows to be arranged/joined in the same order withtab_df
the number of times it repeats. [Column count gives number of times that particular Item repeats]. E.g., Item Apple repeats twice so it takes two bits from first row, then comes Lemon which takes single bit from same row and continues. 2.2. No Item should sharetable_bit
across rows. E.g., On arranging up items with rule 2.1, at the end of row 45, after Apricot, Melon need to be arranged but that row got only single table_bit left out. So, need to check for next Item which needs only single bit and that to be arranged there and Melon should be moved to other row and carry on with the same flow as of rule 2.1 2.3. Each Item initem_df
got it's respective index which should not be changed. If any item is rearranged like Melon same index need to be carried out for every item.
output_df
tab_index row table_no table_bit item_index count Item
83 45 1 1 15 2 Apple
84 45 1 2 16 2 Apple
85 45 1 3 17 1 Lemon
86 45 2 1 18 2 Apricot
87 45 2 2 19 2 Apricot
88 45 2 3 37 1 Orange
91 46 1 1 35 2 Melon
92 46 1 2 36 2 Melon
93 46 1 3 38 1 Potato
Looking forward. Any help will be appreciated!
CodePudding user response:
I feel like there ought to be an easier way to do this, but if I understand you correctly, the following should work.
Setup
import pandas as pd
data = {'row': {83: 45, 84: 45, 85: 45, 86: 45, 87: 45, 88: 45, 91: 46, 92: 46, 93: 46},
'table_no': {83: 1, 84: 1, 85: 1, 86: 2, 87: 2, 88: 2, 91: 1, 92: 1, 93: 1},
'table_bit': {83: 1, 84: 2, 85: 3, 86: 1, 87: 2, 88: 3, 91: 1, 92: 2, 93: 3}}
tab_df = pd.DataFrame(data)
data2 = {'count': {15: 2, 16: 2, 17: 1, 18: 2, 19: 2, 35: 2, 36: 2, 37: 1, 38: 1},
'Item': {15: 'Apple', 16: 'Apple', 17: 'Lemon', 18: 'Apricot', 19: 'Apricot', 35: 'Melon', 36: 'Melon', 37: 'Orange', 38: 'Potato'}}
item_df = pd.DataFrame(data2)
Code
tmp = item_df.reset_index().groupby('Item', as_index=False).agg(
{'index': list, 'count': 'first'})
tmp = tmp.loc[tmp.groupby('count').cumcount().sort_values().index].explode('index')
tmp = tmp.loc[:, ['index','count','Item']].rename(columns={'index':'item_index'})
tab_df[tmp.columns] = tmp.to_numpy()
print(tab_df)
row table_no table_bit item_index count Item
83 45 1 1 15 2 Apple
84 45 1 2 16 2 Apple
85 45 1 3 17 1 Lemon
86 45 2 1 18 2 Apricot
87 45 2 2 19 2 Apricot
88 45 2 3 37 1 Orange
91 46 1 1 35 2 Melon
92 46 1 2 36 2 Melon
93 46 1 3 38 1 Potato
Explanation
- First, use
df.reset_index
and applydf.groupby
onItem
. Use.agg
to turnindex
values (now a column) per group into lists (e.g. forApple
:[15, 16]
), and for columncount
simply get.first
(e.g. forApple
:2
). - In the next step, apply
df.groupby
again, this time oncount
(now:[2, 2, 1, 2, 1, 1]
associated with['Apple', 'Apricot', 'Lemon', 'Melon', 'Orange', 'Potato']
), and get.cumcount
. Sorting these values usingSeries.sort_values
gets us a rearranged index[0, 2, 1, 4, 3, 5]
. We want the original index order adjusted in this way, which we can achieve by nesting the above insidetmp.loc
. Chain.explode
on columnindex
to get the lists back as rows again. - The row order for
tmp
(derived fromitem_df
) is now correct; last step is just cosmetic: i.e. getting the columns in the desired order as well:tmp = tmp.loc[:, ['index','count','Item']]
, and chaining.rename
to change col nameindex
intoitem_index
. - Finally, add these columns to
tab_df
. Useto_numpy
, since we want to ignore the index oftmp
.