Consider two dataframes:
>> import pandas as pd
>> df1 = pd.DataFrame({"category": ["foo", "foo", "bar", "bar", "bar"], "quantity": [1,2,1,2,3]})
>> print(df1)
category quantity
0 foo 1
1 foo 2
2 bar 1
3 bar 2
4 bar 3
>> df2 = pd.DataFrame({
"category": ["foo", "foo", "foo", "foo", "bar", "bar", "bar", "bar", "bar", "bar"],
"item": ["A", "B", "C", "D", "E", "F", "G", "H", "I", "J"]
})
>> print(df2)
category item
0 foo A
1 foo B
2 foo C
3 foo D
4 bar E
5 bar F
6 bar G
7 bar H
8 bar I
9 bar J
How can I create a new column in df1
(new dataframe called df3
) which joins on category
column of df1
and allocates the item
column in df2
. So, create something like:
>> df3 = pd.DataFrame({
"category": ["foo", "foo", "bar", "bar", "bar"],
"quantity": [1,2,1,2,3],
"item": ["A", "B,C", "E", "F,G", "H,I,J"]
})
category quantity item
0 foo 1 A
1 foo 2 B,C
2 bar 1 E
3 bar 2 F,G
4 bar 3 H,I,J
CodePudding user response:
You can create helper DataFrame by repeat rows by quantity
column by Index.repeat
with DataFrame.loc
, convert index to column for avoid lost indices
and create helper column g
in both DataFrames for merging by duplicated categories
by GroupBy.cumcount
, then use DataFrame.merge
with aggregate join
:
df11 = (df1.loc[df1.index.repeat(df1['quantity'])].reset_index()
.assign(g = lambda x: x.groupby('category').cumcount()))
df22 = df2.assign(g = df2.groupby('category').cumcount())
df = (df11.merge(df22, on=['g','category'], how='left')
.groupby(['index','category','quantity'])['item']
.agg(lambda x: ','.join(x.dropna()))
.droplevel(0)
.reset_index())
print (df)
category quantity item
0 foo 1 A
1 foo 2 B,C
2 bar 1 E
3 bar 2 F,G
4 bar 3 H,I,J
CodePudding user response:
You can use an iterator with itertools.islice
:
from itertools import islice
# aggregate the items as iterator
s = df2.groupby('category')['item'].agg(iter)
# for each category, allocate as many items as needed and join
df1['item'] = (df1.groupby('category', group_keys=False)['quantity']
.apply(lambda g:
g.map(lambda x: ','.join(list(islice(s[g.name], x)))))
)
Output:
category quantity item
0 foo 1 A
1 foo 2 B,C
2 bar 1 E
3 bar 2 F,G
4 bar 3 H,I,J
Note that if you don't have enough items, this will just use what is available.
Example using df2
truncated after F as input:
category quantity item
0 foo 1 A
1 foo 2 B,C
2 bar 1 E
3 bar 2 F
4 bar 3