I have a data frame that looks like this:
id | tag | cnt |
---|---|---|
123 | Lorem | 34 |
123 | Ipsum | 12 |
456 | Ipsum | 10 |
456 | Dolor | 2 |
And another data frame that looks like this:
id | tags |
---|---|
123 | ['Ipsum','Lorem'] |
456 | ['Lorem', 'Dolor'] |
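For each row of the first data frame, I need the 1-based position of its tag in the tags list that the second data frame holds for the same id, with the value left empty when the tag is not in that list. So the new first data frame would look like: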
id | tag | cnt | Rank |
---|---|---|---|
123 | Lorem | 34 | 2 |
123 | Ipsum | 12 | 1 |
456 | Ipsum | 10 | |
456 | Dolor | 2 | 2 |
CodePudding user response:
Use DataFrame.explode with rename to get one row per tag, add the Rank column with GroupBy.cumcount, and then attach it to df1 with a left join:
df = df2.explode('tags').rename(columns={'tags':'tag'})
df['Rank'] = df.groupby('id').cumcount().add(1)
df = df1.merge(df, how='left')
print (df)
id tag cnt Rank
0 123 Lorem 34 2.0
1 123 Ipsum 12 1.0
2 456 Ipsum 10 NaN
3 456 Dolor 2 2.0
df['Rank'] = df['Rank'].astype('Int64')
print (df)
id tag cnt Rank
0 123 Lorem 34 2
1 123 Ipsum 12 1
2 456 Ipsum 10 <NA>
3 456 Dolor 2 2
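For anyone who wants to run this end to end, here is a minimal self-contained sketch of the same explode-and-merge idea. The frames are rebuilt from the tables in the question; the names `lookup` and `result` and the explicit `on=["id", "tag"]` are my own choices for illustration, not part of the answer above.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "id": [123, 123, 456, 456],
    "tag": ["Lorem", "Ipsum", "Ipsum", "Dolor"],
    "cnt": [34, 12, 10, 2],
})
df2 = pd.DataFrame({
    "id": [123, 456],
    "tags": [["Ipsum", "Lorem"], ["Lorem", "Dolor"]],
})

# One row per (id, tag); cumcount numbers the tags within each id from 0,
# so adding 1 gives the 1-based position in the original list.
lookup = df2.explode("tags").rename(columns={"tags": "tag"})
lookup["Rank"] = lookup.groupby("id").cumcount().add(1)

# The left join keeps every row of df1; tags without a match get a missing Rank.
result = df1.merge(lookup, on=["id", "tag"], how="left")

# The nullable integer dtype keeps whole numbers alongside missing values.
result["Rank"] = result["Rank"].astype("Int64")
print(result)
```

One thing to keep in mind: if a tag can appear more than once in a list, the exploded frame has several rows for that (id, tag) pair and the merge will duplicate the matching row of df1.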
CodePudding user response:
You can do this with a simple lambda function as follows:
import numpy as np
df = df1.merge(df2, on='id')
df['Rank'] = df.apply(lambda x: x.tags.index(x.tag) + 1 if x.tag in x.tags else np.nan, axis=1).astype('Int64')
The resulting DataFrame looks like this:
id tag cnt tags Rank
0 123 Lorem 34 [Ipsum, Lorem] 2
1 123 Ipsum 12 [Ipsum, Lorem] 1
2 456 Ipsum 10 [Lorem, Dolor] <NA>
3 456 Dolor 2 [Lorem, Dolor] 2
Drop the tags column if you no longer need it (drop returns a new DataFrame, so assign the result back):
df = df.drop(columns=['tags'])
and the resulting DataFrame looks like this:
id tag cnt Rank
0 123 Lorem 34 2
1 123 Ipsum 12 1
2 456 Ipsum 10 <NA>
3 456 Dolor 2 2
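A variation on the same lookup idea, sketched under a few assumptions of mine: it uses a left join so that ids missing from df2 are kept, and a small helper (the name `tag_rank` is hypothetical) that returns a missing value when there is no list to search. The frames are rebuilt from the tables in the question.

```python
import pandas as pd

# Sample frames rebuilt from the tables in the question.
df1 = pd.DataFrame({
    "id": [123, 123, 456, 456],
    "tag": ["Lorem", "Ipsum", "Ipsum", "Dolor"],
    "cnt": [34, 12, 10, 2],
})
df2 = pd.DataFrame({
    "id": [123, 456],
    "tags": [["Ipsum", "Lorem"], ["Lorem", "Dolor"]],
})


def tag_rank(tag, tags):
    """Return the 1-based position of tag in tags, or pd.NA if there is none."""
    # After a left join, tags is NaN (not a list) for ids absent from df2.
    if isinstance(tags, list) and tag in tags:
        return tags.index(tag) + 1
    return pd.NA


# how="left" keeps rows of df1 even when df2 has no entry for the id.
merged = df1.merge(df2, on="id", how="left")

# Build the column directly as a nullable integer array.
merged["Rank"] = pd.array(
    [tag_rank(tag, tags) for tag, tags in zip(merged["tag"], merged["tags"])],
    dtype="Int64",
)

result = merged.drop(columns="tags")
print(result)
```

Because `list.index` returns the first match, a tag that is repeated in a list gets the position of its first occurrence and the row count of df1 is unchanged.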