I have the following two dataframes:
   prod_id  land_ids
0        1     [1,2]
1        2       [1]
2        3   [2,3,4]
3        4        []
4        5     [3,4]

   land_id    land_desc
0        1      germany
1        2      austria
2        3  switzerland
3        4        italy
Basically, I want each number in the land_ids column to be looked up individually in the other df.
The result should look something like this:
   prod_id  land_ids                    list_land
0        1     [1,2]              germany austria
1        2       [1]                      germany
2        3   [2,3,4]    austria switzerland italy
3        4        []
4        5     [3,4]            switzerland italy
Preferably, the column list_land is a single string with the land names concatenated, but I would also be fine with getting a list as a result.
Any idea on how to do this?
Here is my code for creating the df:
data_prod = {'prod_id': [1,2,3,4,5], 'land_ids': [[1,2],[1],[2,3,4],[1,3],[3,4]]}
prod_df = pd.DataFrame(data_prod)
data_land = {'land_id': [1,2,3,4], 'land_desc': ['germany', 'austria', 'switzerland', 'italy']}
land_df = pd.DataFrame(data_land)
EDIT: what do I have to add if one value of land_ids is empty?
CodePudding user response:
You can use the apply method:
prod_df['list_land'] = prod_df['land_ids'].apply(lambda x: [land_df.loc[land_df['land_id'] == y]['land_desc'].values[0] for y in x])
In this case, the list_land column is a list. You can use the following code if you want it to be a string.
prod_df['list_land'] = prod_df['land_ids'].apply(lambda x: ' '.join([land_df.loc[land_df['land_id'] == y]['land_desc'].values[0] for y in x]))
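Regarding the EDIT about empty lists: as far as I can tell nothing extra is needed, because iterating over an empty list yields nothing and ' '.join of nothing is an empty string. Here is a minimal sketch of that, assuming land_id values are unique. The land_map dictionary is a name I made up for illustration; it just avoids filtering land_df once per id.

```python
import pandas as pd

# Same data as in the question, with an empty list for prod_id 4
prod_df = pd.DataFrame({'prod_id': [1, 2, 3, 4, 5],
                        'land_ids': [[1, 2], [1], [2, 3, 4], [], [3, 4]]})
land_df = pd.DataFrame({'land_id': [1, 2, 3, 4],
                        'land_desc': ['germany', 'austria', 'switzerland', 'italy']})

# Build the id -> description lookup once (assumes land_id is unique)
land_map = dict(zip(land_df['land_id'], land_df['land_desc']))

# An empty list produces an empty string; an unknown id would raise KeyError
prod_df['list_land'] = prod_df['land_ids'].apply(
    lambda ids: ' '.join(land_map[i] for i in ids)
)
```

If ids without a match in land_df are possible, land_map.get(i, '') could be used instead of land_map[i], at the cost of silently hiding missing ids.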
CodePudding user response:
df1 = pd.DataFrame({"prod_id":[1,2,3,4,5],"land_ids":[[1,2],[1],[2,3,4],[1,3],[3,4]]})
df2 = pd.DataFrame({"land_id":[1,2,3,4],"land_desc":["germany","austria","switzerland","italy"]})
df2 = df2.set_index('land_id', drop=True)
df1['list_land'] = df1['land_ids'].apply(lambda x: [df2.at[ids, 'land_desc'] for ids in x])
If you want list_land as a string, then you can do it like this:
df1['list_land'] = df1['land_ids'].apply(lambda x: " ".join([df2.at[ids, 'land_desc'] for ids in x]))
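If you would rather avoid a Python-level lambda per row, one possible alternative is to explode the lists, map the ids, and group back together. This is only a sketch under the assumption that land_id is unique; the intermediate names exploded, names and joined are mine. Note that explode turns an empty list into a NaN row, which is why the missing product is dropped and then filled back in with an empty string.

```python
import pandas as pd

df1 = pd.DataFrame({"prod_id": [1, 2, 3, 4, 5],
                    "land_ids": [[1, 2], [1], [2, 3, 4], [], [3, 4]]})
df2 = pd.DataFrame({"land_id": [1, 2, 3, 4],
                    "land_desc": ["germany", "austria", "switzerland", "italy"]})

# One row per (product, land id); the empty list becomes a single NaN row
exploded = df1.explode("land_ids")

# Translate each id to its description via a Series indexed by land_id
names = exploded["land_ids"].map(df2.set_index("land_id")["land_desc"])

# Drop the NaN row, join the names per original row index,
# then restore rows that had no ids with an empty string
joined = names.dropna().groupby(level=0).agg(" ".join)
df1["list_land"] = joined.reindex(df1.index, fill_value="")
```

For a list instead of a string, .agg(list) can replace .agg(" ".join), but then the fill value for the empty rows has to be handled separately.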
CodePudding user response:
Maybe something like this:
import pandas as pd
df1 = pd.DataFrame({"prod_id":[1,2,3,4,5],"land_ids":[[1,2],[1],[2,3,4],[1,3],[3,4]]})
df2 = pd.DataFrame({"land_id":[1,2,3,4],"land_desc":["germany","austria","switzerland","italy"]})
list_land = []
for index, row in df1.iterrows():
    # for every id in this row's list, collect the description of the matching row in df2
    list_land.append([row2.land_desc for land_id in row["land_ids"] for _, row2 in df2.iterrows() if row2.land_id == land_id])
df1["list_land"] = list_land