I would like to create another column in a dataframe.
The dataframe is like the following, sub_id is part of the id, say id is the 'parent' for sub_id, it includes id itself and some items included in id.
id has no name but sub_id has corresponding name
I would like to check id with sub_id's name, and then create id's name
df = pd.DataFrame({'id':[1,1,1,2,2],
'sub_id':[12,1,13,23,2],
'name':['pear','fruit','orange','cat','animal']})
id sub_id name
0 1 12 pear
1 1 1 fruit
2 1 13 orange
3 2 23 cat
4 2 2 animal
I would like to create another column id_name, to get:
id sub_id name id_name
0 1 12 pear fruit
1 1 1 fruit fruit
2 1 13 orange fruit
3 2 23 cat animal
4 2 2 animal animal
I have no idea how it could be achieved efficiently, I only think of to merge the dataframe twice, but I think there is a better way.
CodePudding user response:
If replace not matched id
with sub_id
to misisng values in Series.where
then GroupBy.transform
with first
working, because return first non missing values:
df['id_name'] = (df['name'].where(df['id'].eq(df['sub_id']))
.groupby(df['id'])
.transform('first'))
Or filter rows by mask and mapping helper Series by Series.map
:
s = df[df['id'].eq(df['sub_id'])].set_index('id')['name']
df['id_name'] = df['id'].map(s)
print (df)
id sub_id name id_name
0 1 12 pear fruit
1 1 1 fruit fruit
2 1 13 orange fruit
3 2 23 cat animal
4 2 2 animal animal
Details:
print (df['name'].where(df['id'].eq(df['sub_id'])))
0 NaN
1 fruit
2 NaN
3 NaN
4 animal
Name: name, dtype: object
print (s)
id
1 fruit
2 animal
Name: name, dtype: object
CodePudding user response:
Are your IDs unique?
You use GroupBy.transform
to get the min
id per group and map
this to the existing id
:
df['id_name'] = (df.groupby('id')['sub_id'].transform('min')
.map(df.set_index('sub_id')['name'])
)
output:
id sub_id name id_name
0 1 12 pear fruit
1 1 1 fruit fruit
2 1 13 orange fruit
3 2 23 cat animal
4 2 2 animal animal