python dataframe create one column based on another column-CodePudding

I would like to create another column in a dataframe.

The dataframe is like the following, sub_id is part of the id, say id is the 'parent' for sub_id, it includes id itself and some items included in id.

id has no name but sub_id has corresponding name

I would like to check id with sub_id's name, and then create id's name

df = pd.DataFrame({'id':[1,1,1,2,2],
                    'sub_id':[12,1,13,23,2],
                    'name':['pear','fruit','orange','cat','animal']})
   id  sub_id    name
0   1      12    pear
1   1       1   fruit
2   1      13  orange
3   2      23     cat
4   2       2  animal

I would like to create another column id_name, to get:

   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

I have no idea how it could be achieved efficiently, I only think of to merge the dataframe twice, but I think there is a better way.

CodePudding user response：

If replace not matched id with sub_id to misisng values in Series.where then GroupBy.transform with first working, because return first non missing values:

df['id_name'] = (df['name'].where(df['id'].eq(df['sub_id']))
                           .groupby(df['id'])
                           .transform('first'))

Or filter rows by mask and mapping helper Series by Series.map:

s = df[df['id'].eq(df['sub_id'])].set_index('id')['name']
df['id_name'] = df['id'].map(s)
print (df)
   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal

Details:

print (df['name'].where(df['id'].eq(df['sub_id'])))
0       NaN
1     fruit
2       NaN
3       NaN
4    animal
Name: name, dtype: object


print (s)
id
1     fruit
2    animal
Name: name, dtype: object

CodePudding user response：

Are your IDs unique?

You use GroupBy.transform to get the min id per group and map this to the existing id:

df['id_name'] = (df.groupby('id')['sub_id'].transform('min')
                   .map(df.set_index('sub_id')['name'])
                )

output:

   id  sub_id    name id_name
0   1      12    pear   fruit
1   1       1   fruit   fruit
2   1      13  orange   fruit
3   2      23     cat  animal
4   2       2  animal  animal