I have a data frame that looks like this
Name | Order |
---|---|
Manufacturer | 0 |
Company1 | 1 |
product | 2 |
Company2 | 1 |
product | 2 |
product | 2 |
product | 2 |
the only identifier for the value in the Names column is the order, where 0 represents Manufacturers, 1 represents companies, and 2 represents products.
And I want to add a column with a value-based in comparison between previous and current rows under the same group.
basically, I want to identify that company1 relates to Manufacturer1, and product1 relates to company1, etc...
Name | Order | Desired_Output |
---|---|---|
Manufacturer | 0 | Manufacturer |
Company1 | 1 | Manufacturer_Company1 |
product | 2 | Company1_product |
Company2 | 1 | Manufacturer_Company2 |
product | 2 | Company2_product |
product | 2 | Company2_product |
product | 2 | Company2_product |
CodePudding user response:
This one is a little tricky but will do the job
df['output'] = df.apply(lambda x: df[df['Order'] == x['Order'] -1]['Name'].iloc[-1] '_' x['Name'] if x['Order'] > 0 else x['Name'],axis=1)
However, that's not the most clever solution for large dfs
CodePudding user response:
You can pivot
the data, ffill
and join the last two items:
df['output'] = (df
.reset_index()
.pivot(index='index', columns='Order', values='Name')
.ffill()
.apply(lambda d: '_'.join(d.dropna().iloc[-2:]), axis=1)
)
NB. This should work with any number of 0 values.
output:
Name Order output
0 Manufacturer 0 Manufacturer
1 Company1 1 Manufacturer_Company1
2 product 2 Company1_product
3 Company2 1 Company2_product
4 product 2 Company2_product
5 product 2 Company2_product
6 product 2 Company2_product