Detect parent rows of each row and add their values to a new column-CodePudding

I have a dataframe that contains multiple hierarchy columns and a value attach to them, something like this:

	location hierarhcy	industry hierarchy	value
0	US	All	V1
1	US-California	All	V2
2	US-California	All-HT	V3
3	US-California-LA	All	V4
4	US-California-LA	All-HT	V5
5	US-California-LA	All-HT-SoftWare	V6
6	US-California-LA	All-Manufacturing	V7

Now I want to find the immediate parents of each row and add their value a a new column. My example just shows two parents but it could be more. So, the expected output would be as follows:

	location hierarhcy	industry hierarchy	value	value-parent-1	value-parent-2
0	US	All	V1	NA	NA
1	US-California	All	V2	V1	NA
2	US-California	All-HT	V3	V2	NA
3	US-California-LA	All	V4	V2	NA
4	US-California-LA	All-HT	V5	V3	V4
5	US-California-LA	All-HT-SoftWare	V6	V5	NA
6	US-California-LA	All-Manufacturing	V7	V4	NA

My Dataframe could be extremely large, so the speed and memory are important factors. So, row-wise process won't really work.

Logic:

The immediate parents for a row will be all rows that are one level higher than one of the hierarchies. For example for row 5, the potential immediate parents will be:

-	location hierarhcy	industry hierarchy
0	US-California	All-HT-SoftWare
1	US-California-LA	All-HT

This is the code to create the dataframe if you want to play with it.

import pandas as pd

df = pd.DataFrame({'location hierarchy': ['US', 'US-California', 'US-California', 'US-California-LA', 'US-California-LA', 'US-California-LA', 'US-California-LA'],
                   'industry hierarchy': ['All', 'All', 'All-HT', 'All', 'All-HT', 'All-HT-SoftWare', 'All-Manufacturing'],
                   'value': ['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7']})

CodePudding user response：

Here you are:

df = pd.DataFrame({'Location Hierarchy': ['US', 'US-California', 'US-California', 'US-California-LA', 'US-California-LA', 'US-California-LA', 'US-California-LA'],
                   'Industry Hierarchy': ['All', 'All', 'All-HT', 'All', 'All-HT', 'All-HT-SoftWare', 'All-Manufacturing'],
                   'Value': ['V1', 'V2', 'V3', 'V4', 'V5', 'V6', 'V7']})

hierarchies = ["Location Hierarchy", "Industry Hierarchy"]
for i, hierarchy in enumerate(hierarchies): 
    df1 = df.copy()
    df1[hierarchy] = df1[hierarchy].str.split('-').str[:-1].str.join("-")
    df[f"Value_parent_{i 1}"] = df1.merge(df, on=hierarchies, how="left", suffixes=["", "_parent"])["Value_parent"]