Lookup value by index and name in Pandas-CodePudding

I have a pandas dataframe with a flattened hierarchy:

Level 1 ID	Level 2 ID	Level 3 ID	Level 4 ID	Name	Path
1	null	null	null	Finance	Finance
1	4	null	null	Reporting	Finance > Reporting
1	4	5	null	Tax Reporting	Finance > Reporting > Tax Reporting

What I want to do is add or replace with the Level ID columns with 4 Level Name columns based on the Level [] ID columns, like the following:

Level 1 Name	Level 2 Name	Level 3 Name	Level 4 Name	Name	Path
Finance	null	null	null	Finance	Finance
Finance	Reporting	null	null	Reporting	Finance > Reporting
Finance	Reporting	Tax Reporting	null	Tax Reporting	Finance > Reporting > Tax Reporting

I would use a separator on the Path column, but in the real dataframe, there are IDs instead of names (formatted like "1 > 4 > 5")

How should I approach this?

CodePudding user response：

The logic is unclear, in particular what is the source of the final values? See two different options below.

Assuming the source is `df['Name']`

cols = df.filter(like='Level ').columns
names = df['Name'].values
mask = df[cols[:len(names)]].notna()

df[cols[:len(names)]] = mask.mul(names, axis=1).where(mask)

Output:

  Level 1 ID Level 2 ID     Level 3 ID  Level 4 ID           Name                                 Path
0    Finance        NaN            NaN         NaN        Finance                              Finance
1    Finance  Reporting            NaN         NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting         NaN  Tax Reporting  Finance > Reporting > Tax Reporting

If you rather want to extract from "Path"

cols = df.filter(like='Level ').columns
names = df['Path'].str.split(' > ', expand=True)

df.loc[:, cols[:names.shape[1]]] = names.to_numpy()

Output:

  Level 1 ID Level 2 ID     Level 3 ID  Level 4 ID           Name                                 Path
0    Finance       None           None         NaN        Finance                              Finance
1    Finance  Reporting           None         NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting         NaN  Tax Reporting  Finance > Reporting > Tax Reporting

CodePudding user response：

You can create a mapping Series to resolve number -> name:

cols = df1.columns[df.columns.str.contains('Level \d  ID')]
idx = df.filter(like='Level').ffill(axis=1).iloc[:, -1].tolist()

sr = pd.Series(df['Name'].tolist(), index=idx)
df[cols] = df[cols].apply(lambda x: x.map(sr))

Output:

>>> df
  Level 1 ID Level 2 ID     Level 3 ID Level 4 ID           Name                                 Path
0    Finance        NaN            NaN        NaN        Finance                              Finance
1    Finance  Reporting            NaN        NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting        NaN  Tax Reporting  Finance > Reporting > Tax Reporting

Assuming the source is df['Name']

If you rather want to extract from "Path"

Assuming the source is `df['Name']`