Home > Software design >  Lookup value by index and name in Pandas
Lookup value by index and name in Pandas

Time:01-09

I have a pandas dataframe with a flattened hierarchy:

Level 1 ID Level 2 ID Level 3 ID Level 4 ID Name Path
1 null null null Finance Finance
1 4 null null Reporting Finance > Reporting
1 4 5 null Tax Reporting Finance > Reporting > Tax Reporting

What I want to do is add or replace with the Level ID columns with 4 Level Name columns based on the Level [] ID columns, like the following:

Level 1 Name Level 2 Name Level 3 Name Level 4 Name Name Path
Finance null null null Finance Finance
Finance Reporting null null Reporting Finance > Reporting
Finance Reporting Tax Reporting null Tax Reporting Finance > Reporting > Tax Reporting

I would use a separator on the Path column, but in the real dataframe, there are IDs instead of names (formatted like "1 > 4 > 5")

How should I approach this?

CodePudding user response:

The logic is unclear, in particular what is the source of the final values? See two different options below.

Assuming the source is df['Name']
cols = df.filter(like='Level ').columns
names = df['Name'].values
mask = df[cols[:len(names)]].notna()

df[cols[:len(names)]] = mask.mul(names, axis=1).where(mask)

Output:

  Level 1 ID Level 2 ID     Level 3 ID  Level 4 ID           Name                                 Path
0    Finance        NaN            NaN         NaN        Finance                              Finance
1    Finance  Reporting            NaN         NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting         NaN  Tax Reporting  Finance > Reporting > Tax Reporting
If you rather want to extract from "Path"
cols = df.filter(like='Level ').columns
names = df['Path'].str.split(' > ', expand=True)

df.loc[:, cols[:names.shape[1]]] = names.to_numpy()

Output:

  Level 1 ID Level 2 ID     Level 3 ID  Level 4 ID           Name                                 Path
0    Finance       None           None         NaN        Finance                              Finance
1    Finance  Reporting           None         NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting         NaN  Tax Reporting  Finance > Reporting > Tax Reporting

CodePudding user response:

You can create a mapping Series to resolve number -> name:

cols = df1.columns[df.columns.str.contains('Level \d  ID')]
idx = df.filter(like='Level').ffill(axis=1).iloc[:, -1].tolist()

sr = pd.Series(df['Name'].tolist(), index=idx)
df[cols] = df[cols].apply(lambda x: x.map(sr))

Output:

>>> df
  Level 1 ID Level 2 ID     Level 3 ID Level 4 ID           Name                                 Path
0    Finance        NaN            NaN        NaN        Finance                              Finance
1    Finance  Reporting            NaN        NaN      Reporting                  Finance > Reporting
2    Finance  Reporting  Tax Reporting        NaN  Tax Reporting  Finance > Reporting > Tax Reporting
  • Related