I have a column in dataframe containing name of another column. How do I reference that column if I-CodePudding

I have a dataframe with several columns showing percentage distributions.

I used df["MaxColumn"] = df.idxmax(axis=1) to identify the name of the column with the largest number for each row.

Now I need to do more calculations, i.e. add 1 to the cell with max value in each row.

I assumed that I could do something like df[df["MaxColumn"]] = 1, but I'm getting errors.

My dataframe:

  Level1 Level2 Level3 MaxColumn
1    0.5    0.3    0.1    Level1
2    0.2    0.1    0.6    Level3

What I want is to add 1 to the max value in each row and get this:

  Level1 Level2 Level3 MaxColumn
1    1.5    0.3    0.1    Level1
2    0.2    0.1    1.6    Level3

CodePudding user response：

To the best my knowledge, it is impossible indexing cells of DataFrame using the row and column indices.

In other words, your attempt to indexing cells whose indices are [0, 1] and [1, 3] are not proper.

So, I recommend to use for loop instead. Here is the example code.

import pandas as pd
df = pd.DataFrame([[0.5, 0.3, 0.1],
                  [0.2, 0.1, 0.6]],
                  columns = ["Level1", "Level2", "Level3"])

df["MaxColumn"] = df.idxmax(axis = 1)
for idx, value in enumerate(df["MaxColumn"]):
    df.loc[idx, value]  = 1

CodePudding user response：

You can do it like this:

# Find level columns
level_cols = [col for col in df.columns if col.lower().startswith('level')]

# Create a temporary dataframe and copy just the column name of each column into all the values of that column
temp_df = df.copy()
temp_df[level_cols] = level_cols

# Create a mask where each value is True if the column it's in is the one in MaxColumn for that row
mask = temp_df[level_cols].isin(temp_df['MaxColumn'])

# Change the datatype of the dataframe to object so that we can edit values using a mask
df = df.astype('object')

# Add 1 to the values selected by the mask
df[mask]  = 1

Output:

>>> df
  Level1 Level2 Level3 MaxColumn
1    1.5    0.3    0.1    Level1
2    0.2    0.1    1.6    Level3

CodePudding user response：

You can flip the columns (assuming the index is unique), filter for booleans, then add:

temp = df.pivot(columns='MaxColumn')
bools = temp.columns.get_level_values(0) == temp.columns.get_level_values(1)
temp.loc[:, bools] = temp.loc[:, bools]   1
temp.stack().reset_index('MaxColumn')

  MaxColumn  Level1 Level2  Level3
1    Level1     1.5   0.3.     0.1
2    Level3     0.2    0.1     1.6

You can also flip it the other way (wide to long), and compare before flipping back to wide :

(df.melt('MaxColumn')
   .assign(value = lambda df: np.where(df.MaxColumn == df.variable,
                                       df.value   1,
                                       df.value))
   .pivot('MaxColumn','variable','value')
   .rename_axis(columns=None)
   .reset_index()
   .reindex(columns=df.columns)
)
   Level1  Level2  Level3 MaxColumn
0     1.5     0.3     0.1    Level1
1     0.2     0.1     1.6    Level3