Home > Blockchain >  I have a column in dataframe containing name of another column. How do I reference that column if I
I have a column in dataframe containing name of another column. How do I reference that column if I

Time:11-18

I have a dataframe with several columns showing percentage distributions.

I used df["MaxColumn"] = df.idxmax(axis=1) to identify the name of the column with the largest number for each row.

Now I need to do more calculations, i.e. add 1 to the cell with max value in each row.

I assumed that I could do something like df[df["MaxColumn"]] = 1, but I'm getting errors.

My dataframe:

  Level1 Level2 Level3 MaxColumn
1    0.5    0.3    0.1    Level1
2    0.2    0.1    0.6    Level3

What I want is to add 1 to the max value in each row and get this:

  Level1 Level2 Level3 MaxColumn
1    1.5    0.3    0.1    Level1
2    0.2    0.1    1.6    Level3

CodePudding user response:

To the best my knowledge, it is impossible indexing cells of DataFrame using the row and column indices.

In other words, your attempt to indexing cells whose indices are [0, 1] and [1, 3] are not proper.

So, I recommend to use for loop instead. Here is the example code.

import pandas as pd
df = pd.DataFrame([[0.5, 0.3, 0.1],
                  [0.2, 0.1, 0.6]],
                  columns = ["Level1", "Level2", "Level3"])

df["MaxColumn"] = df.idxmax(axis = 1)
for idx, value in enumerate(df["MaxColumn"]):
    df.loc[idx, value]  = 1

CodePudding user response:

You can do it like this:

# Find level columns
level_cols = [col for col in df.columns if col.lower().startswith('level')]

# Create a temporary dataframe and copy just the column name of each column into all the values of that column
temp_df = df.copy()
temp_df[level_cols] = level_cols

# Create a mask where each value is True if the column it's in is the one in MaxColumn for that row
mask = temp_df[level_cols].isin(temp_df['MaxColumn'])

# Change the datatype of the dataframe to object so that we can edit values using a mask
df = df.astype('object')

# Add 1 to the values selected by the mask
df[mask]  = 1

Output:

>>> df
  Level1 Level2 Level3 MaxColumn
1    1.5    0.3    0.1    Level1
2    0.2    0.1    1.6    Level3

CodePudding user response:

You can flip the columns (assuming the index is unique), filter for booleans, then add:

temp = df.pivot(columns='MaxColumn')
bools = temp.columns.get_level_values(0) == temp.columns.get_level_values(1)
temp.loc[:, bools] = temp.loc[:, bools]   1
temp.stack().reset_index('MaxColumn')

  MaxColumn  Level1 Level2  Level3
1    Level1     1.5   0.3.     0.1
2    Level3     0.2    0.1     1.6

You can also flip it the other way (wide to long), and compare before flipping back to wide :

(df.melt('MaxColumn')
   .assign(value = lambda df: np.where(df.MaxColumn == df.variable,
                                       df.value   1,
                                       df.value))
   .pivot('MaxColumn','variable','value')
   .rename_axis(columns=None)
   .reset_index()
   .reindex(columns=df.columns)
)
   Level1  Level2  Level3 MaxColumn
0     1.5     0.3     0.1    Level1
1     0.2     0.1     1.6    Level3
  • Related