I have a dataframe with several columns showing percentage distributions.
I used df["MaxColumn"] = df.idxmax(axis=1)
to identify the name of the column with the largest number for each row.
Now I need to do more calculations, i.e. add 1 to the cell with max value in each row.
I assumed that I could do something like df[df["MaxColumn"]] = 1
, but I'm getting errors.
My dataframe:
Level1 Level2 Level3 MaxColumn
1 0.5 0.3 0.1 Level1
2 0.2 0.1 0.6 Level3
What I want is to add 1 to the max value in each row and get this:
Level1 Level2 Level3 MaxColumn
1 1.5 0.3 0.1 Level1
2 0.2 0.1 1.6 Level3
CodePudding user response:
To the best my knowledge, it is impossible indexing cells of DataFrame using the row and column indices.
In other words, your attempt to indexing cells whose indices are [0, 1] and [1, 3] are not proper.
So, I recommend to use for loop instead. Here is the example code.
import pandas as pd
df = pd.DataFrame([[0.5, 0.3, 0.1],
[0.2, 0.1, 0.6]],
columns = ["Level1", "Level2", "Level3"])
df["MaxColumn"] = df.idxmax(axis = 1)
for idx, value in enumerate(df["MaxColumn"]):
df.loc[idx, value] = 1
CodePudding user response:
You can do it like this:
# Find level columns
level_cols = [col for col in df.columns if col.lower().startswith('level')]
# Create a temporary dataframe and copy just the column name of each column into all the values of that column
temp_df = df.copy()
temp_df[level_cols] = level_cols
# Create a mask where each value is True if the column it's in is the one in MaxColumn for that row
mask = temp_df[level_cols].isin(temp_df['MaxColumn'])
# Change the datatype of the dataframe to object so that we can edit values using a mask
df = df.astype('object')
# Add 1 to the values selected by the mask
df[mask] = 1
Output:
>>> df
Level1 Level2 Level3 MaxColumn
1 1.5 0.3 0.1 Level1
2 0.2 0.1 1.6 Level3
CodePudding user response:
You can flip the columns (assuming the index is unique), filter for booleans, then add:
temp = df.pivot(columns='MaxColumn')
bools = temp.columns.get_level_values(0) == temp.columns.get_level_values(1)
temp.loc[:, bools] = temp.loc[:, bools] 1
temp.stack().reset_index('MaxColumn')
MaxColumn Level1 Level2 Level3
1 Level1 1.5 0.3. 0.1
2 Level3 0.2 0.1 1.6
You can also flip it the other way (wide to long), and compare before flipping back to wide :
(df.melt('MaxColumn')
.assign(value = lambda df: np.where(df.MaxColumn == df.variable,
df.value 1,
df.value))
.pivot('MaxColumn','variable','value')
.rename_axis(columns=None)
.reset_index()
.reindex(columns=df.columns)
)
Level1 Level2 Level3 MaxColumn
0 1.5 0.3 0.1 Level1
1 0.2 0.1 1.6 Level3