Replace singular value on second level of pandas dataframe multiindex when that index level has iden-CodePudding

I have a dataframe that has a multiindex with two levels. Given the following example for the second level:

d = {
"col1": [1, 2, 3, 4],
"col2": [1, 2, 3, 4],
"col3": [1, 2, 3, 4],
"col4": [1, 2, 3, 4],
"col5": [1, 2, 3, 4],
}
df = pd.DataFrame(data=d)
df.columns = pd.MultiIndex.from_product([df.columns, ["identical"]])

How do I change a singular value so that the second level of the index looks like this?

['example', 'identical', 'identical', 'identical', 'identical']

I have tried to do it this way:

        updated_columns = list(df.columns.get_level_values(1))

        updated_columns[0] = 'example'

        df.columns.set_levels(
            updated_columns, level=1, inplace=True, verify_integrity=False
        )

My change is ignored in this case.

I have also tried the answer from this topic: pandas MultiIndex with duplicate values in one level

        df.columns = pd.MultiIndex.from_tuples(
            df.columns.set_levels(updated_columns, 1, verify_integrity=False).values
        )

Which was also ignored.

I have also considered using the rename() method. Unfortunately it only works if the value of the column that is renamed is provided. Given that there are identical values, this will not work.

For non-multiindexes there is this method:

df.columns.values[0] = 'example'

But from what I have gathered, it will not work for a multi index.

I have added verify_integrity=False because the method would not otherwise allow me to set identical values.

Any help would be appreciated.

CodePudding user response：

One way would be to get the tuples that make up the MultiIndex and modify them directly:

    tuples = df.columns.tolist()
    tuples[0] = (tuples[0][0], 'example')

    df.columns = pd.MultiIndex.from_tuples(tuples)

Output:

>>> df
     col1      col2      col3      col4      col5
  example identical identical identical identical
0       1         1         1         1         1
1       2         2         2         2         2
2       3         3         3         3         3
3       4         4         4         4         4

MultiIndexes are a bit weird. They're stored as a list of levels (which contain unique label values), and codes (which are the indexes of the labels). For example, the levels and codes for your current MultiIndex look like this:

>>> df.columns.levels
FrozenList([['col1', 'col2', 'col3', 'col4', 'col5'], ['identical']])

>>> df.columns.codes
FrozenList([[0, 1, 2, 3, 4], [0, 0, 0, 0, 0]])

As you can see, there's only one 'indentical' string in there. The repetitions of it are defined by the codes.

So if you wanted to set the first label of the second level to example by manipulating the levels and codes, you might go about it like this:

vals = df.columns.levels[1].tolist()
vals.append('example')

df.columns = df.columns.set_levels(vals, level=1)
df.columns = df.columns.set_codes([1, 0, 0, 0, 0], level=1) # 1 = 'example', 2 = 'identical'

Here's a function that you can use to set all the labels of a particular level of a MultiIndex:

def set_level_of_df(df, level_index, labels):
    new_df = df.copy()
    
    unique, inverse = np.unique(labels, return_inverse=True)
    new_df.columns = new_df.columns.set_levels(unique, level=level_index)
    new_df.columns = new_df.columns.set_codes(inverse, level=level_index)
    return new_df

Usage:

>>> df
     col1      col2      col3      col4      col5
  example identical identical identical identical
0       1         1         1         1         1
1       2         2         2         2         2
2       3         3         3         3         3
3       4         4         4         4         4

>>> df = set_level_of_df(df, level_index=0, labels=['aaa', 'abc', 'xyz', '0123', '-----'])
>>> df
      aaa       abc       xyz      0123     -----
  example identical identical identical identical
0       1         1         1         1         1
1       2         2         2         2         2
2       3         3         3         3         3
3       4         4         4         4         4

Here's another function (which uses the previous one) that you can use to set a particular label of a particular level of a MultiIndex (which is your question):

def set_label_of_label_of_df(df, level_index, label_index, label):
    new_df = df.copy()
    
    labels = df.columns.get_level_values(level=level_index).tolist()
    labels[label_index] = label
    return set_level_of_df(df, level_index, labels)

Usage:

>>> df
     col1      col2      col3      col4      col5
  example identical identical identical identical
0       1         1         1         1         1
1       2         2         2         2         2
2       3         3         3         3         3
3       4         4         4         4         4

>>> df = set_label_of_label_of_df(df, level_index=1, label_index=3, label='THIS_IS_A_TEST_VALUE')
>>> df
     col1      col2      col3                 col4      col5
  example identical identical THIS_IS_A_TEST_VALUE identical
0       1         1         1                    1         1
1       2         2         2                    2         2
2       3         3         3                    3         3
3       4         4         4                    4         4