I have a dataframe that has a multiindex with two levels. Given the following example for the second level:
d = {
"col1": [1, 2, 3, 4],
"col2": [1, 2, 3, 4],
"col3": [1, 2, 3, 4],
"col4": [1, 2, 3, 4],
"col5": [1, 2, 3, 4],
}
df = pd.DataFrame(data=d)
df.columns = pd.MultiIndex.from_product([df.columns, ["identical"]])
How do I change a singular value so that the second level of the index looks like this?
['example', 'identical', 'identical', 'identical', 'identical']
I have tried to do it this way:
updated_columns = list(df.columns.get_level_values(1))
updated_columns[0] = 'example'
df.columns.set_levels(
updated_columns, level=1, inplace=True, verify_integrity=False
)
My change is ignored in this case.
I have also tried the answer from this topic: pandas MultiIndex with duplicate values in one level
df.columns = pd.MultiIndex.from_tuples(
df.columns.set_levels(updated_columns, 1, verify_integrity=False).values
)
Which was also ignored.
I have also considered using the rename() method. Unfortunately it only works if the value of the column that is renamed is provided. Given that there are identical values, this will not work.
For non-multiindexes there is this method:
df.columns.values[0] = 'example'
But from what I have gathered, it will not work for a multi index.
I have added verify_integrity=False because the method would not otherwise allow me to set identical values.
Any help would be appreciated.
CodePudding user response:
One way would be to get the tuples that make up the MultiIndex and modify them directly:
tuples = df.columns.tolist()
tuples[0] = (tuples[0][0], 'example')
df.columns = pd.MultiIndex.from_tuples(tuples)
Output:
>>> df
col1 col2 col3 col4 col5
example identical identical identical identical
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 3 3 3
3 4 4 4 4 4
MultiIndexes are a bit weird. They're stored as a list of levels (which contain unique label values), and codes (which are the indexes of the labels). For example, the levels and codes for your current MultiIndex look like this:
>>> df.columns.levels
FrozenList([['col1', 'col2', 'col3', 'col4', 'col5'], ['identical']])
>>> df.columns.codes
FrozenList([[0, 1, 2, 3, 4], [0, 0, 0, 0, 0]])
As you can see, there's only one 'indentical'
string in there. The repetitions of it are defined by the codes.
So if you wanted to set the first label of the second level to example
by manipulating the levels and codes, you might go about it like this:
vals = df.columns.levels[1].tolist()
vals.append('example')
df.columns = df.columns.set_levels(vals, level=1)
df.columns = df.columns.set_codes([1, 0, 0, 0, 0], level=1) # 1 = 'example', 2 = 'identical'
Here's a function that you can use to set all the labels of a particular level of a MultiIndex:
def set_level_of_df(df, level_index, labels):
new_df = df.copy()
unique, inverse = np.unique(labels, return_inverse=True)
new_df.columns = new_df.columns.set_levels(unique, level=level_index)
new_df.columns = new_df.columns.set_codes(inverse, level=level_index)
return new_df
Usage:
>>> df
col1 col2 col3 col4 col5
example identical identical identical identical
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 3 3 3
3 4 4 4 4 4
>>> df = set_level_of_df(df, level_index=0, labels=['aaa', 'abc', 'xyz', '0123', '-----'])
>>> df
aaa abc xyz 0123 -----
example identical identical identical identical
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 3 3 3
3 4 4 4 4 4
Here's another function (which uses the previous one) that you can use to set a particular label of a particular level of a MultiIndex (which is your question):
def set_label_of_label_of_df(df, level_index, label_index, label):
new_df = df.copy()
labels = df.columns.get_level_values(level=level_index).tolist()
labels[label_index] = label
return set_level_of_df(df, level_index, labels)
Usage:
>>> df
col1 col2 col3 col4 col5
example identical identical identical identical
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 3 3 3
3 4 4 4 4 4
>>> df = set_label_of_label_of_df(df, level_index=1, label_index=3, label='THIS_IS_A_TEST_VALUE')
>>> df
col1 col2 col3 col4 col5
example identical identical THIS_IS_A_TEST_VALUE identical
0 1 1 1 1 1
1 2 2 2 2 2
2 3 3 3 3 3
3 4 4 4 4 4