Let's say I have a DataFrame like this:
df = pd.DataFrame(data = [[1,2,3,4,5,6], [3,4,5,6,7,8]],
columns = pd.MultiIndex.from_product([('A1', 'B1', 'A2'), (10,20)], names=['level_0','level_1']))
Here's how it looks like: DataFrame image
I want to add a new level in the columns which contains 1
where level_0
value contains "1"
and and 2
where level_0 value contains "2"
. So, basically:
- Where
level_0 == "A1"
-->new_level
=1
- Where
level_0 == "B1"
-->new_level
=1
- Where
level_0 == "A2"
-->new_level
=2
Any suggestions on how to do it?
CodePudding user response:
Use lsit comprehension for extract number from first level values and create new MultiIndex by MultiIndex.from_tuples
:
import re
df.columns = pd.MultiIndex.from_tuples([(re.findall(r'(\d )$', x[0])[0], *x)
for x in df.columns.tolist()],
names=('new_level',*df.columns.names))
print (df)
new_level 1 2
level_0 A1 B1 A2
level_1 10 20 10 20 10 20
0 1 2 3 4 5 6
1 3 4 5 6 7 8
CodePudding user response:
You could extract the values with a regex ((\d )$
= last digits of the value) and rework the MultiIndex with MultiIndex.from_arrays
:
values = df.columns.get_level_values('level_0').str.extract('(\d )$', expand=False)
# ['1', '1', '1', '1', '2', '2']
df.columns = pd.MultiIndex.from_arrays([*zip(*df.columns.to_list()), values],
names=[*df.columns.names, 'level_2']
)
NB. this generalizes to any XXX00 value
output:
level_0 A1 B1 A2
level_1 10 20 10 20 10 20
level_2 1 1 1 1 2 2
0 1 2 3 4 5 6
1 3 4 5 6 7 8