I have a pandas dataframe that I have named as data_df
, Which looks like this.
A B C D
a1 a2 a3 b1 b2 b3 c1 c2 c3 d1 d2 d3
1 1 2 3 5 6 7 3 4 5 10 1 8
2 0 5 4 8 7 9 1 5 7 8 6 1
I have another dataframe named universe_df
which looks like this
Index Symbol Category
1 A X
2 B Y
3 C Y
4 D Z
I want the output to look like this:
X Y Z
A B C D
a1 a2 a3 b1 b2 b3 c1 c2 c3 d1 d2 d3
1 1 2 3 5 6 7 3 4 5 10 1 8
2 0 5 4 8 7 9 1 5 7 8 6 1
How should I go about it? Thanks in advance.
CodePudding user response:
Sample data:
data_df = pd.DataFrame({('A', 'a1'): {1: 1, 2: 0}, ('A', 'a2'): {1: 2, 2: 5}, ('A', 'a3'): {1: 3, 2: 4}, ('B', 'b1'): {1: 5, 2: 8}, ('B', 'b2'): {1: 6, 2: 7}, ('B', 'b3'): {1: 7, 2: 9}, ('C', 'c1'): {1: 3, 2: 1}, ('C', 'c2'): {1: 4, 2: 5}, ('C', 'c3'): {1: 5, 2: 7}, ('D', 'd1'): {1: 10, 2: 8}, ('D', 'd2'): {1: 1, 2: 6}, ('D', 'd3'): {1: 8, 2: 1}})
print (data_df)
A B C D
a1 a2 a3 b1 b2 b3 c1 c2 c3 d1 d2 d3
1 1 2 3 5 6 7 3 4 5 10 1 8
2 0 5 4 8 7 9 1 5 7 8 6 1
universe_df = pd.DataFrame({'Symbol': {1: 'A', 2: 'B', 3: 'C', 4: 'D'}, 'Category': {1: 'X', 2: 'Y', 3: 'Y', 4: 'Z'}})
print (universe_df)
Symbol Category
1 A X
2 B Y
3 C Y
4 D Z
First check if same ouput, no trailing whitespaces:
print (universe_df)
MultiIndex([('A', 'a1'),
('A', 'a2'),
('A', 'a3'),
('B', 'b1'),
('B', 'b2'),
('B', 'b3'),
('C', 'c1'),
('C', 'c2'),
('C', 'c3'),
('D', 'd1'),
('D', 'd2'),
('D', 'd3')],
)
print (universe_df['Symbol'].tolist())
['A', 'B', 'C', 'D']
Create Series
for mapping first level in list comprehension and recreate MultiIndex:
d = universe_df.set_index('Symbol')['Category']
data_df.columns = pd.MultiIndex.from_tuples([(d.get(a), a, b) for a, b in data_df.columns])
print (data_df)
X Y Z
A B C D
a1 a2 a3 b1 b2 b3 c1 c2 c3 d1 d2 d3
1 1 2 3 5 6 7 3 4 5 10 1 8
2 0 5 4 8 7 9 1 5 7 8 6 1
Or you can mapping first level of MultiIndex:
d = universe_df.set_index('Symbol')['Category']
a = data_df.columns.get_level_values(0)
b = data_df.columns.get_level_values(1)
data_df.columns = [a.map(d),a, b]
Another idea with helper DataFrame:
d = universe_df.set_index('Symbol')['Category']
df = data_df.columns.to_frame()
df.insert(0, '', df[0].map(d))
data_df.columns = pd.MultiIndex.from_frame(df)
print (data_df)
X Y Z
0 A B C D
1 a1 a2 a3 b1 b2 b3 c1 c2 c3 d1 d2 d3
1 1 2 3 5 6 7 3 4 5 10 1 8
2 0 5 4 8 7 9 1 5 7 8 6 1