I have a Dataframe in the following format:
2022-07-04 2022-07-11
27 27.1 27.2 28 28.1 28.2
index id employee
1 1234565 Max project1 customer1 in progress project1 customer1 in progress
project2 customer2 in progress project2 customer2 in progress
What I want to achieve is the following format:
2022-07-04 2022-07-11
27 28
index id employee
1 1234565 Max project1, customer1, in progress project1, customer1, in progress
project2, customer2, in progress project2, customer2, in progress
So I have a multilevel Index, and I want to merge the columns to one column, which are under the Top Level column. So for example under the top level '2022-07-04' I have three columns (27, 27.1, 27.2) and I want to merge these 3 columns to one column, and the same for the next date (2022-07-11) and so on... The size is always 3 columns, so below top level are always 3 columns.
CodePudding user response:
Use Index.get_level_values
for columns names by levels, in second remove values after .
and use custom lambda function for join
:
c1 = df.columns.get_level_values(1).str.split('.').str[0]
c2 = df.columns.get_level_values(0)
df = df.groupby([c2, c1], axis=1).apply(lambda x: x.agg(', '.join, axis=1))
print (df)
2022-07-04 \
27
index id employee
1 1234565 Max project1, customer1, in progress
Max project2, customer2, in progress
2022-07-11
28
index id employee
1 1234565 Max project1, customer1, in progress
Max project2, customer2, in progress
CodePudding user response:
It is a bit hard for me to understand the datastructure. I guess you could create a new array in the shape which you want, say
newArray = np.zeros((5,2),dtype = object)
and then go through the array with a for-loop. And for each column set the newArray-column to the one of the original array, except at column 4 and 5, where you can use sth like np.append or str.join to link the remaining entries together.