Unflatten a pandas dataframe

Time:11-17

I have a pandas dataframe

df_flat = pd.DataFrame({'dim1': ['a', 'a', 'b', 'b'], 'dim2': ['x', 'y', 'x', 'y'], 'val': [2, 4, 6, 8]})

I want to transform this dataframe ("unflatten" it, for want of a better word) into a NumPy N-dimensional array, such that it looks like:

df_unflatten = pd.DataFrame({'dim1': ['a', 'b'], 'x': [2, 6], 'y': [4, 8]}).set_index('dim1').to_numpy()

I want this method to be flexible, so that if I add another dimension column, the 'unflattened' result becomes a NumPy ndarray with one more dimension.

Are there any built-in pandas functions that can help me achieve this? I am aware of functions that do the opposite, e.g. .flatten(), .unstack(), etc., but I could not find any that achieve what I want.

CodePudding user response:

I think the term you're looking for is "unmelt", since to "melt" a DataFrame is to bring it into the form you called df_flat. To achieve this unmelting, you can do the following:

df = df_flat.set_index(['dim1', 'dim2'])['val'].unstack().reset_index()

# Output:
dim2 dim1  x  y
0       a  2  4
1       b  6  8

For the flexible part, you can add more dimension columns to the list passed to set_index.
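Note that with three or more dimension columns, unstack() still returns a two-dimensional DataFrame (with a MultiIndex on the rows), not an N-dimensional array. One possible way to get an actual ndarray is to reindex against the full product of labels and reshape. The following is only a sketch: the helper name unflatten and its parameters are made up for illustration, and it assumes at least two dimension columns and that each combination of labels appears at most once.

```python
import numpy as np
import pandas as pd

def unflatten(df, dims, value):
    """Hypothetical helper: turn a flat frame into an N-D array.

    Assumes len(dims) >= 2 and that no combination of labels is duplicated.
    Combinations missing from df become NaN (which makes the array float).
    """
    s = df.set_index(dims)[value]
    # One sorted list of labels per dimension; these define the axis order.
    levels = [sorted(df[d].unique()) for d in dims]
    # Every possible combination of labels, in row-major (C) order.
    full = pd.MultiIndex.from_product(levels, names=dims)
    shape = tuple(len(labels) for labels in levels)
    return s.reindex(full).to_numpy().reshape(shape), levels

df_flat = pd.DataFrame({
    'dim1': ['a', 'a', 'b', 'b'],
    'dim2': ['x', 'y', 'x', 'y'],
    'val': [2, 4, 6, 8]})

arr, labels = unflatten(df_flat, ['dim1', 'dim2'], 'val')
print(arr.shape)  # (2, 2)
```

The returned labels tell you which label each position along an axis corresponds to, since the array itself no longer carries that information.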

CodePudding user response:

Given your data:

df_flat = pd.DataFrame({
    'dim1': ['a', 'a', 'b', 'b'],
    'dim2': ['x', 'y', 'x', 'y'],
    'val': [2, 4, 6, 8]})
df_unflatten = pd.DataFrame({
    'dim1': ['a', 'b'], 
    'x': [2, 6], 
    'y': [4, 8]}).set_index('dim1')

Just unstack after setting the indices. Called without parameters, unstack moves the last (innermost) level of the MultiIndex into the columns.

>>> import numpy as np
>>> new_df = df_flat.set_index(['dim1', 'dim2']).unstack()
>>> np.allclose(new_df, df_unflatten)
True
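To illustrate the point about which level gets unstacked, here is a small sketch using the same data. The variable names are only for illustration. It selects the 'val' column first so the result has plain column labels, then compares the default call with explicitly named levels.

```python
import numpy as np
import pandas as pd

df_flat = pd.DataFrame({
    'dim1': ['a', 'a', 'b', 'b'],
    'dim2': ['x', 'y', 'x', 'y'],
    'val': [2, 4, 6, 8]})

s = df_flat.set_index(['dim1', 'dim2'])['val']

by_default = s.unstack()       # moves the innermost level, dim2, into the columns
by_name = s.unstack('dim2')    # same result, with the level named explicitly
other = s.unstack('dim1')      # moves dim1 instead, giving the transposed layout

arr = by_default.to_numpy()    # plain 2-D array, rows follow dim1, columns follow dim2
print(arr)
```

Naming the level explicitly is probably the safer choice if the order of the columns passed to set_index might change later.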