Expand pandas dataframe from row-wise to column-wise


I want to expand the columns of the following (toy example) pandas DataFrame,

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3],})


  col1  col2  col3
0    A     1     3
1    A     7    -1
2    A     3     0
3    B     2     5
4    B     9    -2
5    B     4    -3

such that the repeated rows in each group are spread across columns, one row per group:

  col1  col2_1  col2_2  col2_3  col3_1  col3_2  col3_3
0    A     1      7        3      3       -1      0
1    B     2      9        4      5       -2     -3

I know that I should use groupby('col1'), but I don't know how to achieve the desired DataFrame. Note: the groups produced by groupby('col1') all have the same number of elements (here, three A's and three B's).
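For reference, one common idiom for this kind of wide reshaping (a sketch, assuming every group has the same number of rows) numbers the rows within each group with groupby().cumcount() and then pivots both value columns at once:

```python
import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3]})

# number each row within its 'col1' group, starting from 1
df['n'] = df.groupby('col1').cumcount() + 1

# pivot both value columns at once; columns become a MultiIndex (value, n)
wide = df.pivot(index='col1', columns='n', values=['col2', 'col3'])
wide.columns = [f'{c}_{n}' for c, n in wide.columns]  # flatten to 'col2_1', ...
wide = wide.reset_index()
```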

Edit: I managed to do it with the following code, but it is not efficient:

import pandas as pd
from functools import partial

def func(x, exclude_list):
    for col in x.columns:
        if col in exclude_list:
            continue
        for i, value in enumerate(x[col].values):
            x[f'{col}_{i+1}'] = value
    return x

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3],})

exclude_list = ['col1']
columns_to_expand = ['col2', 'col3']
func2 = partial(func, exclude_list=exclude_list)
df2 = df.groupby(exclude_list).apply(func2)
df2.drop(columns_to_expand, axis=1, inplace=True)
df3 = df2.groupby(exclude_list).tail(1).reset_index()
df3.drop('index', axis=1, inplace=True)
print(df3)

which results in,

  col1  col2_1  col2_2  col2_3  col3_1  col3_2  col3_3
0    A       1       7       3       3      -1       0
1    B       2       9       4       5      -2      -3

Edit2: This code, based on ouroboros1's answer, works efficiently:

import numpy as np  # used below; `exclude_list` and `columns_to_expand` as defined above

df_pivot = None
for col in columns_to_expand:
    df['index'] = [f'{col}_{i}' for i in range(1,4)]*len(np.unique(df[exclude_list].values))
    if df_pivot is None:
        df_pivot = df.pivot(index=exclude_list, values=col, columns='index').reset_index(drop=False)
    else:
        df_pivot = df_pivot.merge(df.pivot(index=exclude_list, values=col, columns='index').reset_index(drop=False))

CodePudding user response:

Solution for any number of columns:

def f(group):
    result = (group.set_axis(range(1, len(group) + 1))
                   .T
                   .stack())
    result.index = (result.index
                          .to_flat_index()
                          .map('{0[0]}_{0[1]}'.format))
    return result

Specify the columns (as a list) here:

chosen_cols = ['col2', 'col3']
df.groupby('col1')[chosen_cols].apply(f).reset_index()

Result:

  col1  col2_1  col2_2  col2_3  col3_1  col3_2  col3_3
0    A       1       7       3       3      -1       0
1    B       2       9       4       5      -2      -3

Explanation:

f accepts a sub-dataframe: a group (where 'col1' is the same), and converts this group into a Series. When returned, this Series will become the row.

For example:

Let's say that f receives a group where col1 is 'A':

   col2  col3
0     1     3
1     7    -1
2     3     0

It relabels the index (because you want the numbering to start from 1):

group.set_axis(range(1, len(group) + 1))

   col2  col3
1     1     3
2     7    -1
3     3     0

After .T.stack(), the dataframe becomes a Series with a MultiIndex:

col2  1    1
      2    7
      3    3
col3  1    3
      2   -1
      3    0
dtype: int64

This MultiIndex is flattened (into tuples) and the tuple elements are merged together with a '_' in between them:

col2_1    1
col2_2    7
col2_3    3
col3_1    3
col3_2   -1
col3_3    0
dtype: int64

That's it! After that, this Series is returned from f and becomes a row:

  col1  col2_1  col2_2  col2_3  col3_1  col3_2  col3_3
0    A       1       7       3       3      -1       0

 


Previous solution (it works only for 1 column, so don't use it):

3 steps (using .groupby):

  • Aggregate all elements of each group into a list:
temp = df.groupby('col1')['col2'].agg(list)
col1
A    [1, 7, 3]
B    [2, 9, 4]
Name: col2, dtype: object
  • Expand these lists into new columns:
result = pd.DataFrame(temp.tolist(), index=temp.index)
      0  1  2
col1         
A     1  7  3
B     2  9  4
  • Rename the columns and reset index:
result.columns = range(1, len(result.columns) + 1)
result = result.add_prefix('col2_').reset_index()
  col1  col2_1  col2_2  col2_3
0    A       1       7       3
1    B       2       9       4

CodePudding user response:

Update: the question has been updated to expand multiple columns row-wise. This requires some refactoring of the initial answers that were tailored to the initial question, which only required the operation to take place on one column (col2). Note that the current refactored answers also work perfectly fine on a single column. However, since they are a little verbose for that situation, I'm keeping the original answers for just 1 column at the end.


Answers for expanding multiple columns row-wise

You could use df.pivot for this:

import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3],})

cols = ['col2','col3']

# val count per unique val in col1: N.B. expecting all to have same count!
vals_unique_col1 = df.col1.value_counts()[0] + 1 # 3 + 1 (use in `range()`)
len_unique_col1 = len(df.col1.unique()) # 2

# create temp cols [1,2,3] and store in new col
df['my_index'] = [i for i in range(1,vals_unique_col1)]*len_unique_col1
df_pivot = df.pivot(index='col1',values=cols,columns='my_index')\
    .reset_index(drop=False)

# customize df cols by joining MultiIndex columns
df_pivot.columns = ['_'.join(str(i) for i in x) for x in df_pivot.columns]
df_pivot.rename(columns={'col1_':'col1'}, inplace=True)

print(df_pivot)

  col1  col2_1  col2_2  col2_3  col3_1  col3_2  col3_3
0    A       1       7       3       3      -1       0
1    B       2       9       4       5      -2      -3
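A variant of the pivot approach that does not depend on the rows of each group being contiguous derives the temporary column with groupby().cumcount() instead of tiling a list (a sketch, under the same equal-group-size assumption):

```python
import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3]})
cols = ['col2', 'col3']

# per-group counter: 1, 2, 3 within each 'col1' group, in any row order
df['my_index'] = df.groupby('col1').cumcount() + 1

df_pivot = df.pivot(index='col1', values=cols, columns='my_index').reset_index()
# join MultiIndex columns; 'col1' comes out as ('col1', ''), so strip the tail '_'
df_pivot.columns = ['_'.join(str(i) for i in c).rstrip('_') for c in df_pivot.columns]
```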

2 alternative solutions based on df.groupby could look like this:

  • Groupby solution 1
import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3],})

cols = ['col2','col3']

df_groupby = df.groupby('col1')[cols].agg(list)\
    .apply(pd.Series.explode, axis=1).reset_index(drop=False)

# same as in `pivot` method, this will be 3
len_cols = df.col1.value_counts()[0]

# rename cols
df_groupby.columns=[f'{col}_{(idx-1)%len_cols+1}' if col != 'col1' else col 
                    for idx, col in enumerate(df_groupby.columns)]
  • Groupby solution 2
import pandas as pd
import numpy as np

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3],})

cols = ['col2','col3']

agg_lists = df.groupby('col1')[cols].agg(list)

dfs = [pd.DataFrame(agg_lists[col].tolist(), index=agg_lists.index) 
       for col in agg_lists.columns]

df_groupby = pd.concat(dfs, axis=1)

len_cols = df.col1.value_counts()[0]
cols_rep = np.repeat(cols,len_cols)

df_groupby.columns = [f'{col}_{str(i+1)}' for col, i 
                      in zip(cols_rep, df_groupby.columns)]
df_groupby.reset_index(drop=False, inplace=True)
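The same result can also be expressed with unstack instead of agg(list), by indexing each row with its position inside the group (a sketch, again assuming equal group sizes):

```python
import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4],
                   'col3': [3, -1, 0, 5, -2, -3]})
cols = ['col2', 'col3']

# index rows by (group, position-within-group), then move the position to columns
out = df.set_index(['col1', df.groupby('col1').cumcount() + 1])[cols].unstack()
out.columns = [f'{c}_{n}' for c, n in out.columns]
out = out.reset_index()
```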

(Original) answers for expanding single column row-wise

You could use df.pivot for this:

import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"], 
                   'col2': [1, 7, 3, 2, 9, 4]})

# add col with prospective col names (`col2_1,*_2,*_3`) 
# and multiply by len unique values in `df.col1`
df['index'] = [f'col2_{i}' for i in range(1,4)]*len(df.col1.unique())

df_pivot = df.pivot(index='col1',values='col2',columns='index')\
    .reset_index(drop=False)

print(df_pivot)

index col1  col2_1  col2_2  col2_3
0        A       1       7       3
1        B       2       9       4
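The stray index label in the printed header is the name of the columns axis left over from the pivot; it can be cleared afterwards (a minor, optional cleanup):

```python
import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"],
                   'col2': [1, 7, 3, 2, 9, 4]})
df['index'] = [f'col2_{i}' for i in range(1, 4)] * df.col1.nunique()

df_pivot = df.pivot(index='col1', values='col2', columns='index').reset_index()
df_pivot.columns.name = None  # drop the leftover 'index' axis name
```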

Alternative solution based on df.groupby could look like this:

import pandas as pd

df = pd.DataFrame({'col1': ["A", "A", "A", "B", "B", "B"], \
                   'col2': [1, 7, 3, 2, 9, 4]})


# create lists of values in `col2` per group in `col1`,
# then expand into multiple cols with `apply(pd.Series)`, finally reset index
df_groupby = df.groupby('col1').agg(list)['col2']\
    .apply(pd.Series).reset_index(drop=False)

# overwrite new cols (`0,1,2`) with desired col names `col2_1, etc.`
df_groupby.columns=[f'col2_{col+1}' if col != 'col1' else col 
                    for col in list(df_groupby.columns)]

print(df_groupby)

  col1  col2_1  col2_2  col2_3
0    A       1       7       3
1    B       2       9       4