How can I take the values from columns and send them to rows?-CodePudding

I have a data frame with 60 columns, which most of them are just informations about other sold products, it looks like this:

store_id|seller_id | produtc_0_code |product_0_value   |product_1_code |product_1_value
        ----------------------------------------------------------------------------------
 12     |    1234  |  a2356         |    52,20         |   ae822,       |    50,20

in order to reduce the amount of columns, I've been trying to put the values from the colomns in rows, something like this:

store_id |seller_id | product_ code |product_value 
12       |  1234    |  a2356        | 52,20
12       |  1234    |  ae822        | 50,20

that means, keeping the values of store_id and seller_id, but taking the values from the products_code and product_value columns and inserting them in new rows.

CodePudding user response：

pivot_longer from pyjanitor can also help in this transformation; since your columns have an order to them (some end with code, others with value):

# pip install pyjanitor
import pandas as pd
import janitor as jn
df.pivot_longer(index = ['store_id', 'seller_id'], 
                names_to = ['product_code', 'product_value'], 
                names_pattern = ["code$", "value$"])
 
   store_id  seller_id product_code product_value
0        12       1234        a2356         52,20
1        12       1234       ae822,         50,20

It is also possible to reshape using pd.wide_to_long.

First, reorder the columns with str.replace:

pat = "(?P<start>. )_(?P<middle>. )_(?P<end>. )"
repl = lambda m : f"{m.group('start')}_{m.group('end')}_{m.group('middle')}"
df.columns = df.columns.str.replace(pat, repl, regex = True)
# or just skip the steps above and use this option from @HenryEcker
# df.columns = df.columns.str.replace(r'(. )_(. )_(. )', r'\1_\3_\2', regex=True)

Apply pd.wide_to_long:

pd.wide_to_long(df, 
                stubnames = ['product_code', 'product_value'], 
                i = ['store_id', 'seller_id'], 
                j = 'num', 
                sep='_').droplevel('num').reset_index()

   store_id  seller_id product_code product_value
0        12       1234        a2356         52,20
1        12       1234       ae822,         50,20

CodePudding user response：

We can use set_index stack, but we first need to create a MultiIndex using str.split:

# Set index to the columns to be unaffected
new_df = df.set_index(['store_id', 'seller_id'])
# Split remaining column into a MultiIndex
new_df.columns = new_df.columns.str.split('_', expand=True)
# Stack Middle level (the numbers) into rows
new_df = new_df.stack(1).droplevel(-1)
# Join MultiIndex Columns together
new_df.columns = new_df.columns.map('_'.join)
# Reset to default range index
new_df = new_df.reset_index()

*Note this assumes that the remaining columns have values separated by _ with a number in the second split position.

new_df:

   store_id  seller_id product_code product_value
0        12       1234        a2356         52,20
1        12       1234       ae822,         50,20

Setup Used:

import pandas as pd

df = pd.DataFrame({
    'store_id': [12], 'seller_id': [1234], 'product_0_code': ['a2356'],
    'product_0_value': ['52,20'], 'product_1_code': ['ae822,'],
    'product_1_value': ['50,20']
})