Get row values as column values

I have a single row data-frame like below

Num     TP1(USD)    TP2(USD)    TP3(USD)    VReal1(USD)     VReal2(USD)     VReal3(USD)     TiV1 (EUR)  TiV2 (EUR)  TiV3 (EUR)  TR  TR-Tag
AA-24   0       700     2100    300     1159    2877    30       30     47      10  5

I want to get a dataframe like the one below

ID  Price   Net     Range
1   0       300     30
2   700     1159    30
3   2100    2877    47

The logic is:

a. There are three groups of columns, whose names contain TP, VR and TV. The ID column holds 1, 2 and 3 (these can be extracted from the column names or simply filled in with a range).
b. The TP1 value goes into the first row of column 'Price', the TP2 value into the second row, and so on.
c. The same applies to VR and TV: their values go into the 'Net' and 'Range' columns.
d. The columns 'Num', 'TR' and 'TR-Tag' are not relevant for the result.

I tried df.filter(regex='TP').stack(). This gives me all the 'TP' columns, and I can access individual values by index ([0], [1], [2]), but I could not get all of them into a column directly.

I also wondered whether there is an easier way of doing this.
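
The logic described in points a to d can be sketched directly, without a reshaping helper: select each group of columns with filter and lay the values of the single row out as a column. This is a minimal sketch, assuming the column names from the sample above and that the columns of each group already appear in the order 1, 2, 3; `groups` and `result` are names introduced here for illustration.

```python
import pandas as pd

# Recreate the single-row frame from the question (assumed column names).
df = pd.DataFrame(
    [["AA-24", 0, 700, 2100, 300, 1159, 2877, 30, 30, 47, 10, 5]],
    columns=["Num", "TP1(USD)", "TP2(USD)", "TP3(USD)",
             "VReal1(USD)", "VReal2(USD)", "VReal3(USD)",
             "TiV1 (EUR)", "TiV2 (EUR)", "TiV3 (EUR)", "TR", "TR-Tag"],
)

# Map each output column to the prefix of the input columns that feed it.
groups = {"Price": "TP", "Net": "VReal", "Range": "TiV"}

# For each group, take the matching columns of the only row as a flat array.
result = pd.DataFrame(
    {name: df.filter(regex=f"^{prefix}\\d").iloc[0].to_numpy()
     for name, prefix in groups.items()}
)

# Fill the ID column with a simple range, as suggested in point a.
result.insert(0, "ID", list(range(1, len(result) + 1)))
print(result)
```

This only handles a single row; the answers below generalise to frames with several rows.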

CodePudding user response：

Assuming 'Num' is a unique identifier, you can use pandas.wide_to_long:

pd.wide_to_long(df, stubnames=['TP', 'VR', 'TV'], i='Num', j='ID')

or, for an output closer to yours:

out = (pd
 .wide_to_long(df, stubnames=['TP', 'VR', 'TV'], i='Num', j='ID')
 .reset_index('ID')
 .drop(columns=['TR', 'TR-Tag'])
 .rename(columns={'TP': 'Price', 'VR': 'Net', 'TV': 'Range'})
 )

output:

       ID  Price   Net  Range
Num                          
AA-24   1      0   300     30
AA-24   2    700  1159     30
AA-24   3   2100  2877     47

updated answer

# strip the unit suffix, e.g. '(USD)' or ' (EUR)', so each name is stub + number
out = (pd
 .wide_to_long(df.set_axis(df.columns.str.replace(r'\s*\([A-Z]{3}\)$', '', regex=True),
                           axis=1),
               stubnames=['TP', 'VReal', 'TiV'], i='Num', j='ID')
 .reset_index('ID')
 .drop(columns=['TR', 'TR-Tag'])
 .rename(columns={'TP': 'Price', 'VReal': 'Net', 'TiV': 'Range'})
 )

output:

       ID  Price   Net  Range
Num                          
AA-24   1      0   300     30
AA-24   2    700  1159     30
AA-24   3   2100  2877     47
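
wide_to_long expects each column name to be a stub followed directly by the numeric suffix, so any unit suffix has to be stripped first. A standalone sketch of that cleanup with the standard re module, assuming a suffix is always three capital letters in parentheses, optionally preceded by a space (as in the sample data); `unit_suffix` and `cleaned` are illustrative names:

```python
import re

columns = ["Num", "TP1(USD)", "VReal2(USD)", "TiV3 (EUR)", "TR", "TR-Tag"]

# Optional whitespace, then a parenthesised three-letter code at the end.
unit_suffix = re.compile(r"\s*\([A-Z]{3}\)$")

# Names without a unit suffix pass through unchanged.
cleaned = [unit_suffix.sub("", name) for name in columns]
print(cleaned)
```

The same pattern can be passed to df.columns.str.replace with regex=True.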

CodePudding user response：

Let us create a MultiIndex, then use .stack:

df1 = df.filter(regex='TP|VR|TV')
# I couldn't figure out how to split between word and number directly,
# so insert a space before the number and split on it.
df1.columns = df1.columns\
     .str.replace(r'(\d+)', r' \1', regex=True).str.split(' ', expand=True)

# Or, more succinctly (use instead of the assignment above):
df1.columns = pd.MultiIndex.from_frame(df1.columns.str.extract(r'(\D+)(\d+)'))

print(df1)

  TP              VR              TV
   1    2     3    1     2     3   1   2   3
0  0  700  2100  300  1159  2877  30  30  47

df1.stack(1).rename(columns={'TP': 'Price', 'VR': 'Net', 'TV': 'Range'})
    
     Price  Range   Net
0 1      0     30   300
  2    700     30  1159
  3   2100     47  2877
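
What the split into two column levels followed by stack achieves can be pictured in plain Python: the number becomes the row key and the prefix becomes the column key. This is only an illustration of the regrouping, not of how pandas implements stack, and it assumes the simplified names (TP1, VR1, TV1, ...) used in this answer; `row` and `table` are hypothetical names.

```python
import re

# One row of values keyed by the simplified column names (assumed sample).
row = {"TP1": 0, "TP2": 700, "TP3": 2100,
       "VR1": 300, "VR2": 1159, "VR3": 2877,
       "TV1": 30, "TV2": 30, "TV3": 47}

table = {}
for name, value in row.items():
    # Split e.g. 'VR2' into the prefix 'VR' and the number '2'.
    prefix, number = re.fullmatch(r"(\D+)(\d+)", name).groups()
    # The number becomes the row key, the prefix the column key.
    table.setdefault(int(number), {})[prefix] = value

for number, values in sorted(table.items()):
    print(number, values)
```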

CodePudding user response：

wide_to_long (see mozway's answer) is probably best here from a pure pandas perspective, but if you need more flexibility, you could also melt and pivot:

import pandas as pd

# recreating your dataframe
df = pd.DataFrame(['AA-24', '0', '700', '2100', '300', '1159', '2877', '30', '30', '47', '10', '5'], 
                  index= ['Num', 'TP1(USD)', 'TP2(USD)', 'TP3(USD)', 'VReal1(USD)', 'VReal2(USD)', 'VReal3(USD)', 'TiV1(EUR)', 'TiV2(EUR)', 'TiV3(EUR)', 'TR', 'TR-Tag']).T

# reshaping the data
(df.melt(id_vars=['Num','TR', 'TR-Tag'])
 .assign(col=lambda x: x['variable'].str[:2], idx=lambda x: x['variable'].str.extract("([0-9])"))
 .pivot(values='value', columns='col', index='idx')
 .rename(columns={'TP': 'Price', 'VR': 'Net', 'Ti': 'Range'})
)

Perhaps surprisingly, this is also faster than wide_to_long. Benchmarking gives 7.76 ms ± 841 µs per loop for this method.

The wide_to_long approach from mozway:

(pd
 .wide_to_long(df.set_axis(df.columns.str.replace(r'\([A-Z]{3}\)$', '', regex=True),
                           axis=1),
               stubnames=['TP', 'VReal', 'TiV'], i='Num', j='ID')
 .reset_index('ID')
 .drop(columns=['TR', 'TR-Tag'])
 .rename(columns={'TP': 'Price', 'VReal': 'Net', 'TiV': 'Range'})
 )

benchmarks at 30.4 ms ± 3.07 ms per loop on my machine.

Umar.H's answer using stack is faster than both:

df1 = df.filter(regex='TP|VR|TV')
df1.columns = df1.columns\
     .str.replace(r'(\d+)', r' \1', regex=True).str.split(' ', expand=True)
df1.stack(1).rename(columns={'TP': 'Price', 'VR': 'Net', 'TV': 'Range'})

Runs at 6.07 ms ± 156 µs per loop

If you don't mind the additional import, sammywemmy's answer using pyjanitor's pivot_longer offers speed and an elegant syntax.

(df
.select_columns('TP*', 'VR*', 'Ti*')
.pivot_longer(index = None, 
              names_to = ('.value', 'ID'), 
              names_pattern = (r'(.+)(\d).+'))
.rename(columns = {'TP':'Price', 'VReal':'Net', 'TiV':'Range'})
)

benchmarks at 11.2 ms ± 229 µs per loop

and the names pattern approach:

df.pivot_longer(index = None, 
                names_to = ('Price', 'Net', 'Range'), 
                names_pattern = ('TP.*', 'VR.*', 'Ti.*'), 
                ignore_index = False)

is the fastest of the lot as tested, coming in at 3.53 ms ± 95 µs per loop.

(It is worth noting that this dataset is probably too small for speed to matter, and the ranking may differ on larger datasets.)
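
Timings like these can be reproduced with the standard timeit module. A rough sketch, assuming the simplified column names (TP1, VR1, TV1, ...) and comparing only two of the approaches; `via_wide_to_long` and `via_melt_pivot` are illustrative helper names, and the absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

df = pd.DataFrame({"Num": ["AA-24"],
                   "TP1": [0], "TP2": [700], "TP3": [2100],
                   "VR1": [300], "VR2": [1159], "VR3": [2877],
                   "TV1": [30], "TV2": [30], "TV3": [47]})


def via_wide_to_long(frame):
    # Reshape with the built-in helper; 'Num' identifies the original row.
    return (pd.wide_to_long(frame, stubnames=["TP", "VR", "TV"], i="Num", j="ID")
              .reset_index("ID"))


def via_melt_pivot(frame):
    # Melt to long form, split each name into prefix and number, pivot back.
    long = frame.melt(id_vars="Num")
    long["col"] = long["variable"].str[:2]
    long["ID"] = long["variable"].str[2:].astype(int)
    return long.pivot(index="ID", columns="col", values="value")


runs = 20
for func in (via_wide_to_long, via_melt_pivot):
    seconds = timeit.timeit(lambda: func(df), number=runs)
    print(f"{func.__name__}: {seconds / runs * 1e3:.2f} ms per call")
```

To see how the approaches scale, the same loop can be rerun on a frame with many rows, each with its own 'Num' value.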

CodePudding user response：

One option is with pivot_longer from pyjanitor:

# pip install pyjanitor
import pandas as pd
import janitor

(df
.select_columns('TP*', 'VR*', 'Ti*')
.pivot_longer(index = None, 
              names_to = ('.value', 'ID'), 
              names_pattern = (r'(.+)(\d).+'))
.rename(columns = {'TP':'Price', 'VReal':'Net', 'TiV':'Range'})
)
  ID  Price   Net  Range
0  1      0   300     30
1  2    700  1159     30
2  3   2100  2877     47

In the above solution, the regex pattern extracts the relevant sub-labels from the column names; .value determines which of the sub-labels remain as headers.
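
To see what the two capture groups pick out, the pattern can be tried on the column names with plain re. This is only a sketch of the matching step, not of pyjanitor's internals: the first group is the part that .value keeps as a header, and the second becomes the ID. The name `parts` is introduced here for illustration.

```python
import re

# Two groups: the label before the digit, then the single digit itself.
pattern = re.compile(r"(.+)(\d).+")

names = ["TP1(USD)", "VReal2(USD)", "TiV3 (EUR)"]
parts = [pattern.search(name).groups() for name in names]

for name, (header, id_) in zip(names, parts):
    print(f"{name!r} -> header {header!r}, ID {id_!r}")
```

Note that `(\d)` captures a single digit, so this pattern assumes fewer than ten columns per group.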

Another solution that might be useful is to pass a list of regular expressions to the names_pattern parameter:

df.pivot_longer(index = None, 
                names_to = ('Price', 'Net', 'Range'), 
                names_pattern = ('TP.*', 'VR.*', 'Ti.*'), 
                ignore_index = False)

   Price   Net  Range
0      0   300     30
0    700  1159     30
0   2100  2877     47

CodePudding user response：

IIUC, you can use:

df = pd.DataFrame({'TP1':[0], 'TP2':[700], 'TP3':[2100], 'VR1':[300], 'VR2':[1159], 'VR3':[2877], 'TV1':[30], 'TV2':[30], 'TV3':[47]})

pd.wide_to_long(df.reset_index(), ["TP", "VR", "TV"], i="index", j="Nr").droplevel('index').rename(columns={'TP': 'Price', 'VR': 'Net', 'TV': 'Range'})

Result:

    Price   Net  Range
Nr                    
1       0   300     30
2     700  1159     30
3    2100  2877     47