Python: strip pair-wise column names-CodePudding

I have a DataFrame with columns that look like this:

df=pd.DataFrame(columns=['(NYSE_close, close)','(NYSE_close, open)','(NYSE_close, volume)', '(NASDAQ_close, close)','(NASDAQ_close, open)','(NASDAQ_close, volume)'])

df:
(NYSE_close, close) (NYSE_close, open) (NYSE_close, volume) (NASDAQ_close, close) (NASDAQ_close, open) (NASDAQ_close, volume)

I want to remove everything after the underscore and append whatever comes after the comma to get the following:

df:
NYSE_close  NYSE_open  NYSE_volume  NASDAQ_close  NASDAQ_open  NASDAQ_volume

I tried to strip the column name but it replaced it with nan. Any suggestions on how to do that?

Thank you in advance.

CodePudding user response：

You could use re.sub to extract the appropriate parts of the column names to replace them with:

import re

df=pd.DataFrame(columns=['(NYSE_close, close)','(NYSE_close, open)','(NYSE_close, volume)', '(NASDAQ_close, close)','(NASDAQ_close, open)','(NASDAQ_close, volume)'])
df.columns = [re.sub(r'\(([^_] _)\w , (\w )\)', r'\1\2', c) for c in df.columns]

Output:

Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []

CodePudding user response：

You could:

import re

def cvt_col(x):
    s = re.sub('[()_,]', ' ', x).split()
    return s[0]   '_'   s[2] 

df.rename(columns = cvt_col)

Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []

CodePudding user response：

Use a list comprehension, twice:

step1 = [ent.strip('()').split(',') for ent  in df]

df.columns = ["_".join([left.split('_')[0], right.strip()]) 
              for left, right  in step1]

df

Empty DataFrame
Columns: [NYSE_close, NYSE_open, NYSE_volume, NASDAQ_close, NASDAQ_open, NASDAQ_volume]
Index: []