I want to rename the indices of my pandas dataframe by retaining only the substring before the third hyphen. My code doesn't modify the indices. Why?
import re
for i in meth_450.index:
    re.sub(r"^[^-]*-[^-]*:[^-]*", "", i)
meth_450.index
Index(['TCGA-06-0125-01A-01D-A45W-05', 'TCGA-06-0125-02A-11D-2004-05',
'TCGA-06-0152-01A-02D-A45W-05', 'TCGA-06-0152-02A-01D-2004-05',
'TCGA-06-0171-01A-02D-A45W-05', 'TCGA-06-0171-02A-11D-2004-05',
'TCGA-06-0190-01A-01D-A45W-05', 'TCGA-06-0190-02A-01D-2004-05',
'TCGA-06-0210-01A-01D-A45W-05', 'TCGA-06-0210-02A-01D-2004-05'],
dtype='object', length=155)
Desired output:
TCGA-06-0125, TCGA-06-0125,
TCGA-06-0152, TCGA-06-0152,
TCGA-06-0171, TCGA-06-0171,
TCGA-06-0190, TCGA-06-0190,
TCGA-06-0210, TCGA-06-0210
Ultimately, I want to match this dataframe to another dataframe:
clin = clin[clin.index.isin(meth_450.index)]
CodePudding user response:
import pandas as pd
index = pd.Index(['TCGA-06-0125-01A-01D-A45W-05', 'TCGA-06-0125-02A-11D-2004-05',
'TCGA-06-0152-01A-02D-A45W-05', 'TCGA-06-0152-02A-01D-2004-05',
'TCGA-06-0171-01A-02D-A45W-05', 'TCGA-06-0171-02A-11D-2004-05',
'TCGA-06-0190-01A-01D-A45W-05', 'TCGA-06-0190-02A-01D-2004-05',
'TCGA-06-0210-01A-01D-A45W-05', 'TCGA-06-0210-02A-01D-2004-05']
)
# You can extract by character count if your index is always consistent
index.str[:12]
# If you want to use a regex, use .+? for a non-greedy match
index.str.extract("^(.+?-.+?-.+?)-")[0]
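If the labels are not guaranteed to be fixed-width, a plain split on the hyphen avoids both the character count and the regex. This is only a sketch, assuming every label has at least three hyphen-separated fields; the helper name `first_three_fields` is made up for illustration and the index is shortened to two labels.

```python
import pandas as pd

index = pd.Index(['TCGA-06-0125-01A-01D-A45W-05', 'TCGA-06-0152-02A-01D-2004-05'])

def first_three_fields(label):
    # Hypothetical helper: keep everything before the third hyphen
    return "-".join(label.split("-")[:3])

# Index.map applies the function to every label and returns a new Index
shortened = index.map(first_three_fields)
print(shortened.tolist())
```

As with the other approaches, `shortened` is a new object; the original `index` is unchanged until you assign the result back.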
CodePudding user response:
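Try this:
import re
# re.sub returns a new string, so build the new labels and assign them back;
# re.match keeps the part before the third hyphen instead of deleting it
meth_450.index = [re.match(r"^\w*-\w*-\w*", i).group(0) for i in meth_450.index]
You also have an error in your regex: it should be ^[^-]*-[^-]*-[^-]*
not ^[^-]*-[^-]*:[^-]*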
CodePudding user response:
Try re.sub(r"-\w{3}-\w{3}-\w{4}-\d\d", "", i)
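The same suffix pattern can probably be applied to the whole index at once with the vectorized `.str.replace`, which saves the loop. A rough sketch on two of the labels from the question; the trailing `$` anchor is my addition, and `regex=True` is passed explicitly because the default differs between pandas versions.

```python
import pandas as pd

index = pd.Index(['TCGA-06-0125-01A-01D-A45W-05', 'TCGA-06-0125-02A-11D-2004-05'])

# Strip the four trailing fields; the result is a new Index, so keep it
trimmed = index.str.replace(r"-\w{3}-\w{3}-\w{4}-\d\d$", "", regex=True)
print(trimmed.tolist())
```

Note that this pattern only matches labels whose trailing fields have exactly these widths; anything else is left unchanged rather than raising an error.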
CodePudding user response:
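Whichever method you use, don't forget to assign the result back to the index. re.sub returns a new string and leaves the original untouched, which is why your loop has no effect:
meth_450.index = meth_450.index.str.extract(r'^([^-]+-[^-]+-[^-]+)', expand=False)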
print(meth_450.index)
# Output
Index(['TCGA-06-0125', 'TCGA-06-0125', 'TCGA-06-0152', 'TCGA-06-0152',
'TCGA-06-0171', 'TCGA-06-0171', 'TCGA-06-0190', 'TCGA-06-0190',
'TCGA-06-0210', 'TCGA-06-0210'],
dtype='object')
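To tie this back to the final goal in the question, here is a small end-to-end sketch. The two dataframes are stand-ins: the column names `beta` and `age` and the ID `TCGA-06-9999` are invented for illustration, and only the index labels follow the question's format.

```python
import re
import pandas as pd

# Stand-in data; the real meth_450 and clin come from the question
meth_450 = pd.DataFrame(
    {"beta": [0.1, 0.2, 0.3]},
    index=['TCGA-06-0125-01A-01D-A45W-05',
           'TCGA-06-0125-02A-11D-2004-05',
           'TCGA-06-0152-01A-02D-A45W-05'],
)
clin = pd.DataFrame(
    {"age": [50, 60, 70]},
    index=['TCGA-06-0125', 'TCGA-06-0152', 'TCGA-06-9999'],
)

# re.sub returns a new string; discarding it leaves the label as it was
label = meth_450.index[0]
re.sub(r"-\w{3}-\w{3}-\w{4}-\d\d$", "", label)
assert label == 'TCGA-06-0125-01A-01D-A45W-05'

# Assigning the extracted labels back is what actually changes the index
meth_450.index = meth_450.index.str.extract(r'^([^-]+-[^-]+-[^-]+)', expand=False)

# Keep only the clinical rows whose ID appears in the shortened index
clin = clin[clin.index.isin(meth_450.index)]
print(clin.index.tolist())
```

The shortened index contains duplicates (two samples per patient). That is harmless for `isin`, but a later join or merge on the index would repeat the clinical row once per matching sample.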