The header of my data frame looks like this:
header = list(data_no_control.columns.values)
header
['MLID_D_08_NGS_34_H08.fsa',
'MLID_D_25_NGS_38_A11.fsa',
'MLID_D_36_NGS_41_D12.fsa',
'MLID_D_37_NGS_42_E12.fsa']
I want to change my header to look like this:
['NGS_34',
'NGS_38',
'NGS_41',
'NGS_42']
How can I do this?
Answer:
header = ['MLID_D_08_NGS_34_H08.fsa',
          'MLID_D_25_NGS_38_A11.fsa',
          'MLID_D_36_NGS_41_D12.fsa',
          'MLID_D_37_NGS_42_E12.fsa']
new_header = []
for item in header:
    # split on '_' and keep the 4th and 5th fields, e.g. 'NGS' and '34'
    parts = item.split('_')
    new_header.append(parts[3] + '_' + parts[4])
print(new_header)
# output: ['NGS_34', 'NGS_38', 'NGS_41', 'NGS_42']
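The loop only builds a new list; to actually rename the DataFrame's header, the result has to be assigned back to its columns. A minimal sketch, assuming data_no_control is the frame from the question (recreated here with a hypothetical row of dummy data so the snippet runs on its own):

```python
import pandas as pd

# Hypothetical stand-in for the question's frame: same column names, dummy data.
data_no_control = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=['MLID_D_08_NGS_34_H08.fsa',
             'MLID_D_25_NGS_38_A11.fsa',
             'MLID_D_36_NGS_41_D12.fsa',
             'MLID_D_37_NGS_42_E12.fsa'],
)

# Split each name on '_' and join the 4th and 5th fields (indices 3 and 4).
data_no_control.columns = [
    '_'.join(name.split('_')[3:5]) for name in data_no_control.columns
]

print(list(data_no_control.columns))
# ['NGS_34', 'NGS_38', 'NGS_41', 'NGS_42']
```

This relies on every name having the same field layout; a name with fewer underscores would silently produce a shorter label.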
Answer:
Using str.extract:
df["col"] = df["col"].str.extract(r'_([^_]+_[^_]+)_[^_]+\.\w+$')
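The pattern skips the last field and the file extension (_H08.fsa) and captures the two underscore-separated fields just before them, e.g. NGS_34 from MLID_D_08_NGS_34_H08.fsa.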
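Since the question is about the header rather than a column's values, the same pattern can be applied to the column Index directly. A minimal sketch, assuming data_no_control is the question's frame (recreated with a hypothetical row of dummy data); with expand=False and a single capture group, str.extract on an Index returns an Index, which can be assigned straight back:

```python
import pandas as pd

# Hypothetical stand-in for the question's frame: same column names, dummy data.
data_no_control = pd.DataFrame(
    [[1, 2, 3, 4]],
    columns=['MLID_D_08_NGS_34_H08.fsa',
             'MLID_D_25_NGS_38_A11.fsa',
             'MLID_D_36_NGS_41_D12.fsa',
             'MLID_D_37_NGS_42_E12.fsa'],
)

# One capture group + expand=False -> an Index of the captured strings.
data_no_control.columns = data_no_control.columns.str.extract(
    r'_([^_]+_[^_]+)_[^_]+\.\w+$', expand=False
)

print(list(data_no_control.columns))
# ['NGS_34', 'NGS_38', 'NGS_41', 'NGS_42']
```

Names that do not match the pattern come back as NaN rather than raising, so it is worth checking the result if the header may contain other formats.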