I have a dataframe "counts" and I would like to rename its second column using a regular expression, because I have multiple files whose column names carry this extra path information. The dataframe looks like this:
| GeneID | /home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR1212121_mapped.bamAligned.sortedByCoord.out.bam |
| -------- | -------------- |
| Ciclev10010164m.g.v1.0 | 2 |
| Ciclev10007306m.g.v1.0 | 647 |
| Ciclev10009318m.g.v1.0 | 39 |
| Ciclev... | ... |
| Ciclev10007306m.g.v1.0 | 112 |
I tried with the following code with no success:
for col in counts1:
    counts1.rename(columns={col:col.upper().replace("/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR1212121_mapped.bamAligned.sortedByCoord.out.bam","SRR[\d]{6}")},inplace=True)
How can I obtain a df with the following format?
| GeneID | SRR1212121 |
| -------- | -------------- |
| Ciclev10010164m.g.v1.0 | 2 |
| Ciclev10007306m.g.v1.0 | 647 |
| Ciclev10009318m.g.v1.0 | 39 |
| Ciclev... | ... |
| Ciclev10007306m.g.v1.0 | 112 |
CodePudding user response:
You could try:
df.columns = df.columns.str.extract(r'((?<=/)SRR\d+|^[^/]+$)', expand=False)
regex:
(?<=/)SRR\d+   # match "SRR" followed by digits if preceded by "/"
^[^/]+$        # else match the full string if it doesn't contain "/"
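
To check the pattern outside pandas, here is a minimal sketch using only the standard `re` module. The helper name `shorten` and the `columns` list are made up for illustration; the two column names are the ones from the question. Unlike `str.extract`, which gives NaN for a name that matches neither alternative, this sketch keeps such a name unchanged.

```python
import re

# Same pattern as in the answer: an SRR accession preceded by "/",
# or a whole name that contains no "/" at all.
PATTERN = re.compile(r'(?<=/)SRR\d+|^[^/]+$')

def shorten(name):
    """Return the matched part of a column name (hypothetical helper).

    Names that match neither alternative are returned unchanged.
    """
    match = PATTERN.search(name)
    return match.group(0) if match else name

# Column names taken from the question.
columns = [
    "GeneID",
    "/home/rmachado/Biotec/ARJNA231684/mapa_fin_starterar/SRR1212121_mapped.bamAligned.sortedByCoord.out.bam",
]

renamed = [shorten(c) for c in columns]
print(renamed)  # ['GeneID', 'SRR1212121']
```

Since `DataFrame.rename` accepts a function for `columns`, the same helper could also be applied as `counts = counts.rename(columns=shorten)`, assuming `counts` is the dataframe from the question.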