How to get rid of decimal points in a dataframe column that is NOT numeric

I just want to remove the part of each value that looks like ".xxxxx", i.e. the dot and everything after it.

Gene_ID
ENSG00000000003.14
ENSG00000000005.5
ERCC-00164
ENSG00000002586.18_PAR_Y
ENSG00000054803.3
ERCC-00012
ENSG00000284332.1

This is how I want it to look:

Gene_ID
ENSG00000000003
ENSG00000000005
ERCC-00164
ENSG00000002586
ENSG00000054803
ERCC-00012
ENSG00000284332

This is what I have tried:

df['Gene_ID'].str.replace('.', '')

but that only removes the decimal point itself, not the characters that come after it.

Note: the actual column is much longer than what I am showing here, and many of its values have that ".xxxx" suffix.

CodePudding user response：

Use Series.str.replace with the regex \..*$, which matches a literal dot followed by everything up to the end of the string ($ anchors the match at the end of the string):

df['Gene_ID'] = df['Gene_ID'].str.replace(r'\..*$', '', regex=True)
print(df)
           Gene_ID
0  ENSG00000000003
1  ENSG00000000005
2       ERCC-00164
3  ENSG00000002586
4  ENSG00000054803
5       ERCC-00012
6  ENSG00000284332
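
For anyone who wants to try this end to end, here is a minimal sketch that rebuilds the sample column from the question and applies the regex above. The names by_regex and by_split are just illustrative, and the str.split variant is an alternative not mentioned in the thread; it assumes you always want to cut at the first dot.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "Gene_ID": [
        "ENSG00000000003.14",
        "ENSG00000000005.5",
        "ERCC-00164",
        "ENSG00000002586.18_PAR_Y",
        "ENSG00000054803.3",
        "ERCC-00012",
        "ENSG00000284332.1",
    ]
})

# Regex approach: remove the first literal dot and everything after it.
by_regex = df["Gene_ID"].str.replace(r"\..*$", "", regex=True)

# Non-regex alternative: split once on the first dot and keep the left part.
# Values without a dot (the ERCC-... IDs) are returned unchanged.
by_split = df["Gene_ID"].str.split(".", n=1).str[0]

print(by_regex.tolist())
print(by_split.equals(by_regex))
```

Both approaches should give the same column here; the split version may be easier to read if you are not comfortable with regular expressions.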

CodePudding user response：

Note that . is a regex metacharacter that matches any character except a line break. To match a literal . you need to escape it with a backslash or put it in a character class, i.e. inside square brackets:

df['Gene_ID'] = df['Gene_ID'].str.replace(r'[.].*', '', regex=True)

df
           Gene_ID
0  ENSG00000000003
1  ENSG00000000005
2       ERCC-00164
3  ENSG00000002586
4  ENSG00000054803
5       ERCC-00012
6  ENSG00000284332
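
To see the difference between a literal dot and the regex metacharacter outside of pandas, here is a small sketch using only Python's built-in re module on one ID from the question. The variable names are illustrative; the patterns are the ones discussed above.

```python
import re

gene_id = "ENSG00000000003.14"

# Plain string replacement: "." is literal, so only the dot itself is removed.
literal = gene_id.replace(".", "")

# Unescaped regex: "." matches any character, so ".*" swallows the whole ID.
unescaped = re.sub(r".*", "", gene_id)

# Escaped dot and character class both match a literal "." and the rest.
escaped = re.sub(r"\..*", "", gene_id)
char_class = re.sub(r"[.].*", "", gene_id)

print(repr(literal))     # 'ENSG0000000000314'
print(repr(unescaped))   # ''
print(repr(escaped))     # 'ENSG00000000003'
print(repr(char_class))  # 'ENSG00000000003'
```

The first line reproduces what the question describes: a literal replacement drops the dot but keeps the version digits.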