how to replace the comma in numbers in dataframe by dot?

Time:11-19

I have this dataframe in which I want to replace every decimal comma with a dot; for example, 50,5 and 81,5 should become 50.5 and 81.5.

   Unnamed: 0    NB Ppt Resale  5 yrs 10 yrs 15 yrs 20 yrs
1        VLCC   120         114    87     64   50,5     37
3     SUEZMAX  81,5          80    62     45     36     24
5        LR 2    69          72    57     42     32     20
7     AFRAMAX    66          68    55   40,5   30,5     19
9        LR 1    58          58    40     28     21   13,5
11        MR2    44        44,5    38     29     21     13

Since every column has dtype object, I tried:

df_useful[['NB', 'Ppt Resale ', '5 yrs', '10 yrs', '15 yrs',
       '20 yrs']] = df_useful[['NB', 'Ppt Resale ', '5 yrs', '10 yrs', '15 yrs',
              '20 yrs']].apply(pd.to_numeric, errors='coerce')

but then every number containing a comma became NaN.
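A minimal sketch of why this happens, using a hypothetical three-value Series in place of the real columns: pd.to_numeric only recognises "." as the decimal separator, so a string like "50,5" cannot be parsed and errors='coerce' replaces it with NaN. Swapping the comma for a dot before converting avoids this.

```python
import pandas as pd

# Hypothetical sample: numbers stored as strings, as in an object-dtype column
s = pd.Series(["120", "50,5", "81,5"])

# "50,5" and "81,5" are not valid numbers for to_numeric, so they are coerced to NaN
coerced = pd.to_numeric(s, errors="coerce")
print(coerced.tolist())

# Replacing the comma with a dot first lets every value parse
fixed = pd.to_numeric(s.str.replace(",", ".", regex=False), errors="coerce")
print(fixed.tolist())  # [120.0, 50.5, 81.5]
```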

Answer:

A simple way:

out = df.replace(',', '.', regex=True)

Output:

   Unnamed: 0    NB Ppt Resale  5 yrs 10 yrs 15 yrs 20 yrs
1        VLCC   120        114     87     64   50.5     37
3     SUEZMAX  81.5         80     62     45     36     24
5        LR 2    69         72     57     42     32     20
7     AFRAMAX    66         68     55   40.5   30.5     19
9        LR 1    58         58     40     28     21   13.5
11        MR2    44       44.5     38     29     21     13
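Note that df.replace only edits the text of each cell, so the values are still strings afterwards. A small sketch on a hypothetical two-row sample (column names borrowed from the question) that follows the replacement with an explicit astype(float) on the value columns:

```python
import pandas as pd

# Hypothetical two-row sample; every value is stored as a string
df = pd.DataFrame({
    "Unnamed: 0": ["VLCC", "SUEZMAX"],
    "NB": ["120", "81,5"],
    "15 yrs": ["50,5", "36"],
})

# Swap every comma for a dot; the cells are still strings afterwards
out = df.replace(",", ".", regex=True)
print(out["NB"].tolist())  # ['120', '81.5']

# Convert the value columns explicitly once the commas are gone
value_cols = ["NB", "15 yrs"]
out[value_cols] = out[value_cols].astype(float)
print(out.dtypes)
```

The label column is left alone because it contains no commas and is not included in value_cols.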

If you also want the columns converted to numeric dtypes (the replacement above still leaves strings), you can use:

df2 = (df
  .drop(columns='Unnamed: 0')
  .select_dtypes(exclude='number')
  .apply(lambda s: pd.to_numeric(s.str.replace(',', '.'),
                                  errors='coerce')) 
)
df[list(df2)] = df2

Output:

   Unnamed: 0     NB Ppt Resale  5 yrs 10 yrs 15 yrs 20 yrs
1        VLCC  120.0      114.0     87   64.0   50.5   37.0
3     SUEZMAX   81.5       80.0     62   45.0   36.0   24.0
5        LR 2   69.0       72.0     57   42.0   32.0   20.0
7     AFRAMAX   66.0       68.0     55   40.5   30.5   19.0
9        LR 1   58.0       58.0     40   28.0   21.0   13.5
11        MR2   44.0       44.5     38   29.0   21.0   13.0

dtypes:

print(df.dtypes)

Unnamed: 0     object
NB            float64
Ppt Resale    float64
5 yrs           int64
10 yrs        float64
15 yrs        float64
20 yrs        float64
dtype: object
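A self-contained sketch of the same approach on hypothetical sample data, where "5 yrs" is assumed to already be int64 and the other value columns are strings. It illustrates two details behind the dtypes above: select_dtypes(exclude='number') passes over columns that are already numeric, and pd.to_numeric returns an integer dtype for a column whose values contain no decimals.

```python
import pandas as pd

# Hypothetical sample: "5 yrs" is already numeric, the other value columns are strings
df = pd.DataFrame({
    "Unnamed: 0": ["VLCC", "SUEZMAX", "MR2"],
    "NB": ["120", "81,5", "44"],
    "5 yrs": [87, 62, 38],
    "20 yrs": ["37", "24", "13"],
})

df2 = (df
  .drop(columns="Unnamed: 0")        # keep the label column as text
  .select_dtypes(exclude="number")   # "5 yrs" is skipped because it is already numeric
  .apply(lambda s: pd.to_numeric(s.str.replace(",", ".", regex=False),
                                 errors="coerce"))
)
df[list(df2)] = df2

# NB becomes float64 (it has a decimal value); "20 yrs" has none, so it becomes int64
print(df.dtypes)
```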

Answer:

Another possible solution: convert the dataframe to CSV text and read that string back with pd.read_csv, passing decimal=',' so that decimal commas are parsed as decimal points.

import pandas as pd
from io import StringIO

pd.read_csv(StringIO(df.to_csv()), decimal=',', index_col=0)

Output:

    Unnamed: 0     NB  Ppt Resale   5 yrs  10 yrs  15 yrs  20 yrs
1         VLCC  120.0       114.0      87    64.0    50.5    37.0
3      SUEZMAX   81.5        80.0      62    45.0    36.0    24.0
5         LR 2   69.0        72.0      57    42.0    32.0    20.0
7      AFRAMAX   66.0        68.0      55    40.5    30.5    19.0
9         LR 1   58.0        58.0      40    28.0    21.0    13.5
11         MR2   44.0        44.5      38    29.0    21.0    13.0
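A self-contained sketch of the same round trip on hypothetical sample data. Unlike the answer, it writes the CSV with sep=';' (a deliberate deviation) so the field delimiter can never be confused with the decimal comma, and it uses a hypothetical "Vessel" label column.

```python
from io import StringIO

import pandas as pd

# Hypothetical sample with decimal commas stored as strings
df = pd.DataFrame(
    {"Vessel": ["VLCC", "SUEZMAX"], "NB": ["120", "81,5"], "15 yrs": ["50,5", "36"]},
    index=[1, 3],
)

# Use ';' as the field separator so the decimal commas are written unquoted,
# then let read_csv interpret ',' as the decimal mark
csv_text = df.to_csv(sep=";")
out = pd.read_csv(StringIO(csv_text), sep=";", decimal=",", index_col=0)
print(out)
```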