Pandas type conversion not working in my case

Time:10-21

I have a Series like the one below, and I'm trying to convert it with pd.to_numeric:

import pandas as pd
import numpy as np

series = pd.Series([0,1,2,2,3,4,7,8.2,"stackoverflow",7,9.9])
df = pd.to_numeric(series, errors="coerce")
print(df)

The output I got is as expected:

0     0.0
1     1.0
2     2.0
3     2.0
4     3.0
5     4.0
6     7.0
7     8.2
8     NaN
9     7.0
10    9.9
dtype: float64

When one of the elements of the Series is a NumPy array (np.array(4)), the same call gives a different output:

import pandas as pd
import numpy as np

series = pd.Series([0,1,2,2,3,np.array(4),7,8.2,"stackoverflow",7,9.9])
df = pd.to_numeric(series, errors="coerce")    
print(df)

The output in case 2 is not what I expected: the string is not converted to NaN as in the example above, and the dtype is not converted to float either:

0                 0
1                 1
2                 2
3                 2
4                 3
5                 4
6                 7
7               8.2
8     stackoverflow
9                 7
10              9.9
dtype: object

Answer:

Please note that both of your df variables are of type <class 'pandas.core.series.Series'>, not DataFrames.

After some experimenting, I came up with a fix for your case: apply pd.to_numeric to each element separately.

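import pandas as pd
import numpy as np


def convert_list_to_series(data_list):
    series = pd.Series(data_list)
    # Convert element by element; values that cannot be parsed become NaN
    return series.apply(pd.to_numeric, errors="coerce")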


data_list = [0, 1, 2, 2, 3, 4, 7, 8.2, "stackoverflow", 7, 9.9]
print(convert_list_to_series(data_list))

Output:

0     0.0
1     1.0
2     2.0
3     2.0
4     3.0
5     4.0
6     7.0
7     8.2
8     NaN
9     7.0
10    9.9
dtype: float64

data_list = [0, 1, 2, 2, 3, np.array(4), 7, 8.2, "stackoverflow", 7, 9.9]
print(convert_list_to_series(data_list))

Output:

0     0.0
1     1.0
2     2.0
3     2.0
4     3.0
5     4.0
6     7.0
7     8.2
8     NaN
9     7.0
10    9.9
dtype: float64

Answer:

This is strange behaviour. My guess is that it happens because not all values are scalars: np.array(4) is a 0-d NumPy array rather than a plain number.
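One way to test this guess, as a sketch that is not part of the original answer, is to list the elements that pandas does not treat as scalars, using pandas.api.types.is_scalar. The helper name find_non_scalars is hypothetical:

```python
import numpy as np
import pandas as pd


def find_non_scalars(series: pd.Series) -> list:
    """Hypothetical debugging helper: index labels of non-scalar elements."""
    # True where pandas does not consider the element a scalar (lists, arrays, ...)
    mask = series.map(lambda x: not pd.api.types.is_scalar(x))
    return mask[mask].index.tolist()


series = pd.Series([0, 1, 2, 2, 3, np.array(4), 7, 8.2, "stackoverflow", 7, 9.9])
print(find_non_scalars(series))
```

If np.array(4) is still stored as a 0-d array inside the Series, it should show up here at index 5.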

As a workaround, build a NumPy array from the column's values and call pd.to_numeric on that:

import pandas as pd
import numpy as np

series = pd.Series([0, 1, 2, 2, 3, np.array(4), 7, 8.2, "stackoverflow", 7, 9.9])

df = pd.DataFrame(series)

# Rebuild column 0 as a plain NumPy array, then convert it to numbers
serieslist = np.array(df[0].tolist())
df["new"] = pd.to_numeric(serieslist, errors="coerce")
print(df)

Output:

                0  new
0               0  0.0
1               1  1.0
2               2  2.0
3               2  2.0
4               3  3.0
5               4  4.0
6               7  7.0
7             8.2  8.2
8   stackoverflow  NaN
9               7  7.0
10            9.9  9.9
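Another option, sketched here under the assumption that the only problem elements are 0-d NumPy arrays like np.array(4), is to unwrap them with ndarray.item() before calling pd.to_numeric. The helper name to_numeric_unwrapping_arrays is hypothetical:

```python
import numpy as np
import pandas as pd


def to_numeric_unwrapping_arrays(series: pd.Series) -> pd.Series:
    """Hypothetical helper: unwrap 0-d NumPy arrays, then coerce to numbers."""
    # np.array(4).item() returns the plain Python scalar 4; other values pass through
    unwrapped = series.map(
        lambda x: x.item() if isinstance(x, np.ndarray) and x.ndim == 0 else x
    )
    # Values that cannot be parsed, such as "stackoverflow", become NaN
    return pd.to_numeric(unwrapped, errors="coerce")


series = pd.Series([0, 1, 2, 2, 3, np.array(4), 7, 8.2, "stackoverflow", 7, 9.9])
print(to_numeric_unwrapping_arrays(series))
```

This only touches 0-d arrays and leaves every other element as it was; arrays with one or more dimensions are not unwrapped and would still need separate handling.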