Home > OS >  Having issues trying to make my dataframe numeric
Having issues trying to make my dataframe numeric

Time:03-27

So I have a sqlite local database, I read it into my program as a pandas dataframe using

""" Seperating hitters and pitchers """
pitchers = pd.read_sql_query("SELECT * FROM ALL_NORTHWOODS_DATA WHERE BF_y >= 20 AND BF_x >= 20", northwoods_db)
hitters = pd.read_sql_query("SELECT * FROM ALL_NORTHWOODS_DATA WHERE PA_y >= 25 AND PA_x >= 25", northwoods_db)

But when I do this, some of the numbers are not numeric. Here is a head of one of the dataframes:

index   Year    Age_x   AgeDif_x    Tm_x    Lg_x    Lev_x   Aff_x   G_x PA_x    ... ER_y    BK_y    WP_y    BF_y    WHIP_y  H9_y    HR9_y   BB9_y   SO9_y   SO/W_y
0   84  2020    21  -0.3    Hillsdale   GMAC    NCAA    None    5   None    ... 4.0 None    3.0 71.0    1.132   5.6 0.0 4.6 8.7 1.89
1   264 2018    --  None    Duke    ACC NCAA    None    15  None    ... 13  0   1   88  2.111   10.0    0.5 9.0 8.0 0.89
2   298 2019    21  0.1 Wisconsin-Milwaukee Horz    NCAA    None    8   None    ... 1.0 0.0 2.0 21.0    2.25    9.0 0.0 11.3    11.3    1.0
3   357 2017    22  1.0 Nova Southeastern   SSC NCAA    None    15.0    None    ... 20.0    0.0 3.0 206.0   1.489   9.7 0.4 3.7 8.5 2.32
4   418 2021    21  -0.4    Creighton   BigE    NCAA    None    4   None    ... 26.0    1.0 6.0 226.0   1.625   8.6 0.9 6.0 7.5 1.25

When I try to make the dataframe numeric, I used this line of code:

hitters = hitters.apply(pd.to_numeric, errors='coerce')
pitchers = pitchers.apply(pd.to_numeric, errors='coerce')

But when I did that, the new head of the dataframes is full of NaN's, it seems like it got rid of all of the string values but I want to keep those.

    index   Year    Age_x   AgeDif_x    Tm_x    Lg_x    Lev_x   Aff_x   G_x PA_x    ... ER_y    BK_y    WP_y    BF_y    WHIP_y  H9_y    HR9_y   BB9_y   SO9_y   SO/W_y
0   84  2020    21.0    -0.3    NaN NaN NaN NaN 5.0 NaN ... 4.0 NaN 3.0 71.0    1.132   5.6 0.0 4.6 8.7 1.89
1   264 2018    NaN NaN NaN NaN NaN NaN 15.0    NaN ... 13.0    0.0 1.0 88.0    2.111   10.0    0.5 9.0 8.0 0.89
2   298 2019    21.0    0.1 NaN NaN NaN NaN 8.0 NaN ... 1.0 0.0 2.0 21.0    2.250   9.0 0.0 11.3    11.3    1.00
3   357 2017    22.0    1.0 NaN NaN NaN NaN 15.0    NaN ... 20.0    0.0 3.0 206.0   1.489   9.7 0.4 3.7 8.5 2.32
4   418 2021    21.0    -0.4    NaN NaN NaN NaN 4.0 NaN ... 26.0    1.0 6.0 226.0   1.625   8.6 0.9 6.0 7.5 1.25

Is there a better way to makethe number values numeric and keep all my string columns? Maybe there is an sqlite function that can do it better? I am not sure, any help is appriciated.

CodePudding user response:

Maybe you can use combine_first:

hitters_new = hitters.apply(pd.to_numeric, errors='coerce').combine_first(hitters)
pitchers_new = pitchers.apply(pd.to_numeric, errors='coerce').combine_first(pitchers)

CodePudding user response:

You can try using astype or convert_dtypes. They both take an argument which is the columns you want to convert, if you already know which columns are numeric and which ones are strings that can work. Otherwise, take a look at this thread to do this automatically.

  • Related