Pandas: Easy way to remove float values from string values in large data

I am using Jupyter to analyze a large CSV file.

Inside the file there are around 40,000 str values and 15 float values. I am trying to convert all the str values to numeric so I can analyze all the data.

However, I cannot, because the float values are scattered randomly through the data. Is there a simple way to remove all of these values?

I am relatively new to coding, so please bear with me if this seems like a "dumb" question.

import pandas as pd

df = pd.read_csv('stripperdata.csv')

for i in df['Pressure']:
    if isinstance(i , str):
        int(i)
    if isinstance(i , float):
        df.remove(i)

When I do this, I get an error: "invalid literal for int() with base 10:"

CodePudding user response：

Assuming you have the following dataframe:

import pandas as pd

df = pd.DataFrame({'val': ['1', 2.0, '3', 4, '5', '6.6', '7', '8.8']})

   val
0    1
1  2.0          <=== float
2    3
3    4          <=== int
4    5
5  6.6
6    7
7  8.8

Here 2.0 is a float and 4 is an int; all the other values are strings containing numbers.

You can drop the float and int values like this, for example:

s_cleaned = df['val'].loc[~df['val'].map(lambda x: isinstance(x, (float, int)))]

Result:

print(s_cleaned)


0      1
2      3
4      5
5    6.6
6      7
7    8.8
Name: val, dtype: object
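
Since the end goal in the question is numeric data, the remaining strings still need to be converted. A minimal sketch of that extra step, reusing the example dataframe above (the names `is_number` and `s_numeric` are only illustrative):

```python
import pandas as pd

df = pd.DataFrame({'val': ['1', 2.0, '3', 4, '5', '6.6', '7', '8.8']})

# True where the entry is already a Python number rather than a string
is_number = df['val'].map(lambda x: isinstance(x, (int, float)))

# Keep only the string entries, then parse them into numbers
s_numeric = pd.to_numeric(df['val'][~is_number])

print(s_numeric.dtype)  # float64
```

Note that `int('6.6')` raises the "invalid literal for int()" error from the question, because `int()` does not accept strings with a decimal point. `pd.to_numeric` parses such strings as floats instead.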

You can also "remove" these float and int values by changing them to NaN (null values), as follows:

df['val'] = df['val'].mask(df['val'].map(lambda x: isinstance(x, (float, int))))

Result:

print(df)

   val
0    1
1  NaN
2    3
3  NaN
4    5
5  6.6
6    7
7  8.8
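
For the CSV in the question, a shorter route may be to let pandas do the conversion and turn anything it cannot parse into NaN. This is only a sketch under assumptions: the small dataframe below is a made-up stand-in for the 'Pressure' column, and it assumes the stray float values are NaN placeholders for empty cells, which is common when `read_csv` loads a column that is otherwise text but may not match the actual file.

```python
import pandas as pd

# Hypothetical stand-in for the 'Pressure' column from the question
df = pd.DataFrame({'Pressure': ['101.3', float('nan'), '99', 'n/a', '100.5']})

# Parse every entry as a number; unparseable entries become NaN
df['Pressure'] = pd.to_numeric(df['Pressure'], errors='coerce')

# Drop the rows where no number could be recovered
df = df.dropna(subset=['Pressure'])

print(df)
```

With this sample data only rows 0, 2 and 4 remain, and the column ends up with dtype float64, so it can be analyzed directly.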

CodePudding user response：

Edit: I made a mistake in my code the first time. I was removing elements during iteration, which caused the loop to skip over one of them. I admit this is a messy solution; I'm still learning myself.

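values = ["11", "15", "74", "2.3", "11.7", "34"]

index = 0

# A while loop re-checks the list length after every removal
while index < len(values):
    print(values[index])
    if "." in values[index]:
        print("Here's one: " + values[index])
        # Do not advance the index: the next element shifts into this slot
        values.remove(values[index])
    elif isinstance(values[index], str):
        int(values[index])  # raises ValueError if the string is not a whole number
        index += 1
    print(index)

print(values)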
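
A less fragile variant of the same idea is to build a new list rather than removing items from the one being looped over. This is a sketch using the same sample data; `whole_numbers` is just an illustrative name.

```python
values = ["11", "15", "74", "2.3", "11.7", "34"]

# Keep only the strings without a decimal point and convert them to int
whole_numbers = [int(v) for v in values if "." not in v]

print(whole_numbers)  # [11, 15, 74, 34]
```

Because the original list is never modified, no index bookkeeping is needed and no element can be skipped.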