I am using Jupyter to analyze a large CSV file.
Inside the file there are around 40,000 str values and 15 float values. I am trying to convert all the str values to numeric so I can analyze all the data.
However, I cannot because of the float values scattered randomly through the data. Is there a simple way to remove all these values?
I am relatively new to coding, so please bear with me if this seems like a "dumb" question.
import pandas as pd
df = pd.read_csv('stripperdata.csv')
for i in df['Pressure']:
    if isinstance(i, str):
        int(i)
    if isinstance(i, float):
        df.remove(i)
When I do this, I get the error "invalid literal for int() with base 10:".
CodePudding user response:
Assuming you have the following dataframe:
df = pd.DataFrame({'val': ['1', 2.0, '3', 4, '5', '6.6', '7', '8.8']})
   val
0    1
1  2.0  <=== float
2    3
3    4  <=== int
4    5
5  6.6
6    7
7  8.8
where 2.0 is a float and 4 is an int; the other values are strings of numbers.
You can drop the float and int values by, for example:
s_cleaned = df['val'].loc[~df['val'].map(lambda x: isinstance(x, (float, int)))]
Result:
print(s_cleaned)
0      1
2      3
4      5
5    6.6
6      7
7    8.8
Name: val, dtype: object
You can also "remove" these float and int values by changing them to NaN
(null values), as follows:
df['val'] = df['val'].mask(df['val'].map(lambda x: isinstance(x, (float, int))))
Result:
print(df)
   val
0    1
1  NaN
2    3
3  NaN
4    5
5  6.6
6    7
7  8.8
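Once the unwanted values are NaN, the remaining strings still need to be converted before you can analyze them. Here is a minimal sketch, assuming the same toy `val` column as above (the names `mask`, `numeric` and `cleaned` are only illustrative). It uses `pd.to_numeric` rather than `int()`, because `int('6.6')` raises the "invalid literal for int() with base 10" error from the question:

```python
import pandas as pd

df = pd.DataFrame({'val': ['1', 2.0, '3', 4, '5', '6.6', '7', '8.8']})

# True where the value is already a float or int rather than a string
mask = df['val'].map(lambda x: isinstance(x, (float, int)))

# Replace the non-string values with NaN, then parse the remaining strings.
# errors='coerce' turns anything that cannot be parsed into NaN instead of raising.
numeric = pd.to_numeric(df['val'].mask(mask), errors='coerce')

# Drop the NaN rows so only the parsed numbers remain
cleaned = numeric.dropna()
print(cleaned)
```

The result is a float64 Series, so decimal strings such as '6.6' survive the conversion. Whether dropping the original float rows is the right choice depends on your data, so treat this as a starting point.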
CodePudding user response:
Edit: I made a mistake in my code the first time. I was removing elements from the list during iteration, which caused the loop to skip over one of them. I admit this is a messy solution; I'm still learning myself.
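values = ["11", "15", "74", "2.3", "11.7", "34"]
index = 0
# A while loop re-checks the list length each pass, so removals are safe
while index < len(values):
    print(values[index])
    if "." in values[index]:
        # Decimal strings such as "2.3" cannot be parsed by int(), so drop them
        print("Here's one: " + values[index])
        values.remove(values[index])
        # Do not advance index: the next element has shifted into this position
    elif isinstance(values[index], str):
        int(values[index])  # raises ValueError if the string is not a whole number
        index += 1
        print(index)
print(values)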
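A possible alternative that avoids changing the list while looping over it is to build a new list instead. This is only a sketch under the same assumptions as the example above (a plain list of numeric strings); `is_int_string` is a hypothetical helper name, not something from the question:

```python
values = ["11", "15", "74", "2.3", "11.7", "34"]


def is_int_string(text):
    """Return True if int() can parse text (hypothetical helper)."""
    try:
        int(text)
        return True
    except ValueError:
        # int("2.3") raises ValueError: invalid literal for int() with base 10
        return False


# Keep only the strings int() accepts, converting them as we go
whole_numbers = [int(v) for v in values if is_int_string(v)]
print(whole_numbers)  # [11, 15, 74, 34]
```

Because the original list is never modified, there is no index to keep in sync and no element can be skipped.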