I have this df and am trying to clean it. How can I convert irs_pop, latitude, longitude and fips to real floats and ints?
The code below returns float() argument must be a string or a real number, not 'set'
mask['latitude'] = mask['latitude'].astype('float64')
mask['longitude'] = mask['longitude'].astype('float64')
mask['irs_pop'] = mask['irs_pop'].astype('int64')
mask['fips'] = mask['fips'].astype('int64')
Code below returns sequence item 0: expected str instance, float found
mask['fips'] = mask['fips'].apply(lambda x: ','.join(x))
mask = mask.astype({'fips' : 'int64'})
returns int() argument must be a string, a bytes-like object or a real number, not 'set'
CodePudding user response:
You could do the following. Note that you need to convert every element in the set to a str, so just use map and str:
mask['fips'] = mask['fips'].apply(lambda x: ','.join(map(str, x)))
This stores your floats as a comma-delimited string, which has to be parsed back into whatever format you want when you read it back.
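As a rough sketch of that parsing step, assuming the column holds strings produced by the join above. The sample values and the helper name parse_floats are made up for illustration, and the set is sorted before joining only to get a stable order:

```python
import pandas as pd

# Hypothetical data: sets of floats, similar to the question's fips column.
mask = pd.DataFrame({"fips": [{36103.0}, {36059.0, 36081.0}]})

# Join each set into a comma-delimited string (sorted for a stable order).
mask["fips"] = mask["fips"].apply(lambda x: ",".join(map(str, sorted(x))))

def parse_floats(text):
    """Split a comma-delimited string back into a list of floats."""
    return [float(part) for part in text.split(",") if part]

# Parse the strings back into lists of floats when reading the data again.
parsed = mask["fips"].apply(parse_floats)
print(parsed.tolist())
```

An empty string parses to an empty list here, so rows that came from empty sets do not raise.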
CodePudding user response:
Try this:
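for col in ['irs_pop', 'latitude', 'longitude']:
    # str({40.81}) is '{40.81}': strip the braces, then parse as float ('40.81' cannot be parsed as int)
    mask[col] = mask[col].astype(str).str[1:-1].astype(float)
mask['irs_pop'] = mask['irs_pop'].astype('int64')  # assumes whole numbers and no missing values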
It looks like you have multiple FIPS codes in some rows of your fips column, so you won't be able to convert those to a single FIPS code. Most importantly, FIPS codes can have leading zeros, so they should be stored as strings rather than ints.
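A minimal sketch of keeping FIPS codes as strings, assuming they are 5-digit county codes that were stored as sets of floats. The sample values, the helper name fips_to_strings and the width parameter are illustrative, not taken from your data:

```python
import pandas as pd

# Hypothetical data: FIPS codes stored as sets of floats.
mask = pd.DataFrame({"fips": [{1001.0}, {36103.0, 36059.0}]})

def fips_to_strings(codes, width=5):
    """Return a sorted list of zero-padded FIPS strings from a set of numbers."""
    # int() drops the trailing '.0'; zfill restores any leading zeros.
    return sorted(str(int(code)).zfill(width) for code in codes)

mask["fips"] = mask["fips"].apply(fips_to_strings)
print(mask["fips"].tolist())
```

Each row ends up as a list of strings, so rows with several codes keep all of them. A NaN inside a set would make int() fail, so missing values would need to be handled first.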
CodePudding user response:
You would need to convert to tuple/list and then slice with str:
df['col'] = df['col'].agg(tuple).str[0]
Example:
import pandas as pd
df = pd.DataFrame({'col': [{1},{2,3},{}]})  # note: {} is an empty dict, not an empty set
df['col2'] = df['col'].agg(tuple).str[0]
Output:
      col  col2
0     {1}   1.0
1  {2, 3}   2.0  # this doesn't seem to be the case in your data
2      {}   NaN
If you want a string as output, with all values if multiple:
df['col'] = df['col'].astype(str).str[1:-1]
Output (as new column for clarity):
      col  col2
0     {1}     1
1  {2, 3}  2, 3
2      {}
CodePudding user response:
It looks like you have sets with a single value in these columns. The problem may be upstream where these values were filled in the first place. But you could clean it up by applying a function that pops a value from the set and converts it to a float.
import pandas as pd
# Sample data: every cell is a one-element set
mask = pd.DataFrame({"latitude": [{40.81}, {40.81}],
                     "longitude": [{-73.04}, {-73.04}]})
print(mask)
columns = ["latitude", "longitude"]
for col in columns:
    # pop() removes and returns the only element of each set
    mask[col] = mask[col].apply(lambda s: float(s.pop()))
print(mask)
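Alternatively, instead of the for loop (not after it, because by then the cells hold floats rather than sets), you could have pandas handle the iteration with a double apply:
mask[columns] = mask[columns].apply(
    lambda series: series.apply(lambda s: float(s.pop())))
print(mask)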
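Since pop() empties the original sets, a non-mutating variant may be safer. The sketch below first checks the single-value assumption and then reads each element with next(iter(s)). The irs_pop values are invented for illustration, and the final astype assumes whole-number populations with no missing values:

```python
import pandas as pd

# Hypothetical sample in the same shape as the question's data.
mask = pd.DataFrame({"latitude": [{40.81}, {40.81}],
                     "longitude": [{-73.04}, {-73.04}],
                     "irs_pop": [{1200.0}, {850.0}]})

columns = ["latitude", "longitude", "irs_pop"]

# Confirm every cell is a set with exactly one element before unpacking.
sizes = mask[columns].apply(lambda series: series.map(len))
assert (sizes == 1).all().all(), "some cells hold zero or several values"

# next(iter(s)) reads the element without mutating the original set.
for col in columns:
    mask[col] = mask[col].map(lambda s: next(iter(s)))

# Cast to the dtypes asked for in the question.
mask = mask.astype({"latitude": "float64",
                    "longitude": "float64",
                    "irs_pop": "int64"})
print(mask.dtypes)
```

If the size check fails, the rows with zero or several values can be inspected through sizes before deciding how to handle them.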