What I want is for the NaNs to end up as integer values. Since my dataset has thousands of columns, I can't just convert a couple of columns by hand, and when I tried df = df.astype('int') in Dask after replacing the NaNs with float zeros, for whatever reason it didn't work.
With the code below, the values have all reverted to floats in Pandas, whereas in Dask only some of the columns' zero values reverted to floats. I figure if I can solve this issue in Pandas, it will likely solve it in Dask too (fingers crossed).
import pandas as pd
import numpy as np
data = [['tom', 10, 15000], ['nick', 15, 12000], ['juli', 5, 20000]]
# Create the pandas DataFrame
df = pd.DataFrame(data, columns=['Name', 'Age', 'salary'])
# Introduce some NaNs (this turns the integer columns into floats)
df = df.replace(5, np.nan)
df = df.replace(12000, np.nan)
# Attempt to turn the NaNs into integer zeros
expanded = df.replace(np.nan, '0')
expanded = expanded.replace('0', 0)
expanded
CodePudding user response:
IIUC, you can select the numeric columns, fill the NaNs with 0, and then cast to integers:
from dask.dataframe import from_pandas
ddf = from_pandas(df, npartitions=2)
out = ddf.select_dtypes('number').fillna(0).astype('int64')
Output:
>>> out.compute()
   Age  salary
0   10   15000
1   15       0
2    0   20000
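Note that select_dtypes('number') drops the non-numeric Name column from the result. If you want to keep every column, here is a minimal pandas-only sketch of the same idea, under the assumption that every NaN in a numeric column should become 0. The helper name nan_to_int is made up for illustration and is not part of pandas or Dask.

```python
import numpy as np
import pandas as pd


def nan_to_int(frame):
    """Hypothetical helper: fill NaN with 0 in numeric columns and cast them to int64.

    Non-numeric columns (such as 'Name') are returned unchanged.
    """
    # Find the numeric columns programmatically, so this scales to many columns.
    num_cols = frame.select_dtypes('number').columns
    # Fill only the numeric columns, then cast only those columns.
    filled = frame.fillna({col: 0 for col in num_cols})
    return filled.astype({col: 'int64' for col in num_cols})


data = [['tom', 10, 15000], ['nick', 15, 12000], ['juli', 5, 20000]]
df = pd.DataFrame(data, columns=['Name', 'Age', 'salary'])
df = df.replace(5, np.nan).replace(12000, np.nan)

result = nan_to_int(df)
print(result.dtypes)
print(result)
```

The per-column dict passed to astype is what lets the string column stay as it is. Whether the same calls behave identically on a Dask DataFrame is something I have not checked, so treat that part as untested.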