I want to visualize EPA environmental media sampling data for PFAS, using pandas and matplotlib. I've got the following code:
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import csv
pd.set_option('display.max_columns', 500)
inputpath="CHI"
col_for_analysis=["Environmental Media Name", "Year", "Result Measure Value (ppt)"]
dataset=pd.read_csv(inputpath,sep=',', dtype={'a': str}, usecols= col_for_analysis, low_memory=False)
dataset.sort_values(by=["Year"], ascending=True, inplace=True)
print(dataset)
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].fillna(0, inplace=True)
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].astype(int)
The end goal here, at least for now, is to sort everything by Year and then plot the "Year" column on the x-axis and the "Result Measure Value (ppt)" column on the y-axis. When I tried it initially, I was getting error messages indicating that the "Result Measure Value (ppt)" column contained NoneType values, so matplotlib couldn't plot it.
No big deal, I think to myself, I'll just use
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].fillna(0, inplace=True)
to remove those NoneType values and replace them with a nice, hopefully plottable 0.
That seemed to work. So I went on to try to change all the values in that column to int values, so they could all be plotted by matplotlib. I tried to do this by adding the line:
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].astype(int)
That line of code throws the following, rather lengthy error message:
Traceback (most recent call last):
File "main.py", line 18, in <module>
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].astype(int)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/generic.py", line 5912, in astype
new_data = self._mgr.astype(dtype=dtype, copy=copy, errors=errors)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 419, in astype
return self.apply("astype", dtype=dtype, copy=copy, errors=errors)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/internals/managers.py", line 304, in apply
applied = getattr(b, f)(**kwargs)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/internals/blocks.py", line 580, in astype
new_values = astype_array_safe(values, dtype, copy=copy, errors=errors)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1292, in astype_array_safe
new_values = astype_array(values, dtype, copy=copy)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1237, in astype_array
values = astype_nansafe(values, dtype, copy=copy)
File "/home/runner/Fun-Public-Health-Project/venv/lib/python3.8/site-packages/pandas/core/dtypes/cast.py", line 1154, in astype_nansafe
return lib.astype_intsafe(arr, dtype)
File "pandas/_libs/lib.pyx", line 668, in pandas._libs.lib.astype_intsafe
TypeError: int() argument must be a string, a bytes-like object or a number, not 'NoneType'
Now, I thought that the line
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].fillna(0, inplace=True)
would get rid of all NoneType values in the "Result Measure Value (ppt)" column by filling in any NoneType values with a 0. Am I wrong in thinking this? If so, how do I rid the column of NoneType values or otherwise get all the values in that column converted into something that I can work with to plot along with Year? Otherwise, how can I fix the code so that all values in this column can be converted to int and then plotted? Thanks!
CodePudding user response:
You should either modify the column in place:
dataset["Result Measure Value (ppt)"].fillna(0, inplace=True)
Or assign the result back without the inplace argument:
dataset["Result Measure Value (ppt)"] = dataset["Result Measure Value (ppt)"].fillna(0)
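But not both at the same time: with inplace=True, fillna does not return anything (None), so assigning its result back overwrites the whole column with None. That is why astype(int) then fails with the NoneType error.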
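To make that concrete, here is a minimal sketch of the assignment form followed by the int conversion and the sort by Year. The small DataFrame is a made-up stand-in for your CSV (the real file isn't shown), so the values are assumptions for illustration only. I've used the assignment form rather than inplace=True because, as far as I know, recent pandas versions discourage calling an in-place method on a selected column, and it may not update the DataFrame at all there.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the CSV data; values are invented for illustration.
dataset = pd.DataFrame({
    "Year": [2019, 2017, 2018],
    "Result Measure Value (ppt)": [12.0, np.nan, 7.0],
})

col = "Result Measure Value (ppt)"

# Assign the result of fillna back to the column (no inplace argument),
# then convert the filled column to int.
dataset[col] = dataset[col].fillna(0).astype(int)

# Sort by Year so the rows are in plotting order.
dataset = dataset.sort_values(by="Year")

print(dataset)
```

After this, dataset["Year"] and dataset["Result Measure Value (ppt)"] should both be numeric and can be passed to plt.plot. Note that filling missing results with 0 will draw them as real zero measurements, so dropping those rows with dropna might be worth considering instead.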