I have a DataFrame in which one column holds dictionaries, and I need to extract certain values from them.
import numpy as np
import pandas as pd
data = {'tags': [{'access': 'private', 'highway': 'service',
                  'is_in': 'Bellville Campus, Cape Peninsula University of Technology, Bellville, Western Cape, South Africa',
                  'lanes': '2', 'maxheight': '3.3', 'maxspeed': '30', 'name': 'Engineering Way'}]}
df = pd.DataFrame(data)
df['lanes'] = df['tags'].apply(lambda x: x.get('lanes'))
df['width'] = df['tags'].apply(lambda x: x.get('width'))
While width will sometimes have values, lanes will always have values. When width has no value, I want to calculate one based on lanes. So:
df['lanes'] = pd.to_numeric(df['lanes'])
df['width'] = pd.to_numeric(df['width'])
print(df)
tags lanes width
0 {'access': 'private', 'highway': 'service', 'i... 2 NaN
Then a function to do the business:
##-- https://stackoverflow.com/questions/33883200/pandas-how-to-fill-nan-none-values-based-on-the-other-columns
def calc_width(row):
    if np.isnan(row['width']):
    #if row['width'] == np.nan:
        """if nan, calculate the width based on lanes"""
        return row['lanes'] * 2.4
df['width'] = df.apply(calc_width)
I get:
Traceback (most recent call last):
Input In [49] in <cell line: 11>
df['width'] = df.apply(calc_width)
File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\frame.py:8839 in apply
return op.apply().__finalize__(self, method="apply")
File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\apply.py:727 in apply
return self.apply_standard()
File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\apply.py:851 in apply_standard
results, res_index = self.apply_series_generator()
File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\apply.py:867 in apply_series_generator
results[i] = self.f(v)
Input In [47] in calc_width
if np.isnan(row['width']):
File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\series.py:958 in __getitem__
return self._get_value(key)
File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\series.py:1069 in _get_value
loc = self.index.get_loc(label)
File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\indexes\range.py:389 in get_loc
raise KeyError(key)
KeyError: 'width'
How do I calculate a width based on lanes when the value is NaN?
CodePudding user response:
Here is another way that keeps the code you already have and changes only the line that assigns width:
df['width'] = df['tags'].apply(lambda x: int(x.get('lanes')) * 2.4 if x.get('width') is None else float(x.get('width')))
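A minimal sketch of the same idea written as a named function, run on a two-row frame. The second row and its width value are made up for illustration, and width_from_tags and LANE_WIDTH are hypothetical names, not part of the question. Existing width values are converted with float() so the column stays numeric.

```python
import pandas as pd

# Two hypothetical rows: the first has no 'width' tag, the second does.
df = pd.DataFrame({'tags': [
    {'highway': 'service', 'lanes': '2'},
    {'highway': 'service', 'lanes': '1', 'width': '3.5'},
]})

LANE_WIDTH = 2.4  # width per lane, the factor used in the question


def width_from_tags(tags):
    """Return the tagged width as a float, or lanes * LANE_WIDTH if it is missing."""
    width = tags.get('width')
    if width is None:
        return int(tags['lanes']) * LANE_WIDTH
    return float(width)


df['width'] = df['tags'].apply(width_from_tags)
print(df['width'].tolist())  # [4.8, 3.5]
```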
CodePudding user response:
Try:
def calc_width(row):
    # If width is NaN, calculate it from lanes; otherwise keep the existing value.
    if pd.isna(row["width"]):
        return float(row["lanes"]) * 2.4
    return row["width"]
df["width"] = df.apply(calc_width, axis=1)
print(df)
Prints:
tags lanes width
0 {'access': 'private', 'highway': 'service', 'is_in': 'Bellville Campus, Cape Peninsula University of Technology, Bellville, Western Cape, South Africa', 'lanes': '2', 'maxheight': '3.3', 'maxspeed': '30', 'name': 'Engineering Way'} 2 4.8
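For context, the KeyError in the question comes from DataFrame.apply defaulting to axis=0. In that mode each column is passed to the function as a Series indexed by the row labels, so row['width'] looks for a row labelled 'width'. A small sketch with made-up values that records the index the function receives in each mode (record_default and record_rows are hypothetical helper names):

```python
import pandas as pd

# Made-up numeric frame standing in for the question's lanes/width columns.
df = pd.DataFrame({'lanes': [2.0, 1.0], 'width': [float('nan'), 3.5]})

seen_default = []  # index of each Series passed with the default axis=0
seen_rows = []     # index of each Series passed with axis=1


def record_default(series):
    seen_default.append(list(series.index))
    return 0


def record_rows(series):
    seen_rows.append(list(series.index))
    return 0


df.apply(record_default)       # one call per column
df.apply(record_rows, axis=1)  # one call per row

print(seen_default[0])  # [0, 1]
print(seen_rows[0])     # ['lanes', 'width']
```

With axis=1 the Series is indexed by column names, which is why row["width"] works in the corrected call.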
Or:
df["lanes"] = df["lanes"].astype(float)
mask = pd.isna(df["width"])
df.loc[mask, "width"] = df.loc[mask, "lanes"] * 2.4
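The same fill can also be written with Series.fillna, which accepts another Series and aligns it on the index. A short sketch, assuming both columns are already numeric; the two rows are made up for illustration:

```python
import numpy as np
import pandas as pd

# Made-up rows: the first is missing a width, the second already has one.
df = pd.DataFrame({'lanes': [2, 1], 'width': [np.nan, 3.5]})

# Only the NaN entries are replaced; existing widths are left untouched.
df['width'] = df['width'].fillna(df['lanes'] * 2.4)
print(df['width'].tolist())  # [4.8, 3.5]
```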