Calculate pandas column only when value is NaN

I have a DataFrame in which one column holds dictionaries, and I need to extract certain values from them.

import numpy as np
import pandas as pd

data = {'tags': [{'access': 'private', 'highway': 'service', 
'is_in': 'Bellville Campus, Cape Peninsula University of Technology, Bellville, Western Cape, South Africa', 
'lanes': '2', 'maxheight': '3.3', 'maxspeed': '30', 'name': 'Engineering Way'}]}

df = pd.DataFrame(data)
df['lanes'] = df['tags'].apply(lambda x: x.get('lanes'))
df['width'] = df['tags'].apply(lambda x: x.get('width'))

width will only sometimes have a value, but lanes will always have one. When width has no value I want to calculate one based on lanes. So:

df['lanes'] = pd.to_numeric(df['lanes'])
df['width'] = pd.to_numeric(df['width'])
print(df)

                                                tags  lanes  width
0  {'access': 'private', 'highway': 'service', 'i...      2    NaN

Then a function to do the business:

##-- https://stackoverflow.com/questions/33883200/pandas-how-to-fill-nan-none-values-based-on-the-other-columns

def calc_width(row):
    if np.isnan(row['width']):
    #if row['width'] == np.nan:
        """if nan, calculate the width based on lanes"""
        return row['lanes'] * 2.4

df['width'] = df.apply(calc_width)

I get:

Traceback (most recent call last):

  Input In [49] in <cell line: 11>
    df['width'] = df.apply(calc_width)

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\frame.py:8839 in apply
    return op.apply().__finalize__(self, method="apply")

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\apply.py:727 in apply
    return self.apply_standard()

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\apply.py:851 in apply_standard
    results, res_index = self.apply_series_generator()

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\apply.py:867 in apply_series_generator
    results[i] = self.f(v)

  Input In [47] in calc_width
    if np.isnan(row['width']):

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\series.py:958 in __getitem__
    return self._get_value(key)

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\series.py:1069 in _get_value
    loc = self.index.get_loc(label)

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\indexes\range.py:389 in get_loc
    raise KeyError(key)

KeyError: 'width'

How do I calculate a width based on lanes when the value is NaN?

CodePudding user response：

Here is another way that keeps the code you already have and only changes the line that assigns width:

df['width'] = df['tags'].apply(lambda x: int(x.get('lanes')) * 2.4 if x.get('width') is None else float(x.get('width')))
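
To check that an approach like this leaves existing widths alone, here is a small sketch with two made-up tag dictionaries, one with a width and one without. The helper name `width_from_tags` and the `lane_width` parameter are hypothetical, not part of the original code.

```python
import pandas as pd


def width_from_tags(tags, lane_width=2.4):
    """Return the tagged width as a float, or estimate it from the lane count."""
    width = tags.get("width")
    if width is None:
        # No width tag: estimate from lanes (assumed to be always present)
        return int(tags["lanes"]) * lane_width
    # Width tag present: tag values are strings, so convert to float
    return float(width)


# Made-up sample data, not the real OSM tags from the question
df = pd.DataFrame({"tags": [{"lanes": "2"}, {"lanes": "1", "width": "3.5"}]})
df["width"] = df["tags"].apply(width_from_tags)
```

Only the first row gets a calculated width; the second keeps its tagged value.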

CodePudding user response：

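Try passing axis=1 so that the function receives rows, and return the existing width when it is not NaN:

def calc_width(row):
    # if NaN, calculate the width based on lanes
    if pd.isna(row["width"]):
        return float(row["lanes"]) * 2.4
    # otherwise keep the width that is already there
    return row["width"]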


df["width"] = df.apply(calc_width, axis=1)
print(df)

Prints:

                   tags lanes  width
0  {'access': 'private', 'highway': 'service', 'is_in': 'Bellville Campus, Cape Peninsula University of Technology, Bellville, Western Cape, South Africa', 'lanes': '2', 'maxheight': '3.3', 'maxspeed': '30', 'name': 'Engineering Way'}     2    4.8
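
For context, the KeyError in the question appears to come from the default axis=0: `DataFrame.apply` then hands each column to the function as a Series indexed by the row labels, so `row['width']` looks for a row labelled 'width'. A minimal sketch with a made-up two-row frame (the helper `index_labels` is hypothetical and only records what the function gets to see):

```python
import numpy as np
import pandas as pd

# Made-up sample data: one row without a width, one with
df = pd.DataFrame({"lanes": [2.0, 1.0], "width": [np.nan, 3.5]})


def index_labels(frame, axis):
    """Record the index of every Series that apply passes to the function."""
    seen = {}

    def record(s):
        seen[s.name] = list(s.index)
        return 0

    frame.apply(record, axis=axis)
    return seen


# axis=0 (default): each Series is a column, indexed by row labels
by_column = index_labels(df, axis=0)
# axis=1: each Series is a row, indexed by column names
by_row = index_labels(df, axis=1)
```

With axis=0 the keys of `by_column` are the column names and every recorded index is `[0, 1]`; with axis=1 each recorded index is `['lanes', 'width']`, which is why `row['width']` works there.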

Or:

df["lanes"] = df["lanes"].astype(float)

mask = pd.isna(df["width"])
df.loc[mask, "width"] = df.loc[mask, "lanes"] * 2.4
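
A shorter variant of the same idea, as a sketch: `Series.fillna` accepts another Series and fills only the missing entries, aligned on the index. The sample data below is made up, with the second row already carrying a width.

```python
import numpy as np
import pandas as pd

# Made-up sample data: string values, as they would come out of the tags
df = pd.DataFrame({"lanes": ["2", "1"], "width": [np.nan, "3.5"]})
df["lanes"] = pd.to_numeric(df["lanes"])
df["width"] = pd.to_numeric(df["width"])

# Fill only the missing widths with lanes * 2.4; existing widths are kept
df["width"] = df["width"].fillna(df["lanes"] * 2.4)
```

This avoids the row-wise apply entirely, which tends to matter on larger frames.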