Calculate pandas column only when value is NaN

Time:06-25

I have a DataFrame in which one column holds dictionaries, and I need to extract certain values from them.

import numpy as np
import pandas as pd

data = {'tags': [{'access': 'private', 'highway': 'service', 
'is_in': 'Bellville Campus, Cape Peninsula University of Technology, Bellville, Western Cape, South Africa', 
'lanes': '2', 'maxheight': '3.3', 'maxspeed': '30', 'name': 'Engineering Way'}]}

df = pd.DataFrame(data)
df['lanes'] = df['tags'].apply(lambda x: x.get('lanes'))
df['width'] = df['tags'].apply(lambda x: x.get('width'))

width will only sometimes have a value, while lanes will always have one. When width has no value, I want to calculate one based on lanes. So I first convert both columns to numbers:

df['lanes'] = pd.to_numeric(df['lanes'])
df['width'] = pd.to_numeric(df['width'])
print(df)

                                                tags  lanes  width
0  {'access': 'private', 'highway': 'service', 'i...      2    NaN

Then a function to do the business:

##-- https://stackoverflow.com/questions/33883200/pandas-how-to-fill-nan-none-values-based-on-the-other-columns

def calc_width(row):
    if np.isnan(row['width']):
    #if row['width'] == np.nan:
        """if nan, calculate the width based on lanes"""
        return row['lanes'] * 2.4

df['width'] = df.apply(calc_width)

I get:

Traceback (most recent call last):

  Input In [49] in <cell line: 11>
    df['width'] = df.apply(calc_width)

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\frame.py:8839 in apply
    return op.apply().__finalize__(self, method="apply")

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\apply.py:727 in apply
    return self.apply_standard()

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\apply.py:851 in apply_standard
    results, res_index = self.apply_series_generator()

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\apply.py:867 in apply_series_generator
    results[i] = self.f(v)

  Input In [47] in calc_width
    if np.isnan(row['width']):

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\series.py:958 in __getitem__
    return self._get_value(key)

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\series.py:1069 in _get_value
    loc = self.index.get_loc(label)

  File ~\miniconda3\envs\osm3D_vc-env\lib\site-packages\pandas\core\indexes\range.py:389 in get_loc
    raise KeyError(key)

KeyError: 'width'

How do I calculate a width based on lanes when the value is NaN?

CodePudding user response:

Here is another way that keeps the code you already have and only changes the line that assigns width:


# Use the tagged width when present (converted to a number); otherwise derive it from lanes.
df['width'] = df['tags'].apply(lambda x: int(x.get('lanes')) * 2.4 if x.get('width') is None else float(x.get('width')))

CodePudding user response:

Try:

def calc_width(row):
    # If width is missing, calculate it from the number of lanes.
    if pd.isna(row["width"]):
        return float(row["lanes"]) * 2.4
    # Otherwise keep the existing width instead of returning None.
    return row["width"]


df["width"] = df.apply(calc_width, axis=1)
print(df)

Prints:

                   tags lanes  width
0  {'access': 'private', 'highway': 'service', 'is_in': 'Bellville Campus, Cape Peninsula University of Technology, Bellville, Western Cape, South Africa', 'lanes': '2', 'maxheight': '3.3', 'maxspeed': '30', 'name': 'Engineering Way'}     2    4.8
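
The KeyError in the question most likely comes from the missing axis=1: by default, DataFrame.apply passes each column to the function, so the Series it receives is indexed by row labels and has no 'width' key. A minimal sketch of the difference, using a small made-up frame (the values here are illustrative, not from the question):

```python
import pandas as pd

# Hypothetical two-row frame, only for illustration.
df = pd.DataFrame({"lanes": [2.0, 1.0], "width": [float("nan"), 3.5]})

# Default axis=0: the function is called once per column, so each
# argument is a Series indexed by the row labels (0, 1).
labels_axis0 = df.apply(lambda s: ", ".join(map(str, s.index)))

# axis=1: the function is called once per row, so each argument is a
# Series indexed by the column names, and row["width"] can be looked up.
labels_axis1 = df.apply(lambda row: ", ".join(map(str, row.index)), axis=1)

print(labels_axis0)
print(labels_axis1)
```

With axis=0 the labels seen inside the function are the row numbers, which is why looking up 'width' fails there.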

Or:

df["lanes"] = df["lanes"].astype(float)

mask = pd.isna(df["width"])
df.loc[mask, "width"] = df.loc[mask, "lanes"] * 2.4
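
A shorter variant of the same idea, as a sketch: Series.fillna accepts another Series and fills only the missing positions, aligned on the index. The three-row frame below is made up to show that existing widths are left alone; the 2.4 factor is the one from the question.

```python
import numpy as np
import pandas as pd

# Hypothetical sample data: the second row already has a width.
df = pd.DataFrame({"lanes": ["2", "1", "3"], "width": [np.nan, "3.5", np.nan]})
df["lanes"] = pd.to_numeric(df["lanes"])
df["width"] = pd.to_numeric(df["width"])

# Fill only the missing widths; rows that already have a width keep it.
df["width"] = df["width"].fillna(df["lanes"] * 2.4)
print(df)
```

This avoids both apply and the explicit mask, which may matter on larger frames.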