Home > Net >  Replace value in rows (does not meet condition) with next closest row (meets condition)
Replace value in rows (does not meet condition) with next closest row (meets condition)

Time:11-29

I have a dataframe as shown below. They are ordered Ascendingly by Column A and B. Only Occurrences >= 10 are valid, thus for rows with occurrences with less than 10, I want to replace their values with the next/closest valid row.

Column A Column B Occurrences Value
Cell 1 Cell 2 1 0
Cell 1 Cell 3 2 0
Cell 1 Cell 4 10 5
Cell 1 Cell 5 1 1
Cell 1 Cell 6 12 4
Cell 2 Cell 1 1 7

Here is what the final dataframe should look like. I would like to do this in Bigquery but if its not possible, python would work as well.

Column A Column B Occurrences Value
Cell 1 Cell 2 1 5
Cell 1 Cell 3 2 5
Cell 1 Cell 4 10 5
Cell 1 Cell 5 1 4
Cell 1 Cell 6 12 4
Cell 2 Cell 1 1 4

I have the dataframe all set up, but just having trouble figuring out the logic to apply this.

Logic:

  1. Start from the top and go through each row to check number of occurrences.
  2. If occurrences <10, look for the next valid row and take that value replace the non-valid row.
  3. If the last row is non-valid, it should take the value from previous row that is valid.

CodePudding user response:

Something like this should work in Python:

import pandas as pd
import numpy as np

# example dataframe
dict = {'Column A': ['Cell 1', 'Cell 1', 'Cell 1', 'Cell 1', 'Cell 1', 'Cell 2'],
      'Column B': ['Cell 2', 'Cell 3', 'Cell 4', 'Cell 5', 'Cell 6', 'Cell 1'],
      'Occurrences': [1, 2, 10, 1, 12, 1],
      'Value': [0, 0, 5, 1, 4, 7]}
df = pd.DataFrame(dict)

# if Occurrences < 10, set Value to nan
df.loc[df['Occurrences'] < 10, 'Value'] = np.nan

# use interpolate and fillna to set missing (nan) values to nearest non-missing row value
# (first use bfill or "backfill" method, so take the value from the "next" valid row; then apply the ffill or
# "forwardfill" method for any rows at the end that were not filled)
df.loc[:, 'Value'] = df['Value'].fillna(method='bfill').fillna(method='ffill')

Note that if you just want to fill these values with the nearest row value, you can use enter image description here

  • Related