Replace value in rows (does not meet condition) with next closest row (meets condition)-CodePudding

I have a dataframe as shown below. They are ordered Ascendingly by Column A and B. Only Occurrences >= 10 are valid, thus for rows with occurrences with less than 10, I want to replace their values with the next/closest valid row.

Column A	Column B	Occurrences	Value
Cell 1	Cell 2	1	0
Cell 1	Cell 3	2	0
Cell 1	Cell 4	10	5
Cell 1	Cell 5	1	1
Cell 1	Cell 6	12	4
Cell 2	Cell 1	1	7

Here is what the final dataframe should look like. I would like to do this in Bigquery but if its not possible, python would work as well.

Column A	Column B	Occurrences	Value
Cell 1	Cell 2	1	5
Cell 1	Cell 3	2	5
Cell 1	Cell 4	10	5
Cell 1	Cell 5	1	4
Cell 1	Cell 6	12	4
Cell 2	Cell 1	1	4

I have the dataframe all set up, but just having trouble figuring out the logic to apply this.

Logic:

Start from the top and go through each row to check number of occurrences.
If occurrences <10, look for the next valid row and take that value replace the non-valid row.
If the last row is non-valid, it should take the value from previous row that is valid.

CodePudding user response：

Something like this should work in Python:

import pandas as pd
import numpy as np

# example dataframe
dict = {'Column A': ['Cell 1', 'Cell 1', 'Cell 1', 'Cell 1', 'Cell 1', 'Cell 2'],
      'Column B': ['Cell 2', 'Cell 3', 'Cell 4', 'Cell 5', 'Cell 6', 'Cell 1'],
      'Occurrences': [1, 2, 10, 1, 12, 1],
      'Value': [0, 0, 5, 1, 4, 7]}
df = pd.DataFrame(dict)

# if Occurrences < 10, set Value to nan
df.loc[df['Occurrences'] < 10, 'Value'] = np.nan

# use interpolate and fillna to set missing (nan) values to nearest non-missing row value
# (first use bfill or "backfill" method, so take the value from the "next" valid row; then apply the ffill or
# "forwardfill" method for any rows at the end that were not filled)
df.loc[:, 'Value'] = df['Value'].fillna(method='bfill').fillna(method='ffill')

Note that if you just want to fill these values with the nearest row value, you can use