Conditional forward filling string values in Pandas

Time:08-31

I am trying to create a column that reflects the current state of an object in a DataFrame containing a timeline of actions performed on that object. I have columns recording add and remove actions and would like to create a new column showing the content that is currently available.

Example: Given the DataFrame

import pandas as pd

df = pd.DataFrame([(1, None, None, None),(2, None, 'Apple', None),(3, None, 'Banana', None),(4, None, None, 'Apple'),(5, None, None, 'Banana'),(6, None, None, None)],
    columns=['day', 'inventory', 'bought', 'sold'])
print(df)
   day inventory  bought    sold
0    1      None    None    None
1    2      None   Apple    None
2    3      None  Banana    None
3    4      None    None   Apple
4    5      None    None  Banana
5    6      None    None    None

I would like to populate the inventory column like so:

   day      inventory  bought    sold
0    1           None    None    None
1    2           None   Apple    None
2    3          Apple  Banana    None
3    4  Apple, Banana    None   Apple
4    5         Banana    None  Banana
5    6           None    None    None

CodePudding user response:

Assuming you buy/sell unique items, you can use a list comprehension with set operations:

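S = set()

# S := ... rebinds the running set on every row; sorted() keeps the joined order deterministic
df['inventory'] = [','.join(sorted(S := S.union([b] if pd.notna(b) else set())
                                        .difference([s] if pd.notna(s) else set())))
                   for b, s in zip(df['bought'].shift(), df['sold'].shift())]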

NB. This requires Python ≥ 3.8 because of the walrus operator (:=).

output:

   day     inventory  bought    sold
0    1                  None    None
1    2                 Apple    None
2    3         Apple  Banana    None
3    4  Apple,Banana    None   Apple
4    5        Banana    None  Banana
5    6                  None    None
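
For readers on Python < 3.8, or who find the one-liner above hard to follow, the same running-set idea can be written as a plain loop. This is only a sketch under the same assumption that items are unique. The helper name running_inventory is made up for illustration. It returns None instead of an empty string for an empty inventory and joins with ', ' to match the layout asked for in the question.

```python
import pandas as pd


def running_inventory(bought, sold):
    """Hypothetical helper: stock held at the start of each day."""
    stock = set()   # items currently held; assumes each item is unique
    result = []
    for b, s in zip(bought, sold):
        # Record the state *before* applying today's actions; this is
        # what the shift() calls achieve in the one-liner.
        result.append(', '.join(sorted(stock)) if stock else None)
        if pd.notna(b):
            stock.add(b)
        if pd.notna(s):
            stock.discard(s)   # discard() ignores items that are not held
    return result


df = pd.DataFrame(
    [(1, None, None, None), (2, None, 'Apple', None), (3, None, 'Banana', None),
     (4, None, None, 'Apple'), (5, None, None, 'Banana'), (6, None, None, None)],
    columns=['day', 'inventory', 'bought', 'sold'])

df['inventory'] = running_inventory(df['bought'], df['sold'])
print(df)
```

Because the set is local to the function, calling it again starts from an empty inventory.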

CodePudding user response:

You can create a function that keeps the inventory in a list and apply it row by row:

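import numpy as np

inv = []  # running inventory (module-level state); reset to [] before re-running
def update_inventory(x):
    if pd.notna(x['bought']):
        inv.append(x['bought'])
    if pd.notna(x['sold']):
        inv.remove(x['sold'])
    return ','.join(inv) if inv else None
# shift() moves each state to the following day; replace() turns the NaN it introduces into None
df['inventory'] = df.apply(update_inventory, axis=1).shift().replace({np.nan:None})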

Output of print(df):

   day     inventory  bought    sold
0    1          None    None    None
1    2          None   Apple    None
2    3         Apple  Banana    None
3    4  Apple,Banana    None   Apple
4    5        Banana    None  Banana
5    6          None    None    None
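
One caveat with the apply approach is that the list lives outside the function, so running the code a second time continues from whatever was left in it. A possible variation, sketched here with a made-up factory name make_updater, keeps the list inside a closure so that every call starts from an empty inventory. It also skips a sale of an item that is not held, which is an assumption and not something the question specifies.

```python
import pandas as pd


def make_updater():
    """Hypothetical factory: returns a row function with its own inventory list."""
    inv = []   # private to this updater, so a fresh one starts empty

    def update(row):
        if pd.notna(row['bought']):
            inv.append(row['bought'])
        # Assumption: silently ignore a sale of something not in stock
        if pd.notna(row['sold']) and row['sold'] in inv:
            inv.remove(row['sold'])
        return ','.join(inv) if inv else None

    return update


df = pd.DataFrame(
    [(1, None, None, None), (2, None, 'Apple', None), (3, None, 'Banana', None),
     (4, None, None, 'Apple'), (5, None, None, 'Banana'), (6, None, None, None)],
    columns=['day', 'inventory', 'bought', 'sold'])

# shift() moves each end-of-day state to the following day
states = df.apply(make_updater(), axis=1).shift()
# Normalise missing values (NaN from shift) to None
df['inventory'] = [v if pd.notna(v) else None for v in states]
print(df)
```

Items are listed in the order they were bought, as in the original answer.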