I am trying to create a column that reflects the current state of an object in a DataFrame containing a timeline of actions performed on that object.
I have columns recording add and remove actions and would like to create a new column showing the content that is currently available.
Example: Given the DataFrame
import pandas as pd

df = pd.DataFrame([(1, None, None, None), (2, None, 'Apple', None),
                   (3, None, 'Banana', None), (4, None, None, 'Apple'),
                   (5, None, None, 'Banana'), (6, None, None, None)],
                  columns=['day', 'inventory', 'bought', 'sold'])
print(df)

   day  inventory  bought    sold
0    1       None    None    None
1    2       None   Apple    None
2    3       None  Banana    None
3    4       None    None   Apple
4    5       None    None  Banana
5    6       None    None    None
I would like to populate the inventory column with the items held at the start of each day (i.e. before that day's purchases and sales), like so:

   day      inventory  bought    sold
0    1           None    None    None
1    2           None   Apple    None
2    3          Apple  Banana    None
3    4  Apple, Banana    None   Apple
4    5         Banana    None  Banana
5    6           None    None    None
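In other words, the inventory for day $d$ is what was held before that day's trades. Writing $I_d$ for that inventory and $B_d$, $S_d$ for the items bought and sold on day $d$ (empty sets when the cell is None), the target column follows this recurrence:

$$
I_1 = \varnothing, \qquad I_d = \left(I_{d-1} \cup B_{d-1}\right) \setminus S_{d-1} \quad \text{for } d > 1
$$

For the example this gives $I_2 = \varnothing$, $I_3 = \{\text{Apple}\}$, $I_4 = \{\text{Apple}, \text{Banana}\}$, $I_5 = \{\text{Banana}\}$ and $I_6 = \varnothing$, matching the table above.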
Answer:
Assuming you buy and sell unique items, you can use a list comprehension with set operations:
S = set()
# sorted() gives a stable order; iterating a set of strings directly does not
df['inventory'] = [','.join(sorted(S := S.union([b] if pd.notna(b) else set())
                                        .difference([s] if pd.notna(s) else set())))
                   for b, s in zip(df['bought'].shift(), df['sold'].shift())]
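NB. This requires Python ≥ 3.8 because of the walrus operator (:=). Inside a comprehension, := binds S in the enclosing scope, so each row starts from the set left by the previous row, and the shift() makes each day show the inventory before that day's trades.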
Output:

   day     inventory  bought    sold
0    1                  None    None
1    2                 Apple    None
2    3         Apple  Banana    None
3    4  Apple,Banana    None   Apple
4    5        Banana    None  Banana
5    6                  None    None
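A caveat on the set-based version: a Python set of strings has no guaranteed iteration order, so joining it as-is can give Banana,Apple on one run and Apple,Banana on another (sorting makes it stable, but alphabetical rather than in purchase order), and days with nothing in stock get an empty string rather than None. Below is a sketch of a plain-loop alternative that keeps purchase order, returns None for an empty inventory and needs no walrus, so it also runs on Python 3.7. running_inventory is a hypothetical helper name, not part of pandas.

```python
import pandas as pd


def running_inventory(bought, sold):
    """Hypothetical helper: inventory held at the start of each row.

    Items appear in purchase order; an empty inventory is None.
    Sales of items that are not currently held are silently ignored.
    """
    held = {}  # dict keys act as an insertion-ordered set (Python 3.7+)
    result = []
    for b, s in zip(bought, sold):
        # Record the state *before* this row's trades (what shift() achieves above).
        result.append(','.join(held) if held else None)
        if pd.notna(b):
            held[b] = None
        if pd.notna(s):
            held.pop(s, None)  # no KeyError if the item is not held
    return result


df = pd.DataFrame([(1, None, None, None), (2, None, 'Apple', None),
                   (3, None, 'Banana', None), (4, None, None, 'Apple'),
                   (5, None, None, 'Banana'), (6, None, None, None)],
                  columns=['day', 'inventory', 'bought', 'sold'])
df['inventory'] = running_inventory(df['bought'], df['sold'])
print(df)
```

Using a dict instead of a set is the main design choice: since Python 3.7 dicts preserve insertion order, so the keys behave like an ordered set while still giving fast membership and removal.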
Answer:
You can write a function that keeps a running list of items and apply it row by row:
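import numpy as np

inv = []  # running inventory, shared across calls

def update_inventory(x):
    if x['bought'] is not None:
        inv.append(x['bought'])
    if x['sold'] is not None:
        inv.remove(x['sold'])
    return ','.join(inv) if inv else None

# shift() so each day shows the inventory before that day's trades
df['inventory'] = df.apply(update_inventory, axis=1).shift().replace({np.nan: None})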
print(df)

   day     inventory  bought    sold
0    1          None    None    None
1    2          None   Apple    None
2    3         Apple  Banana    None
3    4  Apple,Banana    None   Apple
4    5        Banana    None  Banana
5    6          None    None    None
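One thing to watch with this approach: inv is a module-level list that is never reset, so if a run ends with items still in stock, or stops partway through with an error, the next run of the apply line starts from that leftover state. Below is a sketch that keeps the list in a closure instead, so every run starts empty. make_inventory_updater is a hypothetical name, and the is not None checks are swapped for pd.notna so that NaN cells (for example from read_csv) also count as empty.

```python
import numpy as np
import pandas as pd


def make_inventory_updater():
    """Hypothetical factory: returns a fresh row-wise updater.

    Same logic as update_inventory, but the running list lives in a
    closure, so every call to the factory starts from an empty inventory.
    """
    inv = []

    def update_inventory(row):
        # pd.notna treats both None and NaN as "no action on this row"
        if pd.notna(row['bought']):
            inv.append(row['bought'])
        if pd.notna(row['sold']):
            inv.remove(row['sold'])  # raises ValueError if the item isn't held
        return ','.join(inv) if inv else None

    return update_inventory


df = pd.DataFrame([(1, None, None, None), (2, None, 'Apple', None),
                   (3, None, 'Banana', None), (4, None, None, 'Apple'),
                   (5, None, None, 'Banana'), (6, None, None, None)],
                  columns=['day', 'inventory', 'bought', 'sold'])
# A new updater per run, so re-running this line never sees stale state.
df['inventory'] = (df.apply(make_inventory_updater(), axis=1)
                     .shift()
                     .replace({np.nan: None}))
print(df)
```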