I have a dataframe like this:
>>> df = pd.DataFrame({
...     'Date': [
...         pd.to_datetime("2022-07-01"),
...         pd.to_datetime("2020-07-02"),
...         pd.to_datetime("2020-07-03"),
...     ],
...     "Price": [
...         {24.9, 23.0, 22.5, 23.5},
...         {24.9, 25.0, 26.5, 23.7},
...         {25.2, 24.5, 23.6, 23.8},
...     ]})
>>> df
        Date                     Price
0 2022-07-01  {24.9, 23.5, 22.5, 23.0}
1 2020-07-02  {24.9, 25.0, 26.5, 23.7}
2 2020-07-03  {24.5, 25.2, 23.8, 23.6}
I want to add a new column 'intersec' that holds the intersection of each row's Price set with the previous row's (shifted) Price set. But when I use
df['intersec'] = df.Price[1:] & df.Price.shift()[1:]
it doesn't work. I get the following error:
TypeError: unsupported operand type(s) for &: 'set' and 'bool'
What should I do? My expected result is:
>>> df
       Date                     Price intersec
0  2022/7/1  {24.9, 23.5, 22.5, 23.0}      NaN
1  2020/7/2  {24.9, 25.0, 26.5, 23.7}     24.9
2  2020/7/3  {24.5, 25.2, 23.8, 23.6}      NaN
CodePudding user response:
First, shift the Price column:
df["shifted_price"] = df.Price.shift()
Then compute the intersection, skipping the first row because it has no previous value:
df["intersec"] = df[1:].apply(lambda x: list(set(x["Price"]) & set(x["shifted_price"])), axis=1)
Sample output:
        Date                     Price             shifted_price intersec
0 2022-07-01  {24.9, 23.5, 22.5, 23.0}                       NaN      NaN
1 2020-07-02  {24.9, 25.0, 26.5, 23.7}  {24.9, 23.5, 22.5, 23.0}   [24.9]
2 2020-07-03  {24.5, 25.2, 23.8, 23.6}  {24.9, 25.0, 26.5, 23.7}       []
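If you want NaN instead of an empty list, as in the question's expected result, the same shift-then-apply idea can be adapted roughly like this. It's only a sketch: `common_prices` is a made-up helper name, not something from pandas, and it keeps the intersection as a set rather than a list.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-07-01", "2020-07-02", "2020-07-03"]),
    "Price": [
        {24.9, 23.0, 22.5, 23.5},
        {24.9, 25.0, 26.5, 23.7},
        {25.2, 24.5, 23.6, 23.8},
    ],
})


def common_prices(current, previous):
    """Hypothetical helper: set intersection, or NaN when there is nothing to report."""
    # The shifted column holds NaN (a float) in the first row, so guard on the type.
    if not isinstance(previous, set):
        return np.nan
    common = current & previous
    # An empty set is falsy, so it is replaced by NaN.
    return common if common else np.nan


df["shifted_price"] = df["Price"].shift()
df["intersec"] = df.apply(
    lambda row: common_prices(row["Price"], row["shifted_price"]), axis=1
)
print(df[["Price", "intersec"]])
```

Because the helper checks the type itself, there is no need to slice off the first row with `df[1:]`.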
CodePudding user response:
One way to do this would be:
import numpy as np
df["intersec"] = [np.nan] + [p.intersection(pp) if p.intersection(pp) else np.nan
                             for p, pp in zip(df["Price"][1:], df["Price"])]
CodePudding user response:
ans = []
for a, b in zip(df['Price'].values, df['Price'].shift().values):
    try:
        intersection = a & b
    except TypeError:
        # the first shifted value is NaN, which cannot be intersected with a set
        intersection = None
    ans.append(intersection)
This outputs [None, {24.9}, set()]
CodePudding user response:
Since the values of the cells are sets, there is no vectorised operation for this, so I suggest you use a list comprehension (the := operator needs Python 3.8 or later):
import numpy as np
df["intersec"] = [i if pd.notna(q) and (i := p & q) else np.nan for p, q in zip(df["Price"], df["Price"].shift())]
print(df)
Output
        Date                     Price intersec
0 2022-07-01  {24.9, 23.0, 22.5, 23.5}      NaN
1 2020-07-02  {24.9, 25.0, 26.5, 23.7}   {24.9}
2 2020-07-03  {24.5, 25.2, 23.6, 23.8}      NaN
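The expected result in the question shows a bare 24.9 rather than {24.9}. If that is really what's wanted, one possible variation is to unpack the set when exactly one price is shared. This is a sketch under that assumption: `shared_price` is a hypothetical helper, and keeping the whole set when several prices match is my guess, since the question doesn't say what should happen then.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "Date": pd.to_datetime(["2022-07-01", "2020-07-02", "2020-07-03"]),
    "Price": [
        {24.9, 23.0, 22.5, 23.5},
        {24.9, 25.0, 26.5, 23.7},
        {25.2, 24.5, 23.6, 23.8},
    ],
})


def shared_price(current, previous):
    """Hypothetical helper: NaN, a single shared price, or the set of shared prices."""
    # The first shifted value is NaN (a float), not a set.
    if not isinstance(previous, set):
        return np.nan
    common = current & previous
    if not common:
        return np.nan
    if len(common) == 1:
        # Exactly one shared price: return it as a scalar.
        return next(iter(common))
    # Assumption: with several shared prices, keep them together as a set.
    return common


df["intersec"] = [
    shared_price(p, q) for p, q in zip(df["Price"], df["Price"].shift())
]
print(df[["Price", "intersec"]])
```

Note that the column can end up holding a mix of floats and sets if some rows share more than one price.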