Home > Net >  How to ignore the successive same values of a column considering pandas dataframe?
How to ignore the successive same values of a column considering pandas dataframe?

Time:03-02

I have this pandas dataframe:

df

I want to create a new column "entry_price" that for each day, it considers the first "buy" value in entry, and writes the associated "open" in this column,

This is an example of what dataframe i want to have: (but maybe there's a better way) desidered df

So as you can see, i need to consider only the first "buy" of the day,

I tried with no success this method:

df['entry_price'] = df['open'].where(df['entry'] == "buy")

this method does not ignore the successive "buy" values: does not consider the next "buy" of the same day as a "nan". Any ideas?

CodePudding user response:

We first filter the data that (entry equals to buy) and (not duplicate by date and entry), which we can get the first buy by each date, then we insert value with column open.

tbl = {"date" :["2022-02-28", "2022-02-28", "2022-03-01", "2022-03-01"],
       "entry" : ["no", "buy", "buy", "buy"],
       "open" : [1.12, 1.13, 1.135, 1.132]}
df = pd.DataFrame(tbl)
df.sort_values(by = "date", inplace=True)
df.loc[(df["entry"] == "buy" ) & (~df.duplicated(["date", "entry"])), "entry_price"] = df.loc[(df["entry"] == "buy" ) & (~df.duplicated(["date", "entry"])), "open"]

df
    date        entry   open    entry_price
0   2022-02-28  no      1.120   NaN
1   2022-02-28  buy     1.130   1.130
2   2022-03-01  buy     1.135   1.135
3   2022-03-01  buy     1.132   NaN

Since the sample data I generated is a simple one, make sure to sort the data by date first before you check the duplicate rows.

CodePudding user response:

You should actually filter your dataframe only where entry == 'buy', create a new date format only with day and then use groupby method using only the minimum date

data = {"date": ["2022-02-28 06:00:00", "2022-02-28 06:00:05", "2022-03-01 06:59:35", "2022-03-01 06:59:40"],"entry": ["no", "buy", "buy", "buy"], "open": [1.12, 1.13, 1.135, 1.132]}

df = pd.DataFrame(data)
df["day"] = df["date"].apply(lambda elem: elem.split(" ")[0])
# indentify the dates index
dates = df[df['entry'] == 'buy'].groupby("day")["date"].apply(min)
df[df["date"].isin(dates.values)]

                  date entry   open         day
1  2022-02-28 06:00:05   buy  1.130  2022-02-28
2  2022-03-01 06:59:35   buy  1.135  2022-03-01
  • Related