I have the following DataFrame:
datetime day_fetched col_a col_b
0 2023-01-02 12:00:00 2023-01-01 12:00:00 100 200
1 2023-01-02 12:00:00 2023-01-02 12:00:00 120 400
2 2023-01-03 12:00:00 2023-01-02 12:00:00 140 500
3 2023-01-03 12:00:00 2023-01-03 12:00:00 160 700
4 2023-01-04 12:00:00 2023-01-03 12:00:00 200 300
5 2023-01-04 12:00:00 2023-01-04 12:00:00 430 200
And I want to create a new column that will take the value 2 if there is a difference in the date between datetime
and day_fetched
and value 1 if there is no difference.
So my new Dataframe should look like this:
datetime day_fetched col_a col_b day_ahead
0 2023-01-02 12:00:00 2023-01-01 12:00:00 100 200 2
1 2023-01-02 12:00:00 2023-01-02 12:00:00 120 400 1
2 2023-01-03 12:00:00 2023-01-02 12:00:00 140 500 2
3 2023-01-03 12:00:00 2023-01-03 12:00:00 160 700 1
4 2023-01-04 12:00:00 2023-01-03 12:00:00 200 300 2
5 2023-01-04 12:00:00 2023-01-04 12:00:00 430 200 1
Then based on the column['day_ahead
'], I want to split the col_a
and col_b
, into col_a_1
and col_a_2
and col_b_1
and col_b_2
.
So the final DataFrame will look like this:
datetime day_fetched col_a_1 col_a_2 col_b_1 col_b_2 day_ahead
0 2023-01-02 12:00:00 2023-01-01 12:00:00 NaN 200 NaN 200 2
1 2023-01-02 12:00:00 2023-01-02 12:00:00 120 NaN 100 NaN 1
2 2023-01-03 12:00:00 2023-01-02 12:00:00 NaN 500 NaN 200 2
3 2023-01-03 12:00:00 2023-01-03 12:00:00 160 NaN 100 NaN 1
4 2023-01-04 12:00:00 2023-01-03 12:00:00 NaN 300 NaN 200 2
5 2023-01-04 12:00:00 2023-01-04 12:00:00 430 NaN 100 NaN 1
CodePudding user response:
One solution is to use np.where
:
import pandas as pd
import numpy as np
df = pd.DataFrame(data=
[["2023-01-02 12:00:00", "2023-01-01 12:00:00", 100, 200],
["2023-01-02 12:00:00", "2023-01-02 12:00:00", 120, 400],
["2023-01-03 12:00:00", "2023-01-02 12:00:00", 140, 500],
["2023-01-03 12:00:00", "2023-01-03 12:00:00", 160, 700],
["2023-01-04 12:00:00", "2023-01-03 12:00:00", 200, 300],
["2023-01-04 12:00:00", "2023-01-04 12:00:00", 430, 200]],
columns=["datetime","day_fetched","col_a","col_b"])
# days ahead
df["day_ahead"] = np.where(df["datetime"] == df["day_fetched"], 1, 2)
# column of None's for next section
df["na"] = None
# overwrite dataframe with new df
df = pd.DataFrame(data=np.where(df["day_ahead"] == 1,
[df["datetime"], df["day_fetched"],
df["col_a"], df["na"],
df["col_b"], df["na"],
df["day_ahead"]],
[df["datetime"], df["day_fetched"],
df["na"], df["col_a"],
df["na"], df["col_b"],
df["day_ahead"]]).T,
columns=["datetime", "day_fetched",
"col_a_1", "col_a_2",
"col_b_1", "col_b_2",
"day_ahead"])
df
# datetime day_fetched col_a_1 ... col_b_1 col_b_2 day_ahead
# 0 2023-01-02 12:00:00 2023-01-01 12:00:00 None ... None 200 2
# 1 2023-01-02 12:00:00 2023-01-02 12:00:00 120 ... 400 None 1
# 2 2023-01-03 12:00:00 2023-01-02 12:00:00 None ... None 500 2
# 3 2023-01-03 12:00:00 2023-01-03 12:00:00 160 ... 700 None 1
# 4 2023-01-04 12:00:00 2023-01-03 12:00:00 None ... None 300 2
# 5 2023-01-04 12:00:00 2023-01-04 12:00:00 430 ... 200 None 1
# [6 rows x 7 columns]
When asking a question please provide data that can be easily copied, such as by using df.to_dict()
.