So i have some Trading data, and ultimately i would like to find 1 week return 1 month, 3 month and so on. To do this i think it is easiest if i can get a column with days existed for each asset.
Data is simplified to this:
Id | Date | Price |
---|---|---|
101 | 2014-09-14 | 0.7 |
101 | 2014-09-15 | 0.8 |
101 | 2014-09-16 | 0.9 |
101 | 2014-09-17 | 0.8 |
103 | 2014-10-03 | 1,6 |
103 | 2014-10-04 | 1.7 |
103 | 2014-10-05 | 1.7 |
103 | 2014-10-07 | 1.9 |
107 | 2015-10-17 | 24 |
107 | 2015-10-18 | 21 |
107 | 2015-10-23 | 22 |
I would then like an new column named "Days listed"
Id | Date | Price | Days listed |
---|---|---|---|
101 | 2014-09-14 | 0.7 | 1 |
101 | 2014-09-15 | 0.8 | 2 |
101 | 2014-09-16 | 0.9 | 3 |
101 | 2014-09-17 | 0.8 | 4 |
103 | 2014-10-03 | 1,6 | 1 |
103 | 2014-10-04 | 1.7 | 2 |
103 | 2014-10-05 | 1.7 | 3 |
103 | 2014-10-07 | 1.9 | 5 |
107 | 2015-10-17 | 24 | 1 |
107 | 2015-10-18 | 21 | 2 |
107 | 2015-10-23 | 22 | 7 |
i think it needs to be grouped by id, but other than that i am out of ideas. Hope someone can help me Best regards the confused student
CodePudding user response:
use pd.groupby to get a earliest date for each group, which is then subtracted by the dates to get the days-listed
df['Date']=pd.to_datetime(df['Date'])
df['DaysListed']=(df['Date'] - df.groupby('Id')['Date'].transform('min')).dt.days 1
df
Id Date Price DaysListed
0 101 2014-09-14 0.7 1
1 101 2014-09-15 0.8 2
2 101 2014-09-16 0.9 3
3 101 2014-09-17 0.8 4
4 103 2014-10-03 1.6 1
5 103 2014-10-04 1.7 2
6 103 2014-10-05 1.7 3
7 103 2014-10-07 1.9 5
8 107 2015-10-17 24 1
9 107 2015-10-18 21 2
10 107 2015-10-23 22 7
CodePudding user response:
import pandas as pd
from io import StringIO
data = StringIO("""Id;Date;Price
101;2014-09-14;0.7
101;2014-09-15;0.8
101;2014-09-16;0.9
101;2014-09-17;0.8
103;2014-10-03;1.6
103;2014-10-04;1.7
103;2014-10-05;1.7
103;2014-10-07;1.9
107;2015-10-17;24
107;2015-10-18;21
107;2015-10-23;22
""")
df = pd.read_csv(data, sep=';')
df['Date'] = pd.to_datetime(df['Date'])
df['Price'] = df['Price'].astype(float)
df = df.assign(Days_list = df.groupby('Id').cumcount() 1)
Output:
Id Date Price Days_list
0 101 2014-09-14 0.7 1
1 101 2014-09-15 0.8 2
2 101 2014-09-16 0.9 3
3 101 2014-09-17 0.8 4
4 103 2014-10-03 1.6 1
5 103 2014-10-04 1.7 2
6 103 2014-10-05 1.7 3
7 103 2014-10-07 1.9 4
8 107 2015-10-17 24.0 1
9 107 2015-10-18 21.0 2
10 107 2015-10-23 22.0 3