I have a dataframe with two columns (Time_stamp and Cost) that looks something like this:
Time_stamp Cost
01/26/2006 02:00:00 PM 300
01/26/2002 01:50:00 PM 250
01/26/2001 01:50:00 PM 677
01/26/2006 12:00:00 PM 200
01/26/2006 04:00:00 PM 177
01/26/2007 02:51:00 PM 267
01/26/2006 09:45:00 PM 112
01/26/2008 01:15:00 PM 221
01/26/2009 08:30:00 PM 312
01/26/2003 01:35:00 PM 987
01/26/2006 09:40:00 PM 654
First I want to strip the time from the Time_stamp column, and then keep only the rows from the year 2006. My updated CSV file should look something like this:
Time_stamp Cost
01/26/2006 300
01/26/2006 200
01/26/2006 177
01/26/2006 112
01/26/2006 654
Can anyone help me to do that?
CodePudding user response:
import pandas as pd
# read the data
df = pd.read_csv('data.csv')
# split the first column by space and pick the date (1st)
df['Time_stamp'] = df['Time_stamp'].str.split(' ', expand=True).iloc[:,0]
# keep only the dates that end with "/2006"
df = df[df['Time_stamp'].str.endswith('/2006')].copy()
# print dataframe
print(df)
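Since the question asks for an updated CSV, here is a minimal self-contained sketch of the same string-based approach that also produces the CSV text. It assumes the timestamps always have the form MM/DD/YYYY followed by a space and the time; the inline sample rows and the names df_2006 and csv_text are only for illustration.

```python
import pandas as pd

# A few sample rows copied from the question; in practice these would
# come from pd.read_csv(...)
df = pd.DataFrame({
    "Time_stamp": [
        "01/26/2006 02:00:00 PM",
        "01/26/2002 01:50:00 PM",
        "01/26/2006 12:00:00 PM",
        "01/26/2007 02:51:00 PM",
    ],
    "Cost": [300, 250, 200, 267],
})

# Keep only the date part (the text before the first space)
df["Time_stamp"] = df["Time_stamp"].str.split(" ").str[0]

# Keep only the rows whose date ends with "/2006"
df_2006 = df[df["Time_stamp"].str.endswith("/2006")].copy()

# With no path given, to_csv returns the CSV as a string;
# index=False leaves the row labels out
csv_text = df_2006.to_csv(index=False)
print(csv_text)
```

To write a file instead, pass a path as the first argument, for example df_2006.to_csv('data_2006.csv', index=False), where the file name is a placeholder.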
CodePudding user response:
Convert the column to pandas datetime, keep only the rows for the desired year, then format the dates without the time:
>>> df['Time_stamp'] = pd.to_datetime(df['Time_stamp'])
>>> df = df[df['Time_stamp'].dt.year.eq(2006)].copy()  # .copy() avoids SettingWithCopyWarning on the next line
>>> df['Time_stamp'] = df['Time_stamp'].dt.strftime('%m/%d/%Y')
OUTPUT
Time_stamp Cost
0 01/26/2006 300
3 01/26/2006 200
4 01/26/2006 177
6 01/26/2006 112
10 01/26/2006 654
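If you need this for more than one year, the same steps could be wrapped in a small helper. This is only a sketch: rows_for_year is a hypothetical name, and the explicit format string assumes every timestamp looks like the samples in the question (month/day/year with a 12-hour clock and AM/PM).

```python
import pandas as pd


def rows_for_year(df, year, column="Time_stamp"):
    """Return the rows whose timestamp falls in `year`, with the time dropped.

    Hypothetical helper; assumes timestamps like '01/26/2006 02:00:00 PM'.
    """
    # Parse with an explicit format so pandas does not have to guess it
    parsed = pd.to_datetime(df[column], format="%m/%d/%Y %I:%M:%S %p")
    mask = parsed.dt.year == year
    # Copy so the caller's dataframe is left untouched
    out = df.loc[mask].copy()
    # Write the date back as text, without the time part
    out[column] = parsed.loc[mask].dt.strftime("%m/%d/%Y")
    return out


# A few sample rows copied from the question
sample = pd.DataFrame({
    "Time_stamp": [
        "01/26/2006 02:00:00 PM",
        "01/26/2002 01:50:00 PM",
        "01/26/2006 12:00:00 PM",
        "01/26/2007 02:51:00 PM",
    ],
    "Cost": [300, 250, 200, 267],
})

result = rows_for_year(sample, 2006)
print(result)
```

The original row labels are kept, as in the output above; call result.reset_index(drop=True) if you want them renumbered from 0.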