How to separate the time from a date-time stamp in a CSV file using pandas?

I have a DataFrame with two columns (Time_stamp and Cost) that looks like this:

Time_stamp                             Cost 
01/26/2006 02:00:00 PM                 300
01/26/2002 01:50:00 PM                 250
01/26/2001 01:50:00 PM                 677
01/26/2006 12:00:00 PM                 200
01/26/2006 04:00:00 PM                 177
01/26/2007 02:51:00 PM                 267
01/26/2006 09:45:00 PM                 112
01/26/2008 01:15:00 PM                 221
01/26/2009 08:30:00 PM                 312
01/26/2003 01:35:00 PM                 987
01/26/2006 09:40:00 PM                 654

First, I want to strip the time from the Time_stamp column, and then keep only the rows from the year 2006. The updated CSV file should look like this:

Time_stamp                  Cost 
01/26/2006                  300
01/26/2006                  200
01/26/2006                  177
01/26/2006                  112
01/26/2006                  654

Can anyone help me do that?

Answer:

import pandas as pd

# read the data
df = pd.read_csv('data.csv')

# split the first column on spaces and keep the date (the first part)
df['Time_stamp'] = df['Time_stamp'].str.split(' ', expand=True).iloc[:,0]

# keep only the dates that end with "/2006"
df = df[df['Time_stamp'].str.endswith('/2006')].copy()

# print dataframe
print(df)
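The string-splitting approach above can be tried without a file on disk. The sketch below assumes the real file is comma-separated with the same two columns (the inline sample and the function name `keep_2006_dates` are hypothetical) and also shows getting the "updated CSV" back out with `to_csv`, which the question asks for but the answer does not cover.

```python
import io

import pandas as pd

# Hypothetical sample input: assumes the real file is comma-separated
# with the same two columns shown in the question.
CSV_TEXT = """Time_stamp,Cost
01/26/2006 02:00:00 PM,300
01/26/2002 01:50:00 PM,250
01/26/2006 12:00:00 PM,200
01/26/2007 02:51:00 PM,267
01/26/2006 09:40:00 PM,654
"""


def keep_2006_dates(csv_source) -> pd.DataFrame:
    """Drop the time part with a string split and keep only rows from 2006."""
    df = pd.read_csv(csv_source)
    # Everything before the first space is the MM/DD/YYYY date.
    df["Time_stamp"] = df["Time_stamp"].str.split(" ").str[0]
    # Pure string match on the year suffix; no datetime parsing involved.
    return df[df["Time_stamp"].str.endswith("/2006")].copy()


result = keep_2006_dates(io.StringIO(CSV_TEXT))
print(result)

# With no path argument, to_csv returns the CSV text instead of writing a file.
csv_out = result.to_csv(index=False)
print(csv_out)
```

To write the file instead, pass a path, e.g. `result.to_csv('data_2006.csv', index=False)` (the file name is only an example). Because this version never parses the timestamps, a malformed value is simply kept or dropped based on its text, without any error.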

Answer:

Convert the column to pandas datetime values, keep only the rows for the desired year, then format the values as dates without the time:

>>> import pandas as pd
>>> df['Time_stamp'] = pd.to_datetime(df['Time_stamp'], format='%m/%d/%Y %I:%M:%S %p')
>>> df = df[df['Time_stamp'].dt.year.eq(2006)].copy()
>>> df['Time_stamp'] = df['Time_stamp'].dt.strftime('%m/%d/%Y')

OUTPUT

    Time_stamp  Cost
0   01/26/2006   300
3   01/26/2006   200
4   01/26/2006   177
6   01/26/2006   112
10  01/26/2006   654
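The datetime-based steps above can be wrapped into a small self-contained function. This is a sketch that assumes a comma-separated file shaped like the question's data; the inline sample and the name `keep_year` are hypothetical. The explicit `format` string is an assumption matching the sample (`%I` and `%p` handle the 12-hour clock with AM/PM), and `.copy()` makes the filtered frame independent so the final assignment does not trigger pandas' chained-assignment warning.

```python
import io

import pandas as pd

# Hypothetical sample input: assumes a comma-separated file shaped like
# the question's data.
CSV_TEXT = """Time_stamp,Cost
01/26/2006 02:00:00 PM,300
01/26/2002 01:50:00 PM,250
01/26/2006 09:45:00 PM,112
01/26/2008 01:15:00 PM,221
01/26/2006 09:40:00 PM,654
"""


def keep_year(csv_source, year: int) -> pd.DataFrame:
    """Parse the timestamps, keep rows from `year`, and drop the time part."""
    df = pd.read_csv(csv_source)
    # Explicit format (assumed to match the data) avoids month/day guessing.
    df["Time_stamp"] = pd.to_datetime(df["Time_stamp"], format="%m/%d/%Y %I:%M:%S %p")
    # Filter on the parsed year; copy so the next assignment is not on a slice.
    out = df[df["Time_stamp"].dt.year.eq(year)].copy()
    # Render the dates back as MM/DD/YYYY strings, discarding the time.
    out["Time_stamp"] = out["Time_stamp"].dt.strftime("%m/%d/%Y")
    return out


result = keep_year(io.StringIO(CSV_TEXT), 2006)
print(result)
```

Compared with the string-splitting answer, parsing first means a malformed timestamp raises a `ValueError` by default rather than passing through unnoticed (use `errors='coerce'` to turn such values into `NaT` instead), and selecting another year only requires changing the `year` argument.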