I am trying to read 'year' and 'month' from a CSV file and combine them into a single column.
I can do this without index_col, or with index_col=1 or higher,
but when I try index_col=0 I get an error:
Argument 'values' has incorrect type (expected numpy.ndarray, got Index)
This is my CSV file:
No,year,month,gw,temp,evap
1,2010,1,120.92,66.695,54.62
2,2010,2,121.11,67.15,54.62
3,2010,3,121.2,67.11,54.22
4,2010,4,119.33,67.495,53.12
This is my code:
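from datetime import datetime
from pandas import read_csv
def parse(x):
    # Parse a 'YYYY MM' string, then format it back into a 'YYYY MM' string
    return datetime.strptime(x, '%Y %m').strftime('%Y %m')
dataset = read_csv('data_test1.csv', parse_dates=[['year', 'month']], index_col=0, date_parser=parse)
dataset.drop('No', axis=1, inplace=True)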
And this is my dataset.head() after I execute the above code:
datetime | No | gw | temp | evap |
---|---|---|---|---|
2010 01 | 1 | 120.92 | 66.695 | 54.62 |
2010 02 | 2 | 121.11 | 67.150 | 54.62 |
2010 03 | 3 | 121.20 | 67.110 | 54.22 |
2010 04 | 4 | 119.33 | 67.495 | 53.12 |
2010 05 | 5 | 119.26 | 67.280 | 53.30 |
I want it to look like this:
gw temp evap
date
2010-01-01 120.92 66.695 54.62
2010-01-01 121.11 67.150 54.62
2010-01-01 121.20 67.110 54.22
2010-01-01 119.33 67.495 53.12
2010-01-01 119.26 67.280 53.30
This is the code I ran:
import pandas as pd
def parse(x):
    return pd.to_datetime(x, format='%Y %m')
dfT = pd.read_csv('data_test1.csv')
dfT['datetime']= dfT['datetime'].apply(parse)
dfT['year'], dfT['month'] = dfT['datetime'].dt.year, dfT['datetime'].dt.month
dfT.set_index('datetime', inplace=True)
I need the dates in year-month format so I can use them in my graphs. Any help would be appreciated.
CodePudding user response:
Your dataset and the output you are looking for do not match. Still, I believe the best approach would be the following:
import pandas as pd
def parse(x):
    return pd.to_datetime(x, format='%Y %m')
dfT = pd.read_csv('test.csv')
dfT['datetime1']= dfT['datetime'].apply(parse)
dfT['year'], dfT['month'], dfT['month_name'] = dfT['datetime1'].dt.year, dfT['datetime1'].dt.month, dfT['datetime1'].dt.month_name()
dfT.set_index('datetime1', inplace=True)
dfT.drop('No', axis=1, inplace=True)
I don't know whether you need to do this for several datasets and automate the process. To be honest, your question was not very clear, but I believe this will solve your current problem. I would advise you to write a function that takes only the dataframe; if you have several dataframes, you can call that function in a list comprehension.
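The function-plus-list-comprehension idea could be sketched roughly as follows. This is only an illustration under some assumptions: the helper name add_datetime_index is hypothetical, the CSV text is the sample from the question loaded from a string instead of a file, and the day is assumed to be the first of the month because pd.to_datetime needs year, month and day columns to assemble a date.

```python
import io
import pandas as pd

# Sample rows from the question, kept in a string so the sketch is self-contained
csv_text = """No,year,month,gw,temp,evap
1,2010,1,120.92,66.695,54.62
2,2010,2,121.11,67.15,54.62
3,2010,3,121.2,67.11,54.22
4,2010,4,119.33,67.495,53.12
"""

def add_datetime_index(df):
    """Hypothetical helper: index a copy of df by a date built from 'year' and 'month'."""
    out = df.copy()
    # Assumption: use day=1, since to_datetime requires year, month and day columns
    dates = pd.to_datetime(out[['year', 'month']].assign(day=1))
    out.index = pd.DatetimeIndex(dates, name='date')
    # Drop the row counter and the columns that are now encoded in the index
    return out.drop(columns=['No', 'year', 'month'])

# Stand-in for several datasets: the same sample loaded twice
frames = [pd.read_csv(io.StringIO(csv_text)) for _ in range(2)]

# Apply the helper to every dataframe with a list comprehension
indexed = [add_datetime_index(frame) for frame in frames]
print(indexed[0])
```

With real files, the frames list would instead be built from pd.read_csv calls on each file path.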
CodePudding user response:
With the help of @ReinholdN I fixed the problem. This is the final code:
def parse(x):
    return datetime.strptime(x, '%Y %m').strftime('%Y %m')
dataset = read_csv('data_test1.csv', parse_dates={'datetime': ['year', 'month']}, date_parser=parse)
dataset.drop('No', axis=1, inplace=True)
dataset.set_index('datetime', inplace=True)
The last line, dataset.set_index('datetime', inplace=True), does what I wanted index_col=0 to do.
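One thing to note about the code above: parse returns the result of strftime, so the index holds strings such as '2010 01', not real dates. If the year-month format is only needed for graph labels, a possible alternative is to keep a real datetime index and format it when labelling. The snippet below is a minimal sketch of that idea, assuming the sample CSV from the question (read from a string here) and assuming each month is represented by its first day.

```python
import io
import pandas as pd

# Sample rows from the question, kept in a string so the sketch is self-contained
csv_text = """No,year,month,gw,temp,evap
1,2010,1,120.92,66.695,54.62
2,2010,2,121.11,67.15,54.62
3,2010,3,121.2,67.11,54.22
4,2010,4,119.33,67.495,53.12
"""

dataset = pd.read_csv(io.StringIO(csv_text))

# Build real timestamps; day=1 is an assumption needed to assemble a full date
dates = pd.to_datetime(dataset[['year', 'month']].assign(day=1))
dataset.index = pd.DatetimeIndex(dates, name='date')
dataset = dataset.drop(columns=['No', 'year', 'month'])

# Year-month strings, produced only for display (for example as axis tick labels)
labels = dataset.index.strftime('%Y-%m')
print(list(labels))
```

Keeping the index as datetimes means sorting and resampling by date still work, while the labels give the year-month text for the graph.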