Argument 'values' has incorrect type (expected numpy.ndarray, got Index)

I am trying to read 'year' and 'month' from a CSV file and combine them into a single column. This works without index_col, or with index_col=1 or higher, but when I try index_col=0 I get an error:

Argument 'values' has incorrect type (expected numpy.ndarray, got Index)

This is my CSV file:

No,year,month,gw,temp,evap
1,2010,1,120.92,66.695,54.62
2,2010,2,121.11,67.15,54.62
3,2010,3,121.2,67.11,54.22
4,2010,4,119.33,67.495,53.12

This is my code:

from datetime import datetime
from pandas import read_csv

def parse(x):
    return datetime.strptime(x, '%Y %m').strftime('%Y %m')
dataset = read_csv('data_test1.csv', parse_dates=[['year', 'month']], index_col=0, date_parser=parse)
dataset.drop('No', axis=1, inplace=True)

And this is my dataset.head() after I execute the above code:

datetime	No	gw	temp	evap
2010 01	1	120.92	66.695	54.62
2010 02	2	121.11	67.150	54.62
2010 03	3	121.20	67.110	54.22
2010 04	4	119.33	67.495	53.12
2010 05	5	119.26	67.280	53.30

I want it to look like this:

              gw      temp    evap
date                                                                 
2010-01-01    120.92  66.695  54.62
2010-01-01    121.11  67.150  54.62
2010-01-01    121.20  67.110  54.22
2010-01-01    119.33  67.495  53.12
2010-01-01    119.26  67.280  53.30

This is the code I ran:

import pandas as pd

def parse(x):
    return pd.to_datetime(x, format='%Y %m')

dfT = pd.read_csv('data_test1.csv')
dfT['datetime']= dfT['datetime'].apply(parse)
dfT['year'], dfT['month'] = dfT['datetime'].dt.year, dfT['datetime'].dt.month
dfT.set_index('datetime', inplace=True)


I need the dates in year-month format so that I can use them in my graphs. If anyone can help <3

CodePudding user response：

Your dataset and the output you are looking for do not match. Still, I believe the best approach would be the following:

import pandas as pd

def parse(x):
    # Convert a 'YYYY MM' string such as '2010 01' to a Timestamp
    return pd.to_datetime(x, format='%Y %m')

# Assumes test.csv already has a 'datetime' column holding 'YYYY MM' strings
dfT = pd.read_csv('test.csv')
dfT['datetime1'] = dfT['datetime'].apply(parse)
dfT['year'] = dfT['datetime1'].dt.year
dfT['month'] = dfT['datetime1'].dt.month
dfT['month_name'] = dfT['datetime1'].dt.month_name()
dfT.set_index('datetime1', inplace=True)
dfT.drop('No', axis=1, inplace=True)

I don't know whether you need to do this for several datasets and automate the process. Your question was not very clear, to be honest, but I believe this will solve your current problem. I would advise you to create a function that takes only the dataframe; if you have several dataframes, you can call that function in a list comprehension.
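The function-plus-list-comprehension idea could look roughly like the sketch below. The helper name add_datetime_index and the two in-memory CSV strings are made up for illustration; the rows are taken from the question, and the layout assumes each file already has a 'datetime' column holding 'YYYY MM' strings.

```python
import io
import pandas as pd

def add_datetime_index(df, column='datetime', fmt='%Y %m'):
    """Hypothetical helper: return a copy of df indexed by the parsed date column."""
    out = df.copy()
    out[column] = pd.to_datetime(out[column], format=fmt)
    # Drop the running number column if it is present
    return out.set_index(column).drop(columns='No', errors='ignore')

# In-memory stand-ins for two separate CSV files (assumed layout)
csv_a = "datetime,No,gw,temp,evap\n2010 01,1,120.92,66.695,54.62\n2010 02,2,121.11,67.15,54.62\n"
csv_b = "datetime,No,gw,temp,evap\n2010 03,3,121.2,67.11,54.22\n2010 04,4,119.33,67.495,53.12\n"

frames = [pd.read_csv(io.StringIO(text)) for text in (csv_a, csv_b)]

# Apply the same helper to every dataframe
indexed = [add_datetime_index(df) for df in frames]
print(indexed[0])
```

With real files, the list of frames would come from pd.read_csv on each path instead of the in-memory strings.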

CodePudding user response：

With the help of @ReinholdN I fixed the problem. This is the final code:

from datetime import datetime
from pandas import read_csv

def parse(x):
    return datetime.strptime(x, '%Y %m').strftime('%Y %m')
dataset = read_csv('data_test1.csv', parse_dates={'datetime': ['year', 'month']}, date_parser=parse)
dataset.drop('No', axis=1, inplace=True)
dataset.set_index('datetime', inplace=True)

The last line, dataset.set_index('datetime', inplace=True), does what I was trying to achieve with index_col=0.
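For anyone on a newer pandas release, where date_parser is deprecated, here is an alternative sketch that builds the date column after reading the file. It is not the code from this thread. It assumes the CSV layout shown in the question (read from an in-memory string here so the example is self-contained) and uses day=1 because the file has no day column.

```python
import io
import pandas as pd

# Same rows as the CSV in the question, inlined for a self-contained example
csv_text = """No,year,month,gw,temp,evap
1,2010,1,120.92,66.695,54.62
2,2010,2,121.11,67.15,54.62
3,2010,3,121.2,67.11,54.22
4,2010,4,119.33,67.495,53.12
"""

dataset = pd.read_csv(io.StringIO(csv_text))

# pd.to_datetime can assemble dates from columns named year/month/day;
# day=1 is an assumption, since the data only has year and month
dataset['date'] = pd.to_datetime(dataset[['year', 'month']].assign(day=1))

# Keep only the measurements and use the new column as the index
dataset = dataset.drop(columns=['No', 'year', 'month']).set_index('date')

# Year-month strings that can be used as labels on a graph
labels = dataset.index.strftime('%Y-%m')
print(dataset)
print(list(labels))
```

Because the index here holds real datetimes rather than strings, plotting libraries can space the points by date, and the strftime labels are only needed where a fixed year-month text is wanted.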