I am following this tutorial to make a Gantt chart: https://towardsdatascience.com/gantt-charts-with-pythons-matplotlib-395b7af72d72
I have tried to recreate part of the test dataset with the following script:
import pandas as pd
data = [['TSK M', 'IT', '2022-03-17', '2022-03-20', '0.0'], ['TSK N', 'MKT', '2022-03-17', '2022-03-19', '0.0']]
df = pd.DataFrame(data, columns = ['Task', 'Department', 'Start', 'End', 'Completion'])
Then, when I process the dataframe through the first part of the tutorial, I end up with an error message:
proj_start = df['Start'].min()
df['start_num'] = (df.Start-proj_start).dt.days
TypeError: unsupported operand type(s) for -: 'str' and 'str'
I have tried to convert the data to integers with the function int(), but the error persists. Would anyone know what is wrong here?
CodePudding user response:
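You need to convert the date columns to datetime type first. The values in Start and End are strings, and pandas cannot subtract one string from another; int() does not help because a string such as '2022-03-17' is not a number. Convert them with pd.to_datetime: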
df['Start'] = pd.to_datetime(df['Start'])
df['End'] = pd.to_datetime(df['End'])
# Or, converting both columns in one step
df[['Start', 'End']] = df[['Start', 'End']].apply(pd.to_datetime)
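Putting the conversion together with the lines from the question, a minimal end-to-end sketch could look like the following. It reuses the question's two-row dataset; the `duration_days` column is a hypothetical addition for illustration and is not taken from the question.

```python
import pandas as pd

# Same two-row dataset as in the question; dates start out as plain strings.
data = [['TSK M', 'IT', '2022-03-17', '2022-03-20', '0.0'],
        ['TSK N', 'MKT', '2022-03-17', '2022-03-19', '0.0']]
df = pd.DataFrame(data, columns=['Task', 'Department', 'Start', 'End', 'Completion'])

# Parse both date columns so they hold datetime64 values instead of strings.
df[['Start', 'End']] = df[['Start', 'End']].apply(pd.to_datetime)

# Subtracting datetimes yields timedeltas, which expose the .dt.days accessor.
proj_start = df['Start'].min()
df['start_num'] = (df['Start'] - proj_start).dt.days

# Hypothetical extra column (not in the question): task length in days.
df['duration_days'] = (df['End'] - df['Start']).dt.days

print(df.dtypes)
print(df[['Task', 'start_num', 'duration_days']])
```

With this data, both tasks start on the project start date, so start_num is 0 for each row, and the durations come out as 3 and 2 days. Note that df['Start'].min() did not fail in the original code because strings can be compared; only the subtraction is undefined for them.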