Recently I updated featuretools to v1.0.0 and ran into the following issue. I have instances that vary over time and I want to build time-dependent features for them. In addition, I want to keep some historical characteristics of those instances, so my cutoff time dataset consists of columns such as: time, instance_id and feature1, feature2, ..., target.
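For illustration, a minimal version of such a cutoff time DataFrame might look like this (column names and values are placeholders):

import pandas as pd

# Placeholder cutoff time data: instance id, cutoff time, plus extra
# historical columns that should simply pass through to the feature matrix.
cutoff_time = pd.DataFrame({
    'instance_id': [1, 1, 2],
    'time': pd.to_datetime(['2021-01-01', '2021-02-01', '2021-01-15']),
    'feature1': [0.5, 0.7, 0.1],
    'feature2': [10, 12, 3],
    'target': [0, 1, 0],
})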
When I tried to run dfs, I got the error 'NoneType' object has no attribute 'logical_types'.
I found out that it is caused by the internal function get_ww_types_from_features.
It tries to get the column types of the cutoff time DataFrame, assuming the DataFrame already has a Woodwork schema:
cutoff_schema = cutoff_time.ww.schema
for column in pass_columns:
    logical_types[column] = cutoff_schema.logical_types[column]
    semantic_tags[column] = cutoff_schema.semantic_tags[column]
    origins[column] = "base"
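My understanding is that a plain pandas DataFrame has no Woodwork schema until it is explicitly initialized, so cutoff_time.ww.schema is None and the attribute lookup above fails. A small check along these lines seems to confirm it:

import pandas as pd
import woodwork  # registers the .ww accessor on pandas DataFrames

df = pd.DataFrame({'instance_id': [1], 'time': [pd.Timestamp('2021-01-01')]})

# Woodwork was never initialized on this DataFrame, so there is no schema yet,
# which matches the "'NoneType' object has no attribute 'logical_types'" error.
print(df.ww.schema)  # None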
But the cutoff time is originally a pandas DataFrame, and I haven't found the place in the code where it is converted to Woodwork. The documentation also says that it is fine to pass the cutoff time as a pandas DataFrame.
As a result, my question is: what is the proper way to pass the cutoff time DataFrame? If pandas is allowed, is this a bug in the code? And if it is not a bug, should I initialize Woodwork on the cutoff time DataFrame manually before calling dfs?
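If manual conversion is the expected approach, I assume it would look something like the sketch below (the logical types are only my guess for these columns):

import pandas as pd
import woodwork  # registers the .ww accessor on pandas DataFrames

cutoff_time = pd.DataFrame({
    'instance_id': [1, 2],
    'time': pd.to_datetime(['2021-01-01', '2021-01-15']),
    'feature1': [0.5, 0.1],
    'target': [0, 1],
})

# Manually initialize Woodwork so cutoff_time.ww.schema is no longer None.
cutoff_time.ww.init(
    time_index='time',
    logical_types={
        'instance_id': 'Integer',
        'feature1': 'Double',
        'target': 'Integer',
    },
)

print(cutoff_time.ww.schema)  # now a TableSchema instead of None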
CodePudding user response:
A cutoff time DataFrame with extra columns (such as a label) should be accepted, and the extra columns should pass through to the feature matrix, for example:

import pandas as pd
import featuretools as ft

# es is assumed to be an existing EntitySet with a 'customers' dataframe,
# e.g. es = ft.demo.load_mock_customer(return_entityset=True)

cutoff_times = pd.DataFrame()
cutoff_times['customer_id'] = [1, 2, 3, 1]
cutoff_times['time'] = pd.to_datetime(['2014-1-1 04:00',
                                       '2014-1-1 05:00',
                                       '2014-1-1 06:00',
                                       '2014-1-1 08:00'])
cutoff_times['label'] = [True, True, False, True]
cutoff_times

fm, features = ft.dfs(entityset=es,
                      target_dataframe_name='customers',
                      cutoff_time=cutoff_times,
                      cutoff_time_in_index=True)
fm
CodePudding user response:
There is a bug in Featuretools 1.0.0 when using a DataFrame for cutoff_time that has additional columns (e.g. labels) together with multiple workers via the n_jobs or dask_kwargs options. This may be the issue you are encountering.

Tracking issue: #1763

A pandas DataFrame is meant to be accepted for cutoff_time in 1.0.0.
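Until that issue is fixed, a possible workaround (not an official fix) is to stay on the single-process code path, i.e. avoid n_jobs > 1 and dask_kwargs. A sketch using the mock customer EntitySet shipped with featuretools:

import pandas as pd
import featuretools as ft

# Demo EntitySet with a 'customers' dataframe, matching the example above.
es = ft.demo.load_mock_customer(return_entityset=True)

cutoff_times = pd.DataFrame({
    'customer_id': [1, 2, 3, 1],
    'time': pd.to_datetime(['2014-1-1 04:00', '2014-1-1 05:00',
                            '2014-1-1 06:00', '2014-1-1 08:00']),
    'label': [True, True, False, True],
})

# Workaround sketch: run dfs single-process so the buggy multi-worker
# path is not used; the extra 'label' column still passes through.
fm, features = ft.dfs(
    entityset=es,
    target_dataframe_name='customers',
    cutoff_time=cutoff_times,
    cutoff_time_in_index=True,
    n_jobs=1,  # default single-process execution
)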