I keep getting this error when importing top2vec.
TypeError Traceback (most recent call last)
Cell In [1], line 1
----> 1 from top2vec import Top2Vec
File ~\AppData\Roaming\Python\Python39\site-packages\top2vec\__init__.py:1
----> 1 from top2vec.Top2Vec import Top2Vec
3 __version__ = '1.0.27'
File ~\AppData\Roaming\Python\Python39\site-packages\top2vec\Top2Vec.py:12
10 from gensim.models.phrases import Phrases
11 import umap
---> 12 import hdbscan
13 from wordcloud import WordCloud
14 import matplotlib.pyplot as plt
File ~\AppData\Roaming\Python\Python39\site-packages\hdbscan\__init__.py:1
----> 1 from .hdbscan_ import HDBSCAN, hdbscan
2 from .robust_single_linkage_ import RobustSingleLinkage, robust_single_linkage
3 from .validity import validity_index
File ~\AppData\Roaming\Python\Python39\site-packages\hdbscan\hdbscan_.py:509
494 row_indices = np.where(np.isfinite(matrix).sum(axis=1) == matrix.shape[1])[0]
495 return row_indices
498 def hdbscan(
499 X,
500 min_cluster_size=5,
501 min_samples=None,
502 alpha=1.0,
503 cluster_selection_epsilon=0.0,
504 max_cluster_size=0,
505 metric="minkowski",
506 p=2,
507 leaf_size=40,
508 algorithm="best",
--> 509 memory=Memory(cachedir=None, verbose=0),
510 approx_min_span_tree=True,
511 gen_min_span_tree=False,
512 core_dist_n_jobs=4,
513 cluster_selection_method="eom",
514 allow_single_cluster=False,
515 match_reference_implementation=False,
516 **kwargs
517 ):
518 """Perform HDBSCAN clustering from a vector array or distance matrix.
519
520 Parameters
(...)
672 Density-based Cluster Selection. arxiv preprint 1911.02282.
673 """
674 if min_samples is None:
TypeError: __init__() got an unexpected keyword argument 'cachedir'
Python version: 3.9.7 (64-bit)
MSBuild is installed.
pip installed the package without errors.
Does anyone know a solution to this problem, or has anyone experienced something similar?
CodePudding user response:
It looks like you are using the latest versions of the hdbscan and joblib packages available on PyPI.
The cachedir argument was removed from joblib.Memory some 8 months ago, after having been deprecated. The latest joblib version on PyPI is 1.2.0 from Sep 16, 2022, i.e. it incorporates this change.
The hdbscan source code on GitHub was last updated about 7 days ago. Unfortunately, the latest hdbscan release on PyPI is version 0.8.28 from Feb 8, 2022, and it has not been updated since. It still uses memory=Memory(cachedir=None, verbose=0), which is what raises the TypeError.
One possible solution is to force the use of a joblib version from before cachedir was removed, that is, version 1.1.0 from Oct 7, 2021.
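
To check which case applies before reinstalling anything, a small script along these lines may help. It is only a sketch: the helper names (major_minor, joblib_breaks_old_hdbscan, check_installed) are made up for illustration, and the 1.2 cutoff is taken from the version history described above rather than from inspecting joblib itself.

```python
import re
from importlib.metadata import PackageNotFoundError, version


def major_minor(version_string):
    """Turn a version string such as '1.2.0' into (1, 2) for numeric comparison."""
    numbers = []
    for piece in version_string.split(".")[:2]:
        match = re.match(r"\d+", piece)  # keep only the leading digits
        numbers.append(int(match.group()) if match else 0)
    while len(numbers) < 2:
        numbers.append(0)
    return tuple(numbers)


def joblib_breaks_old_hdbscan(joblib_version):
    """Assumption from the answer: joblib 1.2.0 and later reject cachedir."""
    return major_minor(joblib_version) >= (1, 2)


def check_installed():
    """Report whether the installed joblib is likely to trigger the TypeError."""
    try:
        installed = version("joblib")
    except PackageNotFoundError:
        return "joblib is not installed"
    if joblib_breaks_old_hdbscan(installed):
        # Suggested workaround from the answer: pin the older release.
        return f"joblib {installed} found; try: pip install joblib==1.1.0"
    return f"joblib {installed} should still accept cachedir"


if __name__ == "__main__":
    print(check_installed())
```

If the script suggests the downgrade, running pip install joblib==1.1.0 in the same environment and restarting the notebook kernel should be enough. Since the traceback shows a per-user site-packages folder, the --user flag may be needed so the pinned version lands in the same location.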