I use the modin library for multiprocessing.
While the library is great for faster processing, it fails on merge,
so I would like to switch back to default pandas partway through the code.
I understand that, per the PEP 8 E402 convention, imports should be declared once at the top of the code; however, my case seems to need otherwise.
import pandas as pd
import modin.pandas as mpd
import os
import ray
ray.init()
os.environ["MODIN_ENGINE"] = "ray"
df = mpd.read_csv()
# do stuff
Then I would like to revert to default pandas within the same code.
But how would I run the lines below in pandas? There does not seem to be a clear way to switch between pd and mpd for these lines, and unfortunately modin seems to take precedence over pandas.
df = df.loc[:, df.columns.intersection(['col1', 'col2'])]
df = df.drop_duplicates()
df = df.sort_values(['col1', 'col2'], ascending=[True, True])
Is it possible? If yes, how?
CodePudding user response:
You can simply do the following:
import modin.pandas as mpd
import pandas as pd
This way both modin and the original pandas are imported under separate names, and you can switch between them as needed.
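Importing both names does not by itself change the type of an existing DataFrame, so the three lines from the question would still run on the modin object. A minimal sketch of one way to hand the data over to plain pandas first is shown below. The helper names to_plain_pandas and select_dedupe_sort are made up for illustration, and the sketch assumes the modin DataFrame exposes a _to_pandas() method; that is a private method, so check whether your modin version offers a public equivalent.

```python
import pandas as pd


def to_plain_pandas(df):
    """Return a plain pandas DataFrame, converting from modin if needed."""
    if isinstance(df, pd.DataFrame):
        return df  # already plain pandas, nothing to convert
    # Assumption: modin DataFrames provide a private _to_pandas() method.
    return df._to_pandas()


def select_dedupe_sort(df, cols=("col1", "col2")):
    """Run the three steps from the question on a plain pandas DataFrame."""
    pdf = to_plain_pandas(df)
    # Keep only the requested columns that actually exist in the frame.
    pdf = pdf.loc[:, pdf.columns.intersection(list(cols))]
    pdf = pdf.drop_duplicates()
    # Sort ascending by the columns that were kept.
    return pdf.sort_values(list(pdf.columns), ascending=True)
```

With this, the modin part of the script stays as it is, and the pandas part becomes df = select_dedupe_sort(df). Note that the conversion pulls the whole dataset into a single process, so it only helps if the data fits in memory.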
CodePudding user response:
You can try the pandarallel package instead of modin. It is based on a similar concept: https://pypi.org/project/pandarallel/#description
Pandarallel benchmarks: https://libraries.io/pypi/pandarallel