Given a class that exports .csv files to a database:
import luigi
import csv
class CsvToDatabase(luigi.Task):
    # (...)
    def run(self):
        ## (...)
        with open(self.input().some_attribute, 'r', encoding='utf-8') as some_dataframe:
            y = csv.reader(some_dataframe, delimiter=';')
            ### (...) <several lines of code>
    # (...)
I'm having problems trying to export a file with ISO-8859-1 encoding. When I remove the encoding argument from the open() function, everything works fine, but I cannot make permanent changes to the class definition (other departments in the firm use it). So I thought about using polymorphism to solve it, like:
from script_of_interest import CsvToDatabase
class LatinCsvToDatabase(CsvToDatabase):
    # code that uses everything in `run()` except the `some_dataframe` definition in the with statement
Is this actually possible? How could I handle it without repeating the "several lines of code" inside the with statement?
CodePudding user response:
Thank you @martineau and @buran for the comments. Based on them, I will request a change to the base class definition that doesn't affect the other departments' work. It would look like this:
import luigi
import csv
class CsvToDatabase(luigi.Task):
    # (...)
    encoding_param = luigi.Parameter(default='utf-8') # as a class attribute
    # (...)
    def run(self):
        ## (...)
        with open(self.input().some_attribute, 'r', encoding=self.encoding_param) as some_dataframe:
            y = csv.reader(some_dataframe, delimiter=';')
            ### (...) <several lines of code>
    # (...)
And finally, in my script, something like:
from script_of_interest import CsvToDatabase
class LatinCsvToDatabase(CsvToDatabase):
    pass
LatinCsvToDatabase.encoding_param = None
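For anyone who wants to try the idea without luigi installed, here is a minimal sketch of the same pattern using plain Python classes. The names CsvReaderBase, LatinCsvReader and write_latin1_sample are hypothetical stand-ins, not part of the real script, and the encoding is a plain class attribute rather than a luigi.Parameter, so this only illustrates the attribute override, not luigi's parameter handling.

```python
import csv
import os
import tempfile


class CsvReaderBase:
    """Hypothetical stand-in for CsvToDatabase, without the luigi dependency."""

    encoding = "utf-8"  # class-level default that existing users keep getting

    def __init__(self, path):
        self.path = path

    def run(self):
        # The shared body is written once; only the encoding varies per class.
        with open(self.path, "r", encoding=self.encoding, newline="") as handle:
            return list(csv.reader(handle, delimiter=";"))


class LatinCsvReader(CsvReaderBase):
    encoding = "ISO-8859-1"  # the only thing the subclass changes


def write_latin1_sample():
    """Create a small ISO-8859-1 encoded file and return its path."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "wb") as handle:
        handle.write("nome;cidade\nJoão;São Paulo\n".encode("ISO-8859-1"))
    return path


if __name__ == "__main__":
    sample = write_latin1_sample()
    try:
        print(LatinCsvReader(sample).run())
        try:
            CsvReaderBase(sample).run()
        except UnicodeDecodeError as error:
            print("base class with utf-8 fails:", error.reason)
    finally:
        os.remove(sample)
```

The base class keeps its utf-8 default, so nothing changes for the other users; only the subclass sees the Latin-1 setting.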
CodePudding user response:
As an alternative, you might consider modifying the original class to add a new method, get_cvs_encoding, which is used by the run method:
class CsvToDatabase(luigi.Task):
    ...
    def get_cvs_encoding(self):
        # default:
        return 'utf-8'
    def run(self):
        ## (...)
        with open(self.input().some_attribute, 'r', encoding=self.get_cvs_encoding()) as some_dataframe:
            y = csv.reader(some_dataframe, delimiter=';')
            ...
And then subclass this as follows:
class MyCsvToDatabase(CsvToDatabase):
    def get_cvs_encoding(self):
        return 'ISO-8859-1' # or None
Then use an instance of the subclass. I think this is neater, and it lets you have instances of different subclasses "running" concurrently, each with its own encoding.
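Here is a rough, luigi-free sketch of that hook-method approach, in case it helps to see it run end to end. CsvReaderBase, LatinCsvReader and write_sample are hypothetical names made up for the illustration; only get_cvs_encoding mirrors the method name used above. It assumes the real run() body can stay untouched apart from the open() call.

```python
import csv
import os
import tempfile


class CsvReaderBase:
    """Hypothetical stand-in for CsvToDatabase, without the luigi dependency."""

    def __init__(self, path):
        self.path = path

    def get_cvs_encoding(self):
        # Default hook: existing users keep reading utf-8.
        return "utf-8"

    def run(self):
        # run() asks the hook for the encoding instead of hard-coding it.
        encoding = self.get_cvs_encoding()
        with open(self.path, "r", encoding=encoding, newline="") as handle:
            return list(csv.reader(handle, delimiter=";"))


class LatinCsvReader(CsvReaderBase):
    def get_cvs_encoding(self):
        # Override only the hook; run() is inherited unchanged.
        return "ISO-8859-1"


def write_sample(text, encoding):
    """Write text to a temporary .csv file in the given encoding."""
    fd, path = tempfile.mkstemp(suffix=".csv")
    with os.fdopen(fd, "w", encoding=encoding, newline="") as handle:
        handle.write(text)
    return path


if __name__ == "__main__":
    text = "nome;cidade\nJoão;São Paulo\n"
    utf8_path = write_sample(text, "utf-8")
    latin_path = write_sample(text, "ISO-8859-1")
    try:
        # Two instances with different encodings coexist without interfering.
        print(CsvReaderBase(utf8_path).run())
        print(LatinCsvReader(latin_path).run())
    finally:
        os.remove(utf8_path)
        os.remove(latin_path)
```

Because the encoding comes from a method rather than from shared mutable state, each instance decides for itself, which is what makes it safe to use both classes side by side.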