I am using click to create a command-line tool that performs some data preprocessing. Until now, I have gotten by using click.option() as a flag, with some if statements in my code so that I can choose the options I want. However, I am struggling to find an elegant way to solve my next issue. Since I believe the general code structure does not depend on my specific purpose, I will try to be as general as possible without going into the details of the main code.
I have a list of elements, my_list, that I want to loop over, running some very long code in each iteration. However, I want this to depend on a flag (via click, as I said). The general structure would be like this:
@click.option('--myflag', is_flag=True)
# just used click.option to point out that I use click, but it is just a boolean
if myflag:
    for i in my_list:
        print('Function that depends on i')
        print('Here goes the rest of the code')
else:
    print('Function independent on i')
    print('Here goes the rest of the code')
My issue is that I don't want to copy and paste the rest of the code twice in the above structure (it is long and hard to turn into a function). Is there a way to do that? That is, is there a way to tell Python: "If myflag == True, run the full code while looping over my_list. Otherwise, just run the full code once." All of that without having to duplicate the code.
EDIT: I believe it might actually be useful to go a bit more specific.
What I have is:
my_list = ['washington', 'la', 'houston']
if myflag:
    for i in my_list:
        train, test = full_data[full_data.city != i], full_data[full_data.city == i]
        print('CODE:Clean,tokenize,train,performance')
else:
    def train_test_split2(df, frac=0.2):
        # get random sample
        test = df.sample(frac=frac, axis=0, random_state=123)
        # get everything but the test sample
        train = df.drop(index=test.index)
        return train, test
    train, test = train_test_split2(full_data[['tidy_tweet', 'classify']])
    print('CODE: Clean,tokenize,train,performance')
full_data is a pandas data frame that contains text and classification labels. Whenever myflag is True, I intend the code to train some models and test their performance with one city held out as the test data. The loop therefore gives me an overview of how my model performs on different cities (a sort of GroupKFold loop).
Under the second option, when myflag is False, there is a single random train-test split and the training is performed only once.
It is the CODE part that I don't want to duplicate.
I hope this clarifies the description above.
Answer:
What do you mean by "hard to integrate into a function"? If a function is an option, just use the following code.
def rest_of_the_code(i):
    ...

if myflag:
    for i in my_list:
        print('Function that depends on i')
        rest_of_the_code(i)
else:
    print('Function independent on i')
    rest_of_the_code(0)
Otherwise you could do something like this:
if not myflag:
    my_list = [0]  # if you even need to initialize it, not sure where my_list comes from
for i in my_list:
    print('Function that may depend on i')
    print('Here goes the rest of the code')
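As a minimal runnable sketch of this single-loop idea: the function name process and the use of None as the placeholder element are assumptions for illustration, not part of your code. Using None instead of 0 avoids overwriting my_list and cannot collide with a real element such as a city name.

```python
def process(myflag, my_list):
    """Run the shared code once per element if myflag is set, otherwise exactly once."""
    # A one-element list holding None makes the loop body run a single time.
    items = my_list if myflag else [None]
    log = []
    for i in items:
        if i is None:
            log.append("step independent of i")
        else:
            log.append(f"step for {i}")
        # The long shared code is written only here, once.
        log.append("rest of the code")
    return log
```

Calling process(False, my_list) runs the shared part once, while process(True, my_list) runs it once per element.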
EDIT in response to your clarification: you could collect the (train, test) pairs in a list and then iterate over that list.
my_list = ['washington', 'la', 'houston']
list_of_dataframes = []
if myflag:
    for i in my_list:
        train, test = full_data[full_data.city != i], full_data[full_data.city == i]
        list_of_dataframes.append((train, test))
else:
    # train_test_split2 is the helper defined in your question
    train, test = train_test_split2(full_data[['tidy_tweet', 'classify']])
    list_of_dataframes.append((train, test))
for train, test in list_of_dataframes:
    print('CODE: Clean,tokenize,train,performance')
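A variation on the same idea is a generator that yields the splits one at a time, so they do not all have to be held in a list. The sketch below is only illustrative: it uses plain lists of dicts instead of a pandas data frame so that it runs with the standard library alone, and the names iter_splits, run, and leave_one_city_out are hypothetical.

```python
import random


def iter_splits(rows, cities, leave_one_city_out, frac=0.2, seed=123):
    """Yield (train, test) pairs: one per held-out city, or a single random split."""
    if leave_one_city_out:
        for city in cities:
            train = [r for r in rows if r["city"] != city]
            test = [r for r in rows if r["city"] == city]
            yield train, test
    else:
        # Seeded random sample of row positions, standing in for df.sample(...).
        rng = random.Random(seed)
        test_idx = set(rng.sample(range(len(rows)), k=int(len(rows) * frac)))
        train = [r for i, r in enumerate(rows) if i not in test_idx]
        test = [r for i, r in enumerate(rows) if i in test_idx]
        yield train, test


def run(rows, cities, leave_one_city_out):
    """Apply the shared step to every split and collect (train size, test size)."""
    sizes = []
    for train, test in iter_splits(rows, cities, leave_one_city_out):
        # CODE: clean, tokenize, train, performance would go here, written once.
        sizes.append((len(train), len(test)))
    return sizes
```

With a data frame, the body of iter_splits would presumably use the same boolean masks and train_test_split2 call as above; the loop in run would stay unchanged.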