Home > Software design >  How to print row number inside apply function after a criteria
How to print row number inside apply function after a criteria

Time:05-27

I have a dataframe like as below

Company,year                                   
T123 Inc Ltd,1990
T124 PVT ltd,1991
T345 Ltd,1990
T789 Pvt.LTd,2001
ABC Limited,1992
ABCDE Ltd,1994
ABC Ltd,1997
ABFE,1987
Tesla ltd,1995
AMAZON Inc,2001
Apple ltd,2003

compare = pd.MultiIndex.from_product([tf['Company'].astype(str),tf['Company'].astype(str)]).to_series()
compare = compare[compare.index.get_level_values(0) != compare.index.get_level_values(1)]

my data looks like below

enter image description here

I would like to do the below

a) print the index number (i) at the end of comparison of each input key.for ex: T123 Inc Ltd is compared with ten other strings. So, once it is done and moves to a new/next string/input_key which is T124 PVT ltd, I want the value of i to be incremented by 1 and shown using print function.

I tried the below

def metrics(tup):
print(compare.loc[tup]) # doesn't work. I want the index number 
return list(tup)

I don't know how to increment/iterate within apply function

I expect my output of print statement to be like as below. Print statement should be executed only after each input key's comparison is over.

comparison of 1st input key with 10 strings is done
comparison of 2nd input key with 10 strings is done

update - code

ls = []
ks=[]
for i, k in enumerate(compare, 1):
    ls.append(metrics(k))
    ks.append(k)
    print(f'comparison of {i} input key: {k} with {len(ls)} strings is done')
pd.concat(pd.DataFrame(ks),pd.DataFrame(ls)) # error
pd.concat(ks,ls)  # error

metrics

def metrics(tup):
    return pd.Series([fuzz.ratio(*tup),
                      fuzz.token_sort_ratio(*tup),
                      fuzz.token_set_ratio(*tup),
                      fuzz.QRatio(*tup),
                      fuzz.UQRatio(*tup),
                      fuzz.UWRatio(*tup)],
                     ['ratio', 'token','set','qr','uqr','uwr'])

CodePudding user response:

Let us try enumerating over the unique values in level 0 index:

groups = []
grouper = compare.groupby(level=0, sort=False)

for i, (k, g) in enumerate(grouper, 1):
    # Execute statements here
    groups.append(g.apply(metrics))
    print(f'comparison of {i} input key: {k} with {len(g)} strings is done')

df_out = pd.concat(groups)

comparison of 1 input key: T123 Inc Ltd with 10 strings is done
comparison of 2 input key: T124 PVT ltd with 10 strings is done
...
comparison of 11 input key: Apple ltd with 10 strings is done
  • Related