Adding an extra column to a file in Python depending on the value of another column

I have a file, say temp.rule, with m rows and n columns, where each row has the form att1,att2,att3,...,attN,class,fitness. Suppose the file looks like this:

A,B,C,1,0.67
D,E,F,1,0.84
P,Q,R,2,0.77
S,T,U,2,0.51
G,H,I,1,0.45
J,K,L,1,0.82
M,N,O,2,0.28
V,W,X,2,0.41
Y,Z,A,2,0.51

In the first row, A, B and C are the attributes, 1 is the class, and 0.67 is the fitness. I want to sort the rows by fitness within each class and assign a rank to each row. Afterwards the file should look like this:

P,Q,R,2,0.77,5
S,T,U,2,0.51,3.5
Y,Z,A,2,0.51,3.5
V,W,X,2,0.41,2
M,N,O,2,0.28,1
D,E,F,1,0.84,4
J,K,L,1,0.82,3
A,B,C,1,0.67,2
G,H,I,1,0.45,1

Class 2 has 5 rows, so they are sorted by fitness and ranked from 1 to 5. Likewise, class 1 has 4 rows, so they are sorted by fitness and ranked from 1 to 4. The 3.5 appears because, in case of a tie, I want the average of the consecutive ranks: the two class 2 rows with fitness 0.51 would occupy ranks 3 and 4, so each gets (3 + 4) / 2 = 3.5. I have done the sorting part but am unable to assign the rank like this. I have also created a dictionary that counts how many rows belong to class 1, class 2, and so on.

Here is my attempt:

rule_file_name = 'temp.rule'

# Read every row and split it into its comma-separated fields.
rule_fit_val = []
with open(rule_file_name) as rule_fp:
    for line in rule_fp:
        rule_fit_val.append(line.rstrip("\n").split(","))

def convert_fitness_to_float(lst):
    # Keep every field as a string except the last one (fitness).
    return lst[:-1] + [float(lst[-1])]

rule_fit_val = [convert_fitness_to_float(i) for i in rule_fit_val]
# Sort by (class, fitness), both in descending order.
rule_fit_val = sorted(rule_fit_val, key=lambda x: x[-2:], reverse=True)


item_list = []
for i in rule_fit_val:
    i = list(map(str, i))
    s = ','.join(i).replace("\n","")
    item_list.append(s)
print(*item_list,sep='\n')

with open("check_sorted_fitness.rule", "w") as outfile:
    outfile.write("\n".join(item_list))
 
from collections import Counter

# Count how many rows belong to each class (second-to-last field).
my_dict_new = dict(Counter(i[-2] for i in rule_fit_val))

print(my_dict_new)

Please help me work out how to assign the rank like that.

CodePudding user response:

Consider using the pandas module; then you can do something like this:

import pandas as pd

df = pd.read_csv('temp.rule', names=['att1','att2','att3','class','fitness'])
#-----------------^^^^^^^^^ your file ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ column headers
>>> df
'''
  att1 att2 att3  class  fitness
0    A    B    C      1     0.67
1    D    E    F      1     0.84
2    P    Q    R      2     0.77
3    S    T    U      2     0.51
4    G    H    I      1     0.45
5    J    K    L      1     0.82
6    M    N    O      2     0.28
7    V    W    X      2     0.41
8    Y    Z    A      2     0.51
'''
out = (df.assign(rank=df.groupby('class')['fitness'].
                 transform(lambda x: x.rank())).
       sort_values(['class','fitness'], ascending=False))

>>> out
'''
  att1 att2 att3  class  fitness  rank
2    P    Q    R      2     0.77   5.0
3    S    T    U      2     0.51   3.5
8    Y    Z    A      2     0.51   3.5
7    V    W    X      2     0.41   2.0
6    M    N    O      2     0.28   1.0
1    D    E    F      1     0.84   4.0
5    J    K    L      1     0.82   3.0
0    A    B    C      1     0.67   2.0
4    G    H    I      1     0.45   1.0
'''
out.to_csv('out.rule', header=False, index=False)
#-----------^^^^^^^^ new file
# contents of out.rule:
'''
P,Q,R,2,0.77,5.0
S,T,U,2,0.51,3.5
Y,Z,A,2,0.51,3.5
V,W,X,2,0.41,2.0
M,N,O,2,0.28,1.0
D,E,F,1,0.84,4.0
J,K,L,1,0.82,3.0
A,B,C,1,0.67,2.0
G,H,I,1,0.45,1.0
'''
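
Note that the ranks are written as 5.0, 2.0 and so on, while the desired file in the question has 5 and 2. If that difference matters, one possible approach is Python's "g" number format, which drops a trailing ".0" but keeps real fractions such as 3.5. The list of ranks below is just the class 2 values from the example, used for illustration:

```python
# Ranks as they appear for class 2 in the example output.
ranks = [5.0, 3.5, 3.5, 2.0, 1.0]

# The "g" format drops a trailing ".0" but keeps genuine fractions.
formatted = [format(r, "g") for r in ranks]
print(formatted)
```

With the DataFrame above, the same formatting could presumably be applied to the rank column before writing, for example with out['rank'].map('{:g}'.format). Keep in mind that "g" uses 6 significant digits by default, so very large ranks would switch to exponent notation.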

Update

Now it does not matter how many columns are in your file, as long as the last two columns are 'class' and 'fitness' respectively:

import pandas as pd

df = pd.read_csv('temp.rule', header=None)
df = df.rename(columns={df.columns[-1]:'fitness',df.columns[-2]:'class'})
out = (df.assign(rank=df.groupby('class')['fitness'].
                 transform(lambda x: x.rank())).
       sort_values(['class','fitness'],ascending=False))
out.to_csv('out.rule',header=False,index=False)
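
If pandas is not available, the same result can be approximated with the standard library only. The following is a minimal sketch, not the answer's method: the function name rank_rows and the in-memory sample are hypothetical, and it assumes the last two fields of every row are class and fitness. Classes are compared as strings, so the class order may differ from a numeric sort when labels have different lengths (for example '10' and '2').

```python
import csv
import io
from collections import defaultdict
from itertools import groupby

def rank_rows(rows):
    """Sort rows by class and fitness (descending) and append an average rank.

    Rank 1 is the lowest fitness within a class; tied rows share the mean
    of the ranks they would otherwise occupy.
    """
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[-2]].append(row)

    ranked = []
    for cls in sorted(by_class, reverse=True):
        # Stable sort: rows with equal fitness keep their original order.
        group = sorted(by_class[cls], key=lambda r: float(r[-1]), reverse=True)
        top = len(group)  # rank of the best remaining row
        for _, tied in groupby(group, key=lambda r: float(r[-1])):
            tied = list(tied)
            # k tied rows occupy ranks top-k+1 .. top; use their mean.
            avg = top - (len(tied) - 1) / 2
            ranked.extend(r + [format(avg, "g")] for r in tied)
            top -= len(tied)
    return ranked

# Sample data from the question, read from memory instead of temp.rule.
sample = """A,B,C,1,0.67
D,E,F,1,0.84
P,Q,R,2,0.77
S,T,U,2,0.51
G,H,I,1,0.45
J,K,L,1,0.82
M,N,O,2,0.28
V,W,X,2,0.41
Y,Z,A,2,0.51
"""
rows = list(csv.reader(io.StringIO(sample)))
lines = [",".join(r) for r in rank_rows(rows)]
print("\n".join(lines))
```

To work on the real file, the io.StringIO(sample) argument could be replaced by a file object opened on temp.rule, and the joined lines written to the output file.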