I have a file, say temp.rule, which has m rows and n columns, where each row looks like att1,att2,att3,...,attN,class,fitness. Suppose my file looks like this:
A,B,C,1,0.67
D,E,F,1,0.84
P,Q,R,2,0.77
S,T,U,2,0.51
G,H,I,1,0.45
J,K,L,1,0.82
M,N,O,2,0.28
V,W,X,2,0.41
Y,Z,A,2,0.51
For the first row, A,B,C are the attributes, 1 is the class, and 0.67 is the fitness. I want to sort the rows by fitness within each class and assign a rank to each row. Afterwards, my file should look like this:
P,Q,R,2,0.77,5
S,T,U,2,0.51,3.5
Y,Z,A,2,0.51,3.5
V,W,X,2,0.41,2
M,N,O,2,0.28,1
D,E,F,1,0.84,4
J,K,L,1,0.82,3
A,B,C,1,0.67,2
G,H,I,1,0.45,1
Class 2 has 5 rows, so they are sorted by fitness and ranked from 1 to 5. Likewise, class 1 has 4 rows, so they are sorted by fitness and ranked from 1 to 4. The 3.5 appears because, in case of a tie, I want to take the average of the consecutive ranks (here, ranks 3 and 4). I have done the sorting part but am unable to assign the ranks like this. I have also created a dictionary that counts how many rows belong to class 1, class 2, and so on.
Here is my attempt:
rule_file_name = 'temp.rule'

# Read each row and split it into its comma-separated fields.
rule_fit_val = []
with open(rule_file_name) as rule_fp:
    for line in rule_fp:
        rule_fit_val.append(line.rstrip("\n").split(","))

def convert_fitness_to_float(lst):
    # Keep every field as a string except the last one (the fitness).
    return lst[:-1] + [float(lst[-1])]

rule_fit_val = [convert_fitness_to_float(i) for i in rule_fit_val]
# Sort by (class, fitness), both in descending order.
rule_fit_val = sorted(rule_fit_val, key=lambda x: x[-2:], reverse=True)

item_list = []
for i in rule_fit_val:
    item_list.append(','.join(map(str, i)))
print(*item_list, sep='\n')

with open("check_sorted_fitness.rule", "w") as outfile:
    outfile.write("\n".join(item_list))

# Count how many rows belong to each class.
list1 = [i[-2] for i in rule_fit_val]
freq = {}
for items in list1:
    freq[items] = list1.count(items)
my_dict_new = {k: v for k, v in freq.items()}
print(my_dict_new)
How can I assign the ranks like this?
CodePudding user response:
Consider using the pandas module; then you can do something like this:
import pandas as pd
df = pd.read_csv('temp.rule', names=['att1','att2','att3','class','fitness'])
#-----------------^^^^^^^^^ your file ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^ column headers
>>> df
'''
  att1 att2 att3  class  fitness
0    A    B    C      1     0.67
1    D    E    F      1     0.84
2    P    Q    R      2     0.77
3    S    T    U      2     0.51
4    G    H    I      1     0.45
5    J    K    L      1     0.82
6    M    N    O      2     0.28
7    V    W    X      2     0.41
8    Y    Z    A      2     0.51
'''
out = (df.assign(rank=df.groupby('class')['fitness'].
                 transform(lambda x: x.rank())).
       sort_values(['class','fitness'], ascending=False))
>>> out
'''
  att1 att2 att3  class  fitness  rank
2    P    Q    R      2     0.77   5.0
3    S    T    U      2     0.51   3.5
8    Y    Z    A      2     0.51   3.5
7    V    W    X      2     0.41   2.0
6    M    N    O      2     0.28   1.0
1    D    E    F      1     0.84   4.0
5    J    K    L      1     0.82   3.0
0    A    B    C      1     0.67   2.0
4    G    H    I      1     0.45   1.0
'''
out.to_csv('out.rule', header=False, index=False)
#-----------^^^^^^^^ new file
>>> out.rule
'''
P,Q,R,2,0.77,5.0
S,T,U,2,0.51,3.5
Y,Z,A,2,0.51,3.5
V,W,X,2,0.41,2.0
M,N,O,2,0.28,1.0
D,E,F,1,0.84,4.0
J,K,L,1,0.82,3.0
A,B,C,1,0.67,2.0
G,H,I,1,0.45,1.0
'''
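The 3.5 for the tied rows comes from the default tie handling of `rank`, which is `method='average'`. As a minimal sketch, assuming the sample data from the question is held in an in-memory string instead of being read from temp.rule (the `sample` string and the variable names here are only for illustration), `rank` can also be called directly on the grouped column, without `transform` and a lambda:

```python
import io

import pandas as pd

# Sample rows from the question, kept in memory for illustration only.
sample = """A,B,C,1,0.67
D,E,F,1,0.84
P,Q,R,2,0.77
S,T,U,2,0.51
G,H,I,1,0.45
J,K,L,1,0.82
M,N,O,2,0.28
V,W,X,2,0.41
Y,Z,A,2,0.51
"""

df = pd.read_csv(io.StringIO(sample),
                 names=['att1', 'att2', 'att3', 'class', 'fitness'])

# Rank fitness within each class; ties share the mean of their ranks.
average_ranks = df.groupby('class')['fitness'].rank(method='average')

# For comparison: ties all receive the lowest rank of the tied group.
min_ranks = df.groupby('class')['fitness'].rank(method='min')

print(average_ranks.tolist())  # [2.0, 4.0, 5.0, 3.5, 1.0, 3.0, 1.0, 2.0, 3.5]
print(min_ranks.tolist())      # [2.0, 4.0, 5.0, 3.0, 1.0, 3.0, 1.0, 2.0, 3.0]
```

The ranks are returned in the original row order, so they can be assigned as a new column before sorting.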
UPD
Now it does not matter how many columns your file has, as long as the last two columns are 'class' and 'fitness', respectively:
import pandas as pd
df = pd.read_csv('temp.rule', header=None)
df = df.rename(columns={df.columns[-1]:'fitness',df.columns[-2]:'class'})
out = (df.assign(rank=df.groupby('class')['fitness'].
                 transform(lambda x: x.rank())).
       sort_values(['class','fitness'],ascending=False))
out.to_csv('out.rule',header=False,index=False)
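If pandas is not an option, the same result can probably be reached with the standard library alone. The following is only a sketch under a few assumptions: the rows are already parsed as in the question (class as a string, fitness as a float in the last position), and the helper names `average_ranks` and `rank_rows` are made up for this example. It groups the rows by class, gives tied fitness values the mean of their positions, and then sorts by class and fitness in descending order:

```python
from collections import defaultdict


def average_ranks(values):
    """Return 1-based ascending ranks; tied values get the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        # Extend the run while the next value equals the current one.
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + 1 + end + 1) / 2  # positions are 1-based
        for k in range(start, end + 1):
            ranks[order[k]] = mean_rank
        start = end + 1
    return ranks


def rank_rows(rows):
    """Append a within-class rank to rows ending in [class, fitness]."""
    by_class = defaultdict(list)
    for row in rows:
        by_class[row[-2]].append(row)
    ranked = []
    for group in by_class.values():
        ranks = average_ranks([row[-1] for row in group])
        ranked.extend(row + [rank] for row, rank in zip(group, ranks))
    # Same ordering as in the question: class, then fitness, both descending.
    ranked.sort(key=lambda row: (row[-3], row[-2]), reverse=True)
    return ranked


# Sample rows from the question, already parsed.
rows = [
    ["A", "B", "C", "1", 0.67], ["D", "E", "F", "1", 0.84],
    ["P", "Q", "R", "2", 0.77], ["S", "T", "U", "2", 0.51],
    ["G", "H", "I", "1", 0.45], ["J", "K", "L", "1", 0.82],
    ["M", "N", "O", "2", 0.28], ["V", "W", "X", "2", 0.41],
    ["Y", "Z", "A", "2", 0.51],
]

for row in rank_rows(rows):
    print(",".join(map(str, row)))
```

Because the class and fitness are read from the end of each row, this works for any number of attribute columns, like the updated pandas version.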