Hello, I have a lot of CSV files that share the same filename but have a number at the end. For example, I have 4 CSV files with the same filename: the first file has no extra number at the end, the second has (0) at the end, the third has (1) at the end, and so on.
I am using pandas to read the files in a for loop because I have a lot of files in a folder, and I am using sorted to sort them. The problem is that it sorts the filenames fine, and the first file too, but I have an issue with the file that has (0) at the end of its name: it is put last. I want to solve this because these individual files together hold the data of one big file, and I am trying to concatenate them automatically. Everything works, but the sorting order is not what I want, so the right files get concatenated (which is what I want) but in the wrong order.
How can I fix this? BTW, after reading, I sort the filenames in a list, and they come out in the wrong order, like this: ['filename','filename1','filename2','filename0']. But I want them in this order: ['Filename','Filename0','Filename1','Filename2'].
I know the filenames in the list are strings. I have tried converting them to int and float, but without success; I get this error: ValueError: invalid literal for int() with base 10:
Any help would be greatly appreciated. I cannot upload the code because it has a lot of functions and is absolutely massive; finding the relevant bits would take me a very long time. Sorry about that.
CodePudding user response:
Use rsplit and sorted with a custom function that does some checking and serves as the key for the sort comparison.
You can try it like this:
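import re

def function_work(x):
    # Drop the extension, then split the name into a base and trailing digits.
    stem = x.rsplit('.', 1)[0]
    base, digits = re.fullmatch(r'(.*?)(\d*)', stem).groups()
    # A name without a number sorts first (-1); otherwise sort by the integer.
    return (base, int(digits) if digits else -1)

csvFiles = ['Filename5.csv', 'Filename0.csv', 'Filename1.csv', 'Filename.csv', 'Filename2.csv']
print(sorted(csvFiles, key=function_work, reverse=False))
# output: ['Filename.csv', 'Filename0.csv', 'Filename1.csv', 'Filename2.csv', 'Filename5.csv']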
CodePudding user response:
The sorted function takes an additional keyword argument called key that tells it how to sort the items in the iterable. This argument, key, is a function that is expected to take each entry from the input iterable and give it a "rank" or a "sort order".
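As a small illustration of how key changes the ordering (the word list and the choice of len are made up for this example and have nothing to do with your files):

```python
words = ['pear', 'fig', 'banana']

# Without key, strings are compared character by character.
alphabetical = sorted(words)

# With key=len, each item is ranked by its length instead.
by_length = sorted(words, key=len)

print(alphabetical)  # ['banana', 'fig', 'pear']
print(by_length)     # ['fig', 'pear', 'banana']
```

The items themselves are unchanged; key only decides what value is compared.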
In your case, you'll need to define a key function that will put the "no suffix" file before "0":
lst = ['abc.csv', 'abc (0).csv', 'abc (1).csv']
filenames_split_lst = [_.rsplit('.', 1) for _ in lst]
# [['abc', 'csv'], ['abc (0)', 'csv'], ['abc (1)', 'csv']]
base_filenames = [base for base, ext in filenames_split_lst]
# ['abc', 'abc (0)', 'abc (1)']

def sorting_function(base_filename):
    parts = base_filename.split()
    if len(parts) == 1:
        # No suffix: rank it before every numbered file.
        return 0
    elif len(parts) == 2:
        # Strip the parentheses from "(N)" and shift by 1 so "(0)" ranks after "no suffix".
        number_suffix = parts[1][1:-1]
        return int(number_suffix) + 1

sorted(base_filenames, key=sorting_function)
# ['abc', 'abc (0)', 'abc (1)']
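If you would rather sort the full filenames directly, here is a minimal sketch of a key function that accepts both suffix styles mentioned in the question. The name suffix_key and the regular expression are my own assumptions about your naming pattern ("name.csv", "name0.csv" or "name (0).csv"), so adjust them if your files look different:

```python
import re

# Assumed pattern: a base name, optional whitespace, then a number
# that may or may not be wrapped in parentheses.
_SUFFIX = re.compile(r'^(?P<base>.*?)\s*\(?(?P<num>\d+)\)?$')

def suffix_key(filename):
    """Return (base name, number) so files sort by name, then numerically."""
    stem = filename.rsplit('.', 1)[0]  # drop the extension
    match = _SUFFIX.match(stem)
    if match is None:
        # No numeric suffix: -1 ranks this file before suffix 0.
        return (stem, -1)
    return (match.group('base'), int(match.group('num')))

files = ['abc (10).csv', 'abc.csv', 'abc (2).csv', 'abc (0).csv']
ordered = sorted(files, key=suffix_key)
print(ordered)
# ['abc.csv', 'abc (0).csv', 'abc (2).csv', 'abc (10).csv']
```

Because the number is compared as an integer, (10) comes after (2), which a plain string sort would get wrong. The sorted list can then be passed to whatever reading and concatenating loop you already have.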