I have a master CSV file that looks something like this -- it's only 1 column.
column_name
string1010string
string1013string
string1014string
string1015string
string1016string
string1018string
string1020string
Then I have a temporary CSV that I would like to keep track of separtely but also merge it with the master CSV and have it sorted in ascending order only taking the integers into consideration. I am aware that I have to strip the strings (from the start & the end) of each row to isolate the integers and then sort but I'm not quite sure how to approach it after that.
column_name
string1011string
string1012string
string1017string
string1019string
My function looks something like this:
def output_master_concatenated(list1, list2):
master= pd.concat([list1, list2])
#sorting_system
master.to_csv(str('master' '.csv'), index = False, sep = ' ')
return master
Ideally, this is what I would like it to look like.
column_name
string1010string
string1011string
string1012string
string1013string
string1014string
string1015string
string1016string
string1017string
string1018string
string1019string
string1020string
update: string(integer)string is actually a link, each row is basically the same link with only the integer changing
CodePudding user response:
Use sort_values
with a custom key:
df = pd.concat([df1, df2])
num_sort = lambda x: x.str.extract('(\d )', expand=False).str.zfill(10)
df = df.sort_values('column_name', key=num_sort, ignore_index=True)
print(df)
# Output
column_name
0 string1010string
1 string1011string
2 string1012string
3 string1013string
4 string1014string
5 string1015string
6 string1016string
7 string1017string
8 string1018string
9 string1019string
10 string1020string