Home > Blockchain >  Stripping strings in order to sort rows in Dataframe
Stripping strings in order to sort rows in Dataframe

Time:02-16

I have a master CSV file that looks something like this -- it's only 1 column.

column_name
string1010string
string1013string
string1014string
string1015string
string1016string
string1018string
string1020string

Then I have a temporary CSV that I would like to keep track of separtely but also merge it with the master CSV and have it sorted in ascending order only taking the integers into consideration. I am aware that I have to strip the strings (from the start & the end) of each row to isolate the integers and then sort but I'm not quite sure how to approach it after that.

column_name
string1011string
string1012string
string1017string
string1019string

My function looks something like this:

def output_master_concatenated(list1, list2):
    master= pd.concat([list1, list2])
    #sorting_system
    master.to_csv(str('master'   '.csv'), index = False, sep = ' ')
    return master

Ideally, this is what I would like it to look like.

column_name
string1010string
string1011string
string1012string
string1013string
string1014string
string1015string
string1016string
string1017string
string1018string
string1019string
string1020string

update: string(integer)string is actually a link, each row is basically the same link with only the integer changing

CodePudding user response:

Use sort_values with a custom key:

df = pd.concat([df1, df2])

num_sort = lambda x: x.str.extract('(\d )', expand=False).str.zfill(10)
df = df.sort_values('column_name', key=num_sort, ignore_index=True)
print(df)

# Output
         column_name
0   string1010string
1   string1011string
2   string1012string
3   string1013string
4   string1014string
5   string1015string
6   string1016string
7   string1017string
8   string1018string
9   string1019string
10  string1020string
  • Related