Home > Software design >  How do i sort the numeric values within a cell in Pandas series
How do i sort the numeric values within a cell in Pandas series

Time:05-02

I have a panda series/column where values are like -

Values
101;1001
130;125
113;99
1001;101

I need to sort the values within the cell with an expected outcome like below using python as the dataframe is large with more than 5 million values so any faster way would be appreciated.

Values
101;1001
125;130
99;113
101;1001

CodePudding user response:

Here's one way using str.split explode sort_values to sort the values; then groupby.agg to join them back.

Note that I assumed no integer is more than 10 digits long, so I used str.zfill(10) to pad '0's for sorting.

df['Values_new'] = df['Values'].str.split(';').explode().sort_values(key=lambda x: x.str.zfill(10)).groupby(level=0).agg(';'.join)

Output:

     Values Values_new
0  101;1001   101;1001
1   130;125    125;130
2    113;99     99;113
3  1001;101   101;1001

CodePudding user response:

Convert splitted values to integers, sorting, convert back to strings and join:

df['Values'] = df['Values'].apply(lambda x: ';'.join(map(str, sorted(map(int, x.split(';'))))))

Performance:

#10k rows
df = pd.concat([df] * 10000, ignore_index=True)

#enke solution
In [52]: %timeit df['Values'].str.split(';').explode().sort_values(key=lambda x: x.str.zfill(10)).groupby(level=0).agg(';'.join)
616 ms ± 6.03 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [53]: %timeit df['Values'].apply(lambda x: ';'.join(map(str, sorted(map(int, x.split(';'))))))
70.7 ms ± 420 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)

Solution for 2 columns:

df1 = df['Values'].str.split(';', expand=True).astype(int)
df1 = pd.DataFrame(np.sort(df1, axis=1), index=df1.index, columns=df1.columns)
print (df1)
     0     1
0  101  1001
1  125   130
2   99   113
3  101  1001

CodePudding user response:

You can use a list comprehension, this will be faster on small datasets:

df['Values'] = [';'.join(map(str, sorted(map(int, x.split(';'))))) for x in df['Values']]

output:

     Values
0  101;1001
1   125;130
2    99;113
3  101;1001

timings:

timings

For two columns:

df2 = pd.DataFrame([sorted(map(int, x.split(';'))) for x in df['Values']])

output:

     0     1
0  101  1001
1  125   130
2   99   113
3  101  1001
  • Related