Home > OS >  Splitting a column of a Pandas dataframe using multiple conditions
Splitting a column of a Pandas dataframe using multiple conditions

Time:12-13

Let's say I have this column in Pandas:

df['GPS'][0]: 0 '39.21,38.6;39.23,38.68;39.26,38.68'

I would like to split the column into:

Xcoord1 Ycoord1 Xcoord2,Ycoord2, Xcoord3, Ycoord3
37.21    38.6    37.23   38.68    37.26    38.68

My approach is to first split the column using:

df['GPS_list']=df['GPS'].apply(lambda x: x.split(';'))

df['GPS_list']:

['39.21,38.6','39.23,38.68','39.26,38.68']

Now I would need to split based on , to separate the x and y value which I am not sure how to do for each row. Can I use apply function work here?

Next, I would need to convert each of those values from string to int.

Is there an elegant way to do this in a step or two?

I am new to Python and Pandas so any nudge in the right direction is helpful.

CodePudding user response:

Example

df = pd.DataFrame(['39.21,38.6;39.23,38.68;39.26,38.68'], columns=['GPS'])

df

    GPS
0   39.21,38.6;39.23,38.68;39.26,38.68

Code

col1 = ['Xcoord1', 'Ycoord1', 'Xcoord2', 'Ycoord2', 'Xcoord3', 'Ycoord3']
df['GPS'].str.split(r'[,;]', expand=True).set_axis(col1, axis=1)

result:

    Xcoord1 Ycoord1 Xcoord2 Ycoord2 Xcoord3 Ycoord3
0   39.21   38.6    39.23   38.68   39.26   38.68

CodePudding user response:

It If you always have the same number of coordinates, a simple str.split will work:

out = (df['GPS'].str.split('[,;]', expand=True)
       .set_axis(['Xcoord1', 'Ycoord1',  'Xcoord2', 'Ycoord2', 'Xcoord3', 'Ycoord3'], axis=1)
      )

If you have an arbitrary number of pairs, you can use:

out = (df['GPS'].str.split(';', expand=True).stack()
       .str.split(',', expand=True)
       .set_axis(['Xcoord', 'Ycoord'], axis=1).unstack()
       .sort_index(level=1, axis=1)
      )

out.columns = out.columns.map(lambda x: f'{x[0]}{x[1] 1}')

Output:

  Xcoord1 Ycoord1 Xcoord2 Ycoord2 Xcoord3 Ycoord3
0   39.21    38.6   39.23   38.68   39.26   38.68
  • Related