Home > Software engineering >  How to replace or remove parts from columns in pandas DataFrame which are irregular?
How to replace or remove parts from columns in pandas DataFrame which are irregular?

Time:03-14

One in my columns in my pandas DataFrame has very irregular expressions. I want to remove everything except the coordinates. However, I cannot just use the replace or remove function since the parts of what I want to remove are different in each column. Is there a way of picking just the part of the strings which I actually want to use?

One cell looks like this:

 {'is_geometry': True, 'configuration': 'technologies', 'additional_translations': {}, 'key': 'Map', 'value': '{"type":"FeatureCollection","features":[{"type":"Feature","id":1549869006355,"geometry":{"type":"Point","coordinates":[67.91225703380735,34.69585762863356]},"properties":null}]}', 'map_url': '/en/technologies/view/technologies_1723/map/', 'template': 'raw'}

where the id and the map_url are always different. I would like to only have [67.91225703380735,34.69585762863356] in this example. Further, is there a way of turning the two values around in order that I have [34.69585762863356,67.91225703380735] instead?

CodePudding user response:

I'm not sure exactly what you want, but assuming your dataframe's column contains dicts that are like your example, this should work:

import ast
import json
df['nums'] = df.loc[df['tech_map'].notna(), 'tech_map'].astype(str).apply(ast.literal_eval).str['value'].apply(json.loads).str['features'].str[0].str['geometry'].str['coordinates'].str[::-1]

Two notes: - The above is basically equivalent to doing json.loads(row['value'])['features'][0]['geometry']['coordinates'][::-1] for each row - [::-1] reverse a list

  • Related