One in my columns in my pandas DataFrame has very irregular expressions. I want to remove everything except the coordinates. However, I cannot just use the replace or remove function since the parts of what I want to remove are different in each column. Is there a way of picking just the part of the strings which I actually want to use?
One cell looks like this:
{'is_geometry': True, 'configuration': 'technologies', 'additional_translations': {}, 'key': 'Map', 'value': '{"type":"FeatureCollection","features":[{"type":"Feature","id":1549869006355,"geometry":{"type":"Point","coordinates":[67.91225703380735,34.69585762863356]},"properties":null}]}', 'map_url': '/en/technologies/view/technologies_1723/map/', 'template': 'raw'}
where the id and the map_url are always different. I would like to only have [67.91225703380735,34.69585762863356] in this example. Further, is there a way of turning the two values around in order that I have [34.69585762863356,67.91225703380735] instead?
CodePudding user response:
I'm not sure exactly what you want, but assuming your dataframe's column contains dicts that are like your example, this should work:
import ast
import json
df['nums'] = df.loc[df['tech_map'].notna(), 'tech_map'].astype(str).apply(ast.literal_eval).str['value'].apply(json.loads).str['features'].str[0].str['geometry'].str['coordinates'].str[::-1]
Two notes:
- The above is basically equivalent to doing json.loads(row['value'])['features'][0]['geometry']['coordinates'][::-1]
for each row
- [::-1]
reverse a list