How to extract elements from a list in pandas through regex?

Time:05-01

I'm looking to extract the code that comes after 'accession' in this DataFrame. My DataFrame looks like this:

targets_list = pd.DataFrame(targets_df[['target_components', 'target_chembl_id']])

and each element of the target_components column looks like the following:

[{'accession': 'O43451', 'component_description': 'Maltase-glucoamylase, intestinal', 'component_id': 434, 'component_type': 'PROTEIN', 'relationship': 'SINGLE PROTEIN', 'target_component_synonyms',...}]

I would just like to extract the code after 'accession'. Assuming it was the first element of the list, I tried tgt = targets_list['target_components'][0][0], but this returns the first element of that list (the whole dictionary), not the accession number.

I can see that each row holds a list, but I don't know how to parse that list, get the accession code, and add it as a column. Maybe this is possible with regex? I'm not sure how regex works at all.

CodePudding user response:

You can use the .str.findall() or .str.extract() method to get the ID. Both work on string values, so a column holding lists has to be converted to text first.

Refer to: Use regular expression to extract elements from a pandas data frame
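
As a rough sketch of the regex route, assuming each cell really is a Python list of dicts as shown in the question: the list can be turned into its string form and searched with .str.extract(). The sample rows, the 'P12345' accession, and the CHEMBL_A / CHEMBL_B IDs below are made up for illustration.

```python
import pandas as pd

# Hypothetical sample data shaped like the column in the question
targets_list = pd.DataFrame({
    "target_components": [
        [{"accession": "O43451", "component_id": 434}],
        [{"accession": "P12345", "component_id": 1}],
    ],
    "target_chembl_id": ["CHEMBL_A", "CHEMBL_B"],  # placeholder IDs
})

# Turn each list into its text representation so the .str accessor can be used
as_text = targets_list["target_components"].map(str)

# Capture the quoted value that follows the 'accession' key;
# expand=False returns a Series because there is a single capture group
accession = as_text.str.extract(r"'accession':\s*'([^']+)'", expand=False)
print(accession)
```

Note that this only picks up the first 'accession' in each row, and that indexing into the dict directly is usually simpler when the cells are real Python objects rather than strings.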

CodePudding user response:

You can try this:

targets_list['target_components'].map(lambda x: x[0]["accession"])

CodePudding user response:

First, there is no need to call pd.DataFrame again to create a DataFrame from existing columns:

targets_list = targets_df[['target_components', 'target_chembl_id']]

Then you can use apply to access the element in each row of the column:

tgt = targets_list['target_components'].apply(lambda x: x[0]['accession'])
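
A slightly more defensive version of the same idea might look like the sketch below. It assumes some rows could hold an empty list or a missing value, which the question does not confirm; the helper name first_accession, the sample rows, and the placeholder IDs are hypothetical.

```python
import pandas as pd

# Hypothetical sample data; the second row has no components
targets_df = pd.DataFrame({
    "target_components": [
        [{"accession": "O43451", "component_type": "PROTEIN"}],
        [],
    ],
    "target_chembl_id": ["CHEMBL_A", "CHEMBL_B"],  # placeholder IDs
})

def first_accession(components):
    """Return the accession of the first component, or None if there is none."""
    if isinstance(components, list) and components:
        return components[0].get("accession")
    return None

# .copy() avoids a SettingWithCopyWarning when the new column is added
targets_list = targets_df[["target_components", "target_chembl_id"]].copy()
targets_list["accession"] = targets_list["target_components"].apply(first_accession)
print(targets_list[["target_chembl_id", "accession"]])
```

Rows without a usable component end up with a missing value in the new column instead of raising an IndexError.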