I'm looking to extract the value that comes after 'accession' in this DataFrame. My dataframe looks like this:
targets_list = pd.DataFrame(targets_df[['target_components', 'target_chembl_id']])
and each element of the target_components column looks like the following:
[{'accession': 'O43451', 'component_description': 'Maltase-glucoamylase, intestinal', 'component_id': 434, 'component_type': 'PROTEIN', 'relationship': 'SINGLE PROTEIN', 'target_component_synonyms',...}]
I would just like to extract the code after 'accession'. Since I thought it was the first element of the list, I tried
tgt = targets_list['target_components'][0][0]
but this returns the first element of that list (the whole dictionary), not the accession number.
I can see that it is a list that's in each row, but how to parse that list and get that number and add it to a column is what's missing for me. It should be possible with Regex maybe? But I'm not sure how Regex works at all.
CodePudding user response:
You can use the pandas string methods .str.findall() or .str.extract() to get the id. Both work on text, so the column has to be converted to strings first.
Refer to: Use regular expression to extract elements from a pandas data frame
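As a rough sketch of the regex route, assuming each cell really holds a Python list of dicts as shown in the question: the toy rows below, including the second accession and the target_chembl_id values, are made up for illustration, and the pattern assumes the accession is wrapped in single quotes once the list is turned into text.

```python
import pandas as pd

# Toy data shaped like the question's column: one list of dicts per row.
# The second row and the target_chembl_id values are invented placeholders.
targets_list = pd.DataFrame({
    "target_components": [
        [{"accession": "O43451", "component_id": 434}],
        [{"accession": "P00533", "component_id": 1}],
    ],
    "target_chembl_id": ["CHEMBL_X1", "CHEMBL_X2"],
})

# .str.extract works on text, so turn each list into its string form first.
as_text = targets_list["target_components"].astype(str)

# Capture the quoted value that follows 'accession': in that text.
accession_regex = as_text.str.extract(r"'accession': '([^']+)'", expand=False)
print(accession_regex)
```

This only picks up the first 'accession' in each cell. Since the data is already a list of dicts, indexing into it directly is usually simpler than going through a regex.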
CodePudding user response:
You can try this:
targets_list['target_components'].map(lambda x: x[0]["accession"])
CodePudding user response:
First, there is no need to call pd.DataFrame again to create a dataframe from existing columns:
targets_list = targets_df[['target_components', 'target_chembl_id']]
Then you can use apply to access each element of the column:
tgt = targets_list['target_components'].apply(lambda x: x[0]['accession'])
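To store the result as a new column, as the question asks, something like the following might work. It is a minimal sketch on toy data: the helper first_accession and the column name "accession" are hypothetical, the target_chembl_id values are placeholders, and the .copy() is only there to avoid pandas' SettingWithCopyWarning when assigning to a column subset.

```python
import pandas as pd

# Toy stand-in for targets_df; the second row has an empty list on purpose.
targets_df = pd.DataFrame({
    "target_components": [
        [{"accession": "O43451", "component_id": 434}],
        [],
    ],
    "target_chembl_id": ["CHEMBL_X1", "CHEMBL_X2"],  # placeholder ids
})


def first_accession(components):
    """Return the 'accession' of the first component, or None if there is none."""
    if not isinstance(components, list) or not components:
        return None
    return components[0].get("accession")


# Copy the subset so that adding a column does not touch targets_df.
targets_list = targets_df[["target_components", "target_chembl_id"]].copy()
targets_list["accession"] = targets_list["target_components"].apply(first_accession)
print(targets_list[["target_chembl_id", "accession"]])
```

The plain lambda x: x[0]['accession'] raises an IndexError on an empty list, which is why the helper checks for that case first.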