Structure of the dataframe

I have a data frame that contains a long list of entries.

here is an image of my dataframe

I've indexed the rows by protein accession number. The problem is that the accession numbers repeat, because some of the proteins consist of multiple domains. I want the protein accession numbers to be the main entries (each carrying the number of domains the protein has, domain_count) and the domains of each protein to be subentries. For example, when I type:

df_filtered.loc['P43098_e', 'domain_count']

it returns the number 5 for each domain (5 times). I want it to print 5 only once since P43098_e would be the main entry to which information about domain_count is directly assigned. Could someone help me, please?

CodePudding user response：

Is this what you're looking for?

If you share the data as code, I'll be able to share the result too.


df_filtered.loc[df_filtered['protein_accession'] == 'P43098_e', 'domain_count'][:1].values[0]

(df_filtered.loc[df_filtered['protein_accession'] == 'P43098_e', 'domain_count']
 .head(1)
 .squeeze())

(df_filtered[df_filtered['protein_accession'] == 'P43098_e']['domain_count']
 .head(1)
 .squeeze())
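
To make the snippets above easier to try, here is a minimal sketch on made-up data. The rows, the second accession (Q00001_e) and the domain column are assumptions for illustration only, since the real dataframe was only shown as an image. It also assumes protein_accession is a regular column rather than the index.

```python
import pandas as pd

# Made-up rows for illustration; the real df_filtered was only shown as an image.
df_filtered = pd.DataFrame({
    "protein_accession": ["P43098_e"] * 5 + ["Q00001_e"] * 2,  # Q00001_e is hypothetical
    "domain": ["d1", "d2", "d3", "d4", "d5", "d1", "d2"],      # hypothetical column
    "domain_count": [5] * 5 + [2] * 2,
})

# Boolean mask selecting every row (one per domain) of the protein.
mask = df_filtered["protein_accession"] == "P43098_e"

# Keep only the first matching row and collapse the 1-element Series to a scalar.
count = df_filtered.loc[mask, "domain_count"].head(1).squeeze()
print(count)
```

Note that head(1).squeeze() simply takes the first match; it does not check that all rows of the protein carry the same domain_count.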

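The solutions above assume protein_accession is a regular column. Since it is the index in your dataframe, either call reset_index() on df_filtered first and then run the solutions above, or call reset_index() within the statement, like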

(df_filtered.reset_index()[df_filtered.reset_index()['protein_accession'] == 'P43098_e']['domain_count']
 .head(1)
 .squeeze())
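
If you would rather keep protein_accession as the index, something along these lines might be closer to the "main entry / subentry" layout from the question. This is only a sketch under assumptions: the data is made up, the index is assumed to be named protein_accession, and the domain column is a hypothetical stand-in for whatever identifies a domain in the real dataframe.

```python
import pandas as pd

# Hypothetical data, indexed by accession as described in the question.
df_filtered = pd.DataFrame(
    {
        "domain": ["d1", "d2", "d3", "d4", "d5", "d1", "d2"],  # hypothetical column
        "domain_count": [5, 5, 5, 5, 5, 2, 2],
    },
    index=pd.Index(["P43098_e"] * 5 + ["Q00001_e"] * 2, name="protein_accession"),
)

# One value per protein: group on the index level and take the first domain_count.
domain_counts = df_filtered.groupby(level="protein_accession")["domain_count"].first()
print(domain_counts.loc["P43098_e"])

# Domains as subentries: append the domain column as a second index level.
nested = df_filtered.set_index("domain", append=True)
print(nested.loc["P43098_e"])  # all domain rows of this protein
```

Here domain_counts holds one row per protein, so looking up an accession returns a single number, while nested keeps one row per (protein, domain) pair for the per-domain details.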