How to count specific strings from a column (.dat file)?

I am trying to count how often specific strings occur in the donor and acceptor columns. I get a result, but it does not cover all the values. I want the total number of occurrences of each residue (LYS8-Side, VAL2-Main, ..., POPE4-Side, etc.) in each column.

Input .dat file: https://iitbacin-my.sharepoint.com/:u:/g/personal/20002453_iitb_ac_in/ERCE5FrV6XBNsDneJd4aVuUBX5UVlVZvhh1kudi5vrUl0A?e=6slaUm

The output I am expecting:

Example:

donor Number of residues
ASN15- 5
VAL2 - 3

I expect the same kind of result for the acceptor column.

The code I have written:

import pandas as pd

require_cols = [0,1,2,3]
# read the first four columns of the delimited .dat text file
df = pd.read_table('/home/user/Desktop/Inter_hbond_Peptide_resid_lipid_protein.dat', usecols = require_cols)
df = pd.DataFrame({'donor': ['LYS8-Side'], 'acceptor': ['POPE4-Main'], 'occupancy': [26.27]})
print(df)

# find count 
AA_Count = df.query('donor=="LYS8-side" \
& acceptor=="POPE4-Main"')['acceptor'].count()

print('Number of donor-', end="")
print(AA_Count)

Thank You!

CodePudding user response:

The following code should help. The values in column 2 (occupancy) are converted to floats by the percent_to_float function. The other columns need no custom converter, since they hold strings anyway.

import pandas as pd

def percent_to_float(s: str) -> float:
    if s.endswith('%'):
        return float(s.removesuffix('%')) / 100.0
    else:
        return float(s)

df = pd.read_table('DEMO.dat',
                   sep=r'\s+',  # one or more whitespace characters
                   converters={2: percent_to_float}
                  )

To produce the expected output:

for residue in ('ASN15', 'VAL2', ):
    print(residue,
          df.query(f'donor=="{residue}-Side" | donor=="{residue}-Main"').donor.count(),
          sep='\t- ')
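
The loop above needs the residue names listed by hand. If the goal is a count for every residue in each column, one possible approach is to strip the -Side / -Main suffix and call value_counts(). This is only a sketch: the sample rows and the helper name residue_counts are made up for illustration, and the real file is assumed to have whitespace-separated donor, acceptor and occupancy columns.

```python
import io
import pandas as pd

# Hypothetical sample standing in for the real .dat file (layout is an assumption).
SAMPLE = """donor acceptor occupancy
LYS8-Side POPE4-Side 26.27%
VAL2-Main POPE4-Side 3.50%
ASN15-Side POPE7-Main 0.75%
VAL2-Side POPE4-Main 1.20%
ASN15-Main POPE4-Side 0.40%
"""

df = pd.read_table(io.StringIO(SAMPLE), sep=r'\s+')

def residue_counts(column: pd.Series) -> pd.Series:
    """Count rows per residue, ignoring a trailing -Side or -Main."""
    residues = column.str.replace(r'-(Side|Main)$', '', regex=True)
    return residues.value_counts()

donor_counts = residue_counts(df['donor'])
acceptor_counts = residue_counts(df['acceptor'])

print(donor_counts)
print(acceptor_counts)
```

To keep Side and Main as separate entries, calling df['donor'].value_counts() directly on the column should be enough.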