I am trying to count specific strings in the columns [donor, acceptor]. I get a result, but it does not cover all the entries. I want to count the total number of occurrences of each residue (LYS8-Side, VAL2-Main, ..., POPE4-Side, etc.) in each column.
Input .dat file. https://iitbacin-my.sharepoint.com/:u:/g/personal/20002453_iitb_ac_in/ERCE5FrV6XBNsDneJd4aVuUBX5UVlVZvhh1kudi5vrUl0A?e=6slaUm
The output I am expecting, for example:
donor Number of residues
ASN15 - 5
VAL2 - 3
I expect the same kind of result for the acceptor column.
The code I have written:
import pandas as pd
require_cols = [0,1,2,3]
# read the .dat file (tab-separated by default), keeping only the first four columns
df = pd.read_table('/home/user/Desktop/Inter_hbond_Peptide_resid_lipid_protein.dat', usecols = require_cols)
df = pd.DataFrame({'donor': ['LYS8-Side'], 'acceptor': ['POPE4-Main'], 'occupancy': [26.27]})
print(df)
# find count
AA_Count = df.query('donor=="LYS8-side" \
& acceptor=="POPE4-Main"')['acceptor'].count()
print('Number of donor-', end="")
print(AA_Count)
Thank You!
Answer:
The following code should help. The values in column 2 are converted by the percent_to_float function. No custom converter is specified for the other columns, since they contain strings anyway.
import pandas as pd

def percent_to_float(s: str) -> float:
    # Turn '26.27%' into 0.2627; plain numbers are parsed as-is
    if s.endswith('%'):
        return float(s.removesuffix('%')) / 100.0
    else:
        return float(s)

df = pd.read_table('DEMO.dat',
                   sep=r'\s+',  # one or more whitespace characters
                   converters={2: percent_to_float}
                   )
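To check the converter without the real file, here is a minimal sketch that reads a small in-memory table instead. The two sample rows and the three-column layout (donor, acceptor, occupancy) are assumptions made for illustration, not the contents of the actual .dat file.

```python
import io
import pandas as pd

def percent_to_float(s: str) -> float:
    """Convert '26.27%' to 0.2627; parse plain numbers as floats."""
    if s.endswith('%'):
        return float(s[:-1]) / 100.0
    return float(s)

# Hypothetical sample standing in for the .dat file (layout is an assumption)
sample = io.StringIO(
    "donor acceptor occupancy\n"
    "LYS8-Side POPE4-Main 26.27%\n"
    "VAL2-Main POPE4-Side 3.50%\n"
)

# Split on runs of whitespace and convert column 2 (0-based) with the helper
demo = pd.read_table(sample, sep=r'\s+', converters={2: percent_to_float})
print(demo)
```

If the occupancy column comes out as floats, the same sep and converters arguments should carry over to the real file, provided its columns are in the same order.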
As for the expected output,
for residue in ('ASN15', 'VAL2', ):
    print(residue,
          df.query(f'donor=="{residue}-Side" | donor=="{residue}-Main"').donor.count(),
          sep='\t- ')
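If you would rather not list every residue by hand, a possible alternative is to strip the -Side/-Main suffix and let value_counts tally everything at once. This is only a sketch: the rows in df_demo are made up, and residue_counts is a hypothetical helper name, not something from pandas.

```python
import pandas as pd

# Made-up rows for illustration; the real table would come from pd.read_table
df_demo = pd.DataFrame({
    'donor':    ['ASN15-Side', 'ASN15-Main', 'VAL2-Main', 'LYS8-Side', 'ASN15-Side'],
    'acceptor': ['POPE4-Main', 'POPE4-Side', 'POPE4-Side', 'POPE7-Main', 'POPE7-Main'],
})

def residue_counts(column: pd.Series) -> pd.Series:
    """Count rows per residue, ignoring a trailing -Side or -Main."""
    residues = column.str.replace(r'-(Side|Main)$', '', regex=True)
    return residues.value_counts()

donor_counts = residue_counts(df_demo['donor'])
acceptor_counts = residue_counts(df_demo['acceptor'])
print(donor_counts)
print(acceptor_counts)
```

Calling value_counts directly on the column, without the replace step, would instead count LYS8-Side and LYS8-Main as separate entries.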