I am writing a DNA translator in Haskell (using bytestrings in particular). I have the following code:
import Data.Maybe
import Data.Monoid ((<>))
import System.Environment
import qualified Data.ByteString as B
import Data.ByteString.Lazy.Char8 (ByteString, singleton, splitWith)
import qualified Data.ByteString.Lazy as LB
-- Extract DNA sequence from fasta file
xtractDNA :: [ByteString] -> Maybe ByteString
xtractDNA dna = Just (LB.concat dna)
--xtractDNA = foldr ((<>) . Just) Nothing
-- Reverse Complement DNA
compStrand :: Maybe ByteString -> Maybe ByteString
compStrand = foldr ((<>) . compPairs) Nothing
  where
    compPairs nt | nt == (singleton 'A') = Just (singleton 'T')
                 | nt == (singleton 'T') = Just (singleton 'A')
                 | nt == (singleton 'G') = Just (singleton 'C')
                 | nt == (singleton 'C') = Just (singleton 'G')
                 | otherwise = Nothing
main :: IO ()
main = do
  putStrLn "Welcome to volcano"
  let fname = "/home/russellb/Development/hs_devel/local_data/shbg.fasta"
  fid <- LB.readFile fname
  let dna = LB.concat $ tail (LB.splitWith (==10) fid)
  --putStrLn $ show (LB.length (head dna))
  let dsDna = compStrand (Just dna)
  print dsDna
When I execute it, I get Nothing as the answer. Part of the input is:
"AATTCTCCATGTGCTTGGATCGTGGGGAAGATGTGATTAAGGTCTAAGGTATGTCTTCCACCAGACAACGGACACAGTCAATTAGAAGCTGGGTAAAGGGGTCTCTCCTGCGGAGCGGGGAGCGCCAAGCCAGGGACAATAATGGCCTGAAGTTCATTCTCCCGGAGATGGGGGTAGAAGCAGGTGCAGGTGCCTTAGAGGGGTCAAAAATAAGAGGAACAGGGTTCACTCTAAGCGGTCTCCCAGGGAAGGCTGCGGGTTGGAGCAAGGGTCCAAGATTCTAAGGGCCAGGACTCAGCTCCAGAAGCTCGATCCCGCCCCACGCGTTCCTGCTCCGGCCAGGGGAGGGGGCTAAGGACCGGCGTCCCCAGTCGGCGCGCCGTCTCACCTTGTAGAAGGCCCCGTTGGAGCCGCGCACCTCGACGGGCAGTCCCGGCTCCACATCCCCCCCAGAGGCCAGGCCGCCCATGGCGCCGCCACCGCCTCCGACTCCCCCGGCGGCGGCTGCAGCAGCAGTCTGAGTGCGGGCCGGGCCAGGCCCCCGGCGTCTCCCCGGAGGAGGAGCCGGAGGGGGAGCCGCGGGGGGCGGGAGCCGGGCCGGCCCCACGGCGGCCCTGCCACAGCCAACGAGCAGGGGGCCGGGGCCGGGCCGCTCCCCGTCCGCCGCCGCCGCCTTGGTCTCCGCC...ACAAGGTCAGAGGCTGGATGTGGACCAGGCCCTGAACAGAAGCCATGAGATCTGGACTCACAGCTGCCCCCAGAGCCCAGGCAATGGCACTGACGCTTCCCATTAAAGCTCCACCTAAGAACCCCC"
I suspect that my pattern-matching guards have a problem. How can I figure that out and solve this issue? Any insights would be much appreciated.
CodePudding user response:
compStrand is a very strange function. Why does it take a Maybe ByteString instead of an actual ByteString when you only ever pass it a Just? Why does it do this wacky foldr ((<>) . compPairs) Nothing thing, which is just a very hard-to-read reimplementation of (>>= compPairs)?
main is not much clearer: if your input is GATC strings, why are you splitting on every byte equal to 10 (the newline character)? Why doesn't it use the xtractDNA function you wrote for it?
Resolving these questions will lead to a simpler function that you can debug more easily on your own. But I will note that your compStrand function seems to operate only on singleton strings, and yet you seem to be passing it a string of unknown size.
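To make the point about main concrete, here is a minimal sketch of what a dedicated extraction step could look like. It assumes a single-record FASTA file whose first line is the header; the name extractSequence is hypothetical, and stripping '\r' is an extra precaution for files with Windows line endings, not something the question shows to be necessary.

```haskell
import Data.ByteString.Lazy.Char8 (ByteString)
import qualified Data.ByteString.Lazy.Char8 as LC

-- Hypothetical helper: drop the first (header) line of a single-record
-- FASTA file and join the remaining lines into one sequence.
-- Assumption: the file holds exactly one record.
extractSequence :: ByteString -> ByteString
extractSequence =
    LC.filter (/= '\r')   -- assumption: tolerate CRLF line endings
  . LC.concat             -- join the sequence lines
  . drop 1                -- skip the header line (safe on empty input)
  . LC.lines              -- split on '\n', i.e. byte 10
```

With something like this in place, main would only need to read the file and pass extractSequence of its contents on to the complement step, with no Maybe involved until a base can actually fail to match.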
CodePudding user response:
You are using foldr with the Maybe as the Foldable, not the ByteString. It will thus inspect the Maybe a: if it is a Just, it will call compPairs with the entire ByteString of DNA; otherwise it will return Nothing.
Your compPairs will return Nothing for any ByteString that is empty or has two or more bytes, hence the overall result is Nothing.
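The behaviour described above can be reproduced in isolation. The sketch below uses hypothetical names (isSingleA, viaFoldr, viaBind, sample) and a cut-down helper that recognises only 'A'; it is meant to show that folding over a Maybe visits at most one element, namely the whole ByteString.

```haskell
import Data.ByteString.Lazy.Char8 (ByteString, pack, singleton)

-- Cut-down stand-in for the question's helper: it succeeds only when
-- the whole input is exactly the one-byte string "A".
isSingleA :: ByteString -> Maybe ByteString
isSingleA nt
  | nt == singleton 'A' = Just (singleton 'T')
  | otherwise           = Nothing

-- foldr here uses the Foldable instance of Maybe, so the function is
-- applied at most once, to the entire ByteString inside the Just.
viaFoldr :: Maybe ByteString -> Maybe ByteString
viaFoldr = foldr ((<>) . isSingleA) Nothing

-- The same computation written with (>>=).
viaBind :: Maybe ByteString -> Maybe ByteString
viaBind = (>>= isSingleA)

-- A two-byte input, which no singleton comparison can match.
sample :: ByteString
sample = pack "AT"
```

In GHCi, viaFoldr (Just (singleton 'A')) yields a Just, while viaFoldr (Just sample) yields Nothing, and viaBind agrees on both.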
You can use mapM :: Monad m => (a -> m b) -> [a] -> m [b] to construct a Maybe String (a Maybe [Char], since the Char8 version of unpack yields Chars) and then convert it back to a ByteString:
import Data.ByteString.Lazy.Char8 (ByteString, pack, unpack)

compStrand :: Maybe ByteString -> Maybe ByteString
compStrand = (>>= fmap pack . mapM comPairs . unpack)
  where comPairs 'A' = Just 'T'
        comPairs 'C' = Just 'G'
        comPairs 'G' = Just 'C'
        comPairs 'T' = Just 'A'
        comPairs _ = Nothing
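As a possible follow-up, here is a sketch of the same idea taking a plain ByteString rather than a Maybe ByteString. The names complementBase, complement and reverseComplement are hypothetical, and the reversal step is an assumption based on the "Reverse Complement DNA" comment in the question, since the original code never reverses anything.

```haskell
import Data.ByteString.Lazy.Char8 (ByteString, pack, unpack)
import qualified Data.ByteString.Lazy.Char8 as LC

-- Complement of a single base; anything other than A, C, G or T fails.
complementBase :: Char -> Maybe Char
complementBase 'A' = Just 'T'
complementBase 'C' = Just 'G'
complementBase 'G' = Just 'C'
complementBase 'T' = Just 'A'
complementBase _   = Nothing

-- Complement every base; the result is Nothing if any base is invalid.
complement :: ByteString -> Maybe ByteString
complement = fmap pack . traverse complementBase . unpack

-- Assumption: a reverse complement is the complement read backwards.
reverseComplement :: ByteString -> Maybe ByteString
reverseComplement = fmap LC.reverse . complement
```

Going through unpack builds an intermediate String, which is fine for a sketch but may be slow on very large sequences. Note also that any stray character in the input, such as a newline or a lowercase base, makes the whole result Nothing.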