Pattern doesn't match in Haskell ByteString

Time:12-15

I am writing a DNA translator in Haskell (using bytestrings in particular). I have the following code:

import Data.Maybe
import Data.Monoid ((<>))
import System.Environment

import qualified Data.ByteString as B
import Data.ByteString.Lazy.Char8 (ByteString, singleton, splitWith)
import qualified Data.ByteString.Lazy as LB
  
-- Extract DNA sequence from fasta file
xtractDNA :: [ByteString] -> Maybe ByteString
xtractDNA dna = Just (LB.concat dna)
--xtractDNA = foldr ((<>) . Just) Nothing 

-- Reverse Complement DNA
compStrand :: Maybe ByteString -> Maybe ByteString
compStrand = foldr ((<>) . compPairs) Nothing
  where
    compPairs nt | nt == (singleton 'A') = Just (singleton 'T')
                 | nt == (singleton 'T') = Just (singleton 'A')
                 | nt == (singleton 'G') = Just (singleton 'C')
                 | nt == (singleton 'C') = Just (singleton 'G')
                 | otherwise = Nothing


main :: IO ()
main = do
  putStrLn "Welcome to volcano"
  let fname = "/home/russellb/Development/hs_devel/local_data/shbg.fasta"
  fid <- LB.readFile fname
  let dna = LB.concat $ tail (LB.splitWith (==10) fid)
  --putStrLn $ show (LB.length (head dna))
  let dsDna = compStrand (Just dna)
  print dsDna

When I run it, I get Nothing as the result. Part of the input is:

"AATTCTCCATGTGCTTGGATCGTGGGGAAGATGTGATTAAGGTCTAAGGTATGTCTTCCACCAGACAACGGACACAGTCAATTAGAAGCTGGGTAAAGGGGTCTCTCCTGCGGAGCGGGGAGCGCCAAGCCAGGGACAATAATGGCCTGAAGTTCATTCTCCCGGAGATGGGGGTAGAAGCAGGTGCAGGTGCCTTAGAGGGGTCAAAAATAAGAGGAACAGGGTTCACTCTAAGCGGTCTCCCAGGGAAGGCTGCGGGTTGGAGCAAGGGTCCAAGATTCTAAGGGCCAGGACTCAGCTCCAGAAGCTCGATCCCGCCCCACGCGTTCCTGCTCCGGCCAGGGGAGGGGGCTAAGGACCGGCGTCCCCAGTCGGCGCGCCGTCTCACCTTGTAGAAGGCCCCGTTGGAGCCGCGCACCTCGACGGGCAGTCCCGGCTCCACATCCCCCCCAGAGGCCAGGCCGCCCATGGCGCCGCCACCGCCTCCGACTCCCCCGGCGGCGGCTGCAGCAGCAGTCTGAGTGCGGGCCGGGCCAGGCCCCCGGCGTCTCCCCGGAGGAGGAGCCGGAGGGGGAGCCGCGGGGGGCGGGAGCCGGGCCGGCCCCACGGCGGCCCTGCCACAGCCAACGAGCAGGGGGCCGGGGCCGGGCCGCTCCCCGTCCGCCGCCGCCGCCTTGGTCTCCGCC...ACAAGGTCAGAGGCTGGATGTGGACCAGGCCCTGAACAGAAGCCATGAGATCTGGACTCACAGCTGCCCCCAGAGCCCAGGCAATGGCACTGACGCTTCCCATTAAAGCTCCACCTAAGAACCCCC"

I suspect that my pattern-matching guards have a problem. How can I figure out what is wrong and fix it? Any insights would be much appreciated.

CodePudding user response:

compStrand is a very strange function. Why does it take a Maybe ByteString instead of an actual ByteString when you only pass it a Just? Why does it do this wacky foldr ((<>) . compPairs) Nothing thing which is just a very hard-to-read reimplementation of (>>= compPairs)?

main is not much clearer: If your input is GATC strings, why are you splitting on the first byte equal to 10? Why doesn't it use this xtractDNA function you wrote for it?
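For context, byte 10 is the newline character, so the split in main appears to break the file into lines, with tail dropping the first (header) line. A minimal sketch of that step written with Char8 functions is shown here. It assumes header lines start with '>' and that the rest of the file is sequence data; the name fastaSequence is hypothetical and not part of the original code.

```haskell
import qualified Data.ByteString.Lazy.Char8 as LC

-- Hypothetical helper: drop FASTA header lines and join the sequence lines.
-- Assumes header lines start with '>' and line endings are "\n" or "\r\n".
fastaSequence :: LC.ByteString -> LC.ByteString
fastaSequence =
  LC.concat
    . map (LC.filter (/= '\r'))   -- strip carriage returns from Windows-style files
    . filter (not . isHeader)     -- discard header lines
    . LC.lines                    -- split on '\n' (byte 10)
  where
    isHeader l = LC.take 1 l == LC.pack ">"
```

Unlike tail, this does not fail on an empty file, and the intent (skip the header, keep the sequence) is visible in the code.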

Resolving these questions will lead to a simpler function that you can debug more easily on your own. But I will note that it seems like your compStrand function operates only on singleton strings, and yet you seem to be passing it a string of unknown size.

CodePudding user response:

You are using foldr with Maybe as the Foldable, not the ByteString. It therefore inspects the Maybe ByteString: for a Just, it calls compPairs once with the entire ByteString of DNA; for Nothing, it returns Nothing.

Your compPairs returns Nothing for any ByteString that is empty or has two or more bytes, so the overall result is Nothing.
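The behaviour described above can be reproduced in isolation. The sketch below copies the guards from the question and compares the original fold with the (>>= compPairs) form; the names compStrandOrig, viaBind and wholeStrand are hypothetical, introduced only for this demonstration.

```haskell
import Data.ByteString.Lazy.Char8 (ByteString, pack, singleton)

-- Guards copied from the question: only a one-character string can match.
compPairs :: ByteString -> Maybe ByteString
compPairs nt
  | nt == singleton 'A' = Just (singleton 'T')
  | nt == singleton 'T' = Just (singleton 'A')
  | nt == singleton 'G' = Just (singleton 'C')
  | nt == singleton 'C' = Just (singleton 'G')
  | otherwise           = Nothing

-- The fold from the question. The container being folded is the Maybe,
-- so compPairs is called at most once, with the whole ByteString.
compStrandOrig :: Maybe ByteString -> Maybe ByteString
compStrandOrig = foldr ((<>) . compPairs) Nothing

-- The same behaviour written with bind.
viaBind :: Maybe ByteString -> Maybe ByteString
viaBind = (>>= compPairs)

-- A four-base strand matches none of the singleton guards, so this is Nothing.
wholeStrand :: Maybe ByteString
wholeStrand = compStrandOrig (Just (pack "GATC"))
```

Evaluating compStrandOrig (Just (pack "A")) in GHCi gives Just "T", while any longer input falls through to the otherwise branch.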

You can use mapM :: Monad m => (a -> m b) -> [a] -> m [b] to construct a Maybe [Char] (the Char8 unpack yields a String, not a [Word8]) and then convert it back to a ByteString:

import Data.ByteString.Lazy.Char8 (ByteString, pack, unpack)

compStrand :: Maybe ByteString -> Maybe ByteString
compStrand = (>>= fmap pack . mapM comPairs . unpack)
    where comPairs 'A' = Just 'T'
          comPairs 'C' = Just 'G'
          comPairs 'G' = Just 'C'
          comPairs 'T' = Just 'A'
          comPairs _ = Nothing
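Since the question only ever passes a Just, a variant that takes a plain ByteString may be easier to test. The following is a sketch under that assumption, not the answer's own code: complementBase, complement and reverseComplement are hypothetical names, and the reversal step is added only because the question's comment says "Reverse Complement".

```haskell
import Data.ByteString.Lazy.Char8 (ByteString)
import qualified Data.ByteString.Lazy.Char8 as LC

-- Complement of a single nucleotide; Nothing for any other character.
complementBase :: Char -> Maybe Char
complementBase 'A' = Just 'T'
complementBase 'C' = Just 'G'
complementBase 'G' = Just 'C'
complementBase 'T' = Just 'A'
complementBase _   = Nothing

-- Complement every base. A single unexpected character (for example a
-- newline or a lowercase letter) makes the whole result Nothing.
complement :: ByteString -> Maybe ByteString
complement = fmap LC.pack . traverse complementBase . LC.unpack

-- Complement the strand, then reverse it.
reverseComplement :: ByteString -> Maybe ByteString
reverseComplement = fmap LC.reverse . complement
```

If this still returns Nothing on real data, the input probably contains a character outside A, C, G and T, such as a leftover line break or an N base.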