
(Solved) : Implement Two Algorithms C Programming Needleman Wunsch Algorithm Computing Optimal Global Q44161512 . . .

Implement two algorithms in C programming:
i) Needleman-Wunsch algorithm for computing OPTIMAL GLOBAL ALIGNMENT, and
ii) Smith-Waterman algorithm for computing OPTIMAL LOCAL ALIGNMENT,

both using an affine gap penalty function between two input DNA sequences, s1 and s2, of lengths m and n respectively.

Each cell of your Dynamic Programming table (“DP table”) should have the following structure:

    struct DP_cell {
        int score;
        /* … add any other field(s) that you may need for the implementation */
    };
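One way to fill such a table under an affine gap penalty is to keep three scores per cell, one for each kind of last column (substitution, deletion, insertion). The sketch below is illustrative only: it assumes a gap of length k costs h + k*g, and the extra fields `del_score` and `ins_score`, the helper `max3`, and the function `affine_score` are hypothetical names that do not come from the assignment. It computes only the optimal score (no traceback); a non-zero `local` flag switches from Needleman-Wunsch to Smith-Waterman behaviour.

```c
#include <limits.h>
#include <stdlib.h>
#include <string.h>

#define NEG_INF (INT_MIN / 2)  /* "minus infinity" that tolerates added penalties */

typedef struct DP_cell {
    int score;      /* S: best score with s1[i] aligned to s2[j] (field from the assignment) */
    int del_score;  /* D: best score with s1[i] aligned to a gap (assumed extra field) */
    int ins_score;  /* I: best score with s2[j] aligned to a gap (assumed extra field) */
} DP_cell;

static int max3(int a, int b, int c)
{
    int m = a > b ? a : b;
    return m > c ? m : c;
}

/* Optimal affine-gap alignment score of s1 and s2.
   ma/mi: match/mismatch scores; h/g: gap opening/extension penalties.
   local == 0: global (Needleman-Wunsch); local != 0: local (Smith-Waterman).
   Returns NEG_INF if the table cannot be allocated. */
int affine_score(const char *s1, const char *s2,
                 int ma, int mi, int h, int g, int local)
{
    size_t m = strlen(s1), n = strlen(s2), w = n + 1;
    DP_cell *T = malloc((m + 1) * w * sizeof *T);
    int best;

    if (!T) return NEG_INF;

    T[0] = (DP_cell){0, NEG_INF, NEG_INF};
    for (size_t i = 1; i <= m; i++)   /* first column: prefix of s1 against gaps */
        T[i * w] = local ? T[0] : (DP_cell){NEG_INF, h + (int)i * g, NEG_INF};
    for (size_t j = 1; j <= n; j++)   /* first row: prefix of s2 against gaps */
        T[j] = local ? T[0] : (DP_cell){NEG_INF, NEG_INF, h + (int)j * g};

    best = local ? 0 : NEG_INF;
    for (size_t i = 1; i <= m; i++) {
        for (size_t j = 1; j <= n; j++) {
            const DP_cell *dg = &T[(i - 1) * w + (j - 1)];
            const DP_cell *up = &T[(i - 1) * w + j];
            const DP_cell *lf = &T[i * w + (j - 1)];
            DP_cell *c = &T[i * w + j];
            int sub = (s1[i - 1] == s2[j - 1]) ? ma : mi;

            c->score = max3(dg->score, dg->del_score, dg->ins_score) + sub;
            if (local && c->score < 0) c->score = 0;  /* restart the local alignment */
            /* extending a gap costs g; opening one costs h + g */
            c->del_score = max3(up->score + h + g, up->del_score + g, up->ins_score + h + g);
            c->ins_score = max3(lf->score + h + g, lf->ins_score + g, lf->del_score + h + g);
            if (local && c->score > best) best = c->score;
        }
    }
    if (!local)
        best = max3(T[m * w + n].score, T[m * w + n].del_score, T[m * w + n].ins_score);

    free(T);
    return best;
}
```

A full solution would also store enough information to retrace one optimal path, for example by re-deriving which of the three candidates produced each maximum.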

At the start of the program, you should read the alignment score parameters from a user-specified input file (optional). The default name of the file, if the user does not specify one, should be “parameters.config” in the present working directory. The parameters.config file should allow the user to specify one scoring parameter in each line (space or tab delimited). For example:

    match    1
    mismatch    -2
    h -5
    g -1
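
A file in this format could be read one line at a time with `sscanf`, which treats spaces and tabs alike. The following is a minimal sketch, not a prescribed design: the `ScoreParams` struct and the functions `parse_param_line` and `load_params` are hypothetical names introduced for illustration.

```c
#include <stdio.h>
#include <string.h>

typedef struct {
    int match;     /* score for a matching column */
    int mismatch;  /* score for a mismatching column */
    int h;         /* gap opening penalty */
    int g;         /* gap extension penalty */
} ScoreParams;

/* Parse one "name value" line.
   Returns 1 if a known parameter was set, 0 otherwise. */
int parse_param_line(const char *line, ScoreParams *p)
{
    char key[32];
    int value;

    if (sscanf(line, "%31s %d", key, &value) != 2)
        return 0;                       /* blank or malformed line */
    if (strcmp(key, "match") == 0)         p->match = value;
    else if (strcmp(key, "mismatch") == 0) p->mismatch = value;
    else if (strcmp(key, "h") == 0)        p->h = value;
    else if (strcmp(key, "g") == 0)        p->g = value;
    else return 0;                      /* unknown parameter name */
    return 1;
}

/* Load parameters from `path`, or from "parameters.config" when path is NULL.
   Returns the number of parameters recognized, or -1 if the file cannot be opened. */
int load_params(const char *path, ScoreParams *p)
{
    char line[256];
    int found = 0;
    FILE *fp = fopen(path ? path : "parameters.config", "r");

    if (!fp) return -1;
    while (fgets(line, sizeof line, fp))
        found += parse_param_line(line, p);
    fclose(fp);
    return found;
}
```

A caller could treat any return value other than 4 from `load_params` as an incomplete configuration.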

The command prompt usage for your program should look as follows:

    $ <executable name> <input file containing both s1 and s2> <0: global, 1: local> <optional: path to parameters config file>
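
The three arguments above could be validated before any file is opened. This is only a sketch under assumptions: the `Options` struct and the `parse_args` function are hypothetical names, and rejecting anything other than "0" or "1" for the mode is one possible policy.

```c
#include <string.h>

typedef struct {
    const char *fasta_path;   /* input file containing both s1 and s2 */
    int local;                /* 0: global alignment, 1: local alignment */
    const char *config_path;  /* parameters file, defaulting to parameters.config */
} Options;

/* Fill `opt` from the command line.
   Returns 0 on success, -1 if the arguments do not match the expected usage. */
int parse_args(int argc, char **argv, Options *opt)
{
    if (argc < 3 || argc > 4)
        return -1;
    if (strcmp(argv[2], "0") == 0)
        opt->local = 0;
    else if (strcmp(argv[2], "1") == 0)
        opt->local = 1;
    else
        return -1;            /* mode must be exactly 0 or 1 */
    opt->fasta_path = argv[1];
    opt->config_path = (argc == 4) ? argv[3] : "parameters.config";
    return 0;
}
```

In `main`, a return value of -1 would typically lead to printing the usage line and exiting with a non-zero status.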

Input File Formats:

The two input sequences should be given as input in one text file. The text file should be in what is called the “multi-sequence FASTA format”, which is as follows:

  • The format allows the file to contain any number of sequences, although in this programming project you will have only two sequences as input.
  • Each sequence starts with a HEADER line, which has the sequence name in it. This header line always starts with the “>” symbol, immediately followed (without any whitespace character) by a word that serves as the unique identifier (or name) for that sequence. Whatever follows the first whitespace character after the identifier is a don’t-care and can be ignored in your program.
  • The header line is followed by the actual DNA sequence, which is a string over the alphabet {a,c,g,t}. The sequence can span multiple lines, and each line can have a variable number of characters (but no whitespace or any other special characters).

For example:

An input file called “input.fasta” could look like:

    >s1 sequence
    acatgctacacgtactccgataccccgtaaccgataacgatacacagacct
    cgtacgcttgctacaacgtactctataaccgagaacgattgaca
    tgcctcgtacacatgctacacgtactccgatgaccccgt

    >s2 sequence
    acattctacgaacctctcgataaccccataaccgataacgattgacacctcgt
    acgctttctacaacttactctctcgataaccccataaccgataacgattgacacctc
    gtacacatggtacatacgtactctcgataccccgt
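
A reader for this format can process the file line by line: a line starting with “>” opens a new sequence and supplies its name, and every other line is appended to the current sequence. The sketch below is one possible approach, with assumptions: `FastaSeq`, `append_base`, `fasta_feed_line`, and `read_fasta` are hypothetical names, whitespace inside sequence lines is simply skipped, and each line of the file is assumed to fit in the 4096-byte buffer.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

typedef struct {
    char name[128];   /* identifier taken from the header line */
    char *bases;      /* NUL-terminated sequence, grown on demand */
    size_t len, cap;
} FastaSeq;

/* Append one character to a sequence, doubling the buffer when full. */
static int append_base(FastaSeq *s, char c)
{
    if (s->len + 1 >= s->cap) {
        size_t ncap = s->cap ? s->cap * 2 : 64;
        char *p = realloc(s->bases, ncap);
        if (!p) return -1;
        s->bases = p;
        s->cap = ncap;
    }
    s->bases[s->len++] = c;
    s->bases[s->len] = '\0';
    return 0;
}

/* Process one line. `count` is the number of sequences seen so far.
   Returns the updated count, or -1 on error (too many sequences, out of memory). */
int fasta_feed_line(FastaSeq seqs[], int max_seqs, int count, const char *line)
{
    if (line[0] == '>') {                       /* header: start a new sequence */
        if (count >= max_seqs) return -1;
        memset(&seqs[count], 0, sizeof seqs[count]);
        sscanf(line + 1, "%127s", seqs[count].name);  /* stops at first whitespace */
        return count + 1;
    }
    if (count == 0) return 0;                   /* ignore text before the first header */
    for (const char *p = line; *p; p++) {
        if (isspace((unsigned char)*p)) continue;
        if (append_base(&seqs[count - 1], *p) != 0) return -1;
    }
    return count;
}

/* Read all sequences from an open stream; returns how many were read, or -1. */
int read_fasta(FILE *fp, FastaSeq seqs[], int max_seqs)
{
    char line[4096];
    int count = 0;

    while (count >= 0 && fgets(line, sizeof line, fp))
        count = fasta_feed_line(seqs, max_seqs, count, line);
    return count;
}
```

The caller owns each `bases` buffer and should `free` it once the alignment has been printed.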

At completion, the program should output/print the following information:

  • Parameter values from the parameters.config file: match score, mismatch penalty, gap penalties (h, g)
  • Display the “names” or identifiers of the two sequences aligned
  • The final optimal score
  • For global alignment: display any one optimal alignment (using optimal path retrace) such that the alignment is shown wrapped, with each line containing at most 60 aligning positions.
  • Along with each alignment display, report the corresponding numbers of matches, mismatches, gaps (insertions + deletions), and opening gaps.
  • For local alignment: again, display any one optimal-scoring local alignment.
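
The counts requested in the list above can be derived from the two gapped strings produced by the path retrace. The sketch below assumes gaps are written as '-' and that a run of gaps in one string immediately followed by a run in the other counts as two openings; `AlignStats`, `count_alignment`, and `alignment_score` are hypothetical names. Recomputing the score from the counts is a convenient cross-check against the value in the DP table.

```c
#include <stddef.h>

typedef struct {
    int matches;
    int mismatches;
    int gaps;          /* total gap columns (insertions + deletions) */
    int opening_gaps;  /* number of maximal gap runs */
} AlignStats;

/* Count column types in two equal-length gapped strings ('-' marks a gap). */
AlignStats count_alignment(const char *a1, const char *a2)
{
    AlignStats st = {0, 0, 0, 0};
    char prev = 'M';   /* previous column: 'M' pair, 'D' gap in a2, 'I' gap in a1 */

    for (size_t k = 0; a1[k] && a2[k]; k++) {
        char cur = (a1[k] == '-') ? 'I' : (a2[k] == '-') ? 'D' : 'M';

        if (cur == 'M') {
            if (a1[k] == a2[k]) st.matches++;
            else st.mismatches++;
        } else {
            st.gaps++;
            if (cur != prev) st.opening_gaps++;  /* a new gap run starts here */
        }
        prev = cur;
    }
    return st;
}

/* Score implied by the counts, assuming a gap of length k costs h + k*g. */
int alignment_score(const AlignStats *st, int ma, int mi, int h, int g)
{
    return ma * st->matches + mi * st->mismatches
         + h * st->opening_gaps + g * st->gaps;
}
```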

All output should be to the standard output.

Example Input/Output:

In the expected output, the alignment is wrapped after every 60 alignment positions, and on both sides the starting and ending indices of the aligning positions in the respective strings should be displayed. A pipe symbol “|” is shown in every column where there is a match. (Other columns contain a whitespace character.)
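
The wrapping rule can be sketched as a loop over 60-column blocks that tracks, for each string, the 1-based index of the next base to be printed. The exact column layout of the reference output is not available here, so the field widths and the "s1"/"s2" labels below are assumptions, as are the names `match_line`, `count_bases`, and `print_alignment`. Both gapped strings are assumed to have the same length, with '-' marking a gap.

```c
#include <stdio.h>
#include <string.h>

#define WRAP 60   /* aligning positions per output line */

/* Write '|' for matching columns and ' ' elsewhere into `out` (len + 1 bytes). */
size_t match_line(const char *a1, const char *a2, size_t len, char *out)
{
    size_t k;
    for (k = 0; k < len; k++)
        out[k] = (a1[k] == a2[k] && a1[k] != '-') ? '|' : ' ';
    out[k] = '\0';
    return k;
}

/* Number of non-gap characters in the first `len` columns of `a`. */
size_t count_bases(const char *a, size_t len)
{
    size_t n = 0;
    for (size_t k = 0; k < len; k++)
        if (a[k] != '-') n++;
    return n;
}

/* Print the alignment in blocks of WRAP columns.
   start1/start2 are the 1-based positions of the first aligned base in s1/s2
   (1 for a global alignment; taken from the traceback for a local one). */
void print_alignment(FILE *out, const char *a1, const char *a2,
                     size_t start1, size_t start2)
{
    size_t total = strlen(a1);
    size_t pos1 = start1, pos2 = start2;
    char bars[WRAP + 1];

    for (size_t off = 0; off < total; off += WRAP) {
        size_t len = (total - off < WRAP) ? total - off : WRAP;
        size_t n1 = count_bases(a1 + off, len);
        size_t n2 = count_bases(a2 + off, len);

        match_line(a1 + off, a2 + off, len, bars);
        /* if a block holds only gaps for one string, its end index is start - 1 */
        fprintf(out, "%-4s%6zu  %.*s  %zu\n", "s1", pos1, (int)len, a1 + off, pos1 + n1 - 1);
        fprintf(out, "%12s%s\n", "", bars);
        fprintf(out, "%-4s%6zu  %.*s  %zu\n\n", "s2", pos2, (int)len, a2 + off, pos2 + n2 - 1);
        pos1 += n1;
        pos2 += n2;
    }
}
```

Calling `print_alignment(stdout, a1, a2, 1, 1)` would print a global alignment; redirecting standard output then captures it in a file.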

After implementing and testing, run your program on the following two input files, redirect the standard output into global alignment and local alignment output text files, and attach them as part of your submission:

Opsin1_colorblindness_gene.fasta: Contains two sequences - Opsin1 gene in human vs. Opsin1 gene in mouse (this is one of the genes responsible for color blindness)

Human-Mouse-BRCA2-cds.fasta: Contains two sequences – BRCA2 gene in human vs. BRCA2 gene in mouse (this is one of the breast cancer genes)

You are welcome to create your own additional test cases using small genes from the NCBI GenBank data (e.g., genes related to color blindness from human vs. mouse).

Additional references: NCBI GenBank

Please help..
