Comp Sci

 

Q1) Write a program that asks the user for a file containing a FASTA nucleotide sequence (included is a file called sequence.fasta you can use). Then prompt the user to select from the following menu: 

  A. Calculate DNA composition: Prints to the screen the numbers of A, G, C and T nucleotides, and any unknowns (Ns). 
  B. Calculate AT content: Prints to the screen the percentage of AT in the sequence. 
  C. Calculate GC content: Prints to the screen the percentage of GC in the sequence. 
  D. Complement: Prints to the screen the complement of the DNA sequence. 
  E. Reverse complement: Prints to the screen the reverse complement of the DNA sequence. 

Each menu item above should be implemented in its own function, which is called when the user selects the respective menu item. Each function should accept the DNA sequence as an argument and then perform the appropriate calculation. 
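
One possible shape for the five Q1 menu functions is sketched below. This is a minimal sketch, not a model answer: the function names, the MENU dictionary and run_choice are hypothetical, the functions return values instead of printing so they are easy to check, and any character other than A, C, G, T or N is left unchanged by the complement.

```python
# Hypothetical helper names; each function takes the DNA sequence as its argument.

def composition(seq):
    """Count A, G, C, T and N (unknown) bases, ignoring case."""
    seq = seq.upper()
    return {base: seq.count(base) for base in "AGCTN"}

def at_content(seq):
    """Percentage of A and T bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return 100 * (seq.count("A") + seq.count("T")) / len(seq) if seq else 0.0

def gc_content(seq):
    """Percentage of G and C bases in the sequence (0.0 for an empty sequence)."""
    seq = seq.upper()
    return 100 * (seq.count("G") + seq.count("C")) / len(seq) if seq else 0.0

# Translation table mapping each base to its complement; N stays N.
_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def complement(seq):
    """Complement of the sequence, read in the same direction."""
    return seq.upper().translate(_COMPLEMENT)

def reverse_complement(seq):
    """Complement of the sequence, read in the opposite direction."""
    return complement(seq)[::-1]

# Menu letter -> function, so a selection is dispatched with one lookup.
MENU = {
    "A": composition,
    "B": at_content,
    "C": gc_content,
    "D": complement,
    "E": reverse_complement,
}

def run_choice(choice, seq):
    """Call the function for a menu letter; raises KeyError for an unknown letter."""
    return MENU[choice.upper()](seq)
```

A caller would read the menu letter with input(), pass it to run_choice together with the sequence, and print the returned value.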

Input validation: Check to see that the file name entered by the user exists AND that the sequence is in FASTA format. You can assume that there is only one sequence in the file. 
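
The input validation described above could be sketched as follows. The names parse_fasta and read_fasta are hypothetical, and the format check is an assumption about what "in FASTA format" should mean here: the first non-blank line starts with ">" and the remaining lines contain only A, C, G, T or N.

```python
import os

def parse_fasta(text):
    """Return (header, sequence) from the text of a single-record FASTA file.

    Raises ValueError if the text does not look like nucleotide FASTA.
    """
    lines = [line.strip() for line in text.splitlines() if line.strip()]
    if not lines or not lines[0].startswith(">"):
        raise ValueError("Not FASTA: the first non-blank line must start with '>'")
    sequence = "".join(lines[1:]).upper()
    if not sequence:
        raise ValueError("Not FASTA: no sequence lines after the header")
    unexpected = set(sequence) - set("ACGTN")
    if unexpected:
        raise ValueError(f"Unexpected characters in sequence: {sorted(unexpected)}")
    return lines[0][1:], sequence

def read_fasta(path):
    """Check that the file exists, then parse it with parse_fasta."""
    if not os.path.isfile(path):
        raise FileNotFoundError(f"No such file: {path}")
    with open(path) as handle:
        return parse_fasta(handle.read())
```

A main program could call read_fasta inside a loop, catching FileNotFoundError and ValueError and asking the user for another file name until one is accepted.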

Q2) Write a program that asks the user for a file containing a FASTA nucleotide sequence (you can use the same sequence.fasta file as above). Then prompt the user to select a frame (number 1 through 6). Your program should then find the translation (protein sequence) of the nucleotide sequence in that frame. Print the translation to the screen. 

Input validation: Check to see that the file name entered by the user exists AND that the sequence is in FASTA format. You can assume that there is only one sequence in the file. 
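
A sketch of the frame translation for Q2 follows. It assumes the standard genetic code and a common frame convention that the assignment does not spell out: frames 1 to 3 read the given strand starting at bases 1, 2 and 3, and frames 4 to 6 read the reverse complement the same way. The function names are hypothetical; stop codons are shown as "*" and codons containing N as "X".

```python
# Standard genetic code, built from the conventional TCAG ordering.
_BASES = "TCAG"
_AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
          "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = dict(zip(
    (a + b + c for a in _BASES for b in _BASES for c in _BASES),
    _AMINO,
))

_COMPLEMENT = str.maketrans("ACGTN", "TGCAN")

def reverse_complement(seq):
    """Reverse complement of an upper-case DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]

def translate_frame(seq, frame):
    """Translate seq in reading frame 1-6 (assumed convention, see lead-in)."""
    if frame not in range(1, 7):
        raise ValueError("frame must be a number from 1 to 6")
    seq = seq.upper()
    if frame > 3:
        # Frames 4-6 are read from the opposite strand.
        seq = reverse_complement(seq)
    offset = (frame - 1) % 3
    protein = []
    # Step through complete codons only; trailing 1-2 bases are ignored.
    for i in range(offset, len(seq) - 2, 3):
        protein.append(CODON_TABLE.get(seq[i:i + 3], "X"))
    return "".join(protein)
```

For example, translate_frame("ATGGCCTAA", 1) reads the codons ATG, GCC and TAA and returns "MA*".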

Q3) Write a program that asks the user for a sequence in GenBank format (included is a file called sequence.gb that you can use). Your program should convert the GenBank formatted sequence into FASTA format. Write the FASTA formatted sequence to a file, the name of which should include the accession number (e.g. NM_001250672.txt, where NM_001250672 is the accession number). 
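
The conversion in Q3 could be sketched as below, assuming the usual GenBank flat-file layout: an ACCESSION line, a DEFINITION line, and the sequence between the ORIGIN line and the closing "//". The function names are hypothetical, and only the first line of a multi-line DEFINITION is kept, which is a simplification.

```python
def genbank_to_fasta(text, width=70):
    """Return (accession, fasta_text) for the text of one GenBank record."""
    accession = None
    definition = ""
    seq_parts = []
    in_origin = False
    for line in text.splitlines():
        if line.startswith("ACCESSION"):
            fields = line.split()
            if len(fields) > 1:
                accession = fields[1]
        elif line.startswith("DEFINITION"):
            definition = line[len("DEFINITION"):].strip()
        elif line.startswith("ORIGIN"):
            in_origin = True
        elif line.startswith("//"):
            in_origin = False
        elif in_origin:
            # Sequence lines look like "   61 acgtac gtacgt ..."; keep letters only.
            seq_parts.append("".join(ch for ch in line if ch.isalpha()))
    if accession is None or not seq_parts:
        raise ValueError("No ACCESSION line or no ORIGIN sequence found")
    sequence = "".join(seq_parts).upper()
    header = f">{accession} {definition}".rstrip()
    body = "\n".join(sequence[i:i + width] for i in range(0, len(sequence), width))
    return accession, f"{header}\n{body}\n"

def write_fasta(genbank_path):
    """Convert a GenBank file and write <accession>.txt in the current directory."""
    with open(genbank_path) as handle:
        accession, fasta = genbank_to_fasta(handle.read())
    out_path = f"{accession}.txt"
    with open(out_path, "w") as out:
        out.write(fasta)
    return out_path
```

Keeping the parsing in genbank_to_fasta, separate from the file handling in write_fasta, makes the conversion easy to check on a short string before running it on sequence.gb.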

Q4) Write a program that asks the user for a file containing a nucleotide sequence AND the name of a restriction enzyme. Your program should return the positions in the sequence where the enzyme cuts. Parse out the enzymes and their cut sites from the attached RestrictionEnzymes.txt file. 
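
A sketch of the cut-site search for Q4 follows. The layout of RestrictionEnzymes.txt is not shown here, so the parser assumes a hypothetical format: one enzyme per line, the name followed by whitespace and a recognition site with "^" marking the cut, for example "EcoRI G^AATTC". Adjust parse_enzymes to the real file. The search matches plain A/C/G/T sites on the given strand only and does not expand ambiguity codes such as N or R.

```python
def parse_enzymes(text):
    """Map enzyme name -> (site, cut_offset) from lines like 'EcoRI G^AATTC'.

    The line format is an assumption; lines that do not fit it are skipped.
    """
    enzymes = {}
    for line in text.splitlines():
        fields = line.split()
        if len(fields) < 2 or "^" not in fields[1]:
            continue
        name, pattern = fields[0], fields[1].upper()
        # cut_offset = number of site bases to the left of the cut.
        enzymes[name] = (pattern.replace("^", ""), pattern.index("^"))
    return enzymes

def find_cuts(seq, site, cut_offset):
    """Return 1-based positions of the base immediately before each cut."""
    seq = seq.upper()
    positions = []
    start = seq.find(site)
    while start != -1:
        positions.append(start + cut_offset)
        # Advance by one base so overlapping sites are also found.
        start = seq.find(site, start + 1)
    return positions
```

With this convention, a returned position of 3 means the enzyme cuts between bases 3 and 4 of the sequence.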

Q5) Read in a whole genome (a FASTA-format file called genome.txt, see attached) and compute the background codon frequencies. The background frequency of a codon is computed by the formula: background_frq(codon) = 100 * N(codon) / Total_codons, where N(codon) is the number of occurrences of the codon across the entire genome, and Total_codons is the total number of all codons in the whole genome. Print out the background frequency of each codon, from AAA to TTT. Use a dictionary in your solution. Your program should count codons that appear in all reading frames and then calculate and display the average.
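
A dictionary-based sketch of the Q5 calculation follows. It rests on one reading of the last requirement, which is an assumption: the formula is applied separately to each of the three forward reading frames, and the per-frame percentages are then averaged for each codon. Codons containing anything other than A, C, G or T are left out of both N(codon) and Total_codons. The function names are hypothetical.

```python
from itertools import product

# All 64 codons in alphabetical order, AAA first and TTT last.
CODONS = ["".join(p) for p in product("ACGT", repeat=3)]

def frame_frequencies(seq, offset):
    """Percentage of each codon when reading seq from the given 0-based offset."""
    counts = dict.fromkeys(CODONS, 0)
    total = 0
    for i in range(offset, len(seq) - 2, 3):
        codon = seq[i:i + 3]
        if codon in counts:  # skips codons with N or other symbols
            counts[codon] += 1
            total += 1
    return {c: (100 * n / total if total else 0.0) for c, n in counts.items()}

def background_frequencies(seq, frames=(0, 1, 2)):
    """Average the per-frame codon percentages over the chosen frames."""
    seq = seq.upper()
    per_frame = [frame_frequencies(seq, offset) for offset in frames]
    return {c: sum(f[c] for f in per_frame) / len(per_frame) for c in CODONS}

def print_frequencies(freqs):
    """Print one line per codon, from AAA to TTT."""
    for codon in CODONS:
        print(f"{codon}  {freqs[codon]:.3f}")
```

If the reverse strand is also meant to count, the same functions can be applied to the reverse complement and the six per-frame results averaged instead.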