Bioinformatics


  This assignment requires screenshots for the answers.

 

Question 1.

Query the NCBI database using the “GeneID” provided and obtain the FASTA-formatted AQP7 protein sequences for:

Homo sapiens -> Human -> GeneID: 364 

Pan troglodytes -> Chimpanzee -> GeneID: 465043 

Mus musculus -> Mouse -> GeneID: 11832

Rattus norvegicus -> Rat -> GeneID: 29171

Bos taurus -> Cow -> GeneID: 615498

Danio rerio -> Zebrafish -> GeneID: 334529

Canis lupus familiaris -> Dog -> GeneID: 474742

Sus scrofa -> Pig -> GeneID: 100126283

Equus caballus -> Horse -> GeneID: 100068324

Mustela putorius furo -> Ferret -> GeneID: 101683120

Mesocricetus auratus -> Hamster -> GeneID: 101837538

Myotis brandtii -> Bat -> GeneID: 102263763 

For the descriptive comment line found at the beginning of each sequence (the line starting with “>”), replace it with the common name provided. Copy and paste the sequences below, in the exact order listed above, as your answer.
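The header replacement can also be scripted rather than done by hand. The following is a minimal sketch, assuming the twelve records have already been downloaded and concatenated into one FASTA string in the order listed above. The function name `relabel_fasta` and the placeholder records in the demo are hypothetical and are not real NCBI output.

```python
def relabel_fasta(fasta_text, common_names):
    """Replace each '>' header line, in order, with '>' plus a common name."""
    lines = [ln.strip() for ln in fasta_text.splitlines() if ln.strip()]
    n_headers = sum(1 for ln in lines if ln.startswith(">"))
    if n_headers != len(common_names):
        raise ValueError(
            f"found {n_headers} FASTA records but {len(common_names)} names"
        )
    names = iter(common_names)
    out = []
    for ln in lines:
        if ln.startswith(">"):
            out.append(">" + next(names))  # new header: common name only
        else:
            out.append(ln)                 # sequence line, unchanged
    return "\n".join(out) + "\n"


# Placeholder input (made-up headers and residues, for illustration only).
demo = (
    ">placeholder_record_1 some long description\n"
    "MVQA\n"
    "SLG\n"
    ">placeholder_record_2 another description\n"
    "MAPG\n"
)
print(relabel_fasta(demo, ["Human", "Chimpanzee"]))
```

For the full assignment, the name list would hold the twelve common names in the order given above, starting with Human and ending with Bat.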

Answer = ?

Question 2.

Using the sequences prepared as your answer to Question 1, run the Clustal Omega tool (https://www.ebi.ac.uk/Tools/msa/clustalo/) to generate a multiple sequence alignment with “STEP 2” parameters set to “Pearson/FASTA”.

The Clustal Omega tool is a newer, improved successor to the ClustalW2 multiple sequence alignment tool. Under the “Alignments” tab of the Clustal Omega output, copy and paste the unedited form of the alignment as your answer.

Answer = ?

Question 3.

Under the “Results Summary” tab of the Clustal Omega output, click on the hyperlink found under “Percent Identity Matrix”. The identities in the returned matrix are the values used to “guide” the order in which the multiple alignment was built. Copy and paste this matrix as your answer.
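To get a feel for what a percent identity matrix contains, the sketch below computes pairwise identities from sequences that are already aligned. It assumes identity is counted only over columns where neither sequence has a gap; Clustal Omega's own convention may differ, so the numbers need not match the tool's output exactly. The function names and the toy alignment are hypothetical.

```python
def percent_identity(a, b, gap_chars="-."):
    """Percent of identical residues over columns where both rows are ungapped."""
    if len(a) != len(b):
        raise ValueError("aligned sequences must have equal length")
    matches = compared = 0
    for x, y in zip(a, b):
        if x in gap_chars or y in gap_chars:
            continue  # skip columns with a gap in either sequence
        compared += 1
        if x == y:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


def identity_matrix(aligned):
    """Return (names, matrix) for a dict mapping name -> aligned sequence."""
    names = list(aligned)
    matrix = [
        [percent_identity(aligned[r], aligned[c]) for c in names]
        for r in names
    ]
    return names, matrix


# Toy alignment (made-up residues, not AQP7).
toy = {"A": "MKT-LL", "B": "MKTALL", "C": "MRT-LV"}
names, matrix = identity_matrix(toy)
for name, row in zip(names, matrix):
    print(name, " ".join(f"{v:6.2f}" for v in row))
```

The matrix is symmetric with 100 on the diagonal, which is also the shape of the matrix returned by the tool.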

Answer = ?

Question 4.

Under the “Phylogenetic Tree” tab of the Clustal Omega output, scroll down and provide a screenshot of the “Phylogenetic Tree”, which represents the “Guide Tree” for the multiple sequence alignment generated.

Answer = ?

Question 5.

Using Jalview, what does the “overview” look like for the alignment you provided as your answer to Question 2?

Answer = ?

Question 6.

When you look at the “Overview” plot provided as your answer to Question 5, notice that some sequences in the alignment appear to have vertical gaps of missing color. In the “Local” view, you will notice that these sequences are dissimilar enough to negatively impact the level of “Conservation” across the alignment, so they can and should be removed. Which ones are they?
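The visual inspection can be cross-checked with a rough numerical heuristic. The sketch below assumes that the vertical gaps in the overview come from columns where only a minority of sequences carry a residue, and it counts, per sequence, how many such columns that sequence fills. The function name `insertion_load`, the "fewer than half" threshold, and the toy alignment are all hypothetical choices and are not part of Jalview.

```python
def insertion_load(aligned, gap_chars="-."):
    """Count, per sequence, the mostly-gap columns in which it has a residue."""
    names = list(aligned)
    if not names:
        return {}
    length = len(aligned[names[0]])
    if any(len(aligned[n]) != length for n in names):
        raise ValueError("aligned sequences must have equal length")
    load = {n: 0 for n in names}
    for i in range(length):
        with_residue = [n for n in names if aligned[n][i] not in gap_chars]
        # A column is "mostly gap" when fewer than half the rows have a residue.
        if 0 < len(with_residue) < len(names) / 2:
            for n in with_residue:
                load[n] += 1
    return load


# Toy alignment (made-up residues): s3 carries a three-residue insertion.
toy_msa = {
    "s1": "MK---TL",
    "s2": "MK---TL",
    "s3": "MKAAATL",
    "s4": "MR---TL",
}
print(insertion_load(toy_msa))
```

Sequences with a high count are candidates for removal, but the decision should still be confirmed against the “Local” view and the “Conservation” track.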

Answer = ?

Question 7.

What does the MSA “overview” look like after you remove the sequences identified in Question 6? VERY IMPORTANT: before removing anything, be sure that the sequences highlighted for removal are the ones you identified. Inspect the PCA and the MSA to confirm this. If you have removed them correctly, you should not see these sequences in your MSA afterwards.

Answer = ?

Question 8.

When looking over the answer to Question 7, “Conservation” may now be high after deleting those sequences, but empty columns are present. These are non-informative and need to be removed. What does the “overview” look like after removing these empty columns? VERY IMPORTANT: if a sequence in your MSA still appears quite different from the rest, go back to Question 6 and Question 7 and repeat.
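What removing empty columns does to the alignment can be illustrated with a short sketch. It assumes an “empty” column is one in which every remaining sequence has a gap character (“-” or “.”). The function name `drop_empty_columns` and the toy alignment are hypothetical.

```python
def drop_empty_columns(aligned, gap_chars="-."):
    """Remove columns in which every sequence has a gap character."""
    if not aligned:
        return {}
    seqs = list(aligned.values())
    length = len(seqs[0])
    if any(len(s) != length for s in seqs):
        raise ValueError("aligned sequences must have equal length")
    # Keep a column if at least one sequence has a residue there.
    keep = [
        i for i in range(length)
        if any(s[i] not in gap_chars for s in seqs)
    ]
    return {name: "".join(s[i] for i in keep) for name, s in aligned.items()}


# Toy alignment (made-up residues): columns 3 and 4 are gaps in every row.
gappy = {"A": "MK--T", "B": "MR--T", "C": "M---T"}
print(drop_empty_columns(gappy))
```

Columns that are gapped in only some sequences are kept, because they still carry information about the remaining sequences.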

Answer = ?

Question 9.

What are the final, edited sequences in the MSA at this point? VERY IMPORTANT: if you notice what appear to be gaps in your sequences, you have not removed all outlier sequences and need to go back to Question 6, Question 7, and Question 8 and repeat.

Answer = ?

Question 10.

When you have an MSA in FASTA format, you can also ask which secondary structures (i.e., “H” for helices, “E” for beta sheets) may exist in it. What are these structures, and where do they occur in the MSA?

Answer = ?