Sequence Alignment for Phylogenetic Analysis: Difference between revisions

From Bridges Lab Protocols
Added PhyloBayes information
Added info about FASTA code
Line 1: Line 1:
== Locate Sequences and Generate FASTA File ==


=== Generating a FASTA File ===
* FASTA format is described [https://zhanglab.ccmb.med.umich.edu/FASTA/ here] and [https://en.wikipedia.org/wiki/FASTA_format here]. Each sequence must start with a >SEQUENCENAME line, followed by a return and then the sequence (in this case, the protein sequence). An example of a FASTA file would be:
<code>
>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH
</code>
* Save sequences in Notepad, [https://notepad-plus-plus.org/ Notepad++] or [https://www.sublimetext.com/ Sublime Text] (not Word) as a <FILENAME>.fasta file.
* Sequence names cannot have spaces.  Generally it's better to name a sequence like mm_Gdf15-NM_004864.4, where mm indicates mouse, Gdf15 is the gene name and NM indicates a [https://www.ncbi.nlm.nih.gov/refseq/ RefSeq mRNA].  If there are multiple mRNAs for the gene, name them
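
The naming and layout rules above can be sketched in Python. This is only an illustration under stated assumptions: the helper names <code>make_sequence_name</code> and <code>format_fasta</code> are hypothetical, and the 60-character line width is simply what the example sequences above happen to use, not a requirement stated in this protocol.

```python
import re


def make_sequence_name(species, gene, accession):
    """Build a name such as mm_Gdf15-NM_004864.4 (species_gene-accession)."""
    name = f"{species}_{gene}-{accession}"
    # Sequence names cannot have spaces, so reject any whitespace.
    if re.search(r"\s", name):
        raise ValueError(f"sequence name contains whitespace: {name!r}")
    return name


def format_fasta(records, width=60):
    """Return FASTA text: a >NAME line, then the sequence split every `width` characters."""
    lines = []
    for name, sequence in records.items():
        lines.append(f">{name}")
        # Plain slicing keeps the wrapping deterministic for any sequence content.
        lines.extend(sequence[i:i + width] for i in range(0, len(sequence), width))
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    name = make_sequence_name("mm", "Gdf15", "NM_004864.4")
    # The sequence fragment is taken from the SEQUENCE_1 example above.
    print(format_fasta({name: "MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLL"}), end="")
```

Writing the returned text to a <FILENAME>.fasta file from a script sidesteps the word-processor formatting problems that the "not Word" warning is about.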


== Create Multiple Sequence Alignment using CLUSTAL Omega ==
Line 8: Line 31:
* Generate phylogenetic trees with [http://megasun.bch.umontreal.ca/People/lartillot/www/download.html PhyloBayes] or MrBayes ([[Using Mr Bayes to For Phlyogenetic Analysis]]).


=== PhyloBayes Analysis ===


* Mark in your notes the software version used.
* The PhyloBayes manual can be found [http://megasun.bch.umontreal.ca/People/lartillot/www/phylobayes4.1.pdf here].

Revision as of 13:07, 18 April 2019

Locate Sequences and Generate FASTA File

Generating a FASTA File

  • FASTA format is described here and here. Each sequence must start with a >SEQUENCENAME line, followed by a return and then the sequence (in this case, the protein sequence). An example of a FASTA file would be:

>SEQUENCE_1
MTEITAAMVKELRESTGAGMMDCKNALSETNGDFDKAVQLLREKGLGKAAKKADRLAAEG
LVSVKVSDDFTIAAMRPSYLSYEDLDMTFVENEYKALVAELEKENEERRRLKDPNKPEHK
IPQFASRKQLSDAILKEAEEKIKEELKAQGKPEKIWDNIIPGKMNSFIADNSQLDSKLTL
MGQFYVMDDKKTVEQVIAEKEKEFGGKIKIVEFICFEVGEGLEKKTEDFAAEVAAQL
>SEQUENCE_2
SATVSEINSETDFVAKNDQFIALTKDTTAHIQSNSLQSVEELHSSTINGVKFEEYLKSQI
ATIGENLVVRRFATLKAGANGVVNGYIHTNGRVGVVIAAACDSAEVASKSRDLLRQICMH

  • Save sequences in Notepad, Notepad++ or Sublime Text (not Word) as a <FILENAME>.fasta file.
  • Sequence names cannot have spaces. Generally it's better to name a sequence like mm_Gdf15-NM_004864.4, where mm indicates mouse, Gdf15 is the gene name and NM indicates a RefSeq mRNA. If there are multiple mRNAs for the gene, name them
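
Before moving on to alignment, it can help to check a saved file against the rules above. The following is a minimal sketch, not part of the original protocol: the function name parse_fasta is hypothetical, and rejecting duplicate names is an added assumption rather than something the protocol states.

```python
def parse_fasta(text):
    """Parse FASTA text into {name: sequence}, enforcing the naming rules above."""
    records = {}
    name = None
    for number, raw in enumerate(text.splitlines(), start=1):
        line = raw.strip()
        if not line:
            continue  # tolerate blank lines between entries
        if line.startswith(">"):
            name = line[1:]
            # Names must be non-empty and must not contain spaces.
            if not name or any(ch.isspace() for ch in name):
                raise ValueError(f"line {number}: invalid sequence name {name!r}")
            # Assumption: duplicate names are treated as an error.
            if name in records:
                raise ValueError(f"line {number}: duplicate sequence name {name!r}")
            records[name] = ""
        elif name is None:
            raise ValueError(f"line {number}: sequence data before any >NAME line")
        else:
            records[name] += line  # join wrapped sequence lines
    return records


if __name__ == "__main__":
    example = ">SEQUENCE_1\nMTEITAAMVK\nELRESTGAGM\n>SEQUENCE_2\nSATVSEINSE\n"
    for seq_name, sequence in parse_fasta(example).items():
        print(seq_name, len(sequence))
```

A file that raises ValueError here usually has a space in a name or stray text above the first >NAME line, both of which are easier to fix before running the alignment.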

Create Multiple Sequence Alignment using CLUSTAL Omega

PhyloBayes Analysis

  • Mark in your notes the software version used.
  • The PhyloBayes manual can be found here.
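
One way to keep the version note consistent is to append a dated line to a plain-text notes file. This is a sketch only: the function names, the tab-separated layout, and the file name analysis_notes.txt are all hypothetical, and the version string must be read from your own installation; the "4.1" below is just a placeholder.

```python
from datetime import date


def version_note(software, version, when=None):
    """Return one tab-separated notes line: ISO date, software name, version."""
    when = when or date.today()
    return f"{when.isoformat()}\t{software}\t{version}"


def append_note(path, software, version):
    """Append a version note to the notes file at `path` (created if missing)."""
    with open(path, "a", encoding="utf-8") as handle:
        handle.write(version_note(software, version) + "\n")


if __name__ == "__main__":
    # Placeholder values: replace with the version actually installed.
    append_note("analysis_notes.txt", "PhyloBayes", "4.1")
```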