Construction of Phylogenetic Trees
Phylogenetic trees are visual representations of the evolutionary relationships among species or other groups of organisms, and they are essential tools in biology. Their branching patterns depict evolutionary history: each branch point represents a common ancestor shared by the lineages that descend from it. Constructing a phylogenetic tree involves several steps: data collection, sequence alignment, tree building, and evaluation and interpretation. Each of these steps is discussed in detail below.
1. Data Collection:
- The first step in building a phylogenetic tree is to gather data, typically molecular sequences such as DNA or protein sequences.
- DNA sequences from specific genes, such as ribosomal RNA (rRNA) genes or mitochondrial genes, are often used because they occur in a wide range of organisms and can therefore be compared across species.
- Researchers may also use morphological characteristics, although molecular sequences are generally preferred because they provide many more characters to compare.
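As an illustration of collecting such sequences programmatically, the sketch below builds a download URL for NCBI's E-utilities efetch service. The function name build_fasta_url is hypothetical, and the accession and coordinates are those of the Labeo bata example listed later in this article. The sketch only constructs the URL; it does not contact the server.

```python
from urllib.parse import urlencode

# Base address of NCBI's E-utilities "efetch" endpoint.
EFETCH_URL = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def build_fasta_url(accession, start=None, stop=None, reverse=False):
    """Return an efetch URL for a nucleotide record (or a sub-range) in FASTA format.

    build_fasta_url is a hypothetical helper written for this example.
    """
    params = {
        "db": "nuccore",      # NCBI nucleotide database
        "id": accession,
        "rettype": "fasta",
        "retmode": "text",
    }
    if start is not None and stop is not None:
        params["seq_start"] = start
        params["seq_stop"] = stop
    if reverse:
        params["strand"] = 2  # 2 requests the minus (complementary) strand
    return EFETCH_URL + "?" + urlencode(params)


# Region 2861-3835 of the Labeo bata mitochondrial genome.
url = build_fasta_url("NC_015193.1", start=2861, stop=3835)
print(url)
```

Opening the printed URL in a browser, or fetching it with any HTTP client, should return the record as FASTA text, subject to NCBI's usage limits.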
2. Sequence Alignment:
- Once the data are collected, the next step is to align the sequences. Sequence alignment arranges the sequences so that corresponding positions line up in columns, which makes the similarities and differences between them apparent.
- Various alignment algorithms and software tools are available for this purpose, such as ClustalW, MUSCLE, and MAFFT.
- During alignment, gaps are introduced where insertions or deletions have occurred, so that homologous regions are properly aligned.
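To make the role of gaps concrete, here is a minimal sketch of global pairwise alignment using the Needleman-Wunsch dynamic programming algorithm. It is not what ClustalW, MUSCLE, or MAFFT do internally (those are multiple-alignment programs with more elaborate scoring); the scoring values (match, mismatch, gap) and the two short sequences are assumptions chosen for illustration.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Globally align sequences a and b; return (aligned_a, aligned_b, score)."""
    n, m = len(a), len(b)

    def sub(i, j):
        # Score for placing a[i-1] opposite b[j-1].
        return match if a[i - 1] == b[j - 1] else mismatch

    # score[i][j] holds the best score for aligning a[:i] with b[:j].
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            score[i][j] = max(
                score[i - 1][j - 1] + sub(i, j),  # align the two residues
                score[i - 1][j] + gap,            # gap in b
                score[i][j - 1] + gap,            # gap in a
            )

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub(i, j):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b)), score[n][m]


aligned_a, aligned_b, best = needleman_wunsch("ATGCT", "ATCT")
print(aligned_a)
print(aligned_b)
print(best)
```

The second sequence lacks one base, so the algorithm places a gap character opposite it instead of shifting every later position out of register.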
3. Phylogenetic Tree Building Methods:
- There are several methods for constructing phylogenetic trees, each with its own advantages and limitations. Some common methods include:
a. Distance-based methods: These methods calculate the genetic distance between each pair of sequences and use the resulting distance matrix to build a tree. Examples include Neighbor Joining and UPGMA (Unweighted Pair Group Method with Arithmetic Mean).
b. Character-based methods: These methods analyze the individual characters directly (e.g., the state at each aligned position, or the presence or absence of specific traits) to infer evolutionary relationships. Maximum Parsimony and Maximum Likelihood are examples of character-based methods.
c. Bayesian methods: Bayesian inference combines prior knowledge with the likelihood of the data to estimate the probability of candidate trees, which makes the uncertainty in the result explicit.
The choice of method depends on factors such as the nature of the data, the computational resources available, and the assumptions underlying the method.
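As a sketch of the distance-based approach, the code below computes pairwise distances from an alignment and clusters them with UPGMA. Several things here are assumptions for illustration: the distance is the simple proportion of differing sites (p-distance) rather than a model-corrected distance, the four aligned sequences are invented toy data, and the function returns only the branching order as nested tuples, without branch lengths.

```python
from itertools import combinations


def p_distance(s1, s2):
    """Proportion of differing sites, skipping columns where either sequence has a gap."""
    pairs = [(x, y) for x, y in zip(s1, s2) if x != "-" and y != "-"]
    if not pairs:
        raise ValueError("no comparable sites")
    return sum(x != y for x, y in pairs) / len(pairs)


def upgma(names, dist):
    """Cluster names by UPGMA and return the tree as nested tuples.

    dist maps frozenset({a, b}) to the distance between leaves a and b.
    """
    clusters = {name: 1 for name in names}  # cluster -> number of leaves in it
    d = dict(dist)
    while len(clusters) > 1:
        # Join the two closest clusters.
        a, b = min(combinations(clusters, 2), key=lambda p: d[frozenset(p)])
        size_a, size_b = clusters.pop(a), clusters.pop(b)
        merged = (a, b)
        # Distance to every other cluster is the size-weighted mean of the
        # distances from the two clusters that were just joined.
        for c in clusters:
            d[frozenset((merged, c))] = (
                size_a * d[frozenset((a, c))] + size_b * d[frozenset((b, c))]
            ) / (size_a + size_b)
        clusters[merged] = size_a + size_b
    return next(iter(clusters))


# Invented, already aligned toy sequences (not real data).
toy = {
    "A": "ATGCTAAACA",
    "B": "ATGCTAAACT",
    "C": "ATGATTAACT",
    "D": "TTGATTCACG",
}
distances = {
    frozenset((a, b)): p_distance(toy[a], toy[b]) for a, b in combinations(toy, 2)
}
tree = upgma(list(toy), distances)
print(tree)
```

UPGMA assumes that all lineages evolve at the same rate, which is one of the method assumptions mentioned above; Neighbor Joining relaxes that assumption.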
4. Tree Evaluation and Interpretation:
Once a phylogenetic tree is constructed, it must be evaluated to assess its reliability before the evolutionary relationships it represents are interpreted.
a. Bootstrap analysis and support values: Bootstrap analysis resamples the columns of the alignment with replacement to generate many replicate datasets, and a tree is built from each. The support value of a branch, usually expressed as a bootstrap percentage, is the share of replicate trees in which that branch appears and indicates how robust the branch is.
b. Consistency with known biology: The resulting tree should be consistent with what is known about the biology and evolutionary history of the organisms being studied. For example, closely related species should cluster together on the tree.
c. Outgroup analysis: Including an outgroup (a species known to be more distantly related to the taxa of interest than they are to one another) helps root the tree and provides additional context for interpreting evolutionary relationships.
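The resampling step of bootstrap analysis can be sketched as follows. To keep the example short, a full tree is not rebuilt for each replicate; instead the code records which pair of sequences is closest in each replicate, as a simplified stand-in for the support of a single branch. The toy alignment is invented, and the function names are hypothetical.

```python
import random
from collections import Counter
from itertools import combinations


def resample_columns(alignment, rng):
    """Draw alignment columns with replacement; the replicate keeps the original length."""
    length = len(next(iter(alignment.values())))
    cols = [rng.randrange(length) for _ in range(length)]
    return {name: "".join(seq[c] for c in cols) for name, seq in alignment.items()}


def closest_pair(alignment):
    """Return the pair of names with the fewest mismatches.

    This is a simplified stand-in for building a whole tree from the replicate.
    """
    def mismatches(pair):
        a, b = pair
        return sum(x != y for x, y in zip(alignment[a], alignment[b]))

    return min(combinations(sorted(alignment), 2), key=mismatches)


def bootstrap_support(alignment, replicates=200, seed=0):
    """Percentage of replicates in which each pair came out as the closest pair."""
    rng = random.Random(seed)  # fixed seed so the run is repeatable
    counts = Counter(
        closest_pair(resample_columns(alignment, rng)) for _ in range(replicates)
    )
    return {pair: 100 * n / replicates for pair, n in counts.items()}


# Invented, already aligned toy sequences (not real data).
toy = {
    "A": "ATGCTAAACA",
    "B": "ATGCTAAACT",
    "C": "ATGATTAACT",
    "D": "TTGATTCACG",
}
support = bootstrap_support(toy)
for pair, percent in sorted(support.items(), key=lambda item: -item[1]):
    print(pair, percent)
```

In a real analysis each replicate would be passed through the same tree-building method as the original alignment, and support would be counted per branch of the original tree.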
Constructing phylogenetic trees is a complex but essential task in evolutionary biology. By collecting genetic data, aligning sequences, and using appropriate tree-building methods, researchers can gain valuable insights into the evolutionary history of organisms. Interpreting phylogenetic trees requires careful consideration of the data, the methodology, and the biological context, but the resulting trees provide a framework for understanding the diversity of life on Earth.
STEPS OF MAKING A PHYLOGENETIC TREE
a. Find and download the sequences to be included in the tree from a biological database: NCBI (http://www.ncbi.nlm.nih.gov), EMBL (http://www.ebi.ac.uk/), or DDBJ (http://www.ddbj.nig.ac.jp/).
For example, the following sequences:
1.
>NW_026129590.1:c715037-712204 Labeo rohita strain BAU-BD-2019 unplaced genomic scaffold, IGBB_LRoh.1.0 scaffold_66, whole genome shotgun sequence
GCAGCAATGATATAGTCGCATACATATTGGAGATCATGAACACCCAGGTGCTTAAAAAAGAAGACGCTGT
GCCGCGACTGCAGCTGTTTAATTGCGCTTATACTGAATGTGGCGCCACATTCACTCGTCGATGGCGTTTA
CAGGAGCACGAGACCGTGCATACTGGTGCAGTGAGTCGTATTCATATTTTCCCTTTTTTTAAAAAAAAGG
GGTAGTGTTTATCACGTGCTATTTTATTTATTTATGTAGCCATGATCCGTTTTTTTTTGTTTGTTTGTTT
GTTTGTTTGTTTGTTTGTTTGTTTTTTTGTTTTTTTTTTGTCTTCTGTTTTCTTCAGCGACCTCACAAGT
GTGTTGTAGCGGGGTGTGGACGCAGTTTCACCCGTAAATCGCACCTGAGCCGACATGCTTTGGTACACAG
CGGAGTGAAAAACTTCAAGTAATATGATTATTAATAACAACCAGTTTGACGTGTGTGTGTGTGTGTGTGT
GTGTGTTTTTGAGTGAGGCTTGTATAGGCTTGGTTATTTGGATCCATTTGCATGACCTACTTGGGTGAAC
CCTTTATAGTTTTATATTAAGTTATGGCATTGTATTAACTGAGAGGTTATGTAGAATGTTTATATTTAAA
ATGATCTGGCATGGCGGGAAGGCAATTGTGGTTTCATATTTGTGAATAACCATTTCCCTCCGTTTTGATC
ATCTGAGAATGCTGGTTTCTCGCTGAGCAGGTGCACTGCAGCCGCATGTAGCAAAAGCTTCTGCACCGCT
GATAAACTGAAGAGGCACGTGCGTTACGCTCACAGCGAGAAACGCGAATATTTCAAGGTTTGACCCTTTT
ATTTTTTTTTATTTTCTAAACATATTTATTGGCTGCTTTTTGCTATAAACAGGCTTTATGCTTGTAATTA
AAAAAAAAAAAATAAGACTGAGTGTTTTTCTTTCTTAGTGTAAAGATCCACAGTGCGCAAAGACTTTTAA
GAAACGGCGAGTGTTAAAGCTGCACCTGGCGACGCACGGTACATCAAGTTTCAGGTGCGTCTGCAGCCTG
GAAGTCTTTGTAACCAAGGGTTTATAACTGCTAAAAGTAAAATACAGTATATGTACCTGGGTGTTTTCTG
TTTAAGGTGCTTGAAAAGCGGCTGTGGAATGAGGTTTGAGTCTCACATTGCACGGAAAGCGCATGATAAA
AGGCATTCAGGTAAGGATACATTAACTCTGAAGTGTTTTATCGACTGGCATTAACTGTTTAACCCCAATA
TTTGATCAACATATGGTGATGCTCATTAGCTGTCATGTTAACATGTTACCCAGAGGTTAATCCCAAAGTC
TCTTTACTTGTTCACATTTTTGGTAAGTATGGGTGAACTTTTTTTTTTTTTTTTCCTCCCTTAAAACGAG
TAAAATGTAACTAACTGTTAAGTAACTTTTGTTTGGATGCTTTGGAGTTGAGATTTGCATTTTGAGGTCC
GTGCTACCTTAAAACAAGTTCACCATGCCCACTCAGTATTTGTTTCTTCAGGTTATCGCTGTCTTCATGC
TGAATGCCCCATCAGTGTGCACACCTGGAGGAATTTACAAAAACACATGGCAAGTCACCCAGGTAAACTG
GTGGAACCATCCGTGTTTTCTTATTGGCTGTAAACCGCTGTTTTTCATGAGTGGCATGTTCTTTTTGTGC
CCATGATGGTTTCCACAATAGGTCTACATTCTGTTTTGTTAACAGCATCATTCCCTTGTATGGTATGTAA
AAAGACTTTCAAGAAACGTGACTCTTTGAGGAGACACAAGCGGACGCATGCGTTGCAGAAGCCGGTTTTA
CTCTGTCCCAGTCAGGGCTGCCAGGCTTACTTTTCCACCACTTTCAACCTGCAGCATCATATTCGAAAGG
TCCATCTGCAGCTGCTCTCACACCGCTGTTCCTTCCCTGGCTGTGGCAAAAGTTTTGCCATGCGTGTGAG
TACATTTGAGGAACATGGAAAGGATGGTAAAATGGTCTCATGATATTTTGTGTCCTAAAGCCAAATGATG
TGAAATAAGTAACCAGTGGTTGAATTCTGCAAATCACAACTCCAGTTTTGGGGTGATGTTCAAGTCAGCA
TGTCTTTATAGGTCTGTGGTGTTTGCAGTGTTAGATTGTGAAAAAATTGTGCTAGTAACCCCTGCTTTAC
ATGTGCTGTTTTAACAGGAAAGCCTTGTCCGCCACATGTTACGTCATGAACCTGATGTGGCCAAACTCAA
GGTAGTTCTGTCCCTTAAAGTAGTATTTAAAGGGTTAAATGCTGGTTGGGCTTAATTAACAGCATCTGTG
GCATGTACGTAATCACACCGCTTAAGTTTCATTTCCTCAACTCTCTGGTTGTTTGTGTTGTTTAGCACCC
ACATAAACGGAGCAGCAAGAGCTGGCAGAAACGCTTGGAAGGGCGAAGTCGACGTCCTCTGGTTGAAGAC
CTTCGATCTCTCTTTTCGTTACGCATGAAGATCTCTCCTAGGGCCAAACTGGAAGCTGATCTGACGGGCC
TCTTTAATGAGCGTAAAATCCCCCACCACGTTGATCCGGAAGTAAACCTCAGAGATCTGTTTAATGTCCG
TCCAACTAAGGTGGTCGATAAATAGAATTGTCCAATACTCTTAACCCCCCCTTTTTTTGAATACTGGTGA
CATGCGTTTACCTTAACTGAAGAGAATTTTGTTTCAAGTGTTAGTACTAGAGGTACAGAATTACTTTATA
ACTGCTGAACTATGTTGTTTGAGAACCGTTGTTTGAGAAACAACTGAAATTTAAGTTCTGTATGATGAAA
ATAAAGGTGGTTTGGTTTTGTCAGTCTGGCCAAT
2.
>NC_015193.1:2861-3835 Labeo bata mitochondrion, complete genome
ATGCTAAACATCCTAATAACCCACCTAATTAACCCACTAGCCTACATCGTGCCTGTCCTACTAGCAGTAG
CTTTCCTAACACTAGTTGAACGAAAAGTACTAGGCTATATACAACTACGAAAAGGACCTAACGTAGTAGG
ACCTTACGGACTACTACAGCCCATCGCCGACGGAGTAAAATTATTTATTAAAGAACCAGTCCGCCCATCT
ACATCATCCCCATTCCTATTTCTAGCCACCCCAATACTCGCACTAACACTAGCCATAACCCTATGAGCAC
CTATACCAATACCCCACCCAGTAACTGACCTAAACCTAGGAATCCTCTTCATCCTAGCCCTATCAAGCCT
AGCAGTATATTCAATTTTAGGGTCAGGATGAGCATCAAATTCAAAATACGCACTAATTGGTGCCCTACGG
GCTGTAGCCCAAACAATTTCCTATGAAGTAAGTCTAGGACTTATTCTCCTCTCAGTAATCATCTTCTCTG
GTGGATATACACTACAAACATTTAATATTACCCAAGAAAGCATCTGATTACTCGTACCAGCCTGACCATT
AGCCGCAATATGATATATCTCAACACTAGCTGAAACAAACCGAGCACCATTCGACCTAACAGAGGGAGAA
TCAGAACTAGTATCTGGCTTCAACGTAGAATATGCAGGAGGACCCTTCGCCCTCTTTTTCCTAGCCGAAT
ATGCTAACATTCTACTAATAAATACCTTATCTGCCGTATTATTCCTAGGAGCCTCACACATCCCAAGCAT
TCCCGAACTAACCACAATTAACCTAATAACTAAAGCTGCACTATTATCAATTTTATTCCTATGGGTACGA
GCTTCCTACCCACGATTCCGATATGACCAACTAATACATTTAGTATGAAAAAATTTCCTCCCACTAACAC
TTGCCTTCGTACTATGACATACCGCCCTACCAATTGCACTAGCAGGGCTTCCCCCACAACTATAA
3.
>NC_016892.1:3796-4770 Catla catla mitochondrion, complete genome
ATGCTAAACATCCTAATAACTCACCTAATTAACCCCCTAGCCTACATTGTACCCGTTCTCCTAGCAGTAG
CTTTCCTAACATTAATTGAACGAAAAGTACTAGGTTATATACAACTACGAAAAGGCCCTAACGTAGTAGG
ACCCTACGGACTACTACAACCCATCGCCGATGGAGTTAAACTCTTTATTAAAGAACCAGTCCGCCCCTCC
ACATCATCCCCATTCTTATTCCTCGCCACCCCCATACTCGCACTAACCCTAGCCATAACCCTATGAGCAC
CAATACCCATACCTCACCCCGTAACGGACCTCAACCTGGGAATCCTATTTATCCTAGCCCTATCAAGCCT
AGCAGTATACTCAATCCTGGGGTCAGGATGAGCATCAAATTCAAAATACGCGCTAATCGGGGCCCTACGG
GCCGTAGCCCAAACAATTTCATATGAAGTAAGTCTTGGGCTAATCCTCCTTTCAGTAATCATCTTTTCAG
GAGGTTATACACTACAAACATTCAACACCACCCAAGAAAGCATCTGACTACTCGTACCCGCTTGACCCCT
AGCCGCAATATGGTATATCTCAACACTAGCCGAAACAAACCGAGCACCATTTGACCTAACAGAAGGAGAA
TCCGAACTAGTCTCTGGCTTCAACGTAGAATATGCAGGAGGGCCCTTCGCCCTATTCTTCCTAGCAGAAT
ATGCCAATATTCTACTAATAAACACACTATCAGCCGTACTATTCCTAGGAGCCTCACACATCCCAAGCAT
CCCTGAACTCACAACCATTAACCTAATAACCAAAGCTGCATTATTATCAATTTTATTCCTATGAGTACGA
GCCTCTTATCCACGATTCCGATATGATCAACTAATGCACTTGGTCTGAAAAAACTTCCTCCCCCTCACAC
TAGCCTTCGTTCTATGACACACCGCCCTACCAATTGCACTAGCAGGACTCCCCCCACAACTATAA
4.
>NC_027495.1:11963-13801 Botia lohachata mitochondrion, complete genome
ATGCACACAACAGCCCTAATCTTATCCTCCTCACTAGTACTAGTCCTCACAATCCTCACATACCCGCTAC
TAACCTCACTTAACTCAAAACCCTTAAACCCAAAATGAGCAACCTCTCACGTTAAAACAGCCGTAAGCTG
TGCCTTTTTCATTAGTTTAGTACCTCTCATAATTTTCTTAGACCAAGGGGCCGAAACTATCGTTACAAAC
TGACATTGAATAAATACATCAATATTTGACATCAACATTAGCTTTAAATTTGACCAATACTCCCTTATTT
TTACACCAATTGCTTTATATGTTACTTGATCAATTTTAGAGTTTGCATCATGATATATACACTCCGACCC
ATACATAAACCGTTTTTTCAAATATTTACTTCTATTCTTAGTAGCCATAATTATCTTAGTAACAGCTAAC
AACATATTCCAACTCTTCATTGGCTGAGAAGGGGTAGGAATTATATCATTTCTATTAATCGGATGATGAT
ACGGACGAGCAGATGCCAACACAGCAGCACTCCAAGCCGTACTATATAACCGAGTAGGAGATATTGGACT
AATTATATGCATAGCCTGACTTGCAATAAACATAAACTCATGAGAAATTCAACAAATCTTCTTCCTATCA
AAAAACTTTGACATAACCTTACCTCTCGTAGGACTAATCCTCGCAGCAACAGGAAAATCAGCACAATTTG
GGCTTCACCCGTGATTACCCTCCGCCATGGAGGGCCCCACACCAGTGTCTGCCCTACTTCACTCTAGCAC
AATAGTTGTTGCCGGAATTTTCTTACTTATCCGACTCCACCCCCTAATAGAAAACAACAAAACAGCCTTA
ACAATCTGTCTTTGCCTGGGAGCCCTAACCACATTATTTACAGCTGCTTGTGCCCTCACTCAAAACGACA
TTAAAAAAATTGTAGCTTTCTCAACATCAAGTCAGCTAGGTCTCATAATAGTTACAATTGGACTTAACCA
ACCACAACTAGCATTCCTACACATCTGTACCCACGCTTTCTTCAAAGCCATACTATTTCTATGTTCAGGA
TCAATTATTCATAGCCTAAACGACGAACAAGATATCCGAAAAATAGGGGGCCTACATAACCTAATACCAT
TTACCTCAACCTGCCTCACAATTGGCAGCTTAGCACTTACAGGCACTCCATTCCTAGCCGGCTTTTTCTC
CAAAGATGCCATCATTGAAGCCCTAAATACCTCACACCTAAACGCCTGAGCCCTAACCCTCACACTAATT
GCCACTTCATTTACCGCCGTATATAGCTTCCGAGTCGTATACTTCGTAACTATAGGAACACCACGATTCC
TACCCATATCCCCAATCAACGAAAACGACCCAGCAGTTATTAAACCCATCAAACGACTCGCCTGAGGAAG
CATTTTCGCAGGACTTCTCATCACCTCAAACTTTTTACCTACCAAAACACCCATCATAACTATACCAACA
ACCCTAAAAACAACCGCTCTGCTAGTTACAATTATAGGACTACTAATAGCCATAGGACTAACAGCCTTAA
CAAGTAAACAATTCAAAATCACTCCAACAATAACATCACACCATTTCTCCAACATACTAGGATATTTCCC
AGCAATCATACACCGATTTATTCCAAAGCTAAATTTAGTATTAGGACAATCAATTGCCACCCAACTAGTA
GACCAAACATGATTCGAAGCCGTAGGACCAAAAGGATCCACAGCCGCCCAACTAAAAATAGCCAAAATTA
CTAGTGATGCCCAACGAGGAATAATCAAAACATACTTAACCATTTTCTTCCTGACCCTAACCCTAGCAAC
CCTTTTAGCCACCCTTTAA
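Before alignment it can be useful to check that downloaded sequences were saved correctly. The following is a minimal FASTA reader, assuming the usual layout in which each record starts with a single header line beginning with ">" followed by one or more sequence lines. The function name parse_fasta is hypothetical, and the example text contains only the first sequence line of two of the records above.

```python
def parse_fasta(text):
    """Parse FASTA text into a dict mapping each header (without '>') to its sequence."""
    records = {}
    header = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            header = line[1:]
            records[header] = []
        elif header is not None:
            # Sequence lines of one record are concatenated in order.
            records[header].append(line.upper())
    return {h: "".join(parts) for h, parts in records.items()}


# Truncated excerpt: only the first sequence line of each record is shown.
example = """>NC_015193.1:2861-3835 Labeo bata mitochondrion, complete genome
ATGCTAAACATCCTAATAACCCACCTAATTAACCCACTAGCCTACATCGTGCCTGTCCTACTAGCAGTAG
>NC_016892.1:3796-4770 Catla catla mitochondrion, complete genome
ATGCTAAACATCCTAATAACTCACCTAATTAACCCCCTAGCCTACATTGTACCCGTTCTCCTAGCAGTAG
"""

records = parse_fasta(example)
for header, sequence in records.items():
    accession = header.split()[0]  # text before the first space
    print(accession, len(sequence))
```

Comparing the reported lengths with the coordinate ranges in the headers is a quick way to spot a truncated download.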
b. Align the acquired sequences using CLUSTALW, then check and trim the alignment:
- Go to UniProt and paste the sequences to be aligned in FASTA format.
- Run the alignment.
- Download the alignment.
c. To construct the phylogenetic tree, click on Tree in UniProt and download the result.
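The trimming mentioned in step b can also be done on the downloaded alignment with a short script. The sketch below removes columns in which the fraction of gap characters exceeds a threshold. The function name, the 0.5 threshold, and the tiny alignment are assumptions for illustration; dedicated trimming tools apply more refined criteria.

```python
def trim_gappy_columns(alignment, max_gap_fraction=0.5):
    """Drop alignment columns whose fraction of '-' exceeds max_gap_fraction."""
    names = list(alignment)
    seqs = [alignment[name] for name in names]
    if len({len(s) for s in seqs}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    # Indices of the columns that are kept.
    keep = [
        i
        for i in range(len(seqs[0]))
        if sum(s[i] == "-" for s in seqs) / len(seqs) <= max_gap_fraction
    ]
    return {name: "".join(s[i] for i in keep) for name, s in zip(names, seqs)}


# Invented toy alignment: the third column is a gap in two of three sequences.
aligned = {"seq1": "AT-GC", "seq2": "A--GC", "seq3": "ATTG-"}
trimmed = trim_gappy_columns(aligned)
for name, seq in trimmed.items():
    print(name, seq)
```

Columns with only occasional gaps are kept, so the trimmed sequences can still contain gap characters.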
Primary Criteria for Constructing a Multiple Sequence Alignment (MSA):
1. Structural Homogeneity
2. Evolutionary Consistency
3. Functional Concordance
4. Sequence Alignment Precision
Key Applications of Multiple Sequence Alignment (MSA):
1. Phylogenetic Reconstructions
2. Pattern Recognition
3. Extrapolative Studies
4. Domain Mapping
5. Identification of DNA Regulatory Elements
6. Structural Predictions
7. PCR Primer Design