Data Analysis and BioInformatics in real-time qPCR
What is Bioinformatics?
In
the last few decades, advances in molecular biology and the equipment
available for research in this field have allowed the increasingly
rapid sequencing of large portions of the genomes of several species.
In fact, to date, several bacterial genomes, as well as those of
some simple eukaryotes (e.g., Saccharomyces cerevisiae, or
baker's yeast) and more complex eukaryotes (C. elegans and
Drosophila) have been sequenced in full. The Human Genome Project,
designed to sequence all 24 of the human chromosomes, is also
progressing and a rough draft was completed in the spring of 2000.
http://www.library.csi.cuny.edu/~davis/molbiol/lecture_notes/bioinformatics_genomics/bioinformaticsIntro.html
The various types of data:
Many different types of data are collected and stored in databases to
facilitate retrieval. Depicted here are amino acid sequences, protein
domain cartoons, different renderings of three-dimensional structures,
and protein hydrophobicity data. Databases consisting of data derived
experimentally, such as nucleotide sequences and three-dimensional
structures, are known as primary databases. Data that are derived from
the analysis or treatment of primary data, such as secondary structures,
hydrophobicity plots, and domains, are stored in secondary databases.
A protein database consisting of the conceptual translation of
nucleotide sequences would also be considered a secondary database.
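The "conceptual translation" mentioned above can be illustrated with a minimal Python sketch. It assumes the standard genetic code (NCBI translation table 1) and reading frame 1 only; the function name and the example sequence are made up for illustration and are not taken from any database pipeline.

```python
from itertools import product

# Standard genetic code (NCBI translation table 1), codons ordered T, C, A, G.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna: str) -> str:
    """Translate a coding DNA sequence in frame 1; '*' marks a stop codon."""
    dna = dna.upper()
    protein = []
    # Walk the sequence codon by codon, ignoring a trailing partial codon.
    for i in range(0, len(dna) - len(dna) % 3, 3):
        # Codons with ambiguous bases (e.g. N) are reported as 'X'.
        protein.append(CODON_TABLE.get(dna[i:i + 3], "X"))
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))  # prints MAK*
```

A real secondary database would also record the reading frame, the genetic code used and the source nucleotide entry for each translated protein.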
|
|
The analysis and interpretation of various data types:
Illustrated here are various ways in which individual entries in
sequence and structure databases can be compiled to reveal patterns and
trends in biology. For example, sequence families or neighborhoods can
be defined and annotated based on the similarity of each sequence to
other members of the family. Common sequence features in sequence
families can be identified in multiple alignments. These motifs may
provide clues to the biochemical function of members of the family.
Clustering of sequences into trees that reflect the degree of similarity
between each sequence and all of the others in the family reveals
evolutionary relationships. Finally, identification of homologs to each
gene in well-characterized metabolic pathways provides information about
the prevalence of that pathway in other organisms.
|
http://www.ncbi.nlm.nih.gov/Education/Bioinformatics/datatypes.html
|
|
http://www.ncbi.nlm.nih.gov/Education/Bioinformatics/dataanal.html
|
qPCR software
applications:
- Normalisation and
Housekeeping Genes:
Molecular Biology Freeware for Windows
A. General - below
B. Microarray - next page
C. Java programs - next page
Good places to start are Genamics SoftwareSeek and BioExchange
and eBioinfogen.
For general software see Winsite. The
following sites are arranged in the order in which I discovered them. At
some point they will be clustered by preference:
A. DNA, RNA and genomic analysis
B. Plasmid graphic packages
C. Primer design
D. Protein analysis
E. Viewing three dimensional structures
F. Alignments
G. Phylogeny
H. Miscellaneous
Statistical power calculations
R. V. Lenth
Department of Statistics and Actuarial Science, University of Iowa,
Iowa City 52242
ABSTRACT:
This
article focuses on how to do meaningful power calculations and
sample-size determination for common study designs. There are 3
important guiding principles. First, certain types of retrospective
power calculations should be avoided, because they add no new
information to an analysis. Second, effect size should
be specified on the actual scale of measurement, not on a standardized
scale. Third, rarely can a definitive study be done without first doing
a pilot study. Some simple examples as well as a complex example are
given. Power calculations are illustrated using Java applets developed
by the author.

http://www.stat.uiowa.edu/~rlenth/Power/
and
http://www.stat.uiowa.edu/~rlenth/Power/oldversion.html (runs more stably in
Internet Explorer 7)
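As a rough illustration of a prospective sample-size calculation of the kind the abstract describes, here is a minimal Python sketch. It is not the author's applet code. It assumes a two-sided, two-sample comparison of means with equal group sizes and a known common standard deviation, and it uses the normal approximation (which slightly underestimates n for small samples compared with a t-based calculation). Following the second guiding principle above, the effect size `delta` and the spread `sigma` are given on the actual measurement scale, here hypothetically in Ct cycles.

```python
from math import ceil, sqrt
from statistics import NormalDist


def sample_size_per_group(delta: float, sigma: float,
                          alpha: float = 0.05, power: float = 0.80) -> int:
    """Approximate n per group to detect a mean difference `delta`."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)  # critical value of the two-sided test
    z_beta = z.inv_cdf(power)           # quantile for the desired power
    return ceil(2 * ((z_alpha + z_beta) * sigma / delta) ** 2)


def approx_power(n: int, delta: float, sigma: float,
                 alpha: float = 0.05) -> float:
    """Approximate power for n per group (normal approximation)."""
    z = NormalDist()
    z_alpha = z.inv_cdf(1 - alpha / 2)
    return z.cdf(delta / (sigma * sqrt(2 / n)) - z_alpha)


# Hypothetical planning values: detect a 1.0-cycle shift when sigma is 1.0 cycle.
n = sample_size_per_group(delta=1.0, sigma=1.0)
print(n, round(approx_power(n, 1.0, 1.0), 3))
```

The sigma value would normally come from a pilot study, which is the third guiding principle of the article.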
Java applets for power and sample size
This
software is intended to be useful in planning statistical
studies. It is not intended to be used for analysis of data that
have already been collected. Each selection
provides a graphical interface for studying the power of one or more
tests. They include sliders (convertible to number-entry fields)
for varying parameters, and a simple provision for graphing one
variable against another.
Each
dialog window also offers a Help menu. Please read the Help menus
before contacting me with questions.
The
"Balanced ANOVA" selection provides another dialog with a list of
several popular experimental designs, plus a provision for specifying
your own model.
Note: The dialogs open
in separate
windows. If you're running this on an Apple Macintosh, the applets'
menus are added to the screen menubar -- so, for example, you'll have
two "Help" menus there!
You may
also download this software to run it on your own PC.
Power Calculator - Written in PHP by Arno Ouwehand, using the DSTPLAN
distribution by Barry Brown et al. These calculators extend the
functionality of the old Xlisp-Stat based Power Calculator: they not
only compute the power for a given sample size, or the sample size for a
given power, but also compute the other available items when specified.
Further statistical
calculators here => http://calculators.stat.ucla.edu/
by UCLA
Department of
Statistics
URI Genomics & Sequencing Center
Calculator for determining the number of copies of a template
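The arithmetic behind such a copy-number calculator can be sketched in a few lines of Python. This is a minimal sketch, not the calculator's own code. It assumes a double-stranded DNA template and an average mass of 650 g/mol per base pair; the function name and the example values are illustrative.

```python
AVOGADRO = 6.022e23   # molecules per mole
AVG_BP_MASS = 650.0   # g/mol per base pair of dsDNA (assumed average)


def template_copies(mass_ng: float, length_bp: int) -> float:
    """Number of template molecules in `mass_ng` nanograms of dsDNA."""
    if length_bp <= 0:
        raise ValueError("length_bp must be positive")
    moles = (mass_ng * 1e-9) / (length_bp * AVG_BP_MASS)  # grams / (g/mol)
    return moles * AVOGADRO


# Hypothetical example: 10 ng of a 3000 bp plasmid, about 3.1e9 copies.
print(f"{template_copies(10, 3000):.3e}")
```

For single-stranded templates a different average mass per nucleotide would be needed, so the constant above should be treated as an assumption to adjust.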
qPCR-DAMS: a
Database Tool to Analyze, Manage, and Store Both Relative and Absolute
Quantitative Real-Time PCR data.
Jin N, He K, Liu L.
Physiol Genomics. 2006
Physiological Sciences, Oklahoma State University, Stillwater, OK, USA.
Quantitative
real-time PCR is an important high throughput method in biomedical
sciences. However, existing software has limitations in handling both
relative and absolute quantification. We designed qPCR-DAMS
(Quantitative PCR Data Analysis and Management System), a database tool
based on Access 2003, to deal with such shortcomings by the addition of
integrated mathematical procedures. qPCR-DAMS allows a user to choose
among four methods for data processing within a single software
package: (I) Ratio relative quantification, (II) Absolute level, (III)
Normalized absolute expression, and (IV) Ratio absolute quantification.
qPCR-DAMS also provides a tool for multiple reference gene
normalization. qPCR-DAMS has three quality control steps and a data
display system to monitor data variation. In summary, qPCR-DAMS is a
handy tool for real-time PCR users.
Availability: This software is free
for academic use and downloadable at http://download.gene-quantification.info/
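Multiple reference gene normalization is often done by dividing the target quantity by the geometric mean of several reference-gene quantities. The Python sketch below illustrates that common approach only. Whether qPCR-DAMS computes its normalization in exactly this way is an assumption, and the function names and values are hypothetical.

```python
from math import prod


def normalization_factor(reference_quantities) -> float:
    """Geometric mean of the reference-gene quantities for one sample."""
    values = list(reference_quantities)
    if not values or any(v <= 0 for v in values):
        raise ValueError("need at least one positive reference quantity")
    return prod(values) ** (1.0 / len(values))


def normalize(target_quantity: float, reference_quantities) -> float:
    """Target quantity expressed relative to the reference genes."""
    return target_quantity / normalization_factor(reference_quantities)


# Hypothetical quantities (arbitrary units) for one sample.
print(normalize(12.0, [2.0, 8.0]))  # geometric mean of 2 and 8 is 4
```

The geometric mean is preferred over the arithmetic mean here because it is less affected by a single reference gene with a much higher abundance.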
FastPCR is free software for Microsoft Windows. It is based on a new
approach to the design of PCR primers for standard and long PCR, inverse
PCR, degenerate PCR designed directly on amino acid sequences, multiplex
PCR and in silico PCR, and it also supports sequence alignment,
clustering and searching for any kind of repeat sequence.
At the moment the program is available only for Microsoft Windows, but
C# .NET versions for Linux and Mac are in preparation.
FastPCR can work with multiple nucleic acid or protein sequences
simultaneously (up to 1,000,000). Multiplex PCR primer design and
"in silico" PCR are also supported. The program is well suited to
homology searches against personal databases, using an approach similar
to the basic local alignment search tool (BLAST) algorithm (a
segment-to-segment alignment principle similar to DIALIGN).
The program includes various bioinformatics tools and supports the
clustering of sequences. A new repeat-search approach was developed and
applied in the program, which makes searches for all types of DNA
repeats fast and powerful.
FastPCR software has several specific, ready-to-use templates for
many PCR and sequencing applications:
- Standard, inverse and long PCR - locates optimal primers for PCR, hybridisation, or sequencing.
- Multiplex PCR primer design - fast primer design with a cross-dimer test for highly sensitive multiplex PCRs.
- Design of group-specific PCR primers.
- Degenerate PCR - primers are designed directly on an amino acid sequence.
- In silico PCR - prediction of probable PCR products and search for mismatched primer locations.
- Primer secondary structures - self-dimer and cross-dimer primer analyses; primer alignment and melting temperature calculation.
- False priming - checks primers for multiple annealing sites using sequence alignment algorithms.
- Primer quality - a unique way of determining PCR efficiency.
- Comprehensive primer report - comprehensive analysis of primer pairs and individual primers.
The software supports several file formats: FASTA, text and Excel files.
Tools:
- Primer tests and dimer detection;
- Powerful repeat search: inverted, direct, simple and others;
- Clustering of sequences;
- Make complement, reverse complement and inverted strand;
- Search for a sequence using the universal degenerate code, with alignment;
- Extract the sequence from selected sites;
- Protein/DNA translation;
- Calculation of the PCR annealing temperature when the PCR product is unknown;
- Database tools;
- Restriction analysis.
Each application document contains customisable search settings, based on
the latest published primer selection criteria for those applications.
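The "complement, reverse complement and inverted strand" operations listed among the tools are simple string transformations. The Python sketch below shows one way to write them. It is not FastPCR code, it handles only the four unambiguous bases, and it reads "inverted strand" as the sequence written backwards, which is an assumption.

```python
# Translation table for the four unambiguous bases, in both cases.
COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def complement(seq: str) -> str:
    """Base-by-base complement, same orientation as the input."""
    return seq.translate(COMPLEMENT)


def inverted(seq: str) -> str:
    """The same strand written in the opposite direction."""
    return seq[::-1]


def reverse_complement(seq: str) -> str:
    """The opposite strand, written 5' to 3'."""
    return inverted(complement(seq))


def is_self_complementary(seq: str) -> bool:
    """True when a sequence equals its own reverse complement."""
    return seq.upper() == reverse_complement(seq.upper())


print(reverse_complement("ATGC"))       # prints GCAT
print(is_self_complementary("GAATTC"))  # prints True
```

Self-complementary primers such as the last example can anneal to themselves, which is one reason primer tools report self-dimers.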
Bioinformatics analysis of alternative splicing
Christopher Lee & Qi Wang
Briefings in Bioinformatics Volume: 6 Number: 1 Page: 23 -- 33
Over the past few years, the analysis of alternative splicing using
bioinformatics has emerged as an important new field, and has
significantly changed our view of genome function. One exciting front
has been the analysis of microarray data to measure alternative
splicing genome-wide. Pioneering studies of both human and mouse data
have produced algorithms for discerning evidence of alternative splicing
and clustering genes and samples by their alternative splicing patterns.
Moreover, these data indicate the presence of alternative splice forms
in up to 80 per cent of human genes. Comparative genomics studies in
both mammals and insects have demonstrated that alternative splicing can
in some cases be predicted directly from comparisons of genome
sequences, based on heightened sequence conservation and exon length.
Such studies have also provided new insights into the connection between
alternative splicing and a variety of evolutionary processes such as
Alu-based exonisation, exon creation and loss. A number of groups have
used a combination of bioinformatics, comparative genomics and
experimental validation to identify new motifs for splice regulatory
factors, analyse the balance of factors that regulate alternative
splicing, and propose a new mechanism for regulation based on the
interaction of alternative splicing and nonsense-mediated decay.
Bioinformatics studies of the functional impact of alternative splicing
have revealed a wide range of regulatory mechanisms, from NAGNAG sites
that add a single amino acid; to short peptide segments that can play
surprisingly complex roles in switching protein conformation and
function (as in the Piccolo C2A domain); to events that entirely remove
a specific protein interaction domain or membrane anchoring domain.
Common to many bioinformatics studies is a new emphasis on graph
representations of alternative splicing structures, which have many
advantages for analysis.
Comparison of different melting temperature calculation methods for short DNA sequences.
Alejandro Panjkovich & Francisco Melo
Bioinformatics (21,6): 711 -- 722
Motivation:
The overall performance
of several molecular biology techniques involving DNA/DNA hybridization
depends on the accurate prediction of the experimental value of a
critical parameter: the melting temperature Tm.
To date, many computer software programs based on different methods
and/or parameterizations are available for the theoretical estimation
of the experimental Tm value of any
given short oligonucleotide sequence. However, in most cases, large
and significant differences in the estimations of Tm
were obtained while using different methods. Thus, it is difficult to
decide which Tm value is the accurate
one. In addition, it seems that most people who use these methods are
unaware of the limitations, which are well described in the literature
but are neither stated properly nor enforced as input restrictions by
most of the web servers and standalone software programs that implement
them.
Results: A
quantitative comparison on the
similarities and differences among some of the published DNA/DNA Tm
calculation methods is reported. The comparison was carried out for a
large set of short oligonucleotide sequences ranging from 16 to 30 nt
long, which span the whole range of CG-content. The results showed that
significant differences were observed in all the methods, which in some
cases depend on the oligonucleotide length and CG-content in a
non-trivial manner. Based on these results, the regions of consensus
and disagreement for the methods in the oligonucleotide feature space
were reported. Owing to the lack of sufficient experimental data, a
fair and complete assessment of accuracy for the different methods is
not yet possible. In spite
of this limitation, a consensus Tm with
minimal error probability was calculated by averaging the values
obtained from two or more methods that exhibit similar behavior to each
particular combination of oligonucleotide length and CG-content class.
Using a
total of 348 DNA sequences in the size range between 16mer and 30mer,
for which the experimental Tm data are
available, we demonstrated that the consensus Tm
is a robust and accurate measure. It is expected that the results of
this work will constitute a useful set of guidelines to be
followed for the successful experimental implementation of various
molecular biology techniques, such as quantitative PCR, multiplex PCR
and the design of optimal DNA microarrays.
Availability:
A binary software distribution to
calculate the consensus Tm described in
this work for thousands of oligonucleotides simultaneously for the
LINUX operating system is freely available upon request to
the authors or from our website http://protein.bio.puc.cl/melting-temperatures.html
Supplementary
information: The large set of
oligonucleotides, the detailed results of the comparative and accuracy
benchmarks, and hundreds of comparative graphs generated during this
work are available at our website http://protein.bio.puc.cl/melting-temperatures.html
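The idea of a consensus Tm obtained by averaging several methods can be illustrated with a toy Python sketch. It does not reproduce the methods benchmarked in the paper or its rule for choosing which methods to average for a given length and CG-content class. As stand-ins it uses two textbook rules of thumb (the Wallace rule and a basic GC-content formula) and averages them unconditionally. Both formulas ignore salt and oligonucleotide concentration, and the Wallace rule is usually recommended only for very short oligonucleotides, so the numbers are for illustration only.

```python
def _gc_and_length(seq: str) -> tuple[int, int]:
    """Return (number of G/C bases, total length) after validating the input."""
    seq = seq.upper()
    if not seq or set(seq) - set("ACGT"):
        raise ValueError("sequence must contain only A, C, G, T")
    return seq.count("G") + seq.count("C"), len(seq)


def tm_wallace(seq: str) -> float:
    """Wallace rule: 2 degrees per A/T plus 4 degrees per G/C."""
    gc, n = _gc_and_length(seq)
    return 2.0 * (n - gc) + 4.0 * gc


def tm_basic_gc(seq: str) -> float:
    """Basic GC-content formula: 64.9 + 41 * (GC - 16.4) / N."""
    gc, n = _gc_and_length(seq)
    return 64.9 + 41.0 * (gc - 16.4) / n


def tm_consensus(seq: str, methods=(tm_wallace, tm_basic_gc)) -> float:
    """Plain average of the estimates from the given methods."""
    estimates = [method(seq) for method in methods]
    return sum(estimates) / len(estimates)


oligo = "ATGCATGCATGCATGCATGC"  # hypothetical 20-mer with 50% GC
print(tm_wallace(oligo), round(tm_basic_gc(oligo), 2),
      round(tm_consensus(oligo), 2))
```

Even for this single oligonucleotide the two rules differ by several degrees, which is the kind of disagreement between methods that the abstract reports.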
|

A data-driven clustering method for time course gene expression data.
Ma P, Castillo-Davis CI, Zhong W, Liu JS.
Nucleic Acids Res. 2006 Mar 1;34(4):1261-9. Print 2006.
Department of Statistics, Harvard University, Cambridge, MA 02138, USA.
Gene expression over time is, biologically, a continuous process and can
thus be represented by a continuous function, i.e. a curve. Individual
genes often share similar expression patterns (functional forms).
However, the shape of each function, the number of such functions, and
the genes that share similar functional forms are typically unknown.
Here we introduce an approach that allows direct discovery of related
patterns of gene expression and their underlying functions (curves) from
data without a priori specification of either cluster number or
functional form. Smoothing spline clustering (SSC) models natural
properties of gene expression over time, taking into account natural
differences in gene expression within a cluster of similarly expressed
genes, the effects of experimental measurement error, and missing data.
Furthermore, SSC provides a visual summary of each cluster's gene
expression function and goodness-of-fit by way of a 'mean curve'
construct and its associated confidence bands. We apply this method to
gene expression data over the life-cycle of Drosophila melanogaster and
Caenorhabditis elegans to discover 17 and 16 unique patterns of gene
expression in each species, respectively. New and previously described
expression patterns in both species are discovered, the majority of
which are biologically meaningful and exhibit statistically significant
gene function enrichment.
Distribution-insensitive cluster analysis in SAS on real-time PCR gene
expression data of steadily expressed genes.
Tichopad A, Pecen L, Pfaffl MW.
Comput Methods Programs Biomed. 2006 Apr;82(1):44-50. Epub 2006
Cluster analysis is a tool often employed in microarray techniques but
used less in real-time PCR. Herein we present core SAS code that takes
the correlation coefficient, instead of Euclidean distances, as a
dissimilarity measure. The dissimilarity measure is made robust by using
a rank-order correlation coefficient rather than a parametric one. There
is no need for an overall probability adjustment, as there is in scoring
methods based on repeated pair-wise comparisons. The rank-order
correlation matrix gives a good basis for the clustering procedure of
gene expression data obtained by real-time RT-PCR, as it disregards the
different expression levels. Associated with each cluster is a linear
combination of the variables in the cluster, which is the first
principal component. A large set of variables can then be replaced by
the set of cluster components with little loss of information. In this
way, distinct clusters containing unregulated housekeeping genes along
with other steadily expressed genes can be disclosed and utilized for
standardization purposes. Simulated data, in parallel with data from a
biological experiment, were used to validate the SAS macro. In both
cases, good intuitive results were obtained.
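The central idea of the abstract, using a rank-order correlation instead of a Euclidean distance as the dissimilarity measure, can be sketched in plain Python. This is not the authors' SAS macro and it does not compute cluster components by principal component analysis. It only computes Spearman-based dissimilarities (1 minus rho) and groups genes by simple single-linkage thresholding. The gene names, Ct values and threshold are made up for illustration.

```python
from statistics import mean


def ranks(values):
    """Average ranks (1-based); tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            result[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(x), mean(y)
    num = sum((a - mx) * (b - my) for a, b in zip(x, y))
    den = (sum((a - mx) ** 2 for a in x) * sum((b - my) ** 2 for b in y)) ** 0.5
    return num / den if den else 0.0


def spearman(x, y):
    """Rank-order correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


def cluster(profiles, threshold=0.3):
    """Single-linkage grouping: genes join when 1 - rho <= threshold."""
    clusters = []
    for gene, values in profiles.items():
        linked = [c for c in clusters
                  if any(1.0 - spearman(values, profiles[g]) <= threshold
                         for g in c)]
        merged = [gene] + [g for c in linked for g in c]
        clusters = [c for c in clusters if c not in linked] + [merged]
    return clusters


# Hypothetical Ct values of three genes across five samples.
profiles = {
    "GAPDH": [20.1, 20.5, 21.0, 21.4, 22.0],
    "ACTB": [17.0, 17.2, 17.9, 18.1, 18.8],
    "TARGET": [30.0, 28.5, 27.9, 26.0, 25.1],
}
print(cluster(profiles))
```

In this made-up data the two reference genes end up in the same group although their Ct levels differ by about three cycles, which reflects the point that a rank-order measure disregards the different expression levels.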
Real-time RT-PCR: Neue Ansätze zur exakten mRNA Quantifizierung
(Real-time RT-PCR: new approaches to exact mRNA quantification)
BioSpektrum 1/2004 (in German)

The molecular technologies genomics, transcriptomics and proteomics are
increasingly taking over the classical research fields of the life
sciences. The enormous flood of data and results obtained is of
disproportionately great benefit in molecular diagnostics and physiology
as well as in functional genomics. Ever newer and more sophisticated
methods and applications are therefore needed to describe complex
physiological processes. Since we are only at the beginning of this
molecular era, it is necessary to optimise these techniques and to
understand them completely. One of these technically refined methods for
the reliable and exact quantification of specific mRNA is real-time
RT-PCR. This article essentially describes efficiency-corrected relative
quantification, the normalisation of expression results using a
non-regulated "housekeeping gene", the calculation of real-time PCR
efficiency, and the processing and statistical evaluation of the
expression results. All of the topics described can be looked up in
detail on the corresponding website in internationally published
original papers.
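Efficiency-corrected relative quantification and the calculation of PCR efficiency can be sketched in Python as follows. The sketch assumes the widely used ratio model in which each gene's amplification efficiency E is raised to its Ct difference between control and sample, and it assumes that E is estimated from the slope of a standard curve (Ct against log10 of the input amount). The function names and the example values are hypothetical and are not taken from the article.

```python
def efficiency_from_slope(slope: float) -> float:
    """Efficiency E from a standard-curve slope; E = 2 means perfect doubling."""
    return 10.0 ** (-1.0 / slope)


def relative_expression_ratio(e_target: float, ct_target_control: float,
                              ct_target_sample: float, e_ref: float,
                              ct_ref_control: float,
                              ct_ref_sample: float) -> float:
    """Expression of the target in the sample relative to the control,
    normalised to a reference (housekeeping) gene."""
    delta_target = ct_target_control - ct_target_sample
    delta_ref = ct_ref_control - ct_ref_sample
    return (e_target ** delta_target) / (e_ref ** delta_ref)


# Hypothetical example: the target appears 2 cycles earlier in the sample,
# the reference gene is unchanged, and both assays double perfectly.
e = efficiency_from_slope(-3.3219)
print(round(e, 3))
print(relative_expression_ratio(2.0, 25.0, 23.0, 2.0, 18.0, 18.0))  # prints 4.0
```

With assay-specific efficiencies below 2 the same Ct shift corresponds to a smaller ratio, which is why the efficiency correction matters.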
|
Nucleic Acids Research - Recent Hot Papers
|
|
Nucleic Acids
Research 2005 vol 33 (Database issue)
The 2005 Database Issue of Nucleic Acids Research is the twelfth in a
series dedicated to factual databases in the field of molecular
biology. Such databases are an essential resource for working
biologists and this compilation provides descriptions and updates of
the most important of these databases and serves to introduce newly ...
http://nar.oupjournals.org/content/vol33/suppl_1/
Database Categories List
- Nucleotide Sequence Databases
- RNA sequence databases
- Protein sequence databases
- Structure Databases
- Genomics Databases (non-vertebrate)
- Metabolic and Signaling Pathways
- Human and other Vertebrate Genomes
- Human Genes and Diseases
- Microarray Data and other Gene Expression Databases
- Proteomics Resources
- Other Molecular Biology Databases
- Organelle databases
- Plant databases
- Immunological databases
|
|
Nucleic Acids
Research 2004 vol 32 (Web Server issue)
Last year Nucleic
Acids Research
published a special issue devoted to web servers. This issue
complemented the annual Database Issue, which has now appeared in 11
successive years. The Web Server Issue highlights the many servers that
are available on the web to perform useful computations on DNA, RNA and
protein sequences and structures. Between them, the two issues provide
an unparalleled array of useful computational services. The new Web
Server Issue aims to provide a repository in which authors of web
servers can highlight their offerings and readers can find out what is
available.
In the current issue there are reports of 137 web servers that run the
gamut from BLAST services to three-dimensional protein structure
prediction. The servers described have all been subjected to rigorous
peer review, are available free of charge and provide invaluable
resources to the scientific community. The scientists and programmers
who have provided these resources deserve our immense thanks. They
illustrate the very best of the scientific spirit that transcends
national boundaries and promotes cooperation and the sharing of
resources.
http://nar.oupjournals.org/content/vol32/suppl_2/index.dtl
|

|
A web server for
performing electronic PCR
Kirill Rotmistrovsky, Wonhee Jang and Gregory D. Schuler
National Center for Biotechnology Information, National Library of
Medicine, National Institutes of Health, Bethesda, MD 20894, USA
‘Electronic PCR’ (e-PCR) refers to a computational procedure that is
used to search DNA sequences for sequence tagged sites (STSs), each of
which is defined by a pair of primer sequences and an expected PCR
product size. To gain speed, our implementation extracts short ‘words’
from the 3' end of each primer and stores them in a sorted hash table
that can be accessed efficiently during the search. One recent
improvement is the use of overlapping discontinuous words to allow
matches to be found despite the presence of a mismatch. Moreover, it is
possible to allow gaps in the alignment between the primer and the
sequence. The effect of these changes is to improve sensitivity without
significantly affecting specificity. The new software provides a search
mode using a query STS against a sequence database to augment the
previously available mode using a query sequence against an STS
database. Finally, e-PCR may now be used through a web service, with
search results linked to other web resources such as the UniSTS
database and the MapViewer genome browser. The e-PCR web server may be
found at www.ncbi.nlm.nih.gov/sutils/e-pcr
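The word-lookup idea described in the abstract can be illustrated with a much simplified Python sketch. It is not the NCBI implementation. It indexes only the 3'-end word of the forward primer in a dictionary, requires exact matches (no mismatches, gaps or discontinuous words), searches the given strand only, and accepts product sizes within a fixed margin of the expected size. The word length, the margin and all sequences are assumptions made for illustration.

```python
from collections import defaultdict
from typing import NamedTuple

WORD = 7  # length of the indexed word at the primer's 3' end (assumed)
COMPLEMENT = str.maketrans("ACGT", "TGCA")


class STS(NamedTuple):
    name: str
    forward: str  # forward primer, 5' to 3'
    reverse: str  # reverse primer, 5' to 3' on the opposite strand
    size: int     # expected product size in bp


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def build_index(stss):
    """Map the 3'-end word of each forward primer to its STS records."""
    index = defaultdict(list)
    for sts in stss:
        if len(sts.forward) < WORD:
            raise ValueError("forward primer shorter than the word length")
        index[sts.forward[-WORD:]].append(sts)
    return index


def epcr(sequence: str, stss, margin: int = 50):
    """Return (name, start, product size) for every STS found in `sequence`."""
    index = build_index(stss)
    hits = []
    for end in range(WORD, len(sequence) + 1):
        # A dictionary lookup on the word decides which STSs to verify here.
        for sts in index.get(sequence[end - WORD:end], ()):
            start = end - len(sts.forward)
            if start < 0 or sequence[start:end] != sts.forward:
                continue
            site = revcomp(sts.reverse)  # how the reverse primer site reads
            smallest = max(sts.size - margin, len(site))
            for size in range(smallest, sts.size + margin + 1):
                stop = start + size
                if stop <= len(sequence) and sequence[stop - len(site):stop] == site:
                    hits.append((sts.name, start, size))
    return hits


demo = STS("demo", "GATTACAGCC", "CCATGGTTAA", 30)
seq = "TTTT" + demo.forward + "G" * 10 + revcomp(demo.reverse) + "AAAA"
print(epcr(seq, [demo], margin=5))  # prints [('demo', 4, 30)]
```

Because most positions in the sequence fail the dictionary lookup immediately, the full primer comparison runs only at a small number of candidate positions, which is the source of the speed-up the abstract mentions.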
|
|
|
Sequence Mapping by
Electronic PCR
Gregory D. Schuler
Genome Research
Vol. 7, No. 5, pp. 541-550, May 1997
National Center for Biotechnology Information, National Library of
Medicine, National Institutes of Health, Bethesda, Maryland 20894
The highly
specific and sensitive PCR provides the basis for sequence-tagged sites
(STSs), unique landmarks that have been used widely in the construction
of genetic and physical maps of the human genome. Electronic PCR
(e-PCR) refers to the process of recovering these unique sites in DNA
sequences by searching for subsequences that closely match the PCR
primers and have the correct order, orientation, and spacing such that they
could plausibly prime the amplification of a PCR product of the correct
molecular weight. A software tool was developed to provide an efficient
implementation of this search strategy and allow the sort of en masse
searching that is required for modern genome analysis. Some sample
searches were performed to demonstrate a number of factors that can
affect the likelihood of obtaining a match. Analysis of one large
sequence database record revealed the presence of several
microsatellite and gene-based markers and allowed the exact base-pair
distances among them to be calculated. This example provides a
demonstration of how e-PCR can be used to integrate the growing body of
genomic sequence data with existing maps, reveal relationships among
markers that existed previously on different maps, and correlate
genetic distances with physical distances.
|
iPCR = Virtual PCR
http://www.ch.embnet.org/software/iPCR_form.html
|
In silico PCR
In silico simulation of molecular biology experiments
http://insilico.ehu.es
In silico experiments with complete genomes
This site has been developed by Dr.
Joseba Bikandi, Dr. Rosario San
Millán and co-workers in the Department of Immunology,
Microbiology and Parasitology, Faculty of Pharmacy, in the University
of the Basque Country.
Some tools included in this site, or their prior versions, were
primarily developed to obtain theoretical PCR results with Salmonella
by the research group of Dr. Javier Garaizar and Dr. Aitor Rementeria.
Later they were adapted to be used with any bacterial species
sequenced to date. The list of genomes is updated shortly after
their availability at NCBI, and the number of tools available will also
increase in the near future. Additional databases used by these tools
have been obtained from NCBI and in some cases a link will redirect
users to NCBI in order to obtain specific information.
|
UCSC In-Silico PCR
http://genome.brc.mcw.edu/cgi-bin/hgPcr/
In-Silico PCR searches a sequence database with a pair of PCR primers,
using an indexing strategy for fast performance.
Configuration Options
- Genome and Assembly - The sequence database to search.
- Forward Primer - Must be at least 15 bases in length.
- Reverse Primer - On the opposite strand from the forward primer. Minimum length of 15 bases.
- Max Product Size - Maximum size of amplified region.
- Min Perfect Match - Number of bases that match exactly on 3' end of primers. Minimum match size is 15.
- Min Good Match - Number of bases on 3' end of primers where at least 2 out of 3 bases match.
- Flip Reverse Primer - Invert the sequence order of the reverse primer and complement it.
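How some of these options interact can be shown with a small Python sketch. It is not the UCSC implementation and does not use its indexing strategy. It only mimics three options on a plain string template: Min Perfect Match (the last bases at the 3' end of each primer must match exactly, while the 5' end may differ), Max Product Size, and Flip Reverse Primer. Min Good Match is left out because its exact scoring is not specified above. All names and sequences are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def find_all(template: str, word: str, start: int = 0):
    """Yield every position at which `word` occurs in `template`."""
    pos = template.find(word, start)
    while pos != -1:
        yield pos
        pos = template.find(word, pos + 1)


def in_silico_pcr(template: str, forward: str, reverse: str,
                  max_product: int = 4000, min_perfect: int = 15,
                  flip_reverse: bool = False):
    """Return (start, stop, size) of each predicted product on `template`."""
    if flip_reverse:
        reverse = revcomp(reverse)  # primer was given in template orientation
    if min(len(forward), len(reverse)) < min_perfect:
        raise ValueError("primers must be at least min_perfect bases long")
    fwd_word = forward[-min_perfect:]         # 3' end of the forward primer
    rev_word = revcomp(reverse[-min_perfect:])  # 3' end of reverse, as read on the template
    products = []
    for f in find_all(template, fwd_word):
        start = max(0, f + min_perfect - len(forward))
        for r in find_all(template, rev_word, f + min_perfect):
            stop = min(len(template), r + len(reverse))
            if stop - start <= max_product:
                products.append((start, stop, stop - start))
    return products


fwd = "GATTACAGATTACAGG"
rev = "CCATGGTTAACCGGTA"
template = "AAAA" + fwd + "C" * 20 + revcomp(rev) + "TTTT"
print(in_silico_pcr(template, fwd, rev))  # prints [(4, 56, 52)]
```

Changing the first base of the forward primer leaves the result unchanged in this sketch, because only the 15 bases at the 3' end are required to match exactly.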
|
New real-time PCR primer and probe
databases:
Publication:
PATTYN, F., SPELEMAN, F., DE PAEPE A. & VANDESOMPELE, J. (2003). RTPrimerDB:
the Real-Time PCR primer and probe database. Nucleic Acids Research, 31(1): 122-123.
Publication:
Xiaowei Wang and Brian Seed (2003) A PCR primer bank for quantitative
gene expression analysis. Nucleic Acids Research 31(24): e154; pp.1-8.
- The Quantitative PCR Primer Database (QPPD) provides information about
primers and probes that can be used to quantitate human and mouse mRNA by
reverse transcription polymerase chain reaction (RT–PCR) assays. All
data has been gathered from published articles, cited in PubMed.
|