The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata, Nucleic Acids Research, vol.36, issue.Database, pp.475-479, 2008. ,
DOI : 10.1093/nar/gkm884
GenBank, Nucleic Acids Research, vol.37, issue.Database, pp.26-31, 2009. ,
DOI : 10.1093/nar/gkn723
Over- and Underrepresentation of Short DNA Words in Herpesvirus Genomes, Journal of Computational Biology, vol.3, issue.3, pp.345-360, 1996. ,
DOI : 10.1089/cmb.1996.3.345
Oligonucleotide bias in Bacillus subtilis: General trends and taxonomic comparisons, Nucleic Acids Research, vol.26, issue.12, pp.2971-2980, 1998. ,
DOI : 10.1093/nar/26.12.2971
Statistical analyses of counts and distributions of restriction sites in DNA sequences, Nucleic Acids Research, vol.20, issue.6, pp.1363-1370, 1992. ,
DOI : 10.1093/nar/20.6.1363
Identification of the Chi site of Haemophilus influenzae as several sequences related to the Escherichia coli Chi site, Molecular Microbiology, vol.27, issue.5, pp.1021-1029, 1998. ,
DOI : 10.1006/plas.1994.1011
Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals, Nucleic Acids Research, vol.28, issue.4, pp.1000-1010, 2000. ,
DOI : 10.1093/nar/28.4.1000
The 20 years of PROSITE, Nucleic Acids Research, vol.36, issue.Database, pp.245-249, 2008. ,
DOI : 10.1093/nar/gkm977
DNA binding sites: representation and discovery, Bioinformatics, vol.16, issue.1, pp.16-23, 2000. ,
DOI : 10.1093/bioinformatics/16.1.16
URL : http://bioinformatics.oxfordjournals.org/cgi/content/short/16/1/16
The statistical significance of nucleotide positionweight matrix matches, Comput Appl Biosci, vol.12, pp.431-439, 1996. ,
Statistical significance of clusters of motifs represented by position specific scoring matrices in nucleotide sequences, Nucleic Acids Research, vol.30, issue.14, pp.3214-3224, 2002. ,
DOI : 10.1093/nar/gkf438
Compositional bias in DNA, Current Opinion in Genetics & Development, vol.10, issue.6, pp.656-661, 2000. ,
DOI : 10.1016/S0959-437X(00)00144-1
URL : https://hal.archives-ouvertes.fr/hal-00427084
Mining Bacillus subtilis chromosome heterogeneities using hidden Markov models, Nucleic Acids Research, vol.30, issue.6, pp.1418-1426, 2002. ,
DOI : 10.1093/nar/30.6.1418
URL : http://doi.org/10.1093/nar/30.6.1418
Computational approaches to gene prediction, J Microbiol, vol.44, pp.137-144, 2006. ,
Contribution of Horizontally Acquired Genomic Islands to the Evolution of the Tubercle Bacilli, Molecular Biology and Evolution, vol.24, issue.8, pp.1861-1871, 2007. ,
DOI : 10.1093/molbev/msm111
Analysis of an optimal hidden Markov model for secondary structure prediction, BMC Structural Biology, vol.6, issue.1, p.25, 2006. ,
DOI : 10.1186/1472-6807-6-25
Stochastic models for heterogeneous DNA sequences, Bull Math Biol, vol.268, pp.8-14, 1989. ,
Base compositional structure of genomes, Genomics, vol.13, issue.4, pp.1056-1064, 1992. ,
DOI : 10.1016/0888-7543(92)90019-O
Distributions associated with general runs and patterns in hidden Markov models, The Annals of Applied Statistics, vol.1, issue.2, pp.585-61, 2007. ,
DOI : 10.1214/07-AOAS125SUPP
Couting patterns in degenerated sequences, Lec. Notes in Bioinfo, vol.5780, pp.222-232, 2009. ,
Probabilistic and Statistical Properties of Words: An Overview, Journal of Computational Biology, vol.7, issue.1-2, pp.1-46, 2000. ,
DOI : 10.1089/10665270050081360
Numerical Solutions for Patterns Statistics on Markov Chains, Statistical Applications in Genetics and Molecular Biology, vol.5, issue.1, p.26, 2006. ,
DOI : 10.2202/1544-6115.1219
URL : https://hal.archives-ouvertes.fr/hal-00271482
Distribution theory of runs and patterns associated with a sequence of multi-state trials, Statistica Sinica, vol.6, issue.4, pp.957-974, 1996. ,
Explicit distributional results in pattern formation, The Annals of Applied Probability, vol.7, issue.3, pp.666-678, 1997. ,
DOI : 10.1214/aoap/1034801248
Waiting times for patterns in a sequence of multistate trials, Journal of Applied Probability, vol.I, issue.02, pp.508-518, 2001. ,
DOI : 10.1023/A:1003862225719
Distribution of waiting time until the rth occurrence of a compound pattern, Statistics & Probability Letters, vol.75, issue.1, pp.29-38, 2005. ,
DOI : 10.1016/j.spl.2005.05.007
Assessing the Significance of Sets of Words, Combinatorial Pattern Matching 05, 2005. ,
DOI : 10.1007/11496656_31
Effective p-value computations using Finite Markov Chain Imbedding (FMCI): application to local score and to pattern statistics, Algorithms for Molecular Biology, vol.1, issue.1, p.5, 2006. ,
DOI : 10.1186/1748-7188-1-5
URL : https://hal.archives-ouvertes.fr/hal-00271494
Waiting Time Distributions for Pattern Occurrence in a Constrained Sequence, Discrete Mathematics and Theoretical Computer Science, vol.9, pp.305-320, 2007. ,
URL : https://hal.archives-ouvertes.fr/hal-00966498
Exact p-value calculation for heterotypic clusters of regulatory motifs and its application in computational annotation of cis-regulatory modules, Algorithms for Molecular Biology, vol.2, issue.1, p.13, 2007. ,
DOI : 10.1186/1748-7188-2-13
URL : https://hal.archives-ouvertes.fr/hal-00784463
Linguistics of Nucleotide Sequences I: The Significance of Deviations from Mean Statistical Characteristics and Prediction of the Frequencies of Occurrence of Words, Journal of Biomolecular Structure and Dynamics, vol.15, issue.5, pp.1013-1026, 1989. ,
DOI : 10.1080/07391102.1989.10506529
Expected frequencies of DNA patterns using whittle's formula, Journal of Applied Probability, vol.17, issue.04, pp.886-892, 1991. ,
DOI : 10.1007/BF01732761
First and second moment of counts of words in random texts generated by Markov chains, Bioinformatics, vol.8, issue.5, pp.433-441, 1997. ,
DOI : 10.1093/bioinformatics/8.5.433
Finding words with unexpected frequencies in DNA sequences, J R Statist Soc B, vol.11, pp.190-192, 1995. ,
Poissons approximations for runs and patterns of rare events, Adv Appl Prob, vol.23, 1991. ,
Compound Poisson approximations for word patterns under Markovian hypotheses, Journal of Applied Probability, vol.35, issue.04, pp.877-892, 1995. ,
DOI : 10.1214/aop/1176993517
Compound Poisson and Poisson Process Approximations for Occurrences of Multiple Words in Markov Chains, Journal of Computational Biology, vol.5, issue.2, pp.223-254, 1999. ,
DOI : 10.1089/cmb.1998.5.223
Compound Poisson approximation for counts of rare patterns in Markov chains and extreme sojourns in birth-death chains, The Annals of Applied Probability, vol.10, issue.2 ,
DOI : 10.1214/aoap/1019487356
Cumulative distribution function of a geometric Poisson distribution, J Stat Comp and Sim, vol.78, issue.3, pp.211-220, 2008. ,
Assessing the Statistical Significance of Overrepresented Oligonucleotides, Lecture Notes in Computer Science, vol.2149, pp.85-97, 2001. ,
DOI : 10.1007/3-540-44696-6_7
LD-SPatt: Large Deviations Statistics for Patterns on Markov Chains, Journal of Computational Biology, vol.11, issue.6, pp.1023-1033, 2004. ,
DOI : 10.1089/cmb.2004.11.1023
URL : https://hal.archives-ouvertes.fr/hal-00271507
Approximate probabilities for runs and patterns in i.i.d. and Markov-dependent multistate trials, Advances in Applied Probability, vol.11, issue.01, pp.292-308, 2009. ,
DOI : 10.1214/aoms/1177731421
Motif statistics, Theoretical Computer Science, vol.287, issue.2, pp.593-617, 2002. ,
DOI : 10.1016/S0304-3975(01)00264-X
Waiting time and complexity for matching patterns with automata, Information Processing Letters, vol.87, issue.3, pp.119-125, 2003. ,
DOI : 10.1016/S0020-0190(03)00271-0
URL : https://hal.archives-ouvertes.fr/hal-00619588
Mininal Markov chain embeddings of pattern problems. Information Theory and Applications Workshop, pp.251-255, 2007. ,
Pattern Markov Chains: Optimal Markov Chain Embedding Through Deterministic Finite Automata, Journal of Applied Probability, vol.1, issue.01, pp.226-243, 2008. ,
DOI : 10.1214/aoap/1034801248
URL : https://hal.archives-ouvertes.fr/hal-00271298
Faster exact Markovian probability functions for motif occurrences: a DFA-only approach, Bioinformatics, vol.24, issue.24, pp.2839-2848, 2008. ,
DOI : 10.1093/bioinformatics/btn525
On the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source ,
URL : https://hal.archives-ouvertes.fr/hal-00419038
Distribution Theory of Runs: A Markov Chain Approach, Journal of the American Statistical Association, vol.11, issue.427, pp.1050-1058, 1994. ,
DOI : 10.1214/aoms/1177731421
A hidden Markov model derivated structural alphabet for proteins, J Mol Biol, vol.339, pp.561-605, 2004. ,
Identification of non random motifs in loops using a structural alphabet, 2006 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, pp.92-100, 2006. ,
DOI : 10.1109/CIBCB.2006.331017
Introduction to Automata Theory, Languages, and Computation Addison-Wesley, 2006. ,
RSAT: regulatory sequence analysis tools, Nucleic Acids Research, vol.36, issue.Web Server, pp.119-127, 2008. ,
DOI : 10.1093/nar/gkn304
Waiting times for clumps of patterns and for structured motifs in random sequences, Discrete Applied Mathematics, vol.155, issue.6-7, pp.868-880, 2007. ,
DOI : 10.1016/j.dam.2005.07.016
URL : https://hal.archives-ouvertes.fr/hal-01197504