K. Liolios, K. Mavromatis, N. Tavernarakis, and N. Kyrpides, The Genomes On Line Database (GOLD) in 2007: status of genomic and metagenomic projects and their associated metadata, Nucleic Acids Research, vol.36, issue.Database, pp.475-479, 2008.
DOI : 10.1093/nar/gkm884

D. Benson, I. Karsch-Mizrachi, D. Lipman, J. Ostell, and E. Sayers, GenBank, Nucleic Acids Research, vol.37, issue.Database, pp.26-31, 2009.
DOI : 10.1093/nar/gkn723

M. Leung, G. Marsh, and T. Speed, Over- and Underrepresentation of Short DNA Words in Herpesvirus Genomes, Journal of Computational Biology, vol.3, issue.3, pp.345-360, 1996.
DOI : 10.1089/cmb.1996.3.345

E. Rocha, A. Viari, and A. Danchin, Oligonucleotide bias in Bacillus subtilis: General trends and taxonomic comparisons, Nucleic Acids Research, vol.26, issue.12, pp.2971-2980, 1998.
DOI : 10.1093/nar/26.12.2971

S. Karlin, C. Burge, and A. Campbell, Statistical analyses of counts and distributions of restriction sites in DNA sequences, Nucleic Acids Research, vol.20, issue.6, pp.1363-1370, 1992.
DOI : 10.1093/nar/20.6.1363

S. Sourice, V. Biaudet, E. Karoui, M. Ehrlich, S. Gruss et al., Identification of the Chi site of Haemophilus influenzae as several sequences related to the Escherichia coli Chi site, Molecular Microbiology, vol.27, issue.5, pp.1021-1029, 1998.
DOI : 10.1006/plas.1994.1011

J. van Helden, M. Olmo, and J. Perez-Ortin, Statistical analysis of yeast genomic downstream sequences reveals putative polyadenylation signals, Nucleic Acids Research, vol.28, issue.4, pp.1000-1010, 2000.
DOI : 10.1093/nar/28.4.1000

N. Hulo, A. Bairoch, V. Bulliard, L. Cerutti, B. Cuche et al., The 20 years of PROSITE, Nucleic Acids Research, vol.36, issue.Database, pp.245-249, 2008.
DOI : 10.1093/nar/gkm977

G. Stormo, DNA binding sites: representation and discovery, Bioinformatics, vol.16, issue.1, pp.16-23, 2000.
DOI : 10.1093/bioinformatics/16.1.16

URL : http://bioinformatics.oxfordjournals.org/cgi/content/short/16/1/16

J. Claverie and S. Audic, The statistical significance of nucleotide position-weight matrix matches, Comput Appl Biosci, vol.12, pp.431-439, 1996.

M. Frith, J. Spouge, and U. Hansen, Statistical significance of clusters of motifs represented by position specific scoring matrices in nucleotide sequences, Nucleic Acids Research, vol.30, issue.14, pp.3214-3224, 2002.
DOI : 10.1093/nar/gkf438

C. Gautier, Compositional bias in DNA, Current Opinion in Genetics & Development, vol.10, issue.6, pp.656-661, 2000.
DOI : 10.1016/S0959-437X(00)00144-1

URL : https://hal.archives-ouvertes.fr/hal-00427084

P. Nicolas, L. Bize, F. Muri, M. Hoebeke, F. Rodolphe, S. Ehrlich et al., Mining Bacillus subtilis chromosome heterogeneities using hidden Markov models, Nucleic Acids Research, vol.30, issue.6, pp.1418-1426, 2002.
DOI : 10.1093/nar/30.6.1418

URL : http://doi.org/10.1093/nar/30.6.1418

J. Do and D. Choi, Computational approaches to gene prediction, J Microbiol, vol.44, pp.137-144, 2006.

J. Becq, M. Gutierrez, V. Rosas-Magallanes, J. Rauzier, B. Gicquel et al., Contribution of Horizontally Acquired Genomic Islands to the Evolution of the Tubercle Bacilli, Molecular Biology and Evolution, vol.24, issue.8, pp.1861-1871, 2007.
DOI : 10.1093/molbev/msm111

J. Martin, J. Gibrat, and F. Rodolphe, Analysis of an optimal hidden Markov model for secondary structure prediction, BMC Structural Biology, vol.6, issue.1, p.25, 2006.
DOI : 10.1186/1472-6807-6-25

G. Churchill, Stochastic models for heterogeneous DNA sequences, Bull Math Biol, vol.268, pp.8-14, 1989.

J. Fickett, D. Torney, and D. Wolf, Base compositional structure of genomes, Genomics, vol.13, issue.4, pp.1056-1064, 1992.
DOI : 10.1016/0888-7543(92)90019-O

J. Aston and D. Martin, Distributions associated with general runs and patterns in hidden Markov models, The Annals of Applied Statistics, vol.1, issue.2, pp.585-61, 2007.
DOI : 10.1214/07-AOAS125SUPP

G. Nuel, Counting patterns in degenerated sequences, Lec. Notes in Bioinfo, vol.5780, pp.222-232, 2009.

G. Reinert and S. Schbath, Probabilistic and Statistical Properties of Words: An Overview, Journal of Computational Biology, vol.7, issue.1-2, pp.1-46, 2000.
DOI : 10.1089/10665270050081360

G. Nuel, Numerical Solutions for Patterns Statistics on Markov Chains, Statistical Applications in Genetics and Molecular Biology, vol.5, issue.1, p.26, 2006.
DOI : 10.2202/1544-6115.1219

URL : https://hal.archives-ouvertes.fr/hal-00271482

J. Fu, Distribution theory of runs and patterns associated with a sequence of multi-state trials, Statistica Sinica, vol.6, issue.4, pp.957-974, 1996.

V. Stefanov and A. Pakes, Explicit distributional results in pattern formation, The Annals of Applied Probability, vol.7, issue.3, pp.666-678, 1997.
DOI : 10.1214/aoap/1034801248

D. Antzoulakos, Waiting times for patterns in a sequence of multistate trials, Journal of Applied Probability, vol.I, issue.02, pp.508-518, 2001.
DOI : 10.1023/A:1003862225719

Y. Chang, Distribution of waiting time until the rth occurrence of a compound pattern, Statistics & Probability Letters, vol.75, issue.1, pp.29-38, 2005.
DOI : 10.1016/j.spl.2005.05.007

V. Boeva, J. Clément, M. Régnier, and M. Vandenbogaert, Assessing the Significance of Sets of Words, Combinatorial Pattern Matching 05, 2005.
DOI : 10.1007/11496656_31

G. Nuel, Effective p-value computations using Finite Markov Chain Imbedding (FMCI): application to local score and to pattern statistics, Algorithms for Molecular Biology, vol.1, issue.1, p.5, 2006.
DOI : 10.1186/1748-7188-1-5

URL : https://hal.archives-ouvertes.fr/hal-00271494

V. Stefanov and W. Szpankowski, Waiting Time Distributions for Pattern Occurrence in a Constrained Sequence, Discrete Mathematics and Theoretical Computer Science, vol.9, pp.305-320, 2007.
URL : https://hal.archives-ouvertes.fr/hal-00966498

V. Boeva, J. Clément, M. Régnier, M. Roytberg, and V. Makeev, Exact p-value calculation for heterotypic clusters of regulatory motifs and its application in computational annotation of cis-regulatory modules, Algorithms for Molecular Biology, vol.2, issue.1, p.13, 2007.
DOI : 10.1186/1748-7188-2-13

URL : https://hal.archives-ouvertes.fr/hal-00784463

P. Pevzner, M. Borodovski, and A. Mironov, Linguistics of Nucleotide Sequences I: The Significance of Deviations from Mean Statistical Characteristics and Prediction of the Frequencies of Occurrence of Words, Journal of Biomolecular Structure and Dynamics, vol.15, issue.5, pp.1013-1026, 1989.
DOI : 10.1080/07391102.1989.10506529

R. Cowan, Expected frequencies of DNA patterns using Whittle's formula, Journal of Applied Probability, vol.17, issue.04, pp.886-892, 1991.
DOI : 10.1007/BF01732761

J. Kleffe and M. Borodovski, First and second moment of counts of words in random texts generated by Markov chains, Bioinformatics, vol.8, issue.5, pp.433-441, 1997.
DOI : 10.1093/bioinformatics/8.5.433

B. Prum, F. Rodolphe, and E. de Turckheim, Finding words with unexpected frequencies in DNA sequences, J R Statist Soc B, vol.11, pp.190-192, 1995.

A. Godbole, Poisson approximations for runs and patterns of rare events, Adv Appl Prob, vol.23, 1991.

M. Geske, A. Godbole, A. Schaffner, A. Skrolnick, and G. Wallstrom, Compound Poisson approximations for word patterns under Markovian hypotheses, Journal of Applied Probability, vol.35, issue.04, pp.877-892, 1995.
DOI : 10.1214/aop/1176993517

G. Reinert and S. Schbath, Compound Poisson and Poisson Process Approximations for Occurrences of Multiple Words in Markov Chains, Journal of Computational Biology, vol.5, issue.2, pp.223-254, 1999.
DOI : 10.1089/cmb.1998.5.223

T. Erhardsson, Compound Poisson approximation for counts of rare patterns in Markov chains and extreme sojourns in birth-death chains, The Annals of Applied Probability, vol.10, issue.2
DOI : 10.1214/aoap/1019487356

G. Nuel, Cumulative distribution function of a geometric Poisson distribution, J Stat Comp and Sim, vol.78, issue.3, pp.211-220, 2008.

A. Denise, M. Régnier, and M. Vandenbogaert, Assessing the Statistical Significance of Overrepresented Oligonucleotides, Lecture Notes in Computer Science, vol.2149, pp.85-97, 2001.
DOI : 10.1007/3-540-44696-6_7

G. Nuel, LD-SPatt: Large Deviations Statistics for Patterns on Markov Chains, Journal of Computational Biology, vol.11, issue.6, pp.1023-1033, 2004.
DOI : 10.1089/cmb.2004.11.1023

URL : https://hal.archives-ouvertes.fr/hal-00271507

J. Fu and B. Johnson, Approximate probabilities for runs and patterns in i.i.d. and Markov-dependent multistate trials, Advances in Applied Probability, vol.11, issue.01, pp.292-308, 2009.
DOI : 10.1214/aoms/1177731421

P. Nicodème, B. Salvy, and P. Flajolet, Motif statistics, Theoretical Computer Science, vol.287, issue.2, pp.593-617, 2002.
DOI : 10.1016/S0304-3975(01)00264-X

M. Crochemore and V. Stefanov, Waiting time and complexity for matching patterns with automata, Information Processing Letters, vol.87, issue.3, pp.119-125, 2003.
DOI : 10.1016/S0020-0190(03)00271-0

URL : https://hal.archives-ouvertes.fr/hal-00619588

M. Lladser, Minimal Markov chain embeddings of pattern problems, Information Theory and Applications Workshop, pp.251-255, 2007.

G. Nuel, Pattern Markov Chains: Optimal Markov Chain Embedding Through Deterministic Finite Automata, Journal of Applied Probability, vol.1, issue.01, pp.226-243, 2008.
DOI : 10.1214/aoap/1034801248

URL : https://hal.archives-ouvertes.fr/hal-00271298

P. Ribeca and E. Raineri, Faster exact Markovian probability functions for motif occurrences: a DFA-only approach, Bioinformatics, vol.24, issue.24, pp.2839-2848, 2008.
DOI : 10.1093/bioinformatics/btn525

G. Nuel, On the first k moments of the random count of a pattern in a multi-states sequence generated by a Markov source
URL : https://hal.archives-ouvertes.fr/hal-00419038

J. Fu and M. Koutras, Distribution Theory of Runs: A Markov Chain Approach, Journal of the American Statistical Association, vol.11, issue.427, pp.1050-1058, 1994.
DOI : 10.1214/aoms/1177731421

A. Camproux, R. Gautier, and T. Tufféry, A hidden Markov model derived structural alphabet for proteins, J Mol Biol, vol.339, pp.561-605, 2004.

L. Regad, J. Martin, and A. Camproux, Identification of non random motifs in loops using a structural alphabet, 2006 IEEE Symposium on Computational Intelligence and Bioinformatics and Computational Biology, pp.92-100, 2006.
DOI : 10.1109/CIBCB.2006.331017

J. Hopcroft, R. Motwani, and J. Ullman, Introduction to Automata Theory, Languages, and Computation, Addison-Wesley, 2006.

M. Thomas-Chollier, O. Sand, J. Turatsinze, R. Janky, M. Defrance et al., RSAT: regulatory sequence analysis tools, Nucleic Acids Research, vol.36, issue.Web Server, pp.119-127, 2008.
DOI : 10.1093/nar/gkn304

V. Stefanov, S. Robin, and S. Schbath, Waiting times for clumps of patterns and for structured motifs in random sequences, Discrete Applied Mathematics, vol.155, issue.6-7, pp.868-880, 2007.
DOI : 10.1016/j.dam.2005.07.016

URL : https://hal.archives-ouvertes.fr/hal-01197504