<!DOCTYPE art SYSTEM 'http://www.biomedcentral.com/xml/article.dtd'>
<art>
	<ui>1471-2105-13-254</ui>
	<ji>1471-2105</ji>
	<fm>
		<dochead>Methodology article</dochead>
		<bibl>
			<title>
				<p>VarioML framework for comprehensive variation data representation and exchange</p>
			</title>
			<aug>
				<au id="A1"><snm>Byrne</snm><fnm>Myles</fnm><insr iid="I1"/><email>myles.byrne@fimm.fi</email></au>
				<au id="A2"><snm>Fokkema</snm><mi>FAC</mi><fnm>Ivo</fnm><insr iid="I2"/><email>I.F.A.C.Fokkema@lumc.nl</email></au>
				<au id="A3"><snm>Lancaster</snm><fnm>Owen</fnm><insr iid="I3"/><email>ol8@leicester.ac.uk</email></au>
				<au id="A4"><snm>Adamusiak</snm><fnm>Tomasz</fnm><insr iid="I4"/><email>tadamusiak@mcw.edu</email></au>
				<au id="A5"><snm>Ahonen-Bishopp</snm><fnm>Anni</fnm><insr iid="I5"/><email>anni.ahonen-bishopp@bcplatforms.com</email></au>
				<au id="A6"><snm>Atlan</snm><fnm>David</fnm><insr iid="I6"/><email>atlan_d@web.de</email></au>
				<au id="A7"><snm>B&#233;roud</snm><fnm>Christophe</fnm><insr iid="I7"/><email>christophe.beroud@inserm.fr</email></au>
				<au id="A8"><snm>Cornell</snm><fnm>Michael</fnm><insr iid="I8"/><email>Michael.Cornell@cmft.nhs.uk</email></au>
				<au id="A9"><snm>Dalgleish</snm><fnm>Raymond</fnm><insr iid="I3"/><email>raymond.dalgleish@le.ac.uk</email></au>
				<au id="A10"><snm>Devereau</snm><fnm>Andrew</fnm><insr iid="I8"/><email>Andrew.Devereau@cmmc.nhs.uk</email></au>
				<au id="A11"><snm>Patrinos</snm><mi>P</mi><fnm>George</fnm><insr iid="I9"/><email>gpatrinos@upatras.gr</email></au>
				<au id="A12"><snm>Swertz</snm><mi>A</mi><fnm>Morris</fnm><insr iid="I10"/><email>m.a.swertz@rug.nl</email></au>
				<au id="A13"><snm>Taschner</snm><mi>EM</mi><fnm>Peter</fnm><insr iid="I2"/><email>Taschner@lumc.nl</email></au>
				<au id="A14"><snm>Thorisson</snm><mi>A</mi><fnm>Gudmundur</fnm><insr iid="I3"/><email>gthorisson@gmail.com</email></au>
				<au id="A15"><snm>Vihinen</snm><fnm>Mauno</fnm><insr iid="I11"/><insr iid="I12"/><insr iid="I13"/><email>mauno.vihinen@med.lu.se</email></au>
				<au id="A16"><snm>Brookes</snm><mi>J</mi><fnm>Anthony</fnm><insr iid="I3"/><email>ajb97@leicester.ac.uk</email></au>
				<au id="A17" ca="yes"><snm>Muilu</snm><fnm>Juha</fnm><insr iid="I1"/><email>juha.muilu@helsinki.fi</email></au>
			</aug>
			<insg>
				<ins id="I1"><p>Institute for Molecular Medicine Finland (FIMM), University of Helsinki, Helsinki, Finland</p></ins>
				<ins id="I2"><p>Department of Human Genetics, Leiden University Medical Center, Leiden, Netherlands</p></ins>
				<ins id="I3"><p>Department of Genetics, University of Leicester, Leicester, UK</p></ins>
				<ins id="I4"><p>Medical College of Wisconsin, Milwaukee, WI, USA</p></ins>
				<ins id="I5"><p>Biocomputing Platforms, Ltd, Espoo, Finland</p></ins>
				<ins id="I6"><p>Phenosystems Inc, Brussels, Belgium</p></ins>
				<ins id="I7"><p>INSERM UMR_S910, Facult&#233; de M&#233;decine La Timone, Marseille, France</p></ins>
				<ins id="I8"><p>National Genetics Reference Laboratory, Manchester, UK</p></ins>
				<ins id="I9"><p>Department of Pharmacy, School of Health Sciences, University of Patras, Patras, Greece</p></ins>
				<ins id="I10"><p>Department of Genetics, Genomics Coordination Center University Medical Center Groningen and Groningen Bioinformatics Center, University of Groningen, Groningen, Netherlands</p></ins>
				<ins id="I11"><p>Department of Experimental Medical Science, Lund University, Lund, Sweden</p></ins>
				<ins id="I12"><p>Institute of Biomedical Technology, University of Tampere, Tampere, Finland</p></ins>
				<ins id="I13"><p>Tampere University Hospital, Tampere, Finland</p></ins>
			</insg>
			<source>BMC Bioinformatics</source>
			<section><title><p>Sequence analysis (applications)</p></title></section><issn>1471-2105</issn>
			<pubdate>2012</pubdate>
			<volume>13</volume>
			<issue>1</issue>
			<fpage>254</fpage>
			<url>http://www.biomedcentral.com/1471-2105/13/254</url>
			<xrefbib><pubidlist><pubid idtype="doi">10.1186/1471-2105-13-254</pubid><pubid idtype="pmpid">23031277</pubid></pubidlist></xrefbib>
		</bibl>
		<history><rec><date><day>14</day><month>5</month><year>2012</year></date></rec><acc><date><day>23</day><month>9</month><year>2012</year></date></acc><pub><date><day>3</day><month>10</month><year>2012</year></date></pub></history>
		<cpyrt><year>2012</year><collab>Byrne et al.; licensee BioMed Central Ltd.</collab><note>This is an Open Access article distributed under the terms of the Creative Commons Attribution License (<url>http://creativecommons.org/licenses/by/2.0</url>), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.</note></cpyrt>
		<kwdg>
			<kwd>LSDB</kwd>
			<kwd>Variation database curation</kwd>
			<kwd>Data collection</kwd>
			<kwd>Distribution</kwd>
		</kwdg>
		<abs>
			<sec>
				<st>
					<p>Abstract</p>
				</st>
				<sec>
					<st>
						<p>Background</p>
					</st><p>Sharing of data about variation and the associated phenotypes is a critical need, yet variant information can be arbitrarily complex, making a single standard vocabulary elusive and re-formatting difficult. Complex standards have proven too time-consuming to implement.</p>
				</sec>
				<sec>
					<st>
						<p>Results</p>
					</st><p>The GEN2PHEN project addressed these difficulties by developing a comprehensive data model for capturing biomedical observations, Observ-OM, and building the VarioML format around it. VarioML pairs a simplified open specification for describing variants with a toolkit for adapting the specification to one's own research workflow. Straightforward variant data can be captured, federated, and exchanged with no overhead; more complex data can be described without loss of compatibility. The open specification enables push-button submission to gene variant databases (LSDBs), such as the Leiden Open Variation Database, using the Cafe Variome data publishing service, while VarioML bidirectionally transforms data between XML and web-application code formats, opening up new possibilities for open source web applications building on shared data. A Java implementation toolkit makes VarioML easy to integrate into biomedical applications. VarioML is designed primarily for LSDB data submission and transfer scenarios, but can also be used as a standard variation data format for JSON and XML document databases and user interface components.</p>
				</sec>
				<sec>
					<st>
						<p>Conclusions</p>
					</st><p>VarioML is a set of tools and practices improving the availability, quality, and comprehensibility of human variation information. It enables researchers, diagnostic laboratories, and clinics to share that information with ease and clarity, and without ambiguity.</p>
				</sec>
			</sec>
		</abs>
	</fm>
	<bdy>
		<sec>
			<st>
				<p>Background</p>
			</st><p>The study of disease-causing and benign variations in the human genome is progressing rapidly. Whole genome and exome sequencing continues to expand, and improved tools for variant calling are becoming available <abbrgrp>
					<abbr bid="B1">1</abbr>
					<abbr bid="B2">2</abbr>
					<abbr bid="B3">3</abbr>
				</abbrgrp>. Cost-effective sequencing, paired with variant discovery, promises to make early detection and intervention accessible for the millions of individuals with genetic diseases.</p><p>However, realizing this potential is blocked by the problem of integrating and coordinating the steps towards &#8220;a pipeline leading from discovery to delivery&#8221; <abbrgrp>
					<abbr bid="B4">4</abbr>
				</abbrgrp>. The GEN2PHEN project was initiated in 2008 to unify human and model organism genetic variation databases, and remove the obstacles to translation of variant data from laboratory to clinic to public <abbrgrp>
					<abbr bid="B5">5</abbr>
				</abbrgrp>. This has involved attempting to unify the divergent data representations of various database communities.</p><p>The focus of this effort is the locus-specific database (LSDB) <abbrgrp>
					<abbr bid="B6">6</abbr>
				</abbrgrp>. LSDBs describe the variants discovered in a single gene, a gene family, or a group of genes involved in similar diseases or traits. As of this writing, 4,111 LSDBs can be easily searched online <abbrgrp>
					<abbr bid="B7">7</abbr>
				</abbrgrp>. LSDBs are curated by experts on their respective loci, and as such are typically the best resources of gene variant information available <abbrgrp>
					<abbr bid="B8">8</abbr>
				</abbrgrp>. A comprehensive 2010 analysis of 1,188 LSDBs offers a useful overview of the domain, with encouraging results such as finding only 5.4% to be outdated <abbrgrp>
					<abbr bid="B9">9</abbr>
				</abbrgrp>. However, the study also found that only 8% provided detailed disease and phenotypic descriptions. LSDBs also vary widely in format, diverging to satisfy the immediate requirements of numerous use cases, making comprehensive, global analysis of data concerning a given variant difficult, if not impossible <abbrgrp>
					<abbr bid="B10">10</abbr>
				</abbrgrp>. LSDBs are also typically incomplete, either from a lack of capacity on the part of the data submitters or curators to include all pertinent data, or from the original data lacking key elements altogether <abbrgrp>
					<abbr bid="B9">9</abbr>
					<abbr bid="B11">11</abbr>
				</abbrgrp>. It is well recognised that data will often be incomplete if too much is asked of the submitters <abbrgrp>
					<abbr bid="B12">12</abbr>
				</abbrgrp>. Meanwhile, next-generation sequencing pipelines are rapidly increasing the scale and complexity of the data to be managed <abbrgrp>
					<abbr bid="B13">13</abbr>
				</abbrgrp>.</p>
		</sec>
		<sec>
			<st>
				<p>Methods</p>
			</st>
			<sec>
				<st>
					<p>Undesigning a standard</p>
				</st><p>All terms and abbreviations used are explained in the glossary (Table&#8201;<tblr tid="T1">1</tblr>).</p>
				<table id="T1">
					<title>
						<p>Table 1</p>
					</title>
					<caption>
						<p>
							<b>Glossary</b>
						</p>
					</caption>
					<tgroup align="left" cols="3">
						<colspec align="left" colname="c1" colnum="1" colwidth="1*"/>
						<colspec align="left" colname="c2" colnum="2" colwidth="1*"/>
						<colspec align="left" colname="c3" colnum="3" colwidth="1*"/>
						<thead valign="top">
							<row rowsep="1">
								<entry colname="c1">
									<p>
										<b>Name</b>
									</p>
								</entry>
								<entry colname="c2">
									<p>
										<b>Definition</b>
									</p>
								</entry>
								<entry colname="c3">
									<p>
										<b>URL</b>
									</p>
								</entry>
							</row>
						</thead>
						<tbody valign="top">
							<row>
								<entry colname="c1">
									<p>API</p>
								</entry>
								<entry colname="c2">
									<p>Application programming interface</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>-</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>BSVM</p>
								</entry>
								<entry colname="c2">
									<p>Pioneering early LSDB integration standard.</p>
								</entry>
								<entry colname="c3">
									<p>See <it>Tyrelle G, King GC, 2003</it>
										<abbrgrp>
											<abbr bid="B17">17</abbr>
										</abbrgrp>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Caf&#233; Variome</p>
								</entry>
								<entry colname="c2">
									<p>Variation data publishing service</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://cafevariome.org/</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Extended Backus-Naur Form</p>
								</entry>
								<entry colname="c2">
									<p>A notation that expresses the grammar of a computer language.</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://en.wikipedia.org/wiki/Backus-Naur_Form</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>GEN2PHEN</p>
								</entry>
								<entry colname="c2">
									<p>EU project integrating genotype and phenotype data.</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.gen2phen.org</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>GSVML</p>
								</entry>
								<entry colname="c2">
									<p>Genomic Sequence Variation Markup Language</p>
								</entry>
								<entry colname="c3">
									<p>See <it>Nakaya J, Kimura M, et al. 2010</it>
										<abbrgrp>
											<abbr bid="B18">18</abbr>
										</abbrgrp>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>HPO</p>
								</entry>
								<entry colname="c2">
									<p>Human Phenotype Ontology</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.human-phenotype-ontology.org/</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Jackson</p>
								</entry>
								<entry colname="c2">
									<p>Java JSON library</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://wiki.fasterxml.com/JacksonHome</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Java</p>
								</entry>
								<entry colname="c2">
									<p>General-purpose programming language</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.java.com</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>JAXB</p>
								</entry>
								<entry colname="c2">
									<p>Java XML binding library (Java Architecture for XML Binding)</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://jaxb.java.net/</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>JSON</p>
								</entry>
								<entry colname="c2">
									<p>JavaScript Object Notation</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://en.wikipedia.org/wiki/JSON</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>LSDB</p>
								</entry>
								<entry colname="c2">
									<p>Gene variant database, Locus Specific Database</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>-</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>MAGE-TAB</p>
								</entry>
								<entry colname="c2">
									<p>A tab-delimited format for representing functional genomics data.</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.mged.org/mage-tab</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>MIRIAM</p>
								</entry>
								<entry colname="c2">
									<p>The MIRIAM Registry provides a set of online services for the generation of unique and perennial identifiers, in the form of URIs.</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.ebi.ac.uk/miriam/main/</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>MOLGENIS</p>
								</entry>
								<entry colname="c2">
									<p>Software generating infrastructure (databases, APIs, GUIs) for life science projects.</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.molgenis.org</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Object Model</p>
								</entry>
								<entry colname="c2">
									<p>An abstract representation of a domain&#8217;s concepts, data, and relationships between these, used to design or generate software.</p>
								</entry>
								<entry colname="c3">
									<p>-</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Observ-OM</p>
								</entry>
								<entry colname="c2">
									<p>A simple system to format and exchange observation data.</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.molgenis.org/wiki/ObservStart</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>ORCID</p>
								</entry>
								<entry colname="c2">
									<p>Open Researcher and Contributor Identification</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://orcid.org/</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>PML/DVAR</p>
								</entry>
								<entry colname="c2">
									<p>An implementation of the PaGE-OM object model.</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.openpml.org/</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>RelaxNG</p>
								</entry>
								<entry colname="c2">
									<p>Schema definition language for use with XML.</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://relaxng.org/</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>RDF</p>
								</entry>
								<entry colname="c2">
									<p>Resource Description Framework</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.w3.org/RDF/</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>Schematron</p>
								</entry>
								<entry colname="c2">
									<p>High-level schema definition language for use with XML.</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.ascc.net/xml/resource/schematron/Schematron2000.html</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>SKOS</p>
								</entry>
								<entry colname="c2">
									<p>Simple Knowledge Organization System</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.w3.org/2004/02/skos/</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>SO</p>
								</entry>
								<entry colname="c2">
									<p>Sequence Ontology</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.sequenceontology.org/</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>UML</p>
								</entry>
								<entry colname="c2">
									<p>Unified Modeling Language</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://en.wikipedia.org/wiki/Unified_Modeling_Language</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>VariO</p>
								</entry>
								<entry colname="c2">
									<p>Variation Ontology</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://variationontology.org/</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>VCF</p>
								</entry>
								<entry colname="c2">
									<p>Variant Call Format</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://vcftools.sourceforge.net/specs.html</url>
										</it>
									</p>
								</entry>
							</row>
							<row>
								<entry colname="c1">
									<p>XGAP</p>
								</entry>
								<entry colname="c2">
									<p>XGAP is an open and flexible object model for xQTL, GWL, GWA and mutagenesis data</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.xgap.org</url>
										</it>
									</p>
								</entry>
							</row>
							<row rowsep="1">
								<entry colname="c1">
									<p>XML</p>
								</entry>
								<entry colname="c2">
									<p>eXtensible Markup Language</p>
								</entry>
								<entry colname="c3">
									<p>
										<it>
											<url>http://www.w3.org/XML/</url>
										</it>
									</p>
								</entry>
							</row>
						</tbody>
					</tgroup>
				</table><p>We began by incorporating previous work on data requirements <abbrgrp>
						<abbr bid="B6">6</abbr>
						<abbr bid="B8">8</abbr>
						<abbr bid="B14">14</abbr>
					</abbrgrp> and data modelling activities, such as PaGE-OM <abbrgrp>
						<abbr bid="B15">15</abbr>
					</abbrgrp> and its generalization Observ-OM <abbrgrp>
						<abbr bid="B16">16</abbr>
					</abbrgrp>, in the design of the VarioML specification for LSDBs.</p><p>VarioML was developed by an international collaboration of variation experts, over a series of workshops organised by the GEN2PHEN project. The design has closely followed the work of Tyrelle and King <abbrgrp>
						<abbr bid="B17">17</abbr>
					</abbrgrp> on the now defunct BSVM standard, where they proposed using semantically well-defined XML and RDF elements for LSDB data integration. VarioML is designed to serve the greater part of LSDB use cases directly, complementing formats such as GSVML <abbrgrp>
						<abbr bid="B18">18</abbr>
					</abbrgrp> and PML/DVAR <abbrgrp>
						<abbr bid="B19">19</abbr>
					</abbrgrp>, the latter being an implementation of the PaGE-OM object model <abbrgrp>
						<abbr bid="B15">15</abbr>
					</abbrgrp>. The format is kept consistent with PaGE-OM and Observ-OM by rooting XML element definitions in the same object model. By providing a structured data framework designed close to application domains, VarioML complements tabular data formats such as VCF <abbrgrp>
						<abbr bid="B20">20</abbr>
						<abbr bid="B21">21</abbr>
					</abbrgrp> and MAGE-TAB <abbrgrp>
						<abbr bid="B22">22</abbr>
					</abbrgrp>, which are designed for high-throughput and manual/spreadsheet-based data handling needs.</p><p>The collaboration&#8217;s goal was to readdress these requirements by providing simple data structure components for developing use-case-specific solutions, each defined independently using high-level schema definition languages such as Schematron <abbrgrp>
						<abbr bid="B23">23</abbr>
					</abbrgrp>. While it may sound complex, this approach provides the necessary flexibility to serve simple specifications for straightforward use cases, while simultaneously enabling development of more complex specifications, all the while maintaining a common foundation of terms, logic, and tooling that integrates both.</p>
			</sec>
			<sec>
				<st>
					<p>Ontologies: How much meaning is enough?</p>
				</st><p>Reducing the inherent complexity of annotation formats began by rooting the semantics of the VarioML standard as deeply as possible into base ontologies that underlie science and logic in general. This highlighted the need for a new harmonized model for describing scientific observations in general, providing a common language usable across all domains. For this purpose, a new object model, Observ-OM <abbrgrp>
						<abbr bid="B16">16</abbr>
					</abbrgrp>, was developed.</p><p>In Observ-OM, four basic concept classes represent all elements of any kind of observable data: Targets, Features, Protocols, and Observations. The value this model represents for the variation pipeline is hard to overstate, as it represents what is probably the maximum possible simplification of elements common to all usable scientific observations regarding variants and associated phenotypes.</p><p>Grounded in Observ-OM, VarioML had the task of adding only what is absolutely necessary to provide an intuitive and &#8216;decision-free&#8217; path for researchers and clinicians: the shortest possible path from variation data in all its current forms, to a unified representation, distributed globally. At the same time, this shortest path had to be extensible to describe non-minimalistic data as needed (Figure&#8201;<figr fid="F1">1</figr>).</p>
				<fig id="F1"><title><p>Figure 1</p></title><caption><p>Simplified conceptual UML object model used in VarioML.</p></caption><text>
   <p><b><it>Simplified conceptual UML object model used in VarioML.</it></b> The VarioML object model is derived from Observ-OM (<url>http://www.observ-om.org/wiki/ObservStart</url>), with some modifications to simplify implementation. For example, <it>Observable Feature</it> (such as <it>phenotype</it> or <it>mutation name</it>) and <it>Observed Value</it> (existence of phenotype or variation) are denormalized into a single XML element. This avoids unnecessary nesting in the XML implementation of observation elements, which often have a one-to-one relationship. Entities are composed into <b>Observations</b>, which have properties such as <it>evidence codes</it>, <it>observation protocols</it> and <it>observation time</it>. Associations between elements are drawn as single lines, where an asterisk denotes a <it>0-to-many multiplicity</it> relationship; e.g., an <it>Observation</it> can have zero or more evidence codes. All entities also inherit Annotatable properties, which are needed for database cross-references and comments. The open arrow symbol denotes inheritance, or an <it>is-a</it> relationship.</p>
</text><graphic file="1471-2105-13-254-1"/></fig><p>To achieve this, LOVD-based LSDBs <abbrgrp>
						<abbr bid="B8">8</abbr>
						<abbr bid="B24">24</abbr>
					</abbrgrp> were used as a content model, in addition to modelling done in previous work <abbrgrp>
						<abbr bid="B15">15</abbr>
						<abbr bid="B25">25</abbr>
					</abbrgrp>, and in workshops organized by the GEN2PHEN consortium. This modelling meets the requirements specified previously <abbrgrp>
						<abbr bid="B26">26</abbr>
						<abbr bid="B27">27</abbr>
					</abbrgrp>. The specification aims to be minimalistic, but has room for additions where the need arises. Despite this simplification, the underlying base schema can be too verbose for many use cases. Therefore it is important that the schema can be &#8220;narrowed&#8221;, using separate validation tools for specific cases. This has been done for the Cafe Variome pipeline <abbrgrp>
						<abbr bid="B28">28</abbr>
					</abbrgrp>, where separate Schematron <abbrgrp>
						<abbr bid="B23">23</abbr>
					</abbrgrp> rules are used for defining the content. Schematron was chosen because it allows assertions about the content of XML documents that are more complex than those possible with the RelaxNG schema language <abbrgrp>
						<abbr bid="B29">29</abbr>
					</abbrgrp>. RelaxNG, which has better tooling support for defining initial schema elements, is used for defining the base VarioML schema.</p><p>Existing ontologies, such as the Human Phenotype Ontology (HPO) <abbrgrp>
						<abbr bid="B30">30</abbr>
					</abbrgrp>, Sequence Ontology (SO) <abbrgrp>
						<abbr bid="B31">31</abbr>
					</abbrgrp>, and Variation Ontology (VariO) <abbrgrp>
						<abbr bid="B32">32</abbr>
					</abbrgrp>, can be used with VarioML. Separate SKOS (Simple Knowledge Organization System) <abbrgrp>
						<abbr bid="B33">33</abbr>
					</abbrgrp> vocabularies have been provided for elements which do not have online definitions. The semantics of VarioML elements are well defined via the Observ-OM model; content can be relatively easily transformed to RDF representations for linked data approaches <abbrgrp>
						<abbr bid="B34">34</abbr>
					</abbrgrp>. An example XSLT application is provided for converting Cafe Variome XML content to an RDF schema, derived using the Pharmacogenetics ontology <abbrgrp>
						<abbr bid="B35">35</abbr>
					</abbrgrp>.</p><p>A leading example of a submission tool that fulfills this standard is the Cafe Variome platform <abbrgrp>
						<abbr bid="B28">28</abbr>
					</abbrgrp> for announcing and advertising disease-related variations identified by diagnostic laboratories, allowing them to be shared by diverse third parties. This platform, when integrated with diagnostic software, allows push-button submission of data from tables to central databases. For these submissions, the single variant is the agreed-upon central organizing concept <abbrgrp>
						<abbr bid="B26">26</abbr>
					</abbrgrp>. Variants should be submitted in VarioML format, as seen in the example in Figure&#8201;<figr fid="F2">2</figr>.</p>
				<fig id="F2"><title><p>Figure 2</p></title><caption><p>A Cafe Variome submission of a COL1A1 variant.</p></caption><text>
   <p><b><it>A Cafe Variome submission of a COL1A1 variant.</it></b> The different VarioML elements of the data submitted are flanked by the corresponding XML tags and explained in the text.</p>
</text><graphic file="1471-2105-13-254-2"/></fig><p>To date, this functionality has been built into the Leiden Open Variation Database <abbrgrp>
						<abbr bid="B24">24</abbr>
					</abbrgrp>; GenSearch <abbrgrp>
						<abbr bid="B36">36</abbr>
					</abbrgrp>, a tool to detect and interpret variants in DNA sequences obtained by capillary sequencing; BC|SNPMax <abbrgrp>
						<abbr bid="B37">37</abbr>
					</abbrgrp>, a data management tool for genomic research; and is currently being tested with Alamut <abbrgrp>
						<abbr bid="B38">38</abbr>
					</abbrgrp>.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Results</p>
			</st>
			<sec>
				<st>
					<p>Composing the format</p>
				</st><p>VarioML is composed from an underlying set of XML elements that reuse the same structural components. Most of these elements, such as <it>phenotype</it>, <it>consequence</it>, and <it>evidence_code</it>, are so-called ontology terms, which carry the properties needed to cross-reference existing ontologies in a flexible way:</p><p indent="1">&lt;phenotype term = "Autoimmune polyglandular syndrome type 1" accession = "240300" source = "omim"/&gt;</p><p>All ontology terms can be annotated with comments and database cross-references (see Figure&#8201;<figr fid="F1">1</figr>). Elements can be extended by adding new schema elements: <it>Phenotype</it> is an example of an observation element that reuses properties from the <it>ontology term</it> element. Observation elements carry additional information related to the observation, such as <it>date</it> and <it>evidence</it> codes. For example, an observed &#8220;<it>consequence of mutation</it>&#8221; has the evidence code &#8220;<it>curator inference,</it>&#8221; as defined in the Evidence Code ontology <abbrgrp>
						<abbr bid="B39">39</abbr>
					</abbrgrp>:</p><p indent="1">&lt;consequence term = "translational frameshift" accession = "SO:0001210" source = "obo.so"&gt;</p><p indent="1">&lt;evidence_code term = "curator inference" accession="ECO:0000205" source="obo.eco"/&gt;</p><p indent="1">&lt;/consequence&gt;</p><p>
					<it>Pathogenicity</it>, on the other hand, is a special case of consequence element, having an optional <it>scope</it> attribute for indicating if the variant has been observed in an individual, family, or population. The <it>pathogenicity</it> element also has an optional <it>phenotype</it> element for specifying causal relationships explicitly, where needed:</p><p indent="1">&lt;pathogenicity term = "probably pathogenic"</p><p indent="1">uri = "<url>http://purl.org/varioml/pathogenicity/skos/1.0#p_0003</url>"</p><p indent="1">scope = "family" &gt;</p><p indent="1">&lt;phenotype term = "Osteogenesis Imperfecta, Type I"</p><p indent="1">accession="166200" source="omim"/&gt;</p><p indent="1">&lt;evidence_code term = "curator inference" accession="ECO:0000205" source="obo.eco"/&gt;</p><p indent="1">&lt;/pathogenicity&gt;</p><p>VarioML is currently used as the XML data submission and release format for the Cafe Variome announcement service. An example of this implementation is given in Figure&#8201;<figr fid="F2">2</figr>.</p><p>In the next section, we provide a brief overview of the elements seen in Figure&#8201;<figr fid="F2">2</figr>, an example of the straightforward variant descriptions that make up the bulk of LSDB submissions.</p>
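<p>The nested <it>pathogenicity</it> example above can also be assembled programmatically. The following is a minimal sketch in Python using only the standard library; it is not part of the VarioML toolkit, and the helper names <it>ontology_term</it> and <it>build_pathogenicity</it> are hypothetical. It assumes only the attribute names that appear in the XML fragments of this section (<it>term</it>, <it>accession</it>, <it>source</it>, <it>uri</it>, <it>scope</it>) and does not validate against the VarioML schema.</p>
<p indent="1">
```python
import xml.etree.ElementTree as ET


def ontology_term(tag, term, accession=None, source=None, **extra):
    """Build one ontology-term element (hypothetical helper, not a VarioML API)."""
    attrs = {"term": term}
    if accession is not None:
        attrs["accession"] = accession
    if source is not None:
        attrs["source"] = source
    # Extra keyword arguments (e.g. uri, scope) become additional attributes.
    attrs.update(extra)
    return ET.Element(tag, attrs)


def build_pathogenicity():
    """Recreate the pathogenicity fragment from the text as an element tree."""
    pathogenicity = ontology_term(
        "pathogenicity",
        "probably pathogenic",
        uri="http://purl.org/varioml/pathogenicity/skos/1.0#p_0003",
        scope="family",
    )
    # The optional phenotype child states the causal relationship explicitly.
    pathogenicity.append(
        ontology_term("phenotype", "Osteogenesis Imperfecta, Type I", "166200", "omim")
    )
    # The evidence code records how the observation was made.
    pathogenicity.append(
        ontology_term("evidence_code", "curator inference", "ECO:0000205", "obo.eco")
    )
    return pathogenicity


if __name__ == "__main__":
    # Serialize the tree to an XML string for inspection.
    print(ET.tostring(build_pathogenicity(), encoding="unicode"))
```
</p>
<p>Because every ontology term shares the same small set of attributes, a single helper is enough to build <it>phenotype</it>, <it>consequence</it>, <it>evidence_code</it>, and <it>pathogenicity</it> elements alike, which mirrors the reuse of structural components described above.</p>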
			</sec>
			<sec>
				<st>
					<p>Modular elements for variant annotation</p>
				</st><p>To align raw variation data with the standard descriptions specified in <it>&#8216;Guidelines for establishing locus specific databases&#8217;</it>
					<abbrgrp>
						<abbr bid="B26">26</abbr>
					</abbrgrp>, users simply map their data to VarioML elements. For large data sets, VarioML&#8217;s validation tools can be used to check converted data. The following is a partial list of variant data elements required and validated by VarioML, some of which are used in Figure&#8201;<figr fid="F2">2</figr>.</p>
			</sec>
			<sec>
				<st>
					<p>Source</p>
				</st><p>The <it>source</it> element stores information on the submitting sources, with attributes for submitting <it>instance</it> or <it>database</it>, <it>contact</it> details, and <it>acknowledgements</it>.</p><p>VarioML requires submitter identification using the db_xref element, and recommends that an ORCID ID <abbrgrp>
						<abbr bid="B40">40</abbr>
					</abbrgrp> be obtained for this purpose. ORCID (Open Researcher and Contributor ID) is a platform building towards automation of authorization and access infrastructure for institutions and federations <abbrgrp>
						<abbr bid="B41">41</abbr>
					</abbrgrp>. This combination of data standardization and researcher identification is a necessary component of a translational information system, in which data discovery, access, and incentives for sharing must be closely integrated, constituting a sustainable ecosystem <abbrgrp>
						<abbr bid="B42">42</abbr>
					</abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Variant</p>
				</st><p>The <it>variant</it> element can be used in a straightforward manner, bounding information reported on a variant described using the HGVS naming scheme <abbrgrp>
						<abbr bid="B43">43</abbr>
					</abbrgrp>, which has recently been formally described as a scientific sub-language in Extended Backus-Naur Form <abbrgrp>
						<abbr bid="B44">44</abbr>
					</abbrgrp>.</p><p>Importantly, <it>Variant</it> also provides recursive sub-elements, for cases where the reporting variant is composed of other variants located on the same or a sister chromosome.</p><p>
					<it>Variant</it> has an optional <it>observation target</it> attribute. For simplicity, the <it>Panel</it> element is used as a generic target for <it>variant</it>: <it>panel</it> can be used to describe any number of individuals, with or without group-specific identifiers, such as <it>family</it> or <it>population</it>.</p>
			</sec>
			<sec>
				<st>
					<p>Gene</p>
				</st><p>The <it>Gene</it> is given as a database cross-reference, where <it>source</it> indicates the database or system (e.g., HUGO), and <it>accession</it> is the gene name (e.g., AGA). HGNC symbols or IDs <abbrgrp>
						<abbr bid="B45">45</abbr>
					</abbrgrp> must be used for the primary name of a gene. Gene is a <it>database cross-reference</it> type, which is conceptually similar to <it>ontology term</it>.</p><p>When specifying sources, the MIRIAM namespace identifiers should be used <abbrgrp>
						<abbr bid="B46">46</abbr>
					</abbrgrp>. For example, the identifier for HGNC gene symbols is <it>hgnc.symbol</it>:</p><p indent="1">&lt;gene source = "hgnc.symbol" accession = "COL1A1"/&gt;</p><p>Use of database identifiers specified in the MIRIAM registry ensures consistent naming of sources <abbrgrp>
						<abbr bid="B47">47</abbr>
					</abbrgrp>. Examples of MIRIAM in use are given in Figures&#8201;<figr fid="F2">2</figr> and
					<figr fid="F3">3</figr>.</p>
				<fig id="F3"><title><p>Figure 3</p></title><caption><p>VarioML elements extending the core schema.</p></caption><text>
   <p><b><it>VarioML elements extending the core schema</it></b><b>. </b>The VarioML elements describing the effect of an AIRE variant at the transcript and protein levels are flanked by the corresponding XML tags and explained in the text.</p>
</text><graphic file="1471-2105-13-254-3"/></fig>
			</sec>
			<sec>
				<st>
					<p>Reference sequence</p>
				</st><p>Variants must always be submitted in the context of a <it>reference sequence</it>. LRGs are the preferred form for reference sequences <abbrgrp>
						<abbr bid="B48">48</abbr>
					</abbrgrp>. LRG sequences &#8216;provide a stable genomic DNA framework for reporting variations with a permanent ID and core content that never changes&#8217; <abbrgrp>
						<abbr bid="B49">49</abbr>
					</abbrgrp>.</p>
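				<p>As an illustrative sketch only, a reference sequence could be given as a database cross-reference in the same style as the <it>gene</it> example above. The element name <it>ref_seq</it> and the source identifier <it>lrg</it> are assumptions made for this illustration and should be checked against the VarioML schema and the MIRIAM registry; LRG_1 is the LRG record for COL1A1:</p><p indent="1">&lt;ref_seq source = "lrg" accession = "LRG_1"/&gt;</p>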
			</sec>
			<sec>
				<st>
					<p>HGVS name</p>
				</st><p>The <it>name</it> element gives the variant name. While <it>name</it> has an optional attribute <it>scheme</it> for indicating the naming scheme used, the primary name of a variant must be given using the HGVS naming scheme <abbrgrp>
						<abbr bid="B43">43</abbr>
						<abbr bid="B44">44</abbr>
					</abbrgrp>. To allow machine-processing, the &#8220;&gt;&#8221; character in an HGVS name must be encoded to &#8220;&amp;gt;&#8221;, as defined in the XML specification <abbrgrp>
						<abbr bid="B50">50</abbr>
					</abbrgrp>.</p>
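				<p>For instance, a substitution written as c.769C&gt;T in HGVS notation might appear as follows in the raw XML. This is a minimal sketch: the value "HGVS" for the <it>scheme</it> attribute is an assumption, and the variant is chosen only for illustration:</p><p indent="1">&lt;name scheme = "HGVS"&gt;c.769C&amp;gt;T&lt;/name&gt;</p>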
			</sec>
			<sec>
				<st>
					<p>Pathogenicity</p>
				</st><p>
					<it>Pathogenicity</it> has values such as: <it>No known pathogenicity, Probably not pathogenic, Unknown, Probably pathogenic,</it> and <it>Pathogenic.</it> These values meet the guidelines for reporting unclassified variants established in 2007 <abbrgrp>
						<abbr bid="B51">51</abbr>
					</abbrgrp>. These and alternative terms are provided in a separate SKOS vocabulary <abbrgrp>
						<abbr bid="B52">52</abbr>
					</abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Genetic origin</p>
				</st><p>The <it>genetic origin</it> of a variation can be given in its own observation element. The vocabulary defined in the VarioML SKOS vocabulary can be used <abbrgrp>
						<abbr bid="B53">53</abbr>
					</abbrgrp>.</p>
			</sec>
			<sec>
				<st>
					<p>Location</p>
				</st><p>A variant can have multiple locations defined on different reference sequences. The <it>location</it> element provides precise standardized positioning of variants, making it possible to integrate data easily with DAS services <abbrgrp>
						<abbr bid="B54">54</abbr>
					</abbrgrp> and genome browsers. In Figure&#8201;<figr fid="F2">2</figr>, the variant position is given using chromosomal coordinates.</p>
			</sec>
			<sec>
				<st>
					<p>Sharing policy</p>
				</st><p>The inclusion of the <it>sharing policy</it> element in VarioML allows setting fine-grained access control policies per individual variant. Possible values are <it>closedAccess, embargoedAccess, restrictedAccess</it> and <it>openAccess,</it> which are defined in the OpenAIRE guidelines <abbrgrp>
						<abbr bid="B55">55</abbr>
					</abbrgrp>. <it>Embargo end date</it> specifies when data can be publicly released. <it>Use permission</it> is an ontology term which can be used for citing licensing terms. The vocabulary describing these policies is taken from the OpenAIRE specification <abbrgrp>
						<abbr bid="B56">56</abbr>
					</abbrgrp>.</p>
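				<p>A sharing policy for an embargoed variant might then look like the sketch below. The element and attribute names (<it>sharing_policy</it>, <it>type</it>, <it>embargo_end_date</it>, <it>use_permission</it>) follow the naming pattern of the other examples but are assumptions, as are the date and the licence term; the VarioML schema remains the authority:</p><p indent="1">&lt;sharing_policy type = "embargoedAccess"&gt;</p><p indent="1">&lt;embargo_end_date&gt;2013-01-01&lt;/embargo_end_date&gt;</p><p indent="1">&lt;use_permission term = "CC BY 3.0"/&gt;</p><p indent="1">&lt;/sharing_policy&gt;</p>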
			</sec>
			<sec>
				<st>
					<p>Additional XML elements</p>
				</st><p>Additional elements, shown in Figure&#8201;<figr fid="F3">3</figr>, demonstrate a first tier of extensions of the core specification. The following elements are not yet implemented in applications, and may be redefined and modified later according to community needs.</p>
			</sec>
			<sec>
				<st>
					<p>Effects on RNA and AA sequences</p>
				</st><p>Effects on gene products can be given under the <it>seq_changes</it> observation element, which can store information on RNA and AA sequences in a recursive manner, using nested <it>seq_changes</it> elements. For example, a top-level <it>variant</it> element specifies a unique position on the genome, which can contain RNA level variants in a <it>seq_changes</it> sub-element, which in turn can contain corresponding AA changes in a further nested <it>seq_changes</it> sub-element. <it>Consequence</it> annotations can be assigned on these different levels, representing expert agreement about which level is causative of a given consequence.</p><p>The <it>Variant</it> element also has places for <it>aliases</it> and <it>haplotype sets</it>. <it>Aliases</it> are for legacy annotations and variations which have been named using different reference sequences. <it>Haplotypes</it> are sets of variants which are in <it>cis</it> relative to one another. These elements can be used if the main variant represents a larger sequence region containing multiple variations. Implementation of these extensions will be finalised as more experience is gained in handling such variations.</p>
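				<p>The nesting of <it>seq_changes</it> elements described in this section can be sketched as follows for an AIRE nonsense variant. This is an illustration under stated assumptions rather than a validated instance: the <it>type</it> and <it>scheme</it> attribute values and the exact placement of child elements are assumed, and Figure&#8201;<figr fid="F3">3</figr> and the schema should be consulted for the authoritative structure:</p><p indent="1">&lt;variant type = "DNA"&gt;</p><p indent="1">&lt;name scheme = "HGVS"&gt;c.769C&amp;gt;T&lt;/name&gt;</p><p indent="1">&lt;seq_changes&gt;</p><p indent="1">&lt;variant type = "RNA"&gt;</p><p indent="1">&lt;name scheme = "HGVS"&gt;r.769c&amp;gt;u&lt;/name&gt;</p><p indent="1">&lt;seq_changes&gt;</p><p indent="1">&lt;variant type = "AA"&gt;</p><p indent="1">&lt;name scheme = "HGVS"&gt;p.Arg257*&lt;/name&gt;</p><p indent="1">&lt;consequence term = "stop_gained" accession = "SO:0001587" source = "obo.so"/&gt;</p><p indent="1">&lt;/variant&gt;</p><p indent="1">&lt;/seq_changes&gt;</p><p indent="1">&lt;/variant&gt;</p><p indent="1">&lt;/seq_changes&gt;</p><p indent="1">&lt;/variant&gt;</p><p>Placing the <it>consequence</it> annotation on the innermost (AA-level) variant reflects the point made above: the annotation is attached to the level judged to be causative.</p>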
			</sec>
			<sec>
				<st>
					<p>Frequency</p>
				</st><p>Variants can have one or more frequency elements, each of which can use one of three formats: decimal number, number of cases, or categorized value. The decimal number type gives frequency as a floating point value; number of cases type gives frequency as a count; and the categorized value gives frequency as an ontology term, for categorized observations such as &#8220;exists&#8221; or &#8220;less than 100&#8221;. <it>Population, evidence ontology term, evidence code, protocol id</it> and <it>comment</it> attributes provide context for the frequency value.</p>
			</sec>
			<sec>
				<st>
					<p>Implementation</p>
				</st><p>XML remains the reference platform of choice, providing a mature specification, and advanced tools such as schema definition languages <abbrgrp>
						<abbr bid="B57">57</abbr>
					</abbrgrp>. Our use of extensible XML elements encourages implementers to collaborate closely, since extending the format requires formulating a common development strategy. However, we realized that, as the XML schema is extended, a lacuna could easily open between the data model and its implementations. Adapting applications to a changing XML schema has tended to be laborious. We reasoned that absolving application developers of the need to reinvent the wheel of data translation across formats was fundamental to easing the effort and cost of adopting the VarioML standard. We further reasoned that, in biomedicine as well as in other scientific domains, the era of big data likely makes it no longer feasible to develop formats separately from the tooling that transports them bidirectionally across the required data languages (recalling the computer science maxim, &#8216;Data = code&#8217; <abbrgrp>
						<abbr bid="B58">58</abbr>
					</abbrgrp>
					<abbrgrp>
						<abbr bid="B59">59</abbr>
					</abbrgrp>).</p><p>In practice, this meant that VarioML schema elements had to work transparently as XML, JSON, and possibly in future as RDF, <it>without incurring a cost of translation to the implementer or user</it>. Providing support for bidirectional translation to JSON was a clear way in which we could enable schema extensions to be reflected in applications much more quickly and inexpensively. To this end, VarioML comes with Java and JSON APIs (application programming interfaces), which developers can plug into their applications to handle conversion and publication.</p><p>JSON is the common data serialization format now recognized as the <it>lingua franca</it> for data exchange over the web (while we find no academic reference to this fact, it is a commonplace in the web application domain, e.g. <abbrgrp>
						<abbr bid="B60">60</abbr>
					</abbrgrp>), and has been shown to be faster and to consume fewer resources than XML <abbrgrp>
						<abbr bid="B61">61</abbr>
					</abbrgrp>. Serializing to JSON facilitates applying programming techniques to data, to create interactive content, user interface components, etc. in formats native to the web, simplifying the provision of data access <abbrgrp>
						<abbr bid="B62">62</abbr>
					</abbrgrp>. VarioML provides a JSON implementation, currently defined using JAXB <abbrgrp>
						<abbr bid="B63">63</abbr>
					</abbrgrp> and Jackson <abbrgrp>
						<abbr bid="B64">64</abbr>
					</abbrgrp> annotations. This JSON implementation is made available as a VarioML Java library <abbrgrp>
						<abbr bid="B65">65</abbr>
					</abbrgrp>, which can be used to read and write XML and JSON versions of the format. An API is auto-generated from XML instances, providing Java object representations for all VarioML objects. This API will be kept synchronized with the format, and can be used as a helper tool in Java applications. A JSON example of the source element is shown in Figure&#8201;<figr fid="F4">4</figr>.</p>
				<fig id="F4"><title><p>Figure 4</p></title><caption><p>VarioML in JSON format.</p></caption><text>
   <p><b><it>VarioML in JSON format.</it></b> XML elements are mapped to JSON objects using JAXB and Jackson annotations via VarioML's Java API. Repeating XML elements become pluralised into JSON arrays. Because JSON does not have an equivalent to XML attributes, XML attribute names can clash with inner element names. In these cases, the JSON name for the XML attribute is changed. Otherwise, mapping VarioML from XML to JSON is a direct transformation of the data structure.</p>
</text><graphic file="1471-2105-13-254-4"/></fig><p>In addition to JSON support, the VarioML Java API supports EXI, a binary compression of XML. EXI support leverages the VarioML XML schema, reducing file sizes and the time required for data processing operations by factors of three to ten <abbrgrp>
						<abbr bid="B66">66</abbr>
					</abbrgrp>. While VarioML is primarily focused on curated variant entries produced at the end of HTP pipelines, the use of EXI makes it feasible to use VarioML for earlier stages of production pipelines.</p><p>The JSON tooling provided with VarioML makes it possible for implementers to develop dynamic user interfaces with substantially less effort and cost <abbrgrp>
						<abbr bid="B67">67</abbr>
					</abbrgrp>, expanding on the possibilities demonstrated by the <it>&#8216;Web Analysis of the Variome&#8217;</it> project <abbrgrp>
						<abbr bid="B68">68</abbr>
						<abbr bid="B69">69</abbr>
					</abbrgrp>. A logical next step to contribute to the end-to-end variome pipeline would be to build a variant annotator widget, usable with different database implementations.</p>
			</sec>
		</sec>
		<sec>
			<st>
				<p>Discussion</p>
			</st><p>VarioML has been designed to immediately serve data exchange needs for LSDBs, focusing on curated variant entries produced at the end of data production pipelines. However, as an end-to-end variation pipeline comes together, for a common specification to be truly useful, it must be extensible beyond the immediate LSDB use-case. Next-generation sequencing pipelines make possible exon-capture scenarios in which tens to hundreds of patients are sequenced in one or more genes, presenting new challenges in variant calling, annotation, and data sharing <abbrgrp>
					<abbr bid="B70">70</abbr>
				</abbrgrp>. To meet the holistic data integration challenge and realize the grand variation pipeline, we need to harmonize the data models, data standards, and content specifications in use at each step, to encompass the descriptive needs of all LSDBs, standardizing their quality and accuracy, and enabling more comprehensive and high quality data curation <abbrgrp>
					<abbr bid="B26">26</abbr>
				</abbrgrp>. A number of projects have previously attempted to fill these requirements and provide a single multipurpose implementation format for LSDBs, yet have come up against difficulties at multiple levels of design and implementation <abbrgrp>
					<abbr bid="B11">11</abbr>
				</abbrgrp>. Variation data can be arbitrarily complex, making a single standard specification elusive. LSDB use cases vary a great deal in the depth of detail and structure needed for data capture. Complex standards have proven too time-consuming to implement. Solutions designed in one format cannot be readily transferred to another. Further hampering progress towards a common specification are the multiple strong motivations which labs have to keep variant data private <abbrgrp>
					<abbr bid="B71">71</abbr>
				</abbrgrp>.</p><p>As work on a unified variome progresses, genetics research faces a paradox: another attempt at a variation standard will not be enough to surmount these obstacles. No matter how comprehensive our current efforts, new standards will inevitably follow. Our understanding of genomic variation is rapidly evolving; multiple and often conflicting forms of variant annotation seem to be required to serve differing use cases, implementations, and viewpoints. Attempts to comprehensively integrate all such descriptions in a single standard can, at this point, be expected to produce either unmanageable complexity, or inaccurate oversimplification. Furthermore, looking ahead, we can be fairly certain that new discoveries and technologies will arise that cannot be presently designed for.</p><p>In designing VarioML, we therefore turned away from seeking a top-down monolithic solution, choosing instead to make a lightweight framework for composing interoperable, use-case specific &#8216;micro-standards&#8217; around the generalized concepts of <it>observation targets, ontology terms,</it> and <it>observations,</it> adapted from the Observ-OM specification <abbrgrp>
					<abbr bid="B16">16</abbr>
				</abbrgrp>. The core set of VarioML schema elements can be used as building blocks, addressing use cases from the most minimal towards the more complex, while maintaining the underlying interoperability of the data. Implementations can use as many or as few of these blocks as needed, and new elements can be added into the specification as needed. However, with this extensibility also comes the danger of fragmenting the specification into incompatible versions. While elements are utilized in an increasing number of new representations and schemas, at the same time, they are also converging all variation data into a unified variome pipeline. Yet balancing these needs for divergence and convergence is a task that cannot be planned by a committee. As next generation sequencing continues accelerating both the scale and complexity of the data produced, all producers of variation data have a stake in decreasing the gap between a general variation annotation standard, and the community it serves. Accordingly, VarioML is intended less as a &#8216;completed&#8217; specification, more as a nucleation centre around which new specifications can be developed. All variation data producers are called upon to develop this specification collaboratively.</p><p>To this end, VarioML development has been turned over to the community. The specification lives inside an open collaboration framework, tightly binding new variation reporting structures to the common schema and tooling, maintaining consistent application generation capability and backwards compatibility with earlier applications and data <abbrgrp>
					<abbr bid="B72">72</abbr>
				</abbrgrp>. We chose two forums to realize this collaboration framework: the VarioML forum at the science-centered GEN2PHEN Knowledge Center <abbrgrp>
					<abbr bid="B73">73</abbr>
				</abbrgrp>, where format details are discussed alongside immediate access to a unified catalog of LSDBs and other tools for variation data integration; and VarioML&#8217;s GitHub repository <abbrgrp>
					<abbr bid="B74">74</abbr>
				</abbrgrp>, where the schema and XML, JSON, Java, and RDF tools are available, in addition to UML documentation that clarifies the relationships between specification and implementations <abbrgrp>
					<abbr bid="B75">75</abbr>
				</abbrgrp>. Modified or new compositions using schema elements must be reported in either of these forums and discussed openly, enabling the collaborative extension of the format without breaking existing implementations.</p><p>The open-ended nature of the VarioML specification means there should continuously be elements under active redefinition and modification by the community. These features should not be implemented in applications until consensus on usage is reached. For example, the <it>Variant</it> element allows recursive sub-elements, for cases where the reporting variant is composed of other variants located on the same or a sister chromosome. Yet this and other features (see <it>Additional XML elements</it> section) are not currently implemented in applications.</p>
		</sec>
		<sec>
			<st>
				<p>Conclusions</p>
			</st><p>VarioML enables researchers, diagnostic laboratories, and clinics to improve the quality of human variation information, and to share that information with ease and clarity, and without ambiguity. VarioML resolves the inherent tendency of variation data to diverge in format and meaning through a modular design that lives in an open collaboration framework, composed of two linked community forums.</p><p>With this open collaboration framework, the variome community itself closely binds the evolution of the annotation format and its tooling to the science of the study of human mutations. For example, as new configurations and extensions of the format are developed by various implementers, they can be discussed and improved at the GEN2PHEN Knowledge Center, alongside submissions of relevant data made through Cafe Variome. As community consensus emerges, this agreement translates to changes in the schema and tooling in the common repository. At each step, the provenance of even small contributions is captured and can be used as microattributions <abbrgrp>
					<abbr bid="B42">42</abbr>
				</abbrgrp>.</p><p>For such bottom-up, self-organizing management of a common variation standard to work, teams working at critical junctions in the variation pipeline must translate a passion for the vision of the unified variome, into both implementation and development of the shared standard. To date, VarioML has been implemented in three applications (the Leiden Open Variation Database <abbrgrp>
					<abbr bid="B24">24</abbr>
				</abbrgrp>, GenSearch <abbrgrp>
					<abbr bid="B36">36</abbr>
				</abbrgrp>, and BC|SNPMax <abbrgrp>
					<abbr bid="B37">37</abbr>
				</abbrgrp>), and is currently being tested in a fourth (Alamut <abbrgrp>
					<abbr bid="B38">38</abbr>
				</abbrgrp>). In each case, VarioML is used to enable push-button submission of data through the Cafe Variome service <abbrgrp>
					<abbr bid="B28">28</abbr>
				</abbrgrp>.</p><p>With consensus on a minimal standard, implementation is the remaining bottleneck. Users, from research teams to commercial software producers, need to focus their software-related activity on those tasks in which their resource costs are proportionally smaller than the added value afforded by adopting new tools and data models. VarioML has been designed to minimize the effort required for both implementation and extension, framing the specification itself with Java and JSON APIs on the one hand, and an open collaboration framework on the other. We hope this approach proves useful throughout the variation science community, as it meets the challenge and potential of next generation sequencing, and hastens the opening of the path from discovery to delivery.</p>
		</sec>
		<sec>
			<st>
				<p>Competing interests</p>
			</st><p>The authors declare no competing interests.</p>
		</sec>
		<sec>
			<st>
				<p>Authors&#8217; contributions</p>
			</st><p>MB collated the contributions of other authors, and wrote the body of the manuscript. IFACF was a central participant in defining and refining the VarioML format, as a participant in GEN2PHEN and as one of the creators of the Leiden Open Variation Database. OL was a central participant in defining and refining the VarioML format, as a participant in GEN2PHEN and as one of the creators of the Cafe Variome data submission platform. TA participated in defining and refining the VarioML format. AAB participated in refining the VarioML format, as an implementer of the specification in the BC|SNPMax application. DA participated in refining the VarioML format, as an implementer of the specification in the GenSearch application. CB participated in defining and refining the VarioML format. MC participated in defining and refining the VarioML format. RD participated in defining and refining the VarioML format. AD participated in defining and refining the VarioML format. GP participated in defining and refining the VarioML format, as a participant in GEN2PHEN and representing the National Ethnic Mutation Databases in this activity. MS was a central participant in defining and refining the VarioML format, as a participant in GEN2PHEN and as one of the creators of Observ-OM, Pheno-OM, and the Molgenis application platform. PEMT was a central participant in defining and refining the VarioML format, as a participant in GEN2PHEN and as one of the creators of the Leiden Open Variation Database. GT was a central participant in defining and refining the VarioML format, as a participant in GEN2PHEN and as one of the creators of the ORCID researcher identification platform. MV was a central participant in defining and refining the VarioML format, as a participant in GEN2PHEN and as the creator of the Variation Ontology. 
AB was a central participant in defining and refining the VarioML format, as chair of GEN2PHEN and as one of the creators of the Cafe Variome data submission platform. JM was a central participant in defining and refining the VarioML format, as a participant in GEN2PHEN and as the managing creator of the VarioML specification. All authors participated in the design and testing of VarioML. All authors read and approved the final manuscript.</p>
		</sec>
	</bdy>
	<bm>
		<ack>
			<sec>
				<st>
					<p>Acknowledgements</p>
				</st><p>The research leading to these results has received funding from the European Community&#8217;s Seventh Framework Programme (FP7/2007-2013) under grant agreement number 200754 - the GEN2PHEN project.</p>
			</sec>
		</ack>
		<refgrp><bibl id="B1"><title><p>Improving bioinformatic pipelines for exome variant calling</p></title><aug><au><snm>Ji</snm><fnm>H</fnm></au></aug><source>Genome Medicine</source><pubdate>2012</pubdate><volume>4</volume><fpage>7</fpage></bibl><bibl id="B2"><title><p>An integrative variant analysis suite for whole exome next-generation sequencing data</p></title><aug><au><snm>Challis</snm><fnm>D</fnm></au><au><snm>Yu</snm><fnm>J</fnm></au><au><snm>Evani</snm><fnm>US</fnm></au><au><snm>Jackson</snm><fnm>AR</fnm></au><au><snm>Paithankar</snm><fnm>S</fnm></au><au><snm>Coarfa</snm><fnm>C</fnm></au><au><snm>Milosavljevic</snm><fnm>A</fnm></au><au><snm>Gibbs</snm><fnm>RA</fnm></au><au><snm>Yu</snm><fnm>FL</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2012</pubdate><volume>13</volume><fpage>1</fpage><lpage>3</lpage></bibl><bibl id="B3"><title><p>Deriving the consequences of genomic variants with the Ensembl API and SNP Effect Predictor</p></title><aug><au><snm>McLaren</snm><fnm>W</fnm></au><au><snm>Pritchard</snm><fnm>B</fnm></au><au><snm>Rios</snm><fnm>D</fnm></au><au><snm>Chen</snm><fnm>YA</fnm></au><au><snm>Flicek</snm><fnm>P</fnm></au><au><snm>Cunningham</snm><fnm>F</fnm></au></aug><source>Bioinformatics</source><pubdate>2010</pubdate><volume>26</volume><fpage>2069</fpage><lpage>2070</lpage></bibl><bibl id="B4"><title><p>On not reinventing the wheel</p></title><aug><au><cnm>Editors</cnm></au></aug><source>Nat Genet</source><pubdate>2012</pubdate><volume>44</volume><fpage>233</fpage></bibl><bibl id="B5"><title><p>Resources</p></title><aug><au><cnm>GEN2PHEN Knowledge Center</cnm></au></aug><note>
   <url>http://www.gen2phen.org/resources</url>
</note></bibl><bibl id="B6"><title><p>Recommendations for locus-specific databases and their curation</p></title><aug><au><snm>Cotton</snm><fnm>RGH</fnm></au><au><snm>Auerbach</snm><fnm>AD</fnm></au><au><snm>Beckmann</snm><fnm>JS</fnm></au><au><snm>Blumenfeld</snm><fnm>OO</fnm></au><au><snm>Brookes</snm><fnm>AJ</fnm></au><au><snm>Brown</snm><fnm>AF</fnm></au><au><snm>Carrera</snm><fnm>P</fnm></au><au><snm>Cox</snm><fnm>DW</fnm></au><au><snm>Gottlieb</snm><fnm>B</fnm></au><au><snm>Greenblatt</snm><fnm>MS</fnm></au><etal/></aug><source>Hum Mutat</source><pubdate>2008</pubdate><volume>29</volume><fpage>2</fpage><lpage>5</lpage></bibl><bibl id="B7"><title><p>GEN2PHEN LSDB Listing</p></title><note>
   <url>http://www.gen2phen.org/data/lsdbs</url>
</note></bibl><bibl id="B8"><title><p>Sharing Data between LSDBs and Central Repositories</p></title><aug><au><snm>den Dunnen</snm><fnm>JT</fnm></au><au><snm>Sijmons</snm><fnm>RH</fnm></au><au><snm>Andersen</snm><fnm>PS</fnm></au><au><snm>Vihinen</snm><fnm>M</fnm></au><au><snm>Beckmann</snm><fnm>JS</fnm></au><au><snm>Rossetti</snm><fnm>S</fnm></au><au><snm>Talbot</snm><fnm>CC</fnm></au><au><snm>Hardison</snm><fnm>RC</fnm></au><au><snm>Povey</snm><fnm>S</fnm></au><au><snm>Cotton</snm><fnm>RGH</fnm></au></aug><source>Hum Mutat</source><pubdate>2009</pubdate><volume>30</volume><fpage>493</fpage><lpage>495</lpage></bibl><bibl id="B9"><title><p>Locus-specific database domain and data content analysis: evolution and content maturation toward clinical use</p></title><aug><au><snm>Mitropoulou</snm><fnm>C</fnm></au><au><snm>Webb</snm><fnm>AJ</fnm></au><au><snm>Mitropoulos</snm><fnm>K</fnm></au><au><snm>Brookes</snm><fnm>AJ</fnm></au><au><snm>Patrinos</snm><fnm>GP</fnm></au></aug><source>Hum Mutat</source><pubdate>2010</pubdate><volume>31</volume><fpage>1109</fpage><lpage>1116</lpage></bibl><bibl id="B10"><aug><au><snm>Kuntzer</snm><fnm>J</fnm></au><au><snm>Eggle</snm><fnm>D</fnm></au><au><snm>Klostermann</snm><fnm>S</fnm></au><au><snm>Burtscher</snm><fnm>H</fnm></au></aug><source>Human variation databases</source><publisher>Database, Oxford)</publisher><pubdate>2010</pubdate><note>2010:baq015</note></bibl><bibl id="B11"><title><p>DNA, diseases and databases: disastrously deficient</p></title><aug><au><snm>Patrinos</snm><fnm>GP</fnm></au><au><snm>Brookes</snm><fnm>AJ</fnm></au></aug><source>Trends Genet</source><pubdate>2005</pubdate><volume>21</volume><fpage>333</fpage><lpage>338</lpage></bibl><bibl id="B12"><title><p>Curating Gene Variant Databases (LSDBs): Toward a Universal Standard</p></title><aug><au><snm>Celli</snm><fnm>J</fnm></au><au><snm>Dalgleish</snm><fnm>R</fnm></au><au><snm>Vihinen</snm><fnm>M</fnm></au><au><snm>Taschner</snm><fnm>PEM</fnm></au><au><snm>den 
Dunnen</snm><fnm>JT</fnm></au></aug><source>Hum Mutat</source><pubdate>2012</pubdate><volume>33</volume><fpage>291</fpage><lpage>297</lpage></bibl><bibl id="B13"><title><p>Analysis of next-generation genomic data in cancer: accomplishments and challenges</p></title><aug><au><snm>Ding</snm><fnm>L</fnm></au><au><snm>Wendl</snm><fnm>MC</fnm></au><au><snm>Koboldt</snm><fnm>DC</fnm></au><au><snm>Mardis</snm><fnm>ER</fnm></au></aug><source>Hum Mol Genet</source><pubdate>2010</pubdate><volume>19</volume><fpage>R188</fpage><lpage>196</lpage></bibl><bibl id="B14"><title><p>Planning the human variome project: the Spain report</p></title><aug><au><snm>Kaput</snm><fnm>J</fnm></au><au><snm>Cotton</snm><fnm>RG</fnm></au><au><snm>Hardman</snm><fnm>L</fnm></au><au><snm>Watson</snm><fnm>M</fnm></au><au><snm>Al Aqeel</snm><fnm>AI</fnm></au><au><snm>Al-Aama</snm><fnm>JY</fnm></au><au><snm>Al-Mulla</snm><fnm>JY</fnm></au><au><snm>Alonso</snm><fnm>S</fnm></au><au><snm>Aretz</snm><fnm>S</fnm></au><au><snm>Auerbach</snm><fnm>AD</fnm></au><etal/></aug><source>Hum Mutat</source><pubdate>2009</pubdate><volume>30</volume><fpage>496</fpage><lpage>510</lpage></bibl><bibl id="B15"><title><p>The Phenotype and Genotype Experiment Object Model (PaGE-OM): A Robust Data Structure for Information Related to DNA Variation</p></title><aug><au><snm>Brookes</snm><fnm>AJ</fnm></au><au><snm>Lehvaslaiho</snm><fnm>H</fnm></au><au><snm>Muilu</snm><fnm>J</fnm></au><au><snm>Shigemoto</snm><fnm>Y</fnm></au><au><snm>Oroguchi</snm><fnm>T</fnm></au><au><snm>Tomiki</snm><fnm>T</fnm></au><au><snm>Mukaiyama</snm><fnm>A</fnm></au><au><snm>Konagaya</snm><fnm>A</fnm></au><au><snm>Kojima</snm><fnm>T</fnm></au><au><snm>Inoue</snm><fnm>I</fnm></au><etal/></aug><source>Hum Mutat</source><pubdate>2009</pubdate><volume>30</volume><fpage>968</fpage><lpage>977</lpage></bibl><bibl id="B16"><title><p>Observ-OM and Observ-TAB: Universal syntax solutions for the integration, search, and exchange of phenotype and genotype 
information</p></title><aug><au><snm>Adamusiak</snm><fnm>T</fnm></au><au><snm>Parkinson</snm><fnm>H</fnm></au><au><snm>Muilu</snm><fnm>J</fnm></au><au><snm>Roos</snm><fnm>E</fnm></au><au><snm>van der Velde</snm><fnm>KJ</fnm></au><au><snm>Thorisson</snm><fnm>GA</fnm></au><au><snm>Byrne</snm><fnm>M</fnm></au><au><snm>Pang</snm><fnm>C</fnm></au><au><snm>Gollapudi</snm><fnm>S</fnm></au><au><snm>Ferretti</snm><fnm>V</fnm></au><etal/></aug><source>Hum Mutat</source><pubdate>2012</pubdate><volume>33</volume><issue>5</issue><fpage>867</fpage><lpage>873</lpage></bibl><bibl id="B17"><aug><au><snm>Tyrelle</snm><fnm>G</fnm></au><au><snm>King</snm><fnm>GC</fnm></au></aug><source>A platform for the description, distribution and analysis of genetic polymorphism data</source><publisher>Proceedings of the First Asia-Pacific Bioinformatics Conference on Bioinformatics</publisher><pubdate>2003</pubdate></bibl><bibl id="B18"><title><p>Genomic Sequence Variation Markup Language (GSVML)</p></title><aug><au><snm>Nakaya</snm><fnm>J</fnm></au><au><snm>Kimura</snm><fnm>M</fnm></au><au><snm>Hiroi</snm><fnm>K</fnm></au><au><snm>Ido</snm><fnm>K</fnm></au><au><snm>Yang</snm><fnm>W</fnm></au><au><snm>Tanaka</snm><fnm>H</fnm></au></aug><source>Int J Med Inform</source><pubdate>2010</pubdate><volume>79</volume><fpage>130</fpage><lpage>142</lpage></bibl><bibl id="B19"><title><p>PAGE-OM Markup Language</p></title><note>
   <url>http://www.openpml.org/</url>
</note></bibl><bibl id="B20"><source>VCF (Variant Call Format) Specification</source><note>
   <url>http://vcftools.sourceforge.net/specs.html</url>
</note></bibl><bibl id="B21"><title><p>The variant call format and VCFtools</p></title><aug><au><snm>Danecek</snm><fnm>P</fnm></au><au><snm>Auton</snm><fnm>A</fnm></au><au><snm>Abecasis</snm><fnm>G</fnm></au><au><snm>Albers</snm><fnm>CA</fnm></au><au><snm>Banks</snm><fnm>E</fnm></au><au><snm>DePristo</snm><fnm>MA</fnm></au><au><snm>Handsaker</snm><fnm>RE</fnm></au><au><snm>Lunter</snm><fnm>G</fnm></au><au><snm>Marth</snm><fnm>GT</fnm></au><au><snm>Sherry</snm><fnm>ST</fnm></au><etal/></aug><source>Bioinformatics</source><pubdate>2011</pubdate><volume>27</volume><fpage>2156</fpage><lpage>2158</lpage></bibl><bibl id="B22"><title><p>A simple spreadsheet-based, MIAME-supportive format for microarray data: MAGE-TAB</p></title><aug><au><snm>Rayner</snm><fnm>TF</fnm></au><au><snm>Rocca-Serra</snm><fnm>P</fnm></au><au><snm>Spellman</snm><fnm>PT</fnm></au><au><snm>Causton</snm><fnm>HC</fnm></au><au><snm>Farne</snm><fnm>A</fnm></au><au><snm>Holloway</snm><fnm>E</fnm></au><au><snm>Irizarry</snm><fnm>RA</fnm></au><au><snm>Liu</snm><fnm>J</fnm></au><au><snm>Maier</snm><fnm>DS</fnm></au><au><snm>Miller</snm><fnm>M</fnm></au><etal/></aug><source>BMC Bioinformatics</source><pubdate>2006</pubdate><volume>7</volume><fpage>489</fpage><lpage>489</lpage></bibl><bibl id="B23"><aug><au><snm>Jelliffe</snm><fnm>R</fnm></au></aug><source>The Schematron Assertion Language</source><note>
   <url>http://www.ascc.net/xml/resource/schematron/Schematron2000.html</url>
</note></bibl><bibl id="B24"><title><p>LOVD v.2.0: the next generation in gene variant databases</p></title><aug><au><snm>Fokkema</snm><fnm>IF</fnm></au><au><snm>Taschner</snm><fnm>PE</fnm></au><au><snm>Schaafsma</snm><fnm>GC</fnm></au><au><snm>Celli</snm><fnm>J</fnm></au><au><snm>Laros</snm><fnm>JF</fnm></au><au><snm>den Dunnen</snm><fnm>JT</fnm></au></aug><source>Hum Mutat</source><pubdate>2011</pubdate><volume>32</volume><issue>5</issue><fpage>557</fpage><lpage>563</lpage></bibl><bibl id="B25"><title><p>XGAP: a uniform and extensible data model and software platform for genotype and phenotype experiments</p></title><aug><au><snm>Swertz</snm><fnm>MA</fnm></au><au><snm>Velde</snm><fnm>KJ</fnm></au><au><snm>Tesson</snm><fnm>BM</fnm></au><au><snm>Scheltema</snm><fnm>RA</fnm></au><au><snm>Arends</snm><fnm>D</fnm></au><au><snm>Vera</snm><fnm>G</fnm></au><au><snm>Alberts</snm><fnm>R</fnm></au><au><snm>Dijkstra</snm><fnm>M</fnm></au><au><snm>Schofield</snm><fnm>P</fnm></au><au><snm>Schughart</snm><fnm>K</fnm></au><etal/></aug><source>Genome Biol</source><pubdate>2010</pubdate><volume>11</volume><fpage>R27</fpage></bibl><bibl id="B26"><title><p>Guidelines for establishing locus specific databases</p></title><aug><au><snm>Vihinen</snm><fnm>M</fnm></au><au><snm>den Dunnen</snm><fnm>JT</fnm></au><au><snm>Dalgleish</snm><fnm>R</fnm></au><au><snm>Cotton</snm><fnm>RGH</fnm></au></aug><source>Hum Mutat</source><pubdate>2012</pubdate><volume>33</volume><fpage>298</fpage><lpage>305</lpage></bibl><bibl id="B27"><title><p>How to catch all those mutations&#8211;the report of the third Human Variome Project Meeting, UNESCO Paris, May 
2010</p></title><aug><au><snm>Kohonen-Corish</snm><fnm>MRJ</fnm></au><au><snm>Al-Aama</snm><fnm>JY</fnm></au><au><snm>Auerbach</snm><fnm>AD</fnm></au><au><snm>Axton</snm><fnm>M</fnm></au><au><snm>Barash</snm><fnm>CI</fnm></au><au><snm>Bernstein</snm><fnm>I</fnm></au><au><snm>Beroud</snm><fnm>C</fnm></au><au><snm>Burn</snm><fnm>J</fnm></au><au><snm>Cunningham</snm><fnm>F</fnm></au><au><snm>Cutting</snm><fnm>GR</fnm></au><etal/></aug><source>Hum Mutat</source><pubdate>2010</pubdate><volume>31</volume><fpage>1374</fpage><lpage>1381</lpage></bibl><bibl id="B28"><source>Cafe Variome</source><note>
   <url>http://cafevariome.org/</url>
</note></bibl><bibl id="B29"><source>RELAX NG Home Page</source><note>
   <url>http://relaxng.org</url>
</note></bibl><bibl id="B30"><title><p>The human phenotype ontology</p></title><aug><au><snm>Robinson</snm><fnm>PN</fnm></au><au><snm>Mundlos</snm><fnm>S</fnm></au></aug><source>Clin Genet</source><pubdate>2010</pubdate><volume>77</volume><fpage>525</fpage><lpage>534</lpage></bibl><bibl id="B31"><title><p>The Sequence Ontology: a tool for the unification of genome annotations</p></title><aug><au><snm>Eilbeck</snm><fnm>K</fnm></au><au><snm>Lewis</snm><fnm>SE</fnm></au><au><snm>Mungall</snm><fnm>CJ</fnm></au><au><snm>Yandell</snm><fnm>M</fnm></au><au><snm>Stein</snm><fnm>L</fnm></au><au><snm>Durbin</snm><fnm>R</fnm></au><au><snm>Ashburner</snm><fnm>M</fnm></au></aug><source>Genome Biol</source><pubdate>2005</pubdate><volume>6</volume><fpage>R44:1</fpage><lpage>12</lpage></bibl><bibl id="B32"><aug><au><snm>Vihinen</snm><fnm>M</fnm></au></aug><source>Variation Ontology</source><note>
   <url>http://variationontology.org/</url>
</note></bibl><bibl id="B33"><title><p>SKOS (Simple Knowledge Organization System) Home Page</p></title><note>
   <url>http://www.w3.org/2004/02/skos/</url>
</note></bibl><bibl id="B34"><title><p>D2RQ - Treating Non-RDF Databases as Virtual RDF Graphs</p></title><aug><au><snm>Bizer</snm><fnm>C</fnm></au><au><snm>Seaborne</snm><fnm>A</fnm></au></aug><source>ISWC2004</source><pubdate>2004</pubdate></bibl><bibl id="B35"><title><p>Towards pharmacogenomics knowledge discovery with the semantic web</p></title><aug><au><snm>Dumontier</snm><fnm>M</fnm></au><au><snm>Villanueva-Rosales</snm><fnm>N</fnm></au></aug><source>Brief Bioinform</source><pubdate>2009</pubdate><volume>10</volume><fpage>153</fpage><lpage>163</lpage></bibl><bibl id="B36"><source>Phenosystems</source><note>
   <url>http://www.phenosystems.com</url>
</note></bibl><bibl id="B37"><source>BC Platforms - Genotype Data Management</source><note>
   <url>http://www.bcplatforms.com/Solutions/Genotype-Data-Management.html</url>
</note></bibl><bibl id="B38"><aug><au><cnm>Interactive Biosoftware</cnm></au></aug><note>
   <url>http://www.interactive-biosoftware.com</url>
</note></bibl><bibl id="B39"><aug><au><cnm>Evidence Ontology</cnm></au></aug><note>
   <url>http://code.google.com/p/evidenceontology/</url>
</note></bibl><bibl id="B40"><aug><au><cnm>Open Researcher and Contributor ID (ORCID)</cnm></au></aug><note>
   <url>http://orcid.org/</url>
</note></bibl><bibl id="B41"><title><p>ORCID: Unique Identifiers for Authors and Contributors</p></title><aug><au><snm>Fenner</snm><fnm>M</fnm></au></aug><source>Information Standards Quarterly</source><pubdate>2011</pubdate><volume>23</volume><fpage>10</fpage><lpage>13</lpage></bibl><bibl id="B42"><title><p>Systematic documentation and analysis of human genetic variation in hemoglobinopathies using the microattribution approach</p></title><aug><au><snm>Giardine</snm><fnm>B</fnm></au><au><snm>Borg</snm><fnm>J</fnm></au><au><snm>Higgs</snm><fnm>DR</fnm></au><au><snm>Peterson</snm><fnm>KR</fnm></au><au><snm>Philipsen</snm><fnm>S</fnm></au><au><snm>Maglott</snm><fnm>D</fnm></au><au><snm>Singleton</snm><fnm>BK</fnm></au><au><snm>Anstee</snm><fnm>DJ</fnm></au><au><snm>Basak</snm><fnm>AN</fnm></au><au><snm>Clark</snm><fnm>B</fnm></au><etal/></aug><source>Nat Genet</source><pubdate>2011</pubdate><volume>43</volume><fpage>295</fpage><lpage>301</lpage></bibl><bibl id="B43"><aug><au><snm>den Dunnen</snm><fnm>J</fnm></au></aug><source>Nomenclature for the description of sequence variants</source><note>
   <url>http://www.hgvs.org/mutnomen/</url>
</note></bibl><bibl id="B44"><title><p>A formalized description of the standard human variant nomenclature in Extended Backus-Naur Form</p></title><aug><au><snm>Laros</snm><fnm>JF</fnm></au><au><snm>Blavier</snm><fnm>A</fnm></au><au><snm>den Dunnen</snm><fnm>JT</fnm></au><au><snm>Taschner</snm><fnm>PE</fnm></au></aug><source>BMC Bioinformatics</source><pubdate>2011</pubdate><volume>12</volume><issue>Suppl 4</issue><fpage>S5</fpage></bibl><bibl id="B45"><aug><au><cnm>HGNC Searches</cnm></au></aug><note>
   <url>http://www.genenames.org/hgnc-searches</url>
</note></bibl><bibl id="B46"><title><p>Identifiers.org and MIRIAM Registry: community resources to provide persistent identification</p></title><aug><au><snm>Juty</snm><fnm>N</fnm></au><au><snm>Le Nov&#232;re</snm><fnm>N</fnm></au><au><snm>Laibe</snm><fnm>C</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2012</pubdate><volume>40</volume><fpage>580</fpage><lpage>586</lpage></bibl><bibl id="B47"><aug><au><cnm>MIRIAM Registry</cnm></au></aug><note>
   <url>http://www.ebi.ac.uk/miriam/main/</url>
</note></bibl><bibl id="B48"><aug><au><cnm>Locus Reference Genomic (LRG) sequences</cnm></au></aug><note>
   <url>http://www.lrg-sequence.org</url>
</note></bibl><bibl id="B49"><title><p>Locus Reference Genomic sequences: an improved basis for describing human DNA variants</p></title><aug><au><snm>Dalgleish</snm><fnm>R</fnm></au><au><snm>Flicek</snm><fnm>P</fnm></au><au><snm>Cunningham</snm><fnm>F</fnm></au><au><snm>Astashyn</snm><fnm>A</fnm></au><au><snm>Tully</snm><fnm>RE</fnm></au><au><snm>Proctor</snm><fnm>G</fnm></au><au><snm>Chen</snm><fnm>Y</fnm></au><au><snm>McLaren</snm><fnm>WM</fnm></au><au><snm>Larsson</snm><fnm>P</fnm></au><au><snm>Vaughan</snm><fnm>BW</fnm></au><etal/></aug><source>Genome Med</source><pubdate>2010</pubdate><volume>2</volume><fpage>24</fpage><lpage>24</lpage></bibl><bibl id="B50"><aug><au><cnm>Cafe Variome Minimum Information Specification</cnm></au></aug><source>Variant name element</source><note>
   <url>http://varioml.org/cafevariome_minspec.htm#variant_name</url>
</note></bibl><bibl id="B51"><aug><au><snm>Bell</snm><fnm>J</fnm></au><au><snm>Bodmer</snm><fnm>D</fnm></au><au><snm>Sistermans</snm><fnm>E</fnm></au><au><snm>Ramsden</snm><fnm>SC</fnm></au></aug><source>Practice guidelines for the interpretation and reporting of unclassified variants (UVs) in clinical molecular genetics. Guidelines ratified by the UK CMGS (11th January, 2008) and the VGKL (22nd October, 2007)</source><pubdate>2007</pubdate><note>A CMGS e-publication [<url>http://www.cmgs.org/BPGs/Best_Practice_Guidelines.htm</url>]</note></bibl><bibl id="B52"><aug><au><cnm>SKOS Pathogenicity</cnm></au></aug><source>Turtle RDF file</source><note>
   <url>http://purl.org/varioml/pathogenicity/skos/1.0/</url>
</note></bibl><bibl id="B53"><aug><au><cnm>SKOS Genetic origin</cnm></au></aug><source>Turtle RDF file</source><note>
   <url>http://purl.org/varioml/genetic_origin/skos/1.0/</url>
</note></bibl><bibl id="B54"><title><p>Integrating biological data--the Distributed Annotation System</p></title><aug><au><snm>Jenkinson</snm><fnm>AM</fnm></au><au><snm>Albrecht</snm><fnm>M</fnm></au><au><snm>Birney</snm><fnm>E</fnm></au><au><snm>Blankenburg</snm><fnm>H</fnm></au><au><snm>Down</snm><fnm>T</fnm></au><au><snm>Finn</snm><fnm>RD</fnm></au><au><snm>Hermjakob</snm><fnm>H</fnm></au><au><snm>Hubbard</snm><fnm>TJP</fnm></au><au><snm>Jimenez</snm><fnm>RC</fnm></au><au><snm>Jones</snm><fnm>P</fnm></au><etal/></aug><source>BMC Bioinformatics</source><pubdate>2008</pubdate><volume>9</volume><issue>Suppl 8</issue><fpage>S3:1</fpage><lpage>7</lpage></bibl><bibl id="B55"><source>OpenAIRE Guidelines 1.1 (PDF)</source><note>
   <url>http://www.openaire.eu/en/component/attachments/download/79</url>
</note></bibl><bibl id="B56"><source>OpenAIRE Access Rights</source><note>
   <url>http://wiki.surf.nl/display/standards/info-eu-repo/#info-eu-repo-AccessRights</url>
</note></bibl><bibl id="B57"><title><p>Revolutionary impact of XML on biomedical information interoperability</p></title><aug><au><snm>Shabo</snm><fnm>A</fnm></au><au><snm>Rabinovici-Cohen</snm><fnm>S</fnm></au><au><snm>Vortman</snm><fnm>P</fnm></au></aug><source>Ibm Syst J</source><pubdate>2006</pubdate><volume>45</volume><fpage>361</fpage><lpage>372</lpage></bibl><bibl id="B58"><source>Data is Code</source><note>
   <url>http://wiki.tcl.tk/17869</url>
</note></bibl><bibl id="B59"><aug><au><snm>Abelson</snm><fnm>H</fnm></au><au><snm>Sussman</snm><fnm>GJ</fnm></au><au><snm>Sussman</snm><fnm>J</fnm></au></aug><source>Structure and Interpretation of Computer Programs</source><pubdate>1984</pubdate></bibl><bibl id="B60"><aug><au><cnm>JSON, data and the REST</cnm></au></aug><note>
   <url>http://webofdata.wordpress.com/2011/08/07/json-data-and-the-rest/</url>
</note></bibl><bibl id="B61"><title><p>Comparison of JSON and XML Data Interchange Formats: A Case Study</p></title><aug><au><snm>Nurseitov</snm><fnm>N</fnm></au><au><snm>Paulson</snm><fnm>M</fnm></au><au><snm>Reynolds</snm><fnm>R</fnm></au><au><snm>Izurieta</snm><fnm>C</fnm></au></aug><source>Scenario</source><pubdate>2009</pubdate><volume>59715</volume><fpage>157</fpage><lpage>162</lpage></bibl><bibl id="B62"><aug><au><cnm>JavaScript Object Notation (JSON)</cnm></au></aug><note>
   <url>http://en.wikipedia.org/wiki/JSON</url>
</note></bibl><bibl id="B63"><source>Project JAX-B</source><note>
   <url>http://jaxb.java.net/</url>
</note></bibl><bibl id="B64"><aug><au><cnm>Jackson JSON Processor Home</cnm></au></aug><note>
   <url>http://wiki.fasterxml.com/JacksonHome</url>
</note></bibl><bibl id="B65"><aug><au><cnm>VarioML Java Library</cnm></au></aug><note>
   <url>https://github.com/VarioML/VarioML/tree/master/src/java/varioml</url>
</note></bibl><bibl id="B66"><source>Efficient XML Interchange Working Group</source><note>
   <url>http://www.w3.org/XML/EXI/</url>
</note></bibl><bibl id="B67"><title><p>Semantic-JSON: a lightweight web service interface for Semantic Web contents integrating multiple life science databases</p></title><aug><au><snm>Kobayashi</snm><fnm>N</fnm></au><au><snm>Ishii</snm><fnm>M</fnm></au><au><snm>Takahashi</snm><fnm>S</fnm></au><au><snm>Mochizuki</snm><fnm>Y</fnm></au><au><snm>Matsushima</snm><fnm>A</fnm></au><au><snm>Toyoda</snm><fnm>T</fnm></au></aug><source>Nucleic Acids Res</source><pubdate>2011</pubdate><volume>39</volume><fpage>533</fpage><lpage>540</lpage></bibl><bibl id="B68"><source>Web Analysis of the Variome</source><note>
   <url>http://bioinformatics.ua.pt/WAVe/</url>
</note></bibl><bibl id="B69"><title><p>WAVe: web analysis of the variome</p></title><aug><au><snm>Lopes</snm><fnm>P</fnm></au><au><snm>Dalgleish</snm><fnm>R</fnm></au><au><snm>Oliveira</snm><fnm>JL</fnm></au></aug><source>Hum Mutat</source><pubdate>2011</pubdate><volume>32</volume><fpage>729</fpage><lpage>734</lpage></bibl><bibl id="B70"><title><p>CNVs from exome sequencing</p></title><aug><au><snm>Mak</snm><fnm>C</fnm></au></aug><source>Nat Biotech</source><pubdate>2012</pubdate><volume>30</volume><fpage>626</fpage><lpage>626</lpage></bibl><bibl id="B71"><title><p>When Scientists Don&#8217;t Share: Is Secrecy a Necessary Evil?</p></title><aug><au><snm>Benowitz</snm><fnm>S</fnm></au></aug><source>JNCI</source><pubdate>2002</pubdate><volume>94</volume><issue>10</issue><fpage>712</fpage><lpage>713</lpage></bibl><bibl id="B72"><title><p>Research issues in database schema evolution: the road not taken</p></title><aug><au><snm>Ram</snm><fnm>S</fnm></au><au><snm>Shankaranarayanan</snm><fnm>G</fnm></au></aug><source>Boston University School of Management, Department of Information Systems, Working Paper. #2003-15</source><pubdate>2003</pubdate></bibl><bibl id="B73"><source>VarioML User and Developer Group</source><note>
   <url>http://www.gen2phen.org/groups/varioml</url>
</note></bibl><bibl id="B74"><source>VarioML Repository</source><note>
   <url>https://github.com/VarioML/VarioML</url>
</note></bibl><bibl id="B75"><source>VarioML Simplified UML Model</source><note>
   <url>https://raw.github.com/VarioML/VarioML/master/xml/lsdb_main/uml/varioml.jpg</url>
</note></bibl></refgrp>
	</bm>
</art>