XML DTD reference
- <!ELEMENT>
- this tag is used to define an XML element type name and its permissible sub-elements;
- the name of an element must be a *legal* XML name:
- Unicode letters and digits;
- four punctuation marks: ".", "-", "_", ":";
- colons ":" should only be used in an XML name as a namespace delimiter;
- the first character of an XML name must be a Unicode letter, a colon (":"), or an underscore ("_");
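The naming rules above can be sketched as a small check. This is a simplified approximation, not the exact Name production from the XML specification: it relies on Python's `str.isalpha` and `str.isalnum` as stand-ins for "Unicode letters and digits", and the function name `is_xml_name` is made up for this illustration.

```python
def is_xml_name(name: str) -> bool:
    """Approximate check of the XML name rules listed above (simplified)."""
    if not name:
        return False
    first, rest = name[0], name[1:]
    # First character: a letter, a colon, or an underscore.
    if not (first.isalpha() or first in "_:"):
        return False
    # Remaining characters: letters, digits, or one of ". - _ :".
    return all(ch.isalnum() or ch in ".-_:" for ch in rest)


print(is_xml_name("elementName"))  # a legal name
print(is_xml_name("1abc"))         # starts with a digit, so not legal
```

A name such as "a:b" passes this check, but the colon should still be reserved for namespace prefixes as noted above.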
- sample usage:
- this element type can contain any element declared in the DTD, as well as text
<!ELEMENT elementName ANY>
- this element cannot have content, but it can have attributes
<!ELEMENT elementName EMPTY>
- this element can contain only text (it cannot have any child elements)
<!ELEMENT elementName (#PCDATA)>
- this element can have children but it cannot contain text (with the exception of whitespace)
- element's children are specified using a sequence list thus they must appear in the specified order
<!ELEMENT elementName (child1, child2)>
- element's children are specified using a mutually exclusive choice list
<!ELEMENT elementName (child1 | child2)>
- mixed content model: this element can contain both child elements and text, but you cannot constrain the order or
the number of its children
<!ELEMENT elementName2 (#PCDATA | Grade | elementName)*>
- Cardinality operators:
- ? - "0 or 1"
- * - "0 to n"
- + - "1 to n"
- Cardinality operator applied to each element type:
For choice lists
- "?" at most one element from the choice list may appear; it is also legal for no element to appear
<!ELEMENT elementName (child1 | child2)?>
- "*" any of the elements from the choice list can appear in any order and in any number
<!ELEMENT elementName (child1 | child2)*>
- "+" one or more elements from the choice list must appear, in any order and in any number
<!ELEMENT elementName (child1 | child2)+>
For sequence lists
- "?" the complete sequence can appear 0 or 1 times; its elements cannot appear individually or out of order
<!ELEMENT elementName (child1, child2)?>
- "*" the complete sequence can appear 0 to n times; its elements cannot appear individually or out of order
<!ELEMENT elementName (child1, child2)*>
- "+" the complete sequence must appear at least once; its elements cannot appear individually or out of order
<!ELEMENT elementName (child1, child2)+>
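To make the difference between sequence lists, choice lists, and the cardinality operators concrete, here is a minimal sketch that translates a content model into a regular expression and matches it against the ordered list of child element names. It is only an illustration under simplifying assumptions: it handles plain element names, ",", "|", parentheses and the "?", "*", "+" operators, and it does not handle #PCDATA, EMPTY or ANY. The helper names `model_to_regex` and `is_valid` are hypothetical.

```python
import re

# Simplified element-name pattern (no colons, ASCII first character).
NAME = re.compile(r"[A-Za-z_][\w.\-]*")


def model_to_regex(model: str) -> re.Pattern:
    """Translate a DTD content model into a regex over 'name;' tokens."""
    compact = re.sub(r"\s+", "", model)        # "(child1,child2)*"
    compact = compact.replace("(", "(?:")      # use non-capturing groups
    # Every element name becomes the token "name;".
    compact = NAME.sub(lambda m: "(?:" + re.escape(m.group(0)) + ";)", compact)
    # A sequence is plain concatenation, so the commas can be dropped.
    return re.compile(compact.replace(",", ""))


def is_valid(children: list, model: str) -> bool:
    """Check an ordered list of child element names against a content model."""
    tokens = "".join(name + ";" for name in children)
    return model_to_regex(model).fullmatch(tokens) is not None


print(is_valid(["child1", "child2"], "(child1, child2)"))   # order respected
print(is_valid(["child2", "child1"], "(child1, child2)"))   # wrong order
print(is_valid(["child2", "child1"], "(child1 | child2)*")) # any order, any number
```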
- <!ENTITY>
- this tag is used to define replaceable content;
- allows XML documents to reference parsed entities and unparsed external entities; allows DTDs to reference parsed entities of their own (this kind of entity is called a *parameter* entity);
- a parsed entity is declared in the DTD, in either the internal subset (internal means in the XML file itself) or the external subset (in a separate DTD file);
- if a parsed entity *is* referenced, its replacement content must be well-formed XML;
- sample usage:
- internal parsed entity
<!ENTITY name "replacement_text">
- external parsed entity
<!ENTITY name SYSTEM "location">
<!ENTITY name PUBLIC "identifier" "location">
Note:
If the external parsed entity is not encoded in UTF-8 or UTF-16, it must begin with a declaration on its first line that informs the parser which encoding is used:
<?xml version="1.x" encoding="Big5"?>
- external unparsed entity; such an entity is always external;
an unparsed entity is always associated with a notation:
<!ENTITY name SYSTEM "location" NDATA notation_type>
<!ENTITY name PUBLIC "ident" "loc" NDATA notation_type>
- the "notation_type" must match a name in a <!NOTATION> declaration;
- the NDATA keyword is used to differentiate between external parsed and external unparsed entities.
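As a small illustration of the internal parsed entity form shown earlier, the sketch below declares an entity in the internal subset and lets Python's standard `xml.etree.ElementTree` parser expand the reference. The element name `note` and the entity name `greeting` are made up for this example. External entities are not covered here, because this parser does not load them.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: "note" and "greeting" are illustrative names only.
DOCUMENT = """<!DOCTYPE note [
  <!ELEMENT note (#PCDATA)>
  <!ENTITY greeting "replacement_text">
]>
<note>before &greeting; after</note>"""


def expanded_text(document: str) -> str:
    """Parse the document and return the root element's text content."""
    root = ET.fromstring(document)
    return root.text or ""


print(expanded_text(DOCUMENT))  # the reference is replaced by the entity's text
```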
- it is *illegal* to have recursive reference declarations:
<!ENTITY self_ref "&self_ref;">
<!ENTITY ref_a "&ref_b;">
<!ENTITY ref_b "&ref_a;">
- parameter entities are used exclusively in DTDs and must always be parsed entities;
- the format of an internal/external *parameter* entity is (this entity can be declared in the internal or the external DTD subset):
<!ENTITY % name "replacement_text">
<!ENTITY % name SYSTEM "location">
<!ENTITY % name PUBLIC "identifier" "location">
- can be used to include other DTDs in the current DTD:
<!ENTITY % AnotherDTD SYSTEM "SomeFile.dtd">
%AnotherDTD;
- character entity references have the following formats:
&#NNNNN; (decimal representation of the Unicode code point)
&#xHHHH; (hexadecimal representation of the Unicode code point; note the "x" prefix)
- example: © == &#169; == &#xA9; (this is the copyright character)
- there are 5 built-in character entity references defined in XML:
- &amp; (&)
- &lt; (<)
- &gt; (>)
- &apos; (')
- &quot; (")
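The character references, the built-in entity references, and the ban on recursive declarations from this section can be checked with a short sketch. It uses Python's standard `xml.etree.ElementTree` parser; the element name `r` and the helper names `text_of` and `is_rejected` are made up for the illustration.

```python
import xml.etree.ElementTree as ET

# Decimal and hexadecimal references to the same code point (U+00A9).
COPYRIGHT = "<r>&#169; &#xA9;</r>"

# The five built-in entity references.
BUILT_IN = "<r>&amp; &lt; &gt; &apos; &quot;</r>"

# Mutually recursive entity declarations, referenced from the content.
RECURSIVE = """<!DOCTYPE r [
  <!ENTITY ref_a "&ref_b;">
  <!ENTITY ref_b "&ref_a;">
]>
<r>&ref_a;</r>"""


def text_of(document: str) -> str:
    """Return the text content of the root element after expansion."""
    return ET.fromstring(document).text or ""


def is_rejected(document: str) -> bool:
    """Return True if the parser refuses the document."""
    try:
        ET.fromstring(document)
    except ET.ParseError:
        return True
    return False


print(text_of(COPYRIGHT))
print(text_of(BUILT_IN))
print(is_rejected(RECURSIVE))
```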
- <!ATTLIST>
- defines the attributes of an XML element (permissible and default values);
Attribute definitions:
- the attribute *must* be present in the XML document (is required)
<!ATTLIST AnElement an_attribute CDATA #REQUIRED>
- the attribute is optional:
<!ATTLIST AnElement an_attribute CDATA #IMPLIED>
- the attribute is optional, but if it appears it must have a certain predefined value:
<!ATTLIST AnElement an_attribute CDATA #FIXED "value">
- the attribute is optional and it has a default value; a validating parser will supply the default value if the
attribute is not specified in the respective element:
<!ATTLIST AnElement an_attribute CDATA "value">
- the attribute is optional but it can only have values from a predefined list:
<!ATTLIST Test6 an_attribute (value1 | value2) #IMPLIED>
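The default-value behaviour described above can be observed with a minimal sketch based on Python's standard `xml.parsers.expat` module. Expat is a non-validating parser, so it does not enforce #REQUIRED or enumerated values, but it does read the internal subset and report declared defaults. The helper name `attributes_seen` is hypothetical.

```python
import xml.parsers.expat

# The element omits an_attribute, so the declared default should be reported.
DOCUMENT = """<!DOCTYPE AnElement [
  <!ELEMENT AnElement EMPTY>
  <!ATTLIST AnElement an_attribute CDATA "value">
]>
<AnElement/>"""


def attributes_seen(document: str) -> dict:
    """Return {element name: attributes} as reported by the parser."""
    seen = {}

    def on_start(name, attrs):
        seen[name] = dict(attrs)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartElementHandler = on_start
    parser.Parse(document, True)
    return seen


print(attributes_seen(DOCUMENT))
```

An attribute written explicitly in the element takes precedence over the declared default.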
Attribute types: (there are 10 types)
- CDATA - a CDATA attribute value cannot reference external entities, nor contain unescaped "<" signs; the less-than sign must be encoded as "&lt;"; for an example of a CDATA attribute see above;
- Enumerated values - all the enumerated values must be composed of NameChars; for an example see above;
- ID - a unique identifier in the whole document instance (regardless of the element type):
<!ATTLIST Test6 an_attribute ID #IMPLIED>
<!ATTLIST Test6 an_attribute ID #REQUIRED>
- IDREF/IDREFS - the value of such an attribute must be a legal XML name and must match an ID in the same document instance:
<!ATTLIST Test5
  ID ID #IMPLIED
  Ref IDREFS #REQUIRED
  Ref2 IDREF #IMPLIED
>
<!ELEMENT Test5 EMPTY>
<!-- this element has an optional ID, a required IDREFS and an optional IDREF; the IDREFS attribute has a single value that points to the same element -->
<Test5 ID="abc" Ref="abc"/>
- NMTOKEN/NMTOKENS
- the only real difference between NMTOKEN and CDATA is that the former does not allow whitespace or some punctuation characters;
- NMTOKEN/NMTOKENS only allow NameChar characters;
<!ATTLIST Test5
  Year NMTOKEN #IMPLIED
  Values NMTOKENS #REQUIRED
  TimeStamp NMTOKEN #FIXED "15:00"
  Parts NMTOKENS "A37 B100 C90"
>
- ENTITY/ENTITIES
- the values of such attributes must match the name of an *unparsed* entity already declared in the DTD;
<!-- DTD -->
<!ELEMENT Test5 EMPTY>
<!ATTLIST Test5
  Img1 ENTITY #REQUIRED
  Img2 ENTITY #FIXED "Toto1"
  Img3 ENTITY #IMPLIED
  Img4 ENTITY "def"
>
<!ENTITY Toto1 PUBLIC "id" "loc" NDATA NotNo500>
<!NOTATION NotNo500 PUBLIC "ident" "loc">
<!-- XML -->
<Test5 Img1="Toto11"/>
- NOTATION
- must point to a notation that is explicitly defined in the DTD;
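The ID and IDREF/IDREFS constraints listed above (IDs unique across the whole document, every reference matching an existing ID) can be sketched as follows. Python's standard parser does not validate attribute types, so this illustration takes the attribute names as parameters. The function name `check_ids`, the wrapper element `root`, and the default attribute names (borrowed from the Test5 example) are assumptions made for the sketch.

```python
import xml.etree.ElementTree as ET


def check_ids(document: str, id_attr: str = "ID",
              idref_attrs: tuple = ("Ref", "Ref2")) -> list:
    """Return a list of ID/IDREF problems found in the document."""
    root = ET.fromstring(document)
    ids = set()
    errors = []
    # First pass: IDs must be unique regardless of the element type.
    for element in root.iter():
        value = element.get(id_attr)
        if value is not None:
            if value in ids:
                errors.append(f"duplicate ID {value!r}")
            ids.add(value)
    # Second pass: every IDREF/IDREFS token must match a collected ID.
    for element in root.iter():
        for attr in idref_attrs:
            for ref in element.get(attr, "").split():
                if ref not in ids:
                    errors.append(f"unresolved reference {ref!r}")
    return errors


print(check_ids('<root><Test5 ID="abc" Ref="abc"/></root>'))
print(check_ids('<root><Test5 ID="abc" Ref="abc xyz"/></root>'))
```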
- <!NOTATION>
- this tag is used to describe non-XML data; it's a hint to the application about handling unparsable data;
<!NOTATION name SYSTEM "location">
<!NOTATION name PUBLIC "identifier" "location">
- conditional sections: IGNORE & INCLUDE directives;
<![INCLUDE [ <!ELEMENT Test7 EMPTY> ]]>
<![IGNORE [ <!ELEMENT Test8 EMPTY> ]]>
- a parameter entity must be used as the keyword in order to make a section truly conditional:
<!ENTITY % TestCondition "INCLUDE">
<![%TestCondition; [ <!ELEMENT Test9 EMPTY> ]]>
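The effect of a conditional section can be observed with a sketch built on Python's standard `xml.parsers.expat` module. Conditional sections are only allowed in the external DTD subset, so the sketch feeds a DTD string to the parser through an external entity handler instead of reading a real file. The system identifier "sketch.dtd", the element name `root`, and the helper name `declared_elements` are all hypothetical.

```python
import xml.parsers.expat as expat

# Hypothetical external subset; normally this would live in a separate .dtd file.
EXTERNAL_DTD = """<!ENTITY % TestCondition "INCLUDE">
<![%TestCondition; [ <!ELEMENT Test9 EMPTY> ]]>
<![IGNORE [ <!ELEMENT Test8 EMPTY> ]]>
<!ELEMENT root EMPTY>"""


def declared_elements(dtd_text: str) -> list:
    """Return the element names the parser sees declared in the external subset."""
    declared = []
    parser = expat.ParserCreate()
    # Ask expat to process the external subset and parameter entities.
    parser.SetParamEntityParsing(expat.XML_PARAM_ENTITY_PARSING_ALWAYS)
    parser.ElementDeclHandler = lambda name, model: declared.append(name)

    def load_external(context, base, system_id, public_id):
        # Supply the DTD text in place of fetching system_id.
        sub = parser.ExternalEntityParserCreate(context)
        sub.Parse(dtd_text, True)
        return 1

    parser.ExternalEntityRefHandler = load_external
    parser.Parse('<!DOCTYPE root SYSTEM "sketch.dtd"><root/>', True)
    return declared


print(declared_elements(EXTERNAL_DTD))
# Switching the parameter entity to IGNORE drops the Test9 declaration.
print(declared_elements(EXTERNAL_DTD.replace('"INCLUDE"', '"IGNORE"')))
```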
Best regards,
Razvan MIHAIU