DTD Tutorial
DTD Introduction
- Document Type Definitions (DTDs) impose structure on an XML document
- Using DTDs, we can specify what a "valid" document should contain
- Being valid against a DTD requires more than being well-formed
- DTDs have limited expressive power, e.g., one cannot specify data types (such as integer or date) for element content
- DTDs can be used to define special languages of XML, i.e., restricted XML for special needs
- A document type declaration is placed in the XML document's prolog, preceding the root element
- It begins with <!DOCTYPE and ends with >
Internal Subset
Declarations that are inside the XML document itself, i.e., anything inside the square brackets [] of the document type declaration, form the internal subset.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE myRoot [
<!ELEMENT myRoot (myElement)>
<!ELEMENT myElement (#PCDATA)>
]>
<myRoot><myElement>text</myElement></myRoot>
External Subset
Declarations that are outside the XML document form the external subset. They physically exist in a different file that typically ends with the .dtd extension. An external subset is referenced using either the keyword SYSTEM or the keyword PUBLIC, placed after the name of the root element.
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE myRoot SYSTEM "myDTD.dtd">
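The PUBLIC keyword indicates that the DTD is widely used and may be available in well-known locations. It is followed by two strings: a public identifier, and a system identifier (a URI) that the parser can use if it does not recognize the public identifier.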
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN"
"https://jats.nlm.nih.gov/archiving/1.1d1/JATS-archivearticle1.dtd">
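An external subset file holds only declarations; it has no <!DOCTYPE ...> wrapper of its own. A document may also use an external and an internal subset together. The following is a minimal sketch, assuming a hypothetical file note.dtd and made-up names (note, to, body, author) that do not come from the examples above:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Assumed contents of the hypothetical file note.dtd
     (declarations only, no DOCTYPE wrapper):
     <!ELEMENT note (to, body)>
     <!ELEMENT to (#PCDATA)>
     <!ELEMENT body (#PCDATA)>
-->
<!DOCTYPE note SYSTEM "note.dtd" [
  <!-- internal subset: adds an entity on top of the external declarations -->
  <!ENTITY author "Mahendran">
]>
<note>
  <to>&author;</to>
  <body>Hello</body>
</note>
```

The internal subset is read before the external one, so a document can use it to add declarations that are specific to that document.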
Examples
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
<!ELEMENT addressbook (person*)>
<!ELEMENT person (name, greet?, address*,
(fax | tel)*, email+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT greet (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT tel (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
<addressbook>
<person>
<name>Mahendran</name>
<greet>Mr.</greet>
<address>New Colony Road</address>
<address>Springfield USA, 98765</address>
<tel>(321) 786 2543</tel>
<fax>(321) 786 2544</fax>
<tel>(321) 786 2544</tel>
<email>info@edumasktut.com</email>
</person>
</addressbook>
DTD Elements
Elements are the primary building blocks of XML documents. Each element is declared in the DTD with an element type declaration.
An element can be declared as having mixed content, i.e., child elements interleaved with PCDATA. In a mixed content model, #PCDATA comes first, the alternatives are separated by the vertical bar (|), and the whole group is followed by an asterisk (*). The comma (,), plus sign (+), and question mark (?) cannot be used in a mixed content model.
An element declared as type ANY can contain any content, including PCDATA, elements, or a combination of elements and PCDATA. Elements with ANY content can also be empty elements.
Syntax of a DTD rule to define elements:
Syntax: <!ELEMENT tag_name child_element_specification>
- A combination of child elements according to combination rules
<!ELEMENT page (title, content, comment?)>
- Mixed contents, i.e. child elements plus #PCDATA (#PCDATA must be listed first)
<!ELEMENT para (#PCDATA | strong)*>
- #PCDATA (Just data)
<!ELEMENT title (#PCDATA)>
- ANY (only used during development)
<!ELEMENT para ANY>
- EMPTY (the element has no contents)
<!ELEMENT person EMPTY>
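The declaration forms above can be combined in one document. The sketch below is illustrative only: the content model for content and the declarations for strong and comment are assumptions added so that the example is complete.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE page [
  <!-- child elements in a fixed order; comment is optional -->
  <!ELEMENT page (title, content, comment?)>
  <!ELEMENT title (#PCDATA)>
  <!-- assumed model: content holds one or more paragraphs -->
  <!ELEMENT content (para+)>
  <!-- mixed content: #PCDATA first, alternatives joined by |, group ends with * -->
  <!ELEMENT para (#PCDATA | strong)*>
  <!ELEMENT strong (#PCDATA)>
  <!ELEMENT comment (#PCDATA)>
]>
<page>
  <title>DTD Elements</title>
  <content>
    <para>Text with a <strong>highlighted</strong> word.</para>
  </content>
</page>
```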
Regular Expression
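The content model of an element is a regular expression over the names of its child elements. The content model of person in the address book is:
name, greet?, address*, (tel | fax)*, email+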
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
<!ELEMENT addressbook (person*)>
<!ELEMENT person (name, greet?, address*, (fax | tel)*, email+)>
<!ELEMENT name (#PCDATA)>
<!ELEMENT greet (#PCDATA)>
<!ELEMENT address (#PCDATA)>
<!ELEMENT tel (#PCDATA)>
<!ELEMENT fax (#PCDATA)>
<!ELEMENT email (#PCDATA)>
]>
- name = there must be a name element
- greet? = there is an optional greet element (i.e., 0 or 1 greet elements)
- name, greet? = the name element is followed by an optional greet element
- address* = there are 0 or more address elements
- tel | fax = there is a tel or a fax element
- (tel | fax)* = there are 0 or more repeats of tel or fax
- email+ = there are 1 or more email elements
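To make the rules above concrete, here is a sketch of a document checked against the address book content model. The person data is made up for illustration; the comments mark which instances a validating parser would be expected to accept or reject.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE addressbook [
  <!ELEMENT addressbook (person*)>
  <!ELEMENT person (name, greet?, address*, (fax | tel)*, email+)>
  <!ELEMENT name (#PCDATA)>
  <!ELEMENT greet (#PCDATA)>
  <!ELEMENT address (#PCDATA)>
  <!ELEMENT tel (#PCDATA)>
  <!ELEMENT fax (#PCDATA)>
  <!ELEMENT email (#PCDATA)>
]>
<addressbook>
  <!-- valid: only the required children (name, then at least one email) -->
  <person>
    <name>Joe</name>
    <email>x@x.x</email>
  </person>
  <!-- valid: tel and fax may repeat and alternate in any order -->
  <person>
    <name>Joe</name>
    <tel>123456789</tel>
    <fax>123456780</fax>
    <tel>123456781</tel>
    <email>x@x.x</email>
    <email>x@y.x</email>
  </person>
  <!-- would be invalid: <person><email>x@x.x</email><name>Joe</name></person>
       (email before name violates the declared sequence) -->
  <!-- would be invalid: <person><name>Joe</name></person>
       (email+ requires at least one email) -->
</addressbook>
```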
Combination rules
Rule (A and B are element names) | Explanation | DTD example | XML example |
---|---|---|---|
A, B | A followed by B | <!ELEMENT person (name, email?)> | <person> <name>Joe</name> <email>x@x.x</email> </person> |
A? | A is optional, (it can be present or absent) | <!ELEMENT person (name, email?)> | <person> <name>Joe</name></person> |
A+ | At least one A | <!ELEMENT person (name, email+)> | <person> <name>Joe</name> <email>x@x.x</email></person> <person> <name>Joe</name> <email>x@x.x</email> <email>x@y.x</email> </person> |
A* | Zero, one or several A | <!ELEMENT person (name, email*)> | <person> <name>Joe</name> </person> |
A \| B | Either A or B | <!ELEMENT person (name, (email \| fax))> | <person> <name>Joe</name> <email>x@x.x</email></person> <person> <name>Joe</name> <fax>123456789</fax></person> |
(A, B) | Parentheses group items so that the above combination rules can be applied to the whole group | <!ELEMENT list (name, email)+> | <list> <name>Joe</name> <email>x@x.x</email> </list> |
#PCDATA | "Parsed Character Data": the text content of an element. The characters < and & must be written as &lt; and &amp;. | <!ELEMENT email (#PCDATA)> | <email>Daniel.Schneider@tecfa.unige.ch</email> |
EMPTY | No contents | <!ELEMENT br EMPTY> | <br/> |
ANY | Allows any non-specified child elements and parsed character data (avoid this !!!) | <!ELEMENT person ANY> | <person> <c>text</c> <a>some <b>bbb</b> inside </a> </person> |
DTD Attributes
- An attribute list for an element is declared using an ATTLIST (attribute-list) declaration.
- Attribute defaults: the keywords #IMPLIED, #REQUIRED, and #FIXED specify how an attribute's default is handled.
- Attribute types are classified as either string (CDATA), tokenized, or enumerated.
- ID, IDREF, ENTITY, and NMTOKEN are all tokenized attribute types.
<!ATTLIST element-name
attribute-name type default-value >
<!ATTLIST height dim CDATA "cm">
<height dim="cm" />
type is one of the following:
CDATA | character data (i.e., the string as it is) |
(en1 | en2 | …) | value must be one from the given list |
ID | value is a unique id |
IDREF | value is the id of another element |
IDREFS | value is a list of other ids |
NMTOKEN | value is a valid XML name token |
NMTOKENS | value is a list of valid XML name tokens |
ENTITY | value is the name of an unparsed entity declared in the DTD |
ENTITIES | value is a list of unparsed entity names |
NOTATION | The value is a name of a notation |
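As an illustration of the ID, IDREF, and enumerated types listed above, here is a minimal sketch. The element and attribute names (team, member, id, manager, role) are hypothetical and chosen only for this example.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE team [
  <!ELEMENT team (member+)>
  <!ELEMENT member (#PCDATA)>
  <!ATTLIST member
    id      ID         #REQUIRED
    manager IDREF      #IMPLIED
    role    (dev | qa) "dev"
  >
]>
<team>
  <!-- id values must be unique in the document and must be XML names -->
  <member id="m1">Mahendran</member>
  <!-- manager must match the id of some element in the same document -->
  <member id="m2" manager="m1" role="qa">Joe</member>
</team>
```

A validating parser would be expected to reject a second member with id="m1", or a manager value that matches no id in the document.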
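default-value is one of the following:
"value" | attribute takes this value when it is omitted from the element |
#REQUIRED | attribute must always be included in the element |
#IMPLIED | attribute need not be included |
#FIXED "value" | attribute value is fixed; if the attribute is written, it must have exactly this value |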
Attributes Examples
<!ELEMENT height (#PCDATA)>
<!ATTLIST height
dimension (cm|in) #REQUIRED
accuracy CDATA #IMPLIED
resizable CDATA #FIXED "yes"
>
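The following sketch shows element instances checked against the attribute list above. The wrapper element heights is an assumption added so that several instances fit in one document.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE heights [
  <!ELEMENT heights (height+)>
  <!ELEMENT height (#PCDATA)>
  <!ATTLIST height
    dimension (cm|in) #REQUIRED
    accuracy  CDATA   #IMPLIED
    resizable CDATA   #FIXED "yes"
  >
]>
<heights>
  <!-- valid: the required attribute is present; resizable defaults to "yes" -->
  <height dimension="cm">180</height>
  <!-- valid: a #FIXED attribute may be written out if it matches the fixed value -->
  <height dimension="in" accuracy="0.5" resizable="yes">71</height>
  <!-- would be invalid: <height>180</height>
       (dimension is #REQUIRED) -->
  <!-- would be invalid: <height dimension="m">1.8</height>
       (m is not one of the enumerated values) -->
</heights>
```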
DTD Entities
Entities are XML macros: an entity reference is replaced by the entity's content when the document is processed. There are four kinds:
- Character entities
- Named entities
- External entities
- Parameter entities
Character entities: stand for arbitrary Unicode characters, written as a numeric character reference (here the copyright sign ©)
<!ENTITY copy "&#169;">
Named entities: Entities can reference other entities
<!ENTITY name "Mahendran">
<!ENTITY person "&name; Developer">
Using &person; in a document expands to: Mahendran Developer
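Put together in a complete document, the entity declarations might look like the sketch below. The root element card is hypothetical, and the copy entity is written with a numeric character reference.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE card [
  <!ELEMENT card (#PCDATA)>
  <!-- character entity: &#169; is the copyright sign -->
  <!ENTITY copy "&#169;">
  <!ENTITY name "Mahendran">
  <!-- an entity whose replacement text references another entity -->
  <!ENTITY person "&name; Developer">
]>
<card>&copy; &person;</card>
```

After entity expansion, the text content of card is "© Mahendran Developer".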
External entities: Represent the content of an external file. Useful when breaking a document down into parts.
<!DOCTYPE book [
<!ENTITY chap1 SYSTEM "chapter-1.xml">
<!ENTITY chap2 SYSTEM "chapter-2.xml">
]>
<book>
&chap1;&chap2;
</book>
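Parameter entities: can be used only inside the DTD. They are distinguished from other entities by a percent sign (%) before the name in the entity declaration, and they are referenced as %name; instead of &name;. Conditional sections such as the one below are allowed only in the external subset.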
<!ENTITY % accept "INCLUDE" >
<![%accept;[
<!ELEMENT message (approved, sign)>
]]>
<!ELEMENT approved EMPTY>
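A common use of parameter entities is to reuse a fragment of a content model in several declarations. The sketch below assumes a hypothetical external DTD file named shared.dtd; the names inline, note, em, draft, and todo are made up for illustration.

```xml
<!-- shared.dtd: a hypothetical external DTD file.
     References to parameter entities inside a declaration, and
     conditional sections, are permitted only in the external subset. -->

<!-- reusable fragment of a content model -->
<!ENTITY % inline "strong | em">
<!ELEMENT para (#PCDATA | %inline;)*>
<!ELEMENT note (#PCDATA | %inline;)*>
<!ELEMENT strong (#PCDATA)>
<!ELEMENT em (#PCDATA)>

<!-- switch for a conditional section: "IGNORE" skips it, "INCLUDE" enables it -->
<!ENTITY % draft "IGNORE">
<![%draft;[
  <!ELEMENT todo (#PCDATA)>
]]>
```

A document that needs the todo element can declare <!ENTITY % draft "INCLUDE"> in its internal subset. The internal subset is read first, and the first declaration of an entity is the one that takes effect.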