DTD Tutorial



DTD Introduction

  • Document Type Definitions (DTDs) impose structure on an XML document
  • Using DTDs, we can specify what a "valid" document should contain
  • A valid document must satisfy more than well-formedness: it must also conform to the rules of its DTD
  • DTDs have limited expressive power, e.g., one cannot specify data types such as integers or dates for element content
  • DTDs can be used to define special-purpose XML languages, i.e., restricted XML for particular needs
  • A document type declaration is placed in the XML document's prolog, before the root element
  • It begins with <!DOCTYPE and ends with >
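To make the difference between well-formed and valid concrete, here is a small sketch using hypothetical element names (note, to, body). The document is well-formed, since every tag is properly nested and closed, but it is not valid: the DTD requires to before body, so a validating parser would reject it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE note [
  <!ELEMENT note (to, body)>
  <!ELEMENT to   (#PCDATA)>
  <!ELEMENT body (#PCDATA)>
]>
<!-- Well-formed, but invalid: the DTD requires <to> before <body> -->
<note>
  <body>Meeting at 10</body>
  <to>Mahendran</to>
</note>
```

Swapping the order of the two child elements makes the same document valid.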

Internal Subset

The internal subset contains declarations that are inside the XML document itself, i.e., anything inside the square brackets [] of the DOCTYPE declaration.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE myRoot [
 <!ELEMENT myRoot (myElement)>
 <!ELEMENT myElement (#PCDATA)>
]>
<myRoot><myElement>Hello</myElement></myRoot>

External Subset

The external subset contains declarations that are outside the XML document: they physically exist in a different file, which typically has the .dtd extension. External subsets are specified using either the keyword SYSTEM or the keyword PUBLIC.

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE myRoot SYSTEM "myDTD.dtd">
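As a sketch of how the two files fit together, assume myDTD.dtd holds the hypothetical declarations in the first part of the listing below; the second part is an XML document that refers to it. The comments name the two files, and the name after <!DOCTYPE must match the root element of the document.

```xml
<!-- myDTD.dtd (hypothetical external subset) -->
<!ELEMENT myRoot (myElement)>
<!ELEMENT myElement (#PCDATA)>

<?xml version="1.0" encoding="UTF-8"?>
<!-- document.xml (hypothetical document that uses myDTD.dtd) -->
<!DOCTYPE myRoot SYSTEM "myDTD.dtd">
<myRoot>
  <myElement>Hello</myElement>
</myRoot>
```

A relative system identifier such as myDTD.dtd is resolved against the location of the document that contains the DOCTYPE declaration.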

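The PUBLIC keyword supplies a public identifier for a widely used DTD, which a processor may resolve to a copy kept in a well-known location. In XML, a system identifier (a URI) must still follow the public identifier.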

<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE article PUBLIC "-//NLM//DTD Journal Publishing DTD v3.0 20080202//EN" 
"https://jats.nlm.nih.gov/archiving/1.1d1/JATS-archivearticle1.dtd">

Examples

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE addressbook [ 
  <!ELEMENT addressbook (person*)> 
  <!ELEMENT person  (name, greet?, address*, (fax | tel)*, email+)> 
  <!ELEMENT name   (#PCDATA)> 
  <!ELEMENT greet  (#PCDATA)> 
  <!ELEMENT address (#PCDATA)> 
  <!ELEMENT tel    (#PCDATA)> 
  <!ELEMENT fax    (#PCDATA)> 
  <!ELEMENT email  (#PCDATA)> 
]> 

<addressbook>
 <person>
  <name>Mahendran</name>
  <greet>Mr.</greet>
  <address>New Colony Road</address>
  <address>Springfield USA, 98765</address>
  <tel>(321) 786 2543</tel>
  <fax>(321) 786 2544</fax>
  <tel>(321) 786 2544</tel>
  <email>info@edumasktut.com</email>
 </person>
</addressbook>

DTD Elements

Elements are the primary building blocks of XML documents and are declared in a DTD with an element type declaration.

An element can be declared as having mixed content, i.e., a mixture of child elements and #PCDATA. In a mixed-content declaration the comma (,) sequence connector and the plus sign (+) and question mark (?) occurrence indicators cannot be used: #PCDATA must come first, the alternatives are separated by |, and the group must end with * whenever child elements are listed.

An element declared as type ANY can contain any content, including #PCDATA, elements, or a combination of elements and #PCDATA. Elements with ANY content can also be empty.

Syntax of a DTD rule to define elements:

Syntax: <!ELEMENT tag_name child_element_specification>
  • A combination of child elements according to combination rules
    <!ELEMENT page (title, content, comment?)>
  • Mixed content, i.e., child elements plus #PCDATA
    <!ELEMENT para (#PCDATA | strong)*>
  • #PCDATA (Just data)
    <!ELEMENT title (#PCDATA)>
  • ANY (only used during development)
    <!ELEMENT para ANY>
  • EMPTY (the element has no contents)
    <!ELEMENT person EMPTY>
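The sketch below shows mixed content in use, with a hypothetical para element and illustrative strong and em children: first the DTD declarations, then a matching instance. The commented-out declarations look similar but are not allowed, because #PCDATA must come first, only | may separate the choices, and the group must end in * once child elements are listed.

```xml
<!-- Allowed: #PCDATA first, choices separated by |, group repeated with * -->
<!ELEMENT para (#PCDATA | strong | em)*>
<!ELEMENT strong (#PCDATA)>
<!ELEMENT em (#PCDATA)>

<!-- Not allowed in a DTD (shown commented out):
     <!ELEMENT para (strong | #PCDATA)*>     #PCDATA is not first
     <!ELEMENT para (#PCDATA, strong)*>      a comma is not allowed
     <!ELEMENT para (#PCDATA | strong)+>     only * may follow the group
-->

<!-- A matching instance: text and child elements interleave freely -->
<para>Read <strong>this</strong> first, then <em>that</em>.</para>
```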

Regular Expression

name, greet?, address*, (fax | tel)*, email+

<?xml version="1.0" encoding="UTF-8"?> 
<!DOCTYPE addressbook [ 
  <!ELEMENT addressbook (person*)> 
  <!ELEMENT person  (name, greet?, address*, (fax | tel)*, email+)> 
  <!ELEMENT name   (#PCDATA)> 
  <!ELEMENT greet  (#PCDATA)> 
  <!ELEMENT address (#PCDATA)> 
  <!ELEMENT tel    (#PCDATA)> 
  <!ELEMENT fax    (#PCDATA)> 
  <!ELEMENT email  (#PCDATA)> 
]> 
  • name = there must be a name element
  • greet? = there is an optional greet element (i.e., 0 or 1 greet elements)
  • name, greet? = the name element is followed by an optional greet element
  • address* = there are 0 or more address elements
  • fax | tel = there is a fax or a tel element
  • (fax | tel)* = there are 0 or more repeats of fax or tel
  • email+ = there are 1 or more email elements
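Reading the content model against concrete input may help. The fragments below are person elements checked against the addressbook DTD above (the values are reused from the earlier example); the comments note which ones a validating parser should accept.

```xml
<!-- Valid: only the required parts (one name, at least one email) -->
<person>
  <name>Mahendran</name>
  <email>info@edumasktut.com</email>
</person>

<!-- Valid: fax and tel may be mixed in any order, any number of times -->
<person>
  <name>Mahendran</name>
  <fax>(321) 786 2544</fax>
  <tel>(321) 786 2543</tel>
  <fax>(321) 786 2544</fax>
  <email>info@edumasktut.com</email>
</person>

<!-- Invalid: email+ requires at least one email element -->
<person>
  <name>Mahendran</name>
</person>

<!-- Invalid: greet may only appear directly after name -->
<person>
  <name>Mahendran</name>
  <address>New Colony Road</address>
  <greet>Mr.</greet>
  <email>info@edumasktut.com</email>
</person>
```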

Combination rules

In the rules below, A and B stand for element names (tags); each rule is followed by a DTD example and a matching XML example.

  • A, B: A followed by B
    DTD: <!ELEMENT person (name, email)>
    XML: <person> <name>Joe</name> <email>x@x.x</email> </person>
  • A?: A is optional (it can be present or absent)
    DTD: <!ELEMENT person (name, email?)>
    XML: <person> <name>Joe</name> </person>
  • A+: at least one A
    DTD: <!ELEMENT person (name, email+)>
    XML: <person> <name>Joe</name> <email>x@x.x</email> </person>
         <person> <name>Joe</name> <email>x@x.x</email> <email>x@y.x</email> </person>
  • A*: zero, one or several A
    DTD: <!ELEMENT person (name, email*)>
    XML: <person> <name>Joe</name> </person>
  • A | B: either A or B
    DTD: <!ELEMENT person (name, (email | fax))>
    XML: <person> <name>Joe</name> <email>x@x.x</email> </person>
         <person> <name>Joe</name> <fax>123456789</fax> </person>
  • (A, B): parentheses group elements, and the rules above can be applied to the whole group
    DTD: <!ELEMENT list (name, email)+>
    XML: <list> <name>Joe</name> <email>x@x.x</email> <name>Ann</name> <email>x@y.x</email> </list>
  • #PCDATA: "parsed character data", the text content of an element. A literal < or & must be escaped (as &lt; and &amp;)
    DTD: <!ELEMENT email (#PCDATA)>
    XML: <email>Daniel.Schneider@tecfa.unige.ch</email>
  • EMPTY: no content
    DTD: <!ELEMENT br EMPTY>
    XML: <br/>
  • ANY: allows any declared child elements and parsed character data, in any order (avoid this!)
    DTD: <!ELEMENT person ANY>
    XML: <person> <c>text</c> <a>some <b>bbb</b> inside </a> </person>
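The rules combine freely inside one declaration. As a sketch, a hypothetical book element (all names here are made up for illustration) could require a title, then one or more chapters, each optionally followed by sections, then an optional appendix:

```xml
<!ELEMENT book     (title, (chapter, section*)+, appendix?)>
<!ELEMENT title    (#PCDATA)>
<!ELEMENT chapter  (#PCDATA)>
<!ELEMENT section  (#PCDATA)>
<!ELEMENT appendix (#PCDATA)>

<!-- One instance that matches the book content model -->
<book>
  <title>DTD Basics</title>
  <chapter>Introduction</chapter>
  <chapter>Elements</chapter>
  <section>Mixed content</section>
  <section>EMPTY and ANY</section>
  <appendix>Reference</appendix>
</book>
```

The first chapter has no sections, which section* permits; the group (chapter, section*)+ then repeats for the second chapter.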

DTD Attributes

  • An attribute list for an element is declared using an attribute-list declaration (<!ATTLIST ...>).
  • An attribute's default declaration is one of the keywords #REQUIRED, #IMPLIED, or #FIXED (followed by a value), or simply a default value.
  • Attribute types are classified as string (CDATA), tokenized, or enumerated.
  • ID, IDREF, IDREFS, ENTITY, ENTITIES, NMTOKEN, and NMTOKENS are all tokenized types.
<!ATTLIST element-name
  attribute-name type default-value>

<!ATTLIST height dim CDATA "cm">

<height dim="cm" />

type is one of the following:

  • CDATA: character data (i.e., the string as it is)
  • (en1 | en2 | …): the value must be one from the given list
  • ID: the value is a unique id
  • IDREF: the value is the id of another element
  • IDREFS: the value is a list of other ids
  • NMTOKEN: the value is a valid XML name token (name characters only)
  • NMTOKENS: the value is a list of valid XML name tokens
  • ENTITY: the value is the name of an (unparsed) entity
  • ENTITIES: the value is a list of entity names
  • NOTATION: the value is the name of a notation
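To show several of these types side by side, here is a hypothetical staff list (element names, attribute names, and people are illustrative only). ID values must be unique within the document, IDREF and IDREFS values must match some ID in it, and an enumerated attribute accepts only the listed tokens.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE staff [
  <!ELEMENT staff (employee+)>
  <!ELEMENT employee (#PCDATA)>
  <!ATTLIST employee
      id      ID                    #REQUIRED
      manager IDREF                 #IMPLIED
      peers   IDREFS                #IMPLIED
      role    (developer | tester)  "developer"
      skills  NMTOKENS              #IMPLIED>
]>
<staff>
  <employee id="e1" skills="xml dtd">Mahendran</employee>
  <employee id="e2" manager="e1" role="tester">Priya</employee>
  <employee id="e3" manager="e1" peers="e2">Arun</employee>
</staff>
```

An element type may have at most one ID attribute, and ID values must be XML names, which is why the ids start with a letter rather than a digit.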

default-value is one of the following:

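  • #REQUIRED: the attribute must always be included in the element
  • #IMPLIED: the attribute need not be included
  • #FIXED value: the attribute value is fixed; if the attribute appears, it must have exactly this value
  • value (a plain default, as in the "cm" example above): the attribute is optional and takes this value when it is omitted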

Attributes Examples

<!ELEMENT height (#PCDATA)> 
 
<!ATTLIST height  
      dimension  (cm|in)   #REQUIRED 
      accuracy    CDATA    #IMPLIED  
      resizable   CDATA    #FIXED "yes" 
>
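Checked against the height declaration above, a validating parser would accept or reject instances as follows (the measurements are made-up values):

```xml
<!-- Valid: dimension is present and is one of the allowed values -->
<height dimension="cm" accuracy="0.5">175</height>

<!-- Valid: accuracy is #IMPLIED and may be left out; resizable is #FIXED,
     so it is treated as resizable="yes" even though it is omitted -->
<height dimension="in">69</height>

<!-- Invalid: dimension is #REQUIRED but missing -->
<height>175</height>

<!-- Invalid: dimension must be cm or in -->
<height dimension="mm">1750</height>

<!-- Invalid: a #FIXED attribute may only take its fixed value -->
<height dimension="cm" resizable="no">175</height>
```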

DTD Entities

Entities are XML macros: they are expanded when the document is processed.

  • Character entities
  • Named entities
  • External entities
  • Parameter entities

Character entities: stand for arbitrary Unicode characters (©)

<!ENTITY copy   "&#169;"> 
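Once copy is declared, &copy; can be used in the document's content. Declaring it is mainly a readability convenience: the numeric character reference &#169; produces the same © character without any declaration. A minimal sketch, with a hypothetical footer element:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE footer [
  <!ELEMENT footer (#PCDATA)>
  <!ENTITY copy "&#169;">
]>
<!-- Both references below produce the same character: © -->
<footer>Copyright &copy; (same as &#169;)</footer>
```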

Named entities: Entities can reference other entities


<!ENTITY name "Mahendran"> 
<!ENTITY person "&name; Developer"> 

Using &person; in a document expands to: Mahendran Developer
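Putting the two declarations into a complete document shows the nested expansion. The card element is hypothetical; the parser replaces &person; with its replacement text and then expands the &name; reference inside it.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE card [
  <!ELEMENT card (#PCDATA)>
  <!ENTITY name "Mahendran">
  <!ENTITY person "&name; Developer">
]>
<!-- After parsing, the text content of card is: Mahendran Developer -->
<card>&person;</card>
```

An entity must not refer to itself, directly or through other entities; such recursion makes the document not well-formed.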

External entities: Represent the content of an external file. Useful when breaking a document down into parts.

<!DOCTYPE book [
<!ENTITY chap1 SYSTEM "chapter-1.xml">
<!ENTITY chap2 SYSTEM "chapter-2.xml">
]>
<book>
&chap1;&chap2;
</book>
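For this to work, each referenced file must itself be well-formed XML content (an external parsed entity). As a hypothetical illustration, suppose chapter-1.xml and chapter-2.xml contain the single chapter elements below; the comments only label each part. The parser then treats book as if that content had been pasted in place of the references:

```xml
<!-- chapter-1.xml (hypothetical contents) -->
<chapter>Introduction</chapter>

<!-- chapter-2.xml (hypothetical contents) -->
<chapter>Elements</chapter>

<!-- What the parser effectively sees inside book after expansion -->
<book>
<chapter>Introduction</chapter><chapter>Elements</chapter>
</book>
```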

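Parameter entities: distinguished from other entities by a percent sign (%) in the entity declaration, and referenced as %name;. They can be used only inside the DTD, for example to switch a conditional section on (INCLUDE) or off (IGNORE); note that conditional sections are allowed only in the external DTD subset.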

<!ENTITY % accept "INCLUDE" >

<![%accept;[
  <!ELEMENT message (approved, sign)>
]]>

<!ELEMENT approved EMPTY>
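Parameter entities are also handy for reusing part of a declaration. The separate sketch below is meant to live in an external .dtd file, because in the internal subset parameter-entity references may appear only between declarations, not inside them. The names common-atts, draft, memo, subject, text, and reviewer are hypothetical.

```xml
<!-- Reusable attribute definitions, shared by several elements -->
<!ENTITY % common-atts "id ID #IMPLIED lang NMTOKEN #IMPLIED">

<!ELEMENT memo (subject, text)>
<!ATTLIST memo %common-atts;>
<!ELEMENT subject (#PCDATA)>
<!ATTLIST subject %common-atts;>
<!ELEMENT text (#PCDATA)>

<!-- IGNORE: the parser skips this section, so reviewer is not declared.
     Changing the value to INCLUDE turns the declaration back on. -->
<!ENTITY % draft "IGNORE">
<![%draft;[
  <!ELEMENT reviewer (#PCDATA)>
]]>
```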


