XML
Dott. Nicole NOVIELLI
http://www.di.uniba.it/intint/people/nicole.html
XML: eXtensible Markup Language
!  Permits document authors to create markup
languages, that is, text-based notations for
describing data
!  Enables document authors to create entirely new
markup languages for describing any type of data,
e.g.:
   !  Mathematical formulas
   !  Software-configuration instructions
   !  Chemical structures
   !  Music
   !  News
   !  Reports
   !  …
E.g.: XML describing a baseball player’s information
<?xml version = "1.0"?>
<!-- Fig. 14.1: player.xml -->
<!-- Baseball player structured with XML -->
<player>
<firstName>John</firstName>
<lastName>Doe</lastName>
<battingAverage>0.375</battingAverage>
</player>
•  XML documents contain text that represents content (e.g. John) and
elements that specify the document’s structure (tags)
•  XML documents delimit elements with start tags (<tagName>) and
end tags (</tagName>)
•  Every XML document must have a root element that contains all the
other elements (‘player’ in the example)
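A minimal sketch of how a program might read the player.xml document above, using Python's standard xml.etree.ElementTree module. The helper name describe is an assumption made for illustration; the document is embedded as a string so the snippet runs on its own.

```python
import xml.etree.ElementTree as ET

# The player.xml document from the slide, embedded as a string for convenience.
PLAYER_XML = """<?xml version = "1.0"?>
<!-- Baseball player structured with XML -->
<player>
   <firstName>John</firstName>
   <lastName>Doe</lastName>
   <battingAverage>0.375</battingAverage>
</player>"""


def describe(xml_text):
    """Return the root tag and a {child tag: text content} mapping."""
    root = ET.fromstring(xml_text)  # the root element contains all the others
    return root.tag, {child.tag: child.text for child in root}


root_tag, fields = describe(PLAYER_XML)
print(root_tag)  # name of the root element
print(fields)    # content of each child element, keyed by tag name
```

The tags give the structure (player, firstName, ...), while the text between start and end tags is the content.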
Vocabularies
!  XML-based markup languages
!  Provide a means for describing particular types of
data in a standard and structured way
!  Some XML vocabularies include:
   !  XHTML
   !  MathML
   !  VoiceXML
   !  CML (Chemical Markup Language)
VoiceXML
Voice Extensible MarkUp Language
A tutorial is available here: http://www.voicexml.org/tutorials/intro1.html
VoiceXML (VXML) is the W3C's standard XML format for specifying
interactive voice dialogues between a human and a computer.
It allows voice applications to be developed and deployed in an
analogous way to HTML for visual applications.
Just as HTML documents are interpreted by a visual web browser,
VoiceXML documents are interpreted by a voice browser.
VoiceXML has tags that instruct the voice browser to provide speech
synthesis, automatic speech recognition, dialog management, and
audio playback.
An example of a VoiceXML document:
<?xml version="1.0"?>
<vxml version="2.0" xmlns="http://www.w3.org/2001/vxml">
   <form>
      <block> <prompt> Hello world! </prompt> </block>
   </form>
</vxml>
VoiceXML is designed for creating audio dialogs that feature synthesized
speech, digitized audio, recognition of spoken and DTMF key input,
recording of spoken input, telephony, and mixed initiative conversations.
An example
<?xml version="1.0" encoding="UTF-8"?>
<vxml xmlns="http://www.w3.org/2001/vxml"
xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:schemaLocation="http://www.w3.org/2001/vxml http://www.w3.org/TR/voicexml20/vxml.xsd"
version="2.0">
<form>
<field name="drink">
<prompt>Would you like coffee, tea, milk, or nothing? </prompt>
<grammar src="drink.grxml" type="application/srgs+xml"/>
</field>
<block>
<submit next="http://www.drink.example.com/drink2.asp"/>
</block>
</form>
</vxml>
Validating XML documents: DTD and schema
!  An XML document can refer to a DTD (Document Type
Definition) or to a schema
!  Validating parsers can read the DTD/schema and check that
the XML document conforms to it
   !  That is, that the document has an appropriate structure
   !  E.g., in the player’s information example, suppose we reference a
   DTD specifying that a player element must have firstName,
   lastName and battingAverage elements
   !  Omitting one of them would make player.xml invalid,
   though the document would still be well-formed because it
   follows the XML syntax rules
!  A nonvalidating parser checks only the syntax of an XML
document
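The difference between well-formed and valid can be sketched in Python. The standard xml.etree.ElementTree parser is nonvalidating, so the DTD rule described above (a player must have firstName, lastName and battingAverage) is imitated here by a hand-written check. The names REQUIRED and missing_children are assumptions made for this illustration, not part of any DTD API.

```python
import xml.etree.ElementTree as ET

# Children that the DTD described above requires inside <player>.
REQUIRED = ("firstName", "lastName", "battingAverage")


def missing_children(xml_text):
    """Return the required child elements that are absent from the root.

    ET.fromstring raises ET.ParseError if the text is not well-formed,
    so reaching the check below already implies well-formedness.
    """
    root = ET.fromstring(xml_text)
    present = {child.tag for child in root}
    return [name for name in REQUIRED if name not in present]


complete = ("<player><firstName>John</firstName><lastName>Doe</lastName>"
            "<battingAverage>0.375</battingAverage></player>")
incomplete = "<player><firstName>John</firstName><lastName>Doe</lastName></player>"

print(missing_children(complete))    # nothing missing: would be valid
print(missing_children(incomplete))  # well-formed, but would not be valid
```

A real validating parser would read the rule from the DTD itself rather than from a hard-coded tuple.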
XML is highly portable
!  Viewing or modifying an XML file (extension is ‘.xml’) does not
require any specialized software
!  Any text editor that supports ASCII/Unicode characters can open an
XML document for viewing and editing
!  Most web browsers can display XML documents in a formatted manner
that shows the XML structure
XML parser and syntax
!  An XML parser is software for processing XML files:
   !  It makes the document’s data available to other applications
   !  It checks that the document follows the syntax rules specified by W3C’s
   XML Recommendation (www.w3.org/XML)
!  XML syntax requires a single root element and a start and end
tag for each element (or an empty-element tag such as <size/>)
!  Elements must be properly nested
!  If an XML parser can process the document successfully, then the
XML document is well-formed
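A small sketch of a well-formedness check, assuming Python's standard xml.etree.ElementTree parser: if parsing succeeds the text is well-formed, otherwise the parser raises ParseError. The function name is_well_formed and the sample strings are made up for illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(xml_text):
    """Return True if the parser can process the whole document."""
    try:
        ET.fromstring(xml_text)
        return True
    except ET.ParseError:
        return False


samples = {
    "properly nested": "<a><b>text</b></a>",
    "badly nested": "<a><b>text</a></b>",   # end tags in the wrong order
    "two root elements": "<a/><b/>",        # only one root is allowed
    "missing end tag": "<a><b>text</a>",    # <b> is never closed
}

for label, text in samples.items():
    print(label, "->", is_well_formed(text))
```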
Structuring data
!  XML Schema is a document definition language
!  It specifies the structure of instance documents
   !  "elements contained by other elements"
!  It specifies the datatype of each element/attribute
   !  "this element shall hold an integer in the range 0 to 12,000"
!  The XML Schema language is also referred to as XML Schema
Definition (XSD)
!  The specification is composed of two parts:
   !  Structures: http://www.w3.org/TR/2001/REC-xmlschema-1-20010502/
   !  Datatypes: http://www.w3.org/TR/2001/REC-xmlschema-2-20010502/
!  XML Schema is an XML-based alternative to DTD
DTD
Document Type Definition (DTD): a tool used by programmers to define
the components allowed in the construction of an XML document.
The term is used not only for XML documents but also for all
documents derived from SGML (of which XML is meant to be a
simplification that keeps its power while reducing its complexity),
the most famous of which is HTML.
In SGML, a DTD is required for document validation. In XML too, a
document is valid if it has a DTD and can be validated against that
DTD.
However, XML also allows well-formed documents, that is, documents
that have no DTD but whose structure is regular and comprehensible
enough to be checked.
Schema vs. DTD
!  Both are XML document definition languages
!  XML Schemas are written in XML
!  Unlike DTDs, XML Schemas are extensible, like
XML
!  XML Schemas are more verbose than DTDs
Schema vs. DTD: example
<!ELEMENT BookStore (Book+)>
<!ELEMENT Book (Title, Author, Date, ISBN,
Publisher)>
<!ELEMENT Title (#PCDATA)>
<!ELEMENT Author (#PCDATA)>
<!ELEMENT Date (#PCDATA)>
<!ELEMENT ISBN (#PCDATA)>
<!ELEMENT Publisher (#PCDATA)>
Schema vs. DTD: example
<?xml version="1.0"?>
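<!-- No targetNamespace: the declared elements are in no namespace, so the
     unprefixed ref="..." values below must not pick up a default namespace -->
<xsd:schema xmlns:xsd="http://www.w3.org/2001/XMLSchema">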
<xsd:element name="BookStore">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="Book" minOccurs="1" maxOccurs="unbounded"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="Book">
<xsd:complexType>
<xsd:sequence>
<xsd:element ref="Title" minOccurs="1" maxOccurs="1"/>
<xsd:element ref="Author" minOccurs="1" maxOccurs="1"/>
<xsd:element ref="Date" minOccurs="1" maxOccurs="1"/>
<xsd:element ref="ISBN" minOccurs="1" maxOccurs="1"/>
<xsd:element ref="Publisher" minOccurs="1" maxOccurs="1"/>
</xsd:sequence>
</xsd:complexType>
</xsd:element>
<xsd:element name="Title" type="xsd:string"/>
<xsd:element name="Author" type="xsd:string"/>
<xsd:element name="Date" type="xsd:string"/>
<xsd:element name="ISBN" type="xsd:string"/>
<xsd:element name="Publisher" type="xsd:string"/>
</xsd:schema>
Referencing a schema in an XML
instance document (simple form)
<?xml version="1.0"?>
<BookStore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="BookStore.xsd">
<Book>
<Title>My Life and Times</Title>
<Author>Paul McCartney</Author>
<Date>1998</Date>
<ISBN>1-56592-235-2</ISBN>
<Publisher>McMillin Publishing</Publisher>
</Book>
…
</BookStore>
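A hedged sketch of how the schema reference looks to a program. The xsi:noNamespaceSchemaLocation attribute is an ordinary attribute in the XMLSchema-instance namespace; Python's standard ElementTree parser exposes it but does not fetch the schema or validate against it. Only the one Book shown on the slide is included, and the variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# Namespace URI bound to the xsi prefix in the instance document.
XSI = "http://www.w3.org/2001/XMLSchema-instance"

INSTANCE = """<?xml version="1.0"?>
<BookStore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
           xsi:noNamespaceSchemaLocation="BookStore.xsd">
   <Book>
      <Title>My Life and Times</Title>
      <Author>Paul McCartney</Author>
      <Date>1998</Date>
      <ISBN>1-56592-235-2</ISBN>
      <Publisher>McMillin Publishing</Publisher>
   </Book>
</BookStore>"""

root = ET.fromstring(INSTANCE)

# ElementTree writes namespaced names as "{namespace-uri}localName".
schema_hint = root.get("{%s}noNamespaceSchemaLocation" % XSI)
titles = [book.findtext("Title") for book in root.findall("Book")]

print(schema_hint)  # location hint for the schema; nothing is downloaded
print(titles)
```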
An example: markup for a
business letter
Reference to an external DTD
Alternatively, the DTD can be declared
in the XML file itself (inline)
letter.dtd
<!-- Fig. 14.9: letter.dtd -->
<!-- DTD document for letter.xml -->
<!ELEMENT letter ( contact+, salutation, paragraph+, closing, signature )>
<!ELEMENT contact ( name, address1, address2, city, state, zip, phone, flag )>
<!ATTLIST contact type CDATA #IMPLIED>
<!ELEMENT name ( #PCDATA )>
<!ELEMENT address1 ( #PCDATA )>
<!ELEMENT address2 ( #PCDATA )>
<!ELEMENT city ( #PCDATA )>
<!ELEMENT state ( #PCDATA )>
<!ELEMENT zip ( #PCDATA )>
<!ELEMENT phone ( #PCDATA )>
<!ELEMENT flag EMPTY>
<!ATTLIST flag gender (M | F) "M">
<!ELEMENT salutation ( #PCDATA )>
<!ELEMENT closing ( #PCDATA )>
<!ELEMENT paragraph ( #PCDATA )>
<!ELEMENT signature ( #PCDATA )>
Referencing a schema in an XML
instance document (simple form)
<?xml version="1.0"?>
<BookStore xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
xsi:noNamespaceSchemaLocation="BookStore.xsd">
<Book>
<Title>My Life and Times</Title>
<Author>Paul McCartney</Author>
<Date>1998</Date>
<ISBN>1-56592-235-2</ISBN>
<Publisher>McMillin Publishing</Publisher>
</Book>
…
</BookStore>
Source: slides by Prof. Filippo Lanubile
XML Namespaces
The ability to create custom elements in
XML can lead to conflicts in the handling of
names
Naming collision: the same name is used to denote
different elements
An XML namespace is a collection of element and
attribute names
XML namespaces provide a means for document
authors to refer unambiguously to elements with
the same name (i.e. to prevent collisions)
Example
Problem:
<subject>Geometry</subject>
and
<subject>Cardiology</subject>
both use ‘subject’ to mark up data.
In the first case, the subject is something one studies in school, whereas
in the second case, the subject is a field of medicine
Solution: differentiation using namespaces
<highschool:subject>Geometry</highschool:subject>
and
<medicalschool:subject>Cardiology</medicalschool:subject>
Differentiating elements with namespaces
<?xml version = "1.0"?>
<!-- Fig. 14.7: namespace.xml -->
<!-- Demonstrating namespaces -->
<text:directory
xmlns:text = "urn:deitel:textInfo"
xmlns:image = "urn:deitel:imageInfo">
<text:file filename = "book.xml">
<text:description>A book list</text:description>
</text:file>
<image:file filename = "funny.jpg">
<image:description>A funny picture</image:description>
<image:size width = "200" height = "100" />
</image:file>
</text:directory>
-  The reserved attribute xmlns is used to create two namespace prefixes: text and image
-  Each namespace prefix is bound to a URI (Uniform Resource Identifier)
-  Document authors create their own namespace prefixes and URIs
-  To ensure that namespaces are unique, we must provide unique URIs
-  In this example we use URNs (Uniform Resource Names)
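A sketch of how a namespace-aware parser tells the two file elements apart, assuming Python's standard xml.etree.ElementTree. The parser replaces each prefix with the URI it is bound to, so text:file and image:file become different names. The NS mapping and variable names are assumptions made for this example.

```python
import xml.etree.ElementTree as ET

# namespace.xml from the slide, embedded as a string.
NAMESPACE_XML = """<?xml version = "1.0"?>
<text:directory
   xmlns:text = "urn:deitel:textInfo"
   xmlns:image = "urn:deitel:imageInfo">
   <text:file filename = "book.xml">
      <text:description>A book list</text:description>
   </text:file>
   <image:file filename = "funny.jpg">
      <image:description>A funny picture</image:description>
      <image:size width = "200" height = "100" />
   </image:file>
</text:directory>"""

# Prefix-to-URI mapping used only for the searches below; the prefixes
# chosen here need not match those in the document, the URIs must.
NS = {"text": "urn:deitel:textInfo", "image": "urn:deitel:imageInfo"}

root = ET.fromstring(NAMESPACE_XML)
text_files = [f.get("filename") for f in root.findall("text:file", NS)]
image_files = [f.get("filename") for f in root.findall("image:file", NS)]

print(root.tag)     # the prefix has been expanded to {urn:deitel:textInfo}
print(text_files)   # file elements in the text namespace
print(image_files)  # file elements in the image namespace
```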
Differentiating elements with namespaces
<?xml version = "1.0"?>
<!-- Fig. 14.7: namespace.xml -->
<!-- Demonstrating namespaces -->
<text:directory
xmlns:text = "http://www.deitel.com/xmlns-text"
xmlns:image = "http://www.deitel.com/xmlns-image">
<text:file filename = "book.xml">
<text:description>A book list</text:description>
</text:file>
<image:file filename = "funny.jpg">
<image:description>A funny picture</image:description>
<image:size width = "200" height = "100" />
</image:file>
</text:directory>
-  Another common practice is to use URLs, which specify the location of resources on the
Internet
-  Using URLs helps ensure that the namespaces are unique because domain names are
guaranteed to be unique
-  The parser does not visit the URL: it does not have to point to an actual web page
e.g. xmlns:text = "abcdefgkjle" is allowed
XML Schema Types
!  Built-in Datatypes
!  Primitive Datatypes
!  string, double, decimal, boolean, date, time, ...
!  Derived Datatypes:
!  normalizedString, integer, nonPositiveInteger, ...
!  Derived from the primitive types
!  Example: integer is derived from decimal, normalizedString is derived
from string
!  User-defined Datatypes
!  Simple Types
!  Derived from built-in or other user-defined datatypes
!  Structured
!  Complex Types
!  Needed to define child elements and/or attributes of an element
Creating an XML Schema Document
!  XML Schema enables authors to specify what
specific type of data (e.g. numeric, text) an element
can contain
!  XML Schemas are XML documents themselves, so
the same parser can be used for both schemas and
instance documents
!  A document is schema valid if it conforms to a
schema document, and schema invalid if it does
not
book.xml
A schema-valid document describing a list of books
The books element has the deitel prefix, indicating that the books element
is part of the namespace ‘http://www.deitel.com/booklist’
book.xsd
-  Creating the XML Schema document: defining the ‘vocabulary’ for writing XML documents about
collections of books
-  The schema defines the elements, attributes and parent/child relationships that such a
document can (or must) include.
-  It also specifies the type of data that these elements and attributes may contain
book.xsd
Root element: binding the namespace prefix deitel and defining the target
namespace
book.xsd and book.xml
Connecting the XML document with the schema that defines its
structure. When an XML Schema validator examines book.xml and
book.xsd, it will recognize that book.xml uses elements and
attributes from the ‘http://www.deitel.com/booklist’ namespace
book.xsd
Defining an element called ‘books’ of type ‘BooksType’
Definition of ‘BooksType’: a complexType is used to define a
child/parent relation (not possible with simpleType)
Types
!  Every element in an XML Schema has a type
!  Types include the built-in types provided by XML Schema
and user-defined types, such as SingleBookType
!  Every simple type defines a restriction on another
type (either built-in or user-defined). Restrictions limit the
possible values that an element can hold
!  Complex types may have
   !  Simple content: can contain attributes and must extend or restrict some
   other existing type
   !  Complex content: can contain attributes and child elements
Creating a simpleType
<simpleType name = "gigahertz">
   <restriction base = "decimal">
      <minInclusive value = "2.1"/>
   </restriction>
</simpleType>
A simpleType is a restriction of another type, called its base type. In
this case, the base type is decimal, restricted to values of at least
2.1 by the minInclusive element
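The effect of the gigahertz restriction can be approximated in Python as follows. This is only a sketch of the rule "a decimal of at least 2.1", not a schema validator: the function name is_valid_gigahertz is an assumption, and Python's Decimal accepts a few notations (such as exponents) that the XML Schema decimal type does not.

```python
from decimal import Decimal, InvalidOperation

# Lower bound taken from <minInclusive value = "2.1"/>.
MIN_INCLUSIVE = Decimal("2.1")


def is_valid_gigahertz(text):
    """Return True if text is a finite decimal number >= 2.1."""
    try:
        value = Decimal(text.strip())
    except InvalidOperation:
        return False  # not a number at all
    # is_finite() rejects NaN and infinities before the comparison.
    return value.is_finite() and value >= MIN_INCLUSIVE


for candidate in ("2.1", "2.4", "2.09", "fast"):
    print(candidate, "->", is_valid_gigahertz(candidate))
```

Because the bound is inclusive, the value 2.1 itself is accepted.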
Creating a complexType with simpleContent
<complexType name = "CPU">
<simpleContent>
<extension base = "string">
<attribute name = "model" type = "string"/>
</extension>
</simpleContent>
</complexType>
A complexType with simple content can have attributes but not child
elements.
It must also extend or restrict some XML Schema type or user-defined
type.
The extension element with attribute base sets the base type to string. In
this example the string type is extended with the attribute model
Creating a complexType with complex content
<complexType name = "portable">
<all>
<element name = "processor" type = "computer:CPU"/>
<element name = "monitor" type = "int"/>
<element name = "CPUSpeed" type = "computer:gigahertz"/>
<element name = "RAM" type = "int"/>
</all>
<attribute name = "manufacturer" type = "string"/>
</complexType>
A complexType with complex content is allowed to have both attributes
and child elements.
The element all encloses elements that must each be included once in the
corresponding XML instance document, in any order.
When using the types CPU and gigahertz we must include the prefix computer
because these user-defined types are part of the computer namespace
xmlns:computer = "http://www.deitel.com/computer"
targetNamespace = "http://www.deitel.com/computer">
<element name = "laptop" type = "computer:portable"/>
The element declaration above declares the actual element that uses the three types defined in the
schema.
The element is called laptop and is of type portable.
We have now created an element named laptop that contains the child elements
processor, monitor, CPUSpeed and RAM and the attribute manufacturer
laptop.xml: an XML file using the laptop.xsd schema defined above
<?xml version = "1.0"?>
<!-- Laptop components marked up as XML -->
<computer:laptop xmlns:computer = "http://www.deitel.com/computer"
manufacturer = "IBM">
<processor model = "Centrino">Intel</processor>
<monitor>17</monitor>
<CPUSpeed>2.4</CPUSpeed>
<RAM>256</RAM>
</computer:laptop>
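As a rough illustration of what the all group in the portable type demands, the sketch below checks laptop.xml by hand with Python's standard xml.etree.ElementTree: the root must be laptop in the computer namespace, and each of the four children must appear exactly once, in any order. The function check_laptop and the constant EXPECTED_CHILDREN are assumptions made for this example; a real XML Schema validator would derive these rules from laptop.xsd and would also check the datatypes.

```python
import xml.etree.ElementTree as ET
from collections import Counter

LAPTOP_XML = """<?xml version = "1.0"?>
<computer:laptop xmlns:computer = "http://www.deitel.com/computer"
   manufacturer = "IBM">
   <processor model = "Centrino">Intel</processor>
   <monitor>17</monitor>
   <CPUSpeed>2.4</CPUSpeed>
   <RAM>256</RAM>
</computer:laptop>"""

# Children listed inside <all> in the portable complexType.
EXPECTED_CHILDREN = {"processor", "monitor", "CPUSpeed", "RAM"}


def check_laptop(xml_text):
    """Return a list of problems; an empty list means the checks passed."""
    root = ET.fromstring(xml_text)
    problems = []
    if root.tag != "{http://www.deitel.com/computer}laptop":
        problems.append("unexpected root element " + root.tag)
    counts = Counter(child.tag for child in root)
    for name in sorted(EXPECTED_CHILDREN):
        if counts[name] != 1:
            problems.append("%s must appear exactly once" % name)
    for extra in sorted(set(counts) - EXPECTED_CHILDREN):
        problems.append("unexpected element " + extra)
    return problems


print(check_laptop(LAPTOP_XML))  # empty list when all checks pass
```

Child order is deliberately not checked, because all allows the elements in any order.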
References
!  Harvey M. Deitel and Paul J. Deitel, Internet & World
Wide Web: How to Program, Pearson International
Edition
!  http://www.w3.org/
!  www.deitel.com/books/iw3htp4 (for the example code of
the exercises)