Translation Note
The Interlingua version of this translation is available at:
http://www.nautilus.com.br/~ensjo/ia/w3.org/TR/2000/REC-xml-20001006.
Other Interlingua translations can be found at http://www.nautilus.com.br/~ensjo/ia/w3.org.
Translator: Emerson José Silveira da Costa <ensjo@nautilus.com.br>.
The Interlingua version may contain errors. The English version of this specification is the only normative version. It is available at: http://www.w3.org/TR/2000/REC-xml-20001006.
Latest version: http://www.w3.org/TR/REC-xml.

W3C

Extensible Markup Language (XML) 1.0 (Second Edition)

W3C Recommendation 6 October 2000

This version:
http://www.w3.org/TR/2000/REC-xml-20001006 (XHTML, XML, PDF, XHTML review version with color-coded revision indicators)
Latest version:
http://www.w3.org/TR/REC-xml
Previous versions:
http://www.w3.org/TR/2000/WD-xml-2e-20000814
http://www.w3.org/TR/1998/REC-xml-19980210
Editors:
Tim Bray, Textuality and Netscape <tbray@textuality.com>
Jean Paoli, Microsoft <jeanpa@microsoft.com>
C. M. Sperberg-McQueen, University of Illinois at Chicago and Text Encoding Initiative <cmsmcq@uic.edu>
Eve Maler, Sun Microsystems, Inc. <eve.maler@east.sun.com> - Second Edition

Abstract

The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.

Status of this Document

This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.

This document specifies a syntax created by subsetting an existing, widely used international text processing standard (Standard Generalized Markup Language, ISO 8879:1986(E) as amended and corrected) for use on the World Wide Web. It is a product of the W3C XML Activity, details of which can be found at http://www.w3.org/XML. The English version of this specification is the only normative version. However, for translations of this document, see http://www.w3.org/XML/#trans. A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.

This second edition is not a new version of XML (first published 10 February 1998); it merely incorporates the changes dictated by the first-edition errata (available at http://www.w3.org/XML/xml-19980210-errata) as a convenience to readers. The errata list for this second edition is available at http://www.w3.org/XML/xml-V10-2e-errata.

Please report errors in this document to xml-editor@w3.org; archives are available.

Note:

C. M. Sperberg-McQueen's affiliation has changed since the publication of the first edition. He is now at the World Wide Web Consortium, and can be contacted at cmsmcq@w3.org.

Table of Contents

1 Introduction
    1.1 Origin and Goals
    1.2 Terminology
2 Documents
    2.1 Well-Formed XML Documents
    2.2 Characters
    2.3 Common Syntactic Constructs
    2.4 Character Data and Markup
    2.5 Comments
    2.6 Processing Instructions
    2.7 CDATA Sections
    2.8 Prolog and Document Type Declaration
    2.9 Standalone Document Declaration
    2.10 White Space Handling
    2.11 End-of-Line Handling
    2.12 Language Identification
3 Logical Structures
    3.1 Start-Tags, End-Tags, and Empty-Element Tags
    3.2 Element Type Declarations
        3.2.1 Element Content
        3.2.2 Mixed Content
    3.3 Attribute-List Declarations
        3.3.1 Attribute Types
        3.3.2 Attribute Defaults
        3.3.3 Attribute-Value Normalization
    3.4 Conditional Sections
4 Physical Structures
    4.1 Character and Entity References
    4.2 Entity Declarations
        4.2.1 Internal Entities
        4.2.2 External Entities
    4.3 Parsed Entities
        4.3.1 The Text Declaration
        4.3.2 Well-Formed Parsed Entities
        4.3.3 Character Encoding in Entities
    4.4 XML Processor Treatment of Entities and References
        4.4.1 Not Recognized
        4.4.2 Included
        4.4.3 Included If Validating
        4.4.4 Forbidden
        4.4.5 Included in Literal
        4.4.6 Notify
        4.4.7 Bypassed
        4.4.8 Included as PE
    4.5 Construction of Internal Entity Replacement Text
    4.6 Predefined Entities
    4.7 Notation Declarations
    4.8 Document Entity
5 Conformance
    5.1 Validating and Non-Validating Processors
    5.2 Using XML Processors
6 Notation

Appendices

A References
    A.1 Normative References
    A.2 Other References
B Character Classes
C XML and SGML (Non-Normative)
D Expansion of Entity and Character References (Non-Normative)
E Deterministic Content Models (Non-Normative)
F Autodetection of Character Encodings (Non-Normative)
    F.1 Detection Without External Encoding Information
    F.2 Priorities in the Presence of External Encoding Information
G W3C XML Working Group (Non-Normative)
H W3C XML Core Group (Non-Normative)
I Production Notes (Non-Normative)


1 Introduction

The Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language [ISO 8879]. By construction, XML documents are conforming SGML documents.

XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.

[Definition: A software module called an XML processor is used to read XML documents and provide access to their content and structure.] [Definition: It is assumed that an XML processor is doing its work on behalf of another module, called the application.] This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.

1.1 Origin and Goals

XML was developed by an XML Working Group (originally known as the SGML Editorial Review Board) formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It was chaired by Jon Bosak of Sun Microsystems with the active participation of an XML Special Interest Group (previously known as the SGML Working Group) also organized by the W3C. The membership of the XML Working Group is given in an appendix. Dan Connolly served as the WG's contact with the W3C.

The design goals for XML are:

  1. XML shall be straightforwardly usable over the Internet.
  2. XML shall support a wide variety of applications.
  3. XML shall be compatible with SGML.
  4. It shall be easy to write programs which process XML documents.
  5. The number of optional features in XML is to be kept to the absolute minimum, ideally zero.
  6. XML documents should be human-legible and reasonably clear.
  7. The XML design should be prepared quickly.
  8. The design of XML shall be formal and concise.
  9. XML documents shall be easy to create.
  10. Terseness in XML markup is of minimal importance.

This specification, together with associated standards (Unicode and ISO/IEC 10646 for characters, Internet RFC 1766 for language identification tags, ISO 639 for language name codes, and ISO 3166 for country name codes), provides all the information necessary to understand XML Version 1.0 and construct computer programs to process it.

This version of the XML specification may be distributed freely, as long as all text and legal notices remain intact.

1.2 Terminology

The terminology used to describe XML documents is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of an XML processor:

may
[Definition: Conforming documents and XML processors are permitted to but need not behave as described.]
must
[Definition: Conforming documents and XML processors are required to behave as described; failure to do so is an error.]
error
[Definition: A violation of the rules of this specification; results are undefined. Conforming software may detect and report an error and may recover from it.]
fatal error
[Definition: An error which a conforming XML processor must detect and report to the application. After encountering a fatal error, the processor may continue processing the data to search for further errors and may report such errors to the application. In order to support correction of errors, the processor may make unprocessed data from the document (with intermingled character data and markup) available to the application. Once a fatal error is detected, however, the processor must not continue normal processing (i.e., it must not continue to pass character data and information about the document's logical structure to the application in the normal way).]
at user option
[Definition: Conforming software may or must (depending on the modal verb in the sentence) behave as described; if it does, it must provide users a means to enable or disable the behavior described.]
validity constraint
[Definition: A rule which applies to all valid XML documents. Violations of validity constraints are errors; they must, at user option, be reported by validating XML processors.]
well-formedness constraint
[Definition: A rule which applies to all well-formed XML documents. Violations of well-formedness constraints are fatal errors.]
match
[Definition: (Of strings or names:) Two strings or names being compared must be identical. Characters with multiple possible representations in ISO/IEC 10646 (e.g. characters with both precomposed and base+diacritic forms) match only if they have the same representation in both strings. No case folding is performed. (Of strings and rules in the grammar:) A string matches a grammatical production if it belongs to the language generated by that production. (Of content and content models:) An element matches its declaration when it conforms in the fashion described in the constraint [VC: Element Valid].]
for compatibility
[Definition: Marks a sentence describing a feature of XML included solely to ensure that XML remains compatible with SGML.]
for interoperability
[Definition: Marks a sentence describing a non-binding recommendation included to increase the chances that XML documents can be processed by the existing installed base of SGML processors which predate the WebSGML Adaptations Annex to ISO 8879.]
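The definition of match compares strings exactly: no case folding and no Unicode normalization. The minimal Python sketch below illustrates only that rule; `names_match` is a hypothetical helper, and `unicodedata.normalize` from the standard library is used solely to show the step that match deliberately does not perform.

```python
import unicodedata

def names_match(a: str, b: str) -> bool:
    """Hypothetical helper: names 'match' only if their code points are identical."""
    # No case folding and no normalization happen before the comparison.
    return a == b

precomposed = "\u00e9"   # LATIN SMALL LETTER E WITH ACUTE
decomposed = "e\u0301"   # 'e' followed by COMBINING ACUTE ACCENT

print(names_match("salutation", "salutation"))  # True
print(names_match("Salutation", "salutation"))  # False: case differs
print(names_match(precomposed, decomposed))     # False: different representations
# NFC normalization would make them equal, but match does not normalize.
print(unicodedata.normalize("NFC", decomposed) == precomposed)  # True
```

A processor that normalized names before comparing them would accept documents that this specification treats as mismatched, which is why the sketch is a plain equality test.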

2 Documents

[Definition: A data object is an XML document if it is well-formed, as defined in this specification. A well-formed XML document may in addition be valid if it meets certain further constraints.]

Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup. The logical and physical structures must nest properly, as described in 4.3.2 Well-Formed Parsed Entities.

2.1 Well-Formed XML Documents

[Definition: A textual object is a well-formed XML document if:]

  1. Taken as a whole, it matches the production labeled document.
  2. It meets all the well-formedness constraints given in this specification.
  3. Each of the parsed entities which is referenced directly or indirectly within the document is well-formed.
Document
[1]    document    ::=    prolog element Misc*

Matching the document production implies that:

  1. It contains one or more elements.
  2. [Definition: There is exactly one element, called the root, or document element, no part of which appears in the content of any other element.] For all other elements, if the start-tag is in the content of another element, the end-tag is in the content of the same element. More simply stated, the elements, delimited by start- and end-tags, nest properly within each other.

[Definition: As a consequence of this, for each non-root element C in the document, there is one other element P in the document such that C is in the content of P, but is not in the content of any other element that is in the content of P. P is referred to as the parent of C, and C as a child of P.]
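These conditions can be checked mechanically. A minimal sketch, assuming Python's standard `xml.etree.ElementTree` module, which wraps the non-validating expat parser: `is_well_formed` treats a raised `ParseError` as the fatal error described in 1.2, and `parent_of` (a hypothetical helper) builds the parent/child relation defined above.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text: str) -> bool:
    """True if ElementTree (expat) parses `text` without a fatal error."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        # Well-formedness violations are fatal errors: normal processing stops.
        return False
    return True

def parent_of(root: ET.Element) -> dict:
    """Hypothetical helper: map each non-root element C to its parent P."""
    return {child: parent for parent in root.iter() for child in parent}

print(is_well_formed("<a><b></b></a>"))  # True
print(is_well_formed("<a><b></a></b>"))  # False: improper nesting
print(is_well_formed("<a/><b/>"))        # False: more than one root element

root = ET.fromstring("<salutation><a/><b><c/></b></salutation>")
parents = parent_of(root)
print(parents[root.find("b/c")].tag)     # b
print(root in parents)                   # False: the root has no parent
```

Because expat does not validate, this only covers well-formedness; validity would need a validating processor and a DTD.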

2.2 Characters

[Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.] [Definition: A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646] (see also [ISO/IEC 10646-2000]). Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. Consequently, XML processors must accept any character in the range specified for Char. The use of "compatibility characters", as defined in section 6.8 of [Unicode] (see also D21 in section 3.6 of [Unicode3]), is discouraged.]

Character Range
[2]    Char    ::=    #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF] /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
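Production [2] translates directly into a range test on code points. The following Python sketch is illustrative only (`is_xml_char` and `find_illegal_chars` are hypothetical names); it relies on Python strings being sequences of code points, so a character outside the BMP counts as a single character.

```python
def is_xml_char(c: str) -> bool:
    """True if the single character `c` matches production [2] Char."""
    cp = ord(c)
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD       # excludes the surrogate block D800-DFFF
        or 0x10000 <= cp <= 0x10FFFF    # excludes FFFE and FFFF by stopping at FFFD above
    )

def find_illegal_chars(text: str) -> list:
    """Return (index, hex code point) pairs for characters that are not Char."""
    return [(i, hex(ord(c))) for i, c in enumerate(text) if not is_xml_char(c)]

print(is_xml_char("\t"))                    # True
print(is_xml_char("\x00"))                  # False: NUL is not a legal character
print(is_xml_char("\uFFFE"))                # False
print(is_xml_char("\U0001F600"))            # True: in [#x10000-#x10FFFF]
print(find_illegal_chars("a\x01b\ud800"))   # [(1, '0x1'), (3, '0xd800')]
```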

The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings of 10646; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in 4.3.3 Character Encoding in Entities.
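As a rough illustration, assuming Python's `xml.etree.ElementTree` (backed by expat), the same document serialized as UTF-8 and as UTF-16 yields the same character data. Python's `"utf-16"` codec prepends a byte order mark, which the parser uses to detect the encoding; the detection rules themselves are those of 4.3.3 and Appendix F, not something this sketch defines.

```python
import xml.etree.ElementTree as ET

template = '<?xml version="1.0" encoding="{enc}"?><salutation>Bon die, café!</salutation>'

# The same document in the two encodings every XML processor must accept.
utf8_bytes = template.format(enc="UTF-8").encode("utf-8")
utf16_bytes = template.format(enc="UTF-16").encode("utf-16")  # starts with a byte order mark

for data in (utf8_bytes, utf16_bytes):
    root = ET.fromstring(data)   # bytes in, decoded characters out
    print(len(data), root.text)  # different byte lengths, identical text
```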

2.3 Common Syntactic Constructs

This section defines some symbols used widely in the grammar.

S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.

White Space
[3]    S    ::=    (#x20 | #x9 | #xD | #xA)+

Characters are classified for convenience as letters, digits, or other characters. A letter consists of an alphabetic or syllabic base character or an ideographic character. Full definitions of the specific characters in each class are given in B Character Classes.

[Definition: A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters.] Names beginning with the string "xml", or any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.

Note:

The Namespaces in XML Recommendation [XML Names] assigns a meaning to names containing colon characters (:). Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.

An Nmtoken (name token) is any mixture of name characters.

Names and Tokens
[4]    NameChar    ::=    Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
[5]    Name    ::=    (Letter | '_' | ':') (NameChar)*
[6]    Names    ::=    Name (S Name)*
[7]    Nmtoken    ::=    (NameChar)+
[8]    Nmtokens    ::=    Nmtoken (S Nmtoken)*
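The productions for S, Name and Nmtoken can be sketched in Python as below. This is an APPROXIMATION: the exact Letter, Digit, CombiningChar and Extender classes are the tables of Appendix B, which the sketch does not reproduce; it substitutes `str.isalpha`, `str.isdigit`, the Unicode mark categories (Mn, Mc, Me) and U+00B7. All function names are hypothetical.

```python
import unicodedata

S_CHARS = {"\x20", "\x09", "\x0D", "\x0A"}

def is_s(text: str) -> bool:
    """Production [3]: one or more of space, tab, CR, LF and nothing else."""
    return len(text) > 0 and all(c in S_CHARS for c in text)

def _is_name_char(c: str) -> bool:
    # APPROXIMATION of NameChar [4]: not the exact Appendix B tables.
    return (
        c.isalpha() or c.isdigit() or c in ".-_:"
        or unicodedata.category(c) in ("Mn", "Mc", "Me")
        or c == "\u00b7"
    )

def is_name(text: str) -> bool:
    """Approximation of Name [5]: (Letter | '_' | ':') (NameChar)*."""
    return (len(text) > 0 and (text[0].isalpha() or text[0] in "_:")
            and all(_is_name_char(c) for c in text[1:]))

def is_nmtoken(text: str) -> bool:
    """Approximation of Nmtoken [7]: (NameChar)+."""
    return len(text) > 0 and all(_is_name_char(c) for c in text)

def is_reserved_name(name: str) -> bool:
    """Names beginning with 'xml' in any combination of case are reserved."""
    return name[:3].lower() == "xml"

print(is_name("xml:lang"), is_reserved_name("xml:lang"))  # True True
print(is_name("1abc"), is_nmtoken("1abc"))                # False True
```

Note that `is_s` rejects characters such as U+00A0 that Python's `str.isspace` accepts: XML white space is limited to the four characters of production [3].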

Literal data is any quoted string not containing the quotation mark used as a delimiter for that string. Literals are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without scanning for markup.

Literals
[9]    EntityValue    ::=    '"' ([^%&"] | PEReference | Reference)* '"'
|  "'" ([^%&'] | PEReference | Reference)* "'"
[10]    AttValue    ::=    '"' ([^<&"] | Reference)* '"'
|  "'" ([^<&'] | Reference)* "'"
[11]    SystemLiteral    ::=    ('"' [^"]* '"') | ("'" [^']* "'")
[12]    PubidLiteral    ::=    '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
[13]    PubidChar    ::=    #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
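Two consequences of these productions are easy to show in code: a SystemLiteral has no escape mechanism, so a writer must pick a delimiter the value does not contain, and a PubidLiteral is limited to PubidChar (minus the apostrophe when the apostrophe is the delimiter). A minimal Python sketch; `quote_system_literal` and `is_pubid_literal` are hypothetical names.

```python
import string

PUBID_CHARS = set(" \r\n" + string.ascii_letters + string.digits + "-'()+,./:=?;!*#@$_%")

def quote_system_literal(value: str) -> str:
    """Delimit a SystemLiteral [11] with a quote character the value lacks."""
    if '"' not in value:
        return f'"{value}"'
    if "'" not in value:
        return f"'{value}'"
    # No escaping is possible inside a SystemLiteral.
    raise ValueError("value contains both quote characters")

def is_pubid_literal(literal: str) -> bool:
    """Check production [12] PubidLiteral, delimiters included."""
    if len(literal) < 2 or literal[0] not in "\"'" or literal[-1] != literal[0]:
        return False
    allowed = PUBID_CHARS - {"'"} if literal[0] == "'" else PUBID_CHARS
    return all(c in allowed for c in literal[1:-1])

print(quote_system_literal("bondie.dtd"))                     # "bondie.dtd"
print(quote_system_literal('say "hi"'))                       # 'say "hi"'
print(is_pubid_literal('"-//W3C//DTD XHTML 1.0 Strict//EN"'))  # True
print(is_pubid_literal('"a&b"'))                              # False: '&' is not PubidChar
```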

Note:

Although the EntityValue production allows the definition of a general entity consisting of a single explicit < in the literal (e.g., <!ENTITY minorque "<">), it is strongly advised to avoid this practice, since any reference to that entity will cause a well-formedness error.

2.4 Character Data and Markup

Text consists of intermingled character data and markup. [Definition: Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, processing instructions, XML declarations, text declarations, and any white space that is at the top level of the document entity (that is, outside the document element and not inside any other markup).]

[Definition: All text that is not markup constitutes the character data of the document.]

The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.

In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup. In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>".

To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".

Character Data
[14]    CharData    ::=    [^<&]* - ([^<&]* ']]>' [^<&]*)
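The escaping rules of this section can be sketched as two small Python functions (hypothetical names, not part of any standard API). `escape_chardata` escapes only what the rules require, and `escape_attvalue` additionally escapes the chosen delimiter. One caveat, labeled as an assumption of the sketch: literal tabs or line breaks in attribute values are not protected here and would still be altered by attribute-value normalization (3.3.3). `xml.etree.ElementTree` is used only to confirm that a parser recovers the original text.

```python
import xml.etree.ElementTree as ET

def escape_chardata(text: str) -> str:
    """Escape text for element content following section 2.4."""
    # '&' goes first so that the '&' of the references added next is not re-escaped.
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    # '>' only has to be masked where it would complete the string ']]>'.
    return text.replace("]]>", "]]&gt;")

def escape_attvalue(value: str, quote: str = '"') -> str:
    """Return an AttValue [10] delimited by `quote` (either '"' or "'")."""
    value = value.replace("&", "&amp;").replace("<", "&lt;")
    value = value.replace(quote, "&quot;" if quote == '"' else "&apos;")
    return quote + value + quote

s = "a < b && c ]]> d"
print(escape_chardata(s))  # a &lt; b &amp;&amp; c ]]&gt; d
root = ET.fromstring("<x>" + escape_chardata(s) + "</x>")
print(root.text == s)      # True: the parser restores the original text
```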

2.5 Comments

[Definition: Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double-hyphen) must not occur within comments.] Parameter entity references are not recognized within comments.

Comments
[15]    Comment    ::=    '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'

An example of a comment:

<!-- declarations for <head> & <body> -->

Note that the grammar does not allow a comment ending in --->. The following example is not well-formed.

<!-- B+, B, or B--->
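Production [15] forbids "--" inside a comment and, as the last example shows, a "-" immediately before the closing "-->". A writer can enforce both before emitting a comment; the Python sketch below uses a hypothetical `make_comment` helper and does not check that every character is a legal Char.

```python
def make_comment(text: str) -> str:
    """Build a Comment [15], rejecting content the grammar does not allow."""
    if "--" in text:
        raise ValueError("'--' must not occur within a comment")
    if text.endswith("-"):
        # A trailing '-' followed by '-->' would produce the forbidden '--->'.
        raise ValueError("comment text must not end with '-'")
    return "<!--" + text + "-->"

print(make_comment(" declarations for <head> & <body> "))
# <!-- declarations for <head> & <body> -->
try:
    make_comment(" B+, B, or B-")
except ValueError as err:
    print("rejected:", err)
```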

2.6 Processing Instructions

[Definition: Processing instructions (PIs) allow documents to contain instructions for applications.]

Processing Instructions
[16]    PI    ::=    '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
[17]    PITarget    ::=    Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))

PIs are not part of the document's character data, but must be passed through to the application. The PI begins with a target (PITarget) used to identify the application to which the instruction is directed. The target names "XML", "xml", and so on are reserved for standardization in this or future versions of this specification. The XML Notation mechanism may be used for formal declaration of PI targets. Parameter entity references are not recognized within processing instructions.
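Productions [16] and [17] give two checks a PI writer can apply: the target must not be exactly "xml" in any case combination, and the data must not contain "?>". A Python sketch with a hypothetical `make_pi` helper, assuming the target has already been checked as a Name. The example target `xml-stylesheet` (from the separate W3C Recommendation on associating style sheets with XML documents) shows that longer names starting with "xml" are excluded by the reservation convention, not by production [17].

```python
def make_pi(target: str, data: str = "") -> str:
    """Build a PI [16]; `target` is assumed to already be a valid Name."""
    # Production [17] excludes the target 'xml' in any combination of case.
    if target.lower() == "xml":
        raise ValueError("'xml' (in any case) is not a legal PITarget")
    # The instruction data must not contain the closing delimiter.
    if "?>" in data:
        raise ValueError("PI data must not contain '?>'")
    return f"<?{target} {data}?>" if data else f"<?{target}?>"

print(make_pi("xml-stylesheet", 'href="style.css" type="text/css"'))
# <?xml-stylesheet href="style.css" type="text/css"?>
```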

2.7 CDATA Sections

[Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":]

CDATA Sections
[18]    CDSect    ::=    CDStart CData CDEnd
[19]    CDStart    ::=    '<![CDATA['
[20]    CData    ::=    (Char* - (Char* ']]>' Char*))
[21]    CDEnd    ::=    ']]>'

Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.

An example of a CDATA section, in which "<salutation>" and "</salutation>" are recognized as character data, not markup:

<![CDATA[<salutation>Bon die!</salutation>]]>
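Since CDATA sections cannot nest and CData cannot contain "]]>", text that does contain "]]>" has to be split across adjacent sections. A minimal Python sketch of that technique (`wrap_cdata` is a hypothetical name); `xml.etree.ElementTree` is used only to check that a parser joins the pieces back into the original text.

```python
import xml.etree.ElementTree as ET

def wrap_cdata(text: str) -> str:
    """Wrap `text` in CDATA sections, splitting every ']]>' across two sections."""
    # ']]' ends the first section and '>' starts the next, so no section holds ']]>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"

print(wrap_cdata("<salutation>Bon die!</salutation>"))
# <![CDATA[<salutation>Bon die!</salutation>]]>

tricky = "a ]]> b"
root = ET.fromstring("<x>" + wrap_cdata(tricky) + "</x>")
print(root.text == tricky)  # True
```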

2.8 Prolog and Document Type Declaration

[Definition: XML documents should begin with an XML declaration which specifies the version of XML being used.] For example, the following is a complete XML document, well-formed but not valid:

<?xml version="1.0"?> <salutation>Bon die!</salutation>

and so is this:

<salutation>Bon die!</salutation>

The version number "1.0" should be used to indicate conformance to this version of this specification; it is an error for a document to use the value "1.0" if it does not conform to this version of this specification. It is the intent of the XML Working Group to give later versions of this specification numbers other than "1.0", but this intent does not indicate a commitment to produce any future versions of XML, nor, if any are produced, to use any particular numbering scheme. Since future versions are not ruled out, this construct is provided as a means to allow the possibility of automatic version recognition, should it become necessary. Processors may signal an error if they receive documents labeled with versions they do not support.

The function of the markup in an XML document is to describe its storage and logical structure and to associate attribute-value pairs with its logical structures. XML provides a mechanism, the document type declaration, to define constraints on the logical structure and to support the use of predefined storage units. [Definition: An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.]

The document type declaration must appear before the first element in the document.

Prolog
[22]    prolog    ::=    XMLDecl? Misc* (doctypedecl Misc*)?
[23]    XMLDecl    ::=    '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
[24]    VersionInfo    ::=    S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
[25]    Eq    ::=    S? '=' S?
[26]    VersionNum    ::=    ([a-zA-Z0-9_.:] | '-')+
[27]    Misc    ::=    Comment | PI | S
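Productions [23] to [26], together with SDDecl [32] in 2.9, fix both the syntax and the order of the XML declaration's pseudo-attributes. The Python sketch below (hypothetical names `XML_DECL` and `parse_xml_decl`) expresses that as one regular expression. As an assumption of the sketch, the encoding name follows the EncName production [81] of 4.3.3, which is not reproduced in this section; Python's `\s` is avoided because it matches more than XML's S.

```python
import re

_S = r"[ \t\r\n]+"              # production [3] S (XML white space only)
_EQ = r"[ \t\r\n]*=[ \t\r\n]*"  # production [25] Eq

# Pseudo-attributes must appear in this order: version, encoding, standalone.
XML_DECL = re.compile(
    r"<\?xml"
    + _S + "version" + _EQ + r"(['\"])(?P<version>[a-zA-Z0-9_.:-]+)\1"
    + "(?:" + _S + "encoding" + _EQ
    + r"(['\"])(?P<encoding>[A-Za-z][A-Za-z0-9._-]*)\3)?"
    + "(?:" + _S + "standalone" + _EQ + r"(['\"])(?P<standalone>yes|no)\5)?"
    + r"[ \t\r\n]*\?>"
)

def parse_xml_decl(doc: str):
    """Return the XMLDecl pseudo-attributes, or None if `doc` does not start with one."""
    m = XML_DECL.match(doc)  # match(): the declaration must be at the very start
    if m is None:
        return None
    return {k: m.group(k) for k in ("version", "encoding", "standalone")}

print(parse_xml_decl("<?xml version=\"1.0\" standalone='yes'?>"))
# {'version': '1.0', 'encoding': None, 'standalone': 'yes'}
print(parse_xml_decl("<salutation>Bon die!</salutation>"))  # None
```

The backreferences `\1`, `\3` and `\5` require each value to close with the same quote character that opened it, as the grammar does.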

[Definition: The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as a document type definition, or DTD. The document type declaration can point to an external subset (a special kind of external entity) containing markup declarations, or can contain the markup declarations directly in an internal subset, or can do both. The DTD for a document consists of both subsets taken together.]

[Definition: A markup declaration is an element type declaration, an attribute-list declaration, an entity declaration, or a notation declaration.] These declarations may be contained in whole or in part within parameter entities, as described in the well-formedness and validity constraints below. For further information, see 4 Physical Structures.

Document Type Definition
[28]    doctypedecl    ::=    '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | DeclSep)* ']' S?)? '>'    [VC: Root Element Type] [WFC: External Subset]
[28a]    DeclSep    ::=    PEReference | S    [WFC: PE Between Declarations]
[29]    markupdecl    ::=    elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment    [VC: Proper Declaration/PE Nesting] [WFC: PEs in Internal Subset]

Note that it is possible to construct a well-formed document containing a doctypedecl that neither points to an external subset nor contains an internal subset.

The markup declarations may be made up in whole or in part of the replacement text of parameter entities. The productions later in this specification for individual nonterminals (elementdecl, AttlistDecl, and so on) describe the declarations after all the parameter entities have been included.

Parameter-entity references are recognized anywhere in the DTD (internal and external subsets and external parameter entities), except in literals, processing instructions, comments, and the contents of ignored conditional sections (see 3.4 Conditional Sections). They are also recognized in entity value literals. The use of parameter entities in the internal subset is restricted as described below.

Validity constraint: Root Element Type

The Name in the document type declaration must match the element type of the root element.

Validity constraint: Proper Declaration/PE Nesting

Parameter-entity replacement text must be properly nested with markup declarations. That is to say, if either the first character or the last character of a markup declaration (markupdecl above) is contained in the replacement text for a parameter-entity reference, both must be contained in the same replacement text.

Well-formedness constraint: PEs in Internal Subset

In the internal DTD subset, parameter-entity references can occur only where markup declarations can occur, not within markup declarations. (This does not apply to references that occur in external parameter entities or to the external subset.)

Well-formedness constraint: External Subset

The external subset, if any, must match the production for extSubset.

Well-formedness constraint: PE Between Declarations

The replacement text of a parameter entity reference in a DeclSep must match the production extSubsetDecl.

Like the internal subset, the external subset and any external parameter entities referenced in a DeclSep must consist of a series of complete markup declarations of the types allowed by the non-terminal symbol markupdecl, interspersed with white space or parameter-entity references. However, portions of the contents of the external subset or of these external parameter entities may conditionally be ignored by using the conditional section construct; this is not allowed in the internal subset.

External Subset
[30]    extSubset    ::=    TextDecl? extSubsetDecl
[31]    extSubsetDecl    ::=    ( markupdecl | conditionalSect | DeclSep)*

The external subset and external parameter entities also differ from the internal subset in that in them, parameter-entity references are permitted within markup declarations, not only between markup declarations.

An example of an XML document with a document type declaration:

<?xml version="1.0"?> <!DOCTYPE salutation SYSTEM "bondie.dtd">
<salutation>Bon die!</salutation>

The system identifier "bondie.dtd" gives the address (a URI reference) of a DTD for the document.

The declarations can also be given locally, as in this example:

<?xml version="1.0" encoding="UTF-8" ?>
<!DOCTYPE salutation [
  <!ELEMENT salutation (#PCDATA)>
]>
<salutation>Bon die!</salutation>

If both the external and internal subsets are used, the internal subset is considered to occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.

2.9 Standalone Document Declaration

Markup declarations can affect the content of the document, as passed from an XML processor to an application; examples are attribute defaults and entity declarations. The standalone document declaration, which may appear as a component of the XML declaration, signals whether or not there are such declarations which appear external to the document entity or in parameter entities. [Definition: An external markup declaration is defined as a markup declaration occurring in the external subset or in a parameter entity (external or internal, the latter being included because non-validating processors are not required to read them).]
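To see how an entity declaration changes what reaches the application, consider the sketch below, assuming Python's `xml.etree.ElementTree` (expat). The entity name `nomine` is made up for the example. Because the declaration sits in the internal subset, inside the document entity and not in a parameter entity, it is not an external markup declaration, so `standalone="yes"` is consistent with it.

```python
import xml.etree.ElementTree as ET

# The entity is declared in the internal subset, i.e. inside the document entity.
doc = """<?xml version="1.0" standalone="yes"?>
<!DOCTYPE salutation [
  <!ENTITY nomine "mundo">
]>
<salutation>Bon die, &nomine;!</salutation>"""

root = ET.fromstring(doc)
# The application receives the replacement text, not the reference itself.
print(root.text)  # Bon die, mundo!
```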

Declaration de documento autonome
[32]    SDDecl    ::=    S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"')) [RV: Declaration de documento autonome]

In un declaration de documento autonome, le valor "yes" indica que il ha necun declarationes de marcation externe que affecta le information passate del processator XML al application. Le valor "no" indica que existe o pote exister declarationes de marcation externe. Nota que le declaration de documento autonome denota solmente le presentia de declarationes; le presentia, in un documento, de referentias a entitates externe, quando aquelle entitates es internemente declarate, non cambia su stato autonome.

If there are no external markup declarations, the standalone document declaration has no meaning. If there are external markup declarations but there is no standalone document declaration, the value "no" is assumed.
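The following Python sketch applies production [32] and the two rules above to the text of an XML declaration. It assumes the caller already knows whether any external markup declarations exist, which is passed in as the hypothetical boolean `has_external_markup_decls`. The regular expression only locates the standalone pseudo-attribute; it does not validate the rest of the XML declaration.

```python
import re

# Production [32]: S 'standalone' Eq, then 'yes' or 'no' in matching quotes.
# Eq is S? '=' S?, so white space around '=' is allowed.
SDDECL = re.compile(r"""\s+standalone\s*=\s*(?:'(yes|no)'|"(yes|no)")""")


def effective_standalone(xml_decl, has_external_markup_decls):
    """Return "yes", "no", or None when the declaration carries no meaning."""
    if not has_external_markup_decls:
        return None  # no external markup declarations: SDDecl has no meaning
    match = SDDECL.search(xml_decl)
    if match is None:
        return "no"  # external declarations but no SDDecl: "no" is assumed
    return match.group(1) or match.group(2)
```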

Any XML document for which standalone="no" holds can be converted algorithmically to a standalone document, which may be desirable for some network delivery applications.

Validity constraint: Standalone Document Declaration

The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:

  • attributes with default values, if elements to which these attributes apply appear in the document without specifications of values for these attributes, or
  • entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or
  • attributes with values subject to normalization, where the attribute appears in the document with a value which will change as a result of normalization, or
  • element types with element content, if white space occurs directly within any instance of those types.

An example XML declaration with a standalone document declaration:

<?xml version="1.0" standalone='yes'?>

2.10 White Space Handling

In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, white space that should be preserved in the delivered version is common, for example in poetry and source code.

An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content.

A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications. In valid documents, this attribute, like any other, must be declared if it is used. When declared, it must be given as an enumerated type whose values are one or both of "default" and "preserve". For example:

<!ATTLIST poema xml:space (default|preserve) 'preserve'>

<!-- -->
<!ATTLIST pre xml:space (preserve) #FIXED 'preserve'>

The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.

The root element of any document is considered to have signaled no intentions as regards application space handling, unless it provides a value for this attribute or the attribute is declared with a default value.
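How the declared intent propagates through a tree can be sketched in Python with the standard `xml.etree.ElementTree` module. That module reports the reserved `xml:` prefix in `{namespace-URI}local` form. It does not apply DTD attribute defaults here, so declared defaults are passed in through a hypothetical `declared_defaults` mapping from element type to default value. The sketch records the intent in force for each element in document order.

```python
import xml.etree.ElementTree as ET

# ElementTree reports the reserved xml: prefix in {namespace-URI}local form.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def space_intents(root, declared_defaults=None):
    """Return (tag, intent) pairs in document order.

    intent is "default", "preserve", or None (no intention signalled).
    declared_defaults maps an element type to the default value a DTD
    would supply for xml:space on that type (hypothetical input).
    """
    declared_defaults = declared_defaults or {}
    result = []

    def walk(elem, inherited):
        # An explicit value wins; a declared default acts as if present;
        # otherwise the parent's intent carries over.
        intent = elem.get(XML_SPACE, declared_defaults.get(elem.tag, inherited))
        result.append((elem.tag, intent))
        for child in elem:
            walk(child, intent)

    walk(root, None)  # the root inherits no intention
    return result
```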

2.11 End-of-Line Handling

XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters carriage-return (#xD) and line-feed (#xA).

To simplify the tasks of applications, the characters passed to an application by the XML processor must be as if the XML processor normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
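The translation can be sketched in Python as follows. The sketch assumes the entity has already been decoded into a Python string. The order of the two replacements matters: the pair #xD #xA is handled first so that it becomes one #xA rather than two.

```python
def normalize_line_ends(text):
    """Translate CR LF pairs and lone CRs to a single LF (#xA)."""
    # Replace the two-character sequence first so it yields one LF, not two;
    # any CR left afterwards is not followed by LF.
    return text.replace("\r\n", "\n").replace("\r", "\n")
```

For example, "\r\r\n" becomes two line feeds. The first #xD is not followed by #xA, and the second forms a pair with the #xA.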

2.12 Language Identification

In document processing, it is often useful to identify the natural or formal language in which the content is written. A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by [IETF RFC 1766], Tags for the Identification of Languages, or its successor on the IETF Standards Track.

Note:

[IETF RFC 1766] tags are constructed from two-letter language codes as defined by [ISO 639], from two-letter country codes as defined by [ISO 3166], or from language identifiers registered with the Internet Assigned Numbers Authority [IANA-LANGCODES]. It is expected that the successor to [IETF RFC 1766] will introduce three-letter language codes for languages not presently covered by [ISO 639].

(Productions 33 through 38 have been removed.)

For example:

<p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
<p xml:lang="en-GB">What colour is it?</p>
<p xml:lang="en-US">What color is it?</p>
<sp who="Faust" desc='leise' xml:lang="de">
  <l>Habe nun, ach! Philosophie,</l>
  <l>Juristerei, und Medizin</l>
  <l>und leider auch Theologie</l>
  <l>durchaus studiert mit heißem Bemüh'n.</l>
</sp>

The intent declared with xml:lang is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of xml:lang on another element within that content.
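Resolving the language in force for a given element can be sketched in Python with `xml.etree.ElementTree`. Because ElementTree elements have no parent pointers, the sketch builds a child-to-parent map and walks upward from the element. Declared defaults are again supplied through a hypothetical `declared_defaults` mapping, since the parser does not apply them here.

```python
import xml.etree.ElementTree as ET

XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(elem, parents, declared_defaults=None):
    """Return the xml:lang value in force for elem, or None if none applies."""
    declared_defaults = declared_defaults or {}
    node = elem
    while node is not None:
        if XML_LANG in node.attrib:
            return node.attrib[XML_LANG]        # explicit value wins
        if node.tag in declared_defaults:
            return declared_defaults[node.tag]  # DTD default acts as if present
        node = parents.get(node)                # otherwise inherit from parent
    return None


doc = ET.fromstring(
    '<doc xml:lang="en"><p>The quick brown fox</p>'
    '<sp xml:lang="de"><l>Habe nun, ach! Philosophie,</l></sp>'
    '<glossa>...</glossa></doc>'
)
# Child-to-parent map, since ElementTree elements do not know their parent.
parents = {child: parent for parent in doc.iter() for child in parent}
```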

A simple declaration for xml:lang might take the form

xml:lang NMTOKEN #IMPLIED

but specific default values may also be given, if appropriate. In a collection of French poems with glosses and notes in Interlingua, the xml:lang attribute might be declared this way:

<!ATTLIST poema  xml:lang NMTOKEN 'fr'>
<!ATTLIST glossa xml:lang NMTOKEN 'ia'>
<!ATTLIST nota   xml:lang NMTOKEN 'ia'>

3 Logical Structures

[Definition: Each XML document contains one or more elements, the boundaries of which are either delimited by start-tags and end-tags, or, for empty elements, by an empty-element tag. Each element has a type, identified by name, sometimes called its "generic identifier" (GI), and may have a set of attribute specifications.] Each attribute specification has a name and a value.

Element
[39]    element    ::=    EmptyElemTag
| STag content ETag [WFC: Element Type Match]
[VC: Element Valid]

This specification does not constrain the semantics, use, or (beyond syntax) names of the element types and attributes, except that names beginning with a match to (('X'|'x')('M'|'m')('L'|'l')) are reserved for standardization in this or future versions of this specification.
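The reserved-name pattern is a case-insensitive match on the first three characters. A minimal Python check is shown below; the name `is_reserved_name` is hypothetical.

```python
import re

# Names whose first three characters match (('X'|'x')('M'|'m')('L'|'l')).
RESERVED_PREFIX = re.compile(r"[Xx][Mm][Ll]")


def is_reserved_name(name):
    """True if an element-type or attribute name starts with a reserved 'xml' prefix."""
    return RESERVED_PREFIX.match(name) is not None
```

Under this rule "xml:lang" and "XMLData" are reserved, while "html" and "mxl" are not.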

Well-formedness constraint: Element Type Match

The Name in an element's end-tag must match the element type in the start-tag.

Validity constraint: Element Valid

An element is valid if there is a declaration matching elementdecl where the Name matches the element type, and one of the following holds:

  1. The declaration matches EMPTY and the element has no content.
  2. The declaration matches children and the sequence of child elements belongs to the language generated by the regular expression in the content model, with optional white space (characters matching the nonterminal S) between the start-tag and the first child element, between child elements, or between the last child element and the end-tag. Note that a CDATA section containing only white space does not match the nonterminal S, and hence cannot appear in these positions.
  3. The declaration matches Mixed and the content consists of character data and child elements whose types match names in the content model.
  4. The declaration matches ANY, and the types of any child elements have been declared.

3.1 Start-Tags, End-Tags, and Empty-Element Tags

[Definition: The beginning of every non-empty XML element is marked by a start-tag.]

Start-tag
[40]    STag    ::=    '<' Name (S Attribute)* S? '>' [WFC: Unique Att Spec]
[41]    Attribute    ::=    Name Eq AttValue [VC: Attribute Value Type]
[WFC: No External Entity References]
[WFC: No < in Attribute Values]

The Name in the start- and end-tags gives the element's type. [Definition: The Name-AttValue pairs are referred to as the attribute specifications of the element], [Definition: with the Name in each pair referred to as the attribute name] and [Definition: the content of the AttValue (the text between the ' or " delimiters) as the attribute value.] Note that the order of attribute specifications in a start-tag or empty-element tag is not significant.

Well-formedness constraint: Unique Att Spec

No attribute name may appear more than once in the same start-tag or empty-element tag.
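The following Python sketch splits a tag into its attribute specifications and reports any names that break the Unique Att Spec constraint. It uses regular expressions and assumes the input is a single start-tag or empty-element tag. Its name pattern is looser than the Name production, so it is a sketch rather than a conforming parser. The function names are hypothetical.

```python
import re
from collections import Counter

# Name Eq AttValue, with AttValue in single or double quotes.
# Simplified: the name pattern is looser than the Name production.
ATTRIBUTE = re.compile(r"""([^\s=/<>]+)\s*=\s*("[^"]*"|'[^']*')""")


def attribute_specs(tag):
    """Return (attribute name, attribute value) pairs in source order."""
    # Strip the surrounding quote characters to get the attribute value.
    return [(name, quoted[1:-1]) for name, quoted in ATTRIBUTE.findall(tag)]


def duplicate_attribute_names(tag):
    """Names that violate the Unique Att Spec constraint, sorted."""
    counts = Counter(name for name, _ in attribute_specs(tag))
    return sorted(name for name, n in counts.items() if n > 1)
```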

Validity constraint: Attribute Value Type

The attribute must have been declared; the value must be of the type declared for it. (For attribute types, see 3.3 Attribute-List Declarations.)

Well-formedness constraint: No External Entity References

Attribute values cannot contain direct or indirect entity references to external entities.

Well-formedness constraint: No < in Attribute Values

The replacement text of any entity referred to directly or indirectly in an attribute value must not contain a <.

An example of a start-tag:

<deftermino id="dt-catto" termino="catto">

[Definition: The end of every element that begins with a start-tag must be marked by an end-tag containing a name that echoes the element's type as given in the start-tag:]

End-tag
[42]    ETag    ::=    '</' Name S? '>'

An example of an end-tag:

</deftermino>

[Definition: The text between the start-tag and end-tag is called the element's content:]

Content of Elements
[43]    content    ::=    CharData? ((element | Reference | CDSect | PI | Comment) CharData?)* /* */

[Definition: An element with no content is said to be empty.] The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag. [Definition: An empty-element tag takes a special form:]

Tags for Empty Elements
[44]    EmptyElemTag    ::=    '<' Name (S Attribute)* S? '/>' [WFC: Unique Att Spec]

Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY. For interoperability, the empty-element tag should be used, and should only be used, for elements which are declared EMPTY.

Examples of empty elements:

<IMG align="left"
 src="http://www.w3.org/Icons/WWW/w3c_home" />
<br></br>
<br/>

3.2 Element Type Declarations

The element structure of an XML document may, for validation purposes, be constrained using element type and attribute-list declarations. An element type declaration constrains the element's content.

Element type declarations often constrain which element types can appear as children of the element. At user option, an XML processor may issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.

[Definition: An element type declaration takes the form:]

Element Type Declaration
[45]    elementdecl    ::=    '<!ELEMENT' S Name S contentspec S? '>' [VC: Unique Element Type Declaration]
[46]    contentspec    ::=    'EMPTY' | 'ANY' | Mixed | children

where the Name gives the element type being declared.

Validity constraint: Unique Element Type Declaration

No element type may be declared more than once.

Examples of element type declarations:

<!ELEMENT br EMPTY>
<!ELEMENT p (#PCDATA|emph)* >
<!ELEMENT %name.para; %content.para; >
<!ELEMENT container ANY>

3.2.1 Element Content

[Definition: An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S).][Definition: In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear.] The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles:

Element-content Models
[47]    children    ::=    (choice | seq) ('?' | '*' | '+')?
[48]    cp    ::=    (Name | choice | seq) ('?' | '*' | '+')?
[49]    choice    ::=    '(' S? cp ( S? '|' S? cp )+ S? ')' /* */
/* */
[VC: Proper Group/PE Nesting]
[50]    seq    ::=    '(' S? cp ( S? ',' S? cp )* S? ')' /* */
[VC: Proper Group/PE Nesting]

where each Name is the type of an element which may appear as a child. Any content particle in a choice list may appear in the element content at the location where the choice list appears in the grammar; content particles occurring in a sequence list must each appear in the element content in the order given in the list. The optional character following a name or list governs whether the element or the content particles in the list may occur one or more (+), zero or more (*), or zero or one times (?). The absence of such an operator means that the element or content particle must appear exactly once. This syntax and meaning are identical to those used in the productions in this specification.

The content of an element matches a content model if and only if it is possible to trace out a path through the content model, obeying the sequence, choice, and repetition operators and matching each element in the content against an element type in the content model. For compatibility, it is an error if an element in the document can match more than one occurrence of an element type in the content model. For more information, see E Deterministic Content Models.
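Because the content-model operators mean the same as in regular expressions, matching can be sketched in Python by translating a children model into a pattern over the child element names. The sketch assumes that parameter-entity references have already been expanded and that white space between children has already been dropped. It does not enforce the determinism requirement above: Python's backtracking `re` engine accepts ambiguous models without complaint. The function names are hypothetical.

```python
import re

# A token is a delimiter, an occurrence indicator, or a run of name characters.
TOKEN = re.compile(r"\s*([()|,?*+]|[^\s()|,?*+]+)")


def content_model_regex(model):
    """Translate a children content model such as "(a, (b | c)*)" into a regex.

    Each child element is represented in the target string as its name
    followed by one space, so the name "a" never matches a prefix of "ab".
    """
    parts = []
    text = model.strip()
    pos = 0
    while pos < len(text):
        match = TOKEN.match(text, pos)
        if match is None:
            raise ValueError(f"cannot tokenize {text[pos:]!r}")
        token, pos = match.group(1), match.end()
        if token == "(":
            parts.append("(?:")           # non-capturing group
        elif token in (")", "|", "?", "*", "+"):
            parts.append(token)           # same meaning in both notations
        elif token == ",":
            continue                      # a sequence is plain concatenation
        else:
            parts.append("(?:" + re.escape(token) + " )")
    return re.compile("".join(parts))


def children_match(model, child_names):
    """True if the sequence of child element names matches the content model."""
    target = "".join(name + " " for name in child_names)
    return content_model_regex(model).fullmatch(target) is not None
```

For example, `children_match("(front, body, back?)", ["front", "body"])` is true, while a lone `["front"]` is not.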

Validity constraint: Proper Group/PE Nesting

Parameter-entity replacement text must be properly nested with parenthesized groups. That is to say, if either of the opening or closing parentheses in a choice, seq, or Mixed construct is contained in the replacement text for a parameter entity, both must be contained in the same replacement text.

For interoperability, if a parameter-entity reference appears in a choice, seq, or Mixed construct, its replacement text should contain at least one non-blank character, and neither the first nor last non-blank character of the replacement text should be a connector (| or ,).

Examples of element-content models:

<!ELEMENT spec (front, body, back?)>
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
<!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>

3.2.2 Mixed Content

[Definition: An element type has mixed content when elements of that type may contain character data, optionally interspersed with child elements.] In this case, the types of the child elements may be constrained, but not their order or their number of occurrences:

Mixed-content Declaration
[51]    Mixed    ::=    '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'
| '(' S? '#PCDATA' S? ')' [VC: Proper Group/PE Nesting]
[VC: No Duplicate Types]

where the Names give the types of elements that may appear as children. The keyword #PCDATA derives historically from the term "parsed character data."

Validity constraint: No Duplicate Types

The same name must not appear more than once in a single mixed-content declaration.
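A Mixed declaration can be checked in the same spirit. The Python sketch below reads the names out of a model following production [51]. In that production the closing `)*` is mandatory once any names are listed. The sketch then reports duplicates and checks that each child's type is listed, without regard to order or number of occurrences. Parameter-entity references are assumed to be expanded already, and the function names are hypothetical.

```python
import re
from collections import Counter

# Production [51]: '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'  |  '(' S? '#PCDATA' S? ')'
MIXED = re.compile(r"\(\s*#PCDATA((?:\s*\|\s*[^\s|()]+)*)\s*\)(\*?)")


def mixed_names(model):
    """Element type names listed in a Mixed content model, in order."""
    match = MIXED.fullmatch(model.strip())
    # With any names present, the closing ')*' is mandatory.
    if match is None or (match.group(1) and not match.group(2)):
        raise ValueError(f"not a Mixed content model: {model!r}")
    return [name.strip() for name in match.group(1).split("|") if name.strip()]


def duplicate_types(model):
    """Names violating the No Duplicate Types constraint."""
    counts = Counter(mixed_names(model))
    return [name for name, n in counts.items() if n > 1]


def mixed_content_ok(model, child_types):
    """Mixed content restricts which child types appear, not their order or count."""
    allowed = set(mixed_names(model))
    return all(child in allowed for child in child_types)
```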

Examples of mixed content declarations:

<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
<!ELEMENT b (#PCDATA)>

3.3 Attribute-List Declarations

Attributes are used to associate name-value pairs with elements. Attribute specifications may appear only within start-tags and empty-element tags; thus, the productions used to recognize them appear in 3.1 Start-Tags, End-Tags, and Empty-Element Tags. Attribute-list declarations may be used:

  • To define the set of attributes pertaining to a given element type.
  • To establish type constraints for these attributes.
  • To provide default values for attributes.

[Definition: Attribute-list declarations specify the name, data type, and default value (if any) of each attribute associated with a given element type:]

Attribute-list Declaration
[52]    AttlistDecl    ::=    '<!ATTLIST' S Name AttDef* S? '>'
[53]    AttDef    ::=    S Name S AttType S DefaultDecl

The Name in the AttlistDecl rule is the type of an element. At user option, an XML processor may issue a warning if attributes are declared for an element type not itself declared, but this is not an error. The Name in the AttDef rule is the name of the attribute.

When more than one AttlistDecl is provided for a given element type, the contents of all those provided are merged. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and later declarations are ignored. For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, at most one attribute definition for a given attribute name in an attribute-list declaration, and at least one attribute definition in each attribute-list declaration. For interoperability, an XML processor may at user option issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition is provided for a given attribute, but this is not an error.
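The merging rule can be sketched in Python. Attribute definitions are assumed to arrive as (element type, attribute name, definition) triples in the order the processor reads them. The definition is kept as an opaque string, and the function name `merge_attlists` is hypothetical. Later duplicates are collected separately, since a processor may warn about them at user option.

```python
def merge_attlists(attdefs):
    """Merge attribute definitions given as (element type, attribute, definition).

    Returns (merged, ignored): merged maps element type -> {attribute: definition},
    ignored lists the later duplicate definitions, which a processor may warn about.
    """
    merged = {}
    ignored = []
    for element, attribute, definition in attdefs:
        per_element = merged.setdefault(element, {})
        if attribute in per_element:
            ignored.append((element, attribute, definition))  # first one binds
        else:
            per_element[attribute] = definition
    return merged, ignored
```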

3.3.1 Attribute Types

XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types have varying lexical and semantic constraints. The validity constraints noted in the grammar are applied after the attribute value has been normalized as described in 3.3 Attribute-List Declarations.

Attribute Types
[54]    AttType    ::=    StringType | TokenizedType | EnumeratedType
[55]    StringType    ::=    'CDATA'
[56]    TokenizedType    ::=    'ID' [VC: ID]
[VC: One ID per Element Type]
[VC: ID Attribute Default]
| 'IDREF' [VC: IDREF]
| 'IDREFS' [VC: IDREF]
| 'ENTITY' [VC: Entity Name]
| 'ENTITIES' [VC: Entity Name]
| 'NMTOKEN' [VC: Name Token]
| 'NMTOKENS' [VC: Name Token]

Validity constraint: ID

Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.

Validity constraint: One ID per Element Type

No element type may have more than one ID attribute specified.

Validity constraint: ID Attribute Default

An ID attribute must have a declared default of #IMPLIED or #REQUIRED.

Validity constraint: IDREF

Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e. IDREF values must match the value of some ID attribute.

Validity constraint: Entity Name

Values of type ENTITY must match the Name production, values of type ENTITIES must match Names; each Name must match the name of an unparsed entity declared in the DTD.

Validity constraint: Name Token

Values of type NMTOKEN must match the Nmtoken production; values of type NMTOKENS must match Nmtokens.
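The ID and IDREF constraints can be sketched in Python over a tree parsed with `xml.etree.ElementTree`. That module does not report declared attribute types, so they are supplied through a hypothetical `attr_types` mapping from (element type, attribute name) to "ID", "IDREF" or "IDREFS". The sketch checks uniqueness and resolution only; it does not check the values against the Name production.

```python
import xml.etree.ElementTree as ET


def check_ids(root, attr_types):
    """Return validity errors for duplicate IDs and unresolved IDREF(S) values."""
    errors = []
    ids = set()
    refs = []
    for elem in root.iter():
        for attr, value in elem.attrib.items():
            kind = attr_types.get((elem.tag, attr))
            if kind == "ID":
                if value in ids:
                    errors.append(f"duplicate ID {value!r}")
                ids.add(value)
            elif kind == "IDREF":
                refs.append(value)
            elif kind == "IDREFS":
                refs.extend(value.split())  # space-separated list of Names
    # References may point forward, so resolve them after collecting all IDs.
    for ref in refs:
        if ref not in ids:
            errors.append(f"dangling IDREF {ref!r}")
    return errors
```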

[Definition: Enumerated attributes can take one of a list of values provided in the declaration]. There are two kinds of enumerated types:

Enumerated Attribute Types
[57]    EnumeratedType    ::=    NotationType | Enumeration
[58]    NotationType    ::=    'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')' [VC: Notation Attributes]
[VC: One Notation Per Element Type]
[VC: No Notation on Empty Element]
[59]    Enumeration    ::=    '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')' [VC: Enumeration]

A NOTATION attribute identifies a notation, declared in the DTD with associated system and/or public identifiers, to be used in interpreting the element to which the attribute is attached.

Validity constraint: Notation Attributes

Values of this type must match one of the notation names included in the declaration; all notation names in the declaration must be declared.

Validity constraint: One Notation Per Element Type

No element type may have more than one NOTATION attribute specified.

Validity constraint: No Notation on Empty Element

For compatibility, an attribute of type NOTATION must not be declared on an element declared EMPTY.

Validity constraint: Enumeration

Values of this type must match one of the Nmtoken tokens in the declaration.

For interoperability, the same Nmtoken should not occur more than once in the enumerated attribute types of a single element type.

3.3.2 Attribute Defaults

An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.

Attribute Defaults
[60]    DefaultDecl    ::=    '#REQUIRED' | '#IMPLIED'
| (('#FIXED' S)? AttValue) [VC: Required Attribute]
[VC: Attribute Default Legal]
[WFC: No < in Attribute Values]
[VC: Fixed Attribute Default]

In an attribute declaration, #REQUIRED means that the attribute must always be provided, #IMPLIED that no default value is provided. [Definition: If the declaration is neither #REQUIRED nor #IMPLIED, then the AttValue value contains the declared default value; the #FIXED keyword states that the attribute must always have the default value. If a default value is declared, when an XML processor encounters an omitted attribute, it is to behave as though the attribute were present with the declared default value.]

Validity constraint: Required Attribute

If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type in the attribute-list declaration.

Validity constraint: Attribute Default Legal

The declared default value must meet the lexical constraints of the declared attribute type.

Validity constraint: Fixed Attribute Default

If an attribute has a default value declared with the #FIXED keyword, instances of that attribute must match the default value.
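How a processor applies these default declarations to one element can be sketched in Python. Each attribute definition is represented by a hypothetical (keyword, default) tuple. The keyword is "#REQUIRED", "#IMPLIED", "#FIXED" or None, and the default is None when no value is declared. The function name `apply_defaults` is also hypothetical.

```python
def apply_defaults(specified, attdefs):
    """Apply declared defaults to one element's specified attributes.

    attdefs maps attribute name -> (keyword, default), where keyword is
    "#REQUIRED", "#IMPLIED", "#FIXED" or None, and default is the declared
    default value (None for #REQUIRED and #IMPLIED).
    Returns (attributes as seen by the application, validity errors).
    """
    attributes = dict(specified)
    errors = []
    for name, (keyword, default) in attdefs.items():
        if name in specified:
            if keyword == "#FIXED" and specified[name] != default:
                errors.append(f"{name!r} must have the fixed value {default!r}")
        elif keyword == "#REQUIRED":
            errors.append(f"missing required attribute {name!r}")
        elif default is not None:
            attributes[name] = default  # behave as though the default were present
    return attributes, errors
```

With the example declarations above, an omitted `type` on `list` is reported to the application as "ordered". An omitted `method` on `form` becomes "POST", and a `termdef` without `id` is a validity error.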

Examples of attribute-list declarations:

<!ATTLIST termdef
          id      ID      #REQUIRED
          name    CDATA   #IMPLIED>
<!ATTLIST list
          type    (bullets|ordered|glossary)  "ordered">
<!ATTLIST form
          method  CDATA   #FIXED "POST">

3.3.3 Attribute-Value Normalization

Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.

  1. All line breaks must have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
  2. Begin with a normalized value consisting of the empty string.
  3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:

    • For a character reference, append the referenced character to the normalized value.
    • For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
    • For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
    • For another character, append the character to the normalized value.

If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.
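The algorithm can be sketched in Python. Entity replacement texts are assumed to be given in a caller-supplied dictionary, with character references in entity values already expanded at declaration time. For example, `<!ENTITY d "&#xD;">` has the replacement text "\r". The five predefined entities are included with their standard replacement texts. Only the literal attribute value goes through line-break normalization; replacement texts are processed as they are.

```python
import re

# Character references (hex or decimal) and general entity references.
REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([^\s&;]+);")
# Replacement texts of the five predefined entities.
PREDEFINED = {"lt": "&#60;", "gt": ">", "amp": "&#38;", "apos": "'", "quot": '"'}
WHITE = "\x20\x0d\x0a\x09"


def _append(text, entities, out):
    """Step 3 of the algorithm, applied to text, appending to out."""
    pos = 0
    for m in REF.finditer(text):
        out.extend(" " if ch in WHITE else ch for ch in text[pos:m.start()])
        hex_digits, dec_digits, name = m.groups()
        if hex_digits is not None:
            out.append(chr(int(hex_digits, 16)))   # char ref: append the character itself
        elif dec_digits is not None:
            out.append(chr(int(dec_digits)))
        else:
            _append(entities[name], entities, out)  # entity ref: recurse on replacement text
        pos = m.end()
    out.extend(" " if ch in WHITE else ch for ch in text[pos:])


def normalize_attribute(value, attr_type="CDATA", entities=None):
    table = {**PREDEFINED, **(entities or {})}
    value = value.replace("\r\n", "\n").replace("\r", "\n")  # step 1
    out = []                                                  # step 2
    _append(value, table, out)                                # step 3
    result = "".join(out)
    if attr_type != "CDATA":
        # Trim and collapse #x20 only; other characters from char refs survive.
        result = re.sub(" +", " ", result).strip(" ")
    return result
```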

Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.

All attributes for which no declaration has been read should be treated by a non-validating processor as if declared CDATA.

Following are examples of attribute normalization. Given the following declarations:

<!ENTITY d "&#xD;">
<!ENTITY a "&#xA;">
<!ENTITY da "&#xD;&#xA;">

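the attribute specifications in the left column below would be normalized to the character sequences of the middle column if the attribute a is declared NMTOKENS and to those of the right column if a is declared CDATA.

Attribute specification                  | a is NMTOKENS               | a is CDATA
-----------------------------------------+-----------------------------+-----------------------------------
a="(line break)(line break)xyz"          | x y z                       | #x20 #x20 x y z
a="&d;&d;A&a;&a;B&da;"                   | A #x20 B                    | #x20 #x20 A #x20 #x20 B #x20 #x20
a="&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;"     | #xD #xD A #xA #xA B #xD #xA | #xD #xD A #xA #xA B #xD #xA

(In the first row, the attribute value begins with two literal line breaks followed by xyz.)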

Note that the last example is invalid (but well-formed) if a is declared to be of type NMTOKENS.

3.4 Conditional Sections

[Definition: Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.]

Conditional Section
[61]    conditionalSect    ::=    includeSect | ignoreSect
[62]    includeSect    ::=    '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>' /* */
[VC: Proper Conditional Section/PE Nesting]
[63]    ignoreSect    ::=    '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>' /* */
[VC: Proper Conditional Section/PE Nesting]
[64]    ignoreSectContents    ::=    Ignore ('<![' ignoreSectContents ']]>' Ignore)*
[65]    Ignore    ::=    Char* - (Char* ('<![' | ']]>') Char*)

Validity constraint: Proper Conditional Section/PE Nesting

If any of the "<![", "[", or "]]>" of a conditional section is contained in the replacement text for a parameter-entity reference, all of them must be contained in the same replacement text.

Like the internal and external DTD subsets, a conditional section may contain one or more complete declarations, comments, processing instructions, or nested conditional sections, intermingled with white space.

If the keyword of the conditional section is INCLUDE, then the contents of the conditional section are part of the DTD. If the keyword of the conditional section is IGNORE, then the contents of the conditional section are not logically part of the DTD. If a conditional section with a keyword of INCLUDE occurs within a larger conditional section with a keyword of IGNORE, both the outer and the inner conditional sections are ignored. The contents of an ignored conditional section are parsed by ignoring all characters after the "["