Translation Note
The Interlingua version of this translation is available at: http://www.nautilus.com.br/~ensjo/ia/w3.org/TR/2000/REC-xml-20001006. Other Interlingua translations can be found at http://www.nautilus.com.br/~ensjo/ia/w3.org. Translator: Emerson José Silveira da Costa <ensjo@nautilus.com.br>. The Interlingua version may contain errors. The English version of this specification is the only normative version. It is available at: http://www.w3.org/TR/2000/REC-xml-20001006. Latest version: http://www.w3.org/TR/REC-xml.
Copyright © 2000 W3C® (MIT, INRIA, Keio), All Rights Reserved. W3C liability, trademark, document use, and software licensing rules apply.
The Extensible Markup Language (XML) is a subset of SGML that is completely described in this document. Its goal is to enable generic SGML to be served, received, and processed on the Web in the way that is now possible with HTML. XML has been designed for ease of implementation and for interoperability with both SGML and HTML.
This document has been reviewed by W3C Members and other interested parties and has been endorsed by the Director as a W3C Recommendation. It is a stable document and may be used as reference material or cited as a normative reference from another document. W3C's role in making the Recommendation is to draw attention to the specification and to promote its widespread deployment. This enhances the functionality and interoperability of the Web.
This document specifies a syntax created by subsetting an existing, widely used international text processing standard (Standard Generalized Markup Language, ISO 8879:1986(E) as amended and corrected) for use on the World Wide Web. It is a product of the W3C XML Activity, details of which can be found at http://www.w3.org/XML. The English version of this specification is the only normative version. However, for translations of this document, see http://www.w3.org/XML/#trans. A list of current W3C Recommendations and other technical documents can be found at http://www.w3.org/TR.
This second edition is not a new version of XML (first published 10 February 1998); it merely incorporates the changes dictated by the first-edition errata (available at http://www.w3.org/XML/xml-19980210-errata) as a convenience to readers. The errata list for this second edition is available at http://www.w3.org/XML/xml-V10-2e-errata.
Please report errors in this document to xml-editor@w3.org; archives are available.
Note:
C. M. Sperberg-McQueen's affiliation has changed since the publication of the first edition. He is now at the World Wide Web Consortium, and can be contacted at cmsmcq@w3.org.
1 Introduction
1.1 Origin and Goals
1.2 Terminology
2 Documents
2.1 Well-Formed XML Documents
2.2 Characters
2.3 Common Syntactic Constructs
2.4 Character Data and Markup
2.5 Comments
2.6 Processing Instructions
2.7 CDATA Sections
2.8 Prolog and Document Type Declaration
2.9 Standalone Document Declaration
2.10 White Space Handling
2.11 End-of-Line Handling
2.12 Language Identification
3 Logical Structures
3.1 Start-Tags, End-Tags, and Empty-Element Tags
3.2 Element Type Declarations
3.2.1 Element Content
3.2.2 Mixed Content
3.3 Attribute-List Declarations
3.3.1 Attribute Types
3.3.2 Attribute Defaults
3.3.3 Attribute-Value Normalization
3.4 Conditional Sections
4 Physical Structures
4.1 Character and Entity References
4.2 Entity Declarations
4.2.1 Internal Entities
4.2.2 External Entities
4.3 Parsed Entities
4.3.1 The Text Declaration
4.3.2 Well-Formed Parsed Entities
4.3.3 Character Encoding in Entities
4.4 XML Processor Treatment of Entities and References
4.4.1 Not Recognized
4.4.2 Included
4.4.3 Included If Validating
4.4.4 Forbidden
4.4.5 Included in Literal
4.4.6 Notify
4.4.7 Bypassed
4.4.8 Included as PE
4.5 Construction of Internal Entity Replacement Text
4.6 Predefined Entities
4.7 Notation Declarations
4.8 Document Entity
5 Conformance
5.1 Validating and Non-Validating Processors
5.2 Using XML Processors
6 Notation
A References
A.1 Normative References
A.2 Other References
B Character Classes
C XML and SGML (Non-Normative)
D Expansion of Entity and Character References (Non-Normative)
E Deterministic Content Models (Non-Normative)
F Autodetection of Character Encodings (Non-Normative)
F.1 Detection Without External Encoding Information
F.2 Priorities in the Presence of External Encoding Information
G W3C XML Working Group (Non-Normative)
H W3C XML Core Group (Non-Normative)
I Production Notes (Non-Normative)
The Extensible Markup Language, abbreviated XML, describes a class of data objects called XML documents and partially describes the behavior of computer programs which process them. XML is an application profile or restricted form of SGML, the Standard Generalized Markup Language [ISO 8879]. By construction, XML documents are conforming SGML documents.
XML documents are made up of storage units called entities, which contain either parsed or unparsed data. Parsed data is made up of characters, some of which form character data, and some of which form markup. Markup encodes a description of the document's storage layout and logical structure. XML provides a mechanism to impose constraints on the storage layout and logical structure.
[Definition: A software module called an XML processor is used to read XML documents and provide access to their content and structure.] [Definition: It is assumed that an XML processor is doing its work on behalf of another module, called the application.] This specification describes the required behavior of an XML processor in terms of how it must read XML data and the information it must provide to the application.
XML was developed by an XML Working Group (originally known as the SGML Editorial Review Board) formed under the auspices of the World Wide Web Consortium (W3C) in 1996. It was chaired by Jon Bosak of Sun Microsystems with the active participation of an XML Special Interest Group (previously known as the SGML Working Group) also organized by the W3C. The membership of the XML Working Group is given in an appendix. Dan Connolly served as the WG's contact with the W3C.
The design goals for XML are:
This specification, together with associated standards (Unicode and ISO/IEC 10646 for characters, Internet RFC 1766 for language identification tags, ISO 639 for language name codes, and ISO 3166 for country name codes), provides all the information necessary to understand XML Version 1.0 and construct computer programs to process it.
This version of the XML specification may be distributed freely, as long as all text and legal notices remain intact.
The terminology used to describe XML documents is defined in the body of this specification. The terms defined in the following list are used in building those definitions and in describing the actions of an XML processor:
[Definition: A data object is an XML document if it is well-formed, as defined in this specification. A well-formed XML document may in addition be valid if it meets certain further constraints.]
Each XML document has both a logical and a physical structure. Physically, the document is composed of units called entities. An entity may refer to other entities to cause their inclusion in the document. A document begins in a "root" or document entity. Logically, the document is composed of declarations, elements, comments, character references, and processing instructions, all of which are indicated in the document by explicit markup. The logical and physical structures must nest properly, as described in 4.3.2 Well-Formed Parsed Entities.
[Definition: A textual object is a well-formed XML document if:]
    [1]  document  ::=  prolog element Misc*
Matching the document production implies that:
[Definition: As a consequence of this, for each non-root element C in the document, there is one other element P in the document such that C is in the content of P, but is not in the content of any other element that is in the content of P. P is referred to as the parent of C, and C as a child of P.]
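The parent relation can be illustrated with Python's standard xml.etree.ElementTree module. This is a minimal sketch, not part of the specification, and parent_map is a hypothetical helper name. It maps every non-root element to its single parent; the root is the only element with no entry.

```python
import xml.etree.ElementTree as ET


def parent_map(root):
    """Map each non-root element C to its parent P (hypothetical helper)."""
    # An element yielded while iterating over p is directly in p's content,
    # not nested inside some other element of p's content.
    return {child: p for p in root.iter() for child in p}


doc = ET.fromstring("<a><b><c/></b><d/></a>")
parents = parent_map(doc)
print(doc in parents)  # False: the root has no parent
print({c.tag: p.tag for c, p in parents.items()})  # {'b': 'a', 'd': 'a', 'c': 'b'}
```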
[Definition: A parsed entity contains text, a sequence of characters, which may represent markup or character data.] [Definition: A character is an atomic unit of text as specified by ISO/IEC 10646 [ISO/IEC 10646] (see also [ISO/IEC 10646-2000]). Legal characters are tab, carriage return, line feed, and the legal characters of Unicode and ISO/IEC 10646. The versions of these standards cited in A.1 Normative References were current at the time this document was prepared. New characters may be added to these standards by amendments or new editions. Consequently, XML processors must accept any character in the range specified for Char. The use of "compatibility characters", as defined in section 6.8 of [Unicode] (see also D21 in section 3.6 of [Unicode3]), is discouraged.]
    [2]  Char  ::=  #x9 | #xA | #xD | [#x20-#xD7FF] | [#xE000-#xFFFD] | [#x10000-#x10FFFF]
                    /* any Unicode character, excluding the surrogate blocks, FFFE, and FFFF. */
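Production [2] translates directly into a range check on code points. The Python sketch below is illustrative only, and both function names are hypothetical. A Python str can hold lone surrogate code points such as U+D800, which the check rejects, as the production requires.

```python
def is_xml_char(cp: int) -> bool:
    """Return True if code point cp matches production [2] Char."""
    return (
        cp in (0x9, 0xA, 0xD)
        or 0x20 <= cp <= 0xD7FF
        or 0xE000 <= cp <= 0xFFFD
        or 0x10000 <= cp <= 0x10FFFF
    )


def first_illegal_char(text: str):
    """Return the index of the first character outside Char, or None."""
    for i, ch in enumerate(text):
        if not is_xml_char(ord(ch)):
            return i
    return None


print(first_illegal_char("ok\tfine"))     # None
print(first_illegal_char("bad\x00byte"))  # 3
```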
The mechanism for encoding character code points into bit patterns may vary from entity to entity. All XML processors must accept the UTF-8 and UTF-16 encodings of 10646; the mechanisms for signaling which of the two is in use, or for bringing other encodings into play, are discussed later, in 4.3.3 Character Encoding in Entities.
This section defines some symbols used widely in the grammar.
S (white space) consists of one or more space (#x20) characters, carriage returns, line feeds, or tabs.
    [3]  S  ::=  (#x20 | #x9 | #xD | #xA)+
Characters are classified for convenience as letters, digits, or other characters. A letter consists of an alphabetic or syllabic base character or an ideographic character. Full definitions of the specific characters in each class are given in B Character Classes.
[Definition: A Name is a token beginning with a letter or one of a few punctuation characters, and continuing with letters, digits, hyphens, underscores, colons, or full stops, together known as name characters.] Names beginning with the string "xml", or with any string which would match (('X'|'x') ('M'|'m') ('L'|'l')), are reserved for standardization in this or future versions of this specification.
Note:
The Namespaces in XML Recommendation [XML Names] assigns a meaning to names containing colon characters. Therefore, authors should not use the colon in XML names except for namespace purposes, but XML processors must accept the colon as a name character.
An Nmtoken (name token) is any mixture of name characters.
    [4]  NameChar  ::=  Letter | Digit | '.' | '-' | '_' | ':' | CombiningChar | Extender
    [5]  Name      ::=  (Letter | '_' | ':') (NameChar)*
    [6]  Names     ::=  Name (S Name)*
    [7]  Nmtoken   ::=  (NameChar)+
    [8]  Nmtokens  ::=  Nmtoken (S Nmtoken)*
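Productions [4], [5], and [7], together with the reserved-name rule, can be sketched in Python as below. This is an approximation under a stated assumption: Appendix B (not reproduced here) defines the exact Letter, Digit, CombiningChar, and Extender classes, and the sketch substitutes Python's str.isalpha() and str.isalnum(), which differ from those classes in both directions. All function names are hypothetical.

```python
# Assumption: str.isalpha()/str.isalnum() stand in for the Letter, Digit,
# CombiningChar and Extender classes of Appendix B. They are NOT identical:
# they accept some characters XML 1.0 rejects and reject some it accepts.


def is_name_start(ch: str) -> bool:
    """First character of production [5] Name (approximate)."""
    return ch.isalpha() or ch in "_:"


def is_name_char(ch: str) -> bool:
    """Production [4] NameChar (approximate)."""
    return ch.isalnum() or ch in ".-_:"


def is_name(s: str) -> bool:
    """Production [5] Name: a name-start character followed by NameChar*."""
    return bool(s) and is_name_start(s[0]) and all(map(is_name_char, s[1:]))


def is_nmtoken(s: str) -> bool:
    """Production [7] Nmtoken: one or more name characters."""
    return bool(s) and all(map(is_name_char, s))


def is_reserved(name: str) -> bool:
    """Names starting with any case variant of 'xml' are reserved."""
    return name[:3].lower() == "xml"


print(is_name("salutation"), is_name("1st"), is_nmtoken("1st"))  # True False True
print(is_reserved("XmLfoo"), is_reserved("fooxml"))  # True False
```

The approximation keeps the structural point visible: every Name is an Nmtoken, but an Nmtoken such as "1st" need not be a Name.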
Literal data is any string delimited by double quotes (") or apostrophes (') that does not contain the character used as the delimiter for that string. Literals are used for specifying the content of internal entities (EntityValue), the values of attributes (AttValue), and external identifiers (SystemLiteral). Note that a SystemLiteral can be parsed without scanning for markup.
    [9]   EntityValue    ::=  '"' ([^%&"] | PEReference | Reference)* '"'
                              | "'" ([^%&'] | PEReference | Reference)* "'"
    [10]  AttValue       ::=  '"' ([^<&"] | Reference)* '"'
                              | "'" ([^<&'] | Reference)* "'"
    [11]  SystemLiteral  ::=  ('"' [^"]* '"') | ("'" [^']* "'")
    [12]  PubidLiteral   ::=  '"' PubidChar* '"' | "'" (PubidChar - "'")* "'"
    [13]  PubidChar      ::=  #x20 | #xD | #xA | [a-zA-Z0-9] | [-'()+,./:=?;!*#@$_%]
Note:
Although the EntityValue production allows the definition of an entity consisting of a single explicit < in the literal (e.g., <!ENTITY minorque "<">), it is strongly advised to avoid this practice, since any reference to that entity will cause a well-formedness error.
Text consists of intermingled character data and markup. [Definition: Markup takes the form of start-tags, end-tags, empty-element tags, entity references, character references, comments, CDATA section delimiters, document type declarations, processing instructions, XML declarations, text declarations, and any white space that is at the top level of the document entity (that is, outside the document element and not inside any other markup).]
[Definition: All text that is not markup constitutes the character data of the document.]
The ampersand character (&) and the left angle bracket (<) may appear in their literal form only when used as markup delimiters, or within a comment, a processing instruction, or a CDATA section. If they are needed elsewhere, they must be escaped using either numeric character references or the strings "&amp;" and "&lt;" respectively. The right angle bracket (>) may be represented using the string "&gt;", and must, for compatibility, be escaped using "&gt;" or a character reference when it appears in the string "]]>" in content, when that string is not marking the end of a CDATA section.
In the content of elements, character data is any string of characters which does not contain the start-delimiter of any markup. In a CDATA section, character data is any string of characters not including the CDATA-section-close delimiter, "]]>".
To allow attribute values to contain both single and double quotes, the apostrophe or single-quote character (') may be represented as "&apos;", and the double-quote character (") as "&quot;".
    [14]  CharData  ::=  [^<&]* - ([^<&]* ']]>' [^<&]*)
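The escaping rules above can be sketched as two small Python functions. This is a hedged illustration, not a normative algorithm; the function names are hypothetical, and the sketch uses the predefined entity references rather than numeric character references. In content it always escapes & and <, and escapes > only where it completes "]]>"; for a double-quoted AttValue (production [10]) it escapes &, <, and the delimiter ".

```python
def escape_char_data(text: str) -> str:
    """Escape text for use as element content (hypothetical helper)."""
    # '&' first, so the ampersands introduced below are not escaped twice.
    text = text.replace("&", "&amp;").replace("<", "&lt;")
    # '>' only needs escaping where it would complete the string ']]>'.
    return text.replace("]]>", "]]&gt;")


def quote_att_value(value: str) -> str:
    """Return value as a double-quoted AttValue (hypothetical helper)."""
    # AttValue excludes '<' and '&'; '"' would end the literal early.
    escaped = value.replace("&", "&amp;").replace("<", "&lt;").replace('"', "&quot;")
    return f'"{escaped}"'


print(escape_char_data("a < b && c ]]> d"))  # a &lt; b &amp;&amp; c ]]&gt; d
print(quote_att_value('say "hi" & <bye>'))   # "say &quot;hi&quot; &amp; &lt;bye>"
```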
[Definition: Comments may appear anywhere in a document outside other markup; in addition, they may appear within the document type declaration at places allowed by the grammar. They are not part of the document's character data; an XML processor may, but need not, make it possible for an application to retrieve the text of comments. For compatibility, the string "--" (double-hyphen) must not occur within comments.] Parameter entity references are not recognized within comments.
    [15]  Comment  ::=  '<!--' ((Char - '-') | ('-' (Char - '-')))* '-->'
An example of a comment:
    <!-- declarations for <head> & <body> -->
Note that the grammar does not allow a comment ending in --->. The following example is not well-formed.
    <!-- B+, B, or B--->
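Production [15] reduces to two checks on the text between "<!--" and "-->": it must not contain "--", and it must not end with "-", since that hyphen plus the closing delimiter would form "--->". The Python sketch below uses hypothetical helper names and, for brevity, does not also check that every character matches Char.

```python
def is_wellformed_comment(body: str) -> bool:
    """Check comment text (without '<!--' and '-->') against production [15]."""
    return "--" not in body and not body.endswith("-")


def make_comment(body: str) -> str:
    """Wrap body in comment delimiters, refusing text [15] would reject."""
    if not is_wellformed_comment(body):
        raise ValueError("comment text would violate production [15]")
    return f"<!--{body}-->"


print(is_wellformed_comment(" declarations for <head> & <body> "))  # True
print(is_wellformed_comment(" B+, B, or B-"))  # False: would end in '--->'
```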
[Definition: Processing instructions (PIs) allow documents to contain instructions for applications.]
    [16]  PI        ::=  '<?' PITarget (S (Char* - (Char* '?>' Char*)))? '?>'
    [17]  PITarget  ::=  Name - (('X' | 'x') ('M' | 'm') ('L' | 'l'))
PIs are not part of the document's character data, but must be passed through to the application. The PI begins with a target (PITarget) used to identify the application to which the instruction is directed. The target names "XML", "xml", and so on are reserved for standardization in this or future versions of this specification. The XML Notation mechanism may be used for formal declaration of PI targets. Parameter entity references are not recognized within processing instructions.
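Productions [16] and [17] can be sketched as a small Python parser that splits one PI into its target and data. Assumptions: the target is matched with an ASCII-only approximation of Name, and both the helper name parse_pi and the target app-hint in the example are hypothetical. The XML declaration, although written like a PI, has the excluded target "xml", so this sketch rejects it.

```python
import re

# Assumption: an ASCII-only approximation of production [5] Name is used
# for the target; real names may use any Letter from Appendix B.
_PI = re.compile(
    r"<\?([A-Za-z_:][-A-Za-z0-9._:]*)(?:[ \t\r\n]+(.*?))?\?>", re.DOTALL
)


def parse_pi(text: str):
    """Split one processing instruction into (target, data) (hypothetical helper)."""
    m = _PI.fullmatch(text)
    if m is None:
        raise ValueError("not a processing instruction")
    target, data = m.group(1), m.group(2) or ""
    if "?>" in data:
        raise ValueError("PI data must not contain '?>'")
    if target.lower() == "xml":
        # Production [17] excludes 'xml' in every combination of case.
        raise ValueError("reserved PI target")
    return target, data


print(parse_pi('<?app-hint mode="fast"?>'))  # ('app-hint', 'mode="fast"')
```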
[Definition: CDATA sections may occur anywhere character data may occur; they are used to escape blocks of text containing characters which would otherwise be recognized as markup. CDATA sections begin with the string "<![CDATA[" and end with the string "]]>":]
    [18]  CDSect   ::=  CDStart CData CDEnd
    [19]  CDStart  ::=  '<![CDATA['
    [20]  CData    ::=  (Char* - (Char* ']]>' Char*))
    [21]  CDEnd    ::=  ']]>'
Within a CDATA section, only the CDEnd string is recognized as markup, so that left angle brackets and ampersands may occur in their literal form; they need not (and cannot) be escaped using "&lt;" and "&amp;". CDATA sections cannot nest.
An example of a CDATA section, in which "<salutation>" and "</salutation>" are recognized as character data, not markup:
    <![CDATA[<salutation>Bon die!</salutation>]]>
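Because CDATA sections cannot nest and CDEnd is the only markup recognized inside them, text that itself contains "]]>" cannot go into a single section. A common workaround, sketched below in Python with a hypothetical helper name, splits each "]]>" across two adjacent sections: "]]" closes the first and ">" opens the next. The round trip through the standard xml.etree.ElementTree parser checks that the parser reports the original text as character data.

```python
import xml.etree.ElementTree as ET


def to_cdata(text: str) -> str:
    """Wrap text in one or more adjacent CDATA sections (hypothetical helper)."""
    # Each ']]>' becomes ']]' + ']]>' (ending one section) + '<![CDATA[' + '>'.
    return "<![CDATA[" + text.replace("]]>", "]]]]><![CDATA[>") + "]]>"


s = "<salutation>Bon die!</salutation> ]]> end"
wrapped = to_cdata(s)
# The parser reports the sections' contents as character data, unchanged.
print(ET.fromstring("<r>" + wrapped + "</r>").text == s)  # True
```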
[Definition: XML documents should begin with an XML declaration which specifies the version of XML being used.] For example, the following is a complete XML document, well-formed but not valid:
    <?xml version="1.0"?>
    <salutation>Bon die!</salutation>
and so is this:
    <salutation>Bon die!</salutation>
The version number "1.0" should be used to indicate conformance to this version of this specification; it is an error for a document to use the value "1.0" if it does not conform to this version of this specification. It is the intent of the XML Working Group to give later versions of this specification numbers other than "1.0", but this intent does not indicate a commitment to produce any future versions of XML, nor, if any are produced, to use any particular numbering scheme. Since future versions are not ruled out, this construct is provided as a means to allow the possibility of automatic version recognition, should it become necessary. Processors may signal an error if they receive documents labeled with versions they do not support.
The function of the markup in an XML document is to describe its storage and logical structure and to associate attribute-value pairs with its logical structures. XML provides a mechanism, the document type declaration, to define constraints on the logical structure and to support the use of predefined storage units. [Definition: An XML document is valid if it has an associated document type declaration and if the document complies with the constraints expressed in it.]
The document type declaration must appear before the first element in the document.
    [22]  prolog       ::=  XMLDecl? Misc* (doctypedecl Misc*)?
    [23]  XMLDecl      ::=  '<?xml' VersionInfo EncodingDecl? SDDecl? S? '?>'
    [24]  VersionInfo  ::=  S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
    [25]  Eq           ::=  S? '=' S?
    [26]  VersionNum   ::=  ([a-zA-Z0-9_.:] | '-')+
    [27]  Misc         ::=  Comment | PI | S
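Productions [23] to [26] are regular enough to read the version number with a regular expression. The Python sketch below is a hedged illustration: xml_version is a hypothetical helper, and it only reads VersionInfo, leaving EncodingDecl and SDDecl unexamined. It returns None when no XML declaration is present, since XMLDecl is optional in production [22]; a PI whose target merely begins with "xml" is not mistaken for one.

```python
import re

# Production [3] S, used by productions [24] and [25].
_S = r"[ \t\r\n]"
# Productions [24]-[26]: S 'version' Eq ("'" VersionNum "'" | '"' VersionNum '"')
_VERSION_INFO = re.compile(
    _S + r"+version" + _S + r"*=" + _S + r"*"
    + r"(?:'([a-zA-Z0-9_.:-]+)'|\"([a-zA-Z0-9_.:-]+)\")"
)


def xml_version(document: str):
    """Return the VersionNum of a leading XMLDecl, or None if there is none."""
    if not re.match(r"<\?xml" + _S, document):
        return None  # XMLDecl is optional in production [22]
    m = _VERSION_INFO.match(document, 5)  # position just after '<?xml'
    if m is None:
        raise ValueError("XMLDecl must start with VersionInfo")
    return m.group(1) or m.group(2)


print(xml_version('<?xml version="1.0"?>\n<salutation>Bon die!</salutation>'))  # 1.0
print(xml_version("<salutation>Bon die!</salutation>"))  # None
```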
[Definition: The XML document type declaration contains or points to markup declarations that provide a grammar for a class of documents. This grammar is known as a document type definition, or DTD. The document type declaration can point to an external subset (a special kind of external entity) containing markup declarations, or can contain the markup declarations directly in an internal subset, or can do both. The DTD for a document consists of both subsets taken together.]
[Definition: A markup declaration is an element type declaration, an attribute-list declaration, an entity declaration, or a notation declaration.] These declarations may be contained in whole or in part within parameter entities, as described in the well-formedness and validity constraints below. For further information, see 4 Physical Structures.
    [28]   doctypedecl  ::=  '<!DOCTYPE' S Name (S ExternalID)? S? ('[' (markupdecl | DeclSep)* ']' S?)? '>'
                             [VC: Root Element Type]
                             [WFC: External Subset]
    [28a]  DeclSep      ::=  PEReference | S
                             [WFC: PE Between Declarations]
    [29]   markupdecl   ::=  elementdecl | AttlistDecl | EntityDecl | NotationDecl | PI | Comment
                             [VC: Proper Declaration/PE Nesting]
                             [WFC: PEs in Internal Subset]
Note that it is possible to construct a well-formed document containing a doctypedecl that neither points to an external subset nor contains an internal subset.
The markup declarations may be made up in whole or in part of the replacement text of parameter entities. The productions later in this specification for individual nonterminals (elementdecl, AttlistDecl, and so on) describe the declarations after all the parameter entities have been included.
Parameter entity references are recognized anywhere in the DTD (internal and external subsets and external parameter entities), except in literals, processing instructions, comments, and the contents of ignored conditional sections (see 3.4 Conditional Sections). They are also recognized in entity value literals. The use of parameter entities in the internal subset is restricted as described below.
Validity constraint: Root Element Type
The Name in the document type declaration must match the element type of the root element.
Validity constraint: Proper Declaration/PE Nesting
Parameter-entity replacement text must be properly nested with markup declarations. That is to say, if either the first character or the last character of a markup declaration (markupdecl above) is contained in the replacement text for a parameter-entity reference, both must be contained in the same replacement text.
Well-formedness constraint: PEs in Internal Subset
In the internal DTD subset, parameter-entity references can occur only where markup declarations can occur, not within markup declarations. (This does not apply to references that occur in external parameter entities or to the external subset.)
Well-formedness constraint: External Subset
The external subset, if any, must match the production for extSubset.
Well-formedness constraint: PE Between Declarations
The replacement text of a parameter entity reference in a DeclSep must match the production extSubsetDecl.
Like the internal subset, the external subset and any external parameter entities referenced in a DeclSep must consist of a series of complete markup declarations of the types allowed by the non-terminal symbol markupdecl, interspersed with white space or parameter-entity references. However, portions of the contents of the external subset or of these external parameter entities may conditionally be ignored by using the conditional section construct; this is not allowed in the internal subset.
    [30]  extSubset      ::=  TextDecl? extSubsetDecl
    [31]  extSubsetDecl  ::=  ( markupdecl | conditionalSect | DeclSep)*
The external subset and external parameter entities also differ from the internal subset in that in them, parameter-entity references are permitted within markup declarations, not only between markup declarations.
An example of an XML document with a document type declaration:
    <?xml version="1.0"?>
    <!DOCTYPE salutation SYSTEM "bondie.dtd">
    <salutation>Bon die!</salutation>
The system identifier "bondie.dtd" gives the address (a URI reference) of a DTD for the document.
The declarations can also be given locally, as in this example:
    <?xml version="1.0" encoding="UTF-8" ?>
    <!DOCTYPE salutation [
      <!ELEMENT salutation (#PCDATA)>
    ]>
    <salutation>Bon die!</salutation>
If both the external and internal subsets are used, the internal subset is considered to occur before the external subset. This has the effect that entity and attribute-list declarations in the internal subset take precedence over those in the external subset.
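The precedence rule follows from reading order: elsewhere in XML 1.0, the first declaration encountered is binding (for entities in section 4.2, and for attribute definitions in section 3.3). The Python sketch below assumes, hypothetically, that entity declarations have already been parsed into (name, value) pairs in document order; the helper name and the entity names are illustrative only.

```python
def effective_entities(internal, external):
    """Combine entity declarations from both DTD subsets (hypothetical helper).

    internal and external are lists of (name, value) pairs in document order.
    The internal subset counts as occurring first, and the first declaration
    of a name is binding, so later declarations of that name are ignored.
    """
    result = {}
    for name, value in list(internal) + list(external):
        result.setdefault(name, value)  # keeps only the first binding
    return result


internal = [("greeting", "Bon die!")]
external = [("greeting", "Hello"), ("farewell", "Goodbye")]
print(effective_entities(internal, external))
# {'greeting': 'Bon die!', 'farewell': 'Goodbye'}
```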
Markup declarations can affect the content of the document, as passed from an XML processor to an application; examples are attribute defaults and entity declarations. The standalone document declaration, which may appear as a component of the XML declaration, signals whether or not there are such declarations which appear external to the document entity or in parameter entities. [Definition: An external markup declaration is defined as a markup declaration occurring in the external subset or in a parameter entity (external or internal, the latter being included because non-validating processors are not required to read them).]
    [32]  SDDecl  ::=  S 'standalone' Eq (("'" ('yes' | 'no') "'") | ('"' ('yes' | 'no') '"'))
                       [VC: Standalone Document Declaration]
In a standalone document declaration, the value "yes" indicates that there are no external markup declarations which affect the information passed from the XML processor to the application. The value "no" indicates that there are or may be such external markup declarations. Note that the standalone document declaration only denotes the presence of external declarations; the presence, in a document, of references to external entities, when those entities are internally declared, does not change its standalone status.
If there are no external markup declarations, the standalone document declaration has no meaning. If there are external markup declarations but there is no standalone document declaration, the value "no" is assumed.
Any XML document for which standalone="no" holds can be converted algorithmically to a standalone document, which may be desirable for some network delivery applications.
Validity constraint: Standalone Document Declaration
The standalone document declaration must have the value "no" if any external markup declarations contain declarations of:
entities (other than amp, lt, gt, apos, quot), if references to those entities appear in the document, or
An example of an XML declaration with a standalone document declaration:
    <?xml version="1.0" standalone='yes'?>
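The rules for interpreting the standalone value can be sketched as follows. This is a hypothetical Python helper, assuming the caller already knows whether any external markup declarations exist (determining that requires reading the DTD, which is outside this sketch). It returns None when the declaration has no meaning, and "no" when external declarations exist but SDDecl is absent.

```python
import re

# Production [32] SDDecl: S 'standalone' Eq, then 'yes' or 'no' in either quote style.
_SDDECL = re.compile(
    r"[ \t\r\n]+standalone[ \t\r\n]*=[ \t\r\n]*(?:'(yes|no)'|\"(yes|no)\")"
)


def standalone_status(xml_decl: str, has_external_decls: bool):
    """Effective standalone value of a document (hypothetical helper).

    xml_decl is the text of the XML declaration ('' if absent);
    has_external_decls tells whether any external markup declarations exist.
    """
    if not has_external_decls:
        return None  # the standalone declaration has no meaning here
    m = _SDDECL.search(xml_decl)
    if m is None:
        return "no"  # assumed when external declarations exist but SDDecl is absent
    return m.group(1) or m.group(2)


print(standalone_status("<?xml version=\"1.0\" standalone='yes'?>", True))  # yes
print(standalone_status('<?xml version="1.0"?>', True))  # no
```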
In editing XML documents, it is often convenient to use "white space" (spaces, tabs, and blank lines) to set apart the markup for greater readability. Such white space is typically not intended for inclusion in the delivered version of the document. On the other hand, "significant" white space that should be preserved in the delivered version is common, for example in poetry and source code.
An XML processor must always pass all characters in a document that are not markup through to the application. A validating XML processor must also inform the application which of these characters constitute white space appearing in element content.
A special attribute named xml:space may be attached to an element to signal an intention that in that element, white space should be preserved by applications. In valid documents, this attribute, like any other, must be declared if it is used. When declared, it must be given as an enumerated type whose values are one or both of "default" and "preserve". For example:
    <!ATTLIST poema xml:space (default|preserve) 'preserve'>
    <!-- -->
    <!ATTLIST pre xml:space (preserve) #FIXED 'preserve'>
The value "default" signals that applications' default white-space processing modes are acceptable for this element; the value "preserve" indicates the intent that applications preserve all the white space. This declared intent is considered to apply to all elements within the content of the element where it is specified, unless overridden with another instance of the xml:space attribute.
The root element of any document is considered to have signaled no intentions as regards application space handling, unless it provides a value for this attribute or the attribute is declared with a default value.
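The inheritance of xml:space can be computed with a recursive walk that passes the current mode down to each child. A minimal Python sketch using the standard xml.etree.ElementTree module, with a hypothetical helper name; it assumes only attributes written in the document count, because ElementTree does not read the DTD and so never applies declared defaults such as the poema declaration above. ElementTree reports the attribute under the namespace name bound to the xml prefix.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:space under the namespace bound to the xml prefix.
XML_SPACE = "{http://www.w3.org/XML/1998/namespace}space"


def space_modes(element, inherited=None):
    """Yield (element, mode) pairs in document order (hypothetical helper).

    mode is 'preserve' or 'default' as given on the element or inherited
    from the nearest ancestor, or None when no intention has been signaled.
    """
    mode = element.get(XML_SPACE, inherited)  # an explicit value overrides
    yield element, mode
    for child in element:
        yield from space_modes(child, mode)


doc = ET.fromstring(
    '<poema xml:space="preserve"><l>  Bon  die  </l>'
    '<nota xml:space="default"><b/></nota></poema>'
)
print([(el.tag, mode) for el, mode in space_modes(doc)])
# [('poema', 'preserve'), ('l', 'preserve'), ('nota', 'default'), ('b', 'default')]
```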
XML parsed entities are often stored in computer files which, for editing convenience, are organized into lines. These lines are typically separated by some combination of the characters carriage-return (#xD) and line-feed (#xA).
To simplify the tasks of applications, the characters passed to an application by the XML processor must be as if the XML processor normalized all line breaks in external parsed entities (including the document entity) on input, before parsing, by translating both the two-character sequence #xD #xA and any #xD that is not followed by #xA to a single #xA character.
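The normalization amounts to two string replacements, applied in the right order. A minimal Python sketch with a hypothetical helper name: the pair #xD #xA is replaced first so that it yields one #xA rather than two. Because the rule applies on input, before parsing, a carriage return written as a character reference (&#xD;) is not affected.

```python
def normalize_line_ends(text: str) -> str:
    """Translate CR LF pairs and lone CRs into a single LF (hypothetical helper)."""
    # Replace the two-character pair first, so it becomes one LF, not two.
    return text.replace("\r\n", "\n").replace("\r", "\n")


raw = b"<a>line one\r\nline two\rline three</a>"
print(repr(normalize_line_ends(raw.decode("utf-8"))))
# '<a>line one\nline two\nline three</a>'
```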
In document processing, it is often useful to identify the natural or formal language in which the content is written. A special attribute named xml:lang may be inserted in documents to specify the language used in the contents and attribute values of any element in an XML document. In valid documents, this attribute, like any other, must be declared if it is used. The values of the attribute are language identifiers as defined by [IETF RFC 1766], Tags for the Identification of Languages, or its successor on the IETF Standards Track.
Note:
[IETF RFC 1766] tags are constructed from two-letter language codes as defined by [ISO 639], from two-letter country codes as defined by [ISO 3166], or from language identifiers registered with the Internet Assigned Numbers Authority [IANA-LANGCODES]. It is expected that the successor to [IETF RFC 1766] will introduce three-letter language codes for languages not presently covered by [ISO 639].
(Productions 33 through 38 have been removed.)
For example:
    <p xml:lang="en">The quick brown fox jumps over the lazy dog.</p>
    <p xml:lang="en-GB">What colour is it?</p>
    <p xml:lang="en-US">What color is it?</p>
    <sp who="Faust" desc='leise' xml:lang="de">
      <l>Habe nun, ach! Philosophie,</l>
      <l>Juristerei, und Medizin</l>
      <l>und leider auch Theologie</l>
      <l>durchaus studiert mit heißem Bemüh'n.</l>
    </sp>
The intent declared with xml:lang is considered to apply to all attributes and content of the element where it is specified, unless overridden with an instance of xml:lang on another element within that content.
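The scoping rule for xml:lang can be illustrated with an iterative walk over an element tree: each element takes its own xml:lang if present, otherwise the value in scope for its parent. This is a hedged Python sketch using the standard xml.etree.ElementTree module; the helper name and the wrapping text element in the example are hypothetical, and, as with any parser that skips the DTD, declared default values are not applied.

```python
import xml.etree.ElementTree as ET

# ElementTree reports xml:lang under the namespace bound to the xml prefix.
XML_LANG = "{http://www.w3.org/XML/1998/namespace}lang"


def effective_lang(root):
    """Map each element to the xml:lang value in scope for it (None if none)."""
    result = {}
    stack = [(root, None)]
    while stack:
        element, inherited = stack.pop()
        lang = element.get(XML_LANG, inherited)  # the nearest declaration wins
        result[element] = lang
        stack.extend((child, lang) for child in element)
    return result


doc = ET.fromstring(
    '<text xml:lang="en"><p>The quick brown fox</p>'
    '<sp who="Faust" xml:lang="de"><l>Habe nun, ach! Philosophie,</l></sp></text>'
)
langs = effective_lang(doc)
print(langs[doc.find("p")], langs[doc.find("sp/l")])  # en de
```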
A simple declaration for xml:lang might take the form
    xml:lang NMTOKEN #IMPLIED
but specific default values may also be given, if appropriate. In a collection of French poems with glosses and notes in Interlingua, the xml:lang attribute might be declared this way:
    <!ATTLIST poema xml:lang NMTOKEN 'fr'>
    <!ATTLIST glossa xml:lang NMTOKEN 'ia'>
    <!ATTLIST nota xml:lang NMTOKEN 'ia'>
[Definition: Cata documento XML contine un o plus elementos, cuje frontieras es delimiate per etiquettas de initio e de fin, o, in le caso de elementos vacue, per un etiquetta de elemento vacue. Cata elemento ha un typo, identificate per nomine, a vices appellate su "identificator generic" (IG), e pote haber un collection de specificationes de attributo.] Cata specification de attributo ha un nomine e un valor.
[39] element ::= EmptyElemTag
| STag content ETag   [WFC: Element Type Match] [VC: Element Valid]
This specification does not constrain the semantics, use, or (beyond syntax)
names of the element types and attributes, except that names beginning
with a match to (('X'|'x')('M'|'m')('L'|'l'))
are reserved for standardization in this or future versions of this specification.
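A check for the reserved prefix can follow the pattern literally, one character position at a time. The following is a minimal sketch. The function name `is_reserved_name` is hypothetical, and it assumes that `name` already matches the Name production.

```python
def is_reserved_name(name):
    """True if name starts with a match to (('X'|'x')('M'|'m')('L'|'l'))."""
    return (len(name) >= 3 and name[0] in "Xx"
            and name[1] in "Mm" and name[2] in "Ll")

for candidate in ["xml:lang", "XMLData", "xmp", "myxml"]:
    print(candidate, is_reserved_name(candidate))
# xml:lang True
# XMLData True
# xmp False
# myxml False
```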
Well-formedness constraint: Element Type Match
The Name in an element's end-tag must match the element type in the start-tag.
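A conforming parser reports a mismatch as a well-formedness error. As a small illustration, Python's standard `xml.etree.ElementTree`, which uses the Expat parser, raises `ParseError` in this case. The wrapper name `is_well_formed` is hypothetical.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """True if Expat, via ElementTree, accepts text as well-formed XML."""
    try:
        ET.fromstring(text)
    except ET.ParseError:
        return False
    return True

print(is_well_formed("<deftermino>catto</deftermino>"))  # True
print(is_well_formed("<deftermino>catto</termino>"))     # False
```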
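Validity constraint: Element Valid
An element is valid if there is a declaration matching elementdecl where the Name matches the element type, and one of the following holds:
1. The declaration matches EMPTY and the element has no content.
2. The declaration matches children and the sequence of child elements belongs to the language generated by the regular expression in the content model, with optional white space (characters matching the nonterminal S) between the start-tag and the first child element, between child elements, or between the last child element and the end-tag.
3. The declaration matches Mixed and the content consists of character data and child elements whose types match names in the content model.
4. The declaration matches ANY, and the types of any child elements have been declared.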
[Definition: The beginning of every non-empty XML element is marked by a start-tag.]
[40] STag ::= '<' Name (S Attribute)* S? '>'   [WFC: Unique Att Spec]
[41] Attribute ::= Name Eq AttValue   [VC: Attribute Value Type] [WFC: No External Entity References] [WFC: No < in Attribute Values]
The Name in the start- and end-tags gives the
element's type. [Definition:
The Name-AttValue pairs
are referred to as the attribute specifications of the element],
[Definition: with the
Name in each pair referred to as the attribute name]
and [Definition:
the content of the AttValue (the text between the
' or " delimiters) as the attribute
value.] Note that the order of attribute specifications in a
start-tag or empty-element tag is not significant.
Well-formedness constraint: Unique Att Spec
No attribute name may appear more than once in the same start-tag or empty-element tag.
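Both properties are visible in Python's standard `xml.etree.ElementTree`. Attribute specifications come back as a dict keyed by attribute name, so comparing them ignores order. A repeated attribute name is rejected by the underlying Expat parser. The following is a minimal sketch. The helper name `parse_start_tag` is hypothetical, and it parses a one-element document rather than a bare start-tag.

```python
import xml.etree.ElementTree as ET

def parse_start_tag(text):
    """Parse a one-element document; return its attribute dict, or None
    if Expat reports a well-formedness error such as a duplicate attribute."""
    try:
        return ET.fromstring(text).attrib
    except ET.ParseError:
        return None

a = parse_start_tag('<deftermino id="dt-catto" termino="catto"/>')
b = parse_start_tag('<deftermino termino="catto" id="dt-catto"/>')
print(a == b)  # True: dict equality ignores the order of specifications
print(parse_start_tag('<deftermino id="dt-catto" id="dt-cane"/>'))  # None
```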
Validity constraint: Attribute Value Type
The attribute must have been declared; the value must be of the type declared for it. (For attribute types, see 3.3 Attribute-List Declarations.)
Well-formedness constraint: No External Entity References
Attribute values cannot contain direct or indirect entity references to external entities.
Well-formedness constraint: No < in Attribute Values
The replacement text of any entity referred to
directly or indirectly in an attribute value must not contain a <.
Example of a start-tag:
<deftermino id="dt-catto" termino="catto">
[Definition: The end of every element that begins with a start-tag must be marked by an end-tag containing a name that echoes the element's type as given in the start-tag:]
[42] ETag ::= '</' Name S? '>'
Example of an end-tag:
</deftermino>
[Definition: The text between the start-tag and end-tag is called the element's content:]
[43] content ::= CharData? ((element | Reference | CDSect | PI | Comment) CharData?)*
[Definition: An element with no content is said to be empty.] The representation of an empty element is either a start-tag immediately followed by an end-tag, or an empty-element tag. [Definition: An empty-element tag takes a special form:]
[44] EmptyElemTag ::= '<' Name (S Attribute)* S? '/>'   [WFC: Unique Att Spec]
Empty-element tags may be used for any element which has no content, whether or not it is declared using the keyword EMPTY. For interoperability, the empty-element tag should be used, and should only be used, for elements which are declared EMPTY.
Examples of empty elements:
<IMG align="left" src="http://www.w3.org/Icons/WWW/w3c_home" />
<br></br>
<br/>
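The two spellings of an empty element produce the same logical structure. A quick check with Python's standard `xml.etree.ElementTree` shows that each parses to an element with no children and no text, and that both serialize identically. This is an illustration of one parser's data model, not a requirement of this specification.

```python
import xml.etree.ElementTree as ET

for text in ["<br></br>", "<br/>"]:
    elem = ET.fromstring(text)
    # No child elements and no character data in either form.
    print(text, elem.tag, len(elem), elem.text, ET.tostring(elem))
```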
The element structure of an XML document may, for validation purposes, be constrained using element type and attribute-list declarations. An element type declaration constrains the element's content.
Element type declarations often constrain which element types can appear as children of the element. At user option, an XML processor may issue a warning when a declaration mentions an element type for which no declaration is provided, but this is not an error.
[Definition: An element type declaration takes the form:]
[45] elementdecl ::= '<!ELEMENT' S Name S contentspec S? '>'   [VC: Unique Element Type Declaration]
[46] contentspec ::= 'EMPTY' | 'ANY' | Mixed | children
where the Name gives the element type being declared.
Validity constraint: Unique Element Type Declaration
No element type may be declared more than once.
Examples of element type declarations:
<!ELEMENT br EMPTY>
<!ELEMENT p (#PCDATA|emph)* >
<!ELEMENT %name.para; %content.para; >
<!ELEMENT container ANY>
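Production [45] and the Unique Element Type Declaration constraint can be approximated for simple DTD text. The following is a minimal sketch. It assumes that parameter-entity references have not been used or have already been expanded, and that no contentspec contains a `>`. The names `element_decls` and `duplicate_declarations` are hypothetical.

```python
import re
from collections import Counter

# Simplified form of '<!ELEMENT' S Name S contentspec S? '>'.
ELEMENT_DECL = re.compile(r"<!ELEMENT\s+([^\s>]+)\s+([^>]*?)\s*>")

def element_decls(dtd_text):
    """Return (name, contentspec) pairs in declaration order."""
    return ELEMENT_DECL.findall(dtd_text)

def duplicate_declarations(dtd_text):
    """Element types declared more than once."""
    counts = Counter(name for name, _ in element_decls(dtd_text))
    return sorted(name for name, n in counts.items() if n > 1)

dtd = """
<!ELEMENT br EMPTY>
<!ELEMENT p (#PCDATA|emph)* >
<!ELEMENT container ANY>
<!ELEMENT br ANY>
"""
print(element_decls(dtd))
# [('br', 'EMPTY'), ('p', '(#PCDATA|emph)*'), ('container', 'ANY'), ('br', 'ANY')]
print(duplicate_declarations(dtd))  # ['br']
```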
[Definition: An element type has element content when elements of that type must contain only child elements (no character data), optionally separated by white space (characters matching the nonterminal S).][Definition: In this case, the constraint includes a content model, a simple grammar governing the allowed types of the child elements and the order in which they are allowed to appear.] The grammar is built on content particles (cps), which consist of names, choice lists of content particles, or sequence lists of content particles:
[47] children ::= (choice | seq) ('?' | '*' | '+')?
[48] cp ::= (Name | choice | seq) ('?' | '*' | '+')?
[49] choice ::= '(' S? cp ( S? '|' S? cp )+ S? ')'   [VC: Proper Group/PE Nesting]
[50] seq ::= '(' S? cp ( S? ',' S? cp )* S? ')'   [VC: Proper Group/PE Nesting]
where each Name is the type of an element which
may appear as a child. Any content particle
in a choice list may appear in the element
content at the location where the choice list appears in the grammar;
content particles occurring in a sequence list must each appear in the
element content in the order given in
the list. The optional character following a name or list governs whether
the element or the content particles in the list may occur one or more
(+), zero or more (*), or zero or one times
(?). The absence of such an operator means that the element
or content particle must appear exactly once. This syntax and meaning
are identical to those used in the productions in this specification.
The content of an element matches a content model if and only if it is possible to trace out a path through the content model, obeying the sequence, choice, and repetition operators and matching each element in the content against an element type in the content model. For compatibility, it is an error if an element in the document can match more than one occurrence of an element type in the content model. For more information, see E Deterministic Content Models.
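One common way to implement this path-tracing test, though the specification does not mandate it, is to translate the content model into a regular expression over the sequence of child element names. The following is a minimal sketch that handles element content only (no #PCDATA). It uses a simplified ASCII-centred pattern in place of the Name production. Because Python's backtracking regex engine also accepts non-deterministic models, it does not enforce the compatibility rule stated above. The names `content_model_regex` and `matches` are hypothetical.

```python
import re

# Content-model tokens: names, parentheses, connectors and occurrence marks.
# The name pattern is a simplified stand-in for the Name production.
TOKEN = re.compile(r"[A-Za-z_:][-\w.:]*|[()|,?*+]")

def content_model_regex(model):
    """Translate an element-content model into a regex over child names,
    where each child name in the input is followed by a single space."""
    parts = []
    for tok in TOKEN.findall(model):
        if tok == "(":
            parts.append("(?:")
        elif tok == ",":
            continue                      # sequence = concatenation
        elif tok in {")", "|", "?", "*", "+"}:
            parts.append(tok)             # same meaning in regex syntax
        else:
            parts.append("(?:" + re.escape(tok) + " )")
    return re.compile("".join(parts))

def matches(children, model):
    """True if the child element names, in order, match the content model."""
    text = "".join(name + " " for name in children)
    return content_model_regex(model).fullmatch(text) is not None

print(matches(["front", "body"], "(front, body, back?)"))              # True
print(matches(["head", "p", "note", "div2"],
              "(head, (p | list | note)*, div2*)"))                     # True
print(matches(["head", "div2", "p"], "(head, (p | list | note)*, div2*)"))  # False
```

The trailing space after each name prevents a model name such as p from matching a longer child name such as para.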
Validity constraint: Proper Group/PE Nesting
Parameter-entity replacement text must be properly nested with parenthesized groups. That is to say, if either of the opening or closing parentheses in a choice, seq, or Mixed construct is contained in the replacement text for a parameter entity, both must be contained in the same replacement text.
For interoperability, if a parameter-entity
reference appears in a choice, seq,
or Mixed construct, its replacement text should
contain at least one non-blank character, and neither the first nor
last non-blank character of the replacement text should be a connector
(| or ,).
Examples of element-content models:
<!ELEMENT spec (front, body, back?)>
<!ELEMENT div1 (head, (p | list | note)*, div2*)>
<!ELEMENT dictionary-body (%div.mix; | %dict.mix;)*>
[Definition: An element type has mixed content when elements of that type may contain character data, optionally interspersed with child elements.] In this case, the types of the child elements may be constrained, but not their order or their number of occurrences:
[51] Mixed ::= '(' S? '#PCDATA' (S? '|' S? Name)* S? ')*'
| '(' S? '#PCDATA' S? ')'   [VC: Proper Group/PE Nesting] [VC: No Duplicate Types]
where the Names give the types of elements that may appear as children. The keyword #PCDATA derives historically from the term "parsed character data."
Validity constraint: No Duplicate Types
The same name must not appear more than once in a single mixed-content declaration.
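For mixed content, only the set of permitted child types matters, not their order or count. The following is a minimal sketch of both the No Duplicate Types check and the content check. It assumes the declaration contains no parameter-entity references, unlike the second example below. The names `mixed_names` and `mixed_content_ok` are hypothetical.

```python
def mixed_names(decl):
    """Extract child names from a mixed-content model such as
    '(#PCDATA|a|ul|b|i|em)*', rejecting duplicates (VC: No Duplicate Types)."""
    inner = decl.strip()
    if inner.endswith("*"):
        inner = inner[:-1].rstrip()
    inner = inner.strip()[1:-1]            # drop the surrounding parentheses
    tokens = [t.strip() for t in inner.split("|")]
    if tokens[0] != "#PCDATA":
        raise ValueError("mixed content must start with #PCDATA")
    names = tokens[1:]
    seen = set()
    for name in names:
        if name in seen:
            raise ValueError(f"duplicate type in mixed content: {name}")
        seen.add(name)
    return names

def mixed_content_ok(child_names, decl):
    """Order and number of children are unconstrained; only types are checked."""
    allowed = set(mixed_names(decl))
    return all(name in allowed for name in child_names)

print(mixed_names("(#PCDATA|a|ul|b|i|em)*"))  # ['a', 'ul', 'b', 'i', 'em']
print(mixed_content_ok(["em", "a", "em"], "(#PCDATA|a|ul|b|i|em)*"))  # True
```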
Examples of mixed content declarations:
<!ELEMENT p (#PCDATA|a|ul|b|i|em)*>
<!ELEMENT p (#PCDATA | %font; | %phrase; | %special; | %form;)* >
<!ELEMENT b (#PCDATA)>
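Attributes are used to associate name-value pairs with elements. Attribute specifications may appear only within start-tags and empty-element tags; thus, the productions used to recognize them appear in 3.1 Start-Tags, End-Tags, and Empty-Element Tags. Attribute-list declarations may be used:
- To define the set of attributes pertaining to a given element type.
- To establish type constraints for these attributes.
- To provide default values for attributes.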
[Definition: Attribute-list declarations specify the name, data type, and default value (if any) of each attribute associated with a given element type:]
[52] AttlistDecl ::= '<!ATTLIST' S Name AttDef* S? '>'
[53] AttDef ::= S Name S AttType S DefaultDecl
The Name in the AttlistDecl rule is the type of an element. At user option, an XML processor may issue a warning if attributes are declared for an element type not itself declared, but this is not an error. The Name in the AttDef rule is the name of the attribute.
When more than one AttlistDecl is provided for a given element type, the contents of all those provided are merged. When more than one definition is provided for the same attribute of a given element type, the first declaration is binding and later declarations are ignored. For interoperability, writers of DTDs may choose to provide at most one attribute-list declaration for a given element type, at most one attribute definition for a given attribute name in an attribute-list declaration, and at least one attribute definition in each attribute-list declaration. For interoperability, an XML processor may at user option issue a warning when more than one attribute-list declaration is provided for a given element type, or more than one attribute definition is provided for a given attribute, but this is not an error.
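The merge rule (first definition binding, later ones ignored) can be sketched as follows. This minimal version assumes the attribute definitions have already been parsed into `(element_type, attribute_name, definition)` triples in document order. That input layout and the name `merge_attlists` are hypothetical. The optional warnings cover repeated attribute definitions only, not repeated attribute-list declarations.

```python
def merge_attlists(decls):
    """Merge attribute definitions given in document order.

    decls: iterable of (element_type, attribute_name, definition) triples.
    Returns ({element_type: {attribute_name: definition}}, warnings); the
    first definition of an attribute is binding and later ones are ignored.
    """
    merged = {}
    warnings = []
    for element, attr, definition in decls:
        attrs = merged.setdefault(element, {})
        if attr in attrs:
            warnings.append(f"{element}/{attr}: later definition ignored")
            continue
        attrs[attr] = definition
    return merged, warnings

decls = [
    ("list", "type", '(bullets|ordered|glossary) "ordered"'),
    ("list", "id", "ID #IMPLIED"),
    ("list", "type", "CDATA #IMPLIED"),   # ignored: type is already defined
]
merged, warnings = merge_attlists(decls)
print(merged["list"]["type"])  # (bullets|ordered|glossary) "ordered"
print(warnings)                # ['list/type: later definition ignored']
```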
XML attribute types are of three kinds: a string type, a set of tokenized types, and enumerated types. The string type may take any literal string as a value; the tokenized types have varying lexical and semantic constraints. The validity constraints noted in the grammar are applied after the attribute value has been normalized as described in 3.3 Attribute-List Declarations.
[54] AttType ::= StringType | TokenizedType | EnumeratedType
[55] StringType ::= 'CDATA'
[56] TokenizedType ::= 'ID'   [VC: ID] [VC: One ID per Element Type] [VC: ID Attribute Default]
| 'IDREF'   [VC: IDREF]
| 'IDREFS'   [VC: IDREF]
| 'ENTITY'   [VC: Entity Name]
| 'ENTITIES'   [VC: Entity Name]
| 'NMTOKEN'   [VC: Name Token]
| 'NMTOKENS'   [VC: Name Token]
Validity constraint: ID
Values of type ID must match the Name production. A name must not appear more than once in an XML document as a value of this type; i.e., ID values must uniquely identify the elements which bear them.
Validity constraint: One ID per Element Type
No element type may have more than one ID attribute specified.
Validity constraint: ID Attribute Default
An ID attribute must have a declared default of #IMPLIED or #REQUIRED.
Validity constraint: IDREF
Values of type IDREF must match the Name production, and values of type IDREFS must match Names; each Name must match the value of an ID attribute on some element in the XML document; i.e., IDREF values must match the value of some ID attribute.
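The ID and IDREF constraints amount to a uniqueness check followed by a lookup. The following is a minimal sketch over an `xml.etree.ElementTree` tree. It assumes attribute types are supplied in a dict keyed by `(element_tag, attribute_name)`, because ElementTree does not read DTDs. It also uses an ASCII-only approximation of the Name production. The function name `check_ids` and the sample elements are hypothetical.

```python
import re
import xml.etree.ElementTree as ET

# ASCII-only approximation of the Name production (an assumption: the real
# production admits many more Unicode characters).
NAME = re.compile(r"[A-Za-z_:][-A-Za-z0-9._:]*")

def check_ids(root, attr_types):
    """Return a list of ID/IDREF/IDREFS violations found under root.

    attr_types maps (element_tag, attribute_name) to 'ID', 'IDREF' or
    'IDREFS'; a real validator would read these types from the DTD.
    """
    errors, ids, refs = [], set(), []
    for elem in root.iter():
        for name, value in elem.attrib.items():
            kind = attr_types.get((elem.tag, name))
            if kind == "ID":
                if not NAME.fullmatch(value):
                    errors.append(f"ID {value!r} is not a Name")
                elif value in ids:
                    errors.append(f"duplicate ID {value!r}")
                ids.add(value)
            elif kind == "IDREF":
                refs.append(value)
            elif kind == "IDREFS":
                refs.extend(value.split(" "))  # Names: Name (#x20 Name)*
    # Checked after the walk, because an IDREF may point to a later element.
    errors.extend(f"IDREF {ref!r} matches no ID" for ref in refs if ref not in ids)
    return errors

doc = ET.fromstring(
    '<doc><termdef id="dt-dog"/><termdef id="dt-cat"/>'
    '<ref target="dt-dog"/><refs targets="dt-dog dt-cat dt-fox"/></doc>'
)
TYPES = {("termdef", "id"): "ID", ("ref", "target"): "IDREF",
         ("refs", "targets"): "IDREFS"}
print(check_ids(doc, TYPES))  # ["IDREF 'dt-fox' matches no ID"]
```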
Validity constraint: Entity Name
Values of type ENTITY must match the Name production; values of type ENTITIES must match Names; each Name must match the name of an unparsed entity declared in the DTD.
Validity constraint: Name Token
Values of type NMTOKEN must match the Nmtoken production; values of type NMTOKENS must match Nmtokens.
[Definition: Enumerated attributes can take one of a list of values provided in the declaration]. There are two kinds of enumerated types:
[57] EnumeratedType ::= NotationType | Enumeration
[58] NotationType ::= 'NOTATION' S '(' S? Name (S? '|' S? Name)* S? ')'   [VC: Notation Attributes] [VC: One Notation Per Element Type] [VC: No Notation on Empty Element]
[59] Enumeration ::= '(' S? Nmtoken (S? '|' S? Nmtoken)* S? ')'   [VC: Enumeration]
A NOTATION attribute identifies a notation, declared in the DTD with associated system and/or public identifiers, to be used in interpreting the element to which the attribute is attached.
Validity constraint: Notation Attributes
Values of this type must match one of the notation names included in the declaration; all notation names in the declaration must be declared.
Validity constraint: One Notation Per Element Type
No element type may have more than one NOTATION attribute specified.
Validity constraint: No Notation on Empty Element
For compatibility, an attribute of type NOTATION must not be declared on an element declared EMPTY.
Validity constraint: Enumeration
Values of this type must match one of the Nmtoken tokens in the declaration.
For interoperability, the same Nmtoken should not occur more than once in the enumerated attribute types of a single element type.
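The Enumeration constraint and the interoperability advice can both be checked with plain set operations. The following is a minimal sketch. It assumes the attribute value has already been normalized and that the enumerations of one element type are given as a dict from attribute name to token list. The names `check_enumeration` and `repeated_enumeration_tokens` are hypothetical.

```python
def check_enumeration(value, tokens):
    """VC: Enumeration. The normalized value must be one of the tokens."""
    return value in tokens

def repeated_enumeration_tokens(enumerations):
    """Tokens occurring more than once across the enumerated attribute
    types of a single element type (advised against for interoperability).

    enumerations: {attribute_name: [tokens]} for one element type.
    """
    seen, repeated = set(), set()
    for tokens in enumerations.values():
        for token in tokens:
            if token in seen:
                repeated.add(token)
            seen.add(token)
    return sorted(repeated)

print(check_enumeration("ordered", ["bullets", "ordered", "glossary"]))  # True
print(repeated_enumeration_tokens({
    "type": ["bullets", "ordered", "glossary"],
    "style": ["plain", "ordered"],
}))  # ['ordered']
```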
An attribute declaration provides information on whether the attribute's presence is required, and if not, how an XML processor should react if a declared attribute is absent in a document.
[60] DefaultDecl ::= '#REQUIRED' | '#IMPLIED'
| (('#FIXED' S)? AttValue)   [VC: Required Attribute] [VC: Attribute Default Legal] [WFC: No < in Attribute Values] [VC: Fixed Attribute Default]
In an attribute declaration, #REQUIRED means that the attribute must always be provided, #IMPLIED that no default value is provided. [Definition: If the declaration is neither #REQUIRED nor #IMPLIED, then the AttValue value contains the declared default value; the #FIXED keyword states that the attribute must always have the default value. If a default value is declared, when an XML processor encounters an omitted attribute, it is to behave as though the attribute were present with the declared default value.]
Validity constraint: Required Attribute
If the default declaration is the keyword #REQUIRED, then the attribute must be specified for all elements of the type in the attribute-list declaration.
Validity constraint: Attribute Default Legal
The declared default value must meet the lexical constraints of the declared attribute type.
Validity constraint: Fixed Attribute Default
If an attribute has a default value declared with the #FIXED keyword, instances of that attribute must match the default value.
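The interplay of #REQUIRED, #IMPLIED, #FIXED, and plain defaults can be summarized in a short routine. The following is a minimal sketch that mirrors the termdef, list, and form declarations in the examples that follow. The `(keyword, default)` table layout and the name `apply_defaults` are hypothetical, and attribute values are assumed to be already normalized.

```python
def apply_defaults(element_type, specified, attlists):
    """Apply declared defaults to one element's attributes.

    attlists maps element_type -> {attr: (keyword, default)}, where keyword
    is '#REQUIRED', '#IMPLIED', '#FIXED', or None for a plain default.
    Returns (attributes, errors).
    """
    attrs = dict(specified)
    errors = []
    for name, (keyword, default) in attlists.get(element_type, {}).items():
        if name not in attrs:
            if keyword == "#REQUIRED":
                errors.append(f"{element_type}: missing required attribute {name}")
            elif keyword in ("#FIXED", None):
                attrs[name] = default  # behave as if the default were present
            # '#IMPLIED': no default value is supplied
        elif keyword == "#FIXED" and attrs[name] != default:
            errors.append(f"{element_type}: {name} must be {default!r}")
    return attrs, errors

ATTLISTS = {
    "termdef": {"id": ("#REQUIRED", None), "name": ("#IMPLIED", None)},
    "list": {"type": (None, "ordered")},
    "form": {"method": ("#FIXED", "POST")},
}
print(apply_defaults("list", {}, ATTLISTS))               # ({'type': 'ordered'}, [])
print(apply_defaults("form", {"method": "GET"}, ATTLISTS))
# ({'method': 'GET'}, ["form: method must be 'POST'"])
```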
Examples of attribute-list declarations:
<!ATTLIST termdef
id ID #REQUIRED
name CDATA #IMPLIED>
<!ATTLIST list
type (bullets|ordered|glossary) "ordered">
<!ATTLIST form
method CDATA #FIXED "POST">
Before the value of an attribute is passed to the application or checked for validity, the XML processor must normalize the attribute value by applying the algorithm below, or by using some other method such that the value passed to the application is the same as that produced by the algorithm.
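1. All line breaks must have been normalized on input to #xA as described in 2.11 End-of-Line Handling, so the rest of this algorithm operates on text normalized in this way.
2. Begin with a normalized value consisting of the empty string.
3. For each character, entity reference, or character reference in the unnormalized attribute value, beginning with the first and continuing to the last, do the following:
- For a character reference, append the referenced character to the normalized value.
- For an entity reference, recursively apply step 3 of this algorithm to the replacement text of the entity.
- For a white space character (#x20, #xD, #xA, #x9), append a space character (#x20) to the normalized value.
- For another character, append the character to the normalized value.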
If the attribute type is not CDATA, then the XML processor must further process the normalized attribute value by discarding any leading and trailing space (#x20) characters, and by replacing sequences of space (#x20) characters by a single space (#x20) character.
Note that if the unnormalized attribute value contains a character reference to a white space character other than space (#x20), the normalized value contains the referenced character itself (#xD, #xA or #x9). This contrasts with the case where the unnormalized value contains a white space character (not a reference), which is replaced with a space character (#x20) in the normalized value and also contrasts with the case where the unnormalized value contains an entity reference whose replacement text contains a white space character; being recursively processed, the white space character is replaced with a space character (#x20) in the normalized value.
All attributes for which no declaration has been read should be treated by a non-validating processor as if declared CDATA.
Following are examples of attribute normalization. Given the following declarations:
<!ENTITY d "&#xD;">
<!ENTITY a "&#xA;">
<!ENTITY da "&#xD;&#xA;">
the attribute specifications in the left column below would be normalized
to the character sequences of the middle column if the attribute a
is declared NMTOKENS and to those of the right column if a
is declared CDATA.
| Attribute specification | a is NMTOKENS | a is CDATA |
|---|---|---|
| a="(line break)(line break)xyz" | x y z | #x20 #x20 x y z |
| a="&d;&d;A&a;&a;B&da;" | A #x20 B | #x20 #x20 A #x20 #x20 B #x20 #x20 |
| a="&#xd;&#xd;A&#xa;&#xa;B&#xd;&#xa;" | #xD #xD A #xA #xA B #xD #xA | #xD #xD A #xA #xA B #xD #xA |
Note that the last example is invalid (but well-formed) if a
is declared to be of type NMTOKENS.
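The normalization algorithm and the rows of the table above can be reproduced with a short routine. The following is a minimal sketch that assumes:

- The input value has already had its line breaks normalized to #xA.
- Entity replacement texts are supplied in a dict whose character references were already expanded at declaration time, as for d, a, and da.
- The five predefined entities are appended as their characters directly, which gives the same result as expanding them.

The names `normalize` and `ENTITIES` are hypothetical. The sketch also assumes entities are not recursive, which XML forbids.

```python
import re

# Character references (hex, decimal) and general entity references.
REF = re.compile(r"&#x([0-9A-Fa-f]+);|&#([0-9]+);|&([A-Za-z_:][-\w.:]*);")
# Predefined entities, appended directly (equivalent to expanding them).
PREDEFINED = {"lt": "<", "gt": ">", "amp": "&", "apos": "'", "quot": '"'}
WHITE_SPACE = "\x20\x0d\x0a\x09"

def _literal(chars, out):
    # Literal white space characters become #x20; other characters are kept.
    out.extend(" " if ch in WHITE_SPACE else ch for ch in chars)

def _append(text, entities, out):
    # Step 3: walk characters and references from first to last.
    pos = 0
    for m in REF.finditer(text):
        _literal(text[pos:m.start()], out)
        hex_digits, dec_digits, name = m.groups()
        if hex_digits is not None:
            out.append(chr(int(hex_digits, 16)))    # character reference
        elif dec_digits is not None:
            out.append(chr(int(dec_digits)))
        elif name in PREDEFINED:
            out.append(PREDEFINED[name])
        else:
            _append(entities[name], entities, out)  # recurse into replacement text
        pos = m.end()
    _literal(text[pos:], out)

def normalize(value, entities, is_cdata):
    """Normalize an attribute value whose line breaks are already #xA."""
    out = []
    _append(value, entities, out)
    result = "".join(out)
    if not is_cdata:
        # Non-CDATA: drop leading/trailing #x20 and collapse runs of #x20.
        result = re.sub(" +", " ", result.strip(" "))
    return result

# Replacement texts of d, a and da: their character references were
# already expanded when the entities were declared.
ENTITIES = {"d": "\r", "a": "\n", "da": "\r\n"}

print(repr(normalize("&d;&d;A&a;&a;B&da;", ENTITIES, is_cdata=True)))   # '  A  B  '
print(repr(normalize("&d;&d;A&a;&a;B&da;", ENTITIES, is_cdata=False)))  # 'A B'
```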
[Definition: Conditional sections are portions of the document type declaration external subset which are included in, or excluded from, the logical structure of the DTD based on the keyword which governs them.]
[61] conditionalSect ::= includeSect | ignoreSect
[62] includeSect ::= '<![' S? 'INCLUDE' S? '[' extSubsetDecl ']]>'   [VC: Proper Conditional Section/PE Nesting]
[63] ignoreSect ::= '<![' S? 'IGNORE' S? '[' ignoreSectContents* ']]>'   [VC: Proper Conditional Section/PE Nesting]
[64] ignoreSectContents ::= Ignore ('<![' ignoreSectContents ']]>' Ignore)*
[65] Ignore ::= Char* - (Char* ('<![' | ']]>') Char*)
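Productions [63] to [65] mean that an ignored section can be skipped by tracking only nested `<![` and `]]>` pairs, without interpreting anything else inside it. The following is a minimal sketch. It assumes parameter-entity references have already been handled, and the function name `end_of_ignore_sect` is hypothetical. A nested section is skipped along with the outer IGNORE section regardless of its own keyword.

```python
def end_of_ignore_sect(text, start):
    """Return the index just past the ']]>' that closes an IGNORE section.

    text[start:] is the content after '<![IGNORE['. Nested '<![' ... ']]>'
    pairs are counted as in productions [63]-[65]; nothing else inside the
    section is interpreted.
    """
    depth = 1
    i = start
    while i < len(text):
        if text.startswith("<![", i):
            depth += 1
            i += 3
        elif text.startswith("]]>", i):
            depth -= 1
            i += 3
            if depth == 0:
                return i
        else:
            i += 1
    raise ValueError("unterminated conditional section")

dtd = ("<![IGNORE[ <!ELEMENT a ANY> <![INCLUDE[ <!ELEMENT b ANY> ]]> ]]>"
       "<!ELEMENT c EMPTY>")
end = end_of_ignore_sect(dtd, len("<![IGNORE["))
print(dtd[end:])  # <!ELEMENT c EMPTY>
```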
Validity constraint: Proper Conditional Section/PE Nesting
If any of the "<![", "[",
or "]]>" of a conditional section is contained
in the replacement text for a parameter-entity reference, all of them
must be contained in the same replacement text.
Like the internal and external DTD subsets, a conditional section may contain one or more complete declarations, comments, processing instructions, or nested conditional sections, intermingled with white space.
If the keyword of the conditional section is INCLUDE, then the
contents of the conditional section are part of the DTD. If the keyword
of the conditional section is IGNORE, then the contents of the
conditional section are not logically part of the DTD. If a conditional
section with a keyword of INCLUDE occurs within a larger conditional
section with a keyword of IGNORE, both the outer and the inner
conditional sections are ignored. The contents of an ignored conditional
section are parsed by ignoring all characters after the "["