XML Syntax
XML Declaration
The XML document can optionally have an XML declaration.
<?xml version = "1.0" encoding = "UTF-8"?>
where:
- version is the XML version
- encoding specifies the character encoding used in the document.
Syntax Rules for XML Declaration
- The XML declaration is case sensitive and must begin with <?xml, where xml is written in lower case.
- If the document contains an XML declaration, it must be the very first thing in the document; nothing, not even white space, may precede it.
- The HTTP protocol can override the value of encoding that you put in the XML declaration.
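The placement and case rules above can be checked with a parser. The following is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name is_well_formed and the sample documents are hypothetical, chosen only for this illustration.

```python
import xml.etree.ElementTree as ET


def is_well_formed(document: bytes) -> bool:
    """Return True if the parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# Declaration at the very start of the document: accepted.
first = b'<?xml version="1.0" encoding="UTF-8"?><note/>'

# A line break before the declaration: rejected.
not_first = b'\n<?xml version="1.0" encoding="UTF-8"?><note/>'

# Upper-case XML in the declaration: rejected.
upper_case = b'<?XML version="1.0" encoding="UTF-8"?><note/>'

for label, doc in [("first", first), ("not first", not_first), ("upper case", upper_case)]:
    print(label, is_well_formed(doc))
```

Only the first document should be reported as well formed.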
Tags and Elements
An XML file consists of several XML elements, also called XML nodes or XML tags.
XML element names are enclosed in angle brackets < > as shown below:
<element>
Syntax Rules for Tags and Elements
Element Syntax
Each XML element must be closed, either with a matching pair of start and end tags
<element>....</element>
or, for an empty element, with a single self-closing tag
<element/>
Nesting of Elements
An XML element may contain multiple XML elements as children, but the child elements must not overlap, i.e., the end tag of an element must have the same name as the most recent unmatched start tag.
An example of incorrectly nested tags:
<?xml version = "1.0"?>
<contact-info>
<company>TutorialReference
</contact-info>
</company>
An example of correctly nested tags:
<?xml version = "1.0"?>
<contact-info>
<company>TutorialsPoint</company>
</contact-info>
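The nesting rule can be verified by feeding both variants to a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the helper name parse_error is hypothetical, and the exact wording of the error message depends on the parser.

```python
import xml.etree.ElementTree as ET

# The end tags are in the wrong order, so the elements overlap.
wrong = """<contact-info>
<company>TutorialReference
</contact-info>
</company>"""

# Each end tag matches the most recent unmatched start tag.
correct = """<contact-info>
<company>TutorialReference</company>
</contact-info>"""


def parse_error(document: str):
    """Return the parser's error message, or None if the document parses."""
    try:
        ET.fromstring(document)
        return None
    except ET.ParseError as exc:
        return str(exc)


print(parse_error(wrong))    # a parser error message
print(parse_error(correct))  # None
```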
Root Element
An XML document can only have one root element.
An example of a document that is not well formed because it has two root elements:
<x>...</x>
<y>...</y>
The following example shows a well-formed XML document with a single root element:
<root>
<x>...</x>
<y>...</y>
</root>
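As a quick check of the single-root rule, the sketch below parses both forms with Python's standard xml.etree.ElementTree module. The variable names are illustrative only.

```python
import xml.etree.ElementTree as ET

two_roots = "<x>...</x><y>...</y>"
one_root = "<root><x>...</x><y>...</y></root>"

# A second top-level element is rejected by the parser.
try:
    ET.fromstring(two_roots)
    two_roots_ok = True
except ET.ParseError:
    two_roots_ok = False

# Wrapped in a single root, the same elements become its children.
root = ET.fromstring(one_root)
child_names = [child.tag for child in root]

print(two_roots_ok)  # False
print(root.tag, child_names)  # root ['x', 'y']
```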
Case Sensitivity
The names of XML elements are case sensitive. That means the start tag and the end tag of an element must be written in exactly the same case.
For example, <address> is different from <Address>.
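To illustrate, the sketch below (Python standard library; the contact wrapper element and sample text are assumptions made for the example) shows that a parser treats address and Address as two unrelated element names, and rejects an element whose start and end tags differ only in case.

```python
import xml.etree.ElementTree as ET

doc = "<contact><address>lower</address><Address>upper</Address></contact>"
root = ET.fromstring(doc)

# The two names are looked up independently.
lower = root.find("address").text
upper = root.find("Address").text
print(lower, upper)  # lower upper

# A start tag and an end tag that differ in case do not match.
try:
    ET.fromstring("<address>text</Address>")
    mixed_ok = True
except ET.ParseError:
    mixed_ok = False
print(mixed_ok)  # False
```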
XML Attributes
An attribute specifies a single property for the element, using a name/value pair. An XML-element can have one or more attributes.
<a href="https://tutorialreference.com/">TutorialReference</a>
Here href is the attribute name and https://tutorialreference.com/ is the attribute value.
Syntax Rules for XML Attributes
- Attribute names in XML (unlike HTML) are case sensitive. That is, HREF and href are considered two different XML attributes.
- The same attribute cannot appear more than once in a single start tag, so an attribute cannot be given two values.
- Attribute names are written without quotation marks, whereas attribute values must always appear in single or double quotation marks.
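The three attribute rules above can be sketched with Python's standard xml.etree.ElementTree module. The helper name parses and the sample values are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(document: str) -> bool:
    """Return True if the parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


# HREF and href are two different attributes, so both may appear together.
mixed = ET.fromstring('<a HREF="upper" href="lower"/>')
print(mixed.attrib)

# The same attribute name twice in one start tag is rejected.
print(parses('<a href="1" href="2"/>'))  # False

# An unquoted attribute value is rejected.
print(parses('<a href=unquoted/>'))  # False

# Single quotes are accepted as well as double quotes.
print(parses("<a href='single'/>"))  # True
```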
XML References
References usually allow you to add or include additional text or markup in an XML document. References always begin with the symbol &, which is a reserved character, and end with the symbol ;.
XML has two types of references:
- Entity References: An entity reference contains a name between the start and end delimiters. For example, &amp; where amp is the name. The name refers to a predefined string of text and/or markup.
- Character References: A character reference, such as &#65;, contains a hash mark (#) followed by a number. The number always refers to the Unicode code point of a character. In this case, 65 refers to the letter A.
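A parser resolves both kinds of reference before handing the text to the application. The sketch below uses Python's standard xml.etree.ElementTree module; the element name and sample text are assumptions for the example, and the hexadecimal form &#x41; is included only to show that it names the same code point as &#65;.

```python
import xml.etree.ElementTree as ET

# &amp; is an entity reference; &#65; and &#x41; are character references
# (decimal and hexadecimal) to the same Unicode code point.
element = ET.fromstring("<text>Tom &amp; Jerry, grade &#65; (&#x41;)</text>")

resolved = element.text
print(resolved)  # Tom & Jerry, grade A (A)
```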
XML Text
XML element names and XML attribute names are case sensitive, which means that the start and end tags of an element must be written in the same case. To avoid character encoding problems, all XML files should be saved as UTF-8 or UTF-16 Unicode files.
White space characters such as blanks, tabs, and line breaks between the attributes inside a tag are ignored. White space between XML elements is passed on by the parser, but many applications treat it as insignificant.
Some characters are reserved by the XML syntax itself, so they cannot be used directly. They are summarized in the table below:
Not Allowed Character | Replacement Entity | Character Description |
---|---|---|
< | &lt; | less than |
> | &gt; | greater than |
& | &amp; | ampersand |
' | &apos; | apostrophe |
" | &quot; | quotation mark |
Only < and & are strictly illegal in XML text, but it is a good habit to replace > with &gt; as well.
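In practice the replacements are usually applied by a library function rather than by hand. The following sketch uses escape from Python's standard xml.sax.saxutils module, which replaces &, < and > by default and accepts an optional mapping for further characters. The sample strings and the code element name are assumptions for the example.

```python
import xml.etree.ElementTree as ET
from xml.sax.saxutils import escape

raw = "if a < b && b > c"

# By default escape() replaces &, < and >.
escaped = escape(raw)
print(escaped)  # if a &lt; b &amp;&amp; b &gt; c

# Quotation marks and apostrophes can be added through the optional mapping.
quoted = escape("Say \"hi\" & 'bye'", {'"': "&quot;", "'": "&apos;"})
print(quoted)  # Say &quot;hi&quot; &amp; &apos;bye&apos;

# Parsing the escaped text gives back the original string.
round_trip = ET.fromstring(f"<code>{escaped}</code>").text
print(round_trip == raw)  # True
```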
Additional guidelines
Comments in XML
The syntax for writing comments in XML is similar to that of HTML:
<!-- This is a comment -->
Two dashes in the middle of a comment are not allowed:
<!-- This is an invalid -- comment -->
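The double-dash rule can be checked with a parser. This is a minimal sketch using Python's standard xml.etree.ElementTree module; the note wrapper element and the helper name parses are hypothetical.

```python
import xml.etree.ElementTree as ET


def parses(document: str) -> bool:
    """Return True if the parser accepts the document (hypothetical helper)."""
    try:
        ET.fromstring(document)
        return True
    except ET.ParseError:
        return False


valid = "<note><!-- This is a comment -->text</note>"
invalid = "<note><!-- This is an invalid -- comment -->text</note>"

print(parses(valid))    # True
print(parses(invalid))  # False
```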
White-space is Preserved in XML
XML does not collapse runs of white space in element content, whereas HTML collapses multiple white-space characters into a single one.
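As a small illustration, the sketch below parses an element whose text contains a run of spaces and a tab, using Python's standard xml.etree.ElementTree module. The element name and text are assumptions for the example.

```python
import xml.etree.ElementTree as ET

element = ET.fromstring("<greeting>Hello      World\t!</greeting>")

# The six spaces and the tab arrive unchanged in the parsed text.
preserved = element.text
print(repr(preserved))  # 'Hello      World\t!'
```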
XML Stores New Line as LF
- Windows applications store a new line as: carriage return and line feed (CR+LF).
- Unix and Mac OS X (macOS) use LF.
- Old Mac systems use CR.
- XML stores a new line as LF: an XML parser converts CR+LF and a lone CR to LF before passing the text to the application.
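The line-ending behaviour can be observed directly. The sketch below feeds a document containing all three line-ending styles to Python's standard xml.etree.ElementTree module; the lines element and its content are assumptions for the example.

```python
import xml.etree.ElementTree as ET

# The input mixes CR+LF, a lone CR, and a lone LF.
document = b"<lines>one\r\ntwo\rthree\nfour</lines>"

text = ET.fromstring(document).text
print(repr(text))  # 'one\ntwo\nthree\nfour'
```

Whatever the source platform, the application sees only LF.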
Well Formed XML
XML documents that conform to the syntax rules above are said to be "Well Formed" XML documents.