October 06, 2003
XML Schema Design Tips

Designing an XML Schema is not easy. There often seem to be many ways of achieving the same result, but some choices generally turn out to be better than others. This article aims to reduce the trial and error needed to reach good decisions. It is written at a high level and assumes that you are familiar with XML Schema and its terminology and that you've attempted to write your own schema. I provide few examples or illustrations.

Choices


First, one needs to understand the differences between similar concepts and keywords before deciding which path to take when drafting a schema. Here is a discussion of some of the more complicated choices one has to make.

  • Complex Types vs. Model Groups (better)
    When defining a group of elements that can be referenced in different places in your schema, you need to create either a Complex Type or a Group (we'll ignore Simple Types for this discussion). Which to use is not always clear. You should read the article W3C XML Schema Made Simple. To summarize, Complex Types are complicated, so Groups should be used as much as possible. On the other hand, whereas Groups only specify Elements, Complex Types allow for the specification of Attributes as well.
  • Global Type vs. Local Type (better)
    • In order to avoid a complicated schema that is hard to navigate, you should avoid converting Local Types to Global Types unless you have to. Local Types abstract your design better. The only Types that need to be globalized are those involved in recursion, needed by multiple Elements, or that serve as the Base Type for other derived Types.
    • That said, it might be useful to have Global Types for extensibility, but you don't need to worry about that when you're designing the first version of your schema. Once you're done, you can think about globalizing certain Types for extensibility.
  • Global Type as Single Base (avoid)
    One reason to globalize Types might be to architect single-rooted Type hierarchies so as to factor out as much definition as possible for reuse, since only Global Types can be inherited. Especially if you come from an object-oriented programming background, you might decide early in your design process to draw up a list of Types that you can make Global to serve as the Base for all other Types. However, this objective of creating Global Types for definition reuse does not work well.
    • As xfront.com says, anything over 3 levels of derivation is too confusing and unnecessary. I found that to be the case as well and now strive for only a single level of derivation.
    • Sometimes it's just not possible for two closely related Types to share a common base Type. Take the case of a Complex Empty Type and a Simple Type. For example, you might want to keep <elemWithNoArg /> and <elemWithArg>arg</elemWithArg> close together, but their Type definitions cannot be specified in such a way as to share a common Type hierarchy since the XML Schema specification does not allow it.
    • If all you want to do is share Attributes between some of your Types, use Attribute Groups instead of Global Types. This is a more flexible approach anyway, as you might find out later that one of your Types should not accept some of the Attributes that you had anticipated to be reusable.
  • Global Type (better) vs. Global Element
    • Between a Global Element and a Global Type, it's better to make a Global Type because it doesn't lock you into a single Element name.
    • In addition, having a Global Element is not equivalent to using a Global Type. It completely changes the semantics of the schema. Unlike a Global Type, every Global Element can be used as the root Element of an instance document, so each one you add gives XML authors another possible root. But as one can judge from the XML Schemas drafted these days, one generally wants to allow only one root Element. So define only one Global Element, and use Global Types and Groups for everything else.
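
The choices above can be combined in one small schema. What follows is only a sketch under assumptions: the vocabulary (catalog, book, title, author, lang, revision) and the Type and Group names are hypothetical and chosen purely for illustration.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Hypothetical vocabulary: every name below is illustrative only. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- The single Global Element: the only possible root of an instance document. -->
  <xs:element name="catalog" type="CatalogType"/>

  <!-- A Global Type: reusable without locking in an Element name. -->
  <xs:complexType name="CatalogType">
    <xs:sequence>
      <xs:element name="book" maxOccurs="unbounded">
        <!-- A Local Type: needed by this Element only, so it stays anonymous. -->
        <xs:complexType>
          <xs:group ref="titleAndAuthor"/>
          <xs:attributeGroup ref="commonAttributes"/>
        </xs:complexType>
      </xs:element>
    </xs:sequence>
  </xs:complexType>

  <!-- A Group: a reusable run of Elements, with no derivation involved. -->
  <xs:group name="titleAndAuthor">
    <xs:sequence>
      <xs:element name="title" type="xs:string"/>
      <xs:element name="author" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <!-- An Attribute Group: shares Attributes without a common Base Type. -->
  <xs:attributeGroup name="commonAttributes">
    <xs:attribute name="lang" type="xs:language"/>
    <xs:attribute name="revision" type="xs:positiveInteger"/>
  </xs:attributeGroup>

</xs:schema>
```

If a later version needs another Element that carries a title and an author, it can reference the same Group and Attribute Group without any Type hierarchy being introduced.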

Attribute Defaults


Specifying Attribute Defaults in an XML Schema is highly controversial.
Attribute Defaults have nothing to do with validation; they are instead part of the PSVI (Post-Schema-Validation Infoset), which is not easy to get at with Xerces.
Note also that they are not supported by Relax NG, which matters if you want the flexibility to convert your schema to other formats.
In other words, don't use them. Just use the ATTS StyleSheet (or a modified version of it) and a simple XML file that contains the Attribute Defaults.
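
For reference, this is roughly what the construct being advised against looks like. The price Element and its currency Attribute are hypothetical names used only for this sketch.

```xml
<!-- Schema fragment: default="USD" is the Attribute Default under discussion. -->
<xs:element name="price">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="xs:decimal">
        <xs:attribute name="currency" type="xs:string" default="USD"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>

<!-- Instance fragment: valid, but currency="USD" exists only in the PSVI.
     A consumer that reads the raw document sees no currency Attribute at all. -->
<price>9.99</price>
```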

Reminders


  • Empty Elements
    To specify Empty Elements, don't write the following, which actually defines an Element of anyType:
        <xs:element name="product" />
    Write this instead:
        <xs:element name="product">
          <xs:complexType/>
        </xs:element>
  • Union
    Remember that Unions are only for Simple Types, so they're not a viable solution for having a type that can be both <anyCharacter /> and <characterClass>anyCharacter</characterClass> .
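
As a sketch of what a Union can and cannot do: the first fragment below is a legitimate Union, since both members are Simple Types. The second shows one possible way (an assumption of this sketch, not something stated above) to accept either of the two Element forms, using xs:choice inside a Group. The names sizeOrUnbounded and characterSpec are hypothetical.

```xml
<!-- A Union of two Simple Types: a non-negative integer or the word "unbounded". -->
<xs:simpleType name="sizeOrUnbounded">
  <xs:union memberTypes="xs:nonNegativeInteger">
    <xs:simpleType>
      <xs:restriction base="xs:token">
        <xs:enumeration value="unbounded"/>
      </xs:restriction>
    </xs:simpleType>
  </xs:union>
</xs:simpleType>

<!-- An Empty Element and a Text-Only Element cannot be unioned,
     but a Group can offer a choice between the two Elements. -->
<xs:group name="characterSpec">
  <xs:choice>
    <xs:element name="anyCharacter">
      <xs:complexType/>
    </xs:element>
    <xs:element name="characterClass" type="xs:token"/>
  </xs:choice>
</xs:group>
```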

    Posted by juliob at 04:09 PM
XML Schema Terminology

Making sense of XML Schema (XS) terminology is like shitting a bowling ball. Here's some help.

General Terminology Guidelines


  • Type refers to everything about an Element or Attribute, whereas Content only refers to what's in between the opening and closing tag of Elements. In other words, unlike Type, Content describes an Element's child Elements or Text or the fact that the Element is Empty; it does not cover whether the Element has Attributes or not.
  • A Simple Element is an Element that only contains Text, whereas a Complex Element is any other type of Element, i.e. an Element that is Empty, or has Attributes, or has child Elements, or any combination of the above and/or Text.
  • When one talks of Simple Elements or Complex Elements, one means Elements of Simple Type and Elements of Complex Type. Note that by itself, the term Simple Type could be applied to Elements or Attributes.

Terminology


  1. Simple Element and Simple Type
    Definition: a Simple Element is an XML Element that can contain only Text, with no child Elements and no Attributes.
    E.g.:
      XML: <dateborn>1968-03-27</dateborn>
      XS:  <xs:element name="dateborn" type="xs:date" />

    NOTE: for an Element to only contain Text, its type has to be either a pre-defined XML Schema datatype or a custom Simple Type (see next definition).
  2. Custom Simple Type for an Element or Attribute
    Definition: A Custom Simple Type is a new Simple Type based on a List of values or on a Restriction or Union of Simple Type(s). The base Simple Type or Types can be pre-defined XML Schema datatypes or other Custom Simple Types. The XS element xs:simpleType comes into play when you create a new Simple Type.
    E.g.:
      XML: <age>100</age>
      XS:  <xs:element name="age">
             <xs:simpleType>
               <xs:restriction base="xs:integer">
                 <xs:minInclusive value="0"/>
                 <xs:maxInclusive value="100"/>
               </xs:restriction>
             </xs:simpleType>
           </xs:element>

  3. Complex Element and Complex Type
    Definition: A Complex Element is any XML Element that cannot be considered a Simple Element. There are 4 types of Complex Elements.
    1. Complex Empty Element
      Definition: Element that has no Content (but can optionally have Attributes)
      E.g.:
          XML: <product prodid="1345"/>
          XS:  <xs:element name="product">
                 <xs:complexType>
                   <xs:attribute name="prodid" type="xs:positiveInteger"/>
                 </xs:complexType>
               </xs:element>

      NOTE: Contrast with <xs:element name="product"/>, which counter-intuitively does not specify an Empty Element; it specifies an Element of type xs:anyType, which accepts any Attributes and any Content.
    2. Complex Elements-Only Element
      Definition: Element that can contain only other Elements (and optionally Attributes)
      E.g.:
          XML: <employee>
                 <firstname>John</firstname>
                 <lastname>Smith</lastname>
               </employee>
          XS:  <xs:element name="employee">
                 <xs:complexType>
                   <xs:sequence>
                     <xs:element name="firstname" type="xs:string"/>
                     <xs:element name="lastname" type="xs:string"/>
                   </xs:sequence>
                 </xs:complexType>
               </xs:element>

    3. Complex Mixed-Content Element
      Definition: Element that can contain Elements and Text (and optionally Attributes)
      E.g.:
          XML: <letter>
                 Dear Mr.<name>John Smith</name>.
                 Your order <orderid>1032</orderid>
                 will be shipped on <shipdate>2001-07-13</shipdate>.
               </letter>
          XS:  <xs:element name="letter">
                 <xs:complexType mixed="true">
                   <xs:sequence>
                     <xs:element name="name" type="xs:string"/>
                     <xs:element name="orderid" type="xs:positiveInteger"/>
                     <xs:element name="shipdate" type="xs:date"/>
                   </xs:sequence>
                 </xs:complexType>
               </xs:element>

    4. Complex Text-Only Element
      Definition: Element that can only contain Text (and optionally Attributes).
      E.g.:
          XML: <shoesize country="france">35</shoesize>
          XS:  <xs:element name="shoesize">
                 <xs:complexType>
                   <xs:simpleContent>
                     <xs:extension base="xs:integer">
                       <xs:attribute name="country" type="xs:string" />
                     </xs:extension>
                   </xs:simpleContent>
                 </xs:complexType>
               </xs:element>

      Note: This specifies an Element of Complex Type and Simple Content. Because it can contain Attributes, the Element is of Complex Type. But because it does not contain other elements, it is of Simple Content.
      Note: It follows, then, that if there are no Attributes, you might as well use a Simple Type, which would be equivalent.
      Note: The XS syntax inside xs:simpleContent resembles that of xs:simpleType in that you can declare an xs:restriction Element inside. Unlike xs:simpleType, however, xs:simpleContent does not accept xs:union or xs:list; instead it accepts an xs:extension Element so that Attributes may be defined.
      Note: If you want to both restrict the Type of the Text (xs:restriction) and also add an Attribute (xs:extension), you're going to have to separately define an intermediary (xs:restriction) Simple Type that you would then use as base for your xs:simpleContent's xs:extension.
  4. Complex Content
    Definition: Note that there isn't a one-to-one relationship between "Complex Elements" and "Elements with Complex Content". Complex Content refers to what can be specified inside the opening and closing tags of the first 3 of the 4 Complex Elements defined above; Complex Text-Only Element is the exception.
    • XS Shorthand: if neither xs:simpleContent nor xs:complexContent is specified, xs:complexContent with a Restriction of xs:anyType is understood by default. For example, these two declarations are equivalent:
          XS: <xs:element name="product">
                <xs:complexType>
                  <xs:complexContent>
                    <xs:restriction base="xs:anyType">
                      <xs:attribute name="prodid" type="xs:positiveInteger"/>
                    </xs:restriction>
                  </xs:complexContent>
                </xs:complexType>
              </xs:element>

      or more compactly:
          XS: <xs:element name="product">
                <xs:complexType>
                  <xs:attribute name="prodid" type="xs:positiveInteger"/>
                </xs:complexType>
              </xs:element>


      NOTE: this shorthand is why xs:complexType supports the Attribute mixed="true", just as xs:complexContent does.
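
The last note under Complex Text-Only Element (restricting the Text and adding an Attribute at the same time) can be sketched as follows. The Type name shoeSizeValue and the bounds 15 and 55 are hypothetical, chosen only to make the example concrete.

```xml
<!-- Step 1: an intermediary Simple Type that restricts the Text. -->
<xs:simpleType name="shoeSizeValue">
  <xs:restriction base="xs:integer">
    <xs:minInclusive value="15"/>
    <xs:maxInclusive value="55"/>
  </xs:restriction>
</xs:simpleType>

<!-- Step 2: extend that Simple Type inside xs:simpleContent to add the Attribute. -->
<xs:element name="shoesize">
  <xs:complexType>
    <xs:simpleContent>
      <xs:extension base="shoeSizeValue">
        <xs:attribute name="country" type="xs:string"/>
      </xs:extension>
    </xs:simpleContent>
  </xs:complexType>
</xs:element>
```

With these declarations, <shoesize country="france">35</shoesize> is accepted, whereas a value of 99 is rejected by the intermediary Type.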

Posted by juliob at 03:36 AM
October 05, 2003
XML Schema: String Types

The XML Schema specification comes with many different types of strings. What's the difference between them? Here's a quick reference to distinguish them.

Here, the string types are organized hierarchically in accordance with their inheritance structure.

  • string Any Unicode characters
    • normalizedString A string without \r, \n, \t
      • token A normalizedString without leading or trailing spaces and without 2 or more consecutive internal spaces
        • NMTOKEN A token with no spaces, made up of any mixture of letters, digits, and the characters - . _ :
        • Name A token that's like an NMTOKEN but whose initial character must be a letter or one of the characters _ :
          • NCName A Name without the character :
            • ID An NCName; only used for attributes
            • IDREF An NCName; only used for attributes
            • ENTITY An NCName; only used for attributes
  • QName An NCName optionally preceded by a prefix, i.e. another NCName followed by the character :
  • NOTATION A QName that names a notation declared in the schema
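
To make the list concrete, here is a small sketch that puts a few of these types to work. The post Element and its children are hypothetical. Note that the processor applies whitespace normalization before checking a value, so the descriptions above apply to the value after that step.

```xml
<!-- Schema: each declaration uses one of the string types listed above. -->
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">
  <xs:element name="post">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="title" type="xs:normalizedString"/>
        <xs:element name="category" type="xs:token"/>
        <xs:element name="slug" type="xs:NMTOKEN"/>
      </xs:sequence>
      <xs:attribute name="id" type="xs:ID"/>
      <xs:attribute name="seeAlso" type="xs:IDREF"/>
    </xs:complexType>
  </xs:element>
</xs:schema>

<!-- Instance: the slug starts with a digit, which NMTOKEN allows but Name would not.
     The id value contains no colon, as required of an NCName,
     and seeAlso must match an ID somewhere in the same document. -->
<post id="post-2" seeAlso="post-2">
  <title>XML Schema: String Types</title>
  <category>xml schema</category>
  <slug>2003-10-05.string-types</slug>
</post>
```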

Posted by juliob at 11:22 PM
License:
Creative Commons License