XML Schema Design Tips
Designing an XML Schema is not easy. There often seem to be many ways of achieving the same result, but some choices generally turn out to be better than others. This article aims to reduce the inevitable trial and error involved in making those choices. It is written at a high level and assumes that you are familiar with XML Schema and its terminology and that you have tried writing a schema of your own. I provide few examples or illustrations.
First, one needs to understand the differences between similar concepts and keywords before deciding which path to take when drafting a schema. Here is a discussion of some of the more complicated choices one has to make.
- Complex Types vs. Model Groups (better)
When defining a group of Elements that can be referenced in different places in your schema, you need to create either a Complex Type or a Group (we'll ignore Simple Types for this discussion). Which to use is not always clear, and the article W3C XML Schema Made Simple is worth reading on this point. To summarize, Complex Types bring a lot of machinery with them (derivation by extension and by restriction, among other things), so Groups should be used as much as possible. On the other hand, Groups only specify Elements, whereas Complex Types can specify Attributes as well.
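To make the contrast concrete, here is a minimal sketch in which every element, group, and type name is hypothetical: a named model group is reused in two places, once inside a Complex Type that also declares an Attribute, and once directly in a local content model that needs no Attributes.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Named model group (hypothetical): a reusable sequence of Elements, no Attributes -->
  <xs:group name="nameParts">
    <xs:sequence>
      <xs:element name="first" type="xs:string"/>
      <xs:element name="last" type="xs:string"/>
    </xs:sequence>
  </xs:group>

  <!-- Named Complex Type: reuses the group AND declares an Attribute -->
  <xs:complexType name="PersonType">
    <xs:group ref="nameParts"/>
    <xs:attribute name="id" type="xs:ID"/>
  </xs:complexType>

  <xs:element name="book">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="author" type="PersonType"/>
        <!-- The same group pulled straight into a local content model -->
        <xs:element name="editor">
          <xs:complexType>
            <xs:group ref="nameParts"/>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

Only PersonType needed to be a Complex Type here, because it is the only place where an Attribute has to travel with the Elements.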
- Global Type vs. Local Type (better)
- To keep your schema from becoming complicated and hard to navigate, avoid converting Local Types to Global Types unless you have to. Local Types abstract your design better: each definition sits inside the only Element that uses it and cannot be referenced from anywhere else. The only Types that need to be globalized are those involved in recursion, those needed by multiple Elements, and those that serve as the Base Type for other derived Types.
- That said, it might be useful to have Global Types for extensibility, but you don't need to worry about that when you're designing the first version of your schema. Once you're done, you can think about globalizing certain Types for extensibility.
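The following sketch (all names are hypothetical) shows both cases: SectionType has to be Global because a section can contain further sections, while the Type of the title Element is used in exactly one place and therefore stays Local and anonymous.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Global because it is recursive: a section may contain sections -->
  <xs:complexType name="SectionType">
    <xs:sequence>
      <!-- Local (anonymous) Type: used only here, so it stays local -->
      <xs:element name="title">
        <xs:complexType>
          <xs:simpleContent>
            <xs:extension base="xs:string">
              <xs:attribute name="short" type="xs:string"/>
            </xs:extension>
          </xs:simpleContent>
        </xs:complexType>
      </xs:element>
      <xs:element name="section" type="SectionType"
                  minOccurs="0" maxOccurs="unbounded"/>
    </xs:sequence>
  </xs:complexType>

  <xs:element name="doc">
    <xs:complexType>
      <xs:sequence>
        <xs:element name="section" type="SectionType" maxOccurs="unbounded"/>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```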
- Global Type as Single Base (avoid)
One reason to globalize Types might be to build single-rooted Type hierarchies that factor out as much shared definition as possible for reuse, since only Global Types can be derived from. Especially if you come from an object-oriented programming background, you might decide early in your design process to draw up a list of Global Types to serve as the Base for all other Types. However, creating Global Types purely for definition reuse does not work well in practice, for the following reasons.
- As xfront.com points out, anything beyond 3 levels of derivation is too confusing and unnecessary. I have found that to be the case as well and now aim for a single level of derivation.
- Sometimes two closely related Types simply cannot share a common Base Type. Take the case of an empty Complex Type and a Simple Type. For example, you might want to keep <elemWithNoArg /> and <elemWithArg>arg</elemWithArg> close together, but the XML Schema specification does not allow their Type definitions to share a common Type hierarchy (short of the built-in xs:anyType at the very root).
- If all you want to do is share Attributes between some of your Types, use Attribute Groups instead of Global Types. This is a more flexible approach anyway, since you might find out later that one of your Types should not accept some of the Attributes you had expected to reuse.
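A minimal sketch of the last two points, with hypothetical names (commonAttrs, options, and the Attributes id and note are assumptions): elemWithNoArg and elemWithArg share their Attributes through an Attribute Group even though their Types are unrelated. Note that once elemWithArg carries Attributes, it needs a Complex Type with simple content rather than a plain Simple Type.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- Hypothetical shared Attributes, reused without a common Base Type -->
  <xs:attributeGroup name="commonAttrs">
    <xs:attribute name="id" type="xs:ID"/>
    <xs:attribute name="note" type="xs:string"/>
  </xs:attributeGroup>

  <xs:element name="options">
    <xs:complexType>
      <xs:sequence>
        <!-- Empty content: <elemWithNoArg /> -->
        <xs:element name="elemWithNoArg">
          <xs:complexType>
            <xs:attributeGroup ref="commonAttrs"/>
          </xs:complexType>
        </xs:element>
        <!-- Text content: <elemWithArg>arg</elemWithArg> -->
        <xs:element name="elemWithArg">
          <xs:complexType>
            <xs:simpleContent>
              <xs:extension base="xs:string">
                <xs:attributeGroup ref="commonAttrs"/>
              </xs:extension>
            </xs:simpleContent>
          </xs:complexType>
        </xs:element>
      </xs:sequence>
    </xs:complexType>
  </xs:element>
</xs:schema>
```

If one of the two Elements later must not accept note, you can drop the group reference there and list its Attributes individually, without touching any Type hierarchy.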
- Global Type (better) vs. Global Element
- Between a Global Element and a Global Type, it's better to make a Global Type because it doesn't lock you into a single Element name.
- In addition, declaring a Global Element is not equivalent to defining a Global Type; it changes which documents the schema accepts. Every Global Element can serve as the root Element of an instance document, so each one you add gives XML authors another choice of root. Judging from the XML Schemas being written today, though, one generally wants to allow only one root Element. So define only one Global Element, and use Global Types and Groups for everything else.
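As a sketch with hypothetical names: order is the only Global Element, so it is the only Element that can serve as the document root, while the Global Type AddressType is reused under two different Element names.

```xml
<xs:schema xmlns:xs="http://www.w3.org/2001/XMLSchema">

  <!-- The only Global Element, hence the only allowed document root -->
  <xs:element name="order" type="OrderType"/>

  <xs:complexType name="OrderType">
    <xs:sequence>
      <!-- One Global Type, two Element names -->
      <xs:element name="billTo" type="AddressType"/>
      <xs:element name="shipTo" type="AddressType"/>
    </xs:sequence>
  </xs:complexType>

  <xs:complexType name="AddressType">
    <xs:sequence>
      <xs:element name="street" type="xs:string"/>
      <xs:element name="city" type="xs:string"/>
    </xs:sequence>
  </xs:complexType>
</xs:schema>
```

Had the address been declared as a Global Element instead, a document consisting of nothing but an address would also validate against this schema.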
Specifying Attribute Defaults in an XML Schema is highly controversial. Attribute Defaults do not affect whether a document is valid; instead, they are added to the PSVI (Post-Schema-Validation Infoset), which is not easy to work with in Xerces. Note also that RELAX NG itself does not support them, which matters if you may later want to convert your schema to other formats. In other words, don't use them. Just use the ATTS StyleSheet (or a modified version of it) together with a simple XML file that contains the Attribute Defaults.
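The ATTS StyleSheet itself is not reproduced here. As a rough illustration of the same idea, here is a minimal XSLT 1.0 sketch that assumes a hypothetical defaults.xml file listing default Attribute values per Element name. It copies the input document and adds each listed default only where the Attribute is missing. Elements are matched by local name only, so namespaces are ignored; that simplification is also an assumption.

```xml
<!-- Hypothetical defaults.xml, in the same directory as this stylesheet:
     <defaults>
       <element name="item">
         <attribute name="status" value="active"/>
       </element>
     </defaults>
-->
<xsl:stylesheet version="1.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <!-- Load the defaults table once (file name is an assumption) -->
  <xsl:variable name="defaults" select="document('defaults.xml')/defaults"/>

  <!-- Copy attributes, text, comments and PIs unchanged -->
  <xsl:template match="@*|text()|comment()|processing-instruction()">
    <xsl:copy/>
  </xsl:template>

  <xsl:template match="*">
    <xsl:variable name="name" select="local-name()"/>
    <xsl:copy>
      <!-- Add the defaults listed for this element name first... -->
      <xsl:for-each select="$defaults/element[@name = $name]/attribute">
        <xsl:attribute name="{@name}">
          <xsl:value-of select="@value"/>
        </xsl:attribute>
      </xsl:for-each>
      <!-- ...then copy the explicit attributes, which replace any default
           of the same name, followed by the child nodes -->
      <xsl:apply-templates select="@*|node()"/>
    </xsl:copy>
  </xsl:template>
</xsl:stylesheet>
```

The ordering does the work: in XSLT 1.0, adding an attribute to an element replaces any existing attribute with the same name, so explicit values written by the author always win over the defaults.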