Wednesday, September 27, 2006

X Marks The Spot

Why do developers treasure XML? And why should you?

If you’ve ever had someone do a job for you—be it setting up a Web site, making a Flash presentation, or bigger—even building a whole application for you—you know the dreadful feeling that overcomes you when you have to change the content. Getting in touch with the person, checking if he’s free to do the job, the waiting, the watching, the possible paying, it’s just too much! What if you could just use Notepad—yes, Notepad—to edit your content, and just sit back while it appears on your site automatically? The solution is close at hand. In fact, it’s everywhere.

What Is It?

XML (eXtensible Markup Language) is a document format that’s been doing the rounds for many years now. Its purpose is to give the Web, and other areas, a way to represent structured data so that it can be used easily and quickly, without having to invest in a full-fledged database application. Why are we telling you about it? Because while it is mostly developers who keep fawning over it, there’s a lot you can do with XML too. Its beauty lies in its simplicity: while it’s really easy to learn, it’s still ridiculously powerful. You can use it to organize your personal data, create and edit the content for your site or presentations, and even make your own RSS feeds.

Even better, you can use the same file for all these purposes, without ever being limited to a particular operating system or software—XML works everywhere! A document written in XML is like a database in itself. And just like a database, its data can be brought into any application that supports it, and then manipulated or presented in any way you choose. So you could make just one XML file and use the content for your Web site or to print a brochure or magazine. You could also share this XML file with someone, who could then use the information for his purposes as well.

Yes, Yes, But What Is It?

XML is, to put it a little simplistically, a close relative of HTML (Hypertext Markup Language). Both have their roots in an older format called SGML (Standard Generalized Markup Language), and XHTML (Extensible HTML) is HTML rewritten to follow XML’s rules. All these use the tag approach: content is enclosed within markers, called tags, which are then processed by programs called parsers, which interpret the meaning of these tags. In HTML, for instance, a page title sits between two tags: <title> indicates where the title begins, and </title> indicates that this is where the title ends.

But this is where the similarity between XML and HTML ends: HTML is designed to display information; XML is designed to store it.
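The tag approach can be sketched with the parser that ships with Python. This is only an illustration: the title text is made up, and Python stands in for whatever parser your browser or application uses.

```python
import xml.etree.ElementTree as ET

# An opening tag, some content, and a closing tag (illustrative text).
snippet = "<title>X Marks The Spot</title>"

element = ET.fromstring(snippet)  # the parser interprets the tags
print(element.tag)   # the name of the tag
print(element.text)  # the content enclosed by the tag
```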

While HTML has a defined set of tags, each denoting something specific (such as the page header), XML lets you specify your own tags to represent what you want to put in it. For example, if you wanted to put in a phone number, you could simply make up a tag for it, say <phone>, and put the number between its opening and closing tags.

And this is where the eXtensible part comes in. You can just keep creating tags to suit your fancy, so there are virtually no limits to what you can do with your XML file. You can even use XML to build a specification for your own language!
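As a rough sketch of made-up tags in action, the snippet below stores a contact using tag names invented purely for this example (contact, name and phone are assumptions, as are the values), then reads the phone number back.

```python
import xml.etree.ElementTree as ET

# Hypothetical tags invented for this example; XML does not predefine them.
contact_xml = """<contact>
  <name>Asha</name>
  <phone>555-0100</phone>
</contact>"""

contact = ET.fromstring(contact_xml)
phone = contact.find("phone").text  # look up our own tag by name
print(phone)
```

Nothing had to be registered anywhere for the parser to accept these tags; it only cares that they are properly opened and closed.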

Rules Are Fun

Of course, we can’t have people making XML documents any way they please. In order to make it easier to share and use XML files, the W3C (World Wide Web Consortium) decided to lay down some rules to make sure that all XML files share the same general structure—a tree layout, starting at a node and branching out into sub-nodes.

Rule #1: Thy XML should describe itself.

The XML declaration is used to define the XML version and its encoding format. It starts the document, and typically looks like this:

<?xml version="1.0" encoding="UTF-8"?>

Rule #2: Thy XML shall have only one root.

Your XML can contain as many tags as you want, but just like a tree, it should start with only one tag, called the “root”.

Rule #3: Thou shalt finish what thou started.

Every tag should have both an opening as well as a closing. This denotes the beginning and end of your data. A closing tag uses a forward slash—the “/” sign—to indicate that the field ends.

Rule #4: Thou shalt pay attention to case.

XML tags are case-sensitive, so a tag written as <Track> is different from one written as <track>. Such details are important.
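One way to see the rules at work is to feed a few small documents to a parser and watch which ones it rejects. The sketch below uses Python's standard library; the sample documents are invented for illustration, and the helper name is_well_formed is our own.

```python
import xml.etree.ElementTree as ET

def is_well_formed(text):
    """Return True if the parser accepts the text as XML."""
    try:
        ET.fromstring(text)
        return True
    except ET.ParseError:
        return False

# Follows all four rules: declaration first, one root, closed tags, matching case.
good = ('<?xml version="1.0" encoding="UTF-8"?>'
        "<Collection><Track>Song</Track></Collection>")

two_roots = "<Track>One</Track><Track>Two</Track>"           # breaks Rule #2
unclosed = "<Collection><Track>Song</Collection>"             # breaks Rule #3
wrong_case = "<Collection><Track>Song</track></Collection>"  # breaks Rule #4

for sample in (good, two_roots, unclosed, wrong_case):
    print(is_well_formed(sample))
```

Only the first sample gets through; the other three are turned away by the parser before any application sees the data.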

You can write an XML document in any old text editor, and save the file as “[filename].xml”. You will then be able to open it in any Web browser or program that supports XML.

Take a music collection as an example. Tags such as “Track” can be repeated within the XML as often as needed. A tag can also carry extra information inside its opening tag, such as a name in the “Artist” tag. These are called attributes, and are usually used to assign unique IDs or names to your tags. You could also have a tag under the Artist tag to serve the same purpose. There are no real rules when it comes to this, but most who have experience with XML will tell you that you’d rather avoid attributes. The opening tag, closing tag, and the content within are collectively known as an element.
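A music collection along these lines might look like the document embedded below. The tag names Collection, Artist and Track come from this article, but the nesting, the artist names and the song titles are assumptions made up for the sketch. Python is used only to read the file back.

```python
import xml.etree.ElementTree as ET

# Illustrative document: tag names follow the article, contents are invented.
collection_xml = """<?xml version="1.0" encoding="UTF-8"?>
<Collection>
  <Artist name="First Artist">
    <Track>Song One</Track>
    <Track>Song Two</Track>
  </Artist>
  <Artist name="Second Artist">
    <Track>Song Three</Track>
  </Artist>
</Collection>"""

root = ET.fromstring(collection_xml)
for artist in root.findall("Artist"):
    tracks = [track.text for track in artist.findall("Track")]
    # artist.get("name") reads the attribute from the opening tag
    print(artist.get("name"), tracks)
```

Here the repeated Track tags are elements, while name="..." inside the Artist tag is an attribute.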

So, Now What?

Well, you’ve made your XML, but what good is it? It still doesn’t do anything, apart from mocking you from where it sits.

What you do with your XML now really depends on your own expertise and how willing you are to get your hands dirty with some development. When it started out, the purpose of XML was to separate Web pages into the interface (the pretty colours and effects we all see) and the data (the information that is presented using those pretty colours). Indeed, this is the simplest thing we can do with XML.

Style Me Up!

The first way to beautify an XML is the Cascading Style Sheet (CSS). It’s usually used for HTML, but since its job is to recognise tags and apply colours or effects to them, it works just as well with XML too. This is what a CSS for our music collection would look like:

Collection
{
  /* XML tags are inline by default; block lets the width apply */
  display: block;
  background-color: #ffffff;
  width: 100%;
}

Track
{
  display: block;
  margin-bottom: 30pt;
  margin-left: 0;
}

... and so on, for each tag. As you can see, the CSS can be used to assign height, width and even background colours, as well as display styles and margins.

You can write this style sheet in any text editor, and save it as “[filename].css”. To apply this style sheet to the XML, you will need to put this line in the XML file, just below the XML declaration:

<?xml-stylesheet type="text/css" href="[filename].css"?>

This tells your browser to display the XML using the style sheet you just made.
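The line that links a style sheet is an xml-stylesheet processing instruction. The sketch below puts one into a small document and reads it back with Python's standard library; the file name music.css and the track are hypothetical.

```python
from xml.dom import minidom

# "music.css" is a hypothetical file name for the style sheet.
styled_xml = """<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/css" href="music.css"?>
<Collection>
  <Track>Song One</Track>
</Collection>"""

doc = minidom.parseString(styled_xml)
# Processing instructions sit before the root element in the document.
links = [node for node in doc.childNodes
         if node.nodeType == node.PROCESSING_INSTRUCTION_NODE]
print(links[0].target)  # the instruction's name
print(links[0].data)    # the type and href it carries
```

A browser does the same lookup when it opens the XML file, then fetches the style sheet named in href.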

CSS is a W3C standard too, but it is quite limited in what it can do with XML (it can only style the content that is already there), so the Wise Men of XML don’t advise its use. Instead, they point towards the Extensible Stylesheet Language, or XSL. XSL offers a huge number of capabilities; it’s practically a whole new programming language on its own. You can even specify conditions for formatting, using the classic “if (condition) then (do this) else (do that)” structure that is common to most programming languages, like if you wanted your favorite artist to be shown in red, leaving the others in blue. Unfortunately, XSL is a lot more difficult to learn than CSS, but there are a lot of tools that you could use to generate an XSL style sheet without having to know XSL itself.
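The red-or-blue condition could be written in XSL roughly as shown in the string below, using the xsl:choose, xsl:when and xsl:otherwise instructions. Treat it as a sketch: the artist name is a placeholder, and the Collection and Artist layout is assumed. Python's standard library cannot run XSL, so the code only confirms that the style sheet is well-formed XML and pulls out its condition. Applying it needs an XSL processor, such as the one built into most browsers.

```python
import xml.etree.ElementTree as ET

XSL_NS = "http://www.w3.org/1999/XSL/Transform"

# "Favorite Artist" is a placeholder name; the layout is only a sketch.
stylesheet = """<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" xmlns:xsl="http://www.w3.org/1999/XSL/Transform">
  <xsl:template match="/Collection">
    <html><body>
      <xsl:for-each select="Artist">
        <xsl:choose>
          <xsl:when test="@name = 'Favorite Artist'">
            <p style="color: red"><xsl:value-of select="@name"/></p>
          </xsl:when>
          <xsl:otherwise>
            <p style="color: blue"><xsl:value-of select="@name"/></p>
          </xsl:otherwise>
        </xsl:choose>
      </xsl:for-each>
    </body></html>
  </xsl:template>
</xsl:stylesheet>"""

# An XSL style sheet is itself XML, so the same parser can check it.
root = ET.fromstring(stylesheet)
conditions = [when.get("test") for when in root.iter("{%s}when" % XSL_NS)]
print(conditions)
```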

The best of these tools is Altova’s StyleVision, which lets you display XML data in HTML, Rich Text Format and even PDF.

Click!

There’s just so much to learn about XML that it’s mind-boggling. For lots more information and some extremely useful tutorials on anything XML, head to www.w3schools.com, a free educational site (despite the name, it is not run by the W3C).

When it comes to XML editing, even Notepad is enough, but sometimes you just need a lot more features. By far, the most powerful tool for XML editing and development is Altova’s XMLSpy. It’s shareware, though, and we know how irritating that can be. For the freeware buff, Wattle Software’s XMLWriter is a pretty powerful XML editor, too. Its interface is a little similar to XMLSpy, so it’s easy to switch to after your XMLSpy trial expires.

What’s up, Doc(Book)?

Now that we’ve looked at the basics of XML, it’s time to talk about an application of XML that is increasingly making its way as a standard for documentation—DocBook.

If you were to write a book and use XML to do so, you would follow the rules set by the DocBook standard. It started out as a format for just technical documentation, but its potential isn’t limited to that. As it proliferates, you need not be tied to just one platform or software—there are plenty of tools that support DocBook, including the increasingly popular OpenOffice.org.

When you make XMLs in the DocBook standard, you are creating a book that can be easily rendered into various formats, be it print or the Web. If you are an open source buff, DocBook is also a part of the open source movement, and no doubt some blessings will come your way. There’s also the future to consider: if DocBook does come to be as widely used as hoped, a day will come when all the DocBook documents on your PC will be part of one massive, easily-searchable index. You might soon even be able to select a document and decide on the fly whether you want to see it as it would appear on the Web, in print, or in a PowerPoint-like presentation, a great deal better than having your stuff strewn about in different formats like it is today.

A DocBook document carries a “!DOCTYPE” declaration near the top. This tells whichever program is opening the XML that it conforms to the DocBook standard, which is defined by a Document Type Definition (DTD) file that enforces a structure on the XML.
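A small DocBook file with its “!DOCTYPE” line might look like the document embedded below. The choice of DocBook version 4.5 is an assumption, and the book title and text are invented. Python's minidom reads the declaration without downloading the DTD, so this shows what the declaration says rather than enforcing it.

```python
from xml.dom import minidom

# A tiny DocBook-style book; the title and text are invented.
book_xml = """<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE book PUBLIC "-//OASIS//DTD DocBook XML V4.5//EN"
  "http://www.oasis-open.org/docbook/xml/4.5/docbookx.dtd">
<book>
  <title>My First Book</title>
  <chapter>
    <title>Getting Started</title>
    <para>Hello, DocBook.</para>
  </chapter>
</book>"""

doc = minidom.parseString(book_xml)
doctype = doc.doctype
print(doctype.name)      # the root element the DTD expects
print(doctype.publicId)  # the public name of the DocBook DTD
print(doctype.systemId)  # where the DTD file can be found
```

Checking a document against the DTD is a separate step called validation, which XML editors such as the ones mentioned earlier can do for you.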

For everything DocBook related, visit www.docbook.org. There’s even a Wiki you can use to collaborate with other users of the DocBook standard.

Really Simple Stuff

Unless you’ve been in a coma these past months, you couldn’t have missed the orange “RSS” button on so many sites. Yes, the most popular way to feed content to users today is based on XML. What’s more, it’s shockingly easy to make one for your site too! All you have to do is stick to the RSS standard structure, and voila! Your own newsfeed.
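As a rough idea of what the RSS standard structure involves, the sketch below builds a minimal RSS 2.0 feed: an rss root, one channel with a title, link and description, and one item inside it. The site name, URLs and post are placeholders.

```python
import xml.etree.ElementTree as ET

# Build a minimal RSS 2.0 feed; the site name and URLs are placeholders.
rss = ET.Element("rss", version="2.0")
channel = ET.SubElement(rss, "channel")
ET.SubElement(channel, "title").text = "My Site News"
ET.SubElement(channel, "link").text = "http://www.example.com/"
ET.SubElement(channel, "description").text = "Latest updates from my site"

# Each news entry is an item inside the channel.
item = ET.SubElement(channel, "item")
ET.SubElement(item, "title").text = "First post"
ET.SubElement(item, "link").text = "http://www.example.com/first-post"
ET.SubElement(item, "description").text = "A short summary of the post."

feed = ET.tostring(rss, encoding="unicode")
print(feed)
```

The same feed could just as well be typed by hand in a text editor; adding a news entry means adding another item block.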

Have XML, Will Use

XML is the future of the way data will be shared on the Internet. It’s simple, it’s small, and because it’s truly platform-independent and can be edited in even the most basic of text editors, nothing really stops it from becoming ubiquitous.

Wannabe developer or just your average user, there’s really no excuse for you not knowing at least the basics of XML.
