Made some progress today: the XML declaration is now highlighted, and so are simple XML tags. For tags, the simplest approach seems to be to treat the tag start (< + optional slash + tag name) as one match and the tag closer (optional slash + >) as another. This lets the tag contents be highlighted separately, and it also handles multi-line opening tags, since neither pattern ever has to span a line break (pattern matching in the syntax highlighter component goes through QRegExp, which seems to be applied one line at a time).
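A rough sketch of the tag start / tag closer split, to show why it copes with a tag spread over several lines. It uses Python's `re` rather than QRegExp so that it runs standalone. The function name, the span format and the ASCII-only name pattern are assumptions made for illustration, not the actual highlighter code.

```python
import re

# ASCII-only approximation of an XML name (assumption: not the full XmlName rule).
NAME = r"[A-Za-z_][A-Za-z0-9_.:-]*"

TAG_START = re.compile(r"</?" + NAME)  # '<' + optional '/' + tag name
TAG_END = re.compile(r"/?>")           # optional '/' + '>'


def highlight_line(line):
    """Return sorted (start, length, kind) spans for a single line.

    Each line is matched on its own, so an opening tag whose closer is on a
    later line still gets both halves highlighted.
    """
    spans = []
    for kind, pattern in (("tag-start", TAG_START), ("tag-end", TAG_END)):
        for m in pattern.finditer(line):
            spans.append((m.start(), m.end() - m.start(), kind))
    return sorted(spans)


if __name__ == "__main__":
    # A made-up element split across two lines, followed by its end tag.
    for text in ['<item id="1"', '      name="x"/>', "</item>"]:
        print(repr(text), highlight_line(text))
```

The obvious weakness is that a bare > in text content or inside an attribute value also matches the closer pattern, because nothing records whether a tag is currently open.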
I still haven't figured out how to get the DTD (DOCTYPE) declaration to highlight, because it's on two lines. It looks as though I might have to take the same approach as with comments, which means working out how to handle two different highlighting requirements with that approach. Handling attributes and their values is going to be a bit more complicated, because I'm not sure I can say definitively whether an attribute+value combination is inside a tag or not: QRegExp supports lookaheads but not lookbehinds, so a pattern can't check what came before the match. Finally, one remaining problem is specifying the XmlName pattern for tags accurately, which I haven't bothered to do yet (I'm just using an ASCII model for the moment).
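One possible way to cover comments, the two-line DOCTYPE and attributes together is to carry a small state value from one line to the next, so that "am I inside a tag?" is answered by the state rather than by a lookbehind. The sketch below is only that: the state names, the span format and the sample lines are invented, Python's `re` stands in for QRegExp (no lookbehind is used), and the name pattern is the same ASCII approximation. If the component is Qt's QSyntaxHighlighter, the state would presumably travel through previousBlockState() and setCurrentBlockState(); that mapping is an assumption.

```python
import re

# Hypothetical per-line states; NORMAL means "not inside any construct".
NORMAL, IN_COMMENT, IN_DOCTYPE, IN_TAG = range(4)

NAME = r"[A-Za-z_][A-Za-z0-9_.:-]*"            # ASCII-only approximation
VALUE = r'"[^"]*"' + "|" + r"'[^']*'"          # quoted value on a single line

OPENERS = re.compile(
    r"(?P<comment><!--)|(?P<doctype><!DOCTYPE)|(?P<tag></?" + NAME + ")"
)
CLOSERS = {IN_COMMENT: re.compile(r"-->"), IN_DOCTYPE: re.compile(r">")}
# Inside a tag, the next token is either attr=value or the tag closer.
IN_TAG_TOKEN = re.compile(
    r"(?P<attr>" + NAME + r")\s*=\s*(?P<value>" + VALUE + r")|(?P<close>/?>)"
)
STATE_FOR = {"comment": IN_COMMENT, "doctype": IN_DOCTYPE, "tag": IN_TAG}
KIND = {IN_COMMENT: "comment", IN_DOCTYPE: "doctype"}


def highlight_block(line, state):
    """Highlight one line, given the state left behind by the previous line.

    Returns (spans, new_state); spans are (start, end, kind) tuples.
    """
    spans = []
    pos = 0
    while pos < len(line):
        start = pos
        if state == NORMAL:
            m = OPENERS.search(line, pos)
            if m is None:
                break
            state = STATE_FOR[m.lastgroup]
            start, pos = m.start(), m.end()
            if state == IN_TAG:
                spans.append((start, pos, "tag"))
                continue
        if state == IN_TAG:
            m = IN_TAG_TOKEN.search(line, pos)
            if m is None:
                break  # tag continues on the next line
            if m.lastgroup == "close":
                spans.append((m.start(), m.end(), "tag"))
                state = NORMAL
            else:
                spans.append((m.start("attr"), m.end("attr"), "attribute"))
                spans.append((m.start("value"), m.end("value"), "value"))
            pos = m.end()
        else:
            # Comment or DOCTYPE: one span up to the closer or the line end.
            m = CLOSERS[state].search(line, pos)
            end = m.end() if m else len(line)
            spans.append((start, end, KIND[state]))
            if m:
                state = NORMAL
            pos = end
    return spans, state


def highlight_document(lines):
    """Thread the state through all lines and collect the spans per line."""
    state, result = NORMAL, []
    for line in lines:
        spans, state = highlight_block(line, state)
        result.append(spans)
    return result
```

With this arrangement an attribute pattern is only ever tried while the state says a tag is open, so text such as a="b" outside a tag is left alone, and a > inside a quoted value no longer ends the tag. Known gaps in the sketch: a DOCTYPE with an internal subset containing > would end too early, and attribute values that run across a line break are not handled.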