For my first important blog entry, I have a question. How much freedom should a content publishing tool give a non-Web developer?
Here’s the situation: A new content publishing tool has been developed at my workplace. A writer logs in to the tool, selects the type of content they want to publish, and is then presented with a number of form fields. There may be, for example, form fields for a headline, a byline, and a story. Simple, yes? OK.
Let’s assume a writer enters plain text in all fields. When the content is submitted to the CMS and produced as a Web page, all of the necessary markup is inserted (header tags around the headline, paragraph tags around the story, etc.). Excellent!
However, what should happen if the writer enters some HTML in any of the form fields?
Currently, this tool ignores it. Doesn’t care. It doesn’t even check for markup, so naturally anything, good or bad, could be entered. The argument presented by the tool developers is that the writers need to learn more about Web publishing and need to be more responsible for their content — both the stories and the markup.
Now, I’m all for people learning correct markup. And I’m all for people being responsible for their work. However, no one is perfect, and mistakes in markup are bound to happen. Right? So what potential solutions are there to this?
- Simple checks for matching tags could be done very easily with JavaScript (the publishing tool has a Web front end) or on the back end (though that seems like a waste of a server trip).
- Better still, those input fields that potentially allow for markup could be accompanied by a lovely WordPress-like interface.
- The server could just strip all markup, since writers aren’t supposed to be entering any, anyway.
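To make the first option concrete, here is a minimal sketch of a matching-tag check. It is written in Python with the standard library's `html.parser` purely for illustration; the tool itself would more likely do this in JavaScript on the front end. The names `TagBalanceChecker`, `check_markup`, and `VOID_TAGS` are made up for this example and are not part of the tool.

```python
from html.parser import HTMLParser

# Elements that never take a closing tag (a partial list, assumed for this sketch).
VOID_TAGS = {"br", "hr", "img", "input", "meta", "link"}


class TagBalanceChecker(HTMLParser):
    """Tracks open tags on a stack and records any mismatches."""

    def __init__(self):
        super().__init__()
        self.open_tags = []
        self.errors = []

    def handle_starttag(self, tag, attrs):
        if tag not in VOID_TAGS:
            self.open_tags.append(tag)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self.open_tags and self.open_tags[-1] == tag:
            self.open_tags.pop()
        else:
            self.errors.append(f"unexpected </{tag}>")


def check_markup(text):
    """Return a list of problems; an empty list means every tag is matched."""
    checker = TagBalanceChecker()
    checker.feed(text)
    checker.close()
    return checker.errors + [f"unclosed <{tag}>" for tag in checker.open_tags]


print(check_markup("<b>My Headline"))
```

Note that a check like this only catches unbalanced tags. A perfectly balanced font tag sails straight through, so it addresses mistakes in markup, not misuse of it.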
But without any checks in place, markup like this is bound to go live:
<font color=red>My Headline</font>
In fact, it already has… as have some tables (for no apparent reason). When I see irresponsible use of markup in a fairly constrained environment like this, after writers have been given training on the tool, have been told not to write markup, and have been shown the potential results of their misuse and mistakes, that shouts to me, “Don’t let them do that!” Short of setting up a process where a bunch of people have to check all of the writers’ work, I see no way to ensure the quality of the markup (or the complete lack of it) other than building something into the publishing tool.
Your comments are most welcome on this topic. What additional arguments can you present for or against checks in a publishing tool? What options for markup validation do other publishing/CMS systems offer? And how much freedom should a non-Web developer have with markup?


5 comments
Kevin stopped by on March 23, 2004 at 10:32 AM EST and had this to say:
It’s a tough problem. You can either limit the markup allowed by stripping it when you get it (at submit time), only allow editors to use a pre-markup language like Markdown, or give them a WYSIWYG tool (maybe in XUL, built around a limited set of Composer’s widgets) where they click a button, and you control the markup that’s spit out.
Whatever the case, it probably needs to be a combination of those options. Content should be validated on submit to make sure that all tags are closed, etc. Non-programmers/Non-HTML folks should be limited in the tags they’re allowed to use (no blink, no marquee, etc), but there should be a way around that (an admin function, which means user roles need to be there) built in to allow for the exception cases we all know will come up.
Less knowledgeable users are not to be trusted, especially if they’re creating HTML and don’t know it. They should be given as many tools as can be created to help them, but everything they submit should be validated and fixed whenever possible.
Steve stopped by on March 23, 2004 at 9:39 PM EST and had this to say:
If the point of the tool is to allow non-html people to provide content, then why in the world should it be acceptable for them to enter any html at all? At that stage you may as well give them a crash course in markup and hand them a copy of WS_FTP and let them destroy your valid code as they’re already able to with this “tool”.
So…I agree with what you are shouting at yourself as well as with what Kevin says. The tool needs to handle these sorts of things, even if it’s as simple as a regexp match on <.*> with JavaScript prior to submission; otherwise the tool is just a fancy front-end to the file system and nothing more.
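A rough sketch of that kind of pre-submission check follows, in Python rather than JavaScript for illustration. One caveat: the literal pattern <.*> also fires on ordinary prose that happens to contain both a "<" and a ">", so this sketch assumes a slightly tighter pattern that requires a letter right after the opening bracket. `TAG_PATTERN` and `contains_markup` are hypothetical names.

```python
import re

# Matches text shaped like an opening or closing tag, e.g. <font color=red> or </b>.
# Requiring a letter after "<" avoids flagging prose such as "3 < 5 and 7 > 2".
TAG_PATTERN = re.compile(r"</?[A-Za-z][^<>]*>")


def contains_markup(text):
    """Return True if the text appears to contain an HTML tag."""
    return TAG_PATTERN.search(text) is not None


print(contains_markup("<font color=red>My Headline</font>"))
print(contains_markup("Profits were < 5% but > 2%"))
```

A pattern match like this can only say "there is probably a tag here"; it cannot tell good markup from bad, which is why it suits a tool where writers are not supposed to enter any markup at all.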
Simon Willison stopped by on March 31, 2004 at 3:19 AM EST and had this to say:
Our solution at work (an online newspaper site) is to drastically restrict the markup they are allowed to use by validating the XHTML they have entered against a Relax NG schema (we could have just used a DTD, but Relax NG gives us more power and has a funky compact syntax). This also forces them to create well-formed XHTML, or they can’t post anything. This solves the font tag problem but has the downside that they have to be able to write well-formed markup - we’ve already had one support call which turned out to be caused by an unescaped &.
This technique does seem to be working, but we only have about half a dozen end users to train and sort out when things go wrong. The added support costs could be a real problem with a larger content team.
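The sketch below is not Relax NG validation. It is a much simpler stand-in, using only Python's standard `xml.etree.ElementTree`, that illustrates the two effects described: the fragment must be well-formed XML, and only a restricted set of elements is accepted. The allowed set, the wrapping `<body>` element, and the function name `validate_fragment` are assumptions for the example; a real schema would also constrain attributes and nesting.

```python
import xml.etree.ElementTree as ET

# Hypothetical restricted vocabulary; a real schema would define this.
ALLOWED_ELEMENTS = {"p", "em", "strong", "a", "ul", "ol", "li", "blockquote"}


def validate_fragment(fragment):
    """Return a list of problems with an XHTML fragment (empty means OK)."""
    try:
        # Wrap the fragment so that several top-level elements still parse.
        root = ET.fromstring(f"<body>{fragment}</body>")
    except ET.ParseError as exc:
        return [f"not well-formed: {exc}"]

    problems = []
    for element in root.iter():
        if element is root:
            continue  # skip the wrapper we added ourselves
        if element.tag not in ALLOWED_ELEMENTS:
            problems.append(f"element <{element.tag}> is not allowed")
    return problems


print(validate_fragment("<p>Fish & chips</p>"))
```

The unescaped & in the last line is rejected by the XML parser before the element check even runs, which is the same class of failure as the support call mentioned above.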
markku stopped by on March 31, 2004 at 11:39 AM EST and had this to say:
You can always strip the markup out of the content, while keeping the original markup-filled data. Serve the clean content to your (web) users, while giving the writers their original markup when they want to edit it.
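One way to picture this suggestion is a record that stores what the writer typed and derives the clean version on demand. This is only a sketch under assumed names (`Story`, `raw`, `published`, `strip_markup`); it strips every tag using Python's standard `html.parser` and re-escapes the remaining text for output.

```python
import html
from dataclasses import dataclass
from html.parser import HTMLParser


class _TextOnly(HTMLParser):
    """Collects text content and ignores every tag."""

    def __init__(self):
        super().__init__()
        self.parts = []

    def handle_data(self, data):
        self.parts.append(data)


def strip_markup(text):
    parser = _TextOnly()
    parser.feed(text)
    parser.close()
    return "".join(parser.parts)


@dataclass
class Story:
    raw: str  # exactly what the writer entered, kept for later editing

    @property
    def published(self):
        # Tags removed, remaining text escaped so it is safe to place in a page.
        return html.escape(strip_markup(self.raw), quote=False)


story = Story("<font color=red>My Headline</font>")
print(story.published)  # what readers get
print(story.raw)        # what the writer sees when editing
```

The trade-off is that writers keep seeing markup that never reaches the page, so they may not learn that it is being discarded.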
JC stopped by on April 14, 2004 at 4:33 AM EST and had this to say:
All user input should be validated, including that of people who are trusted by the site to post content but aren’t alphageeks (for a good CMS, this includes the ’super-user’ or chief admin unless she/he configs the system otherwise). HTML markup can be the beginning of a hack attack (though this is not so common when it comes to stuff trusted administrators post - a hacker has to somehow convince them to post the markup). Validating this kind of input really is not difficult with server-side scripting, and good CMSs have been doing it for years. You can google for clean or sanitize or something like that, and you’ll find PHP classes that allow you to pick which tags you want and don’t want in a config file, to strip tags selectively. You’ll notice WordPress selects certain markup to allow, and ditches the rest. WordPress (here) is pretty picky - allowing users to use ul and li can help logical presentation, and for some presentations tabular format is needed too. For stuff that can break your page though (like tables), you might want to have this first admin approved.
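Selective stripping of this kind can be sketched as follows. This is Python rather than PHP, it is not the code of any particular library or of WordPress, and the `ALLOWED_TAGS` set stands in for the config file mentioned above. Allowed tags are kept without their attributes, every other tag is dropped, and the text itself is always preserved.

```python
import html
from html.parser import HTMLParser

# Hypothetical allow-list; in practice this would come from configuration.
ALLOWED_TAGS = frozenset({"p", "em", "strong", "ul", "ol", "li"})


class AllowlistSanitizer(HTMLParser):
    """Re-emits allowed tags (minus attributes) and drops all other tags."""

    def __init__(self, allowed):
        super().__init__()
        self.allowed = allowed
        self.out = []

    def handle_starttag(self, tag, attrs):
        if tag in self.allowed:
            self.out.append(f"<{tag}>")  # attributes are deliberately discarded

    def handle_endtag(self, tag):
        if tag in self.allowed:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(html.escape(data, quote=False))


def sanitize(text, allowed=ALLOWED_TAGS):
    sanitizer = AllowlistSanitizer(allowed)
    sanitizer.feed(text)
    sanitizer.close()
    return "".join(sanitizer.out)


print(sanitize("<font color=red>My <em>Headline</em></font>"))
```

Because the allow-list is a parameter, a wider set (one that includes table elements, say) could be passed for content that an admin has approved. The sketch does not repair unbalanced tags, so it would need to be paired with a balance check.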
My biggest problem was getting boss-types to limit use of and to maintain a clean-looking site. My call really would be to inform as few people as possible about font tags. You never know how much bad taste lurks behind a civilized appearance.