My.Blog

Nerdy news updates and articles
Tim Wintle

Tim Wintle's Blog

Tim works at Team Rubber, where he uses Python, large computers, and some clever maths to look at the web in new ways. In his free time he codes various other bits of software, and web apps.


Sun, 08 Apr 2007

Why tagging can take us further away from the semantic web

There is a lot of talk these days about the "Semantic Web", with some purists suggesting that we tag all data. I believe that, while tagging may be widespread now due to its relative ease of implementation, it is likely to hurt the long-term aims of the semantic web.

Firstly, let us define semantics. Here is the Wikipedia definition:

Semantics ... refers to the aspects of meaning that are expressed in a language, code, or other form of representation. Semantics is contrasted with two other aspects of meaningful expression, namely, syntax, the construction of complex signs from simpler signs, and pragmatics, the practical use of signs by agents or communities of interpretation in particular circumstances and contexts. ...semantics may also denote the theoretical study of meaning in systems of signs.

So, the idea is that every item on the web can be uniquely categorised by some series of symbols, which occur within an alphabet. In everyday linguistics, we would take the symbols to be words, and the alphabet to be all the words in the dictionary. Notice that this is separate from the order of the words and punctuation.

Regarding natural language, there are two possibilities:

  • Language is fully capable of describing the entire concept of a document
  • Language can only describe a subset of concepts

Most people (myself included) would fall into the first category. We can then separate this into two further groups:

  • The semantics of natural language (i.e. the words used) are fully capable of describing the entire concept of a document
  • Language can only describe all concepts when it includes the syntax and pragmatics

Here I would fall into the second category. It seems that an unstructured list of words cannot describe a document uniquely in its entirety. For a (very basic) example of the problem, "Suits Black Cat" and "Black Cat Suits" are two very different concepts, but they are semantically identical in the sense used here, since they contain exactly the same words. (Note to replies: I know there is not technically an isomorphism here, but I do not want to get too deep into the maths/philosophy of this. If someone has evidence against this, please comment.)
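The "Black Cat" example can be checked in a few lines of Python. This is only a sketch, and it assumes that "an unstructured list of words" means a lower-cased multiset of whitespace-separated words; the name `bag_of_words` is made up for illustration and does not come from any library.

```python
from collections import Counter

def bag_of_words(text):
    """Return the multiset of lower-cased words in text, ignoring order."""
    return Counter(text.lower().split())

a = bag_of_words("Suits Black Cat")
b = bag_of_words("Black Cat Suits")

# Same words with the same counts: once word order is thrown away,
# the two phrases can no longer be told apart.
print(a == b)
```

Any classifier that only ever sees the bag, and not the order, has to treat the two phrases as the same item.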

Now for some comments on tagging:

  1. Tags tend to be taken from a smaller alphabet than the words used in articles / web pages / full transcripts (in the case of video/audio). Basically, in the full text an author will probably have used more than one synonym, whereas when selecting tags people are more likely to choose the most commonly used word.
  2. Tagging removes punctuation. This does not technically remove any semantics from those used in the text; however, it is perfectly possible to create semantics describing a page which relate to its grammar and linguistics. This is a way of effectively increasing the alphabet size that tagging misses.
  3. Tagging only uses one occurrence of each tag, which removes the ability to make use of the density of a word. Imagine you are putting up some new shelves. You measure your wall to see how long you want them, but your tape measure only has two marks, 0 and 1. Your wall is nearer 1, so you go to Ikea to get your shelves (which are also marked 0 or 1), and just have to hope that they fit.

Clearly, then, tagging effectively provides a smaller number of available semantics for classification than natural language does. This reduces the number of unique items that can be described using semantics derived directly from tags.
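The loss of word density can be illustrated with a small Python sketch. It assumes, purely for illustration, that a set of tags behaves like the set of distinct words in a text; the two toy documents and the helper names `word_counts` and `as_tags` are invented here.

```python
from collections import Counter

def word_counts(text):
    """Count how often each lower-cased word occurs in text."""
    return Counter(text.lower().split())

def as_tags(text):
    """Reduce text to a tag-like set: each word is either present or absent."""
    return set(text.lower().split())

doc_a = "cat cat cat cat dog"   # mostly about cats
doc_b = "cat dog dog dog dog"   # mostly about dogs

print(word_counts(doc_a) == word_counts(doc_b))  # False: the densities differ
print(as_tags(doc_a) == as_tags(doc_b))          # True: the tag sets are identical
```

This is the two-mark tape measure: each word is recorded as 0 or 1, so a page that is mostly about cats looks the same as one that is mostly about dogs.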

But how can this harm the semantic web, I hear you ask. Well, the more tagging gets used, the more we change the distribution of these words across the text of the web as a whole, making it harder to extract semantic data fairly in the future.
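One way to read this claim is sketched below in Python. It assumes a crawler that cannot tell tags apart from body text and simply counts every word on the page; the page text, the tags, and the helper `relative_frequencies` are all made up for the example.

```python
from collections import Counter

def relative_frequencies(words):
    """Map each word to its share of all the words seen."""
    counts = Counter(words)
    total = sum(counts.values())
    return {word: count / total for word, count in counts.items()}

body = "a python is a large snake and a python eats mice".split()
tags = ["python", "snake"]  # hypothetical tags added to the same page

before = relative_frequencies(body)
after = relative_frequencies(body + tags)

# The tagged words gain weight, and every untagged word loses some.
print(before["python"], after["python"])
print(before["mice"], after["mice"])
```

Repeated over many pages, the commonly chosen tag words drift away from the frequencies they would have had in the natural-language text alone.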

In conclusion, if you are designing a site with tagging, that is all very well for usability, and for the semantic web at the stage we are at. All this tagging may, however, have a detrimental effect on the growth of the true semantic web, so please try to separate the tags from the rest of the content and make it clear that they are tags. This will make things much easier for future algorithms, and for the evolution of the web.
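As a rough sketch of what clearly marked tags make possible, the Python below keeps tag words out of the body-text word list. It assumes the site marks each tag as a link carrying `rel="tag"` (the rel-tag microformat convention); the class name `BodyTextExtractor` and the sample page are hypothetical, and real pages would need more careful handling.

```python
from html.parser import HTMLParser

class BodyTextExtractor(HTMLParser):
    """Collect page words, keeping words inside rel="tag" links separate."""

    def __init__(self):
        super().__init__()
        self.in_tag_link = False
        self.words = []  # words from the ordinary page text
        self.tags = []   # words from links marked as tags

    def handle_starttag(self, tag, attrs):
        rel = dict(attrs).get("rel") or ""
        if tag == "a" and "tag" in rel.split():
            self.in_tag_link = True

    def handle_endtag(self, tag):
        if tag == "a":
            self.in_tag_link = False

    def handle_data(self, data):
        target = self.tags if self.in_tag_link else self.words
        target.extend(data.lower().split())

# Hypothetical page: one paragraph of text followed by one marked-up tag.
page = '<p>A python eats mice.</p><a rel="tag" href="/tags/snake">snake</a>'

parser = BodyTextExtractor()
parser.feed(page)
parser.close()

print(parser.words)
print(parser.tags)
```

Because the tags are labelled, an algorithm can choose to count them separately instead of mixing them into the natural-language text.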


