W3C - The multilingual web

I attended an interesting W3C workshop in Madrid, called The Multilingual Web.
W3C stands for World Wide Web Consortium. The W3C develops standards for the web and is led by the inventor of the web himself, Tim Berners-Lee.

This workshop was a particularly interesting one, as it focused on languages, of which sign languages are obviously a part. I was shocked to find out that almost no one had thought of it that way. One potential exception could have been SIDAR, a Spanish/Latin American accessibility foundation, but even they seemed to focus on sign languages as a way to access information, as an alternative way to transfer messages, not as languages in their own right.

Facebook supports an impressive number of languages, which are implemented by the community (crowd-sourcing). Facebook provides a translation plugin, and it is up to the users to translate the interface into their own language. Google is making similar efforts. Unfortunately, in both cases, sign languages are not supported.

I couldn't resist putting a question to the panel. Here it is:
"[...] I think it is important to consider sign languages too when talking about the multilingual web. They are as natural as all the other languages brought to attention in this workshop, and are used as a first language by both deaf and hearing people, amounting to over 100 million people worldwide. My questions thus focus on the language itself, rather than on accessibility. [...] Do Facebook and Google see opportunities for sign language content / interface? What, if any, are the barriers to taking this opportunity?"

Ghassan Haddad (Facebook, Director of localisation):
"Translation is done by user groups. If you are interested as a community we can open it up - we have opened up for Cherokee, which only has 1000 speakers. We are opening it to another native american group with 50 speakers." (*)

Peter Constable (Microsoft, Senior Program Manager Lead in Windows globalization):
"In terms of localisation of content. It's in strings. We're talking about sign languages, which often have no written form at all... where there are, they are not encoded in Unicode. [...] If a signed language has a transcription system, those are potential candidates for Unicode, but right now they are not there." (*)

Mark Davis (Google, President of the Unicode Consortium): "If there are standards for the way sign languages are represented in symbols, we would welcome that. We are engaged in a process of doing a lot more work for symbols, too." (*)

* The replies have been transcribed; you can find the answers in their exact context here

The discussion thus shifted to Unicode, which can represent character strings in many languages, such as Arabic and Thai. Strings represent written language, and as Peter Constable from Microsoft (who is learning American Sign Language) correctly put it, sign languages often have no written form.
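To show what "character strings" means in practice, here is a minimal Python sketch. It only uses the standard `unicodedata` module; the sample words and the helper name `code_points` are my own choices for illustration. Every written character maps to a numbered, named code point, which is exactly what a language without a written form cannot offer.

```python
import unicodedata

# Sample greetings, chosen only for illustration.
samples = {"Arabic": "سلام", "Thai": "สวัสดี"}


def code_points(word):
    """Return (code point, official Unicode name) for each character."""
    return [(f"U+{ord(ch):04X}", unicodedata.name(ch)) for ch in word]


for language, word in samples.items():
    print(language, word)
    for code, name in code_points(word):
        print(f"  {code}  {name}")
```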

I think there are two ways to go.

  1. Either sign languages get a notation system that fits into Unicode. Signspeak, an EU project carried out by, among others, Dr. Onno Crasborn at the University of Nijmegen, could serve this purpose. It aims to create a system that can recognize signed messages captured on video and turn them into Unicode strings.
  2. Or Unicode gets a multimedia sibling, which allows strings to be much more than just a sequence of characters, so that any language, whether spoken, written or signed, can be supported.

The first option is one of accessibility: the message is not conveyed in its native form but converted. The second option respects the nature of languages and treats them as such.
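To make the second option a little more concrete, here is a rough Python sketch of what such a "string" could look like. It is purely my own illustration, not an existing standard or API: the names `TextSegment`, `MediaSegment` and `Message`, the fields, and the file name `welcome.webm` are all hypothetical. The idea is that a message is a sequence of segments, where a segment is either written text or a reference to a spoken or signed recording, and that messages can still be concatenated the way strings can.

```python
from dataclasses import dataclass
from typing import List, Set, Union


@dataclass(frozen=True)
class TextSegment:
    """Written language: an ordinary Unicode string."""
    language: str  # language tag, e.g. "en"
    text: str


@dataclass(frozen=True)
class MediaSegment:
    """Spoken or signed language: a reference to a recording (hypothetical)."""
    language: str  # e.g. "ase", the ISO 639-3 code for American Sign Language
    modality: str  # "spoken" or "signed"
    uri: str       # hypothetical location of the clip


Segment = Union[TextSegment, MediaSegment]


@dataclass
class Message:
    """A 'string' whose elements need not be characters."""
    segments: List[Segment]

    def __add__(self, other: "Message") -> "Message":
        # Concatenation, as with ordinary strings.
        return Message(self.segments + other.segments)

    def modalities(self) -> Set[str]:
        """Which modalities (written, spoken, signed) this message uses."""
        return {
            "written" if isinstance(s, TextSegment) else s.modality
            for s in self.segments
        }


greeting = Message([TextSegment("en", "Welcome")]) + Message(
    [MediaSegment("ase", "signed", "welcome.webm")]
)
print(sorted(greeting.modalities()))
```

In this sketch the signed segment stays in its native form; nothing is converted to characters, which is the point of the second option.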

When we think about the multilingual web, I believe it is wrong to limit ourselves to character strings; language is so much more than that. Think of intonation, for example, and how much more alive language is when it is uttered than when it is written. Language should get all the freedom it deserves, certainly on the web, where people socialize.

On the other hand, computers have always loved strings, as they are easy to manipulate with regular expressions. But I would argue that in this century, with the power of today's CPUs, it is definitely time to expand the string universe.
A string relates to characters as a video does to images. It's time to take the internet and operating systems to the next level by rethinking the relationship between strings and human messages. Shouldn't user interfaces be intuitive and user-centered? So shouldn't we offer the user, quite literally, freedom of speech, and think beyond the prehistoric keyboard?

Sven Noben

Sven Noben is the founder of Signfuse