By now I have told a few people about my idea, and so far the feedback has been entirely positive. Others seem interested as well. So I decided to write up what I have come up with so far on how computers can become a bit more accessible for people with a hearing handicap.
This entry is a translation of my German one.
But first, what is this post about?
- State of the art
- My motivation
- The application: Point of view of a user
- The application: Point of view of a developer
- Potential problem areas
State of the art
When it comes to "accessibility" on the internet, it is almost always about users with a visual handicap. I can understand this, since these people face the biggest hurdles. But there are other kinds of users who are confronted with barriers, too. Just think of people with learning disabilities, or hard-of-hearing people (h-o-h for short).
So I decided to focus on people with a hearing handicap, especially on sign language users, since I sympathise with them.
You could say that people with a hearing handicap are the lucky ones. They're almost always sighted and can therefore enjoy web applications the way the developers and designers intended. Everything's fine. Okay, sometimes one has to add some subtitles, and that's it. You'd think.
But you're overlooking that the spoken language isn't their mother tongue. Better think of h-o-h people as foreigners who can't fully understand your language. You have to think hard about how to phrase your sentences when you write. Another drawback is that emotions and opinions expressed between the lines don't always reach the signing reader.
Several attempts have been made to introduce a signing avatar. A German deaf blogger, HeWriteSilent, lists some of them in his blog entry. For example, have a look at eSign or WISDOM. Or the fairly recent SiMAX. You'll find more when you enter "sign language avatar" into your favourite search engine. But none of them achieved a breakthrough. The German deaf pirate, Julia Probst, explained why.
A few days ago, this news flooded the press within the deaf community:
Although I'm fascinated by it (and already was when I saw a first video back in June), I see some problems:
- It is proprietary. This implies that a user cannot assume they are not being spied on. Besides, you cannot prevent the software from encroaching on areas it wasn't meant to access. Moreover, you cannot prevent the project from being discontinued at some point in the future (say, because it isn't profitable enough).
- It's by Microsoft. Not only have they not proven to be savvy in security, but I'm also afraid they will restrict the software to their own operating system. This ties in with the first point. And APIs don't last forever (see the recent Skype API shutdown).
- It's commercial. Because the source code isn't open (see the first point), Microsoft is able to dictate the price. And once you take an interest in the affairs of h-o-h people, you'll quickly recognise that they face several burdens. Yes, health insurance funds and social assistance offices pay for most of the accessible technology, but I'm afraid they won't do so forever. Moreover, people depend on their goodwill.
My motivation
Originally, I wanted to work on a free software solution on the q.t. My motivation arises from these three sources:
However, I do know of one Indian developer who has made an important contribution. His name is Krishnakant [Mane]. He came to a talk I gave, and said that there was no Free Software that could speak words from the screen, and so asked what he ought to do? I said, “Write some.” So a few years later, he came to one of my talks, and reminded me of what I had said to him earlier — and said that he had “written some”. So now he has made major contributions to screen-reading software, which he and thousands of other people use.
- My h-o-h daughter. I want to prepare a world for her in which she can develop more freely.
As mentioned above, it was meant to be developed on the q.t. Then I wanted to showcase a proof of concept and ask the experts for support. But then I read Aaron Seigo's thoughts on introducing new ideas to free software communities. Besides, a friend of mine from my Google+ days wrote a few lines in his blog that tipped the scales (translated by me):
For me, freedom includes feeling unobserved in my life and not being bound to something whose sole purpose is to treat me as a commodity. It's more than absurd. I have the impression that some providers treat me dismissively, patronise me and rip me off. A state I wouldn't knowingly put anyone in. Decency forbids it.
So I will publish it on GitHub sooner. But I need help, too. I'm especially unsure about the choice of license. I think I should talk about it on IRC. But more about that below.
The application: Point of view of a user
Let's begin with the front end, that is, the piece of software an end user will most likely get to see. At first, I want to write for the web. I imagine a small icon (e.g. a hand, a common symbol among sign language users) beneath a text, indicating that a translation is available. Later on I want to offer programs for native use cases. Because I like Python very much, I will write that software in Python, using an SVG library for Python. But that's in the distant future. My intention is not to translate chats (because that would need much more information in real time to decipher emotions and so on), but to offer something that easily translates websites into sign language!
I just mentioned SVG. This is an image format for so-called vector graphics. These have the advantage of storing a picture not as a set of points, but as a set of curves and lines. Hence, a sharp image can be guaranteed at any size. Besides that, it's an official standard by the W3C, the World Wide Web Consortium, which also defines the rules for HTML. Moreover, it is written in XML, so it is plain text. On the one hand, it's highly compressible, and on the other hand, text is easy to modify. Images or text can be embedded as well :-)
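To make the "plain text, easy to modify, highly compressible" point a bit more tangible, here is a minimal sketch in Python that uses only the standard library. The hand icon, its coordinates and its colours are invented for this example and are not the actual avatar.

```python
import gzip
import xml.etree.ElementTree as ET

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)  # serialise without an "ns0:" prefix


def build_hand_icon() -> ET.Element:
    """Build a very rough hand icon (hypothetical shapes, illustration only)."""
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"viewBox": "0 0 100 100"})
    # Palm: one circle.
    ET.SubElement(svg, f"{{{SVG_NS}}}circle",
                  {"cx": "50", "cy": "60", "r": "25", "fill": "#f2c9a0"})
    # Five fingers: five straight lines described by their end points.
    for i in range(5):
        x = str(30 + i * 10)
        ET.SubElement(svg, f"{{{SVG_NS}}}line",
                      {"x1": x, "y1": "45", "x2": x, "y2": "15",
                       "stroke": "#f2c9a0", "stroke-width": "6",
                       "stroke-linecap": "round"})
    return svg


def recolour(svg: ET.Element, colour: str) -> None:
    """Because SVG is XML, changing a colour is a plain attribute edit."""
    for element in svg.iter():
        for attribute in ("fill", "stroke"):
            if attribute in element.attrib:
                element.set(attribute, colour)


icon = build_hand_icon()
recolour(icon, "#8d5524")
raw = ET.tostring(icon, encoding="utf-8")  # the SVG document as bytes
packed = gzip.compress(raw)                # what a web server could send

print(len(raw), "bytes uncompressed,", len(packed), "bytes gzipped")
```

The exact numbers depend on the drawing, but repetitive markup like the five finger lines is the kind of text that gzip handles well.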
The compression rate is important to me. I mentioned above that avatars couldn't break through. I assume these are the reasons:
- Bandwidth. Have you ever tried to watch a video on a smartphone via UMTS or LTE? Lag is guaranteed. In my opinion, deaf users can do without audio. So only the image has to be transmitted.
- Emotions. I'm going to construct my avatar in several layers of SVG. That way, a designer can focus on specific areas without having to worry about the rest. It also makes it possible to exchange only certain elements (e.g. the eyes).
- Caching. By moving certain elements into stand-alone libraries, it is possible to cache just those. Hence, they have to be loaded only once.
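The layering and caching ideas from the list above could be sketched like this. It is only an illustration under assumptions: the layer ids ("head", "eyes"), the eye coordinates and the mood names are hypothetical, and `functools.lru_cache` merely stands in for whatever caching a browser or app would really do.

```python
import xml.etree.ElementTree as ET
from functools import lru_cache

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)

# Hypothetical eye shapes per mood: (cx, cy, r) for the left and right eye.
EYE_SHAPES = {
    "neutral": [(35, 45, 4), (65, 45, 4)],
    "surprised": [(35, 43, 7), (65, 43, 7)],
}


def tag(name: str) -> str:
    """Return the namespace-qualified tag name ElementTree expects."""
    return f"{{{SVG_NS}}}{name}"


@lru_cache(maxsize=None)
def eyes_markup(mood: str) -> str:
    """Build the markup of the 'eyes' layer once per mood, then reuse it."""
    layer = ET.Element(tag("g"), {"id": "eyes"})
    for cx, cy, r in EYE_SHAPES[mood]:
        ET.SubElement(layer, tag("circle"),
                      {"cx": str(cx), "cy": str(cy), "r": str(r)})
    return ET.tostring(layer, encoding="unicode")


def build_avatar(mood: str = "neutral") -> ET.Element:
    """Stack the layers: each <g> element is one independent layer."""
    svg = ET.Element(tag("svg"), {"viewBox": "0 0 100 100"})
    head = ET.SubElement(svg, tag("g"), {"id": "head"})
    ET.SubElement(head, tag("circle"),
                  {"cx": "50", "cy": "50", "r": "40", "fill": "#f2c9a0"})
    svg.append(ET.fromstring(eyes_markup(mood)))
    return svg


def swap_eyes(svg: ET.Element, mood: str) -> None:
    """Exchange only the 'eyes' layer and leave every other layer alone."""
    for index, layer in enumerate(list(svg)):
        if layer.get("id") == "eyes":
            svg.remove(layer)
            svg.insert(index, ET.fromstring(eyes_markup(mood)))
            return
    raise ValueError("avatar has no 'eyes' layer")


avatar = build_avatar("neutral")
swap_eyes(avatar, "surprised")
print(ET.tostring(avatar, encoding="unicode"))
```

Grouping each area in its own `<g>` element is what keeps the swap local: the head layer is never touched when the eyes change.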
I want to highlight some use cases.
As said, I'm in touch with sign language users every now and then. If you don't meet in person, you usually have the alternatives of written communication or video chat. Well, German (or in general: spoken language) isn't the mother tongue of sign language users. I find myself wishing I could sign via my smartphone. But that isn't possible yet.
"Das große Wörterbuch der Deutschen Gebärdensprache" (screenshot of the iPhone app)
Besides, I'm in the process of learning this (wonderful) language. A manual would be handy. Well, there are already apps for iOS and Android, but they're pretty expensive: EUR 9.00 per theme package. I can hardly believe that anyone is willing to pay that.
So, let us imagine the program were already available. How should one use the app? Mrs Kestner came up with a pretty clever taxonomy: signs can be sorted by whether they're signed with one or two hands, how many fingers are used, and where they are located.
I could imagine this being accessed via swipe gestures or submenus. But I need a mock-up for that …
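Until there is a mock-up, the taxonomy itself can at least be sketched as a small data structure with a filter, which is roughly what a submenu or a swipe gesture would have to drive. The field names and the placeholder entries below are my own assumptions for illustration; they are not real dictionary data.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional


@dataclass(frozen=True)
class Sign:
    gloss: str      # label of the sign (placeholder names below)
    hands: int      # 1 = one-handed, 2 = two-handed
    fingers: int    # number of fingers used
    location: str   # where the sign is made, e.g. "head" or "chest"


def find_signs(signs: Iterable[Sign],
               hands: Optional[int] = None,
               fingers: Optional[int] = None,
               location: Optional[str] = None) -> List[Sign]:
    """Narrow the list down step by step; None means 'do not filter'."""
    return [
        sign for sign in signs
        if (hands is None or sign.hands == hands)
        and (fingers is None or sign.fingers == fingers)
        and (location is None or sign.location == location)
    ]


# Placeholder entries, not actual signs.
CATALOGUE = [
    Sign("EXAMPLE-A", hands=1, fingers=1, location="head"),
    Sign("EXAMPLE-B", hands=2, fingers=5, location="chest"),
    Sign("EXAMPLE-C", hands=1, fingers=5, location="chest"),
]

for match in find_signs(CATALOGUE, hands=1, location="chest"):
    print(match.gloss)
```

Each menu level would simply add one more keyword argument, so the order in which a learner narrows things down stays free.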
The application: Point of view of a developer
As a developer, you think about a project up front. For example, I discovered that sign language differs from region to region. But there are also gestures that recur. Hence I want to keep the libraries as modular as possible. It might look like this:
- A library describing what a movement looks like. Which XML snippet has to be manipulated to show a thumbs-up? Which lines have to be faded out? Which elements have to be recalculated (but without describing HOW this is accomplished)?
- A library associating a sequence of words with a gesture and describing the how. Which steps are necessary to move a hand upwards? Which lines have to be manipulated in order to show a surprised face?
- A library devoted to the design. Do you know The Sims? In the newer instalments there are plenty of options for designing your character. I want to stick with a cartoon figure. See the problem areas below.
In the end, it should be possible to change the skin colour or the complexion without interfering with or touching the other libraries. To accomplish that, it's necessary to define fixed points. Those define, for example, the position of the head. Or where the hips begin. That kind of thing. How plump a body is can be adjusted later on ;-)
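A minimal sketch of how fixed points could keep the design separate from the geometry, again in Python with the standard library only. Everything named here (`FIXED_POINTS`, `Design`, `render_figure`, the coordinates and colours) is hypothetical and only meant to show the separation, not a finished API.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass

SVG_NS = "http://www.w3.org/2000/svg"
ET.register_namespace("", SVG_NS)

# Geometry side: the fixed points every other library can rely on.
# The coordinates are invented for this example.
FIXED_POINTS = {"head": (50, 20), "hips": (50, 70)}

STYLE_ATTRIBUTES = {"fill", "stroke", "stroke-width"}


@dataclass(frozen=True)
class Design:
    """Design side: only colours, no coordinates."""
    skin: str
    outline: str = "#333333"


def render_figure(design: Design, points=FIXED_POINTS) -> ET.Element:
    """Draw a stick-like cartoon figure between the fixed points."""
    head_x, head_y = points["head"]
    hips_x, hips_y = points["hips"]
    svg = ET.Element(f"{{{SVG_NS}}}svg", {"viewBox": "0 0 100 100"})
    ET.SubElement(svg, f"{{{SVG_NS}}}circle",
                  {"id": "head", "cx": str(head_x), "cy": str(head_y),
                   "r": "12", "fill": design.skin, "stroke": design.outline})
    ET.SubElement(svg, f"{{{SVG_NS}}}line",
                  {"id": "torso", "x1": str(head_x), "y1": str(head_y + 12),
                   "x2": str(hips_x), "y2": str(hips_y),
                   "stroke": design.outline, "stroke-width": "4"})
    return svg


def geometry(svg: ET.Element) -> list:
    """Return every element's attributes with the styling stripped out."""
    return [{k: v for k, v in element.attrib.items()
             if k not in STYLE_ATTRIBUTES} for element in svg]


light = render_figure(Design(skin="#f2c9a0"))
dark = render_figure(Design(skin="#8d5524"))
print(geometry(light) == geometry(dark))  # same shape, different colours
```

The gesture libraries would then only ever move things relative to `FIXED_POINTS`, while a design library would only ever hand over a different `Design`.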
Let's consider the embedding. When I look at how accessible elements have spread across websites and apps recently, I have to conclude that the progress is disillusioning. Of course, there are a couple of developers who pat each other on the back. But accessibility has drifted away from the mainstream. In the beginning, web design was … well, technical. There were text files with images and such. Then framesets appeared. In my opinion, that's where the problems started.
The libraries have to be easy to include. I'm not yet sure whether I will ship them as a jQuery plugin (I'm using jQuery.SVG.js, which is compatible with jQuery) and then read out the text via an HTML class à la "sign-this", or whether it will become a browser add-on. With an add-on, I can "ensure" that accessibility is decoupled from the developers' goodwill.
But I lean towards the first solution, since there's one thing that can't easily be read out: the mood. It's a piece of meta-information that has to be shipped along with the text.
Emotions are an important part of how a message is interpreted. Imagine a neutral face paired with a sad event. It will have a completely different effect.
Just imagine, what else could be done with it!
Let's say the mood is also defined via HTML classes (at least as long as there isn't a standard for it). Maybe something like "mood-". You could use it for sign language interpretation. But you could reuse it, too. Say, for adjusting the background colour. Look, there are apps for ambience. Why not work on it collaboratively?
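How a tool might pick up such markup can be sketched with Python's built-in `html.parser`. This is only an illustration: the class name "sign-this" and the "mood-" prefix come from the text above, while the concrete value "mood-sad", the sample HTML and all function names are assumptions.

```python
from html.parser import HTMLParser

# Tags that never get a closing tag; they must not change the nesting depth.
VOID_TAGS = {"br", "hr", "img", "input", "link", "meta"}


class SignThisExtractor(HTMLParser):
    """Collect (text, mood) pairs from elements marked with class 'sign-this'."""

    def __init__(self):
        super().__init__()
        self.passages = []   # finished (text, mood) pairs
        self._depth = 0      # > 0 while inside a marked element
        self._mood = None
        self._chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        if self._depth:                 # nested tag inside a marked element
            self._depth += 1
            return
        classes = (dict(attrs).get("class") or "").split()
        if "sign-this" in classes:
            self._depth = 1
            self._chunks = []
            # First class starting with "mood-" wins; None if there is none.
            self._mood = next((c[len("mood-"):] for c in classes
                               if c.startswith("mood-")), None)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS or not self._depth:
            return
        self._depth -= 1
        if self._depth == 0:            # the marked element is complete
            text = " ".join("".join(self._chunks).split())
            self.passages.append((text, self._mood))

    def handle_data(self, data):
        if self._depth:
            self._chunks.append(data)


def extract_passages(html: str) -> list:
    parser = SignThisExtractor()
    parser.feed(html)
    parser.close()
    return parser.passages


SAMPLE = ('<p class="sign-this mood-sad">Our cat <em>died</em> today.</p>'
          '<p>Not marked.</p>'
          '<p class="sign-this">Hello!</p>')

for text, mood in extract_passages(SAMPLE):
    print(mood, "|", text)
```

Because the mood travels as an ordinary class, the same pair could just as well feed a stylesheet or a screen reader instead of an avatar.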
The mood could be used for blind users, too. The speaking voice could be adjusted. I have to work more with screen readers for smartphones (needs Flash, video, approx. 22 minutes) and desktops before I can go into more detail. But it's a good chance to promote multi-purpose solutions in accessibility land.
Potential problem areas
Das große Wörterbuch der Deutschen Gebärdensprache (screenshot of the desktop application)
Well, the beloved patents. I'm worried about failing because of the Große Wörterbuch der Deutschen Gebärdensprache (the large dictionary of German Sign Language).
I've worked together successfully with the publisher. She's really committed to the affairs of h-o-h people. And she has to make money with her publishing company. I can understand that. But is it possible to be granted a patent on grammar? Would it be possible to avoid potential conflicts with a clever choice of a GPL-compatible license (Apache?)?
The plugin I use is licensed under the MIT license.
Workplaces. Another topic I have spent a lot of thought on. I've deliberately decided to use cartoon figures. On the one hand, they're easy to describe; on the other hand, their design is deliberately simple. I don't want to deprive speech-to-text reporters of their jobs. My software isn't meant to be displayed on websites where text has a certain life span. Maybe it's even usable for real-time translation. That's the intention.
Copyleft. Well, it's nice to have the source open to the public. And I can only wish for that with a technology as basic as the one I want to build. But I want to allow commercial use as well. And hence I'm not sure whether a copyleft license would prevent that …