---
|
|
|
|
|
title: "From Software Noob To Linux Accessibility Master"
|
|
|
|
|
layout: post
|
|
|
|
|
tags: "atspi, dbus, dbus a11y, accessibility, a11y, linux, linux a11y"
|
|
|
|
|
---
|
|
|
|
|
|
|
|
|
|
Here are some interesting problems I have faced when working with DBus, AT-SPI (Assistive Technology Service Provider Interface) and the Rust programming language.
I realize these are fairly unique constraints, and this information is likely only relevant to a select few, but I thought the experience was worth writing down:
for my own sanity when I inevitably hit these same issues later, and for others who may want to contribute to our new screen reader project, [Odilia](https://yggdrasil-sr.github.io).
|
|
|
|
|
|
|
|
|
|
## DBus
|
|
|
|
|
|
|
|
|
|
[DBus](https://www.freedesktop.org/wiki/Software/dbus/) is a cool API!
|
|
|
|
|
Well, it's not an API, but rather a mechanism to share messages across processes in Linux;
|
|
|
|
|
this is generally called IPC or Inter-Process Communication.
|
|
|
|
|
DBus can be used to [send and receive desktop notifications](https://specifications.freedesktop.org/notification-spec/latest/ar01s09.html),
|
|
|
|
|
[shut down your computer](https://www.freedesktop.org/wiki/Software/systemd/dbus/)
and, for my purposes, [get accessibility events](https://www.freedesktop.org/wiki/Accessibility/Walkthrough/).
|
|
|
|
|
|
|
|
|
|
### Inner Workings
|
|
|
|
|
|
|
|
|
|
DBus is an object-oriented approach to IPC.
|
|
|
|
|
It is split up into five main components that work together:
|
|
|
|
|
|
|
|
|
|
1. Objects
|
|
|
|
|
2. Interfaces
|
|
|
|
|
3. Methods
|
|
|
|
|
4. Properties
|
|
|
|
|
5. Buses
|
|
|
|
|
|
|
|
|
|
#### Objects, Methods & Properties
|
|
|
|
|
|
|
|
|
|
Objects are just like the objects you learned about in your CS classes:
an object is a structure which contains attributes, and methods which can be called on the object.
DBus objects are very similar, except that attributes are called properties.
|
|
|
|
|
|
|
|
|
|
Most DBus libraries provide a way for you to use "native objects" (i.e., a Python object, a C++ object, a Rust structure + implementation, etc.); this allows access to DBus methods using the language features available to you.
|
|
|
|
|
So for example, in Python you might write:
|
|
|
|
|
|
|
|
|
|
```python
|
|
|
|
|
obj = get_a_dbus_object()
|
|
|
|
|
print(obj.get_text()) # using a method
|
|
|
|
|
print(obj.locale) # using a property
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
This would print out whatever may be returned from the object's GetText method and what is found in the locale property.
|
|
|
|
|
Notice that DBus method names are conventionally written in Pascal case (i.e., the first letter of each word is capitalized), so `get_text()` here maps to `GetText`.
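
To make the naming point concrete, here is a minimal Rust sketch of how a binding *might* translate a native snake_case name into the Pascal case name used on the DBus side. The function name `to_dbus_member` is my own invention for illustration; real bindings usually generate these names for you.

```rust
/// Convert a snake_case native name into the PascalCase member name
/// that DBus interfaces conventionally use (e.g. `get_text` -> `GetText`).
/// `to_dbus_member` is a hypothetical helper, not part of any DBus library.
fn to_dbus_member(native_name: &str) -> String {
    native_name
        .split('_')
        .filter(|part| !part.is_empty())
        .map(|part| {
            let mut chars = part.chars();
            match chars.next() {
                // Uppercase the first character and keep the rest untouched.
                Some(first) => first.to_uppercase().collect::<String>() + chars.as_str(),
                None => String::new(),
            }
        })
        .collect()
}

fn main() {
    println!("{}", to_dbus_member("get_text"));
    println!("{}", to_dbus_member("locale"));
}
```

The point is only that the name you type in your language and the name that travels over DBus can differ in casing.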
|
|
|
|
|
|
|
|
|
|
#### Interfaces
|
|
|
|
|
|
|
|
|
|
A DBus interface (not to be confused with a Java interface, or a Rust trait) is a named definition of a collection of methods and properties.
For example, the "Text" interface may have a property like "Length" or a method like "GetText".
So the interface "Text" is just a list of methods and properties all wrapped up together.
|
|
|
|
|
That's it! That simple!
|
|
|
|
|
|
|
|
|
|
This will come in handy later when we need to check if an object implements a method;
|
|
|
|
|
this way we can check for an entire interface of methods and properties instead of checking for each individually.
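
Here is a rough sketch of that idea in Rust, using only the standard library. The `AccessibleObject` struct and its `implements` method are hypothetical stand-ins; I am assuming the list of interface names has already been fetched from the object over DBus.

```rust
use std::collections::HashSet;

/// Hypothetical local view of a remote object: just the set of
/// interface names it claims to implement.
struct AccessibleObject {
    interfaces: HashSet<String>,
}

impl AccessibleObject {
    fn new(interfaces: &[&str]) -> Self {
        AccessibleObject {
            interfaces: interfaces.iter().map(|name| name.to_string()).collect(),
        }
    }

    /// One check covers every method and property the interface defines.
    fn implements(&self, interface: &str) -> bool {
        self.interfaces.contains(interface)
    }
}

fn main() {
    let obj = AccessibleObject::new(&["org.a11y.atspi.Accessible", "org.a11y.atspi.Text"]);
    println!("has Text: {}", obj.implements("org.a11y.atspi.Text"));
    println!("has Hyperlink: {}", obj.implements("org.a11y.atspi.Hyperlink"));
}
```

If the check for "Text" passes, we can call any of its methods without probing for each one individually.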
|
|
|
|
|
|
|
|
|
|
#### Buses
|
|
|
|
|
|
|
|
|
|
A bus's closest equivalent in standard computer science terms would be an IP address.
A bus address looks like ":1.39" (DBus calls this a unique connection name); think of this like a raw IP address.
Some addresses have names associated with them, like "org.a11y.Bus" (a well-known name); think of this like a DNS A record pointing at an IP (bus) address.
|
|
|
|
|
So a bus is just a place to send IPC requests, just like you'd send HTTP requests to a web server at a specific IP/port combination.
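
To keep the two kinds of address apart in code, a small sketch like the following can help. The `BusName` enum and `classify` function are my own names, and the rules are deliberately simplified (a leading colon for the "raw IP" style, at least one dot for the "DNS" style); the real naming rules are stricter.

```rust
/// Hypothetical classification of the two address styles described above.
#[derive(Debug, PartialEq)]
enum BusName<'a> {
    /// Looks like ":1.39"; comparable to a raw IP address.
    Unique(&'a str),
    /// Looks like "org.a11y.Bus"; comparable to a DNS record.
    WellKnown(&'a str),
}

/// Simplified check: not a full validator for DBus names.
fn classify(name: &str) -> Option<BusName<'_>> {
    if name.starts_with(':') && name.len() > 1 {
        Some(BusName::Unique(name))
    } else if !name.is_empty() && name.contains('.') {
        Some(BusName::WellKnown(name))
    } else {
        None
    }
}

fn main() {
    println!("{:?}", classify(":1.39"));
    println!("{:?}", classify("org.a11y.Bus"));
    println!("{:?}", classify("nonsense"));
}
```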
|
|
|
|
|
|
|
|
|
|
### Accessibility Events & Information
|
|
|
|
|
|
|
|
|
|
Let's assume for a moment that you cannot see anything. You are blind.
|
|
|
|
|
If you try to read an article you obviously cannot see what is on your screen, so you need something to read it to you.
|
|
|
|
|
This technology that reads your screen to you is, uncreatively, called a screenreader, sometimes abbreviated "SR".
Well, how does a screenreader know what is on the screen? How does it know what a button is? And a link?
|
|
|
|
|
How does it know if content has changed or if an alert has been sent?
|
|
|
|
|
|
|
|
|
|
The former describes accessibility information (i.e., this button contains a certain string of text);
|
|
|
|
|
the latter describes an accessibility event (an `aria-live` region has been updated, or an alert box has been displayed).
|
|
|
|
|
|
|
|
|
|
DBus can send these events and information to your process, if you ask for it.
|
|
|
|
|
This is what you want if you're going to create anything like a screenreader.
|
|
|
|
|
|
|
|
|
|
#### Accessibility Events in Rust
|
|
|
|
|
|
|
|
|
|
So why "Object:StateChanged\0"? Where does this come from?
|
|
|
|
|
|
|
|
|
|
The specification that is used to send this information to our DBus connection is called AT-SPI: Assistive Technology Service Provider Interface.
|
|
|
|
|
To clarify: DBus is the general IPC mechanism for processes in Linux;
|
|
|
|
|
AT-SPI is a standard for how to send accessibility information/events over the DBus protocol.
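
As a rough illustration of how a name like "Object:StateChanged\0" could relate to DBus, here is a sketch that splits it into an event class and a member, then builds a DBus match rule from them. Two assumptions are baked in and both are mine: that the trailing `\0` is just a C-style string terminator to be stripped, and that the class maps onto an interface named `org.a11y.atspi.Event.<Class>`. The function names are hypothetical.

```rust
/// Split "Object:StateChanged\0" into ("Object", "StateChanged").
/// Assumes any trailing NUL is a C string terminator and drops it.
fn split_event_name(raw: &str) -> Option<(&str, &str)> {
    let trimmed = raw.trim_end_matches('\0');
    let (class, member) = trimmed.split_once(':')?;
    if class.is_empty() || member.is_empty() {
        return None;
    }
    Some((class, member))
}

/// Build a DBus match rule for the event.
/// Assumes the interface is named `org.a11y.atspi.Event.<class>`.
fn match_rule(raw: &str) -> Option<String> {
    let (class, member) = split_event_name(raw)?;
    Some(format!(
        "type='signal',interface='org.a11y.atspi.Event.{}',member='{}'",
        class, member
    ))
}

fn main() {
    println!("{:?}", split_event_name("Object:StateChanged\0"));
    println!("{:?}", match_rule("Object:StateChanged\0"));
}
```

Seen this way, the event name is just a compact way of writing "which interface, which signal".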
|
|
|
|
|
|
|
|
|
|
## AT-SPI
|
|
|
|
|
|
|
|
|
|
AT-SPI is specified by a set of XML files that describe *how* to send data across DBus for accessibility events.
|
|
|
|
|
I'm going to be honest: at first this system *seems* very convoluted and unnecessarily complex.
|
|
|
|
|
Over time, though, this system has grown on me as I start to see its "complexities" as a sort of aftereffect of the core principle of *simplicity* used within DBus and the specifications which use it.
|
|
|
|
|
|
|
|
|
|
I have explained previously that DBus has objects and methods just like a native object in Python, C++ or JavaScript.
|
|
|
|
|
|
|
|
|
|
So let's say we want to implement the most basic thing a screenreader can do: read text.
|
|
|
|
|
Let's suppose we already have an item we want to get the text of.
|
|
|
|
|
Now to get the text of it, we call a method on the interface and pass the path.
|
|
|
|
|
This is abstracted away for us, generally speaking, when using any kind of language-specific DBus binding, but it's better to be explicit in this case.
|
|
|
|
|
|
|
|
|
|
No problem! We call `item.get_text()` and that's it, right?
|
|
|
|
|
No.
|
|
|
|
|
This is where, again, this "complexity" comes in.
|
|
|
|
|
Again, it starts out looking this way, but with time and understanding it will grow on anyone who enjoys the idea of the UNIX principles.
|
|
|
|
|
|
|
|
|
|
So what happens if we do `obj.get_text()`?
|
|
|
|
|
Let's try it on the first list item on my website's [homepage](/):
|
|
|
|
|
|
|
|
|
|
Here is the excerpt as it is written on the day of writing this article:
|
|
|
|
|
|
|
|
|
|
> I have three goals in my software development career:
|
|
|
|
|
> 1. Strong adherence to the <a href="https://?">UNIX principles</a> of software design.
|
|
|
|
|
> 2. Security, privacy and anonymity of the internet.
|
|
|
|
|
> 3. Accessibility of technology to the visually impaired.
|
|
|
|
|
|
|
|
|
|
What would you expect to receive if you ran `get_text()` on the first list item there?
|
|
|
|
|
If you, like me, were a little brainlette, you probably guessed "1. Strong adherence to the UNIX Principles of software design."
|
|
|
|
|
Let's find out if this is correct:
|
|
|
|
|
|
|
|
|
|
```rust
let text = acc.get_text();
println!("TEXT: \"{}\"", text);

// $ cargo run
// TEXT: "1. Strong adherence to the ￼ of software design."
```
|
|
|
|
|
|
|
|
|
|
If you read that carefully, you'll see there are what look like three spaces where the UNIX principles link should go.
|
|
|
|
|
This is *extremely* deceptive for two reasons:
|
|
|
|
|
|
|
|
|
|
1. One of those is *NOT* a space. It's an [Object Replacement Character](https://www.fileformat.info/info/unicode/char/fffc/index.htm), a.k.a. Unicode code point U+FFFC.
|
|
|
|
|
2. It looks like it has just dropped a piece of text without telling us! And without a way to get it back! *Gasp!* Oh the horror!
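
Since the character is invisible in most terminals, it helps to go looking for it explicitly. Here is a small sketch that reports the character offsets of every U+FFFC in a string; the function name `placeholder_positions` is made up for this post.

```rust
/// The Unicode Object Replacement Character.
const OBJECT_REPLACEMENT: char = '\u{FFFC}';

/// Return the character offset (not byte offset) of every U+FFFC in `text`.
fn placeholder_positions(text: &str) -> Vec<usize> {
    text.chars()
        .enumerate()
        .filter(|(_, ch)| *ch == OBJECT_REPLACEMENT)
        .map(|(index, _)| index)
        .collect()
}

fn main() {
    let text = "1. Strong adherence to the \u{FFFC} of software design.";
    println!("placeholders at: {:?}", placeholder_positions(text));
}
```

I count characters rather than bytes here because U+FFFC takes three bytes in UTF-8, so byte offsets would drift after the first placeholder.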
|
|
|
|
|
|
|
|
|
|
This is what I thought too.
|
|
|
|
|
But allow me to defend this for a minute.
|
|
|
|
|
|
|
|
|
|
What if you had something complex like a table, a block quote, an image or even something like a [MathML equation](example) inside the block of text (in our case, inside a list item, but this applies to any piece of text inside another)?
|
|
|
|
|
If you had a table, would you want to read it out?
|
|
|
|
|
With MathML, you might want to say everything upfront, but MathML would need some amount of processing before it is readable as text.
|
|
|
|
|
And even with a link, there is a reason for this.
|
|
|
|
|
|
|
|
|
|
If you can see perfectly fine and browse the web like anyone else, with your eyes, you can tell visited and unvisited links apart by their color. A darker color generally indicates a visited link,
whereas a lighter color generally indicates an unvisited link.
When a screenreader gets info about a piece of text, it needs to convey that information to its user, like "UNIX principles...link" or "UNIX principles...visited link".
|
|
|
|
|
So if I get the text of some item which contains another, should it include all sub items? What about just links? Should it tell you if the link is visited or not?
|
|
|
|
|
|
|
|
|
|
All these questions above would introduce additional complexity to answer if being done within a single query.
|
|
|
|
|
This has given me pause in my youthful "the system is broken" angst that generally plagues my thinking;
|
|
|
|
|
instead I see this is a very sober-minded and UNIX-y design principle that I think makes much more sense than the alternative.
|
|
|
|
|
Here are some major advantages of this method:
|
|
|
|
|
|
|
|
|
|
1. It allows *optional* processing of sub-elements; maybe you don't care what is underneath the element: this saves processing power and complexity.
|
|
|
|
|
2. It allows *custom* processing of sub-elements; you do not have to rely on AT-SPI to tell you what information you want. Perhaps you only need the role of the sub element, not the entire text of it: again, this saves CPU cycles and code complexity.
|
|
|
|
|
3. It allows arbitrary data to be inside any other structural element. This is optimal for HTML, which is built to have more or less arbitrary nesting of elements.
|
|
|
|
|
|
|
|
|
|
In reality: this is actually genius design!
|
|
|
|
|
My next question is: "If it uses the object replacement character so it can replace the children, then what happens if the object replacement character is actually in the text?"
|
|
|
|
|
Well, with some processing you can actually find out where each child goes, or if the object replacement character is actually written in the text itself.
|
|
|
|
|
How so?
|
|
|
|
|
|
|
|
|
|
First off, let's get a list of children.
|
|
|
|
|
We can do this with `obj.get_children()`.
|
|
|
|
|
|
|
|
|
|
```rust
|
|
|
|
|
// the Rust way of awaiting and not caring about the error case is: .await.unwrap()
|
|
|
|
|
println!("CHILDREN: {:?}", obj.get_children().await.unwrap());
|
|
|
|
|
|
|
|
|
|
$ cargo run
|
|
|
|
|
CHILDREN: [(":1.7", Path("/org/a11y/atspi/accessible/193\u{0}")), (":1.7", Path("/org/a11y/atspi/accessible/194\u{0}"))]
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
You'll notice that the children are merely a list of tuples;
|
|
|
|
|
each tuple only contains, at its core, two strings:
|
|
|
|
|
|
|
|
|
|
* Sender: A string describing which application has sent the information.
|
|
|
|
|
* Path: A string describing which element is being sent.
|
|
|
|
|
|
|
|
|
|
The sender, you will notice, looks suspiciously like a bus address.
|
|
|
|
|
This is actually what it is. Each process has a bus address, and it is letting you know where it's coming from.
|
|
|
|
|
The path is a DBus object path: it points to a new object that we can query over DBus if we want more information.
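
In my own code I find it easier to turn those raw tuples into a small struct right away. This is only a sketch: `ObjectRef` and `clean` are hypothetical names, and I am assuming the trailing `\u{0}` on each path is a leftover string terminator that is safe to strip.

```rust
/// Hypothetical typed version of one (sender, path) tuple.
#[derive(Debug, Clone, PartialEq)]
struct ObjectRef {
    /// Bus address of the application that owns the object, e.g. ":1.7".
    sender: String,
    /// Object path of the child, e.g. "/org/a11y/atspi/accessible/193".
    path: String,
}

/// Drop trailing NUL characters (assumed to be string terminators).
fn clean(raw: &str) -> String {
    raw.trim_end_matches('\0').to_string()
}

impl ObjectRef {
    fn new(sender: &str, path: &str) -> Self {
        ObjectRef {
            sender: clean(sender),
            path: clean(path),
        }
    }
}

fn main() {
    let raw = [
        (":1.7", "/org/a11y/atspi/accessible/193\u{0}"),
        (":1.7", "/org/a11y/atspi/accessible/194\u{0}"),
    ];
    let children: Vec<ObjectRef> = raw.iter().map(|(s, p)| ObjectRef::new(s, p)).collect();
    for child in &children {
        println!("{} {}", child.sender, child.path);
    }
}
```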
|
|
|
|
|
|
|
|
|
|
// TODO
|
|
|
|
|
Remember earlier when we used a sender, path and connection to connect to the accessibility bus?
|
|
|
|
|
And later when we created a Proxy to an object that was sent over as part of an event?
|
|
|
|
|
Well, this is the same idea!
|
|
|
|
|
We clone the `Arc<SyncConnection>` so we can use the same connection to talk to DBus, and we use the sender and path to create a proxy on which we can call the same methods as on the parent!
|
|
|
|
|
|
|
|
|
|
Pretty cool stuff!
|
|
|
|
|
|
|
|
|
|
Okay, now back to what I was saying about being able to grab information about children to find out if we need to replace the object replacement characters or not.
|
|
|
|
|
(TODO)
|
|
|
|
|
Here's what we do: there is an [interface](???), we talked about this earlier, called [Hyperlink](???xml);
|
|
|
|
|
the Hyperlink interface can actually tell us what cursor position inside the parent the child occupies.
|
|
|
|
|
Some objects we get over DBus will not support this, but the vast majority will.
|
|
|
|
|
I dislike the fact that it is called Hyperlink. Even though I can see that this is the primary use case,
I think it's reasonable to say that `StartIndex` and `EndIndex` are not exactly unique to hyperlinks (`<a>` tags).
Minor criticism aside, there is an opportunity here to match each child against the parent and find out whether, and where, the child belongs.
|
|
|
|
|
Here's how:
|
|
|
|
|
|
|
|
|
|
If we get the position of every occurrence of the object replacement character from the parent,
|
|
|
|
|
and check each child to see if its `StartIndex` matches the position of the object replacement character,
|
|
|
|
|
then anytime it matches, that is where the child belongs.
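
The matching step can be sketched without touching DBus at all. Below, `Child` is a hypothetical struct holding a child's `StartIndex` and its already-fetched text; I am assuming `StartIndex` is a character offset into the parent's text. A U+FFFC with no matching child is treated as literal text and left alone.

```rust
const OBJECT_REPLACEMENT: char = '\u{FFFC}';

/// Hypothetical summary of a child: where it sits in the parent, and its text.
struct Child {
    start_index: usize,
    text: String,
}

/// Replace each U+FFFC whose offset matches a child's start index with that
/// child's text. Unmatched U+FFFC characters are kept as they are.
fn expand(parent_text: &str, children: &[Child]) -> String {
    let mut out = String::new();
    for (index, ch) in parent_text.chars().enumerate() {
        if ch == OBJECT_REPLACEMENT {
            if let Some(child) = children.iter().find(|c| c.start_index == index) {
                out.push_str(&child.text);
                continue;
            }
        }
        out.push(ch);
    }
    out
}

fn main() {
    let parent = "1. Strong adherence to the \u{FFFC} of software design.";
    let children = vec![Child {
        start_index: 27,
        text: "UNIX principles".to_string(),
    }];
    println!("{}", expand(parent, &children));
}
```

A real screenreader would probably not paste the raw text in, but rather decide per child what to say (for example, appending "link" or "visited link").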
|
|
|
|
|
|
|
|
|
|
There is another use for this that I would like to point out.
|
|
|
|
|
I think this is a reasonable case for seeing it pulled into its own interface, or joining the Accessible interface.
|
|
|
|
|
That is: structural navigation.
|
|
|
|
|
|
|
|
|
|
## Structural Navigation
|
|
|
|
|
|
|
|
|
|
People who use screenreaders have some special abilities I actually wish browsers implemented by default:
|
|
|
|
|
the ability to jump through the document by specific tags and attributes.
|
|
|
|
|
It's not sophisticated;
|
|
|
|
|
a depth-first search forward or backward, looking for the closest heading, link, button, table, etc.
This is so ingrained in screenreader users that when a page finishes loading,
it is customary for the screenreader to announce (speak out loud to the user) the number of tables, headings, and visited and unvisited links that are on the page.
|
|
|
|
|
|
|
|
|
|
If I want to look for the next heading in an HTML document, however,
|
|
|
|
|
I cannot start by just checking all children, because it is fairly common to have various tags embedded in your current tag.
I need to know which children are after and which are before my caret.
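
Here is a toy version of the forward search, on an in-memory tree rather than live DBus objects. Everything in it (`Role`, `Node`, `next_with_role`, the sample page) is invented for illustration. It walks the tree depth-first in document order, finds where we currently are, and returns the first later node with the wanted role.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Role {
    Document,
    Heading,
    Paragraph,
    Link,
}

struct Node {
    id: u32,
    role: Role,
    children: Vec<Node>,
}

fn leaf(id: u32, role: Role) -> Node {
    Node { id, role, children: Vec::new() }
}

/// Depth-first, pre-order walk: parents come before their children.
fn flatten<'a>(node: &'a Node, out: &mut Vec<&'a Node>) {
    out.push(node);
    for child in &node.children {
        flatten(child, out);
    }
}

/// Find the id of the first node with `role` that comes after `current_id`.
fn next_with_role(root: &Node, current_id: u32, role: Role) -> Option<u32> {
    let mut order = Vec::new();
    flatten(root, &mut order);
    let current = order.iter().position(|node| node.id == current_id)?;
    order[current + 1..]
        .iter()
        .find(|node| node.role == role)
        .map(|node| node.id)
}

/// Document(1) -> [Heading(2), Paragraph(3) -> [Link(4)], Heading(5)]
fn sample_page() -> Node {
    Node {
        id: 1,
        role: Role::Document,
        children: vec![
            leaf(2, Role::Heading),
            Node { id: 3, role: Role::Paragraph, children: vec![leaf(4, Role::Link)] },
            leaf(5, Role::Heading),
        ],
    }
}

fn main() {
    let page = sample_page();
    println!("{:?}", next_with_role(&page, 3, Role::Heading));
    println!("{:?}", next_with_role(&page, 5, Role::Heading));
}
```

Searching backward is the same walk with the earlier part of the list scanned in reverse. Flattening the whole tree on every key press would be wasteful on a large page, so treat this as a statement of the idea, not a performance recommendation.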
|
|
|
|
|
|
|
|
|
|
## The Carrot 🥕
|
|
|
|
|
|
|
|
|
|
The caret is the same as your cursor in an input box.
|
|
|
|
|
Type right here and watch as your cursor (aka caret) moves with your typing: <input type="text" placeholder="type here">
|
|
|
|
|
|
|
|
|
|
The caret, or cursor, is something that most people are only used to seeing in the context of *editable* text,
|
|
|
|
|
but screenreader users enable a special mode in their browser (usually activated with F7) called "caret browsing".
|
|
|
|
|
Caret browsing allows you to navigate through a webpage using a cursor
|
|
|
|
|
even when the text is not editable.
|
|
|
|
|
This is *awesome*!
|
|
|
|
|
I cannot overstate how useful this is to me, just for keyboard-driven simplicity's sake and for trying to eliminate the mouse as much as possible.
|
|
|
|
|
|
|
|
|
|
Try it now! You can always turn it off with F7, just the same as enabling it.
|
|
|
|
|
|
|
|
|
|
This caret can be moved around just like in any run-of-the-mill WYSIWYG (What You See Is What You Get) editor, such as Word or LibreOffice Writer.
|
|
|
|
|
This is how a screenreader user navigates the web:
|
|
|
|
|
with a cursor.
|
|
|
|
|
They use it to read one character at a time (with left and right arrow), a word at a time (Ctrl+left or right arrow) or entire lines of text (using up and down arrow).
|
|
|
|
|
This becomes, in essence, the active focus of the user: it is always on the cursor (a.k.a. caret).
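
To give a feel for what "moving the caret" means in code, here is a small sketch of character-wise and word-wise movement over a plain string. The caret is a character offset, and both function names are mine. Real word boundaries are more subtle than "skip to the next whitespace", so take this as the simplest possible version.

```rust
/// Move the caret one character to the right, stopping at the end of the text.
fn next_char(text: &str, caret: usize) -> usize {
    (caret + 1).min(text.chars().count())
}

/// Move the caret to the start of the next word (like Ctrl+Right),
/// using whitespace as the only word separator.
fn next_word(text: &str, caret: usize) -> usize {
    let chars: Vec<char> = text.chars().collect();
    let mut i = caret.min(chars.len());
    // Skip the rest of the current word...
    while i < chars.len() && !chars[i].is_whitespace() {
        i += 1;
    }
    // ...then skip the whitespace that follows it.
    while i < chars.len() && chars[i].is_whitespace() {
        i += 1;
    }
    i
}

fn main() {
    let text = "read one word";
    println!("{}", next_char(text, 0));
    println!("{}", next_word(text, 0));
    println!("{}", next_word(text, 5));
}
```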
|
|
|
|
|
|
|
|
|
|
## Keyboard Input
|
|
|
|
|
|
|
|
|
|
Keyboard input with accessible applications follows a very complex path, which can be a serious buzzkill when attempting to build a high-performance screenreader.
Let me show you what the issues are; the assistive technology (a screenreader, in this case) will be written as "AT" in this diagram:
|
|
|
|
|
|
|
|
|
|
```
|
|
|
|
|
Wayland: Kernel -> libinput -> DE/WM -> accessible application -> AT
|
|
|
|
|
X11: Kernel -> Xorg -> DE/WM -> accessible application -> AT
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
What happens in the case of an inaccessible application?
|
|
|
|
|
It doesn't work, at all. A key press which is sent to an inaccessible application will *not* be sent to an AT application (i.e., a screenreader).
|
|
|
|
|
This is a serious problem that I don't think *should* exist at all.
Perhaps there is some mechanism I am missing for intercepting these keys before they pass all the way to an application, rather than just hoping the GUI is accessible;
supposing there is not, we need a system that intercepts the keys before they are sent all the way down the stack, and sends them to the screenreader.
|
|
|
|
|
This is needed for two reasons:

1. Performance: it doesn't make sense to send keys that far down the stack just to hope the application implements accessibility correctly; we should be able to intercept key presses *before* they get to the application.
2. Control: it is best to be able to control things regardless of whether an application is running or not. Under a system where an application must be accessible to send us keystrokes, a non-responsive application will not send us keystrokes either.

To have full control and maximum performance, we need to intercept the keys at their source.
|
|
|
|
|
|
|
|
|
|
### rdev
|
|
|
|
|
|
|
|
|
|
`rdev` is a Rust crate which can (with the "unstable_grab" feature enabled) grab keys from the Linux kernel before they are passed any further down the stack.
|
|
|
|
|
It allows us to consume events if we do not want to also do the default action; for example, in "Browse Mode" a screenreader user will use the letter h to jump between headings within a page;
|
|
|
|
|
normally this would type the letter h, so to stop this from happening we can consume (or "eat") the event so that it isn't sent any further at all.
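
The decision of whether to eat a key can be modelled separately from the grabbing itself. The sketch below does not use `rdev` at all; `Mode`, `Command`, `Outcome` and `handle_key` are hypothetical names, and "Focus" is just my label for "any mode where keys should pass through". It only shows the rule: in Browse Mode, `h` becomes a command and is consumed; everything else is forwarded.

```rust
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    /// Keys are navigation commands.
    Browse,
    /// Keys go to the application untouched (hypothetical mode name).
    Focus,
}

#[derive(Debug, Clone, Copy, PartialEq)]
enum Command {
    NextHeading,
}

#[derive(Debug, PartialEq)]
enum Outcome {
    /// Let the key continue down the stack.
    Forward(char),
    /// Eat the key and run a screenreader command instead.
    Consume(Command),
}

fn handle_key(mode: Mode, key: char) -> Outcome {
    match (mode, key) {
        (Mode::Browse, 'h') => Outcome::Consume(Command::NextHeading),
        _ => Outcome::Forward(key),
    }
}

fn main() {
    println!("{:?}", handle_key(Mode::Browse, 'h'));
    println!("{:?}", handle_key(Mode::Focus, 'h'));
    println!("{:?}", handle_key(Mode::Browse, 'x'));
}
```

Keeping this logic as a pure function means it can be tested without a keyboard, and the grabbing layer only has to translate `Consume` into "do not pass this event on".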
|
|
|
|
|
|
|
|
|
|
INSERT CONNECTION PARAGRAPH
|
|
|
|
|
|
|
|
|
|
## Pulling It All Together
|
|
|
|
|
|
|
|
|
|
Now that we have the basics of DBus, AT-SPI, caret browsing and structural navigation, let's put it all together in a final program which can actually accomplish something:
|
|
|
|
|
|
|
|
|
|
```rust
|
|
|
|
|
// use DBus to get the bus address of the accessibility (a11y) bus.
|
|
|
|
|
// connect to the accessibility bus and ask to receive focus change events
|
|
|
|
|
// speak the text of the current element, chopping off by line breaks, and including link information
|
|
|
|
|
|
|
|
|
|
// use odilia-input to get keystrokes
|
|
|
|
|
```
|
|
|
|
|
|
|
|
|
|
ALMOST DONE
|
|
|
|
|
|
|
|
|
|
ADD CONCLUSION
|