update a11y article

2 years ago · a6d5360ce4
parent 55d02557e3
commit a6d5360ce4
1 changed files with 98 additions and 56 deletions
--- a/_posts/2021-12-11-adventures-of-writing-a-screenreader-in-rust.md
+++ b/_posts/2021-12-11-adventures-of-writing-a-screenreader-in-rust.md
@@ -1,12 +1,19 @@
 ---
 title: "From Software Noob To Linux Accessibility Master"
 layout: post
+code: true
 tags: "atspi, dbus, dbus a11y, accessibility, a11y, linux, linux a11y"
 ---

+Latest edit on: January 12th, 2022.
+
+{% include toc.md %}
+
+## Introduction
+
 Here are some interesting problems I have faced when working with DBus, AT-SPI (Assistive Technology Service Provider Interface) and the Rust programming language.
 I realize these are fairly unique constraints, and this information is likely only relevant for a select few, but I thought the experience might be worthwhile to write down:
-for my own sanity when I inevitably experience these same issues later and for others who may want to contribute to our new screen reader project [Odilia](https://yggdrasil-sr.github.io).
+for my own sanity when I inevitably experience these same issues later and for others who may want to contribute to our new screen reader project [Odilia](https://odilia.app/).

 ## DBus

@@ -20,7 +27,7 @@ and, for my purposes [get accessibility events](https://www.freedesktop.org/wiki
 ### Inner Workings

 DBus is an object-oriented approach to IPC.
-It is split up into 4 main components that work together:
+It is split up into 5 main components that work together:

 1. Objects
 2. Interfaces
@@ -63,6 +70,8 @@ A bus address looks like ":1.39"; think of this like a raw IP address.
 Some addresses have names associated with them like "org.a11y.Bus"; think of this like a DNS A record pointing at an IP (bus) address.
 So a bus is just a place to send IPC requests, just like you'd send HTTP requests to a web server at a specific IP/port combination.
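
To make the DNS analogy concrete, here is a toy model in plain Rust with no DBus crate involved. `NameTable` is a hypothetical stand-in for the table the bus daemon keeps; on a real bus you would ask the daemon itself, for example with the `GetNameOwner` method on `org.freedesktop.DBus`. Unique names start with `:` and are used as-is, while well-known names get looked up:

```rust
use std::collections::HashMap;

/// Toy stand-in for the bus daemon's name table (hypothetical: the real
/// table lives inside the bus daemon, not in your process).
struct NameTable {
    /// well-known name -> unique connection name, like a DNS A record
    owners: HashMap<String, String>,
}

impl NameTable {
    fn new() -> Self {
        NameTable { owners: HashMap::new() }
    }

    /// Record that the connection `unique` now owns the well-known `name`.
    fn request_name(&mut self, name: &str, unique: &str) {
        self.owners.insert(name.to_string(), unique.to_string());
    }

    /// Unique names (":1.39") are already raw addresses and pass through;
    /// well-known names ("org.a11y.Bus") are looked up in the table.
    fn resolve<'a>(&'a self, destination: &'a str) -> Option<&'a str> {
        if destination.starts_with(':') {
            Some(destination)
        } else {
            self.owners.get(destination).map(String::as_str)
        }
    }
}

fn main() {
    let mut table = NameTable::new();
    table.request_name("org.a11y.Bus", ":1.39");
    println!("{:?}", table.resolve("org.a11y.Bus")); // Some(":1.39")
    println!("{:?}", table.resolve(":1.39")); // Some(":1.39")
    println!("{:?}", table.resolve("org.example.Nobody")); // None
}
```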

+How does this relate to a screenreader?
+
 ### Accessibility Events & Information

 Let's assume for a moment that you cannot see anything. You are blind.
@@ -77,22 +86,17 @@ the latter describes an accessibility event (an `aria-live` region has been upda
 DBus can send these events and information to your process, if you ask for it.
 This is what I need if I'm to create anything like a screenreader.

-#### Accessibility Events in Rust
+## AT-SPI & Rust

-So why "Object:StateChanged\0"? Where does this come from?
-
-The specification that is used to send this information to our DBus connection is called AT-SPI: Accessible Technology--Serial Protocol Interface.
+The specification that is used to send this information to our DBus connection is called AT-SPI: Assistive Technology Service Provider Interface.
 To clarify: DBus is the general IPC mechanism for processes in Linux;
 AT-SPI is a standard for how to send accessibility information/events over the DBus protocol.

-## AT-SPI
-
-AT-SPI are a set of XML files that specify *how* to send data across DBus for accessibility events.
+AT-SPI is a set of XML files that specify *how* to send data across DBus for accessibility events.
 I'm going to be honest: at first this system *seems* very convoluted and unnecessarily complex.
-Over time though, this system has grown on me as I start to see its "complexities" as a sort of after-affect of the core principle of *simplicity* used within DBus and the specifications which use it.
+Over time though, this system has grown on me as I start to see its "complexities" as a sort of after-effect of the core principle of *simplicity* used within DBus and the specifications which utilize it.

 I have explained previously that DBus has objects and methods just like a native object in Python, C++ or JavaScript.
-
 So let's say we want to implement the most basic thing a screenreader can do: read text.
 Let's suppose we already have an item we want to get the text of.
 Now to get the text of it, we call a method on the interface and pass the path.
@@ -109,13 +113,15 @@ Let's try it on the first list item on my website's [homepage](/):
 Here is the excerpt as it is written on the day of writing this article:

 > I have three goals in my software development career:
-> 1. Strong adherence to the <a href="https://?">UNIX principles</a> of software design.
+> 1. Strong adherence to the <a href="http://www.catb.org/~esr/writings/taoup/html/ch01s06.html">UNIX principles</a> of software design.
 > 2. Security, privacy and anonymity of the internet.
 > 3. Accessibility of technology to the visually impaired.

 What would you expect to receive if you ran `get_text()` on the first list item there?
 If you, like me, were a little brainlette, you probably guessed "1. Strong adherence to the UNIX Principles of software design."
-Let's find out if this is correct:
+Let's find out if this is correct (note I am only using code snippets to avoid complexity):

 ```rust
 let text = acc.get_text();
@@ -128,33 +134,34 @@ TEXT: "1. Strong aherance to the  of software design."
 If you read that carefully, you'll see there are what look like three spaces where the UNIX principles link should go.
 This is *extremely* deceptive for two reasons:

-1. One of those is *NOT* a space. It's an [Object Replacement Character](https://www.fileformat.info/info/unicode/char/fffc/index.htm) aka Unicode Point U+FFFC.
+1. One of those is *NOT* a space. It's an [object replacement character](https://www.fileformat.info/info/unicode/char/fffc/index.htm), aka Unicode code point U+FFFC.
 2. It looks like it has just dropped a piece of text without telling us! And without a way to get it back! *Gasp!* Oh the horror!
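
Because U+FFFC renders as nothing (or a blank) in most terminals, it helps to locate it explicitly. Here is a minimal sketch, assuming AT-SPI offsets count characters rather than bytes. Rust `String` indices are byte offsets, so `str::find` disagrees as soon as any non-ASCII text precedes the placeholder. The list item text is a hypothetical reconstruction:

```rust
/// U+FFFC OBJECT REPLACEMENT CHARACTER, the placeholder for an embedded child.
const ORC: char = '\u{FFFC}';

/// Character offsets of every U+FFFC in `text`.
/// Assumption: AT-SPI offsets count characters, so we enumerate `chars()`
/// instead of using Rust's byte-based string indices.
fn orc_offsets(text: &str) -> Vec<usize> {
    text.chars()
        .enumerate()
        .filter(|&(_, c)| c == ORC)
        .map(|(i, _)| i)
        .collect()
}

/// Make the invisible placeholder visible when debugging.
fn debug_visible(text: &str) -> String {
    text.replace(ORC, "[OBJ]")
}

fn main() {
    // Hypothetical reconstruction of the list item text: space, U+FFFC, space.
    let text = "1. Strong adherence to the \u{FFFC} of software design.";
    println!("{}", debug_visible(text)); // 1. Strong adherence to the [OBJ] of software design.
    println!("{:?}", orc_offsets(text)); // [27]

    // Character offset and byte offset diverge once non-ASCII text appears.
    let accented = "\u{e9}\u{FFFC}";
    println!("{:?} vs byte {:?}", orc_offsets(accented), accented.find(ORC)); // [1] vs byte Some(2)
}
```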

 This is what I thought too.
 But allow me to defend this for a minute.

-What if you had something complex like a table, a block quote, an image or even something like a [MathML equation](example) inside the block of text (in our case, inside a list item, but this applies to any piece of text inside another)?
+What if you had something complex like a table, a block quote, an image or even something like a [MathML equation](https://developer.mozilla.org/en-US/docs/Web/MathML) inside the block of text (in our case, inside a list item, but this applies to any piece of text inside another)?
 If you had a table, would you want to read it out?
-MathML, you might want to say everything upfront, but MathML would need some amount of processing before it be readable as text.
-And even with a link, there is a reason for this.
+With MathML, you might want to say everything upfront, but MathML would also need some amount of processing before it is readable (or speakable) as text.
+And even with a humble link, there is a reason for this object replacement character:

-If you can see perfectly find and browse the web like anyone else, with your eyes, you can see what is a visited and unvisited link based on the color of the link. A darker color generally indicated a visited link,
+If you can see perfectly fine and browse the web like anyone else, with your eyes, you can see what is a visited and unvisited link based on the color of the link. A darker color generally indicates a visited link,
 whereas a lighter color generally indicates an unvisited link.
 When a screenreader gets info about a piece of text, it would need to pass that information on to its user, like "UNIX principles...link" or "UNIX principles...visited link".
-So if I get the text of some item which contains another, should it include all sub items? What about just links? Should it tell you if the link is visited or not?
+So if I get the text of some item which contains some sub items, should it include all sub items? What about just links? Should it tell you if the link is visited or not? Should you make that an option to the `GetText` call?

-All these questions above would introduce additional complexity to answer if being done within a single query.
+All these questions above would introduce additional complexity to the `GetText` call.
 This has given me pause in my youthful "the system is broken" angst that generally plagues my thinking;
 instead I see this as a very sober-minded and UNIX-y design principle that I think makes much more sense than the alternative.
 Here are some major advantages of this method:

 1. It allows *optional* processing of sub-elements; maybe you don't care what is underneath the element: this saves processing power and complexity.
 2. It allows *custom* processing of sub-elements; you do not have to rely on AT-SPI to tell you what information you want. Perhaps you only need the role of the sub element, not the entire text of it: again, this saves CPU cycles and code complexity.
-3. Allows arbitrary data to be inside any other structural element. This is optimal for HTML, which is built to have more or less arbitrary nesting of elements.
+3. Allows arbitrary data to be inside any other structural element.

-In reality: this is actually genius design!
-My next question is: "If it uses the object replacement character so it can replace the children, then what happens if the object replacement character is actually in the text?"
+In essence, it puts the developer in greater control of what the screenreader knows about the page!
+
+My next question is: "If AT-SPI uses the object replacement character so it can replace the children, then what happens if the object replacement character is actually in the text itself?"
 Well, with some processing you can actually find out where each child goes, or if the object replacement character is actually written in the text itself.
 How so?

@@ -176,34 +183,76 @@ each tuple only contains, at its core, two strings:
 * Path: A string describing which element is being sent.

 The sender, you will notice, looks suspiciously like a bus address.
-This is actually what it is. Each process has a bus address, and it is letting you know where it's coming from.
-The path is a (TODO) path to a new object for which we can receive information about through DBus if we want more information.
+This is actually what it is.
+Each process has a bus address, and it is letting you know where it's coming from.
+The path points to an object that we can query through DBus if we want more information about it.
+Like so (Rust is weird with all its `unwrap()`s, but stick with me here):
+
+```rust
+// take ownership of the first child (sender + path) returned over DBus
+let child1_base = obj.get_children().await.unwrap().into_iter().next().unwrap();
+let child1 = Proxy::new(
+  Arc::clone(&connection), // some previously initiated connection
+  child1_base.sender,
+  child1_base.path
+);
+println!("CHILD1: {}", child1.get_text().await.unwrap());
+
+// $ cargo run
+// CHILD1: UNIX principles
+```

-// TODO
-Remember earlier when we used a sender, path and connection to connect to the accessibility bus?
-And later when we created a Proxy to an object that was sent over as part of an event?
-Well, this is the same idea!
-We clone the `Arc<SyncConnection>` connection to use the same connection to talk to dbus, and we use the sender and path to create a proxy we can then use the same methods on as the parent!
+This code looks a little terse, but I assure you it makes sense:

-Pretty cool stuff!
+* A proxy object is a way to represent a DBus object as a native language object (in this case, Rust).
+* The `connection` variable is a previously created DBus connection; you would need one to talk to DBus anyway.
+* `Arc::clone(&x)` does not copy `x`; it hands out another pointer to the same [atomically reference counted](https://doc.rust-lang.org/std/sync/struct.Arc.html) value and bumps its count, so one connection can be shared by many proxies. Don't worry about the details of this; it has to do with how Rust shares data safely between threads, which is a bit out of scope for what we're really talking about here.
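
If the `Arc` part is still fuzzy, this small std-only sketch shows what `Arc::clone` does. `Connection` is a hypothetical placeholder, not a real DBus type. Cloning bumps a shared counter instead of copying the connection, and the new handle can be moved into another thread:

```rust
use std::sync::Arc;
use std::thread;

/// Hypothetical placeholder for a DBus connection; not a real crate type.
struct Connection {
    unique_name: String,
}

/// Share `connection` with a worker thread.
/// Returns (handles while shared, name seen by the worker, handles afterwards).
fn share_with_worker(connection: &Arc<Connection>) -> (usize, String, usize) {
    // No Connection is copied here: we get a second pointer to the same
    // allocation and the atomic reference count goes up by one.
    let for_worker = Arc::clone(connection);
    assert!(Arc::ptr_eq(connection, &for_worker));
    let while_shared = Arc::strong_count(connection);

    // The atomic count is what makes it safe to move the handle to another thread.
    let seen = thread::spawn(move || for_worker.unique_name.clone())
        .join()
        .expect("worker thread panicked");

    // The worker's handle was dropped when its closure finished.
    (while_shared, seen, Arc::strong_count(connection))
}

fn main() {
    let connection = Arc::new(Connection { unique_name: ":1.39".to_string() });
    let (during, seen, after) = share_with_worker(&connection);
    println!("handles while shared: {during}, worker saw {seen}, handles after: {after}");
}
```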

 Okay, now back to what I was saying about being able to grab information about children to find out if we need to replace the object replacement characters or not.
-(TODO)
-Here's what we do: there is an [interface](???), we talked about this earlier, called [Hyperlink](???xml);
-the Hyperlink interface can actually tell us what cursor position inside the parent the child occupies.
+
+```rust
+// .await.unwrap() is a Rust-ism, ignore it for now
+let c1_pos = child1.start_index().await.unwrap();
+println!("Position of child #1: {}", c1_pos);
+let text = obj.get_text().await.unwrap();
+// character offset (not byte offset) of the first object replacement character
+let first_orc = text.chars().position(|c| c == '\u{FFFC}');
+let full_text = if first_orc == Some(c1_pos as usize) {
+  // the child sits where the first U+FFFC is: swap in the child's text
+  text.replacen('\u{FFFC}', &child1.get_text().await.unwrap(), 1)
+} else {
+  // ignore the .clone(); again, a Rust-ism
+  text.clone()
+};
+println!("FULL TEXT: {}", full_text);
+
+// $ cargo run
+// FULL TEXT: Strong adherence to the UNIX principles of software design.
+```
+
+Here's what this code does:
+there is an interface (we talked about interfaces earlier) called [Hyperlink](https://github.com/odilia-app/atspi-codegen/blob/main/xml/Hyperlink.xml);
+the Hyperlink interface can actually tell us the cursor position of the child element within the parent.
 Some objects we get over DBus will not support this, but the vast majority will.
 I dislike the fact that it is called Hyperlink; even though I can see that this is the primary use case,
-I think it's reasonable to say that `StartIndex` and `EndIndex` are not exactly unique to hyperlinks (`<a>` tags).
+I think it's reasonable to say that `StartIndex` and `EndIndex` are not exactly unique to hyperlinks (`<a>` tags);
+this applies to any nestable element with a different [semantic meaning (HTML)](https://developer.mozilla.org/en-US/docs/Glossary/semantics).
 Minor criticism aside, there is an opportunity here to match with the parent and find out if and where the child should be placed.
-Here's how:
+You can see how this is done in a very primitive way above;
+here is how it would work in more complex cases:

 If we get the position of every occurrence of the object replacement character from the parent,
 and check each child to see if its `StartIndex` matches the position of the object replacement character,
 then anytime it matches, that is where the child belongs.
+Then we replace the object replacement character in-place with the text of that element, or sometimes just the role of an element;
+for example, something may be spoken like this (\* marks an audio cue telling the user that the enclosed words are screenreader information, not page text):
+
+> "...Einstein's theory of relativity, \*unvisited link\*, shows us that there is more to time than just "seconds": \*table\* in the above table, we can see how time dilation may be caused by high speeds."
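
Here is a minimal sketch of that matching step in plain Rust. `Child` is a hypothetical struct standing in for what you would fetch over DBus (the child's `StartIndex` from Hyperlink, plus its text and role), and offsets are assumed to be character offsets. Any U+FFFC that no child claims is treated as a character that really was in the text, and the `speak_child` closure is where the screenreader picks what to say:

```rust
/// U+FFFC OBJECT REPLACEMENT CHARACTER.
const ORC: char = '\u{FFFC}';

/// Hypothetical summary of one embedded child. Over DBus, `start_index` would
/// come from the child's Hyperlink `StartIndex`; `text` and `role` from
/// further calls on the child.
struct Child {
    start_index: usize, // character offset of the child inside its parent
    text: String,
    role: String,
}

/// Replace every U+FFFC whose offset matches a child's `start_index` with
/// whatever `speak_child` returns. A U+FFFC that no child claims is assumed
/// to be a character that really was in the text, and is kept.
fn expand_children(parent: &str, children: &[Child], speak_child: impl Fn(&Child) -> String) -> String {
    let mut out = String::new();
    for (offset, c) in parent.chars().enumerate() {
        if c == ORC {
            if let Some(child) = children.iter().find(|ch| ch.start_index == offset) {
                out.push_str(&speak_child(child));
                continue;
            }
        }
        out.push(c);
    }
    out
}

fn main() {
    // Hypothetical parent text and child, modelled on the list item above.
    let parent = "Strong adherence to the \u{FFFC} of software design.";
    let children = vec![Child {
        start_index: 24,
        text: "UNIX principles".to_string(),
        role: "link".to_string(),
    }];

    // Speak the child's text followed by its role.
    let full = expand_children(parent, &children, |ch| format!("{}, *{}*", ch.text, ch.role));
    println!("{full}"); // Strong adherence to the UNIX principles, *link* of software design.

    // Or only announce the role, skipping the child's text entirely.
    let terse = expand_children(parent, &children, |ch| format!("*{}*", ch.role));
    println!("{terse}"); // Strong adherence to the *link* of software design.
}
```

A linear `find` per placeholder is fine for a paragraph with a handful of children; with many children, a `HashMap` keyed by `start_index` would avoid rescanning the list.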

-There is another use for this that I would like to point out.
-I think this is a reasonable case for seeing it pulled into its own interface, or joining accessible.
-That is: structural navigation.
+Obviously, this is not a great example;
+why would anybody put a table within a paragraph?
+I'm not sure, but it illustrates the point I'm making: that the screen reader will have a very controlled ability to decide what is said through these AT-SPI methods.
+
+There is another use for grabbing the cursor index of children that I would like to point out.
+I think this is a reasonable case for seeing it pulled into its own interface:
+structural navigation.

 ## Structural Navigation

@@ -212,16 +261,16 @@ the ability to jump through the document by specific tags and attributes.
 It's not sophisticated;
 depth first search forward or backward looking for the closest heading, link, button, table, etc.
 This is so ingrained in screenreader users that when a page finishes loading,
-it is customary for the screenreader to announce (speak out lout to the user) the number of tables, headings, and visited and unvisited links that are on the page.
+it is customary for the screenreader to announce (speak out loud to the user) the number of tables, headings, visited links and unvisited links that are on the page in front of them.
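
Here is a sketch of that search over a toy, in-memory tree. `Node` and the role strings are hypothetical simplifications; a real screenreader would walk AT-SPI objects over DBus and use their roles and states (whether a link is visited, for instance, is a state rather than a separate role). A pre-order depth first traversal gives document order, and "next/previous heading" is then a scan forward or backward from the current position:

```rust
/// A node in a hypothetical, simplified accessibility tree.
struct Node {
    role: &'static str,
    name: &'static str,
    children: Vec<Node>,
}

fn leaf(role: &'static str, name: &'static str) -> Node {
    Node { role, name, children: Vec::new() }
}

/// A small made-up page: two headings, a paragraph containing a link, a table.
fn sample_page() -> Node {
    Node {
        role: "document",
        name: "page",
        children: vec![
            leaf("heading", "Introduction"),
            Node { role: "paragraph", name: "", children: vec![leaf("link", "UNIX principles")] },
            leaf("heading", "Time dilation"),
            leaf("table", "Speed vs. time"),
        ],
    }
}

/// Pre-order depth first traversal: parents before children, i.e. reading order.
fn flatten<'a>(node: &'a Node, out: &mut Vec<&'a Node>) {
    out.push(node);
    for child in &node.children {
        flatten(child, out);
    }
}

/// Index of the closest node after `from` with this role.
fn next_with_role(order: &[&Node], from: usize, role: &str) -> Option<usize> {
    (from + 1..order.len()).find(|&i| order[i].role == role)
}

/// Index of the closest node before `from` with this role.
fn prev_with_role(order: &[&Node], from: usize, role: &str) -> Option<usize> {
    (0..from).rev().find(|&i| order[i].role == role)
}

/// Number of nodes with this role, for the "page loaded" summary.
fn count_role(order: &[&Node], role: &str) -> usize {
    order.iter().filter(|n| n.role == role).count()
}

fn main() {
    let page = sample_page();
    let mut order = Vec::new();
    flatten(&page, &mut order);
    // order: 0 document, 1 heading, 2 paragraph, 3 link, 4 heading, 5 table

    let here = 3; // pretend the caret is on the link
    let next = next_with_role(&order, here, "heading").map(|i| order[i].name);
    let prev = prev_with_role(&order, here, "heading").map(|i| order[i].name);
    println!("next heading: {next:?}, previous heading: {prev:?}"); // Some("Time dilation"), Some("Introduction")
    println!(
        "{} headings, {} tables, {} links",
        count_role(&order, "heading"),
        count_role(&order, "table"),
        count_role(&order, "link")
    );
}
```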

 If I want to look for the next heading in an HTML document, however,
 I cannot start by just checking all children, because it is fairly common to have various tags embedded in your current tag.
 I need to know which children are after and which are before my caret.
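
Suppose each child reports its offset inside the parent (the Hyperlink `StartIndex` mentioned earlier) and the caret is a character offset in the same text. Then splitting children into "before" and "after" is a simple partition. `ChildPos` is a hypothetical type for illustration:

```rust
/// Hypothetical view of a child from its parent: its role and its character
/// offset inside the parent's text (the Hyperlink `StartIndex`).
#[derive(Debug, Clone, Copy, PartialEq)]
struct ChildPos {
    role: &'static str,
    start_index: usize,
}

/// Split children into (at or before the caret, strictly after the caret).
fn split_at_caret(children: &[ChildPos], caret: usize) -> (Vec<ChildPos>, Vec<ChildPos>) {
    children.iter().partition(|c| c.start_index <= caret)
}

/// Closest heading strictly after the caret, if this parent contains one.
/// Children are not assumed to be sorted, so we take the smallest offset.
fn next_heading(children: &[ChildPos], caret: usize) -> Option<ChildPos> {
    let (_, after) = split_at_caret(children, caret);
    after
        .into_iter()
        .filter(|c| c.role == "heading")
        .min_by_key(|c| c.start_index)
}

fn main() {
    let children = [
        ChildPos { role: "link", start_index: 5 },
        ChildPos { role: "heading", start_index: 40 },
        ChildPos { role: "heading", start_index: 12 },
    ];
    let caret = 12;
    let (before, after) = split_at_caret(&children, caret);
    println!("at or before caret: {before:?}");
    println!("after caret: {after:?}");
    println!("next heading: {:?}", next_heading(&children, caret));
}
```

Treating a child that starts exactly at the caret as "already reached" means pressing "next heading" while sitting on a heading moves on to the following one instead of staying put.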

-## The Carrot 🥕
+## The Caret 🥕🐇

 The caret is the same as your cursor in an input box.
-Type right here and watch as your cursor (aka caret) moves with your typing: <input type="text" placeholder="type here">
+Type right here and watch as your cursor (aka caret) moves with your typing; move it left and right with your arrow keys: <input type="text" placeholder="type here">

 The caret, or cursor, is something that most people are only used to seeing in the context of *editable* text,
 but screenreader users enable a special mode in their browser (usually activated with F7) called "caret browsing".
@@ -253,7 +302,9 @@ It doesn't work, at all. A key press which is sent to an inaccessible applicatio
 This is a serious problem that I don't think *should* exist at all.
 Perhaps there is some mechanism I am missing as to how to interrupt these keys before they pass all the way to an application and then just hope the GUI is accessible;
 supposing that this is not the case, we need a system to interrupt the keys before they are sent all the way down the stack, then sent to the screenreader.
-This is needed for two reasons: 1) performance; it doesn't make sense to send keys that far down the stack, just to hope the application implements accessibility correctly; we should be able to interrupt key presses *before* it gets to the application 2) control; it is best to be able to control things regardless of if an application is running or not. Under a system where an application must be accessible to send us keystrokes, a non-responsive application will not send us keystrokes either.
+This is needed for two reasons:
+1) performance: it doesn't make sense to send keys that far down the stack just to hope the application implements accessibility correctly; we should be able to intercept key presses *before* they get to the application.
+2) control: it is best to be able to control things regardless of whether an application is running or not. Under a system where an application must be accessible to send us keystrokes, a non-responsive application will not send us keystrokes either.
 To have full control and maximum performance, we need to interrupt the keys at their source.

 ### rdev
@@ -262,20 +313,11 @@ To have full control and maximum performance, we need to interrupt the keys at t
 It allows us to consume events if we do not want to also do the default action; for example, in "Browse Mode" a screenreader user will use the letter h to jump between headings within a page;
 normally this would type the letter h, so to stop this from happening we can consume (or "eat") the event so that it isn't sent any further at all.
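
The decision logic can be sketched without any input library. As far as I know, rdev's `grab` function (behind its `unstable_grab` feature) takes a callback that returns `Option<Event>`, where `None` swallows the event. The hypothetical `filter_key` below mirrors that shape with made-up `Key` and `Mode` types so it runs on its own:

```rust
/// Hypothetical, simplified key type (a real one would come from rdev).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Key {
    Char(char),
    Other,
}

/// Browse mode turns single letters into navigation commands;
/// focus mode hands every key to the application (e.g. typing in a text box).
#[derive(Debug, Clone, Copy, PartialEq)]
enum Mode {
    Browse,
    Focus,
}

/// Commands the screenreader runs for keys it eats.
#[derive(Debug, PartialEq)]
enum Command {
    NextHeading,
}

/// Same shape as a grab callback: `None` consumes the key so the application
/// never sees it; `Some(key)` passes it through untouched.
fn filter_key(mode: Mode, key: Key, commands: &mut Vec<Command>) -> Option<Key> {
    match (mode, key) {
        (Mode::Browse, Key::Char('h')) => {
            commands.push(Command::NextHeading);
            None // eaten: no "h" is typed
        }
        _ => Some(key),
    }
}

fn main() {
    let mut commands = Vec::new();
    for key in [Key::Char('h'), Key::Char('i'), Key::Other] {
        println!("browse {:?} -> {:?}", key, filter_key(Mode::Browse, key, &mut commands));
    }
    println!("focus {:?} -> {:?}", Key::Char('h'), filter_key(Mode::Focus, Key::Char('h'), &mut commands));
    println!("commands: {commands:?}");
}
```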

-INSERT CONNECTION PARAGRAPH
+With all I've covered here so far, let's see if I can wrap it up.

 ## Pulling It All Together

-Now that we have the basics of DBus, AT-SPI, caret browsing and structural navigation, let's put it all together in a final program which can actually accomplish something:
-
-```rust
-// use DBus to get the bus address of the accessibility (a11y) bus.
-// connect to the accessibility bus and ask to receive focus change events
-  // speak the text of the current element, chopping off by line breaks, and including link information
-
-// use odilia-input to get keystrokes
-```
-
-ALMOST DONE
+All this information (which I gained mostly from asking [TheFakeVIP](https://thefakevip.xyz/) questions) has been pooled together in a new screenreader project named [Odilia](https://odilia.app/).
+Most of the core work has been done by others, but I occasionally contribute to it as well, and I want blind individuals to have access to a blazingly fast screen-reading experience on Linux.

-ADD CONCLUSION
+Happy a11y hacking!