Task 3: Voice-to-Text

writing is . . .

“language made material” (Haas, 1996, p. 3)
“time made spatial” (Haas, 1996, p. 12)
“the marvelous technology that allows the past to speak directly to the future” (Gnanadesikan, 2008, p. 1)
“a technology of the intellect” (Schmandt-Besserat & Erard, 2007, p. 20)
“a psychological crutch” (Haas, 1996, p. 7)
“a dangerous, shadowy illusion of wisdom” (Plato… – or Haas, 1996, p. 7)
“residue” (Ong, 2002, p. 11)
like “an external storage device” (Gnanadesikan, 2008, p. 3)

. . . none of this captures what a typical voice-to-text program creates from our speech.

This week we were challenged to speak an unscripted, 5-minute story into a voice-to-text app. I have used this kind of technology before, in small doses, when speaking to Siri or texting. I was curious about the accuracy of other apps and longer stretches of speech, so I decided to use two programs at the same time to record and transcribe my spoken words to typed text. I realized that it might be a fruitless endeavour; however, I thought it might also yield some interesting results for comparison. Below you can find the two different transcribed versions of my oral narrative (spoken only once). The yellow version was through speechnotes.co on my laptop and the blue version was through Transcribe, an iOS app I downloaded onto my iPad. While telling my story I attempted to imagine an audience rather than the apps and opted not to use the voice punctuation commands. Interestingly, when comparing the two texts below, one is pretty much a long run-on sentence with no punctuation to guide the reader. The other is, in stark contrast, a string of sentence fragments, still lacking commas to guide the reader. Both are challenging to read and would be great to use as an exercise for punctuation practice in my grade 10 classes, as they would highlight, even more than the typical comma jokes do, the importance of punctuation in written English for showing the reader how to read the text.

A poster from my classroom to emphasize the importance of commas.

It was interesting to compare the two versions of the anecdote. Where one got it wrong, the other got it right. Below you can see the highlighted areas of what the AI didn’t get “right”. I’ll admit that, if I were to seriously consider using voice-to-text as a productivity tool, I would choose speechnotes.co for its accuracy. While the lack of punctuation makes the created text very difficult to read, the program recognized more of my phrasing and made fewer errors. I’ll just have to speak the punctuation, focusing on writing aloud, rather than orating.

I orated the anecdote below, unscripted, once and recorded it using two different apps on two different devices:

This is the resulting text using the app **Transcribe** on my iPad.

This is the resulting text using **Speechnotes.co** on my laptop.

The highlighted text is where the AI deviated from what was actually said. Click on the image to see a larger version of the text.

“Oral cultures indeed produce powerful and beautiful verbal performances of high artistic and human worth, which are no longer even possible once writing has taken possession of the psyche.”
(Ong, 2002, p. 14)

This task was challenging for me. As a lover-of-writing with a poor memory I find off-the-cuff oral story-telling daunting. This exercise of seeing my oral narrative preserved as written text highlights the skills I feel I need to develop to enhance my unscripted stories so they can match what I can weave with pen, pencil or keyboard. I know that this is because I need think time, something that I work hard to afford my students so that they aren’t expected to speak on the spot without time to organize their thoughts. But, oh, to sound like Stuart McLean, Mindy Kaling, or Ian Ross (Joe from Winnipeg) when telling a story – yes, I know theirs are scripted, and it’s their oration that I love, but I still want my off-the-cuff narratives to sound like written text, carefully crafted and deliberate. I believe this is what Ong is referring to, in his 1972 lecture on Communication, Technology and Thought, when he expresses that writing changed the way we talk through the orderly organization of thought that became possible. I certainly felt better when I heard Ong (1972) mention that “no oral culture remembers anything verbatim” in his lecture. Gnanadesikan (2008) extends this idea, noting that “spoken words… are inherently ephemeral. So written language seems more real to us than spoken language” (p. 4). This is a literate perspective, certainly, and contrasts with Plato’s stance that “writing strips language of its dynamic qualities” (Haas, 1996, p. 12).

I had to work hard to keep myself from scripting my story. Upon reflection, and in relation to this week’s readings, I find this natural tendency to want to write an outline and jot ideas down all the more interesting. Our lost art of oration has changed the way we express ideas and remember. But there still exist people who can tell great stories, recalling and organizing elements so that their off-the-cuff narratives are every bit as engaging and artful as an experience with a scripted text. Is it genetic? I’ve often wondered about this skill, because I find it challenging to just speak a story, and make it good. Maybe Plato is right. Maybe it really is just about practice and repetition.

“Writing, commitment of the word to space, enlarges the potentiality of language… [and] restructures thought.”
(Ong, 2002, p. 7)

Ong (2002) makes an interesting point, that oral expression can exist (and has) without writing, but writing cannot exist without orality because “written texts all have to be related… to the world of sound, the natural habitat of language, to yield their meanings. ‘Reading’ a text means converting it to sound, aloud or in the imagination” (p. 8), as I am mentally doing while I write these words. It is interesting that written narratives, with time to craft, arrange and script, take shape so differently than off-the-cuff speech does. This is due to the pausing, organizing, and cognitive processes involved in the process of writing versus the process of speaking. If I return to my own experience (the anecdote above), I notice that my diction shifts in my writing, even when journaling or free-writing, in that my vocabulary retrieval process is quite different than if I’m speaking. This is because, while telling a story, especially with a real audience, there is no time to think; it is momentary, immediate and fleeting. The words only remain in the audience’s mind – but they live in our minds differently than written text. Maybe this is why Ong (2002) notes that “texts have clamored for attention so peremptorily that oral creations have tended to be regarded generally as variants of written productions or, if not this, as beneath serious scholarly attention” (p. 8).

There is agreement among linguists, historians, psychologists, and philosophers that writing has changed the way we think and learn. Haas (1996) argues that the material nature of writing allows us to reflect and establishes the notion of history, and Ong (2002) emphasizes that inquiry is not possible without permanent text, since primary oral cultures learn by doing through “apprenticeship… discipleship… listening… repeating… assimilating… [and] participating… not by study” (p. 8). This isn’t to say that literate cultures do not learn in these ways. Rather, it emphasizes the impact of this technology on the way we think – our ability to examine, pause, re-examine, review, compare, interpret, organize, and so on, when we analyze written text and the ideas, history, style, phrasing, diction, mood, tone, impact, and meaning that it holds, could hold, or needs to have revised. This leads to “writing’s relationship to knowledge, to truth, and to power” (Haas, 1996, p. 4).

Even rhetoric, the art of speech, is a product of writing. I think about how different my anecdote above would be if I had planned it or scripted it. Even having some time to think the story through, or practice it by saying it out loud a few times before recording the transcription, would have elevated the final result. In this way, both Ong (2002) and I disagree with Plato’s assertion that writing is the antecedent of oration because “writing from the beginning did not reduce orality but enhanced it” (p. 9). As I explore this duality of orality and writing and reflect on my own poor skills in the story-telling business, I find myself thinking of Plato’s comment that writing “fosters forgetfulness” (Haas, 1996, p. 7). Has my focus on writing negatively impacted my ability to think quickly and remember? Or am I just a product of my society, having generationally lost the ability to remember without the aid of writing? This chicken-and-egg rumination leads me to think about oral art forms such as MCs in the early rap battles of the 1970s, improv actors, and those who have practiced and honed the art of oral story-telling.

Oh, to be a true bard.

What voice-to-text apps fail to preserve and translate is the real-time inflection, pausing, emphasis, humour, tone, chuckle, body language, and facial expressions that accompany oral speech. These all add to the story-telling (and hearing) experience. When writing, we use punctuation and diction to establish tone and mood. I think my favourite error is the line: “So my sister is a diesel dwarf” (which is meant to be: “So my sister lived in Dusseldorf”). It seems that the Transcribe App had difficulty with the German city name, because it didn’t get it right once: “diesel door” and “Dusseldorp”. Speechnotes.co, on the other hand, had an easier time recognizing and creating accurate text for Dusseldorf. Autocorrect, in typing, texting, and in speech-to-text AI, often gets it wrong, and often to a frustrating degree that drastically changes meaning. Take the line: “and had me sit in that like a science base where we are we’re not supposed to said” (bottom of Transcribe version), which sounds like gibberish, especially since I was referring to the fact that the conductor smuggled me onto a train and stowed me away in a compartment meant for workers, not passengers. The program recorded English words, yes, but they didn’t flow together in a sentence that held context or meaning. This is because, as Gnanadesikan (2008) notes, writing “records language, but not actual speech” (p. 9). In this way these voice-to-text technologies are attempting to equate two very different mediums of communication. Not only is the AI weak in its ability to decipher words through accent, pronunciation, and enunciation, but we also cannot capture the essence of oration in written text if it is not written in the first place.

Even then, a scripted speech spoken aloud has much more power than one read quietly to oneself, because “speech is totalizing” as the listener is in the moment “enveloped” by it (Haas, 1996, p. 9). This is why, when we study the works of Shakespeare (or any playwright for that matter) in classrooms, we need to engage with the text aloud, through video, recordings, reader’s theatre, and so on, in order to fully experience the text as it is intended to be experienced. We wouldn’t quietly read a podcast to ourselves and enjoy it to the same degree as listening to it, would we?

While Gnanadesikan (2008) whimsically equates the written word to time travelling, Ong (2002) reminds us that “writing tyrannically locks [words] into a visual field forever” (p. 11), which is made even more dangerous by the permanency and public nature of the internet. This takes what would traditionally have been an oral conversation, or thoughts aloud, and permanently etches it in a forum, message system, or social media platform. Even our oral speech is becoming permanent, as evident in the odd versions of my story above, as well as in audio and video recordings such as we see on TikTok, YouTube, and podcasts. Haas (1996) made an interesting point when assessing the amount of written materials available, heightened by the advent of the internet, meaning that “no one individual can ever participate fully in the total cultural tradition” (p. 12). And this is just going to continue to grow with the participatory nature of media, the digitization of text and orality, and the development of emerging technologies. And, as Haas reminds us, technologies are infused with the value systems of those who develop and use them.

“Writing, because it is artificial, alienates us from the natural world and therefore heightens our humanity.”
(Haas, 1996, p. 9)

References

Gnanadesikan, A. E. (2008). “The First IT Revolution.” The writing revolution: Cuneiform to the internet, 25, 1-10. http://doi.org/10.1002/9781444304671

Haas, C. (1996). “The Technology Question.” Writing technology: Studies on the materiality of literacy, 3-23. https://doi.org/10.4324/9780203811238

Kaling, M. (2016, Aug 29). Why Not Me by Mindy Kaling [audiobook excerpt]. Libro.fm, YouTube. Retrieved on September 24, 2021, from https://youtu.be/9bl6aYaAT0I

McLean, S. (2012, Jun 20). Stuart McLean – The Vinyl Cafe Storyteller. prospeakerscanada, YouTube. Retrieved on September 24, 2021, from https://youtu.be/zww8E1mXBAo

Ong, W. J. (1972). Walter Ong on Communication, Technology, and Thought [Lecture Audio Recording]. YouTube. Retrieved on September 21, 2021, from https://youtu.be/t2Z7ezRpz1c.

Ong, W. J. (2002). Orality and literacy: 30th anniversary edition (2nd Ed). https://doi.org/10.4324/9780203426258 

Plato. (2008). Plato’s Phaedrus (B. Jowett trans.) [eBook edition]. Project Gutenberg. https://www.gutenberg.org/files/1636/1636-h/1636-h.htm (original work written 360 B.C.E.)

Ross, I. (2016, Aug 23). Joe from Winnipeg: It’s good to be back. RiffRaft, YouTube. Retrieved on September 24, 2021, from https://www.youtube.com/watch?v=holTTQRhZb.

Schmandt-Besserat, D. and Erard, M. (2007). “Origins and Forms of Writing.” (C. Bazerman, Ed.). Handbook of research on writing: History, society, school, individual, text, 7-26. https://doi.org/10.4324/9781410616470

3 thoughts on “Task 3: Voice-to-Text”

annekenussbaum says:

September 27, 2021 at 9:52 am

So, I went to the Royal BC Museum (https://royalbcmuseum.bc.ca/collections/human-history) on Sunday, Sept 26th, and was enveloped by the oral languages of the Indigenous peoples from all over BC. We walked into the Human History Exhibit hearing voices. There were towers with common phrases written in both English and different Indigenous languages with buttons for us to hear them. My child and I spent quite a bit of time listening to all of the phrases and voices. Interestingly, the written version (in the Roman alphabet) that accompanied each of the oral phrases didn’t capture the essence of what we were hearing. Nor could I have properly pronounced the sounds without hearing it. There is a duality between oral and written language – oral first, with the power to carry history and stories. We watched a short video afterward that highlighted a number of people who are working to keep their mother tongues alive. One Cree speaking man, whose name I didn’t catch, said “I speak my language because it keeps me connected to my culture, to my history. It is like a cord directly from my heart to my ancestors.” While scanning online resources, I stumbled upon First Voices, an online space to “promote Indigenous oral culture and revitalize the linguistic history of their people” (https://www.firstvoices.com/content/get-started). This made me think about Haas’s (1996) comment about the notion of history being attached to writing. I’ve been pondering this statement, mulling it around, and I do not agree that a culture needs to be literate in order to “develop a sense of history” (Haas, p. 11). Yes, writing enables us to record and refer back to events, creating a history for a culture. However, this is also the case in oral cultures, the history just lives in people and objects and family. Take the masks of the Coast Salish people. I have been fortunate enough to have experienced two different stories from two different families. These oral theatrics bring the narrative alive. 
The story is practiced, honoured, and shared as a gift. This is similar to the ancient bards who recited the Iliad or the Odyssey, epic poems that captured elements of history. I think we need to beware of creating a hierarchy of writing and orality and remember the duality between these two aspects of language.


graemeb93gmailcom says:

December 6, 2021 at 8:18 pm

Hi Anneke. I’m so impressed by the detail and consideration you put into this post. I’m at the 11th hour completing my linking assignment. I find myself thinking, “why didn’t I do that too?” when it comes to your integration of visuals, quotations and text. I especially noticed how you start with something specific that serves as a launchpad for discussion, as opposed to starting with the general and moving toward the specific. Thought-provoking for me, for sure!

