Speech recognition for non-email purposes has been a mixed bag for me.
I use it, sure, and I’ve done what I could to train my Dragon, but the accuracy is still not good enough despite the claims of Nuance, the maker of Dragon NaturallySpeaking 13 Premium.
To the rescue, let’s hope, comes Dictate Your Book: How to Write Your Book Faster, Better and Smarter (59 pages and $3 in the Kindle edition and free as a Kindle Unlimited loaner).
For more detailed guidance for dictators—um, not the Putin type, please—a good choice might be The Productive Author’s Guide to Dictation: Speak Your Way to Higher (and Healthier!) Word Counts (The Productive Author’s Guide to Writing Book 1) by Cindy Grigg. It’s 252 pages long and costs $5 for the Kindle and $10 in paperback.
But here I’ll focus on Dictate Your Book. Author Monica Leonelle (personal Web site here) says she has achieved 98 percent accuracy and has doubled her peak writing speed from 2,000 to 4,000 words an hour. I’ve been using headsets, but if you’re at your desk, Leonelle recommends a podcast-quality microphone such as the Audio-Technica AT2020 or the Blue Yeti. They are condenser mikes. For the very best results, she says, buy a dynamic microphone like the RODE Podcaster.
The latter is too pricey for me, at about $230, so I’m going with the Blue Microphones Yeti USB Microphone – Blackout Edition, which Amazon is now selling for $113, or around $16 cheaper than the silver model [update: both are now $129]. It’s Nuance-certified, and the company awarded it a score of six Dragons for accuracy, better than any other.
A related item on the Nuance site offers this advice for Yeti owners: “The gain knob on the device should be turned all the way up and the cardioid pattern (3rd from the left on the switch) should be selected.”
The Amazon site places the silver model within the Multipurpose Dynamic Microphones category, but the Blue Yeti site suggests that it’s in fact a condenser model. You may not favor the USB option if you want to use the Yeti in the field—which, however, could be a problem because of the mike’s size (ten inches high). In fact, Leonelle is partial to XLR connections, which you can convert to USB.
I’m also going with a $7 windscreen from Tetra-Teknica, which will protect my Blue Yeti against dust and moisture and also soften P, T and B sounds.
But what if you want to dictate to your iPad or Android device rather than using your desktop? Dragon Anywhere is at the top of Leonelle’s list. It comes in paid ($15 a month or $150 a year) and free versions.
Among many other tips, Leonelle recommends having your keyboard handy so you can make corrections faster than with the dictation software.
I’ll see how that works out. My Unicomp keyboard is on the clicky side, meaning that I’ll probably have to delete the resultant noise-related garbage if I don’t switch off the microphone.
Why dictation in the first place?
Of course, there is the issue of whether you should be dictating in the first place. For email on my iPad it’s a natural since it’s so easy to make corrections and since I don’t need the full power of Word (for most stuff) or Fade In (my script writing software). But for longer documents? Well, Leonelle makes a good case for dictation as a way to reduce the strain on your wrists, and I’ll go along with that.
Accuracy issues notwithstanding, Dragon was a lifesaver for me when my upper arm started aching. For a week or two, I worked mainly with dictation until my arm returned to normal. I later abandoned my swivel chair for a recliner with padded arms to rest my elbows on, and I haven’t had troubles since then, but I wouldn’t mind some redundancy in case the pain returns. If your aches are increasing with age, then Dragon might increase your career’s longevity.
Still other benefits of dictation are the abilities to multitask in certain ways (for example, walk around or perform household chores if you’re using a Bluetooth headset) and avoid Internet distractions (you’ll focus on your writing and on the dictation software—minus competition from, say, Net radio without earbuds).
The pro-dictation case makes up much of Leonelle’s book. I’d have welcomed yet more tips, but this is still a good value for $3 despite the low page count. I’ve emphasized the hardware-related tips since I suspect they’ll be the freshest and most useful ones to most TeleRead community members.
Another dictation book out there
For another perspective, I’ve also bought Dictation: Dictate Your Writing: Write over 1,000,000 words a year (44 pages and $3 in the Kindle edition and free as a Kindle Unlimited loaner and $7 as a paperback), by Kevin Gise.
“Personally,” Gise writes, “I use a SpeechWare USB 3-1 Table Mike [link added]. It was a little pricey at close to $300 but it comes with noise canceling technology and wide band audio. The microphone is made specifically for voice recognition and is extremely accurate from more than a foot away.” If time allows, I may pass on a few tips from Dictation. That said, I found it curious that Nuance apparently regarded the Table Mike as one Dragon less accurate than the Yeti selling for a fraction of the price. Was Gise referring to a different model?
Of course an inevitable question arises. Do you really want to write “over a million words a year”? Dictation can speed up writing—it does not speed up thinking.
That said, the Leonelle book itself strikes me as rather thoughtful (and perhaps I’ll find the Gise book the same way once I’ve read it, despite his possibly being wrong about the accuracy of SpeechWare).
I suspect that if you added up her drafts, they would be much longer than the 59 pages of the final version.
Your own tips?
Once again, the TeleRead comments area beckons. What are your own tips and other thoughts on dictation, especially for books?
Related: Author Level Up’s informative YouTube interview with Leonelle on selection of dictation gear. Also see Joanna Penn’s interview with her.
My interest in this comes from a different angle, creating text from recorded speech in order to produce transcripts and soft subtitle tracks in video files. Recorded speech can offer more control over the confounding variables of extemporaneous speech and microphone quality, mic settings, etc.
Back in 2013, I devised a method for using enhanced dictation in macOS 10.9 as part of a speech-to-text engine. See: http://frank-lowney.blogspot.com/2013/11/dictation-audio-file-to-text-how-to.html
I really should go through this and update it because much has changed since 2013. With all of those people trying to talk to Siri, Apple may have learned a thing or two about comprehending human speech without training the listening system. As well, Audio HiJack Pro, the app I used to fool macOS into treating recorded speech as if it were extemporaneous, has been upgraded at least once since then.
The recording I used in this experiment was of Sam Waterston reading Lincoln’s Gettysburg Address so you’d think that a trained voice would be much more intelligible than the average speaker.
My results then were a bit underwhelming despite having controlled for some of the most troublesome variables. It certainly amplified my respect for human listeners who do so much better – at least for now.
@Frank: Congrats on all your good work, and keep us posted of your progress. Sooner or later the technology will be good enough, if it isn’t already. I have a selfish reason for knowing – it’s easy to imagine the benefits for journalism and other writers.
Dictation software doesn’t like me. I speak a sentence or two, and it’s riddled with so many errors that the result isn’t worth the bother of correcting.
That’s probably because I’ve lived so many places—from Alaska to Florida and even the better part of a year in Israel—that my accent is a mess.
Those who want to get into this seriously might want to devote as much attention to their room acoustics as to the mike. Here’s a video on creating sound absorption panels from thrift-store towels.
There’s something to that. My TV was almost impossible to understand until I realized why. It was in a corner between two highly reflective walls. Hanging towels on both walls deadened the reflections and made listening easier.
My more serious problem is hearing rather than speaking. My hearing drops off like a cliff after 8.2 kHz. I have trouble understanding higher pitched women’s voices in podcasts. There’s a host of people with other issues. Here is the free iOS app I use to measure my hearing.
In most of those cases, speech processing could help. In my case, sound from 200 Hz to 10 kHz could be compressed into 200 Hz to 6 kHz, much as HDR (High Dynamic Range) pictures compress the range of light in photos. Voices would still be higher or lower pitched, just over a smaller range. That’d work with music too.
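To make the arithmetic of that idea concrete, here is a minimal sketch in Python. It only maps frequency values from the wider band onto the narrower one with a straight-line rule; it does not process any audio, and the function name, the default band edges and the choice of a linear mapping are all assumptions made for illustration, not a description of how any operating system or hearing device actually does it.

```python
def compress_frequency(f_hz, src=(200.0, 10_000.0), dst=(200.0, 6_000.0)):
    """Map a frequency in the source band onto the narrower target band.

    Hypothetical helper: a straight-line mapping, so a higher input
    frequency always yields a higher output frequency, just within a
    smaller range.
    """
    src_lo, src_hi = src
    dst_lo, dst_hi = dst
    if not src_lo <= f_hz <= src_hi:
        raise ValueError(f"{f_hz} Hz is outside the source band {src}")
    # Fraction of the way through the source band, rescaled to the target band.
    return dst_lo + (f_hz - src_lo) * (dst_hi - dst_lo) / (src_hi - src_lo)


if __name__ == "__main__":
    for f in (200, 1_000, 5_100, 8_200, 10_000):
        print(f"{f} Hz -> {compress_frequency(f):.0f} Hz")
```

With these assumed band edges, the bottom of the range stays where it is, the top moves from 10 kHz down to 6 kHz, and everything in between keeps its relative order.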
It’d be great if Android and iOS added that feature to their operating systems. Then it could be applied to everything from phone calls to audiobooks. I’ve already requested that Apple Accessibility add the feature to iOS at:
You might want to second my request with Apple. If you use Android, you might request the same from them. Adding it would offer hearing assistance to millions of people around the world at no additional cost.
@Mike: Intriguing ideas. May Apple pay attention. As with all-text bold, there’s clearly a need!
I was able to replicate my experiment using the latest versions of the Rogue Amoeba software (Audio HiJack Pro) and macOS (10.11.6). It still works but isn’t really useful since there’s no punctuation as you might expect from a human stenographer (e.g. court reporter). Just a long run-on sentence. Dictation software requires the speaker to supply the punctuation via commands such as these: https://support.apple.com/en-us/HT202584
Perhaps some future cortical implant will finally free us from QWERTY and other keyboards.
Dragon Dictation just announced a 24% improvement in speech recognition, see: http://www.pcworld.com/article/3107943/software/nuance-taps-into-deep-learning-to-improve-dragon-speech-recognition-by-24-percent.html
Re my own modest efforts to adapt macOS STT technology to recorded speech, I was able to identify and work around a few of the issues that had stymied me earlier. Here’s a screencast showing what I was able to do by adding Audio HiJack Pro to the equation: https://youtu.be/V8PzB3gTHHY
Still not perfect but so much better than starting from scratch to build a transcript.
@Frank: Continued best wishes for your experiment. I may or may not buy the latest upgrade right now – it’s pretty expensive.