Thinking about Talkr

August 18, 2005 (updated August 20, 2005)

My friend Jean-Claude Guedon pointed me to an interesting new company, Talkr, a few days back. It’s a service that uses a text to speech engine to create automatic podcasts from RSS feeds. It grabs the text from an RSS feed and generates mp3s of the posts, then makes them available for you to download to an iPod or other device.
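For the curious, the "grab the text from an RSS feed" half of that pipeline can be sketched in a few lines. This is only my guess at the general shape, not Talkr's actual code: the sample feed, the function name `posts_as_speech_text`, and the regex-based tag stripping are all assumptions made for illustration, and the speech synthesis step itself is left out entirely.

```python
import re
import xml.etree.ElementTree as ET
from html import unescape

# Hypothetical two-item RSS 2.0 feed, used only to exercise the sketch.
SAMPLE_FEED = """<?xml version="1.0"?>
<rss version="2.0">
  <channel>
    <title>Example Blog</title>
    <item>
      <title>First post</title>
      <description>&lt;p&gt;Hello &lt;a href="http://example.com"&gt;world&lt;/a&gt; again.&lt;/p&gt;</description>
    </item>
    <item>
      <title>Second post</title>
      <description>Plain text &amp;amp; nothing else.</description>
    </item>
  </channel>
</rss>"""


def posts_as_speech_text(feed_xml):
    """Return one plain-text script per feed item, ready to hand to a TTS engine."""
    root = ET.fromstring(feed_xml)
    scripts = []
    for item in root.iter("item"):
        title = item.findtext("title", default="").strip()
        body = item.findtext("description", default="")
        # Crude tag stripping: link targets are dropped, so links vanish from the audio.
        body = re.sub(r"<[^>]+>", " ", body)
        # Decode any entities left in the text (for example &amp;).
        body = unescape(body)
        # Collapse the whitespace left behind by the removed tags.
        body = re.sub(r"\s+", " ", body).strip()
        scripts.append(f"{title}. {body}")
    return scripts


for script in posts_as_speech_text(SAMPLE_FEED):
    print(script)
```

Each resulting string would then be passed to whatever text to speech engine you prefer and saved as an mp3. A real service would presumably use a proper HTML parser and handle Atom feeds too.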

With a free membership, you have access to a certain number of free audio feeds, plus audio derived from three RSS feeds of your choice. The company is hoping you’ll upgrade to their basic or premium membership packages, which allow you to convert 20 hours of audio from 20 blogs, or 50 hours of audio from 50 blogs, for $4.95 or $8.95 a month.

I was curious to find out how this blog sounded in a text to speech format, so I started following the link “Want a Free Podcast of Your Blog?” This quickly led me to a scary looking partnership application which asked me to enter into a one-year independent contractor relationship with Talkr, where Talkr would generate an audio version of my blog, optionally include ads in my posts, and split ad and subscription revenue with me. In the grand scheme of things, it’s not all that onerous a contract (though it does a poor job of specifying who owns the derivative work that Talkr creates from my content)… but it’s pretty odd that a link from the front page of a site takes one straight into a legally binding agreement. Sorta like asking for a pre-nup agreement before the first kiss.

Rather than become a Talkr partner and receive $7 (payable net 30 days, but only when I’ve generated $100 in affiliate income) from anyone who follows links from this post and becomes a subscriber, I decided to take advantage of my free membership and generate audio feeds for this blog and Global Voices. It was a surprisingly simple experience – enter the URL of a blog with an RSS feed, wait while the engine grabs the RSS feed and creates the files, then use an autogenerated page to choose which posts to listen to.

The resulting files don’t seem to have any DRM attached to them – I’ve downloaded two and uploaded them here for your enjoyment, the audio version of my recent post on outsourcing, and a post on blogs blocked in China on Global Voices, though the URLs they’re posted under are password protected on the Talkr site. The engine they’re using is fast and impressively good… which is to say, it’s pretty painful in comparison to listening to actual human speech, but vastly better than what I would have expected. (Which just goes to show that it’s a while since I’ve played with text to speech. AT&T has a great online demo that allows you to try converting text to speech in four languages with thirteen possible voices. I recommend Charles, who speaks UK English…)

While I’m impressed, I can’t really imagine listening to blogs this way. It’s possible to listen and understand what the synthetic voice is saying, but it’s way more work than listening to an actual human voice, and I suspect I’d get pretty tired after a few minutes of awkward pauses and intonations. Furthermore, I blog using a lot of links, which aren’t apparent in an audio version of my posts. And sites like Global Voices, which have multiple authors, aren’t well supported – all posts are identified as being from Global Voices rather than from an individual author. (Then again, I don’t have a visual impairment, and I can afford a subscription to Audible to eat up the long hours between Lanesboro and Boston. I can certainly imagine situations in which this would become very appealing…)

I’ve got some open questions about the intellectual property issues surrounding Talkr. I’ve never consented to a relationship with Talkr where they can produce derivative works based on my content, though they’ve offered me such a relationship. But since all the content on this blog is available under a Creative Commons attribution license, they’re fully within their rights to create derivatives from my work, so long as they credit me. And they’re not tacking ads onto my content, though there’s a “bumper” that promotes Talkr, so they’re not directly generating income from my work. (Correction: as Talkr’s CEO points out in the comments section, they’ve not yet implemented these bumpers, and they’re only going to do so with partners who’ve agreed to a licensing deal, so the IP issue I was raising with the bumper is a non-issue.)

But I don’t get the sense that Talkr is checking for Creative Commons licenses before converting content (at least, nothing on the site indicates that they’re doing this), and I can imagine a situation where a blogger might be very pissed that Talkr is creating derivatives of her work without her authorization. (Imagine that you’re trying to make money from your blog. Readers who listen to your posts don’t hear the ads that line your website, and therefore you don’t get paid for delivering those ads.) Anyone know if Talkr is thinking through these issues and what they plan to do when bloggers complain about having their feeds translated into speech?

Bonus link: the Linux.com article about open-source text to speech engine Flite (Festival Lite) has an excellent collection of links to commercial and non-commercial TTS engines.

8 thoughts on “Thinking about Talkr”

Matthew Hurst August 18, 2005 at 10:22 pm

An interesting read. I actually thought I had come up with this idea a couple of weeks ago and blogged about it (including a TTS speech file, of course). I then found out that someone had been on to this waaaaay back: Botcast, October 2004.

Here are my posts:

http://datamining.typepad.com/data_mining/podcasting/index.html

As for quality, I really like Cepstral’s voices, which have the added advantage of being designed for a system that can run on a mobile device. The Cepstral people were also involved with Festival at Edinburgh way back in the day. The Edinburgh crowd also produced rhetoricalsystems.com, subsequently sold to scansoft.com – Rhetorical’s voices were certainly among the best on the planet at the time of sale.
Kate August 18, 2005 at 11:08 pm

Do I assume correctly that a synthetic voice doesn’t convey emotion? Seems to me, given the amount speech conveys in tone, this could result in unintended miscommunication… though maybe not enough to raise real concern.
Ethan August 19, 2005 at 9:03 am

Matt – Why am I not surprised that you know vastly more about TTS than I will ever know? :-) I’ll take a look at Cepstral should I ever decide to take this on seriously. I suspect if I were to do this, I’d probably do the encoding and voice file hosting on my own…

Kate – good question. The simple answer is “no” – while these speech systems are pretty slick, they don’t actually understand the text they’re speaking, which means they can’t add emotion to their audio files. But I was impressed that the engine Talkr is using was well-written to the point that it knew to raise vocal tone at the end of a question. For a few seconds, it fooled me into thinking the system was quite a bit smarter than I would have predicted…
Andy Carvin August 19, 2005 at 3:37 pm

Hi Ethan,

I was invited to beta-test Talkr around six months ago, and have been playing around with it occasionally since then. I’m interested in the tool primarily as an accessibility tool for the blind, as more and more blogs are being designed in a rather sophisticated way, making it harder for assistive technology to read the blogs properly. Having an RSS feed of an automated text-to-audio podcast might be of real use to the disabled community, not to mention people with limited literacies. I’m also wondering if this could eventually be used in combination with mobcasting to allow people to use their phones to listen to blogs as well as podcasts. So I’ll be really interested in reading that article about the open source text-to-speech engine you mentioned. -andy
Matthew Hurst August 19, 2005 at 9:57 pm

There is another paradigm of speech synthesis called concept to speech (CTS) in which the content is encoded in a semantic form rather than the surface form (text). This allows for far more control over the quality and things like emotion and stress. Imagine a blog editor in which you could annotate information to assist the speech synthesiser to better produce nuanced speech. Would you be prepared to spend an extra 1, 2, 5 mins per post to add this markup?
Chris Brooks August 19, 2005 at 10:03 pm

Hi Ethan,

Thank you for a thought-provoking post on Talkr. Please take the following replies with a grain of salt: as Talkr’s CEO, it is my job to express a bit of bias. Furthermore, I hope you’ll excuse my jumping around a bit to respond to several of your points.

>they’re not tacking ads onto my content, though
>there’s a “bumper” that promotes Talkr

We reserve the right to add a “bumper” that mentions Talkr if you become a Talkr Partner – but we don’t add that bumper unless you agree to the partnership. (In fact, we currently don’t add it at all.)

>In the grand scheme of things, it’s not all that
>onerous a contract

I would just add that the one-year term of the contract is severable with or without cause by giving 30 days notice.

>though it does a poor job of specifying who owns the
>derivative work that Talkr creates from my content

That’s a fair point. To be honest, this is not an area where I have a great deal of expertise. My gut reaction is that the original author should retain rights to the audio version of the content, as long as they attribute that particular audio version to Talkr. But I need to research that and bounce it off of a few people before encoding that in the Terms of Service.

>but it’s pretty odd that a link from the front page
>of a site takes one straight into a legally binding
>agreement. Sorta like asking for a pre-nup agreement
>before the first kiss.

Poor salesmanship on our part. I’ll try to get that cleaned up.

>The resulting files don’t seem to have any DRM
>attached to them

Correct.

>though the URLs they’re posted under are password
>protected on the Talkr site.

True — listeners have to sign up for a (free) membership to Talkr to access blogs that have not joined Talkr Partners. For blogs that do join, we provide code to paste into your blog template so that readers can listen directly from your blog.

>Anyone know if Talkr is thinking through these
>issues and what they plan to do when bloggers
>complain about having their feeds translated into
>speech?

First, we are happy to remove feeds from Talkr, should the author request it. We explain that policy here: http://www.talkr.com/faq/remove_my_blog.html.

Second, we think about this a bit differently: Talkr provides a technology platform that allows end users to convert the RSS feeds that interest them into speech. In this sense, Talkr is just another feed aggregator. If the blogger includes an advertisement in their feed, we will include that advertisement in the resulting speech as best we can. (Unfortunately, RSS ads are often images, so the best we can often do is explain that “an image with the caption was included at this point.”)

Finally, let me address what I think is the most serious issue that you raise:

>While I’m impressed, I can’t really imagine
>listening to blogs this way.

Talkr (and podcasting in general) is not a replacement for blogging. You’re right, there’s no link structure to provide a post with context; there’s no commonly accepted way to integrate comments, or trackbacks or pings.

But let’s say that you had an hour’s commute every day, and you wanted to spend 15 minutes of that time catching up on anything that was said by the New York Times, Wall Street Journal or CNN on Geekcorps. In addition, you wanted to hear any new posts from your favorite 3 bloggers in Uganda. And, for good measure, whatever was being published on the Berkman blog. If that’s a problem worth solving (and perhaps that’s a big “if”), it’s a problem you have to solve with something like Talkr.
Ethan August 20, 2005 at 8:50 am

Andy – I’m with you on the idea that good RSS text to mp3 engines could be very useful to the visually impaired. It would be very interesting to know whether Chris is in touch with folks like Jim Fruchterman at Benetech – that’s a possible partnership that could benefit both companies.

Matt – answering your question about CTS – not a chance. My general experience with tagging is that I only seem to do it when it has a direct benefit to me. I periodically use Technorati tags because I know it will get posts more attention. But I don’t use tags like XFN because I don’t experience any benefit from them. I’d need lots of very devoted readers via a service like Talkr before I’d invest the time.

Chris, thanks so much for taking the time to respond. Sorry for the error regarding the bumper – I’ll correct that in the body of the post.

On the issue of intellectual property – I raise this as an academic issue at least as much as I do as a blogger. I’m based at a thinktank at Harvard Law School that obsesses about intellectual property issues, and you’ve just generated a really interesting new debate: are you providing an alternative way to encounter content that’s already been published, or are you creating derivative works from works that are – frequently – under copyright? I’m inclined to side with you on the issue, but the courts generally have sided with the other interpretation. One of my colleagues, John Palfrey, is rapidly becoming an expert on issues of RSS and copyright law – I’m hoping he’ll weigh in on the question on his blog or in person (he’s on paternity leave at the moment) and I’ll share any feedback I get from him.

On the issue of whether I’ll find myself using Talkr… and, by extension, whether Talkr is the next big thing or smart but doomed…

I’m your dream customer, in many ways. I commute 3 hours each way to Boston once or twice a week and fill the hours with NPR, Red Sox games and content from Audible.com. I listen to a few podcasts, but not many, as most of the bloggers I want to keep up with don’t have podcasts. And I’m sufficiently technical that I can create custom RSS feeds that would target very specific topics, as you allude to in your response.

My question is a) whether the limitations of the audio interface will counterbalance the benefit of being able to consume content while I drive; and b) whether the experience of listening to a synthesized voice gets easier or harder over time. And while I’ve expressed some skepticism on the topic here, the real answer is “I don’t know”.

So I’ll try it and let you know. I’ll set up a couple more feeds and give them a listen for the next month’s drives and see what I think.

Thanks, everyone, for feedback on this. I take the interest in this comment thread as a good indicator that there’s – at the very least – fascination with the idea that someone could solve the problem of automating podcasts, whether or not Talkr currently meets my needs.
Chris Brooks August 22, 2005 at 4:54 pm

Hi Ethan,

Thank you for your follow-up, and your willingness to play with Talkr a bit. I would be happy to provide you with an account that can access more than 3 feeds, if that’s helpful. I’m very much looking forward to your feedback.

Thanks,
Chris
