I’m blogging from the WWW conference in Chiba, where I’ve just given my talk, opening the day-long workshop on weblog ecosystems. (There’s a podcast of my talk online, thanks to Kathy Gill, who’s broadcasting the conference.) Kazunari Ishida from Tokyo University of Agriculture gave an interesting talk about “latent weblog communities”. He’s trying to detect communities of people who are talking about similar topics by looking for connected sets of links (connected bipartite graphs). He’s got an algorithm – Weak Pair – which does a bunch of matrix multiplication (well over my head) to find connected clusters of blog posts. The hope is to introduce these groups and help them self-organize into a catalog that could serve as an alternative to a search engine.
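I can’t reproduce Weak Pair, but the basic notion of a “connected cluster” is easy to illustrate. Below is a minimal sketch, assuming links are available as (post, target URL) pairs: it treats posts and link targets as the two sides of a bipartite graph and uses a plain breadth-first search to pull out connected components. The function name link_clusters and the sample data are hypothetical, and this is ordinary connected-component search, not Ishida’s matrix-based method.

```python
from collections import defaultdict, deque


def link_clusters(links):
    """Group blog posts into connected clusters of a bipartite graph.

    `links` is a list of (post, target_url) pairs (hypothetical format).
    Posts and link targets form the two sides of the graph; two posts
    end up in the same cluster if a chain of shared targets connects them.
    """
    # Undirected adjacency; tag each node so a post and a URL can't collide.
    adj = defaultdict(set)
    for post, target in links:
        adj[("post", post)].add(("url", target))
        adj[("url", target)].add(("post", post))

    seen = set()
    clusters = []
    for node in adj:
        if node in seen or node[0] != "post":
            continue
        # Breadth-first search from this post, collecting every post reached.
        queue = deque([node])
        seen.add(node)
        posts = set()
        while queue:
            current = queue.popleft()
            if current[0] == "post":
                posts.add(current[1])
            for neighbor in adj[current]:
                if neighbor not in seen:
                    seen.add(neighbor)
                    queue.append(neighbor)
        clusters.append(posts)
    return clusters


# Hypothetical sample: a and b share x.com; c, d and e chain via y.com and z.com.
links = [("a", "x.com"), ("b", "x.com"), ("c", "y.com"),
         ("d", "y.com"), ("d", "z.com"), ("e", "z.com")]
print(link_clusters(links))
```

On the sample data this yields two clusters, one containing posts a and b and one containing c, d and e. A single “whimsical” off-topic link between the two groups would merge them into one, which hints at why Ishida singles those links out.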
I’m interested in his term “whimsical links” – by which he means, links where a blogger is “off-topic”. I suspect it’s a mistake to conclude that every blog is on a single topic, but his results suggest that he’s able to find some small clusters where bloggers are on the same topic. He’s also had some success in finding multiple blogs from the same author, because they tend to all interconnect.
Shinsuke Nakajima from NAIST introduces three ways to think about key bloggers: topic-finders, agitators and summarizers. He talks mostly about the latter two types and methods for detecting them. Summarizers, unsurprisingly, link to lots of people. Agitators can be found by looking for a drastic change in the number of entries posted within a thread, or a drastic change in topic. Nakajima is interested in identifying influential bloggers so they could be used to complement mainstream websites or television. (He seems to believe that this would reduce bias in news; I’m skeptical.) He’s developed an automated technique for identifying summarizers and agitators.
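Nakajima’s actual technique wasn’t described in enough detail for me to reproduce, so here is a rough sketch of the two heuristics as I understood them, with made-up thresholds: a summarizer is a blogger who links to many distinct bloggers, and an agitator is someone who posted in a thread just before the number of entries in that thread jumped sharply. The function names, the min_distinct and ratio parameters, and the (period, author) data layout are all hypothetical assumptions; detecting a drastic change in topic is not modeled.

```python
from collections import Counter


def find_summarizers(outlinks, min_distinct=10):
    """Return bloggers who link to at least `min_distinct` distinct bloggers.

    `outlinks` maps blogger -> list of bloggers they link to (hypothetical format).
    """
    return sorted(b for b, targets in outlinks.items()
                  if len(set(targets)) >= min_distinct)


def find_agitators(thread, ratio=3.0):
    """Flag authors who posted just before a thread's activity spiked.

    `thread` is a list of (period, author) posts in one thread, where
    period is an integer time bucket (e.g. a day). A spike is a period
    whose entry count is at least `ratio` times the previous period's.
    """
    counts = Counter(period for period, _ in thread)
    agitators = set()
    for period in sorted(counts):
        prev = counts[period - 1]  # Counter returns 0 for a missing period
        if prev and counts[period] >= ratio * prev:
            agitators.update(author for p, author in thread if p == period - 1)
    return agitators


print(find_summarizers({"digest": ["a", "b", "c", "d"], "x": ["a", "a"]}, min_distinct=3))
print(find_agitators([(0, "ann"), (1, "bob"), (2, "c1"), (2, "c2"), (2, "c3"), (2, "c4")]))
```

In the second example the thread goes from one entry to four right after bob’s post, so bob is flagged. Real thresholds would have to be tuned against data like Nakajima’s.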
Something that caught my eye in Nakajima’s talk: his team is tracking 500k blogs, with 10m entries… but this set includes only 1 million links, which seems really small to me. Natalie Glance from Blogpulse confirms that this isn’t unusual; it’s typical of English-language blogs as well as Japanese-language ones.
Ko Fujimura echoes this point in his talk about the new EigenRumor algorithm. His team is also tracking 10m entries, from 305k sites, and discovers that only 16.5% of blogs have one or more links. (In other words, there are a lot of journals out there…) Only 1.25% of blog posts link to other blogs, and only 9.28% of blogs had links from at least one other blog… which might present a challenge for his ranking metric, EigenRumor. Unlike other ranking systems, EigenRumor computes three scores: hub (the ability of a blogger to evaluate blog posts), authority (the ability of a blogger to create useful blog posts) and reputation (the ability of a blogger to provide posts in conformity with a community direction).
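For readers who, like me, find the math daunting: the hub and authority terms echo Kleinberg’s HITS algorithm, in which a good hub links to good authorities and a good authority is linked to by good hubs. Below is a minimal sketch of plain HITS by power iteration, assuming a list of (source, target) link pairs. It is not EigenRumor: it omits the reputation score and whatever weighting Fujimura’s team uses, and the function name and sample data are hypothetical.

```python
import math


def hits(links, iterations=50):
    """Plain HITS hub/authority scores via power iteration.

    `links` is a list of (source, target) edges. Returns (hub, authority)
    dicts, each normalized to unit Euclidean length.
    """
    nodes = {n for edge in links for n in edge}
    hub = {n: 1.0 for n in nodes}
    auth = {n: 1.0 for n in nodes}
    for _ in range(iterations):
        # Authority: sum of the hub scores of everything linking in.
        new_auth = {n: 0.0 for n in nodes}
        for src, dst in links:
            new_auth[dst] += hub[src]
        norm = math.sqrt(sum(v * v for v in new_auth.values())) or 1.0
        auth = {n: v / norm for n, v in new_auth.items()}
        # Hub: sum of the authority scores of everything linked to.
        new_hub = {n: 0.0 for n in nodes}
        for src, dst in links:
            new_hub[src] += auth[dst]
        norm = math.sqrt(sum(v * v for v in new_hub.values())) or 1.0
        hub = {n: v / norm for n, v in new_hub.items()}
    return hub, auth


# Hypothetical sample: bloggers a and b both link to c; a also links to d.
hub, auth = hits([("a", "c"), ("b", "c"), ("a", "d")])
print(sorted(auth.items(), key=lambda kv: -kv[1]))
print(sorted(hub.items(), key=lambda kv: -kv[1]))
```

Here c ends up with the highest authority (two inbound links) and a with the highest hub score (it links to both authorities). Because scores only propagate along links, a blog with no links in or out never receives any score at all, which is one way to see why link data as sparse as Fujimura’s is a challenge for this kind of ranking.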
The math is way, way beyond me, but folks here seemed to think EigenRumor could perform better than algorithms like PageRank on very small content sets, like linked communities of blogs.
More in a few moments, but I’m now trying to take notes on the next talk…