I recently posted an efficient algorithm for computing the similarity of two Wikipedia pages (or any two nodes in a network) using cocitation similarity. Another type of similarity that may be worth considering is bibliometric coupling, in which two pages are similar if they link to many of the same pages. Interestingly, it takes only a few minor tweaks to the cocitation algorithm to compute bibliometric coupling. Here’s the bibliometric coupling pseudocode (Perl style):

%nodes, %links //the wikipedia pages and pagelinks
%reverse = reverse(%links) //flipping the pagelinks around: each page maps to the pages that link to it
%biblioCounts //2d hash for temporarily storing counts
%scores //2d hash storing the final similarity scores

foreach node (keys %nodes){ 

   foreach linkedNode (keys %links->{node}) //count shared outgoing links for node (wiki page)
      foreach node2 (keys %reverse->{linkedNode})
         $biblioCounts{node}->{node2} ++;

   citationCount = keys(%links->{node}) //number of pages that node links to
   foreach node2 (keys %biblioCounts{node}) { //similarity scores for node
      citationCount2 = keys(%links->{node2})
      scores{node}->{node2} =
         2 * biblioCounts{node}->{node2} / (citationCount + citationCount2)
   }
}
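
For anyone who wants something runnable, the pseudocode above could be translated to Python roughly as follows. This is only a sketch under a few assumptions: the function name `coupling_scores`, the toy graph, and the choice to represent `links` as a dict mapping each page to a set of the pages it links to are all made up for illustration and are not part of the original algorithm description.

```python
from collections import defaultdict


def coupling_scores(links):
    """Return {page: {other_page: score}} given {page: set of linked pages}."""
    # Reverse index: target page -> set of pages that link to it.
    reverse = defaultdict(set)
    for page, targets in links.items():
        for target in targets:
            reverse[target].add(page)

    scores = {}
    for page, targets in links.items():
        # Count how many link targets `page` shares with every other page.
        shared = defaultdict(int)
        for target in targets:
            for other in reverse[target]:
                shared[other] += 1

        # Normalise by the combined number of outgoing links of both pages.
        scores[page] = {
            other: 2 * count / (len(targets) + len(links[other]))
            for other, count in shared.items()
        }
    return scores


# Hypothetical toy graph: A and B share two link targets, C shares one with A only.
toy_links = {
    "A": {"X", "Y", "Z"},
    "B": {"X", "Y"},
    "C": {"Z", "W"},
}
toy_scores = coupling_scores(toy_links)
```

As in the pseudocode, each page is also compared with itself, and pairs with no shared link targets never get an entry, so the result stays sparse. Using sets assumes duplicate links between the same two pages should be counted once.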