Saturday, January 20, 2007

Drupal 4.7 - Aggregator Shows 0 Items

I ran across a bug in Drupal's Aggregator module yesterday. The aggregator was showing 0 items imported although I knew the feed had items, because it was my own validated feed. The feed worked in every other aggregator I had tried it on and failed only in Drupal 4.7. After investigating, I found the problem. Around line 904 of aggregator.module, the code checks each item to decide whether to add it as a new item or update an existing one. IMHO, the check is flawed. Here is the check:

if ($link && $link != $feed['link'] && $link != $feed['url']) {
  $entry = db_fetch_object(db_query("SELECT iid FROM {aggregator_item} WHERE fid = %d AND link = '%s'", $feed['fid'], $link));
}
else {
  $entry = db_fetch_object(db_query("SELECT iid FROM {aggregator_item} WHERE fid = %d AND title = '%s'", $feed['fid'], $title));
}


To determine if a news item already exists, the code first looks at the item's link. If the link is set and differs from the feed's own link and URL, it searches the feed for an item with the same link and assumes that any match is the same item. Otherwise, it searches the feed for an item with the same title and again assumes that the item is the same. Both of these checks are a little dubious IMO: the specs require neither links nor titles to be unique.
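To see why this matters, here is a minimal sketch of that lookup in Python rather than Drupal's PHP. It is an illustration only, not the module's code: the function names, the in-memory list standing in for the aggregator_item table, and the example.com feed are all made up. It only shows that items sharing one link get folded into a single stored entry.

```python
def find_existing(stored, feed, link, title):
    """Mimic the check: return the stored item that would be updated, or None.

    `stored` stands in for the rows of one feed (the SQL filters on fid).
    """
    if link and link != feed["link"] and link != feed["url"]:
        key, value = "link", link      # match on the item link
    else:
        key, value = "title", title    # otherwise match on the item title
    for item in stored:
        if item[key] == value:
            return item
    return None


def import_items(feed, incoming):
    """Add each incoming item, or overwrite the item it is mistaken for."""
    stored = []
    for entry in incoming:
        existing = find_existing(stored, feed, entry["link"], entry["title"])
        if existing is not None:
            existing.update(entry)     # treated as an update
        else:
            stored.append(dict(entry)) # treated as a new item
    return stored


# Hypothetical feed whose items all point at the same page.
feed = {"link": "http://example.com/", "url": "http://example.com/rss.xml"}
incoming = [
    {"title": "First post", "link": "http://example.com/news"},
    {"title": "Second post", "link": "http://example.com/news"},
    {"title": "Third post", "link": "http://example.com/news"},
]

stored = import_items(feed, incoming)
print(len(stored))         # 1
print(stored[0]["title"])  # Third post
```

Three distinct items go in and one comes out, because each later item is treated as an update of the first.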

Basically, it assumes that item links and/or item titles are unique within the feed. The RSS 2.0 spec doesn't guarantee this, and I'm not sure about 1.0 or Atom. All of these feed types do have IDs (guid for RSS 2.0, dc:identifier for 1.0, id for Atom) that are meant to be globally unique identifiers for the feed item.

Unfortunately, the aggregator module does not store these item IDs, so there is no way to compare them later to figure out what is and is not new.
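If the IDs were stored, the comparison could be keyed on them instead. The following is a rough Python sketch of that idea, not anything Drupal does: the function names and the "guid" field are hypothetical, and the fallback to link plus title for items without an ID is my own assumption.

```python
def item_key(entry):
    """Prefer the feed-supplied identifier; fall back to link plus title.

    The fallback is an assumption for items that carry no ID at all.
    """
    guid = entry.get("guid")
    if guid:
        return ("guid", guid)
    return ("link+title", entry.get("link", ""), entry.get("title", ""))


def import_items_by_id(incoming, stored=None):
    """Insert new items and overwrite existing ones, keyed by item_key()."""
    stored = {} if stored is None else stored
    for entry in incoming:
        stored[item_key(entry)] = dict(entry)
    return stored


# Hypothetical items that share a link but carry distinct IDs.
first_run = [
    {"guid": "tag:example.com,2007:1", "title": "First post", "link": "http://example.com/news"},
    {"guid": "tag:example.com,2007:2", "title": "Second post", "link": "http://example.com/news"},
]
items = import_items_by_id(first_run)
print(len(items))  # 2

# A later run with an edited title for the first item updates it in place.
second_run = [
    {"guid": "tag:example.com,2007:1", "title": "First post (edited)", "link": "http://example.com/news"},
]
items = import_items_by_id(second_run, items)
print(len(items))  # 2
```

With the ID as the key, items that share a link stay separate, and a changed title on a known ID is still recognised as an update.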

For a quick and dirty solution, I removed the check entirely, so each item is treated as new regardless of whether it is an update. I suppose you could leave in the check for identical titles if you like.

UPDATE: I've just updated my site to Drupal 5.0 and it appears that the 0 items problem doesn't exist in this Drupal version.

3 comments:

  1. That's good if Drupal 5 solved the problem. Is it working for many different feeds or only verified for the one you're dealing with specifically?

  2. I've selfishly only tested my feed; however, it is a validated feed. So given the way standards are _supposed_ to work, it should work for all.

    Of course, your mileage may vary.

  3. I should also note that the problem was only with feeds that have non-unique links/URLs or titles. It's not easy for me to quickly identify feeds that match this behaviour. If you happen to know one, post it and I'll test.
