Be aware: nodes with PHP code and Drupal search index

FFW Marketing
April 16, 2010

We finally decided to move to Apache Solr as our search solution. Basic installation and configuration worked out of the box (both the Solr engine and the Drupal module). The only thing left was to wait for the index to fill.

Indexing is done during cron runs. The Solr module relies on Drupal's core Search module to invoke hook_update_index. So cron was set to run every 5 minutes instead of every hour to speed things up (650k+ nodes to index...).
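To see why the cron interval matters so much here, a rough back-of-the-envelope sketch follows. The helper name estimate_index_days() and the batch size of 100 nodes per cron run are assumptions for illustration only; the real per-run limit depends on how the search modules are configured.

```php
<?php
// Hypothetical helper: estimates how many days cron-driven indexing
// needs to cover all nodes, given a fixed batch size per cron run.
function estimate_index_days($total_nodes, $batch_size, $cron_interval_minutes) {
  // Number of cron runs needed to get through every node once.
  $runs = ceil($total_nodes / $batch_size);
  // Convert the total minutes spent waiting on cron into days.
  return $runs * $cron_interval_minutes / (60 * 24);
}

$total_nodes = 650000; // from the post: 650k+ nodes
$batch_size = 100;     // ASSUMPTION: nodes indexed per cron run

printf("hourly cron:   %.1f days\n", estimate_index_days($total_nodes, $batch_size, 60));
printf("5-minute cron: %.1f days\n", estimate_index_days($total_nodes, $batch_size, 5));
```

Under these assumed numbers the difference is roughly 271 days versus 23 days, which is why shortening the interval was the obvious first move.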

However, something broke... The Drupal logs started showing "Cron run exceeded the time limit and was aborted."

Requesting cron.php from a web browser returned Drupal's 403 Access Denied message, yet running cron manually (admin/reports/status/run-cron) worked correctly. After a bit of Googling I found a wonderful debugging trick (http://drupal.org/node/123269): temporarily add an extra line to module.inc:

...
foreach (module_implements($hook) as $module) {
  $function = $module .'_'. $hook;

  if ($hook == 'cron') watchdog('cron', "hit $module cron"); // add this line
...

On the next run, the "hit search cron" entry in the log pointed to the culprit: the Search module.

The Search module invokes hook_update_index during a cron run. Further investigation showed that the failure was happening inside node_update_index. Searching the DB for PHP-enabled nodes (there were not that many) gave the final answer:

<?php

if (!user_access('create article content')) {
  drupal_access_denied();
  exit();
}

?>
...

During node indexing, PHP code inside nodes is evaluated! This is reasonable, since you want to index the clean output and not the code :D.
In this case, however, it led to anonymous cron calls getting Access Denied and breaking the cron run.

Be careful with what you store in your nodes. PHP code does not belong in the DB. If you do use it in a node, keep it to simple things (like l('Login', 'user/login');), or use it rarely and with care.
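For anyone who wants to audit their own site, here is a minimal sketch of the kind of check that would have caught our node. The function find_risky_php_nodes() is a hypothetical helper, not part of Drupal core or any module, and it works on plain arrays so it can run outside Drupal. The sample rows are made up, and the PHP input format ID (3 here) is an assumption that you should check against your own site.

```php
<?php
// Hypothetical helper, not a Drupal API: returns the nids of nodes whose
// PHP-format body contains calls that end or redirect the current request.
function find_risky_php_nodes(array $nodes, $php_format_id) {
  // Calls that stop, redirect, or replace the output of the request.
  $pattern = '/\b(exit|die|drupal_access_denied|drupal_not_found|drupal_goto)\b/i';
  $risky = array();
  foreach ($nodes as $node) {
    // Bodies in other input formats are never evaluated, so skip them.
    if ($node['format'] != $php_format_id) {
      continue;
    }
    if (preg_match($pattern, $node['body'])) {
      $risky[] = $node['nid'];
    }
  }
  return $risky;
}

// Made-up sample rows (nid, input format ID, body) for illustration only.
$nodes = array(
  array('nid' => 1, 'format' => 1, 'body' => 'Plain text that mentions exit.'),
  array('nid' => 2, 'format' => 3, 'body' => "<?php if (!user_access('create article content')) { drupal_access_denied(); exit(); } ?>"),
  array('nid' => 3, 'format' => 3, 'body' => "<?php print l('Login', 'user/login'); ?>"),
);

// ASSUMPTION: 3 is the ID of the PHP code input format on this site.
print_r(find_risky_php_nodes($nodes, 3));
```

A pattern match like this only flags candidates for manual review; it cannot tell whether the code is actually reached during indexing.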