Quick and dirty apachesolr for drupal

This blog post is more than 5 years old, so the content may be out of date.

Here's a quick and dirty guide to setting up apachesolr to work on a Drupal instance.

Do not use this for production!

This is aimed at quickly setting up a rough-and-ready solr server on Ubuntu, for use with the apachesolr Drupal module.

Installation and setup

sudo apt-get install solr-jetty

Why Jetty? Quick and dirty…

sudo vi /etc/default/jetty

Change the NO_START line so that jetty can be launched (without this change, /etc/init.d/jetty start will not work).

# Defaults for jetty see /etc/init.d/jetty for more
 
# change to 0 to allow Jetty to start
NO_START=0
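If you'd rather not open an editor, the same change can be scripted. This is only a sketch: enable_jetty_start is a made-up helper name, and it assumes the file already contains an uncommented NO_START=... line.

```shell
# Hypothetical helper: set NO_START=0 in a jetty defaults file.
# Assumes the file has an uncommented NO_START=<something> line.
enable_jetty_start() {
  conf_file="$1"
  tmp_file="$(mktemp)"
  # Rewrite the NO_START line, then copy the result back in place
  # (cat > keeps the original file's owner and permissions).
  sed 's/^NO_START=.*/NO_START=0/' "$conf_file" > "$tmp_file" &&
    cat "$tmp_file" > "$conf_file"
  rm -f "$tmp_file"
}

# Usage (as root, since /etc/default/jetty is root-owned):
#   enable_jetty_start /etc/default/jetty
```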

The solr index is stored by default in /usr/share/solr/data - this directory needs to be created, and owned by jetty.

sudo mkdir /usr/share/solr/data
sudo chown jetty:jetty /usr/share/solr/data

Download the drupal module:

drush dl apachesolr

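Copy the three config files from the apachesolr drupal module to the solr conf directory (run this from the Drupal root; adjust the module path if yours isn't under contrib).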

sudo cp -f sites/all/modules/contrib/apachesolr/solr-conf/solr-1.4/* /etc/solr/conf

Commit-time

Apache solr's "commit time" is the time between sending solr a new document to index, and solr writing the document to its index. Until a document is "committed", it won't be found in search results.

[Edit] For some reason I had in mind that the maxTime was in seconds; of course it's in milliseconds. I'd still reduce it for dev, but 2 minutes is a much more sensible setting than 33 hours… HT to @darthsteven for the reminder. It is also another reminder to COMMENT ALL THE THINGS.

Edit the solrconfig.xml file. By default, the 'commit' time is 120,000 milliseconds, i.e. 2 minutes; a few seconds is a much more sensible time for dev. Below is the default configuration: change the maxTime from 120000 to something like 5000 (the value is the time before commit, in milliseconds).

sudo vi /etc/solr/conf/solrconfig.xml
    <autoCommit>
      <maxDocs>2000</maxDocs>
      <maxTime>120000</maxTime>
    </autoCommit>
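
The maxTime change can also be scripted rather than done by hand in vi. A rough sketch, assuming the autoCommit block is not commented out and that its opening tag, maxTime line and closing tag each sit on their own line; set_autocommit_max_time is a hypothetical helper name, not something shipped with solr or the module.

```shell
# Hypothetical helper: set <maxTime> inside the <autoCommit> block only.
# $1 = path to solrconfig.xml, $2 = new value in milliseconds.
set_autocommit_max_time() {
  conf_file="$1"
  millis="$2"
  tmp_file="$(mktemp)"
  # The address range limits the substitution to lines between
  # <autoCommit> and </autoCommit>; other maxTime tags are left alone.
  sed "/<autoCommit>/,/<\/autoCommit>/ s|<maxTime>[0-9]*</maxTime>|<maxTime>${millis}</maxTime>|" \
    "$conf_file" > "$tmp_file" &&
    cat "$tmp_file" > "$conf_file"
  rm -f "$tmp_file"
}

# Usage (as root; 5000 ms = 5 seconds):
#   set_autocommit_max_time /etc/solr/conf/solrconfig.xml 5000
```

Jetty needs a restart before solr picks up the new value.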

Start teh solrs

$ sudo /etc/init.d/jetty start
 * Starting Jetty servlet engine. jetty
 * Jetty servlet engine started, reachable on http://myhostname:8080/. jetty
   ...done.

You should be able to access the solr site via the web UI (give it 20-30 seconds or so to start up, it is a java app after all). The url should be something like http://myhostname:8080/solr/.
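
Rather than refreshing the browser while the JVM warms up, you can poll until solr answers. A minimal sketch: wait_for is a hypothetical helper that simply retries whatever command you hand it, and the curl call in the usage comment assumes curl is installed and that your host and port match the example URL above.

```shell
# Hypothetical helper: retry a command until it succeeds.
# $1 = max attempts, $2 = seconds to sleep between attempts,
# remaining arguments = the command to run.
wait_for() {
  tries="$1"
  delay="$2"
  shift 2
  attempt=1
  while [ "$attempt" -le "$tries" ]; do
    # Success: stop polling straight away.
    if "$@" >/dev/null 2>&1; then
      return 0
    fi
    attempt=$((attempt + 1))
    sleep "$delay"
  done
  # Never succeeded within the allowed attempts.
  return 1
}

# Usage: poll the solr UI every 2 seconds, for up to a minute.
#   wait_for 30 2 curl -fsS http://myhostname:8080/solr/ && echo "solr is up"
```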

If you can't access that, check the network settings (and any firewalls). Jetty binds to the IP address specified via /etc/default/jetty, in the JETTY_HOST parameter.

Solr-views integration

A note of caution: if you set up an apachesolr view, and you want to add the node title as a field, look for label in the list of fields. For some reason, the document sent to solr uses the term 'label' rather than 'title'. Knowing this in advance could have saved me several hours of debugging. :-/

Reviewing the contents of the solr index

Sometimes you want to manually inspect the solr index, without going via drupal.

In the web interface (e.g. http://myhostname:8080/solr/admin/), enter *:* into the 'query string' input box, and click search. You'll see a URL like http://myhostname:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&..., but no results. This is because the default query parser is dismax, which doesn't support the *:* match-all syntax in q.

Hack the url so the query is handled by the standard parser: change ?q= to ?q.alt=. With no q present, dismax falls back to q.alt, which it parses with the standard query syntax. You should now see results.
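
The same check works from the command line. A small sketch, assuming curl is available; solr_match_all_url is a hypothetical helper that just builds the hacked URL, and the base URL and row count are whatever suits your setup.

```shell
# Hypothetical helper: print a match-all query URL that uses q.alt.
# $1 = solr base URL (e.g. http://myhostname:8080/solr), $2 = rows (default 10).
solr_match_all_url() {
  base_url="${1%/}"   # drop a trailing slash, if any
  rows="${2:-10}"
  # %%3A is a literal %3A, the URL-encoded colon in *:*
  printf '%s/select/?q.alt=*%%3A*&start=0&rows=%s\n' "$base_url" "$rows"
}

# Usage: fetch the first 5 documents in the index as XML.
#   curl -s "$(solr_match_all_url http://myhostname:8080/solr 5)"
```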
