Quick and dirty apachesolr for drupal
This blog post is more than 4 years old, so the content may be out of date.
Here's a quick and dirty guide to setting up apachesolr to work on a Drupal instance.
Do not use this for production!
This is aimed at quickly setting up a rough-and-ready apachesolr server on Ubuntu.
Installation and setup
sudo apt-get install solr-jetty
Why Jetty? Quick and dirty…
sudo vi /etc/default/jetty
Change the NO_START line so that jetty can be launched (without this change, /etc/init.d/jetty start will not work).
# Defaults for jetty see /etc/init.d/jetty for more
# change to 0 to allow Jetty to start
NO_START=0
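If you'd rather not open vi, the same change can be sketched with sed. This assumes GNU sed (for the -i flag) and that the file already contains a line starting with NO_START=. The helper name enable_jetty_start is made up for this example and is not part of the jetty package.

```shell
# Sketch: set NO_START=0 in a jetty defaults file without opening an editor.
# enable_jetty_start is a hypothetical helper name used only for illustration.
enable_jetty_start() {
    defaults_file="${1:-/etc/default/jetty}"
    # Replace whatever value NO_START currently has with 0.
    sed -i 's/^NO_START=.*/NO_START=0/' "$defaults_file"
}
```

Called with no arguments from a root shell, it edits /etc/default/jetty. Pass a path to try it on a copy first.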
The solr index is stored by default in /usr/share/solr/data - this directory needs to be created, and owned by jetty.
sudo mkdir /usr/share/solr/data
sudo chown jetty:jetty /usr/share/solr/data
Download the drupal module:
drush dl apachesolr
Copy the three config files from the apachesolr drupal module into the solr conf directory.
sudo cp -f sites/all/modules/contrib/apachesolr/solr-conf/solr-1.4/* /etc/solr/conf
Apache solr's "commit time" is the delay between sending solr a new document to index and solr writing that document to its index. Until a document is "committed", it won't be found in search results.
[Edit] For some reason I had in mind that the maxTime was in seconds, of course it's in milliseconds. I'd still reduce it for dev, but 2 minutes is a much more sensible setting than 33 hours…HT to @darthsteven for the reminder. However, it is also another reminder to COMMENT ALL THE THINGS.
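To make the units mix-up concrete, here is the same number read both ways, using plain shell integer arithmetic. The variable name max_time is only for illustration.

```shell
# Sketch: the default maxTime value read as milliseconds versus as seconds.
max_time=120000
# Milliseconds -> seconds -> minutes.
echo "as milliseconds: $((max_time / 1000 / 60)) minutes"
# Seconds -> whole hours (integer division drops the remainder).
echo "as seconds: $((max_time / 3600)) hours"
```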
Edit the solrconfig.xml file…by default, the 'commit' time is 120,000 milliseconds: i.e. 2 minutes. A few seconds is a much more sensible time for dev. Here's the default configuration: change the maxTime from 120000 to something sensible for dev, like 5000 (the value is time before commit, in milliseconds).
sudo vi /etc/solr/conf/solrconfig.xml
<autoCommit>
  <maxDocs>2000</maxDocs>
  <maxTime>120000</maxTime>
</autoCommit>
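For a scripted dev setup, the maxTime change could be made with sed instead of vi. This is a rough sketch, assuming GNU sed and that the autoCommit and maxTime tags each sit on their own line in the installed file. set_autocommit_ms is a hypothetical helper name.

```shell
# Sketch: change the autoCommit maxTime (in milliseconds) in solrconfig.xml.
# set_autocommit_ms is a hypothetical helper, not something solr provides.
set_autocommit_ms() {
    config_file="${1:-/etc/solr/conf/solrconfig.xml}"
    ms="${2:-5000}"
    # Only touch maxTime between the <autoCommit> open and close tags,
    # so any other maxTime element in the file is left alone.
    sed -i "/<autoCommit>/,/<\/autoCommit>/ s|<maxTime>[0-9]*</maxTime>|<maxTime>${ms}</maxTime>|" "$config_file"
}
```

From a root shell, set_autocommit_ms with no arguments sets a 5 second commit time in /etc/solr/conf/solrconfig.xml. Jetty needs a restart before solr picks up the new value.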
Start teh solrs
$ sudo /etc/init.d/jetty start
 * Starting Jetty servlet engine. jetty
 * Jetty servlet engine started, reachable on http://myhostname:8080/. jetty
   ...done.
You should be able to access the solr site via the web UI (give it 20-30 seconds or so to start up, it is a java app after all). The url should be something like http://myhostname:8080/solr/.
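Rather than guessing at the startup time, you could poll the URL until it answers. A minimal sketch, assuming curl is installed. wait_for_solr is a made-up helper name, and the default URL is the placeholder hostname used in this post.

```shell
# Sketch: poll a solr URL once a second until it responds or we give up.
# wait_for_solr is a hypothetical helper; the URL default is a placeholder.
wait_for_solr() {
    url="${1:-http://myhostname:8080/solr/}"
    tries="${2:-30}"
    i=0
    while [ "$i" -lt "$tries" ]; do
        # -f makes curl exit non-zero on HTTP errors; -s and -o hide the output.
        if curl -fs -o /dev/null "$url"; then
            return 0
        fi
        i=$((i + 1))
        sleep 1
    done
    # Still no answer after the last attempt.
    return 1
}
```

For example: wait_for_solr http://myhostname:8080/solr/ 30 && echo "solr is up"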
If you can't access that, check the network settings (and any firewalls). Jetty binds to the IP address specified via /etc/default/jetty, in the JETTY_HOST parameter.
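A quick way to see which address jetty has been told to bind to is to grep the defaults file. This sketch only prints uncommented JETTY_HOST lines, and show_jetty_host is a hypothetical helper name.

```shell
# Sketch: show the JETTY_HOST setting from a jetty defaults file.
# show_jetty_host is a hypothetical helper name used only for illustration.
show_jetty_host() {
    defaults_file="${1:-/etc/default/jetty}"
    # Print only uncommented assignments; no output means none is set.
    grep '^JETTY_HOST=' "$defaults_file"
}
```

If it prints nothing, JETTY_HOST is either commented out or missing, and jetty is using whatever default its init script picks.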
A note of caution: if you set up an apachesolr view, and you want to add the node title as a field, look for label in the list of fields. For some reason, the document sent to solr uses the term 'label' rather than 'title'. Knowing this in advance could have saved me several hours of debugging. :-/
Reviewing the contents of the solr index
Sometimes you want to manually inspect the solr index, without going via drupal.
To go via the web interface, open e.g. http://myhostname:8080/solr/admin/, enter *:* into the 'query string' input box, and click search. You'll see a URL like http://myhostname:8080/solr/select/?q=*%3A*&version=2.2&start=0&rows=10&..., but no results. This is because the default query parser is dismax, which doesn't support this query.
Hack the url to get the standard parser's behaviour: change ?q= to ?q.alt=. With no q present, dismax falls back to q.alt, which it parses with the standard query syntax, so you should now see results.
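The same check works from the command line. Here is a small sketch that builds the match-all URL with q.alt. solr_match_all_url is a made-up helper name, the hostname is the placeholder used in this post, and only the parameters visible in the URL above are included.

```shell
# Sketch: build a match-all query URL that uses q.alt instead of q.
# solr_match_all_url is a hypothetical helper; the base URL is a placeholder.
solr_match_all_url() {
    base="${1:-http://myhostname:8080/solr}"
    rows="${2:-10}"
    # %%3A prints as %3A, the URL-encoded colon in *:*.
    printf '%s/select/?q.alt=*%%3A*&version=2.2&start=0&rows=%s\n' "$base" "$rows"
}
```

Feed the result to curl to dump the first few documents: curl -s "$(solr_match_all_url http://myhostname:8080/solr 5)"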