Analyze Your Solr Index on Pantheon

Ever curious about what exactly is getting indexed in Solr? Drupal abstracts Solr to a point where it can be difficult to understand how Solr is queried directly. It's actually not that difficult.

This article walks through building a Solr query by defining field lists and filter queries, analyzing results with each step. It also addresses where you can apply these filters for custom search pages in the standard Apache Solr module's configurations.

This info can be applied to any Drupal/Solr setup, but I'll focus on a Pantheon environment, since that is the platform of choice for the current project I'm working on with Hook 42.

Out of the box, Solr provides an admin UI for running queries, but Pantheon doesn't offer direct access to it. They do, however, provide a query UI with the Pantheon Solr module that works just fine.

Go to: Config -> Search and metadata -> Pantheon Apache Solr -> Query (admin/config/search/pantheon/query)

It should look something like this:

The path will be /select. I'll define the query keys to build the query below.

General Query

To start, let's run a general query for the word "vacation". For the sake of testing, replace "vacation" with a word more relevant to search on your own site.

The key q is required and represents the keywords to search for.

/select?q=vacation

You should see a list of scores in the middle of the output, without much else.

array (
      0 => 
      SimpleXMLElement::__set_state(array(
         'float' => '0.24845348',
      )),
      1 => 
      SimpleXMLElement::__set_state(array(
         'float' => '0.21960393',
      )),
)

Change output to JSON for readability

Output can be switched from XML to JSON with the wt key. This cleans up the output and makes this tool a little nicer to work with.

/select?q=vacation&wt=json

Your list of scores should look more like this now:

    "docs":[
      {
        "score":0.21960393
      }
    ]
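
If you would rather generate these paths in a script than type them by hand, here is a minimal Python sketch. The helper name build_select_path is my own, made up for illustration; it is not part of Solr or the Pantheon module, and it only covers simple values like the ones used so far.

```python
from urllib.parse import urlencode

def build_select_path(keywords, **params):
    """Return a /select path with q first, followed by any extra keys."""
    # q is the only required key; extras such as wt are optional.
    query = {"q": keywords, **params}
    return "/select?" + urlencode(query)

basic = build_select_path("vacation")
as_json = build_select_path("vacation", wt="json")
print(basic)
print(as_json)
```

The two printed paths should match the ones used above, which you can paste straight into the query form.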

Set what fields you want to see in the results

You can define the field list with the fl key. The field list value is a comma-separated list of the indexes you want to see in the output.

In this example, I'm interested in seeing bundle (node type), label (node title), and a custom entity reference index, sm_field_locations.

Note: You can view a list of indexes at the following path: admin/reports/apachesolr/solr

"bundle" and "label" are fields that will be available to all nodes, but "sm_field_locations" is a field specific to my install. To follow this example, take a look at the index list and select a custom field you're interested in seeing.

The query now looks like this:

/select?q=vacation&wt=json&fl=bundle,label,sm_field_locations

The results now display the fields requested, if they exist. You will notice the location nodes don't have sm_field_locations, but my travel nodes do.

    "docs":[
      {
        "bundle":"location",
        "label":"United States"
      },
      {
        "bundle":"travel",
        "label":"6 Night stay in the (now) Quiet town of Washington D.C.",
        "sm_field_locations":[
          "node:100432"
        ]
      },
      {
        "bundle":"travel",
        "label":"3 Night stay in Vegas.",
        "sm_field_locations":[
          "node:105223"
        ]
      }
    ]

Note: You can add "score" to the field list if you would like to see the relevancy score.
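
Because Solr simply leaves out a field that a document doesn't have, anything that reads this output should treat such fields as optional. Here is a rough Python sketch of that idea. The sample is hand-copied and trimmed from the results above rather than taken from a live response (in a full Solr JSON response the docs list sits inside a "response" object), and summarize is a hypothetical helper name.

```python
import json

# Hand-trimmed sample based on the results shown above.
sample = '''
{"docs": [
  {"bundle": "location", "label": "United States"},
  {"bundle": "travel",
   "label": "3 Night stay in Vegas.",
   "sm_field_locations": ["node:105223"]}
]}
'''

def summarize(doc):
    # dict.get() falls back to an empty list when the field was omitted.
    locations = doc.get("sm_field_locations", [])
    return (doc["bundle"], doc["label"], locations)

rows = [summarize(doc) for doc in json.loads(sample)["docs"]]
for row in rows:
    print(row)
```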

Filter results to a specific value

Now let's say you want to restrict the results to a specific node type (bundle).

You can define the filter query with the fq key.

The format is index:value so if we wanted to filter for only "travel" nodes, the value of fq would be bundle:travel.

You can combine multiple filter queries using operators such as AND and OR. An example to filter for a location of "United States" and a bundle of "location" would be: bundle:location AND label:"United States"

That value of course needs to be urlencoded since it is being passed in a query string.

Also note that certain characters, such as a colon, are reserved. Notice how sm_field_locations is an array of strings like "node:100432". Values containing colons or spaces must be wrapped in quotes, as we did for "United States".
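
To make the quoting and encoding steps concrete, here is a small Python sketch. The fq_clause helper is hypothetical and deliberately naive: it wraps a value in quotes when it contains a space or a colon, and it does not try to escape quotes or other reserved characters inside the value.

```python
from urllib.parse import quote

def fq_clause(field, value):
    """Build one index:value clause, quoting values with spaces or colons."""
    if " " in value or ":" in value:
        value = '"' + value + '"'
    return field + ":" + value

fq = " AND ".join([
    fq_clause("bundle", "location"),
    fq_clause("label", "United States"),
])
# Percent-encode spaces and quotes, but leave colons readable.
encoded = quote(fq, safe=":")
print(fq)       # bundle:location AND label:"United States"
print(encoded)  # bundle:location%20AND%20label:%22United%20States%22
```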

Now I'll add a filter query so only travel nodes appear in the results.

/select?q=vacation&wt=json&fl=bundle,label,sm_field_locations&fq=bundle:travel

Just to demonstrate, I'll change the filter query so that only travel nodes which have an sm_field_locations reference to node 100432 appear in the results.

/select?q=vacation&wt=json&fl=bundle,label,sm_field_locations&fq=bundle:travel%20AND%20sm_field_locations:%22node:100432%22

The results now conform to the filters.

    "docs":[
      {
        "bundle":"travel",
        "label":"6 Night stay in the (now) Quiet town of Washington D.C.",
        "sm_field_locations":[
          "node:100432",
        ]
      },
  ]
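
Putting the pieces together, the whole path can be assembled in one go. This is only a sketch: it uses Python's urlencode with quote_via=quote and safe=":," so that spaces become %20 while colons and commas stay literal, which is the style of encoding used in this article. The field and node values are the ones from my install.

```python
from urllib.parse import quote, urlencode

params = {
    "q": "vacation",
    "wt": "json",
    # fl is just a comma-joined list of index names.
    "fl": ",".join(["bundle", "label", "sm_field_locations"]),
    # Only travel nodes that reference node 100432.
    "fq": 'bundle:travel AND sm_field_locations:"node:100432"',
}
path = "/select?" + urlencode(params, quote_via=quote, safe=":,")
print(path)
```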

Beyond analyzing data

You can take this information and apply it to custom search pages without any coding.

The Solr Search page list is located at this path: admin/config/search/apachesolr/search-pages

When you edit a page, you can set the filters for a specific search by checking the Custom Filter box and adding your filter query value.

That form also lets you allow Solr parameters to be passed directly in the URL.

In the "Advanced search page options", you will see the following option, which should hopefully make more sense now.

Allow user input using the URL
Allow users to use the URL for manual facetting via fq[] params (e.g. http://example.com/search/site/test?fq[]=uid:1&fq[]=tid:99). This will only work in combination with a keyword search. The recommended value is unchecked
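
If you do enable that option, a URL in the shape of the help text's example can be built from a list of filter queries. This is a sketch only; search_url is a made-up helper, and the base URL and filters are the placeholder values from the help text above.

```python
from urllib.parse import quote

def search_url(base, keywords, filters):
    """Append keywords to the search path and add one fq[] per filter."""
    # Each filter becomes its own fq[] parameter, mirroring the help text.
    parts = ["fq[]=" + quote(f, safe=":") for f in filters]
    return base + "/" + quote(keywords) + "?" + "&".join(parts)

url = search_url("http://example.com/search/site", "test", ["uid:1", "tid:99"])
print(url)
```

Remember the help text's caveat: the filters only take effect alongside a keyword search, which is why the keywords are part of the path here.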

I'm hoping this takes some of the mystery out of what's going on behind the scenes with Solr. Digging into the data isn't as familiar as firing up your favorite DB manager, but it isn't so bad with the right tools and direction.