Solr queries

Solr is a search index. It keeps track of the data, metadata, and resource map objects stored in our metadata catalog, Metacat.

Solr lets you quickly search any coordinating or member node. Common use cases include:

  • Recovering lost PIDs - if you publish new data objects (publish_object) but forget to save the PIDs, you can recover them with a query. For example, you can search for all data objects that are not associated with a resource map and were submitted under your ORCID (q=-resourceMap:*+AND+submitter:*XXXX-YYYY-ZZZZ-WWWW).
  • Working with groups of packages - if you want to find all data, metadata, and resource maps associated with a PI, you can search by their last name (q=origin:*SURNAME*), by their ORCID (q=rightsHolder:*XXXX-YYYY-ZZZZ-WWWW), or by either one (q=origin:*SURNAME*+OR+rightsHolder:*XXXX-YYYY-ZZZZ-WWWW).
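
The query strings in the two use cases above can be assembled programmatically. The following is a minimal sketch in base R. The helper solr_join() is hypothetical (it is not part of any package), and the ORCID and surname values are the same placeholders used in the text.

```r
# Hypothetical helper: join Solr clauses with a boolean operator, writing
# spaces as "+" to match the URL-style examples in the text.
solr_join <- function(..., op = "AND") {
  paste(c(...), collapse = paste0("+", op, "+"))
}

orcid   <- "XXXX-YYYY-ZZZZ-WWWW"  # placeholder ORCID
surname <- "SURNAME"              # placeholder last name

# Data objects with no resource map, submitted under the given ORCID
lost_pids_q <- solr_join("-resourceMap:*", paste0("submitter:*", orcid))

# Objects whose origin matches the surname OR whose rights holder matches the ORCID
pi_q <- solr_join(
  paste0("origin:*", surname, "*"),
  paste0("rightsHolder:*", orcid),
  op = "OR"
)

print(lost_pids_q)
print(pi_q)
```

Building the clauses separately makes it easier to swap the operator or add another condition without retyping the whole string.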

Once you understand the logic of queries, Solr becomes a flexible and useful tool that you can integrate into your R workflow. You can use queries to answer a variety of interesting questions, for example:

  • What are the most recently updated data sets?
  • What metadata and data objects are in a given data package?
  • What is the total size (in terms of disk space) of all objects stored in Metacat?
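
As a rough illustration, each of these questions might map onto a set of Solr parameters like the ones below. This is a sketch under assumptions: the field names (formatType, dateUploaded, resourceMap, size) are assumed to match the index schema, PACKAGE_PID is a placeholder for a resource map identifier, and the variable names are made up for this example.

```r
# Most recently uploaded data sets: metadata records, newest first.
# (Assumes dateUploaded is the field of interest; dateModified may fit better.)
recent_datasets <- list(
  q    = "formatType:METADATA",
  fl   = "identifier,title,dateUploaded",
  sort = "dateUploaded desc",
  rows = "10"
)

# Objects in a given data package. PACKAGE_PID is a placeholder; the value is
# quoted because identifiers often contain characters such as ":".
package_members <- list(
  q  = 'resourceMap:"PACKAGE_PID"',
  fl = "identifier,formatType,size"
)

# Total size of all objects: request Solr's stats component for the size
# field and return no individual documents.
total_size <- list(
  q           = "*:*",
  rows        = "0",
  stats       = "true",
  stats.field = "size"
)

str(recent_datasets)
```

Here q selects the records, fl limits the fields returned, sort orders the results, and rows caps how many come back.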

You can query Solr either by appending a query to a base URL or by calling the dataone::query() function in R. For now, we’ll just cover the basics of Solr queries in R.
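
To make the two routes concrete, here is a minimal sketch. The base URL shown is an assumption (the Solr endpoint of the DataONE production coordinating node); a member node would have its own. build_solr_url() is a hypothetical helper written for this example. The dataone call is left commented out because it needs the package and network access.

```r
# Assumed base URL: Solr endpoint of the DataONE production coordinating node
base_url <- "https://cn.dataone.org/cn/v2/query/solr/"

# Hypothetical helper: turn a named list of Solr parameters into a query URL
build_solr_url <- function(base, params) {
  pairs <- paste0(names(params), "=", unlist(params))
  paste0(base, "?", paste(pairs, collapse = "&"))
}

params <- list(
  q    = "origin:*SURNAME*",        # placeholder surname, as in the text
  fl   = "identifier,formatType",   # fields to return
  rows = "5"                        # maximum number of results
)

url <- build_solr_url(base_url, params)
print(url)

# The same parameters through the dataone package (not run here):
# library(dataone)
# cn <- CNode("PROD")
# result <- query(cn, solrQuery = params, as = "data.frame")
```

Keeping the parameters in a named list means the same object can be turned into a URL for the browser or passed to query() unchanged.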