How we use Elasticsearch to enhance our web applications
A few weeks ago, a client asked us how we use Elasticsearch within our websites and web apps. It was a good question, and while answering it we realised that Elasticsearch has become an integral part of many of the bespoke web applications we’ve created.
So, to find out just how much of an impact the platform has had, and why we continue to use it, I put a series of questions to James, our Director of Technology. Here’s what he said…
Which of our clients use Elasticsearch?
We make web applications for the likes of the NHS, the UK Government and associated bodies. Our customers use Elasticsearch primarily through our Jellybean and Hub applications. We also use it internally in our innovation Labs, to help design and build projects in our pipeline.
How is Elasticsearch used at Browser?
In our Symfony 2-based Jellybean CMS platform, Elasticsearch indexes every piece of content on the system. In the admin area, every content list (e.g. your list of site pages) can be filtered with a search term, so Elasticsearch is the primary point of contact for listing, ordering, and paginating data. This means the platform only has to query the database once we know exactly which 20 or so records we need to display.
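The two-step pattern described above (ask Elasticsearch which records to show, then load only those from the database) can be sketched in Python. This is a minimal illustration, not the Jellybean implementation, which is PHP. It assumes the search response has already been decoded into a dict, and `load_rows_by_id` is a hypothetical stand-in for a single database query such as `SELECT ... WHERE id IN (...)`. Because a database returns rows in no guaranteed order, the sketch restores the Elasticsearch ranking afterwards.

```python
from typing import Any, Callable, Dict, Iterable, List

Row = Dict[str, Any]

def ids_from_response(response: Dict[str, Any]) -> List[str]:
    """Return the document IDs from a search response, in ranked order."""
    return [hit["_id"] for hit in response["hits"]["hits"]]

def fetch_page(
    response: Dict[str, Any],
    load_rows_by_id: Callable[[List[str]], Iterable[Row]],
) -> List[Row]:
    """Fetch only the rows Elasticsearch selected, keeping its order."""
    ids = ids_from_response(response)
    if not ids:
        return []  # nothing to show, so skip the database entirely
    # Hypothetical loader: one query for the whole page, e.g. WHERE id IN (...)
    by_id = {row["id"]: row for row in load_rows_by_id(ids)}
    # Rows deleted since indexing are skipped rather than raising KeyError
    return [by_id[i] for i in ids if i in by_id]

# Usage with a fake response and an in-memory "database"
db = {"7": {"id": "7", "title": "About"}, "3": {"id": "3", "title": "Home"}}
fake_response = {"hits": {"hits": [{"_id": "3"}, {"_id": "7"}, {"_id": "9"}]}}
page = fetch_page(fake_response, lambda ids: [db[i] for i in ids if i in db])
print([row["title"] for row in page])  # ['Home', 'About']
```

Only one database round trip is made per page, however many records the index holds.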
On the public-facing side of a Jellybean-based website, we use Elasticsearch in the same way for any list of data, e.g. the latest five news posts in category X, through the same simple API. If a site needs full content search, the only extra work is plugging in the search terms – Elasticsearch is already running behind the scenes.
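A query body for "the latest five news posts in category X" might look like the sketch below, and adding search terms is one extra clause on the same query. The field names `category`, `body` and `published_at` are hypothetical, and the `filter` clause of a `bool` query is available in Elasticsearch 2.0 and later; `category` is assumed to be indexed unanalysed so that a `term` query matches it exactly.

```python
from typing import Any, Dict, List, Optional

def build_news_query(category: str, terms: Optional[str] = None,
                     size: int = 5) -> Dict[str, Any]:
    """Latest `size` posts in `category`, optionally narrowed by search terms."""
    # Exact match on an unanalysed category field; filters don't affect scoring
    filters: List[Dict[str, Any]] = [{"term": {"category": category}}]
    must: List[Dict[str, Any]] = []
    if terms:
        # Full-text search is just one more clause on the same query
        must.append({"match": {"body": terms}})
    return {
        "query": {"bool": {"must": must or [{"match_all": {}}],
                           "filter": filters}},
        "sort": [{"published_at": {"order": "desc"}}],  # newest first
        "size": size,
    }

latest = build_news_query("consultations")
searched = build_news_query("consultations", terms="rail fares")
```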
We used Elasticsearch during our work on the Transport Focus (previously Passenger Focus) website to index its 3,000-strong database of research publications. The search applications the site had used previously gave inconsistent results, often missing matches we expected, whereas Elasticsearch returned the right results straight out of the box. This was key for us: within a project there often isn’t as much time as a developer would like to tune a search system, so results that are intuitive by default are valuable.
Additionally, our file management product (which has since evolved into Twine, our SaaS intranet product) relies heavily on Elasticsearch to let end users quickly find the assets they need. The platform’s flexibility allows us to build the right search and navigation experience for each client.
Lastly, and most recently, we created a custom Media Centre application for Mondottica to use as an online catalogue for its product range; a priority search feature was being able to jump to any of its 2,000+ products using only a partial SKU code. We used Amazon RDS to store the data, and Elasticsearch to index the various permutations of the data that needed to be searchable.
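The article doesn’t say exactly which permutations were indexed. One possible approach, sketched here under that caveat, is to store every substring of a normalised SKU (at least three characters long, an arbitrary threshold) in a hypothetical keyword field `sku_fragments`, so that an exact `term` query on a partial code finds the product. An n-gram tokenizer configured in the index’s analysis settings achieves a similar effect inside Elasticsearch itself.

```python
from typing import Any, Dict, List

def normalise(code: str) -> str:
    """Lower-case and drop dashes, spaces and other punctuation."""
    return "".join(ch for ch in code.lower() if ch.isalnum())

def sku_fragments(sku: str, min_len: int = 3) -> List[str]:
    """Every substring of the normalised SKU at least `min_len` long."""
    norm = normalise(sku)
    frags = {
        norm[start:end]
        for start in range(len(norm))
        for end in range(start + min_len, len(norm) + 1)
    }
    return sorted(frags)  # stored in the document's sku_fragments field

def partial_sku_query(fragment: str) -> Dict[str, Any]:
    """Exact lookup of a partial code against the indexed fragments."""
    return {"query": {"term": {"sku_fragments": normalise(fragment)}}}

print(sku_fragments("AB-123"))  # ['123', 'ab1', 'ab12', 'ab123', 'b12', 'b123']
```

The number of fragments grows quadratically with SKU length, which is acceptable for short product codes but worth bearing in mind for longer fields.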
How many users does Elasticsearch cater for?
Approximately 20,000 registered users log in to our SaaS systems, which rely on Elasticsearch to display content, and many, many more visit public Jellybean websites powered by Elasticsearch.
What’s the size of the data?
We’re very selective about the data we index, to keep things as lean and fast as possible, so the index size is relatively small.
How many records does Elasticsearch search?
In the region of 200,000.
How important is search to Browser and how does it directly or indirectly impact Browser’s business?
In terms of industry competitiveness, having a rock-solid search system available is absolutely mandatory. There would be very little point in designing systems to organise our customers’ data without a fast, flexible way to retrieve it.
What search solution were we using before Elasticsearch and why did we change?
We experimented with various solutions, including Amazon CloudSearch and Zend’s PHP Lucene implementation. Elasticsearch offered the best combination of a low cost base, a simple API, reliable results, and future-proof scalability.
What challenges or problems did we have with this solution? Why did we want to change?
We didn’t commit fully to any of these alternatives before choosing Elasticsearch, but each offered a less attractive package in terms of cost, speed, or barrier to entry.
Is fast query response time important? What speed of query response do our applications require?
We build tools focused on great usability, and nothing destroys usability faster than slow performance. When our customers’ public-facing websites depend on Elasticsearch, we need it to respond within milliseconds.
How does Elasticsearch perform in terms of real-time updating (fast indexing)?
Real-time updating isn’t crucial to our implementations at the moment, but the index needs to be up to date within a couple of seconds at most. When CMS users add a record or update their content, they expect to find the new content straight away, and Elasticsearch meets this need perfectly well.
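By default Elasticsearch refreshes each index once a second (the `refresh_interval` setting), which is what makes new documents searchable within about a second of being indexed. Where an editor must see a change the moment the save completes, the index API’s `refresh=wait_for` parameter (Elasticsearch 5.0 and later) holds the response until the document is visible to search. The sketch below only builds the request; the index name, document ID and the `/<index>/_doc/<id>` path form (Elasticsearch 7 style) are assumptions.

```python
import json
from typing import Any, Dict, Tuple
from urllib.parse import quote, urlencode

def index_request(index: str, doc_id: str, doc: Dict[str, Any],
                  wait_for_refresh: bool = True) -> Tuple[str, str, str]:
    """Return (method, path, body) for indexing one document."""
    path = f"/{quote(index, safe='')}/_doc/{quote(doc_id, safe='')}"
    if wait_for_refresh:
        # Respond only once a refresh has made this document searchable
        path += "?" + urlencode({"refresh": "wait_for"})
    return "PUT", path, json.dumps(doc)

method, path, body = index_request("pages", "42", {"title": "Hi"})
```

Waiting for the refresh slows the save slightly but avoids forcing an extra refresh, which is why it suits occasional CMS edits better than `refresh=true`.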
Is platform scalability an important consideration for Browser?
Scalability isn’t currently a big concern for us, as our single-server Elasticsearch setup handles our dataset with ease. We always keep an eye on it, however, so knowing that Elasticsearch has excellent scalability options was an important factor in choosing it; we’ve seen teams commit to a search system only to abandon it when they needed to scale to a cluster of search servers.
Can you explain how Elasticsearch fits into our architecture?
We run Elasticsearch on its own Amazon Web Services (AWS) instance, separate from our load-balanced web servers, database, and caching systems.
We built our own content listing and search library (as a PHP Symfony 2 bundle), independent of any underlying search system, and wrote a driver for the parts of the Elasticsearch API we needed. As mentioned above, Elasticsearch is the primary point of contact for listing content in both the public-facing and admin areas of our systems: we use it to apply filters, sorting, pagination, and search queries, and retrieve from it a list of (usually 20) documents to display.
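A backend-independent listing library with pluggable drivers could be shaped like the following Python sketch. Every name here (`ListCriteria`, `SearchDriver`, `ElasticsearchDriver`, `InMemoryDriver`) is hypothetical, and `transport` stands for whatever actually sends the body to Elasticsearch’s `_search` endpoint. The Elasticsearch driver translates neutral criteria into a query body with filters, sorting and `from`/`size` pagination, returning only IDs.

```python
from abc import ABC, abstractmethod
from dataclasses import dataclass, field
from typing import Any, Callable, Dict, List, Optional, Tuple

@dataclass
class ListCriteria:
    """Search-engine-neutral description of one page of a content list."""
    filters: Dict[str, Any] = field(default_factory=dict)      # field -> exact value
    sort: List[Tuple[str, str]] = field(default_factory=list)  # (field, "asc"/"desc")
    search: Optional[str] = None
    page: int = 1                                              # 1-based
    per_page: int = 20

class SearchDriver(ABC):
    """What the listing library needs from any search backend."""
    @abstractmethod
    def find_ids(self, criteria: ListCriteria) -> List[str]:
        """Return the IDs of the records to display, in order."""

class ElasticsearchDriver(SearchDriver):
    def __init__(self, transport: Callable[[Dict[str, Any]], Dict[str, Any]],
                 search_fields: List[str]):
        self.transport = transport          # hypothetical: POSTs a body to _search
        self.search_fields = search_fields  # e.g. ["title", "body"]

    def build_body(self, c: ListCriteria) -> Dict[str, Any]:
        must = ([{"multi_match": {"query": c.search, "fields": self.search_fields}}]
                if c.search else [{"match_all": {}}])
        return {
            "query": {"bool": {
                "must": must,
                "filter": [{"term": {f: v}} for f, v in c.filters.items()],
            }},
            # Fall back to relevance order when no explicit sort is given
            "sort": ([{f: {"order": o}} for f, o in c.sort]
                     or [{"_score": {"order": "desc"}}]),
            "from": (c.page - 1) * c.per_page,  # offset of the first hit
            "size": c.per_page,
            "_source": False,  # IDs only; the records come from the database
        }

    def find_ids(self, criteria: ListCriteria) -> List[str]:
        response = self.transport(self.build_body(criteria))
        return [hit["_id"] for hit in response["hits"]["hits"]]

class InMemoryDriver(SearchDriver):
    """Test double with no search server; it ignores free-text search."""
    def __init__(self, docs: List[Dict[str, Any]]):
        self.docs = docs

    def find_ids(self, c: ListCriteria) -> List[str]:
        rows = [d for d in self.docs
                if all(d.get(f) == v for f, v in c.filters.items())]
        # Stable sorts applied last key first give a correct multi-key order
        for f, o in reversed(c.sort):
            rows.sort(key=lambda d: d[f], reverse=(o == "desc"))
        start = (c.page - 1) * c.per_page
        return [d["id"] for d in rows[start:start + c.per_page]]
```

Because calling code depends only on `SearchDriver`, the in-memory driver can stand in for Elasticsearch in unit tests, and switching search backends means writing one new class rather than touching every content list.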
Does Elasticsearch provide an advantage in terms of deployment and integration with our systems?
The simplicity of managing Elasticsearch is a big plus. We can build routine tasks, such as creating indices, straight into our automated deployment process quickly and easily.
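One common way to fold index building into an automated deployment, offered as an illustration rather than Browser’s actual process, is to create a fresh versioned index and then repoint an alias at it. The `_aliases` endpoint applies all of its actions atomically, so searches never see a half-built index. The sketch returns the requests as (method, path, body) tuples rather than sending them; the alias name and the `<alias>_<build_id>` naming scheme are hypothetical, and the shape of `mappings` depends on the Elasticsearch version.

```python
from typing import Any, Dict, List, Optional, Tuple

Request = Tuple[str, str, Optional[Dict[str, Any]]]  # (method, path, JSON body)

def deployment_plan(alias: str, build_id: str, mappings: Dict[str, Any],
                    old_index: Optional[str] = None) -> List[Request]:
    """Requests that build a new index and switch `alias` over to it."""
    new_index = f"{alias}_{build_id}"  # hypothetical naming scheme
    actions: List[Dict[str, Any]] = []
    if old_index:
        actions.append({"remove": {"index": old_index, "alias": alias}})
    actions.append({"add": {"index": new_index, "alias": alias}})
    plan: List[Request] = [
        ("PUT", f"/{new_index}", {"mappings": mappings}),
        # ...bulk-load documents into new_index here (omitted from this sketch)...
        ("POST", "/_aliases", {"actions": actions}),  # one atomic switch
    ]
    if old_index:
        plan.append(("DELETE", f"/{old_index}", None))  # tidy up the old build
    return plan
```

Applications always query the alias, so a failed deployment can simply leave the alias pointing at the previous index.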
Which particular features of Elasticsearch are especially helpful?
The most helpful feature, and one we use over and over again, is ordering results by ‘_score’ (Elasticsearch’s measure of search relevance). This approach has returned the results we expected in nearly every scenario, straight out of the box.
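When no `sort` is given, Elasticsearch already orders hits by `_score`, highest first; spelling the sort out becomes useful once a secondary key is needed to break ties between equally relevant documents. A minimal sketch, with `title`, `body` and `published_at` as hypothetical field names; the `^2` suffix is `multi_match`’s syntax for boosting matches in one field.

```python
from typing import Any, Dict, List

def relevance_query(terms: str, fields: List[str], size: int = 20) -> Dict[str, Any]:
    """Full-text query ordered by relevance, newest first among ties."""
    return {
        "query": {"multi_match": {"query": terms, "fields": fields}},
        "sort": [
            {"_score": {"order": "desc"}},        # most relevant first
            {"published_at": {"order": "desc"}},  # tie-breaker (hypothetical field)
        ],
        "size": size,
    }

body = relevance_query("annual report", ["title^2", "body"])
```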
Does Elasticsearch provide an advantage to the development team?
We use Elasticsearch’s ‘multi_field’ type to index two versions of most textual data: a tokenised version that is searchable by default, and a ‘not_analyzed’ version that we use for sorting.
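The `multi_field` type belongs to early Elasticsearch releases; from 1.0 the same result comes from a `fields` block on an ordinary field, and since 5.0 the analysed and unanalysed string types are called `text` and `keyword`. A minimal sketch of the modern equivalent, assuming a sub-field named `raw` (the name is our choice, not a requirement):

```python
from typing import Any, Dict, List

# Roughly the older multi_field form of the same idea:
# {"type": "multi_field", "fields": {
#     "title": {"type": "string"},
#     "raw": {"type": "string", "index": "not_analyzed"}}}

def sortable_text_mapping(field_names: List[str]) -> Dict[str, Any]:
    """Index each field twice: analysed for search, verbatim (.raw) for sorting."""
    return {
        "properties": {
            name: {
                "type": "text",                          # tokenised, searchable
                "fields": {"raw": {"type": "keyword"}},  # whole value, sortable
            }
            for name in field_names
        }
    }

def sort_by(field_name: str, order: str = "asc") -> List[Dict[str, Any]]:
    """Sort on the keyword copy; the analysed field holds separate tokens."""
    return [{f"{field_name}.raw": {"order": order}}]
```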
Elasticsearch’s RESTful API is very easy to work with – we wrote the driver for our PHP library ourselves using curl, with very few hiccups.
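Because the API is plain JSON over HTTP, a driver needs little more than an HTTP client. Browser’s driver used PHP and curl; the sketch below is a Python equivalent using only the standard library. The base URL (9200 is Elasticsearch’s default HTTP port) and index name are assumptions, and `build_search_request` is separated from `search` so the request can be inspected without a running server.

```python
import json
import urllib.request
from typing import Any, Dict

def build_search_request(base_url: str, index: str,
                         body: Dict[str, Any]) -> urllib.request.Request:
    """POST a JSON query body to <base_url>/<index>/_search."""
    url = f"{base_url.rstrip('/')}/{index}/_search"
    return urllib.request.Request(
        url,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )

def search(base_url: str, index: str, body: Dict[str, Any],
           timeout: float = 2.0) -> Dict[str, Any]:
    """Send the request and decode the JSON response (needs a live server)."""
    req = build_search_request(base_url, index, body)
    with urllib.request.urlopen(req, timeout=timeout) as resp:
        return json.loads(resp.read().decode("utf-8"))

req = build_search_request("http://localhost:9200/", "pages", {"size": 1})
```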
What are the main bottom-line business benefits of Elasticsearch?
Speed and efficiency in setting up complex bespoke search functionality.