Indexing and simple search with Elasticsearch and Symfony

This article walks through implementing a search with Elasticsearch in a Symfony project.

Install Elasticsearch


Set of data used in this article

To build our search, we are going to create “basic” entities that we can index: blog posts (articles).

<?php

namespace Obtao\BlogBundle\Entity;

use Doctrine\ORM\Mapping as ORM;
use FOS\ElasticaBundle\Configuration\Search;

/**
 * Article
 *
 * @ORM\Table(name="article")
 * @Search(repositoryClass="Obtao\BlogBundle\Entity\SearchRepository\ArticleRepository")
 * @ORM\HasLifecycleCallbacks
 * @ORM\Entity(repositoryClass="Obtao\BlogBundle\Entity\Repository\ArticleRepository")
 */
class Article
{
    /**
     * @var integer
     *
     * @ORM\Column(name="id", type="integer", nullable=false)
     * @ORM\Id
     * @ORM\GeneratedValue(strategy="IDENTITY")
     */
    protected $id;

    /**
     * @var string
     *
     * @ORM\Column(name="title", type="string", length=250, nullable=false)
     */
    protected $title;

    /**
     * @var string
     *
     * @ORM\Column(type="text", nullable=false)
     */
    protected $content;

    /**
     * @ORM\Column(name="created_at", type="datetime")
     */
    protected $createdAt;

    /**
     * @ORM\Column(name="published_at", type="datetime", nullable=true)
     */
    protected $publishedAt;

    /**
    * @ORM\PrePersist
    */
    public function prePersist()
    {
        $this->createdAt = new \DateTime();
    }

    public function isPublished()
    {
        return (null !== $this->getPublishedAt());
    }

    // other getters and setters

}

Create and configure the mapping file

To implement the indexing, you must describe the format of your documents: the field names, their types, the filters to apply to the search query and to the indexed strings, and so on.

We’ll only talk about the entities mapping, and not the configuration of the index itself. For more details about the way to index and search the documents, read our article.

In your config.yml file, import a new fos_elastica.yml file that will contain all the configuration, and add the required parameters (host and port) in your parameters.yml file.

# app/config/config.yml

imports:
    - { resource: fos_elastica.yml }

# app/config/parameters.yml.dist

parameters:
    elastic_host : localhost
    elastic_port : 9200

# app/config/fos_elastica.yml

fos_elastica:
    clients:
        default: { host: "%elastic_host%", port: "%elastic_port%" }
    indexes:
        obtao_blog:
            client: default
            types:
                article:
                    mappings:
                        id:
                            type: integer
                        createdAt :
                            type : date
                        publishedAt :
                            type : date
                        published : 
                            type : boolean
                        title : ~
                        content : ~
                    persistence:
                        driver: orm
                        model: Obtao\BlogBundle\Entity\Article
                        finder: ~
                        provider: ~
                        listener: ~

  • Clients: defines the clients available for search (here, only a “default” client).
  • Indexes: the names of the Elasticsearch indexes. An index can be compared to a database in SQL: it gathers types of documents as a database gathers tables.
  • Types: specifies the different types of documents that will be indexed. In this example, we have only one type: the article. A type can be compared to a database table.
    • Mappings: the list of your document properties and their types. A property can be compared to a column of a SQL table.
    • Persistence: defines how FOSElasticaBundle indexes your documents from your Symfony entities.
      • Driver: the driver to use (here, as is often the case, the ORM).
      • Model: the Symfony entity from which the Elasticsearch document is built. Reusing the entities already defined in your application is the easiest way to index documents.
      • Finder: the search interface. For the moment, we use the default one. This service lets you run searches against Elasticsearch.
      • Provider: the indexing interface. For the moment, we use the default one. This service defines how documents are indexed in Elasticsearch.
      • Listener: the Doctrine events that trigger indexing (by default insert, update and delete, which suits most cases).

Now, you know the “basic” configuration and how to define your Elasticsearch documents depending on your Doctrine entities.

Index / See your indexing results

Now is the time to insert some test data into your database, and then to fill your Elasticsearch index with the following command:

$ app/console fos:elastica:populate

This command uses the provider and loops over all your Doctrine objects to fill the index.

The method is simple: for each field defined in your mapping configuration (fos_elastica.yml), the corresponding getter is called and the returned value is inserted into your Elasticsearch index.

So you can index computed data that does not exist in Doctrine, simply by defining a getter (as we do here with published/isPublished).
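
To make the getter lookup concrete, here is a simplified sketch in plain PHP. It is not FOSElasticaBundle's actual code; the `readField()` helper and the `DemoArticle` class are hypothetical names used only for illustration, and the lookup rule ("get" prefix first, then "is") is an assumption chosen to mirror the published/isPublished case.

```php
<?php

// Hypothetical stand-in for the Article entity, reduced to two fields.
class DemoArticle
{
    private $title = 'Hello Elasticsearch';
    private $publishedAt = null;

    public function getTitle()
    {
        return $this->title;
    }

    public function getPublishedAt()
    {
        return $this->publishedAt;
    }

    // Computed value: there is no "published" column in the database.
    public function isPublished()
    {
        return null !== $this->getPublishedAt();
    }
}

// Illustrative helper: resolve a mapped field name to a getter
// ("getX" first, then "isX") and return its value.
function readField($object, $field)
{
    foreach (array('get', 'is') as $prefix) {
        $method = $prefix . ucfirst($field);
        if (method_exists($object, $method)) {
            return $object->$method();
        }
    }

    throw new \InvalidArgumentException(sprintf('No getter found for field "%s"', $field));
}

// Build the document from the field names declared in the mapping.
$document = array();
foreach (array('title', 'published') as $field) {
    $document[$field] = readField(new DemoArticle(), $field);
}

echo json_encode($document), "\n";
```

The "published" key ends up in the document even though the entity has no such property, which is the behaviour the mapping above relies on.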

Once the indexing is over, you can see your documents in the “Browser” tab of the Head plugin (http://localhost:9200/_plugin/head/).

Thanks to the listener, each time Doctrine performs an insert, update or delete, the corresponding document is updated in the Elasticsearch index.

Create a Symfony search object

We are going to create a search object that will back the search form.

This object will contain various properties that map to our filter, sort and pagination criteria. Here is what it might look like:

<?php

namespace Obtao\BlogBundle\Model;

use Symfony\Component\HttpFoundation\Request;

class ArticleSearch
{
    // begin of publication range
    protected $dateFrom;

    // end of publication range
    protected $dateTo;

    // published or not
    protected $isPublished;

    protected $title;

    public function __construct()
    {
        // initialise dateFrom to "one month ago" and dateTo to "today"
        $date = new \DateTime();
        $month = new \DateInterval('P1M');
        $date->sub($month);
        $date->setTime(0, 0, 0);

        $this->dateFrom = $date;
        $this->dateTo = new \DateTime();
        $this->dateTo->setTime(23, 59, 59);
    }

    public function setDateFrom($dateFrom)
    {
        // ignore empty values coming from the form
        if ($dateFrom instanceof \DateTime) {
            $dateFrom->setTime(0, 0, 0);
            $this->dateFrom = $dateFrom;
        }

        return $this;
    }

    public function getDateFrom()
    {
        return $this->dateFrom;
    }

    public function setDateTo($dateTo)
    {
        // ignore empty values coming from the form
        if ($dateTo instanceof \DateTime) {
            $dateTo->setTime(23, 59, 59);
            $this->dateTo = $dateTo;
        }

        return $this;
    }

    public function clearDates(){
        $this->dateTo = null;
        $this->dateFrom = null;
    }

    public function getDateTo()
    {
        return $this->dateTo;
    }

    public function getIsPublished()
    {
        return $this->isPublished;
    }

    public function setIsPublished($isPublished)
    {
        $this->isPublished = $isPublished;

        return $this;
    }

    public function getTitle()
    {
        return $this->title;
    }

    public function setTitle($title)
    {
        $this->title = $title;

        return $this;
    }
}

In this object, we have defined our search criteria. The goal is not necessarily to expose every field of the entity as is. For example, we don’t want to allow searching on the “content” property, which is why the ArticleSearch object has no “content” property. Conversely, the “publishedAt” property of the Article object becomes “dateFrom” and “dateTo” in the ArticleSearch object, as we want to find the articles published between two dates. We have also defined an “isPublished” property to retrieve only the published or only the unpublished articles. We could have added two properties, “createdFrom” and “createdTo”, to find all the articles created between two dates.

In fact, the date (“dateFrom”/“dateTo”) and “isPublished” filters are not really compatible: specifying a date range implies that we only search published articles (as we filter on the publication date). So specifying a date range and asking for unpublished articles would never return any result. It’s not a great example, but our goal is not to build the app of the century, only to show what you can do with Elasticsearch and Symfony.
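
One way to cope with this incompatibility is to ignore the date range whenever unpublished articles are requested. The sketch below illustrates that rule in plain PHP; the `activeFilters()` helper is hypothetical, and it assumes the published criterion arrives as the raw form value (null, 'true' or 'false').

```php
<?php

// Hypothetical helper: list which filters should be applied for a given set of criteria.
// $isPublished is the raw form value: null (no choice), 'true' or 'false'.
function activeFilters($isPublished, $dateFrom, $dateTo)
{
    $filters = array();

    // A date range only makes sense for published articles,
    // because unpublished ones have no publication date.
    if ('false' !== $isPublished && null !== $dateFrom && null !== $dateTo) {
        $filters[] = 'publishedAt range';
    }

    // The published filter applies as soon as a choice has been made.
    if (null !== $isPublished) {
        $filters[] = 'published = ' . $isPublished;
    }

    return $filters;
}

$from = new \DateTime('2014-01-01 00:00:00');
$to = new \DateTime('2014-01-31 23:59:59');

print_r(activeFilters('true', $from, $to));
print_r(activeFilters('false', $from, $to));
print_r(activeFilters(null, null, null));
```

With this rule, asking for unpublished articles still returns results even when the form carries its default date range.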

You can obviously add other properties depending on your needs and wishes.

Create the associated search form

This object will be bound to a form that lets us choose our criteria. Here is the form (classic but efficient, as you can see):

<?php

namespace Obtao\BlogBundle\Form\Type;

use Obtao\BlogBundle\Model\ArticleSearch;
use Symfony\Component\Form\AbstractType;
use Symfony\Component\Form\FormBuilderInterface;
use Symfony\Component\OptionsResolver\OptionsResolverInterface;

class ArticleSearchType extends AbstractType
{
    public function buildForm(FormBuilderInterface $builder, array $options)
    {
        $builder
            ->add('title',null,array(
                'required' => false,
            ))
            ->add('dateFrom', 'date', array(
                'required' => false,
                'widget' => 'single_text',
            ))
            ->add('dateTo', 'date', array(
                'required' => false,
                'widget' => 'single_text',
            ))
            ->add('isPublished','choice', array(
                'choices' => array('false'=>'no','true'=>'yes'),
                'required' => false,
            ))
            ->add('search','submit')
        ;
    }

    public function setDefaultOptions(OptionsResolverInterface $resolver)
    {
        parent::setDefaultOptions($resolver);
        $resolver->setDefaults(array(
            // avoids passing the CSRF token in the URL (but the form is no longer CSRF-protected)
            'csrf_protection' => false,
            'data_class' => 'Obtao\BlogBundle\Model\ArticleSearch'
        ));
    }

    public function getName()
    {
        return 'article_search_type';
    }
}

Search with Elasticsearch

Now you can implement your search function. The principle is simple: in a controller, you instantiate the search form and, if it is submitted, you call the search(ArticleSearch $articleSearch) method of the ArticleRepository class. To keep the code well organized and do things the right way, everything related to building search queries should live in a dedicated class.
Remember, we have compared a search in Elasticsearch with a query in a database. You would never create a Doctrine query in a controller, right? Well, it’s the same situation here (those who answered “yes” can run naked in the nettles for 20 minutes).

Here is the controller (simplified) :

<?php

namespace Obtao\BlogBundle\Controller;

use Obtao\BlogBundle\Form\Type\ArticleSearchType;
use Obtao\BlogBundle\Model\ArticleSearch;
use Symfony\Bundle\FrameworkBundle\Controller\Controller;
use Symfony\Component\HttpFoundation\Request;

class ArticleController extends Controller
{

    public function listAction(Request $request)
    {
        $articleSearch = new ArticleSearch();

        $articleSearchForm = $this->get('form.factory')
            ->createNamed(
                '',
                'article_search_type',
                $articleSearch,
                array(
                    'action' => $this->generateUrl('obtao-article-search'),
                    'method' => 'GET'
                )
            );
        $articleSearchForm->handleRequest($request);
        $articleSearch = $articleSearchForm->getData();
        
        $elasticaManager = $this->container->get('fos_elastica.manager');
        $results = $elasticaManager->getRepository('ObtaoBlogBundle:Article')->search($articleSearch);

        return $this->render('ObtaoBlogBundle:Article:list.html.twig',array(
            'results' => $results,
            'articleSearchForm' => $articleSearchForm->createView(),
        ));
    }
}

This should work. Finally, here is the search method that builds the Elasticsearch query. If you are reading this article, it is probably what you came for, so thank you for sticking around.

<?php

namespace Obtao\BlogBundle\Entity\SearchRepository;

use FOS\ElasticaBundle\Repository;
use Obtao\BlogBundle\Model\ArticleSearch;

class ArticleRepository extends Repository
{
    public function search(ArticleSearch $articleSearch)
    {
        // by default, the query returns all the articles;
        // if a title criterion is given, we match on it instead
        if ($articleSearch->getTitle() != null && $articleSearch->getTitle() != '') {
            $query = new \Elastica\Query\Match();
            $query->setFieldQuery('article.title', $articleSearch->getTitle());
            $query->setFieldFuzziness('article.title', 0.7);
            $query->setFieldMinimumShouldMatch('article.title', '80%');
        } else {
            $query = new \Elastica\Query\MatchAll();
        }
        $baseQuery = $query;

        // then we create filters depending on the chosen criteria
        $boolFilter = new \Elastica\Filter\Bool();

        /*
            Dates filter
            We add this filter only if the isPublished criterion is not "false"
        */
        if("false" != $articleSearch->getIsPublished()
           && null !== $articleSearch->getDateFrom()
           && null !== $articleSearch->getDateTo())
        {
            $boolFilter->addMust(new \Elastica\Filter\Range('publishedAt',
                array(
                    'gte' => \Elastica\Util::convertDate($articleSearch->getDateFrom()->getTimestamp()),
                    'lte' => \Elastica\Util::convertDate($articleSearch->getDateTo()->getTimestamp())
                )
            ));
        }

        // Published or not filter
        if($articleSearch->getIsPublished() !== null){
            $boolFilter->addMust(
                new \Elastica\Filter\Terms('published', array($articleSearch->getIsPublished()))
            );
        }

        $filtered = new \Elastica\Query\Filtered($baseQuery, $boolFilter);

        $query = \Elastica\Query::create($filtered);

        return $this->find($query);
    }

}
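
For readers who want to see what ends up on the wire, the sketch below builds, with plain PHP arrays, the kind of request body this repository method is expected to produce for Elasticsearch 1.x (the `filtered` query was removed in later Elasticsearch versions). The exact JSON emitted by Elastica may differ in detail; the `buildSearchBody()` function, the date format and the sample values are assumptions for illustration only.

```php
<?php

// Illustrative only: approximate the JSON body of the "filtered" query built above.
function buildSearchBody($title, $isPublished, $dateFrom, $dateTo)
{
    // Query part: fuzzy match on the title, or match_all when no title is given.
    if (null !== $title && '' !== $title) {
        $query = array('match' => array('article.title' => array(
            'query' => $title,
            'fuzziness' => 0.7,
            'minimum_should_match' => '80%',
        )));
    } else {
        // stdClass is encoded as an empty JSON object: {"match_all":{}}
        $query = array('match_all' => new \stdClass());
    }

    // Filter part: every clause listed in "must" has to match.
    $must = array();
    if ('false' !== $isPublished && null !== $dateFrom && null !== $dateTo) {
        $must[] = array('range' => array('publishedAt' => array(
            'gte' => $dateFrom->format('Y-m-d\TH:i:s\Z'),
            'lte' => $dateTo->format('Y-m-d\TH:i:s\Z'),
        )));
    }
    if (null !== $isPublished) {
        $must[] = array('terms' => array('published' => array($isPublished)));
    }

    return array('query' => array('filtered' => array(
        'query' => $query,
        'filter' => array('bool' => array('must' => $must)),
    )));
}

// Sample criteria (hypothetical values).
$utc = new \DateTimeZone('UTC');
$body = buildSearchBody(
    'symfony',
    'true',
    new \DateTime('2014-01-01 00:00:00', $utc),
    new \DateTime('2014-01-31 23:59:59', $utc)
);

echo json_encode($body), "\n";
```

Comparing this output with the request logged by your own Elasticsearch instance is a quick way to check that the filters are combined as intended.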

That’s it. I’ll spare you the template, which is of no interest for this article (you can find it here). You now have a list of articles with a small search form that allows you to filter the results.

In the next article, we’ll see how to paginate this list and add sort criteria (in addition to our filters).

To learn more about Elasticsearch in a Symfony project (and beyond), read our other articles on the topic.

12 thoughts on “Indexing and simple search with Elasticsearch and Symfony”

  1. In your file app/config/fos_elastica.yml at the line 5 you configure your host and port
    I want to do the same thing, but connecting to a cluster.
    I have tried to use like this:
    default:
    – { host: localhost, port: 9200 }
    – { host: localhost, port: 9201 }
    – { host: localhost, port: 9202 }

    But it does not work
    Do you know what should I do?
    Thanks in advance

    • Hi,

      What error do you have?

      An Elasticsearch cluster is a group of nodes handled by a master node. Your application can contact this master node.
      By default, the first node to start becomes the master node. When other nodes start up and look for a cluster to join, a master already exists, so they become plain “data” nodes.

      If you have the head plugin on your server, take a look at the homepage. You will see your cluster and the attached nodes.
      In versions >1.0.0, the master node is marked by a star before its name (a dot for other nodes).
      In versions <1.0.0, the master node was marked in orange (white for the others).

      A first try would be to connect only to the master and see if your application can reach it.
      In the head plugin screen, you can get more info about a node; its http_address value contains the address and port to use.

      Hope this will help,
      François

      • Hi Francois,
        Thanks for the quick answer
        I think there is something wrong with your concept about cluster connection.
        As you can see here: http://elastica.io/getting-started/installation.html#section-connect-cluster, the Elastica library itself shows how to configure a connection to a cluster: we need to pass all the nodes we want to connect to.

        The Ruby and Perl libraries also work the same way, and the reason is high availability: if one of the nodes goes down, the client still has other nodes to connect to and the application keeps working.

        That’s the reason of my question.

        • You are right! Master nodes are there to handle shards allocation.
          I wanted to know if you can connect to one of your nodes (your master node).

          Was your error raised while reading the YAML config file?
          If you succeed in connecting to one of your nodes, please try this configuration:

          fos_elastica:
              clients:
                  default:
                      servers:
                          - { host: localhost, port: 9200 }
                          - { host: localhost, port: 9201 }
                          - { host: localhost, port: 9202 }

          If the first node is down (9200), you will see 2 requests :
          1 to 9200 => returns an error
          1 to 9201 => returns your response

          I don’t know whether a “round robin” algorithm is used here, but your use case is covered (if one of your nodes is down, there is a spare one).

          François

  2. When I run the command fos:elastica:populate, I get this error: Fatal error: Class ‘Obtao\BlogBundle\Entity\Repository\ArticleRepository’ not found in C:\NetBeansProjects\elasticsearch\vendor\doctrine\orm\lib\Doctrine\ORM\Repository\DefaultRepositoryFactory.php on line 75.
    Could you please help me figure this out?

      • Hi !
        Are you using the blog sandbox from GitHub, or creating a new application?

        If you are not using the sandbox, just remove this line
        * @ORM\Entity(repositoryClass="Obtao\BlogBundle\Entity\Repository\ArticleRepository")

        from the Article entity.

    • @Gamo Nana
      The issue has been fixed.
      Steps as below:
      1. Create file ..\Acme\DemoBundle\Resources\config\doctrine\Article.orm.yml
      with content:
      # src/AppBundle/Resources/config/doctrine/Product.orm.yml
      Acme\DemoBundle\Entity\Article:
          type: entity
          table: article
          id:
              id:
                  type: integer
                  generator: { strategy: AUTO }
          fields:
              title:
                  type: string
                  length: 100
              content:
                  type: string
                  length: 100
              createdAt:
                  type: datetime
                  length: 100
              publishedAt:
                  type: datetime
                  length: 100

      2. Add getter functions to Article.php

      public function getId()
      {
      return $this->id;
      }

      /**
      * Get id
      *
      * @return integer
      */
      public function getTitle()
      {
      return $this->title;
      }

      /**
      * Get content
      *
      * @return integer
      */
      public function getContent()
      {
      return $this->content;
      }

      /**
      * Get id
      *
      * @return integer
      */
      public function getCreatedAt()
      {
      return $this->createdAt;
      }

      /**
      * Get id
      *
      * @return integer
      */
      public function getPublishedAt()
      {
      return $this->publishedAt;
      }

      /**
      * @ORM\PrePersist
      */
      public function prePersist()
      {
      $this->createdAt = new \DateTime();
      }

      public function isPublished()
      {
      return (null !== $this->getPublishedAt());
      }

      // others getters and setters

      }

      I tried it and it worked for me.

      Thanks

  3. Hi,
    Great article series. Are you planning on writing something about “Organize your code in the right way”? I would be very interested in your thoughts on that.

    Thanks, Hannes

  4. I can’t seem to get past this error in the listAction function of the controller: Attempted to call an undefined method named “search” of class “FOS\ElasticaBundle\Repository”.
