Ask To Apps

Duplicate record problem in Elasticsearch

02/03/2014 Article

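While indexing documents in Elasticsearch, I ran into a duplicate record issue. Elasticsearch is schema-free and enforces no uniqueness constraint on field values; the only unique key of a document is its _id. Every indexed document is associated with an _id and a type. If we do not specify an _id value, Elasticsearch generates a new unique one for every request (it is not a hash of the document content), so posting the same document twice creates two separate documents.
e.g.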

curl -XPOST 'http://localhost:9200/database/user' -d '
{
"user_login": "appa",
"name": "Appasaheb Sawant",
"postDate": "2013-03-11",
"body": "I am a Sr. Software Engineer.",
"email": "appasaheb.sawant@gmail.com"
}'

curl -XPOST 'http://localhost:9200/database/user' -d '
{
"user_login": "appa",
"name": "Appasaheb Sawant",
"postDate": "2013-03-11",
"body": "I am a Sr. Software Engineer.",
"email": "appasaheb.sawant@gmail.com"
}'

Both commands succeed and insert a record each, so the index now holds two identical records, i.e. a duplicate. This is easy to solve: we just need to specify a unique _id for each document.
e.g.

curl -XPOST 'http://localhost:9200/database/user/1' -d '
{
"user_login": "appa",
"name": "Appasaheb Sawant",
"postDate": "2013-03-11",
"body": "I am a Sr. Software Engineer.",
"email": "appasaheb.sawant@gmail.com"
}'

curl -XPOST 'http://localhost:9200/database/user/1' -d '
{
"user_login": "appa",
"name": "Appasaheb Sawant",
"postDate": "2013-03-11",
"body": "I am technical lead.",
"email": "appasaheb.sawant@gmail.com"
}'

The commands above index only one document: the second command overwrites the document created by the first one and increments its _version. Fetching the record now shows the body as “I am technical lead.”
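
In practice a hard-coded _id such as 1 is rarely available, so one option is to derive the _id from a field that is already unique for the record, for example the email address. Below is a minimal shell sketch of that idea, not a definitive implementation. It assumes sha1sum (GNU coreutils) is installed; doc_id and doc_url are hypothetical helper names introduced only for illustration, and the host, index and type are taken from the examples above.

```shell
#!/bin/sh
# Hypothetical helper: derive a stable document ID from a natural key
# (here the e-mail address), so the same user always maps to the same _id.
doc_id() {
    # Lowercase the key, hash it, and keep only the hex digest
    # (sha1sum prints "<digest>  -" when reading from stdin).
    printf '%s' "$1" | tr '[:upper:]' '[:lower:]' | sha1sum | cut -d' ' -f1
}

# Hypothetical helper: build the index URL for a user record.
# Host, index name and type are assumptions copied from the examples above.
doc_url() {
    printf 'http://localhost:9200/database/user/%s\n' "$(doc_id "$1")"
}

# Usage (needs a running Elasticsearch node, so it is left commented out):
# curl -XPOST "$(doc_url 'appasaheb.sawant@gmail.com')" -d '
# {
# "user_login": "appa",
# "email": "appasaheb.sawant@gmail.com"
# }'
```

Because the URL depends only on the email address, indexing the same user again targets the same _id and overwrites the existing document instead of adding a duplicate. Lowercasing the key first is a design choice so that differently cased spellings of one address map to the same record.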

Categories: Elasticsearch, Uncategorized

Tags: Duplicate document, Elasticsearch
