Tag : duplicate-document
While indexing documents in Elasticsearch, I ran into a duplicate-record issue. It is mainly due to the schema-free nature of Elasticsearch. Every indexed document is associated with an ID and a type. If we do not specify an _id value, Elasticsearch generates a unique ID automatically for each request, so indexing the same content twice creates two separate documents.
e.g.
curl -XPOST 'http://localhost:9200/database/user' -d '{ "user_login": "appa", "name": "Appasaheb Sawant", "postDate": "2013-03-11", "body": "I am a Sr. Software Engineer.", "email": "appasaheb.sawant@gmail.com" }'
curl -XPOST 'http://localhost:9200/database/user' -d '{ "user_login": "appa", "name": "Appasaheb Sawant", "postDate": "2013-03-11", "body": "I am a Sr. Software Engineer.", "email": "appasaheb.sawant@gmail.com" }'
These commands insert two records with identical content, which in this case is a duplicate. We can easily solve this by giving each document an explicit, unique _id.
e.g.
curl -XPOST 'http://localhost:9200/database/user/1' -d '{ "user_login": "appa", "name": "Appasaheb Sawant", "postDate": "2013-03-11", "body": "I am a Sr. Software Engineer.", "email": "appasaheb.sawant@gmail.com" }'
curl -XPOST 'http://localhost:9200/database/user/1' -d '{ "user_login": "appa", "name": "Appasaheb Sawant", "postDate": "2013-03-11", "body": "I am technical lead.", "email": "appasaheb.sawant@gmail.com" }'
The commands above leave only one document in the index: the second command overwrites the document created by the first. Fetching the record now shows the body as “I am technical lead.”
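Hard-coding the ID as 1 works for a demo, but in practice the _id usually has to come from the data itself. The following is only a rough sketch of one way to do that, not the only approach: it hashes a field assumed to be unique per record (the email address here) to get a stable ID. The helper names doc_id, doc_url and index_user are hypothetical, and the sketch assumes md5sum from GNU coreutils is installed.

```shell
#!/bin/sh
# Hypothetical helper: derive a stable document ID from a value that is
# assumed to be unique per record (here, the email address).
# Assumes md5sum (GNU coreutils) is on the PATH.
doc_id() {
    printf '%s' "$1" | md5sum | cut -d ' ' -f 1
}

# Build the document URL. The host and the database/user path are taken
# from the curl examples above.
doc_url() {
    printf 'http://localhost:9200/database/user/%s\n' "$(doc_id "$1")"
}

# Index a record under its derived ID. Calling this twice with the same
# email targets the same URL, so the second call overwrites the first
# document instead of creating a duplicate.
index_user() {
    email=$1
    body=$2
    curl -XPOST "$(doc_url "$email")" -d "$body"
}
```

With these functions loaded, calling index_user "appasaheb.sawant@gmail.com" followed by the JSON body from the examples above should always write to the same document, no matter how many times it is run. Any field works as the key as long as it really is unique per record.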
Categories: Elasticsearch, Uncategorized