Duplicate record problem in Elasticsearch

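While indexing documents in Elasticsearch, I ran into a duplicate record problem. Elasticsearch does not enforce uniqueness on any field inside the document; each indexed document is identified only by its ID and type. If we do not specify an _id value, Elasticsearch generates a new unique ID automatically for every request (it is not a hash of the document's content), so indexing the same content twice creates two separate documents.
For example: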

curl -XPOST 'http://localhost:9200/database/user' -d '
{
"user_login": "appa",
"name": "Appasaheb Sawant",
"postDate": "2013-03-11",
"body": "I am a Sr. Software Engineer." ,
"email": "appasaheb.sawant@gmail.com"
}'

curl -XPOST 'http://localhost:9200/database/user' -d '
{
"user_login": "appa",
"name": "Appasaheb Sawant",
"postDate": "2013-03-11",
"body": "I am a Sr. Software Engineer." ,
"email": "appasaheb.sawant@gmail.com"
}'

These two commands insert two records with identical content, that is, a duplicate. This is easy to solve: we just need to specify a unique _id ourselves.
For example:

curl -XPOST 'http://localhost:9200/database/user/1' -d '
{
"user_login": "appa",
"name": "Appasaheb Sawant",
"postDate": "2013-03-11",
"body": "I am a Sr. Software Engineer." ,
"email": "appasaheb.sawant@gmail.com"
}'

curl -XPOST 'http://localhost:9200/database/user/1' -d '
{
"user_login": "appa",
"name": "Appasaheb Sawant",
"postDate": "2013-03-11",
"body": "I am technical lead." ,
"email": "appasaheb.sawant@gmail.com"
}'

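The commands above index only one document: because both use the same ID, the second command overwrites the document created by the first and increments its version. Fetching the document with ID 1 now returns the record with body “I am technical lead.”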
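
In practice the ID is usually not hard-coded but derived from a field that is already unique for each record. Below is a minimal sketch, assuming the email address is that unique field. The doc_id helper and the user.json file are hypothetical names used only for illustration, and sha1sum from GNU coreutils is assumed to be installed. The script prints the request instead of sending it, so it runs without a cluster.

```shell
#!/bin/sh
# doc_id (hypothetical helper): turn a unique field value into a stable ID.
# The same input always gives the same 40-character hex string.
doc_id() {
  printf '%s' "$1" | sha1sum | cut -d' ' -f1
}

email='appasaheb.sawant@gmail.com'
id=$(doc_id "$email")

# Print the index request for this ID; user.json is an assumed file
# holding the document body.
echo "curl -XPOST 'http://localhost:9200/database/user/$id' -d @user.json"
```

Because the ID depends only on the email, indexing the same user again always targets the same document instead of creating a new one.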

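If silently overwriting an existing document is not wanted, Elasticsearch can also be asked to reject the second request. Below is a minimal sketch, assuming the index API's op_type=create parameter and the same URL layout as in the examples above. create_url and index_once are hypothetical helper names, and newer Elasticsearch releases additionally expect a Content-Type: application/json header on the request.

```shell
#!/bin/sh
# create_url (hypothetical helper): build a create-only index URL.
# $1 = base URL of the index and type, $2 = document ID
create_url() {
  printf '%s/%s?op_type=create' "$1" "$2"
}

# index_once (hypothetical helper): index the document only if no
# document with this ID exists yet. $1 = document ID, $2 = JSON body.
# Only defined here, not called, because it needs a running cluster.
index_once() {
  curl -s -XPUT "$(create_url 'http://localhost:9200/database/user' "$1")" -d "$2"
}

# Example call (assumes Elasticsearch is listening on localhost:9200):
#   index_once 1 '{"user_login": "appa"}'
```

With op_type=create, a second request for an ID that already exists fails with a version conflict error instead of replacing the stored document.
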