Elasticsearch

Elasticsearch: Search Camp

Nov 17 2016

A short course in using Elasticsearch 2.x.

Outline

Overview

The Elasticsearch documentation is quite good, with plenty of examples of how to write queries. Check out the guide. For this tutorial, we’ll be using Kibana. Please install ES 2.4.x and Kibana 4.6.x – see the product compatibility guide for more details.

  • Guide – https://www.elastic.co/guide/en/elasticsearch/reference/current/setup.html
  • Kibana Sense app – http://localhost:5601/app/sense
  • Versions – Kibana 4.6 / Elasticsearch 2.4 / Apache Lucene 5.5

REST APIs

ES provides a REST API for applications to use, with the standard semantics associated with the GET, PUT, POST, and DELETE methods. Most operations support a standard set of options and use JSON, along with some convenience value types:

  • JSON types: String, Boolean, Number, …
  • Time types: 10y := 10 years; date math is also supported, e.g. now-1d
  • Distance types: 25mi := 25 miles
  • Size types: 25mb := 25 megabytes

As expected, these REST APIs make use of the standard response codes 200, 201, 404, and 500 to represent operation success, resource created, resource not found, and server error.
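
For example, with a local node you can watch these codes with curl -i (my_index and my_type are illustrative names):

# create a new document – returns 201 Created
curl -i -XPUT "localhost:9200/my_index/my_type/1" -d '{"message":"hello"}'

# fetch it back – returns 200 OK
curl -i -XGET "localhost:9200/my_index/my_type/1"

# fetch a document that does not exist – returns 404 Not Found
curl -i -XGET "localhost:9200/my_index/my_type/999"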

The APIs fall into:

  • Document store APIs, i.e. APIs that provide CRUD operations for storing documents in Elasticsearch.
  • Search APIs for finding relevant documents.
  • Bulk APIs for performing many operations in a single request.
  • Other APIs, including aggregation support, completion/suggestions, and management APIs.

Data Modeling Basics

Elasticsearch provides a NoSQL document store similar to MongoDB. Documents are uploaded to this store and indexed for subsequent querying. The ES document store does not support transactions; however, a single request’s modification of an ES document is atomic.

While modifications to documents in the document store happen effectively synchronously, indexing is an asynchronous process, and it may take some time before a modified document appears in search results.

Document

An Elasticsearch document is:

  • A JSON document
  • A flat data structure
  • Key/value pairs
  • Values with supported data types
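
For example, a minimal illustrative document:

{
    "account_number": 1,
    "firstname": "Amber",
    "lastname": "Duke",
    "balance": 39225,
    "state": "IL"
}

Each key maps to a value of a supported type: strings, numbers, booleans, dates, or arrays of these.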

CRUD – Lab

Direct installation

# Install Elasticsearch 2.4 and start it.
# Install Kibana 4.6 and run it.
# Download the sample account data from ES, extract the
# accounts.json file, and place it in your camp working folder.
# Download and save elasticsearchCampLoadData.sh to the
# same folder as the accounts.json data file. Then in a console:
$ bash ./elasticsearchCampLoadData.sh
# Finally go to http://localhost:5601/app/sense and
# play with the examples here.

Docker

Instead of installing directly, you might want to use Docker instead:

docker pull elasticsearch:2.4.6
docker pull kibana:4.6
docker run -d --name my-es -v "$PWD/esdata:/usr/share/elasticsearch/data" -p 9200:9200 -p 9300:9300 elasticsearch:2.4.6
docker run -d --name my-kibana -v "$PWD/plugins:/plugins" -e ELASTICSEARCH_URL=http://$(hostname):9200 -p 5601:5601 kibana:4.6 --plugins /plugins
docker exec my-kibana kibana plugin --install elastic/sense
docker stop my-kibana
docker start my-kibana
# open browser to http://localhost:5601/app/sense
# download the sample account data from ES and extract the
# accounts.json file and place it in your camp working folder.
# then in the same folder, run curl as follows:
curl --fail --silent --show-error -XPOST "localhost:9200/bank/account/_bulk" --data-binary "@accounts.json" >/dev/null

# WARNING! You'll need to change the server URL in the Sense app
# console to be http://{hostname}:9200 instead of http://localhost:9200

Docker-Compose

# install docker compose
# download the elasticsearchCamp-docker-compose.yml and save it to a directory and rename it to docker-compose.yml
# execute the following:
docker-compose up -d
docker-compose run kibana kibana plugin --install elastic/sense
docker-compose restart

# WARNING! You'll need to change the server URL in the Sense app
# console to be http://elasticsearch:9200 instead of http://localhost:9200

Additional Data

In addition to the accounts we’ve preloaded, let’s add some extra data:

POST bank/account/_bulk
{ "index" : { "_id" : "2001" } }
{"account_number":2001,"balance":38395,"firstname":"Gay","lastname":"Case","age":57,"gender":"F","address":"1343 Loop Mountain Avenue","employer":"University of Missouri","email":"gay@um.com","city":"Columbia","state":"MO"}
{ "index" : { "_id" : "2002" } }
{"account_number":2002,"balance":405,"firstname":"Marcie Gay","lastname":"Masterston","age":30,"gender":"M","address":"443 Marshal Fieldington Avenue","employer":"Packbell", "email":"marcie.gay.masterston@pacbell.com","city":"Colinas","state":"CA"}
{ "index" : { "_id" : "2003" } }
{"account_number":2003,"balance":72370,"firstname":"Gay","lastname":"Reddy","age":43,"gender":"M","address":"150 Oakbridge Avenue","employer":"University of California","email":"gay.reddy@ucsf.com","city":"Palo Alto","state":"CA"}
{ "index" : { "_id" : "2004" } }
{"account_number":2004,"balance":15270,"firstname":"Mike","lastname":"Martin","age":42,"gender":"M","address":"22157 Mighty bridge Avenue","employer":"University of California","email":"mike.martin@ucsf.com","city":"Palo Alto","state":"CA"}

Document Types

Keep the following in mind when defining ES document types within an index:

  • Property definitions must be consistent within a single index.
  • All types in one index are stored in a single Lucene schema.
  • This implies keeping a single type (or only related data) in one index.
  • Document property types have either explicitly defined or automatically assigned mapping types:
  • JSON types: String, Numeric, Boolean, Arrays
  • Object Types: Object, Nested, Parent/Child
  • Geo: Point & Shape
  • Completion
  • Other: IPv4, TokenCount, …

Predefined properties for all Documents

Property Name  Default Behavior              Description
_id            mandatory                     The document id
_index         mandatory                     The index name
_type          mandatory                     The document type
_version       mandatory                     The version number, auto-incremented upon each modification
_source        optional, enabled by default  If enabled, the original JSON document that was indexed is stored
_all           optional, enabled by default  If enabled, all the values in the JSON document are made available in a special _all field
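
For example, fetching one of the accounts added above returns the document wrapped in its metadata (the values shown here are illustrative):

GET bank/account/2001

# returns something like:
{
    "_index": "bank",
    "_type": "account",
    "_id": "2001",
    "_version": 1,
    "found": true,
    "_source": { "account_number": 2001, "firstname": "Gay", ... }
}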

Index and CRUD Operations

Index, Re-Index

You use PUT to both create and modify a document in the document store. In both cases you must supply the document id. If you prefer to have ES assign the document id during creation, you can use POST instead of PUT to create the document.

PUT my_index/my_type/1
{
    "message": "some arrays in this document...",
    "tags": [ "elasticsearch", "wow" ],
    "lists": [
        {
            "name": "prog_list",
            "description": "programming list"
        },
        {
            "name": "cool_list",
            "description": "cool stuff list"
        }
        ...
    ]
}

NOTE: You can use POST if you want ES to allocate the id.
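
For example (index and type names are illustrative):

POST my_index/my_type
{
    "message": "ES will generate an _id for this document"
}

The response echoes the generated _id.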

GET, PUT, DELETE
  ${METHOD} ${IndexName}/${DocType}/${id}
  ${JSON-Document}

e.g.

curl -X${METHOD} "http://localhost:9200/${IndexName}/${DocType}/${id}" -d "${JSON-Document}"

Other examples:

    GET my_index/my_type/1

DELETE (status 200 if deleted, 404 if not found)

    DELETE my_index/my_type/1

Multiple Searches

Elasticsearch supports passing multiple search requests in one HTTP request. See multi-search.

POST _msearch
{"index" : "bank"}
{"query" : {"match_all" : {}}, "from" : 0, "size" : 10}
{"index" : "bank", "search_type" : "dfs_query_then_fetch"}
{"query" : {"match_all" : {}}}
{}
{"query" : {"match_all" : {}}}

This is useful for performing two queries in one round trip, say one to get the normal results and another to get ancillary results.

Versioning

Elasticsearch supports external versioning – that is, passing the version number in separately from the document itself. e.g.

# create the doc
PUT twitter/tweet/1
{
    "message" : "elasticsearch now has versioning support, double cool!"
}

# now change it supplying the wrong version:

PUT twitter/tweet/1?version=2
{
    "message" : "elasticsearch now has versioning support, double cool!"
}

The value provided must be a numeric long value greater than or equal to 0 and less than around 9.2e+18 (the maximum signed 64-bit integer). More details can be found in the versioning section of the index API docs.

Miscellaneous

  • Use _create to implement “create if not exists” semantics (see the sketch after this list).
  • Use a routing parameter to collocate objects in the same shard. This is required for parent/child documents: all child documents must be collocated in the same shard as their parent.
  • A GET of the document itself is available as soon as the initial PUT completes, but the document takes a bit longer to show up in search. That is, analyzing the document and updating the indices for full text search is done asynchronously.
  • You can control what a GET returns using the _source parameter.
  • You can use scripts when updating docs.
    Warning: prefer aggregations and the query DSL over scripts where possible.
  • Use _update when the document must already exist; the operation fails otherwise.
  • There is also an MGET API for fetching multiple documents in one request.
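
A minimal sketch of _create and _update (index, type, and field names are illustrative):

# fails with 409 Conflict if document 1 already exists
PUT my_index/my_type/1/_create
{
    "message": "only created, never overwritten"
}

# partial update – fails with 404 if document 1 does not exist
POST my_index/my_type/1/_update
{
    "doc": { "message": "an updated message" }
}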

Searching

Elasticsearch is all about full text searching. To allow searching, ES indexes each JSON document, which means creating several inverted indexes for all the documents in a shard. An inverted index maps each “term” to a list of tuples:

(doc id, field, num-times, array of (term-position, start-char-pos, end-char-pos, …))

There is also a distributed data structure containing: (term, num-docs having the term)

When performing a search, a “query” (free format text) is “analyzed” for “terms”. That is, ES takes a string and determines zero or more terms to look for in its Inverted Indices. ES uses one or more Inverted Indices to get a list of documents.

Example

1. The quick brown fox jumped over the lazy dog
2. Quick brown foxes leap over lazy dogs in summer

Term    Doc 1   Doc 2
brown   x       x
quick   x

We’d like to treat Quick the same as quick and perhaps treat foxes the same as fox. To do so, we need to change the way ES “analyzes” the document and the query. ES uses analyzers to split text into Terms. An analyzer consists of:

  • Some character filter(s) – throw away unwanted characters (e.g. html_strip)
  • A tokenizer – splits the string into separate tokens (tricky for some Asian languages)
  • Some token filter(s) – modify the token stream: removing tokens (stop words), stemming, augmenting (synonyms, ngrams), lowercasing, …

Analyzers – Lab

Go to http://localhost:5601/app/sense

GET /_analyze
{
    "analyzer" : "standard",
    "text" : "The quick brown fox jumps over the lazy dogs."
}

Try these analyzers as well and see how they break down the text:

  • “simple”
  • “standard”
  • “whitespace”
  • “english”

Analyzers – Lab

Create a custom analyzer

PUT another_index
{
    "settings" : {
        "analysis" : {
            "analyzer" : {
                "camp_analyzer" : {
                    "tokenizer" : "whitespace",
                    "filter" : [ "lowercase", "stop", "snowball" ]
                }
            }
        }
    }
}

With our new analyzer, we can see how it parses our text:

GET another_index/_analyze
{
    "analyzer": "camp_analyzer",
    "text": "The quick brown fox jumps over the lazy dogs."
}

One thing to note about token filters: they can add multiple tokens, each having the same position. E.g. the synonym filter injects synonyms with the same term positions as the original. This is written like:

leaping => (leap, jump, leaping)
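
For example, a sketch of a custom synonym filter wired into an analyzer (the index name, filter name, and synonym list are all illustrative):

PUT synonym_index
{
    "settings": {
        "analysis": {
            "filter": {
                "camp_synonyms": {
                    "type": "synonym",
                    "synonyms": [ "leaping, leap, jump" ]
                }
            },
            "analyzer": {
                "synonym_analyzer": {
                    "tokenizer": "whitespace",
                    "filter": [ "lowercase", "camp_synonyms" ]
                }
            }
        }
    }
}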

Common Character Filters

Name             Description
mapping          Translates characters to others via a provided set of mappings
html_strip       Removes HTML
pattern_replace  Match/replace using regular expressions

Common Tokenizers

Name        Description
standard    Grammar-based tokenizer; works well for European-language documents (implements Unicode Standard Annex #29)
nGram       Creates tokens from substrings of the original token
edgeNGram   Like nGram, but anchored to the start of the token
keyword     Emits the whole input unchanged as a single token
whitespace  Breaks at whitespace
pattern     Splits using a regular expression
custom      Use a custom Java class to implement

Common Token Filters

Name          Description
standard      Does nothing (a no-op placeholder)
asciifolding  Converts non-ASCII characters to their ASCII equivalents where possible
lowercase     Converts tokens to lower case
stop          Removes stop words
snowball      Algorithmic stemmer
phonetic      Phonetic encoding (e.g. Soundex, Metaphone); provided by a plugin

Often, we want text analyzed in multiple ways to facilitate matching and relevance scoring. After the terms are calculated, ES scores docs according to how well they match. These are ranking models:

  • Classic similarity (default for 2.x)
  • BM25 similarity (default for 5.x)

Both the classic and BM25 similarity algorithms are based on TF/IDF (term frequency × inverse document frequency). The basic idea is that a document is considered more relevant the more times a term appears in it. However, the term loses weight if it is very common – the more documents it appears in across the index, the less it contributes to relevancy.
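
As a rough sketch, the classic Lucene formulas (assuming the library defaults; illustrative, not the full scoring computation) are:

\mathrm{tf}(t, d) = \sqrt{\mathrm{freq}(t, d)} \qquad \mathrm{idf}(t) = 1 + \log\!\left(\frac{N}{\mathrm{df}(t) + 1}\right)

where freq(t, d) is how often term t occurs in document d, N is the total number of documents, and df(t) is the number of documents containing t. A term’s contribution to a document’s score grows with tf(t, d) and shrinks as df(t) grows.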

Mappings

When data is indexed, ES tries to guess the type of the data. If it knows the type, it can handle, say:

  • dates differently from keywords,
  • keywords differently from English text,
  • English text differently from numbers,

etc.

We can specify mappings before adding data. If not, ES will guess for us. Using the below in the Sense app, see what indexes have been created and take a look at the mappings for bank.

GET _cat/indices
GET bank/_mapping
PUT blogs
{
    "mappings": {
        "user": {
            "_all": { "enabled": false },
            "properties": {
                "title": { "type": "string" },
                "name": { "type": "string" },
                "age": { "type": "integer" }
            }
        },
        "blogpost": {
            "properties": {
                "title": { "type": "string" },
                "body": { "type": "string" },
                "user_id": {
                    "type": "string",
                    "index": "not_analyzed"
                },
                "created": {
                    "type": "date",
                    "format": "strict_date_optional_time||epoch_millis"
                }
            }
        }
    }
}

We can specify one analyzer to use when indexing and another to use when searching. Further, we can use a multi-field specification to have the same text analyzed in several different ways. Finally, text can be left not_analyzed.
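
For instance, a sketch of an autocomplete-style field that is edge-ngrammed at index time but analyzed normally at search time (all names here are illustrative):

PUT analyzer_demo
{
    "settings": {
        "analysis": {
            "filter": {
                "edge_filter": { "type": "edge_ngram", "min_gram": 1, "max_gram": 20 }
            },
            "analyzer": {
                "autocomplete": {
                    "tokenizer": "standard",
                    "filter": [ "lowercase", "edge_filter" ]
                }
            }
        }
    },
    "mappings": {
        "my_type": {
            "properties": {
                "title": {
                    "type": "string",
                    "analyzer": "autocomplete",
                    "search_analyzer": "standard"
                }
            }
        }
    }
}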

Multi-field Lab

Create an explicit mapping for my_type.city and provide a multi-field raw field that is not analyzed.

PUT /yetanother_index
{
    "mappings": {
        "my_type": {
            "properties": {
                "city": {
                    "type": "string",
                    "fields": {
                        "raw": {
                            "type": "string",
                            "index": "not_analyzed"
                        }
                    }
                }
            }
        }
    }
}

Let’s index 2 docs. This one:

PUT /yetanother_index/my_type/1
{
    "city": "New York"
}

And this one:

PUT /yetanother_index/my_type/2
{
    "city": "York"
}

Finally, let’s try out a search

GET /yetanother_index/_search
{
    "query": {
        "match": {
            "city": "york"
        }
    },
    "sort": {
        "city.raw": "asc"
    },
    "aggs": {
        "Cities": {
            "terms": {
                "field": "city.raw"
            }
        }
    }
}

When searching, you can use a URI-based (query_string) search request or pass a JSON document using the Query DSL syntax in the request body. The Query DSL is recommended over the URI approach. For more info on the URI-based approach, see the search-uri-request entry.

Check out the Search DSL and Query DSL docs.

Note the below has both a query and a filter.

GET bank/_search?pretty
{
    "query": {
        "bool" : {
            "should" : {
                "match": {
                    "firstname" : "Gay"
                }
            },
            "minimum_should_match" : 1,
            "filter" : {
                "match" : { "gender" : "F" }
            }
        }
    }
}

CAUTION: What? HTTP GET requests with a body?

While allowed by the standards, not all HTTP clients allow you to pass a body in an HTTP GET, so ES allows you to use a POST instead of a GET for searching.

Search requests can take a parameter to specify a timeout. NOTE: ES continues processing the request; it just returns a response to the client when the timeout occurs (see circuit breakers).
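
A minimal sketch (the 10ms value is illustrative; the response carries a timed_out flag telling you whether the cutoff was hit):

GET bank/_search
{
    "timeout": "10ms",
    "query": { "match_all" : {} }
}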

Types of Queries

match_all

GET bank/_search
{
    "query": {
        "match_all" : {}
    }
}

match

GET bank/_search
{
    "query": {
        "match": {
            "address" : { "query" : "Cumberland" }
        }
    }
}

match with the _all field

GET bank/_search
{
    "query": { "match": {
        "_all" : "Gay Brewer" }
    }
}

match with fuzziness

GET bank/_search
{
    "query": { "match": {
        "_all" : { "query" : "Gay Brower", "fuzziness": 1 }
    } }
}

multi_match with boosting

GET bank/_search
{
    "query": { "multi_match": {
        "type" : "best_fields",
        "query" : "Gay",
        "fields": [ "firstname", "lastname^10", "address"]
    } }
}

Try switching the boosting to firstname instead

multi_match types
  • best_fields – use the score for the best matching field
  • most_fields – combine the scores for all the fields – treat as increasing relevance
  • cross_fields – essentially treat the fields (the ones with the same analyzer) as being one field

match phrase

GET bank/_search
{
    "query": { "match_phrase": {
        "address": "375 Cumberland Street" 
    } }
}
GET bank/_search
{
    "query": { "match_phrase": {
        "address": {
            "query" : "722 Street",
            "slop" : 1
        }
    } }
}

simple_query_string

GET bank/_search
{
    "size": 40,
    "query": {
        "simple_query_string" : {
            "query": "(Mike | Michael)",
            "analyzer": "standard",
            "fields": ["firstname^5","_all"],
            "default_operator": "and"
        }
    }
}

WARNING: avoid the query_string query; its full syntax is error-prone when exposed directly to end users – prefer simple_query_string.

more_like_this

GET bank/_search
{
    "size" : 200,
    "query": {
        "more_like_this" : {
            "fields" : ["firstname", "lastname"],
            "like" : "Michael",
            "min_term_freq" : 1,
            "max_query_terms" : 12
        }
    }
}

We can specify a filter in addition to a query

GET bank/_search
{
    "query": { "bool" : {
        "should" : { "match": {
            "firstname" : "Gay" }
        },
        "minimum_should_match" : 1,
        "filter" : {
            "match" : { "gender" : "F" }
        }
    } }
}

Query and Filter Contexts

You use the same DSL in queries and in filters.
The difference between a “query” and a “filter” is that queries are scored and filters are not. Filters can be cached.

We can specify an analyzer in the query

GET bank/account/_search?pretty
{
    "query": {
        "bool" : {
            "should" : {
                "match": {
                    "firstname" : {
                        "query" : "Gay",
                        "analyzer" : "english"
                    }
                }
            },
            "minimum_should_match" : 1,
            "filter" : {
                "match" : { "gender" : "F" } 
            }
        }
    }
}

# note nothing was found. See how the english analyzer tokenizes our query
GET _analyze
{
  "analyzer":"english",
  "text": "Fredericks"
}

# what happens if we change the analyzer?

Control the size and starting offset

GET bank/_search
{
    "size": 8,
    "from": 11,
    "query": {
        "match_all" : {}
    }
}

source filtering

GET bank/_search
{
    "query": { "match_all" : {} },
    "_source": ["firstname", "gender"]
}
GET bank/_search
{
    "query": { "match_all" : {} },
    "_source": false
}

range query/filter

GET bank/_search
{
    "query": { "bool": {
        "filter": [
            { "range": { "age": { "lte": 40 } } }
        ]
    } }
}

You can also use dates, assuming you have a date field, e.g.:

GET mydata/_search
{
    "query": { "bool": {
        "filter": [
            { "range": { "start_date": { "lte": "2015-01-01" } } }
        ]
    } }
}

You can use ‘now’ for the current time and do date math if needed.

GET mydata/_search
{
    "query": { "bool": {
        "filter": [
            { "range": { "start_date": { "lte": "now-1y" } } }
        ]
    } }
}

profiling

GET bank/_search
{
    "profile": true,
    "query": { "match_all" : {} }
}

Suggestions and Aggregations

Suggestions

See the “You Complete Me” entry. WARNING: suggestions have changed in ES 5.x.

Why treat suggestions differently than search? In short: the need for speed.

Suggestions require fields with a special type, completion. This results in building a Finite State Transducer (FST) over the inputs, e.g. hotel, marriot, mercure, munchen, munich.

GET myindex/mydoctype/_mapping

This returns:
{
    "myindex": {
        "mappings": { 
            "mydoctype": {
                "properties": {
                    "mytype_suggest": { 
                        "type": "completion",
                        "analyzer": "simple",
                        "payloads": false, 
                        "preserve_separators": true,
                        "preserve_position_increments": true,
                        "max_input_length": 50
                    }, ...

We can either upload a field called mytype_suggest directly, or we can populate it from another field using copy_to:

{
    "myindex": {
        "mappings": {
            "mytype": {
                "properties": {
                    "mytype_name": {
                        "type": "string",
                        "copy_to": [ "mytype_suggest" ]
                    },
                    "mytype_suggest": {
                        "type": "completion",
                        "analyzer": "simple",
                        "payloads": false,
                        "preserve_separators": true,
                        "preserve_position_increments": true,
                        "max_input_length": 50
                    }, ...

To get a suggestion for a single document type:

GET myindex/_suggest 
{
    "text" : "San", 
    "myindex" : {
        "completion" : {
            "field" : "mytype_suggest"
        }
    }
}

To get suggestions for several document types:

GET myindex/_suggest 
{
    "text" : "San",
    "mytype" : {
        "completion" : {
            "field" : "mytype_suggest"
        }
    },
    "anothertype" : {
        "completion" : {
            "field" : "anothertype_suggest" 
        }
    }
}

There is some very basic support for filtering suggestions; see the suggester-context entry. Context suggest allows setting up contexts to help filter suggestions. This is not as flexible as search, but it is much faster than doing a full text search.
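
A sketch of a category context, assuming the ES 2.x context suggester syntax (the context name and path here are illustrative; double-check against the suggester-context docs):

PUT anotherindex
{
    "mappings": {
        "mytype": {
            "properties": {
                "mytype_suggest": {
                    "type": "completion",
                    "analyzer": "simple",
                    "context": {
                        "state": {
                            "type": "category",
                            "path": "mytype_states"
                        }
                    }
                }
            }
        }
    }
}

GET anotherindex/_suggest
{
    "text": "San",
    "mytype": {
        "completion": {
            "field": "mytype_suggest",
            "context": { "state": "CA" }
        }
    }
}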

Suggestions – Payloads

By default, all that is returned is a string. At index time, you can attach a payload. First, you need a mapping that allows payloads:

PUT bank/location/_mapping 
{
    "location" : {
        "properties" : {
            "name" : { "type" : "string" },
            "loc_suggest" : {
                "type" : "completion",
                "analyzer" : "simple",
                "payloads" : true
            }
        }
    }
}

Next, when indexing, provide a payload

PUT bank/location/1 
{
    "name" : "One Embarcadero Center, San Francisco",
    "loc_suggest" : {
        "input" : [ "Embarcadero One", "One Embarcadero", "San Francisco", "SF"
],
        "output" : "One Embarcadero Center, San Francisco",
        "payload" : { "location_id" : 1 }
    }
}

Check what is returned:

GET bank/_suggest 
{
    "text" : "emb",
    "locations" : {
        "completion" : {
            "field" : "loc_suggest"
        }
    }
}

Aggregations

Aggregations are useful for getting an overview of what data is available. Aggs provide the following functionality:

  • Aggs create buckets (groups of documents)
  • Metrics can be applied to each bucket
  • Aggregations can be nested

For one way to expose aggregations, see Etsy

Bucket types

  • Terms
  • Significant Terms
  • Children
  • Nested

Metrics

  • Count
  • Sum
  • Avg, Max, Min
  • Stats
  • Percentiles
  • Geo bounds

To enable aggregations, ES uses a different data structure than its inverted indices. Instead of looking up docs based on values, it has another mechanism for looking up values (and their associated doc ids) based on the field. This is a column store.

This column store is saved to disk and is referred to as “doc values”.

Terms Aggregation

GET bank/_search 
{
    "size" : 0,
    "query": {
        "bool" : { "should" : {
            "match": {
                "firstname" : "Michael"
            } 
        },
        "minimum_should_match" : 1,
        "filter" : {
            "match" : { "gender" : "M" } 
        }
        }
    },
    "aggs": { "employers": {
        "terms": {
            "field": "employer", "size": 10
        }
    } }
}

Buckets in the terms aggregation are formed from unique field values.
The search query returns the top terms across all the documents matching the query.

Significant Terms Aggregation

GET myindex/_search 
{
    "size" : 0, "query": {
        "bool" : { "should" : {
            "match": {
                "mytype_name" : "Mary"
            }
        },
        "minimum_should_match" : 1,
        "filter" : {
            "match" : { "mytype_gender" : "F" } 
        }
        } },
        "aggs": { "specialties": {
            "significant_terms": { 
                "field": "specialties",
                "size": 10
            }
        } }
}

Buckets in the significant terms aggregation are formed from values that are common within the filtered group but not common across the whole set of documents.
It is known as the “uncommonly common” aggregation.

Advanced Data Modeling

ES recommends that while considering your data model, you define questions that you want to ask. E.g.

  • Show me posts sorted by number of comments.
  • What queries have been performed by users in SF?
  • What widgets with these doohickeys and belonging to this whatchamacallit are available?

Other considerations:

  • Nested Objects to facilitate certain subqueries.
  • Parent Child to facilitate certain subqueries.
  • Routing
  • Include_in_all
  • copy_to
  • store
  • _all
  • dynamic field mappings
  • index templates
  • geo points and geo shapes
  • limitations on nested object depth

Nested Objects

The ES Guide has some good examples for why to use nested objects.
If a JSON document has an array of sub-objects, the values get flattened. We can tell ES to preserve the object structure by marking a property as being nested.
ES will keep nested sub-objects in their own separate documents so that search works as expected.
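
A sketch, reusing the lists array from the earlier PUT example (the index name is illustrative):

PUT nested_demo
{
    "mappings": {
        "my_type": {
            "properties": {
                "lists": {
                    "type": "nested",
                    "properties": {
                        "name": { "type": "string" },
                        "description": { "type": "string" }
                    }
                }
            }
        }
    }
}

# both conditions must match within the SAME sub-object
GET nested_demo/_search
{
    "query": {
        "nested": {
            "path": "lists",
            "query": {
                "bool": {
                    "must": [
                        { "match": { "lists.name": "prog_list" } },
                        { "match": { "lists.description": "programming" } }
                    ]
                }
            }
        }
    }
}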

Parent Child

ES allows a parent/child relationship to be set up. This is a 1-N relationship.
Both the parent and its children must reside in the same shard. You can use children to find parents, to boost relevance, etc., and vice-versa.
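
A minimal sketch of setting this up (all names are illustrative):

PUT family_index
{
    "mappings": {
        "parent_type": {},
        "child_type": {
            "_parent": { "type": "parent_type" }
        }
    }
}

PUT family_index/parent_type/1
{ "name": "a parent" }

# the parent id doubles as the routing value, keeping the child in the parent's shard
PUT family_index/child_type/10?parent=1
{ "name": "a child" }

# find parents via their children
GET family_index/parent_type/_search
{
    "query": {
        "has_child": {
            "type": "child_type",
            "query": { "match": { "name": "child" } }
        }
    }
}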

Relevancy

See the relevancy discussion. Relevancy can be tuned using the following:

  • Boost particular fields
  • Using compound query clauses within should to increase relevance
  • Using scripts to modify scoring
  • Changing the similarity algorithm
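
As one sketch of tuning relevance, a function_score query that boosts accounts with larger balances (field names are from the bank sample; the modifier choice is illustrative):

GET bank/_search
{
    "query": {
        "function_score": {
            "query": { "match": { "address": "Avenue" } },
            "field_value_factor": {
                "field": "balance",
                "modifier": "log1p"
            }
        }
    }
}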

Wrap up

Percolator

This is a framework that lets users register queries and get notified when new documents match those queries.

Highlighting

You can have ES return the text that produced the match back to the caller. ES annotates the text with HTML to emphasize the matching values.
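
A minimal sketch against the bank sample (by default, matches come back wrapped in <em> tags in a highlight section of each hit):

GET bank/_search
{
    "query": { "match": { "address": "Cumberland" } },
    "highlight": {
        "fields": { "address": {} }
    }
}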

Scroll And Reindex

ES provides a way to iterate over all the documents in an index if needed.
NOTE: there is a reindex API that replaces most usages of scroll.
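
A sketch of the scroll flow (the scroll id below is a placeholder you must copy from each response):

# open a scroll; each batch keeps the context alive for 1 minute
GET bank/_search?scroll=1m
{
    "size": 100,
    "query": { "match_all" : {} }
}

# fetch the next batch using the _scroll_id from the previous response
GET /_search/scroll
{
    "scroll": "1m",
    "scroll_id": "<_scroll_id from the previous response>"
}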

Java Clients

ES 2.4 does not have a Java REST client, although many companies use Jest; ES 5.0 has its own Java REST client. Using REST is preferred over the Node or Transport clients.

Elasticsearch Links

  • Discussion Forums – https://discuss.elastic.co
  • Meetups – https://elasticsearch.meetup.com
  • Docs – https://elastic.co/docs
  • Community – https://elastic.co/community
  • More Resources – https://www.elastic.co/learn

Appendix

Data Models : Sample MyType

{
    "mytype_id" : 15603176,
    "document_type" : "mytype",
    "mytype_type" : "P",
    "mytype_friendly_name" : "Katz, Ben Z., MD",
    "mytype_first_name" : "Ben",
    "mytype_gender" : "M",
    "mytype_initials" : "Z",
    "mytype_last_name" : "Katz",
    "lat_lon" : [ "41.93,-87.65", "41.90,-87.62" ],
    "mytype_school" : "New York Univ Sch Of Med",
    "mytype_name" : "Katz, Ben Z., MD",
    "mytype_states" : [ "IL" ]
}

Search query

curl -XGET 'http://127.0.0.1:9200/myindex/_search?pretty' -d '
{
    "size": 20,
    "query" : {
        "bool" : {
            "should" : {
                "match": {
                    "mytype_name" : "z"
                }
            },
            "minimum_should_match" : 1,
            "filter" : {
                "bool" : {
                    "must" : [
                        {
                            "geo_distance": {
                                "distance": "1km",
                                "mytype_lat_lon": {
                                    "lat": 41.93,
                                    "lon": -87.65
                                }
                            }
                        },
                        {
                            "terms" : {
                                "mytype_states": [ "CA", "NV" ]
                            }
                        }
                    ]
                }
            }
        }
    }
}'