gRPC-Pattern: an NLP framework as a microservice

Chema Dec. 4, 2018

docker nlp grpc microservice nltk

Pattern is a great natural language processing (NLP) framework for Python that supports many languages out of the box. grpc-pattern is a Docker container that exposes Pattern's services through a gRPC API.

Pattern, a good multilingual NLP framework

Natural language processing (NLP) is one of the most active research areas today. Any machine learning project that processes text needs an NLP library. Python tops the list of programming languages for ML, and it has several NLP libraries, but the reference is undoubtedly NLTK.

NLTK offers a multitude of utilities and can be used with any language. Outside of English, however, you have to build the language support yourself.

This is where Pattern's main quality comes in: out-of-the-box language support. Once installed with "pip", it is ready to process text in English, Spanish, French, Italian, German and Dutch.

Pattern's main task is the syntactic and morphological analysis of text. For each word it identifies the part of speech, the phrase (chunk) it belongs to, etc.

	
	>>> from pattern.en import parsetree
	>>> # Each token in the result is annotated as word/POS tag/chunk tag/PNP tag
	>>> parsed = parsetree('The cat is over the roof. And where is the dog?')
	>>> parsed
	[Sentence('The/DT/B-NP/O cat/NN/I-NP/O is/VBZ/B-VP/O over/IN/B-PP/B-PNP the/DT/B-NP/I-PNP roof/NN/I-NP/I-PNP ././O/O'), Sentence('And/CC/O/O where/WRB/O/O is/VBZ/B-VP/O the/DT/B-NP/O dog/NN/I-NP/O ?/./O/O')]
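
Each token in the output above packs four slash-separated fields: the word, its part-of-speech tag, its chunk tag and its prepositional-phrase (PNP) tag. As a minimal sketch that does not need Pattern installed, the tagged string can be split back into those fields with plain Python. The `Token` type and the `split_tagged` helper are names made up for this illustration; they are not part of Pattern.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    """One annotated word (hypothetical helper type, not a Pattern class)."""
    word: str
    pos: str    # part-of-speech tag, e.g. NN, VBZ
    chunk: str  # chunk tag, e.g. B-NP, I-NP, O
    pnp: str    # prepositional noun phrase tag, e.g. B-PNP, O


def split_tagged(sentence: str) -> List[Token]:
    """Split a 'word/POS/chunk/PNP' string into Token tuples."""
    tokens = []
    for item in sentence.split():
        # Split from the right so only the last three slashes are used.
        word, pos, chunk, pnp = item.rsplit("/", 3)
        tokens.append(Token(word, pos, chunk, pnp))
    return tokens


tagged = ("The/DT/B-NP/O cat/NN/I-NP/O is/VBZ/B-VP/O over/IN/B-PP/B-PNP "
          "the/DT/B-NP/I-PNP roof/NN/I-NP/I-PNP ././O/O")
tokens = split_tagged(tagged)

# Keep only the nouns (tags starting with NN).
nouns = [t.word for t in tokens if t.pos.startswith("NN")]
print(nouns)  # ['cat', 'roof']
```

In a real project you would normally work with the `Sentence` objects that Pattern returns rather than re-parsing their string form; the sketch is only meant to make the notation readable.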

gRPC, the glue for microservices

There is a newer, more efficient way to create and consume APIs: gRPC.
Developed by Google, gRPC brings classic RPC technology into modern times.

Using Protocol Buffers under the hood, gRPC avoids much of the communication overhead that text formats such as XML or JSON introduce.
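
To give a feel for that difference, here is a rough sketch that hand-encodes a small request in the Protocol Buffers wire format and compares its size with the JSON equivalent. It uses only the standard library. The field numbers (1 for `language`, 2 for `text`) are assumptions made for the illustration, since they depend on the actual `.proto` file, and the helpers only handle non-negative integers and strings.

```python
import json


def encode_varint(value: int) -> bytes:
    """Encode a non-negative integer as a protobuf base-128 varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)  # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def encode_int_field(field_number: int, value: int) -> bytes:
    """Field key with wire type 0 (varint), followed by the value."""
    return encode_varint((field_number << 3) | 0) + encode_varint(value)


def encode_string_field(field_number: int, text: str) -> bytes:
    """Field key with wire type 2 (length-delimited), length, then UTF-8 bytes."""
    data = text.encode("utf-8")
    return encode_varint((field_number << 3) | 2) + encode_varint(len(data)) + data


request = {"language": 1, "text": "The boy is very happy"}

as_json = json.dumps(request).encode("utf-8")
# Assumed field numbers: language = 1, text = 2.
as_protobuf = (encode_int_field(1, request["language"])
               + encode_string_field(2, request["text"]))

print(len(as_json), len(as_protobuf))  # 48 25
```

The binary message carries no field names, only small numeric keys, which is where most of the saving comes from. gRPC adds its own framing on top, so the numbers are only indicative.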

It also keeps the main advantage that sets RPC apart: a specification is written in Protocol Buffers format, and code generators produce the stub code from it. You only have to implement the functions that the service exposes.

It generates both client and server code for a multitude of languages: Python, Java, Go, C++, Node.js, Ruby, Android, ...

The result: NLP in your project, easy as pie

We develop machine learning projects that require natural language processing, and we already knew Pattern from other projects.

In many of them we use a microservices architecture that mixes different languages (mainly Python and Go), architectures, databases, etc.

We needed Pattern as a microservice.

The best way we have found is a container that is easily integrated into a project through docker-compose.

This is what we have done.

To use it, just run:

	
	docker run -p 50051:50051 digitalilusion/grpc-pattern

Now you can consume the service with your favorite language. 

In the repository you can find "api.proto", the gRPC specification of the API. With this file, you can generate the stubs for your favorite language.
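
For Python, the stubs can be generated with the `grpcio-tools` package by running `python -m grpc_tools.protoc -I. --python_out=. --grpc_python_out=. api.proto`, which writes `api_pb2.py` and `api_pb2_grpc.py`. The sketch below shows roughly what a client could then look like. Treat the names as assumptions: `PatternStub` and `ParseRequest` are hypothetical and must be replaced with the service and message names actually declared in "api.proto". The `Parse` method and the `language` and `text` fields are inferred from the grpcc example later in this post.

```python
def parse_text(stub, request_cls, text, language=1, timeout=30.0):
    """Send one parse request and return the reply.

    stub        -- a client stub generated from api.proto
    request_cls -- the generated request message class
    language    -- numeric language code (1 is the value used in the grpcc example)
    timeout     -- generous, because the first call for a language is slow
    """
    request = request_cls(language=language, text=text)
    return stub.Parse(request, timeout=timeout)


def main():
    # Imported here so parse_text can be exercised without gRPC installed.
    import grpc
    import api_pb2       # generated from api.proto
    import api_pb2_grpc  # generated from api.proto

    with grpc.insecure_channel("localhost:50051") as channel:
        # Hypothetical names: check api.proto for the real service and message.
        stub = api_pb2_grpc.PatternStub(channel)
        reply = parse_text(stub, api_pb2.ParseRequest, "The boy is very happy")
        print(reply)
```

Call `main()` once the container is running. Passing the stub and the request class as arguments keeps the helper independent of the generated module names, which also makes it easy to test with stand-in objects.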

Access from the shell

If you want to try the service quickly, you can use the grpcc CLI tool. 

Install it with npm if you don't have it.

	
	npm i -g grpcc

Now you just need to run it to get a REPL shell and test it.

	
	grpcc -i -p api.proto -a localhost:50051

Using the REPL interface, we parse an example text:

	
	client.parse({'language': 1, 'text': 'The boy is very happy'}, pr)

The language-specific module is lazy-loaded, so the first request takes several seconds. Subsequent requests are very fast.

	
	{
	  "isOk": true,
	  "reason": "",
	  "sentences": [
	    {
	      "words": [
	        {
	          "text": "very",
	          "type": "RB"
	        },
	        {
	          "text": "happy",
	          "type": "JJ"
	        }
	      ],
	      "chunks": [
	        {
	          "type": "NP",
	          "words": [
	            {
	              "text": "The",
	              "type": "DT"
	            },
	            {
	              "text": "boy",
	              "type": "NN"
	            }
	          ]
	        },
	        {
	          "type": "VP",
	          "words": [
	            {
	              "text": "is",
	              "type": "VBZ"
	            }
	          ]
	        },
	        {
	          "type": "ADJP",
	          "words": [
	            {
	              "text": "very",
	              "type": "RB"
	            },
	            {
	              "text": "happy",
	              "type": "JJ"
	            }
	          ]
	        }
	      ]
	    }
	  ]
	}

grpc Pattern demo with grpcc
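
Once the reply is available as a dictionary (for example after decoding the JSON printed by grpcc), the chunks can be flattened into readable phrases. The following is a small sketch under that assumption. `chunk_phrases` is a helper name invented here, and the sample reply is an abbreviated copy of the response shown above.

```python
def chunk_phrases(reply):
    """Return (chunk type, phrase text) pairs from a parse reply dictionary."""
    if not reply.get("isOk"):
        # The service reports failures through isOk / reason.
        raise RuntimeError(reply.get("reason") or "parse failed")

    phrases = []
    for sentence in reply.get("sentences", []):
        for chunk in sentence.get("chunks", []):
            text = " ".join(word["text"] for word in chunk["words"])
            phrases.append((chunk["type"], text))
    return phrases


# Abbreviated copy of the reply shown above (chunks only).
reply = {
    "isOk": True,
    "reason": "",
    "sentences": [{
        "chunks": [
            {"type": "NP", "words": [{"text": "The", "type": "DT"},
                                     {"text": "boy", "type": "NN"}]},
            {"type": "VP", "words": [{"text": "is", "type": "VBZ"}]},
            {"type": "ADJP", "words": [{"text": "very", "type": "RB"},
                                       {"text": "happy", "type": "JJ"}]},
        ],
    }],
}

print(chunk_phrases(reply))
# [('NP', 'The boy'), ('VP', 'is'), ('ADJP', 'very happy')]
```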

 
