gRPC-Pattern: NLP framework as microservice
Chema Dec. 4, 2018
Pattern is a great natural language processing (NLP) framework for Python. It supports many languages out of the box. grpc-pattern is a Docker container that exposes Pattern's services through a gRPC API.
Pattern, a good multilingual NLP framework
Natural Language Processing (NLP) is one of the most active research areas today. Any machine learning project that processes text needs an NLP library. Python is at the top of the list of programming languages for ML. There are several Python libraries for NLP, but the reference is undoubtedly NLTK.
NLTK has a multitude of utilities and can be used with any language. However, if your text is not in English, you have to build the language support yourself.
This is where the main quality of Pattern comes in: language support out of the box. Once installed with pip, it is ready to process text in English, Spanish, French, Italian, German and Dutch.
Pattern's main task is the syntactic and morphological analysis of text (e.g. identifying the part of speech, components, etc. of each word):
from pattern.en import parsetree
parsed = parsetree('The cat is over the roof. And where is the dog?')
parsed
[Sentence('The/DT/B-NP/O cat/NN/I-NP/O is/VBZ/B-VP/O over/IN/B-PP/B-PNP the/DT/B-NP/I-PNP roof/NN/I-NP/I-PNP ././O/O'), Sentence('And/CC/O/O where/WRB/O/O is/VBZ/B-VP/O the/DT/B-NP/O dog/NN/I-NP/O ?/./O/O')]
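Each token in the output above is a slash-separated string: the word, its part-of-speech tag, its chunk tag and its prepositional noun phrase (PNP) tag. The following is a minimal sketch, in plain Python and without importing Pattern, of how such a string could be split into fields. The names `Token` and `split_tagged` are hypothetical helpers for illustration and are not part of Pattern.

```python
from typing import List, NamedTuple


class Token(NamedTuple):
    word: str
    pos: str    # part-of-speech tag, e.g. DT, NN, VBZ
    chunk: str  # chunk tag, e.g. B-NP, I-NP, O
    pnp: str    # prepositional noun phrase tag, e.g. B-PNP, O


def split_tagged(sentence: str) -> List[Token]:
    """Split space-separated 'word/POS/CHUNK/PNP' tokens into Token tuples."""
    tokens = []
    for item in sentence.split():
        # Split from the right so the last three fields are always the tags.
        word, pos, chunk, pnp = item.rsplit("/", 3)
        tokens.append(Token(word, pos, chunk, pnp))
    return tokens


# First sentence of the output shown above.
tagged = ("The/DT/B-NP/O cat/NN/I-NP/O is/VBZ/B-VP/O over/IN/B-PP/B-PNP "
          "the/DT/B-NP/I-PNP roof/NN/I-NP/I-PNP ././O/O")
tokens = split_tagged(tagged)

# Keep only the words tagged as nouns (tags starting with NN).
nouns = [t.word for t in tokens if t.pos.startswith("NN")]
print(nouns)  # ['cat', 'roof']
```

In a real project you would normally work with the objects returned by Pattern directly; the sketch only makes the tag layout explicit.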
gRPC, the glue for microservices
gRPC is a newer, more efficient way to create and consume APIs.
Developed by Google, gRPC brings classic RPC technology up to date.
Because it uses Protocol Buffers under the hood, gRPC avoids much of the serialization overhead that text formats such as XML or JSON incur.
It also keeps the main advantage that sets RPC apart. The API is written once as a specification in Protocol Buffers format, and code generators produce the stub code from it. You only have to implement the functions that the service exposes.
Code for both client and server can be generated for a multitude of languages: Python, Java, Go, C++, Node.js, Ruby, Android, ...
The result: NLP in your project, easy as pie
We develop machine learning projects that need natural language processing, and we already knew Pattern from other projects.
In many of them we use a microservice architecture that mixes different languages (mainly Python and Go), architectures, databases, etc.
We needed to have Pattern as a microservice.
The best way we have found is a container that is easily integrated into a project through docker-compose.
This is what we have done.
To use it, just run:
docker run -p 50051:50051 digitalilusion/grpc-pattern
Now you can use the service with your favorite language.
In the repository you can find "api.proto", the gRPC specification of the API. With this file, you can generate the stubs for your favorite language.
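For Python, stubs are usually generated with the grpcio-tools package, which ships the Protocol Buffers compiler as the `grpc_tools.protoc` module. The sketch below assumes that grpcio-tools is installed and that "api.proto" has been copied into the current directory. `protoc_command` and `generate_stubs` are hypothetical helper names, not part of grpc-pattern.

```python
import subprocess
import sys
from typing import List


def protoc_command(proto_file: str = "api.proto", out_dir: str = ".") -> List[str]:
    """Build the command line that generates the Python message and stub modules."""
    return [
        sys.executable, "-m", "grpc_tools.protoc",
        "-I.",                           # search for .proto files in the current directory
        f"--python_out={out_dir}",       # message classes (api_pb2.py for api.proto)
        f"--grpc_python_out={out_dir}",  # client and server stubs (api_pb2_grpc.py)
        proto_file,
    ]


def generate_stubs(proto_file: str = "api.proto", out_dir: str = ".") -> None:
    """Run the compiler; raises CalledProcessError if generation fails."""
    subprocess.run(protoc_command(proto_file, out_dir), check=True)


# Show the command without running it; call generate_stubs() to actually run it.
print(" ".join(protoc_command()))
```

The names of the service and message classes inside the generated modules depend on what "api.proto" declares, so check the file before writing a client.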
Access from the shell
If you want to try the service quickly, you can use the grpcc CLI tool.
If you don't have it, it can be installed with npm:
npm i -g grpcc
Now you just need to run it to get a REPL shell and test it.
grpcc -i -p api.proto -a localhost:50051
Using the REPL interface, we parse an example text:
client.parse({'language': 1, 'text': 'The boy is very happy'}, pr)
The language-specific module is lazy-loaded, so the first request takes several seconds. Subsequent requests are very fast.
{
  "isOk": true,
  "reason": "",
  "sentences": [
    {
      "words": [
        {
          "text": "very",
          "type": "RB"
        },
        {
          "text": "happy",
          "type": "JJ"
        }
      ],
      "chunks": [
        {
          "type": "NP",
          "words": [
            {
              "text": "The",
              "type": "DT"
            },
            {
              "text": "boy",
              "type": "NN"
            }
          ]
        },
        {
          "type": "VP",
          "words": [
            {
              "text": "is",
              "type": "VBZ"
            }
          ]
        },
        {
          "type": "ADJP",
          "words": [
            {
              "text": "very",
              "type": "RB"
            },
            {
              "text": "happy",
              "type": "JJ"
            }
          ]
        }
      ]
    }
  ]
}
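Once a reply like the one above is available as a plain dictionary (for example, by parsing the printed JSON), the chunks can be walked with ordinary Python. This is a minimal sketch: the field names are taken from the response above, while `chunk_summary` is a hypothetical helper and not part of grpc-pattern.

```python
import json
from typing import List, Tuple

# Abridged copy of the response shown above (only the "chunks" part is kept).
reply_json = """
{
  "isOk": true,
  "reason": "",
  "sentences": [
    {
      "chunks": [
        {"type": "NP", "words": [{"text": "The", "type": "DT"}, {"text": "boy", "type": "NN"}]},
        {"type": "VP", "words": [{"text": "is", "type": "VBZ"}]},
        {"type": "ADJP", "words": [{"text": "very", "type": "RB"}, {"text": "happy", "type": "JJ"}]}
      ]
    }
  ]
}
"""


def chunk_summary(reply: dict) -> List[Tuple[str, str]]:
    """Return (chunk type, chunk text) pairs for every sentence in a reply."""
    if not reply.get("isOk"):
        # Assumption: "reason" carries the error message when "isOk" is false.
        raise ValueError(reply.get("reason") or "parse failed")
    pairs = []
    for sentence in reply.get("sentences", []):
        for chunk in sentence.get("chunks", []):
            text = " ".join(word["text"] for word in chunk.get("words", []))
            pairs.append((chunk["type"], text))
    return pairs


summary = chunk_summary(json.loads(reply_json))
print(summary)  # [('NP', 'The boy'), ('VP', 'is'), ('ADJP', 'very happy')]
```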