Developing our first Alexa Skill: Accessible Pubs

Chema Jan. 22, 2019

nlp skill alexa osm develop lambda aws

Alexa arrived in Spain at the end of 2018. There are already thousands of apps available, called Skills. We did not want to miss the opportunity to experiment with its SDK and create a simple skill. Here we tell you how we did it and how the experience went.

The Skill idea

Given our commitment to accessibility, the skill had to provide accessibility information to its users while letting us build a new kind of app where voice is the control and information interface.

As many of you know, we actively work with OpenStreetMap and its ecosystem APIs.

We wanted to do something that addressed both aspects.

After mulling it over, we had the app concept: a finder of accessible pubs/bars.

The procedure is very simple. The user opens the skill and asks it to show accessible bars in their area (if geolocation is enabled) or in whichever city they want. Alexa recognizes the "intent" (more on this later) and invokes the corresponding backend webhook. The backend calls the OSM APIs (Nominatim/Overpass) looking for bars that are accessible (initially only those accessible by wheelchair). Finally, the results are presented ordered by proximity, three at a time.
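To give an idea of what the backend asks OpenStreetMap, here is a minimal sketch of those two calls against the public Nominatim and Overpass endpoints. The function names, the search radius and the User-Agent are ours for illustration, not the skill's actual code.

# Sketch of the two OSM calls: Nominatim geocodes the city,
# Overpass finds bars/pubs tagged wheelchair=yes around that point.
import requests

NOMINATIM_URL = 'https://nominatim.openstreetmap.org/search'
OVERPASS_URL = 'https://overpass-api.de/api/interpreter'

def geocode_city(name):
    """Return (lat, lon) of the first Nominatim match, or None."""
    resp = requests.get(
        NOMINATIM_URL,
        params={'q': name, 'format': 'json', 'limit': 1},
        headers={'User-Agent': 'accessible-pubs-demo'},  # illustrative UA
    )
    results = resp.json()
    if not results:
        return None
    return float(results[0]['lat']), float(results[0]['lon'])

def accessible_bars_around(lat, lon, radius_m=1000):
    """Query Overpass for bars/pubs with wheelchair=yes around a point."""
    query = """
    [out:json][timeout:25];
    node["amenity"~"bar|pub"]["wheelchair"="yes"](around:{r},{lat},{lon});
    out;
    """.format(r=radius_m, lat=lat, lon=lon)
    resp = requests.post(OVERPASS_URL, data={'data': query})
    return resp.json().get('elements', [])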

Online SDK

Amazon needs developers. It needs to create and nurture an ecosystem of Skills that are useful to its users. That is why it takes care of every detail so that developers feel welcome and encouraged to create skills.

Proof of this is that you can create an Alexa Skill using only the web browser. There is no need to download anything, no containers, nothing: everything is done from the web console of the Alexa SDK.

Once you register in the Alexa SDK, the process of creating a Skill is divided into three phases:

  • Build: The first step is to define the magic word or words that will launch your skill, known as the invocation name. Then you define the intents. An intent is an action for your skill to execute, and it must have one or several expressions (sample utterances) so Alexa can recognize it. This is where you have to do a good analysis job and use your imagination to add all the possible combinations; it goes a long way toward improving the usability of your skill. Of course, the expressions of an intent must have high cohesion among themselves and high independence with respect to the expressions of the other intents (a sketch of the interaction model appears after this list). When the intents/expressions model is clear, clicking on "Build" generates a recognition model. This may take several minutes and the model is opaque to us; it is internal to Alexa. Regarding the backend, we have to add either the webhook that Alexa will call with the activated intent or the AWS Lambda endpoint where our code lives.
  • Test: Needless to say, you have to test the skill well. The Alexa development console makes this work easier because it lets you test the skill directly through the PC microphone or by typing text as if it were a chatbot. If you want to try skills with geolocation and you do not have an Alexa speaker, you can test your apps in development directly with the Alexa app, available for Android and iOS.
  • Distribution: The work involved in publishing a skill in the marketplace is simple, nothing compared to Google Play or the App Store. Mainly you need a good description, icons and links to the privacy policy and terms of use. With that done, after clicking on distribute, the app moves to the "In Review" status. In our case the review took approximately three weeks (disclaimer: the New Year holidays fell in the middle). If everything is alright, Alexa sends an email when the app is available to everyone.
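To make the Build phase more concrete, below is a minimal sketch of what one intent looks like in the interaction model, which the console edits as JSON; we write it here as a Python dict. The invocation name and the built-in slot type are illustrative assumptions, not our exact model.

# Illustrative interaction-model fragment (normally JSON in the Alexa console).
interaction_model = {
    'interactionModel': {
        'languageModel': {
            'invocationName': 'bares accesibles',  # example "magic words", not necessarily ours
            'intents': [
                {
                    'name': 'GetPlacesSearchIntent',
                    'slots': [
                        {'name': 'city', 'type': 'AMAZON.City'},  # built-in slot type (assumed)
                    ],
                    'samples': [
                        'locales accesibles en {city}',
                        'pubs en {city}',
                    ],
                },
            ],
        }
    }
}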

We have not gone into the details of the process; we think there is good documentation about it on the Internet.

We will focus on the aspects of the skill that we have developed.

Intents

Apart from Alexa's default intents (CancelIntent, YesIntent, NoIntent, HelpIntent, etc.) we have created two specific ones:

  • GetPlacesSearchIntent: This intent lets the user search for accessible bars in the city they name. The city is captured in a slot that we call 'city' and is passed as an argument when the webhook is invoked. The result is a list of accessible bars ordered by distance from the city center and paged three at a time. Some of the expressions used are: "acceso a silla de ruedas en {city}" ("wheelchair access in {city}"), "locales accesibles en {city}" ("accessible venues in {city}"), "pubs en {city}", etc.
  • GetPlacesNearIntent: Similar to the previous one, but it uses the geolocation of the user's device to look around them. In this case, the results also include places nearby (some meters away). Some of the expressions used are: "en este barrio" ("in this neighborhood"), "por aquí cerca" ("around here"), "restaurantes por mi alrededor" ("restaurants around me"), etc. A sketch of how the device location can be read appears right after this list.
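For GetPlacesNearIntent the coordinates come from the Geolocation interface in the request context (the user has to grant the location permission). This is a minimal sketch assuming the attribute names of the Python ask-sdk models; the helper name is ours.

# Sketch: read the device coordinates that Alexa attaches to the request.
def get_device_coordinates(handler_input):
    """Return (lat, lon) from the request context, or None if unavailable."""
    geo = handler_input.request_envelope.context.geolocation
    if geo is None or geo.coordinate is None:
        return None  # no permission granted or no location fix yet
    return (geo.coordinate.latitude_in_degrees,
            geo.coordinate.longitude_in_degrees)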

Screenshot: the Alexa console where we specify the intents

The backend: AWS Lambda

For the backend we started from the Python "facts" skill template for AWS Lambda.

This part of the development is the least pleasant, because you have to open another window with AWS Lambda to develop and then go back to the Alexa console to test.

In AWS Lambda you have to bundle the dependencies (typically the ones in requirements.txt) as if they were part of your app (like "vendor" in Composer or "node_modules" in npm). A good solution is to use a "vendor"-style directory and exclude it from version control so it does not pollute the repository.

It is a somewhat tedious job: we have to install the packages locally with pip into the vendor directory (pip install -r requirements.txt -t vendor/), compress the folder together with our code and upload the zip to the AWS console.

Regarding the code, here is a simplified snippet of Accessible Pubs.

import logging

from ask_sdk_core.skill_builder import SkillBuilder
from ask_sdk_core.dispatch_components import AbstractRequestHandler
from ask_sdk_core.utils import is_intent_name
from ask_sdk_model.ui import SimpleCard

logger = logging.getLogger(__name__)
logger.setLevel(logging.INFO)

SKILL_NAME = 'Accessible Pubs'
# getCity / getResults wrap the Nominatim and Overpass calls (defined elsewhere).


class GetPlacesSearchHandler(AbstractRequestHandler):
    """Handler for GetPlacesSearchIntent."""

    def can_handle(self, handler_input):
        # type: (HandlerInput) -> bool
        return is_intent_name("GetPlacesSearchIntent")(handler_input)

    def handle(self, handler_input):
        # type: (HandlerInput) -> Response
        logger.info("In GetPlacesSearchHandler")
        attribute_manager = handler_input.attributes_manager
        session_attr = attribute_manager.session_attributes
        page = 0  # index used by the (omitted) paging logic
        lastQuery = session_attr.get('lastQuery', None)

        # Read the 'city' slot that Alexa filled from the user's utterance.
        qCity = None
        for slotName, slotValue in handler_input.request_envelope.request.intent.slots.items():
            if slotName == 'city':
                qCity = slotValue.value
                break
        if qCity is None:
            speech = 'Vaya, parece que no he entendido bien la ciudad, ¿puedes repetirlo? gracias'
            handler_input.response_builder.speak(speech).ask(speech).set_card(SimpleCard(SKILL_NAME, speech))
            return handler_input.response_builder.response

        # Restart the paging when the user asks about a different city.
        if lastQuery != qCity:
            page = 0
        session_attr['lastQuery'] = qCity
        logger.info("City: {}".format(qCity))

        # Geocode the city (Nominatim).
        city = getCity(qCity)
        if city is None:
            speech = 'Vaya, no encuentro ninguna ciudad que se llame {}, ¿puedes probar otra vez?'.format(qCity)
            handler_input.response_builder.speak(speech).ask(speech).set_card(SimpleCard(SKILL_NAME, speech))
            return handler_input.response_builder.response

        # Search wheelchair-accessible bars around the city center (Overpass).
        ret = getResults(*city)
        logger.info("Num results: {}".format(len(ret)))

        # Simplified response: the real skill reads the results three at a time.
        speech = 'He encontrado {} locales accesibles en {}'.format(len(ret), qCity)
        handler_input.response_builder.speak(speech).set_card(SimpleCard(SKILL_NAME, speech))
        return handler_input.response_builder.response


sb = SkillBuilder()
...
sb.add_request_handler(GetPlacesSearchHandler())
sb.add_request_handler(GetPlacesNearHandler())  # analogous handler, omitted here
...
lambda_handler = sb.lambda_handler()
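The paging of results three at a time relies on the session attributes, so a follow-up request can continue where the previous answer stopped. A minimal sketch of that idea (our simplification, not the skill's exact code):

# Sketch: return the next slice of results and remember the page in the session.
PAGE_SIZE = 3

def next_page(session_attr, results):
    """Return up to PAGE_SIZE results and advance the stored page index."""
    page = session_attr.get('page', 0)
    chunk = results[page * PAGE_SIZE:(page + 1) * PAGE_SIZE]
    session_attr['page'] = page + 1 if chunk else 0  # wrap around when exhausted
    return chunk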

Testing the skill

The most complicated part of testing the skill is understanding how Alexa itself works. If you have never used Alexa, we recommend trying it first as a user: learn how it works, install other skills and play with them.

The beginning of development can be very frustrating if you do not know the basic concepts of Alexa, such as the invocation name, the default intents, etc.

Screenshot: testing Accessible Pubs with the Alexa app for Android

Conclusions

It has been a great experience. We believe there is still much to be done on the backend side.

First in improving the development cycle, and then in the contextual-conversation model. Right now Alexa behaves like "speech-to-text with a speaker".

It would be interesting for the SDK itself to provide tools that allow the development of much more complete and apparently smarter skills. We will surely see them soon.

They also have to improve things like the turnaround time of the Skills review process.

On our side, we will closely follow the evolution of the Alexa ecosystem and its SDK. Will it be a revolution this year? We will see.

You can install "Accessible Pubs" from the Amazon Alexa Skills Marketplace (Spain).