NAO feat. ChatGPT

 NAO feat. ChatGPT
November 30, 2023
We had the opportunity to present our products and services at the traditional SPS trade fair in Nuremberg. One of the most popular presentations was the smart robot NAO that interacted with passing visitors. This blogpost uncovers some technical aspects of this solution.

Why NAO robot?

or several years, Sabo has been interested in the use of artificial intelligence and machine learning to support software solutions in industry. We invented and are constantly developing our product Sabot (Smart Assistant Bot), our Intelligent Voice Assistant (IVA) solution for controlling machines like the MIWE bakery ovens. The user can then not only control the device with his voice commands, but also use the intelligent capabilities of the assistant itself.

For the purpose of the trade show, we wanted to wrap the voice user interface in an attractive package, and the cute NAO robot seemed like an ideal candidate.

NAO could you help us

There were three main areas of operating the NAO:

  1. The Idle mode was focused on interactions with visitors in terms of some ice-breaking activities.
  2. The Service mode was intended to provide fluent conversation to order a refreshment for visitors.
  3. The Sabot mode was a demonstration of controlling a Miwe bakery.

Following paragraphs describes our experience with implementation of the Service mode.

Hey barman, I’ll have an espresso!

In the Service mode the NAO robot acted as a barman. A visitor could put questions, let the robot explain its offer, added or removed items and submitted the order. The robot was able to speak in English and German.

The texts below are transcripts. Conversations took place in the form of natural speech.

For more curious visitors …

Robot:  How can I assist you with your order?
Visitor: What refreshments do you offer?
Robot: We offer hot drinks, cold drinks, and snacks. How can I assist you with your order?
Visitor: Hot drinks.
Robot: Here are the hot drinks we offer: Caffé Crema, Espresso, Cappuccino, Latte Macchiato, Hot Chocolate, Tea.
Visitor: OK. Please give me an espresso.
Robot: Sure! I've added an Espresso to your order. Would you like a single or double shot?
Visitor: Single with milk.
Robot: Got it! I've added a single shot Espresso with milk to your order. Is there anything else you would like to add?
Visitor: And something to eat please.
Robot: Certainly! We have a variety of snacks available. Here are some options: Swiss Platter, Sweet Snack, Savory Snack, Pretzel, Croissant.
Visitor: Two croissants.
Robot: Great choice! I've added 2 Croissants to your order. Is there anything else you would like to add?
Visitor: No thanks.
Robot: You're welcome! Your order has been submitted. If you need any further assistance, feel free to ask.

… and for busy visitors

Visitor: A single espresso with a milk and two croissants.
Robot: Sure! I've added a single shot Espresso with milk and 2 Croissants to your order. Is there anything else you would like to add or modify?

After collecting and submitting an order, the order was visually published via a web application on a near large display. So the visitor saw their ordering and waited for a minute to prepare the real refreshment.

NAO’s intestines

The NAO robot is equipped with some sensors and I/O devices. There is a graphical IDE named the NAO Choregraphe supporting NAOqi graphical language. The language can build scenarios consisting of multiple steps. A step could contain code written in Python language and using NAO SDK.

NAO Audio Provider is a component hosted directly on NAO robot providing audio from its microphone.

NAO Audio Source Service is a dedicated Python application hosted on a computer that is interconnected to the NAO robot. Its responsibility is just routing audio data fragments from NAO microphone via a socket to NAO Backend Service.

NAO Backend Service is a .Net 6 service orchestrating these main sub-tasks:

  • Receiving NAO audio fragments.
  • Processing Speech-To-Text recognition logic by using Azure Cognitive Services. We use continuous speech recognition for performance reasons. This approach also splits speech into sentences by talk pauses automatically.
  • Routing the recognized text to the respective currently activated text processor. There are two - for the Sabot mode and for the Service mode. In the following text, we are interested in the text processor for the Service mode.
  • Handling the textual response from the active text processor.
  • Providing the textual responsea via a pulling mechanism from the NAO robot.

NAO Text Response Handler is a Python component hosted directly on the NAO robot. It constantly checks for a textual response on NAO Backend Service. In case there is any response, the NAO robot passes the text into its built-in Text-To-Speech processor.

NAO Order Visualizer is a web application that listens to a socket on NAO Backend Service. Once a new order appears, the order is shown as a visual confirmation for the visitor.

Who is hiding in the background? ChatGPT!

We integrated a solution for ordering based on ChatGPT-3.5 from OpenAI provided as the Azure OpenAI Service. The integration could be logically divided into a few sub-tasks:

  1. Designing and tuning ChatGPT prompt.
  2. Defining and handling ChatGPT functions.
  3. Orchestration the chatting.

ChatGPT Prompt

I am not surprised that ChatGPT prompt engineering has become a standalone field in software development. To tune the behavior of a chat bot is not a trivial task. A changed word in the prompt could impact the behavior significantly. We experimented with various versions and wording till we got a solution passing our expectations.

You can check fragments from our barman prompt:

ChatGPT Functions

Using ChatGPT functions is a powerful feature provided by some language models starting from gpt-3.5-turbo-0613. I cannot imagine achieving the ordering functionality without using the functions.

We defined three functions named action_ItemAddedIntoOrder, action_ItemRemovedFromOrder and action_OrderSubmitted. A function’s name, descriptions, parameters, everything matters and could impact behavior of the bot.

The function for adding an item into the order was defined:

Name: action_ItemAddedIntoOrder

Description: A product item added to the order.

Parameters [name, description]:

[Position, Position of the product item in the order starting from 1.]
[Quantity, Quantity of the product. Default is 1.]
[Product, The product item's name added to the order.]

ChatGPT can detect the point in conversation that is handled by invoking the method. E. g.

Visitor: A single espresso with a milk.

is interpreted as a calling:

In the handling of the function’s arguments, it is necessary to do some validation logic. The returning response must be adjusted according to required behavior.

And what helped us much was a mention the function name in the prompt for some specific cases. E. g. this rule helped us to properly handle adding the next item of the same product within the order:

- If there is anything added into the order even if the added item is similar to the previous one, call `action_ItemAddedIntoOrder` again.

Orchestration Chatting

The main component abstracting the order bot logic was designed like that:

The implementation of the abstraction was built from the orchestrating class OrderBotController and several sub-components responsible for partial business logic.

There were more than 60 integration tests for OrderBotController. ChatGPT responding is typical for its nondeterministic character. It is hard to assert exact assistant responses by the tests. However by introducing the ChatGPT functions, you can check their invocations including their parameters. The tests bring very quick and valuable feedback to any change in the function handling as well as in the prompt definition. By using this approach, the tuning was more safety and efficient.

Thank you NAO

The NAO robot was accepted in a very positive way. It was the cutest barman at the fair.

We developed the first version. It was a very exciting journey and experience. We also collected many ideas and suggestions on how to improve our solution for the next version. And we are going to benefit from the given experiences. The technologies related to ChatGPT are evolving very quickly, so new features occur often. And projects like that bring us better understanding of the potential and constraints of related technologies.

Microsoft Technologies Department Manager

Other articles by same author

Article collaborators

SABO Newsletter icon


Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

About SABO Mobile IT

We focus on developing specialized software for our customers in the automotive, supplier, medical and high-tech industries in Germany and other European countries. We connect systems, data and users and generate added value for our customers with products that are intuitive to use.
Learn more about sabo