Researchers at the University of Tokyo and Alternative Machine have developed a humanoid robotic system called Alter3 that maps natural-language commands directly to robot actions. By leveraging large language models (LLMs) such as GPT-4, Alter3 can perform complex tasks such as taking a selfie or imitating a ghost.
- Before the LLM was integrated, all 43 of the robot's axes had to be controlled manually to imitate a motion, which typically required extensive hand-tuning. With the LLM, complex motion control is achieved through natural-language instructions alone, with no iterative learning process (see the sketch after this list).
- Through verbal feedback, Alter3 can adjust its movement code according to the user's spoken instructions and store the improved movements in a database, building up an effective memory of body patterns.
- 107 participants, recruited through the online platform Prolific, evaluated nine different action videos. Actions generated by GPT-4 scored significantly higher than the randomly generated actions of the control group, indicating that GPT-4 can accurately map linguistic expressions onto Alter3's body.
- Alter3 can perform a variety of actions without additional training, which suggests that the LLM's training data already contains action descriptions. It can also imitate ghosts and animals, and reflect the emotional content of a conversation through facial expressions and gestures. The system can be adapted to other humanoid robots with only minor modifications.
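To make the contrast concrete, below is a minimal Python sketch of per-axis control versus language-driven control. The function name `set_axis`, the 0-255 value range, and the axis meanings are all illustrative assumptions; the article does not publish Alter3's actual low-level API.

```python
# Hypothetical sketch of Alter3's low-level interface: 43 axes, each set
# individually. The function name, value range, and axis meanings are
# assumptions for illustration only.

NUM_AXES = 43  # the article states Alter3 has 43 controllable axes

robot_state = [0] * NUM_AXES  # stand-in for the physical robot

def set_axis(axis: int, value: int) -> None:
    """Set one axis to a position value (hypothetical signature)."""
    assert 0 <= axis < NUM_AXES and 0 <= value <= 255
    robot_state[axis] = value

# Pre-LLM workflow: every pose hand-tuned, axis by axis.
set_axis(0, 128)   # e.g. head pitch (axis meanings are illustrative)
set_axis(17, 200)  # e.g. right shoulder

# Post-LLM workflow (detailed in the sections below): an instruction such
# as "take a selfie" is turned into a sequence of set_axis calls by GPT-4,
# with no manual tuning.
```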
Key Features of Alter3
Natural Language to Action Mapping
Alter3 directly converts natural-language commands into robot actions, letting users drive a wide range of tasks with simple spoken instructions.
Action Planning Based on the “Agent Framework”
Alter3 uses GPT-4 as its backend and plans task execution through an “agent framework”: the model first acts as a planner to determine the required action steps, and a coding agent then generates the specific robot commands.
Contextual Learning and API Adaptation
GPT-4 uses its in-context learning capability to adapt to and map the robot's API commands. Given a list of commands and usage examples, the model converts task steps into API commands and sends them to the robot for execution.
Human Feedback Support
Alter3 can receive human feedback and adjust its actions accordingly. For example, when the user instructs the robot to “raise your arm”, the feedback is processed by another GPT-4 agent, which adjusts the corresponding code and updates the action sequence.
Multi-tasking
Alter3 can perform a variety of complex tasks, such as taking a selfie, drinking tea, and imitating a ghost or a snake, and it excels in scenarios that require fine motor planning.
Emotional Expression and Imitation
GPT-4's extensive knowledge base enables it to infer and express emotions. Alter3 can reflect emotions such as embarrassment or joy through facial expressions and body movements, enhancing the realism of human-robot interaction. Even when the text contains no explicit emotional expression, GPT-4 infers an appropriate emotion and reflects it in the robot's actions, for example showing surprise and delight when hearing a funny story.
Zero-shot Learning
- No pre-training required: Alter3 generates new actions from language instructions alone; no per-action programming or training is needed.
- Use of existing data: GPT-4's extensive training data already contains a large number of action descriptions, enabling the robot to generate a wide variety of actions.
Wide Application Potential
Beyond everyday tasks, Alter3 has broad application potential in fields that require advanced motion planning and emotional expression, such as entertainment and customer service.
Technical Details of Alter3
Natural Language Processing and Mapping
- Integrate a large language model: Use GPT-4 as the core language model and integrate it into the Alter3 robot.
- Language to action mapping: Alter3 uses the GPT-4 model to process natural language commands and map them to specific robot actions. Through a large language model, the robot is able to understand and execute complex language instructions.
Agent Framework
- Description: Adopts an “agent framework” for action planning, divided into two stages: a planning stage, in which GPT-4 determines the steps required to perform the task, followed by a coding stage, in which the specific API commands are generated.
- Planner: In the first stage, GPT-4 acts as a planner, analyzing the natural-language instruction and producing a detailed action plan.
- Coding Agent: In the second stage, the coding agent converts the action plan into API commands for the robot, as sketched below.
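The following is a minimal sketch of this two-stage pipeline, assuming the OpenAI Python SDK (openai>=1.0) and the hypothetical `set_axis` API from the earlier sketch; the paper's exact prompts are not reproduced here, so the prompt wording is a paraphrase.

```python
# Minimal sketch of the planner -> coding-agent pipeline described above.
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def plan_motion(instruction: str) -> str:
    """Stage 1: GPT-4 as planner -- break the task into movement steps."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Describe, step by step, how a humanoid robot's "
                        "head, arms, and torso should move to perform the task."},
            {"role": "user", "content": instruction},
        ],
    )
    return resp.choices[0].message.content

def generate_commands(plan: str, api_doc: str) -> str:
    """Stage 2: coding agent -- turn the plan into robot API calls."""
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Convert the movement plan into Python calls using "
                        "only this robot API:\n" + api_doc},
            {"role": "user", "content": plan},
        ],
    )
    return resp.choices[0].message.content

api_doc = "set_axis(axis: int, value: int)  # axis 0-42, value 0-255"
plan = plan_motion("Take a selfie with your right hand.")
print(generate_commands(plan, api_doc))
```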
Action Generation Protocol
- Natural language protocol: Uses natural-language prompting techniques such as Chain of Thought (CoT) to generate Python code that controls the robot, thereby achieving action generation.
- Diversity generation: Because GPT-4 is non-deterministic, the same input can produce different action patterns, increasing the diversity of generated actions (see the sketch below).
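A small sketch of how this diversity arises in practice: sampling the same instruction several times at a non-zero temperature yields different motion code each time. The prompt wording and the `set_axis` API are illustrative assumptions.

```python
# Sketch of the diversity property: repeated sampling at temperature > 0
# produces varied motion code for one instruction.
from openai import OpenAI

client = OpenAI()

COT_PROMPT = (
    "Think step by step about how the robot should move, "
    "then output Python code using set_axis(axis, value)."
)

def sample_motions(instruction: str, n: int = 3) -> list[str]:
    variants = []
    for _ in range(n):
        resp = client.chat.completions.create(
            model="gpt-4",
            temperature=0.8,  # non-zero temperature -> varied outputs
            messages=[
                {"role": "system", "content": COT_PROMPT},
                {"role": "user", "content": instruction},
            ],
        )
        variants.append(resp.choices[0].message.content)
    return variants  # n distinct motion patterns for one instruction

for code in sample_motions("Pretend to be a ghost."):
    print(code, "\n---")
```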
Language Feedback System
- Instant adjustment: Users can make instant adjustments to actions through verbal instructions (such as “raise your hands higher”), and the robot will modify the action code based on the feedback.
- Action Storage: Improved actions are stored in a JSON database with descriptive tags (e.g. “holding guitar”) for easy future retrieval and use.
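Below is a minimal sketch of such a JSON motion memory, assuming a flat tag-to-code schema and a file name of our own choosing (the article does not specify the database layout):

```python
# Sketch of the JSON motion memory: refined motions are stored under a
# descriptive tag and looked up later. File name and schema are assumptions.
import json
from pathlib import Path

MEMORY_FILE = Path("motion_memory.json")

def save_motion(tag: str, code: str) -> None:
    """Store an improved motion under a descriptive tag."""
    memory = json.loads(MEMORY_FILE.read_text()) if MEMORY_FILE.exists() else {}
    memory[tag] = code
    MEMORY_FILE.write_text(json.dumps(memory, indent=2))

def load_motion(tag: str) -> str | None:
    """Retrieve a previously refined motion by its tag."""
    if not MEMORY_FILE.exists():
        return None
    return json.loads(MEMORY_FILE.read_text()).get(tag)

save_motion("holding guitar", "set_axis(20, 180)\nset_axis(21, 90)")
print(load_motion("holding guitar"))
```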
Contextual Learning
- Description: GPT-4 uses its in-context learning capability to adapt to the robot's API. By providing a list of commands and examples, the model is able to map action steps to API commands.
- Examples and command lists: Example commands and explanations are included in the context to help the model generate accurate API commands, as in the sketch below.
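The sketch below shows what such an in-context prompt might look like; the command list and the worked example are illustrative, not Alter3's actual API documentation.

```python
# Sketch of the in-context prompt: the robot's command list plus a worked
# example are placed in the context so GPT-4 maps steps to API calls.
ICL_PROMPT = """You control a humanoid robot. Available commands:
  set_axis(axis: int, value: int)  # axis 0-42, value 0-255

Example:
  Step: "nod the head"
  Code:
    set_axis(0, 90)   # head pitch down
    set_axis(0, 128)  # head pitch back to neutral

Now convert the following step into code, using only the commands above.
Step: "{step}"
Code:"""

def build_prompt(step: str) -> str:
    return ICL_PROMPT.format(step=step)

print(build_prompt("raise the right arm"))
```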
Human Feedback and Adjustments
- Description: Supports human feedback, allowing users to fine-tune the robot's actions. The feedback is processed by another GPT-4 agent, which adjusts the action code and updates the execution sequence.
- Feedback processing: User feedback such as “raise your arms” is processed into code adjustments and stored in the database for future use (see the sketch below).
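A hedged sketch of this feedback agent, again assuming the OpenAI Python SDK; the prompt wording is our own paraphrase of the mechanism described above.

```python
# Sketch of the feedback loop: a second GPT-4 agent receives the current
# motion code and the user's verbal correction, and returns revised code.
from openai import OpenAI

client = OpenAI()

def refine_motion(code: str, feedback: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Revise the robot motion code according to the "
                        "user's feedback. Output only the updated code."},
            {"role": "user",
             "content": f"Current code:\n{code}\n\nFeedback: {feedback}"},
        ],
    )
    return resp.choices[0].message.content

updated = refine_motion("set_axis(20, 120)", "Raise your arm a bit higher.")
print(updated)  # in the full loop this would be persisted via save_motion
                # from the JSON motion-memory sketch above
```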
Emotional Expression
- Description: GPT-4's knowledge base supports emotion inference and expression. Alter3 can reflect emotions such as embarrassment and joy through body movements.
- Emotion inference: Even for text without explicit emotional content, the model can infer an appropriate emotion and reflect it in the robot's physical response.
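One plausible way to wire this up, sketched below: a GPT-4 call names the dominant emotion in an utterance, and a lookup table selects a matching gesture. The emotion-to-gesture mapping is purely illustrative and not taken from the paper.

```python
# Sketch of emotion inference: GPT-4 names the emotion in a dialogue turn,
# which then selects a gesture. Gesture mappings are illustrative only.
from openai import OpenAI

client = OpenAI()

GESTURES = {
    "joy": "set_axis(5, 200)   # arms up",
    "embarrassment": "set_axis(0, 60)   # head down",
    "surprise": "set_axis(3, 255)  # head back, eyes wide",
}

def infer_emotion(utterance: str) -> str:
    resp = client.chat.completions.create(
        model="gpt-4",
        messages=[
            {"role": "system",
             "content": "Name the single dominant emotion in the user's "
                        "message, in one lowercase word."},
            {"role": "user", "content": utterance},
        ],
    )
    return resp.choices[0].message.content.strip().lower()

emotion = infer_emotion("I tripped on stage in front of everyone...")
print(GESTURES.get(emotion, "set_axis(0, 128)  # neutral pose"))
```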
Multitasking and Applications
- Description: Alter3 can perform a variety of tasks, such as taking selfies, drinking tea, and imitating the movements of ghosts or snakes, demonstrating its potential in both daily tasks and complex scenarios.
- Example tasks: The system was experimentally tested on a range of tasks, including taking selfies and imitating actions.
External Storage and Memory
- External Memory Integration: Through the verbal feedback system, Alter3 stores action-improvement information in an external memory, which it consults when generating actions in the future.
- Body Schema: This external memory effectively acts as Alter3's body schema, allowing it to continually learn and improve its performance.
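A naive sketch of how the memory could serve as a body schema: stored motions whose tags overlap the new instruction are retrieved and injected into the prompt as in-context examples. The keyword-overlap matching is an assumption; the article does not specify a retrieval method.

```python
# Sketch of memory-as-body-schema: retrieve stored motions whose tags
# share words with the new instruction, to reuse as in-context examples.
import json
from pathlib import Path

MEMORY_FILE = Path("motion_memory.json")  # same file as the earlier sketch

def retrieve_similar(instruction: str, k: int = 2) -> list[tuple[str, str]]:
    if not MEMORY_FILE.exists():
        return []
    memory = json.loads(MEMORY_FILE.read_text())
    words = set(instruction.lower().split())
    scored = [(len(words & set(tag.lower().split())), tag, code)
              for tag, code in memory.items()]
    scored.sort(reverse=True)  # highest keyword overlap first
    return [(tag, code) for score, tag, code in scored[:k] if score > 0]

for tag, code in retrieve_similar("play the guitar while standing"):
    print(f"# remembered motion: {tag}\n{code}")
```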
Data and Model Training
- Description: GPT-4's training includes extensive language representations and action descriptions, supporting its application to robot control. The underlying model's knowledge base provides rich background knowledge, improving the robot's task-execution ability.
- Remaining challenges: Although the model performs well at high-level planning, the robot still faces challenges with basic physical tasks such as grasping objects, maintaining balance, and locomotion.
Evaluation and Results of Alter3
Action Generation Evaluation
- Evaluation method: Nine different generated actions were shown as videos, and participants rated the expressiveness of the robot's movements after watching them.
- Scoring criteria: A 5-point scale was used, with 1 the worst and 5 the best.
Participant Recruitment
- Recruitment Platform: 107 participants were recruited through the Prolific platform.
- Participant task: Participants watched the video and rated the expressiveness of the movements.
Video Categories
- Instant gestures: everyday and imitation actions such as taking a selfie, drinking tea, pretending to be a ghost, and pretending to be a snake.
- Sustained action scenarios: complex situations such as eating someone else's popcorn at the cinema, or reliving the emotions of an old survival story while jogging in the park.
Control Group Setting
- Random actions: Randomly generated actions served as the control group; their action labels were generated by GPT-4.
- Control videos: Three random-action control videos were inserted among the videos that participants watched.
Statistical Analysis
- Friedman test: Used to test for significant differences among the video ratings; the results showed that ratings differed significantly across videos.
- Nemenyi test: Post-hoc analysis showed that the control-group videos differed significantly in ratings from the other videos (p ≤ 0.001).
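For readers who want to reproduce this style of analysis, the sketch below runs a Friedman test with SciPy and a Nemenyi post-hoc test with the scikit-posthocs package on synthetic ratings; the study's raw data is not included in the article, so the numbers here are made up.

```python
# Friedman + Nemenyi analysis on synthetic 1-5 ratings, mirroring the
# study's design: 9 GPT-4-generated videos, 3 random-control videos,
# 107 raters. All data below is synthetic.
import numpy as np
import pandas as pd
import scikit_posthocs as sp
from scipy.stats import friedmanchisquare

rng = np.random.default_rng(0)
n_raters = 107  # matches the study's participant count

gpt4 = rng.integers(3, 6, size=(n_raters, 9))     # skewed high
control = rng.integers(1, 4, size=(n_raters, 3))  # skewed low
ratings = pd.DataFrame(
    np.hstack([gpt4, control]),
    columns=[f"gpt4_{i}" for i in range(9)] + [f"rand_{i}" for i in range(3)],
)

# Friedman test: do the 12 videos differ in their rating distributions?
stat, p = friedmanchisquare(*[ratings[c] for c in ratings.columns])
print(f"Friedman chi2={stat:.2f}, p={p:.2e}")

# Nemenyi post-hoc: pairwise p-values between videos.
print(sp.posthoc_nemenyi_friedman(ratings))
```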
Summary of Results
- Scoring results: Actions generated by GPT-4 scored significantly higher than those of the random-action control group, indicating that GPT-4-generated actions are more expressive.
- Action diversity: GPT-4 can generate diverse actions, from everyday gestures to complex scenarios, and can express emotions such as embarrassment and happiness.
- Emotional expression: Through GPT-4, Alter3 can understand and reflect the emotions in conversational content; even when emotions are not stated explicitly, they can be inferred and reflected in its actions.
Summary of Main Assessment Findings of Alter3
Integrating GPT-4 into the Alter3 robot enabled zero-shot learning, spontaneous action generation, and language-feedback optimization, significantly improving the robot's expressiveness and naturalness. Alter3 can express emotions and generate a wide variety of actions, demonstrating the great potential of large language models in robotics. This research not only provides new ideas for the development of robotics but also lays a foundation for more natural and human-like human-robot interaction.
Conclusion
Zero-shot learning capability
No pre-training required: Alter3 can generate natural and diverse actions through GPT-4 without task-specific programming or training. This indicates that GPT-4's training data already contains rich action descriptions, which supports the direct generation of complex actions.
Language feedback optimization
Instant adjustment: Through the language feedback system, users can instantly adjust and improve Alter3's movements. This feedback mechanism allows the robot to continuously learn and optimize its motion performance, enhancing its interaction with humans.
Emotional expression
Rich emotional expression: Alter3 can not only imitate everyday human actions but also express emotions through facial expressions and body movements. Whether an emotion is stated directly or must be inferred from context, Alter3 can reflect the emotional state accurately through GPT-4.
Wide application potential
Universality: The system can be applied to any humanoid robot with only minor modifications. This universality gives it broad potential in various robotic applications.
Study Findings
- Significant advantages: Evaluation results show that the actions generated by GPT-4 are significantly better than randomly generated actions in terms of expressiveness and naturalness, demonstrating the powerful capabilities of large language models in robot action generation.
- Complex scenario simulation: Alter3 is able to generate a variety of actions from simple daily actions to complex scenario simulations, demonstrating its broad application prospects in robot control and human-computer interaction.