R&D Project: Make Unreal Engine’s Metahuman Characters Respond And Act Using AI Including Language Models

The objective of the project was to make Unreal Engine’s Metahuman characters respond and act using AI, including language models.

This would create AI-driven non-player characters (NPCs) in games or characters in 3D applications, including augmented and virtual reality, who can offer new interactions each time they’re spoken to.

These characters were equipped with specific information and personalities that we assigned, enabling them to act and engage in conversations based on those characteristics.

  • Objective: Utilize AI, including language models, to animate Unreal Engine’s Metahuman characters.
  • Application: Create AI-driven NPCs for games and characters for 3D applications across AR and VR, enabling unique interactions upon each engagement.
  • Character Design: Assign specific information and personalities to these characters (a minimal sketch of such a profile follows this list).
  • Interaction: Characters will converse and act based on the traits and knowledge provided to them.
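
As an illustration of what "specific information and personalities" can look like in practice, here is a minimal, tool-agnostic sketch of a character profile expressed as plain data. The field names are our own and purely illustrative; each of the platforms discussed below expects its own format.

```python
# A minimal, tool-agnostic character brief. Field names are illustrative only;
# each platform covered below (Inworld AI, MetahumanSDK, Audio2Face pipelines)
# expects its own format, so in practice this data would be mapped accordingly.
from dataclasses import dataclass


@dataclass
class CharacterProfile:
    name: str
    role: str                       # e.g. "virtual museum guide NPC"
    personality: list[str]          # core traits the character should express
    flaws: list[str]                # imperfections that make dialogue believable
    motivation: str                 # what drives the character's behaviour
    knowledge: list[str]            # facts the character is allowed to draw on
    dialogue_style: str = "formal"  # e.g. "formal" or "entertaining"
    voice_id: str = "default"       # identifier of the preferred synthetic voice


guide = CharacterProfile(
    name="Ada",
    role="virtual museum guide",
    personality=["curious", "patient"],
    flaws=["easily sidetracked by trivia"],
    motivation="help visitors discover the collection",
    knowledge=["exhibit descriptions", "opening hours"],
    dialogue_style="entertaining",
    voice_id="female_uk_01",
)
```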

Inworld AI

Inworld AI is a third-party tool for enhancing player interaction through AI NPCs. This tool represents a significant evolution in player engagement, offering interactive, realistic NPCs that enrich the gaming experience. By integrating the Inworld Character Engine with Unreal Engine 5, creators can craft custom AIs tailored to their game’s requirements and the unique traits of each AI character.

In the Inworld AI studio, we had the flexibility to create an AI avatar without being limited to predefined models; the tool is fully compatible with any Metahuman avatar. We could define the character’s core traits, including their flaws and motivations. The studio allows for the selection of various dialogue styles, ranging from formal to entertaining, and even lets us pick a preferred voice for the character, enabling a highly personalized and engaging AI creation process.

Once the AI character is configured, it can be connected to a Metahuman within the Unreal Engine scene. This allows for interactive conversations with the Metahuman, who will react and respond based on their designed personality and knowledge base.
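
To make that round trip concrete, here is a schematic sketch of the request/response loop such a character service performs each time the player speaks. The endpoint URL, payload fields and response shape are placeholders of our own, not Inworld’s real API; in practice the Inworld plugin for Unreal Engine 5 handles this exchange for you.

```python
# Schematic chat round trip for an Inworld-style character service.
# The URL, payload fields and response shape below are placeholders,
# NOT Inworld's actual API; the Unreal plugin normally handles this loop.
import requests

CHARACTER_SERVICE_URL = "https://example.com/characters/ada/chat"  # placeholder
API_KEY = "YOUR_API_KEY"  # issued by the service provider


def talk_to_character(player_utterance: str, session_id: str) -> dict:
    """Send the player's line and return the character's reply payload."""
    response = requests.post(
        CHARACTER_SERVICE_URL,
        headers={"Authorization": f"Bearer {API_KEY}"},
        json={"session": session_id, "text": player_utterance},
        timeout=10,
    )
    response.raise_for_status()
    # A typical reply carries the spoken text, synthesised audio (or a stream
    # handle), and facial-animation data used to drive the Metahuman.
    return response.json()


if __name__ == "__main__":
    reply = talk_to_character("Tell me about this exhibit.", "demo-session")
    print(reply.get("text"))
```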

Pros:

  • Quick and straightforward development process.
  • Extensive customization options for characters.
  • High-quality animations synced with audio.
  • No need to manage the backend language model; focus on AI personality.

Cons:

  • Dependency on a third-party service.
  • Potential data privacy concerns.
  • Costs associated with commercial use or exceeding a certain number of conversations.
  • Advisable to discuss data handling and pricing directly with the provider.

MetahumanSDK

MetahumanSDK offers a lip-sync plugin for Unreal Engine 5, enabling multilingual, realistic lip synchronization animations from audio and text inputs. It supports a range of facial expressions and rigs compatible with Live Link Face (ARKit, FACS), alongside voice synthesis connectivity modules. The plugin includes demo projects featuring ChatGPT integration for enhanced interactivity. For further details and examples, see the official documentation and the example levels included with the plugin.
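
Inside Unreal Engine the plugin is driven from Blueprints or C++, so the Python below is only a plain sketch of the chain that the ChatGPT demo project wires together: prompt a language model for the reply, synthesise speech, then request a matching lip-sync animation. Every function here is a placeholder for the corresponding plugin or service call, not MetahumanSDK’s actual API.

```python
# Illustrative LLM-to-lip-sync chain, similar in shape to the ChatGPT demo
# project. All helpers are placeholders for the real plugin/service calls,
# which are made from Blueprints or C++ inside Unreal Engine.


def generate_reply(prompt: str) -> str:
    """Ask the chosen LLM (ChatGPT or another model) for the character's reply."""
    raise NotImplementedError("call your LLM API here")


def synthesise_speech(text: str) -> bytes:
    """Convert the reply into audio with a text-to-speech service."""
    raise NotImplementedError("call your TTS service here")


def request_lip_sync(audio: bytes) -> dict:
    """Request a facial animation that matches the audio."""
    raise NotImplementedError("call the audio-to-lip-sync endpoint here")


def respond(player_line: str) -> dict:
    text = generate_reply(player_line)
    audio = synthesise_speech(text)
    animation = request_lip_sync(audio)
    # Unreal Engine then plays the audio and applies the animation
    # to the Metahuman character.
    return {"text": text, "audio": audio, "animation": animation}
```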

Pros:

  • Integrates with ChatGPT or any LLM for generating text.
  • Converts generated text into voice, lip-sync and facial animations.

Cons:

  • Not free, even for testing purposes.
  • Animation quality is inferior to that of Inworld AI.
  • Generated voice sounds computerized and robotic.

NVIDIA Omniverse’s Audio2Face

NVIDIA Omniverse’s Audio2Face beta simplifies animating 3D characters’ facial expressions to align with any voice track, suitable for games, films, and more. It utilizes Universal Scene Description (USD) for both interactive and traditional animation tasks.

Starting is easy with a preloaded character model, “Digital Mark,” animated by uploading an audio file. The audio drives facial animations through a Deep Neural Network in real-time, with options to fine-tune the performance through post-processing parameters.

The showcased results are primarily direct from Audio2Face, with minimal adjustments.

We can generate text using any LLM model, convert that text to speech with a suitable tool, and then input this audio into the Audio2Face package for lip-sync animation.

This animation can be integrated into Unreal Engine using Live Link for realistic character interactions.
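
As a small worked example of the hand-off between these steps, the sketch below turns an LLM-generated reply into a mono 16-bit PCM WAV file of the kind Audio2Face can animate from. The text-to-speech call is a placeholder, the sample rate is our own assumption, and Audio2Face’s loading or streaming interfaces are not reproduced here.

```python
# Sketch: prepare an audio clip for Audio2Face from LLM-generated text.
# The TTS call is a placeholder. Audio2Face then animates the face from the
# resulting WAV, and Live Link carries the animation into Unreal Engine.
import wave

SAMPLE_RATE = 22050  # assumed rate; plain mono PCM is a safe interchange format


def synthesise_pcm(text: str) -> bytes:
    """Return raw 16-bit mono PCM for the given text (placeholder TTS call)."""
    raise NotImplementedError("call your chosen text-to-speech engine here")


def write_wav_for_audio2face(text: str, path: str) -> None:
    pcm = synthesise_pcm(text)
    with wave.open(path, "wb") as wav:
        wav.setnchannels(1)           # mono
        wav.setsampwidth(2)           # 16-bit samples
        wav.setframerate(SAMPLE_RATE)
        wav.writeframes(pcm)


# write_wav_for_audio2face("Welcome to the gallery.", "reply.wav")
# The WAV is then loaded into Audio2Face, which drives the facial animation
# that Live Link streams onto the Metahuman in Unreal Engine.
```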

Conclusion

Given the current landscape, venturing into creating a bespoke tool similar to Inworld AI requires substantial investment in time, financial resources, and specialized knowledge. Thus, engaging with existing third-party solutions presents a more feasible approach.

Establishing a dialogue with these service providers to agree on critical aspects such as data storage, localization, and pricing becomes imperative.

While MetahumanSDK may not offer the highest quality of animation, and Audio2Face lacks the capability for real-time generation of audio and animation, the field is ripe with ongoing research into open-source alternatives that may soon provide viable options.

As we navigate through the infancy of this technology, it’s reasonable to expect the emergence of more sophisticated and accessible solutions in the near future, broadening the horizon for innovative applications and developments.

Contact us

Are you looking to create an AI training module that uses avatars for effective learning and engagement? Then feel free to drop us an email at info@sbanimation.com, or give us a call on +44 (0)207 148 0526. We would be happy to help.

If you are unfamiliar with the production process, our blog post What to expect when you work with Sliced Bread might help: it provides a complete guide to how we typically approach our projects, from concept to final delivery. It also explains how we structure our fees and plan the production schedule.

We also post industry-related content to our LinkedIn company page, so why not give us a follow?
