GameGen-O is a Diffusion Transformer model designed to generate open-world video game content. It produces high-quality game elements, including characters, environments, actions, and events, and offers interactive control over the results.
Put simply, it can create open-world content in the vein of Grand Theft Auto or The Legend of Zelda by automatically generating characters, scenes, actions, and events. Beyond generating game footage, it responds to player instructions, such as moving a character or changing the weather. Trained on a large corpus of game videos, GameGen-O lets developers build complex game worlds faster, reducing manual design work and improving development efficiency.
Key Features of GameGen-O
Open World Generation
Automatically generates the elements of an open-world game, including characters, environments, actions, and events. It simulates many functions of a game engine and produces high-quality game content.
Multimodal Interactive Control
Generation can be steered through text instructions, operation signals (such as keyboard input), and video prompts. Players or developers can influence character actions, environmental changes, and more by supplying different inputs (a minimal sketch of such inputs follows).
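As a rough illustration of what these control inputs might look like before being encoded for the model, here is a minimal Python sketch. The field names and example values are hypothetical; GameGen-O's actual input format is not described at this level of detail:

```python
from dataclasses import dataclass
from typing import Optional

@dataclass
class ControlInput:
    """Hypothetical container for one step of multimodal control.

    Only illustrates the three input kinds described above; the
    real model's conditioning format may differ.
    """
    text_instruction: Optional[str] = None   # e.g. "move the character to the left"
    operation_signal: Optional[str] = None   # e.g. "KEY_A" for a left-move keypress
    video_prompt_path: Optional[str] = None  # e.g. a clip showing a target object

# A player pressing "move left" while asking for a weather change:
step = ControlInput(text_instruction="make it start raining",
                    operation_signal="KEY_A")
```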
Data-Driven Generation
The model was trained on OGameData, a large open-world game dataset of more than 32,000 videos, from which high-quality clips were selected for generation and interactive-control training.
High-Quality Image Generation
It generates detailed game visuals, such as environments across different seasons, varied weather phenomena, and complex character movement, enhancing the game's visual experience.
Game Element Generation
This covers characters (such as Geralt from The Witcher), scenes (such as spring and autumn landscapes), actions (such as riding and flying), and events (such as rain, sunrise, and storms).
Technical Methods of GameGen-O
1. Dataset Construction (OGameData)
- Data collection: More than 32,000 videos were gathered from the Internet and game engines. After manual and automated model-based screening, about 15,000 high-quality videos were retained, amounting to more than 4,000 hours of footage.
- Data processing: The videos are cut into short clips and screened by criteria such as optical-flow analysis, aesthetic scoring, and semantic content (a motion-screening sketch follows this list). A multimodal model then adds structured annotations, providing richer training data.
- Decoupled annotation: Content changes within a video are annotated separately and precisely, strengthening the model's interactive control over different scenes and events.
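To make the optical-flow screening concrete, here is a minimal sketch using OpenCV's Farneback optical flow to score a clip's average motion. The sampling stride and any keep/discard thresholds are illustrative assumptions, not OGameData's actual criteria:

```python
import cv2
import numpy as np

def mean_motion(video_path: str, stride: int = 5) -> float:
    """Average optical-flow magnitude across sampled frame pairs.

    Clips scoring too low (static menus, loading screens) or too
    high (scene cuts) could be filtered out on this number.
    """
    cap = cv2.VideoCapture(video_path)
    ok, prev = cap.read()
    if not ok:
        return 0.0
    prev_gray = cv2.cvtColor(prev, cv2.COLOR_BGR2GRAY)
    scores = []
    while True:
        for _ in range(stride):                 # sample sparsely for speed
            ok, frame = cap.read()
            if not ok:
                break
        if not ok:
            break
        gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
        flow = cv2.calcOpticalFlowFarneback(
            prev_gray, gray, None, 0.5, 3, 15, 3, 5, 1.2, 0)
        scores.append(np.linalg.norm(flow, axis=-1).mean())
        prev_gray = gray
    cap.release()
    return float(np.mean(scores)) if scores else 0.0

# keep = LOW < mean_motion("clip.mp4") < HIGH   # thresholds are assumptions
```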
2. Foundation Model Pre-training
- VAE compression: Videos are compressed with a 2+1D variational autoencoder (Magvit-v2), adapted to the gaming domain. This lets the model handle videos of varying frame rates and resolutions and lays the groundwork for the generation tasks that follow (a minimal 2+1D block is sketched after this list).
- Text-to-video generation and continuation: A masked attention mechanism enables both text-to-video generation and video continuation, i.e., generating subsequent content from an existing clip.
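The "2+1D" design factorizes spatiotemporal compression into a per-frame spatial step followed by a temporal step. A minimal PyTorch sketch of one such downsampling block follows; the layer sizes are arbitrary, and Magvit-v2's actual architecture is considerably more elaborate:

```python
import torch
import torch.nn as nn

class Encoder2Plus1D(nn.Module):
    """Factorized (2+1)D downsampling: Conv2d per frame, then Conv1d
    over time. A sketch only; not Magvit-v2's real layer stack."""

    def __init__(self, c_in: int, c_out: int):
        super().__init__()
        self.spatial = nn.Conv2d(c_in, c_out, kernel_size=3, stride=2, padding=1)
        self.temporal = nn.Conv1d(c_out, c_out, kernel_size=3, stride=2, padding=1)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, t, h, w = x.shape
        # Spatial conv, applied to every frame independently.
        x = x.transpose(1, 2).reshape(b * t, c, h, w)
        x = self.spatial(x)
        c2, h2, w2 = x.shape[1:]
        # Temporal conv, applied to every spatial position independently.
        x = x.reshape(b, t, c2, h2, w2).permute(0, 3, 4, 2, 1)
        x = x.reshape(b * h2 * w2, c2, t)
        x = self.temporal(x)
        t2 = x.shape[-1]
        x = x.reshape(b, h2, w2, c2, t2).permute(0, 3, 4, 1, 2)
        return x  # (b, c2, t2, h2, w2): halved in space and time

block = Encoder2Plus1D(3, 64)
z = block(torch.randn(1, 3, 8, 64, 64))  # -> torch.Size([1, 64, 4, 32, 32])
```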
3. Instruction Tuning
- InstructNet: An InstructNet branch is added on top of the pretrained model. Driven by multimodal inputs (text instructions, operation signals, and video prompts), it controls the generation of subsequent video: InstructNet predicts future content from the current video and adjusts it according to the input signals (a structural sketch follows this list).
- Multimodal input: InstructNet accepts several input modes, including structured text instructions (such as "move the character to the left"), operation signals (such as a keyboard "move left" command), and video prompts (such as a target object shown in a clip).
- Interactive control: At inference time, GameGen-O takes these inputs and generates controlled new game content, providing interactivity and continuity.
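Structurally, this resembles ControlNet-style conditioning: a trainable branch encodes the control signals and injects them into a frozen pretrained backbone as residuals. The PyTorch sketch below shows that pattern; the module names and the fuse-by-addition scheme are assumptions for clarity, not the paper's exact InstructNet design:

```python
import torch
import torch.nn as nn

class InstructBranch(nn.Module):
    """Illustrative control branch: project and fuse text, operation,
    and video-prompt embeddings, then add them as a residual to a
    frozen backbone's hidden states. Fusion-by-addition and the
    zero-initialized output are assumptions, not the paper's design."""

    def __init__(self, dim: int):
        super().__init__()
        self.text_proj = nn.Linear(dim, dim)    # text-instruction embedding
        self.op_proj = nn.Linear(dim, dim)      # keyboard/operation signal
        self.video_proj = nn.Linear(dim, dim)   # video-prompt features
        self.out = nn.Linear(dim, dim)
        nn.init.zeros_(self.out.weight)         # branch starts as an identity,
        nn.init.zeros_(self.out.bias)           # preserving pretrained behavior

    def forward(self, hidden, text_emb, op_emb, video_emb):
        control = (self.text_proj(text_emb)
                   + self.op_proj(op_emb)
                   + self.video_proj(video_emb))
        return hidden + self.out(control)       # frozen backbone sees a residual

# Hidden states from the (frozen) pretrained model, plus per-modality
# embeddings assumed to come from separate pretrained encoders:
branch = InstructBranch(dim=512)
h = branch(torch.randn(1, 16, 512), torch.randn(1, 16, 512),
           torch.randn(1, 16, 512), torch.randn(1, 16, 512))
```

Zero-initializing the output projection means tuning starts from the pretrained model's unmodified behavior, which is a common way to add a control branch without disrupting a frozen backbone.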
4. Model Architecture
- Latte and OpenSora frameworks: The overall architecture builds on Latte and OpenSora V1.2. Latte provides a flexible design for the generative model, while OpenSora optimizes the video-generation pipeline.
- Masked attention mechanism: Handles the long token sequences of video generation, letting the model better understand and generate open-domain game video content (see the sketch below).
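One plausible reading of how masked attention supports continuation: tokens of the frames being generated attend to the given prefix frames and to each other, while prefix tokens attend only among themselves. A small sketch of building such a boolean mask (illustrative of the mechanism, not GameGen-O's exact masking):

```python
import torch

def continuation_mask(n_prefix: int, n_new: int) -> torch.Tensor:
    """Boolean attention mask over prefix + new frame tokens (True = attend).

    Prefix tokens attend only among themselves, so the given clip
    acts purely as conditioning; tokens being generated attend both
    to the prefix and to each other to continue it coherently.
    """
    n = n_prefix + n_new
    mask = torch.zeros(n, n, dtype=torch.bool)
    mask[:n_prefix, :n_prefix] = True   # prefix <-> prefix
    mask[n_prefix:, :] = True           # new -> prefix and new <-> new
    return mask

# e.g. 4 prefix-frame tokens and 4 tokens to generate; usable as an
# attn_mask for torch.nn.functional.scaled_dot_product_attention
m = continuation_mask(4, 4)
```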
5. Generative Capabilities
- Character generation: Diverse game characters, such as wizards, warriors, and robots.
- Environment generation: Complex environments with changing seasons and varied terrain (lakes, oceans, forests, etc.).
- Action generation: Character actions such as riding, driving, flying, and shooting.
- Event generation: Dynamic events such as rain, thunder, sunrise, and storms.
Some Examples of GameGen-O
Project page and more demos: https://gamegen-o.github.io/