VideoDoodles: Animate Your Videos with Hand-Drawn Sketches

What is VideoDoodles good for?

Make videos more engaging: You can add creative hand-drawn content, such as a funny little character or a rainbow, to make a video more attractive.

Explain content more clearly: In an instructional video, for example, hand-drawn animation can highlight the key points and make it easier for the audience to follow what you want to express.

Easy to use: Whether you are a novice or a professional, you can quickly get started and use it to create professional animation effects.

System Advantages of VideoDoodles

Simplified creation of 3D effects: The system combines 2D drawing with 3D computer vision, so even users without 3D modeling experience can create 3D animations with perspective and occlusion effects.

Beginner-friendly: One of the design goals of the system is to lower the learning barrier, so that users with no animation experience can quickly get started and create complex animation effects.

Balance between user control and automation: While providing powerful automation, the system also lets users precisely control animations through keyframes to achieve their creative intent.

Key Features of VideoDoodles

3D Flat Canvas

The system allows users to place 3D flat canvases in the video scene and perform hand-drawn animations on these canvases. These canvases are positioned according to the 3D scene of the video. Users can operate them through a simple 2D interface, and the system automatically handles the position and orientation of the canvas in 3D space. In other words, these canvases will automatically adjust according to the perspective and object movement in the video to ensure the correct position and angle of the animation.

Support for dynamic and static canvases:

VideoDoodles supports placing static canvases (fixed in one position) and dynamic canvases (following moving objects) in the scene. Users can choose different types of canvases according to their needs to achieve various animation effects.

Automatic tracking and perspective correction

The system has powerful 3D tracking capabilities that allow the canvas to align with static or dynamic objects in the video. Whether the camera moves or the object moves, the system will automatically adjust the perspective of the canvas, making the animation look as natural as if it is embedded in the video scene.

Users can anchor the canvas to static or dynamic objects in the video, so the animation follows the movement and rotation of the object and keeps a natural, realistic look.

Occlusion processing

VideoDoodles can automatically handle occlusion effects. When objects in the video block the animation, the system will intelligently handle the occlusion to ensure that the animation is visually consistent with the video content.

Multiple 3D motion effects: Through 3D tracking and rotation control of the canvas, the system can achieve a variety of complex 3D animation effects, such as object rotation, dynamic scaling, etc., making the animation effects more expressive.

Fast creation and efficient output

Get started quickly: Users can create complex animation effects in a short time without a complicated learning process, which is suitable for rapid content creation on social media.

Efficient output: Whether it is a static or dynamic canvas, the system can quickly generate and render the final animation effect, meeting the needs of professional users for efficient creation.

Easy-to-use 2D drawing interface

Simplified 2D drawing interface: Users can draw directly on the 2D interface without worrying about perspective distortion. The system will automatically embed the drawing content into the 3D scene in the video and generate the correct perspective effect.

Balance between automation and manual control: The system completes tracking and perspective correction automatically in most cases, so the user only needs to provide a small number of keyframes. Keyframes control the movement trajectory and orientation of the canvas. If automatic tracking does not meet the requirements, the user can adjust or add keyframes to refine the animation.

Professional function expansion

Although the system is optimized for beginners, it still retains some features suitable for professional users:

Precise keyframe control: The system allows users to precisely control the movement trajectory and direction of the canvas through keyframes, thereby achieving more complex animation effects.

Integration with existing tools: The system can be used with existing video editing tools to provide users with more creative freedom and possibilities.


Technical Methods of VideoDoodles

1. 3D scene reconstruction and preprocessing

The system first constructs a rough 3D model of the scene by extracting depth information and motion data from the video. This reconstruction allows the system to understand the relative positions and directions of movement of objects in the video.

Concretely, the input video is reconstructed in 3D by computing the camera pose, depth map, and optical flow of each frame. This information is obtained with existing computer vision methods and provides the 3D geometry needed for subsequent animation embedding and tracking.

We use Robust Consistent Video Depth Estimation, a method to generate consistent depth reconstructions of 3D scenes. This method first uses tools such as COLMAP to estimate the camera pose, then combines it with a depth prediction network to compute single-frame depth maps, and finally aligns these depth maps to a consistent world coordinate system through geometric optimization.
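To illustrate why a per-frame camera pose and depth map are enough to recover 3D geometry, here is a minimal Python sketch that lifts one pixel into world coordinates with a pinhole camera model. The function name `unproject_pixel`, the intrinsics matrix `K`, and the camera-to-world matrix convention are assumptions made for this example; they are not taken from the VideoDoodles code.

```python
import numpy as np

def unproject_pixel(u, v, depth, K, cam_to_world):
    """Lift pixel (u, v) with a depth along the camera z-axis to a 3D world point.

    K is a 3x3 pinhole intrinsics matrix and cam_to_world a 4x4 rigid transform
    (both are assumed conventions for this sketch).
    """
    pixel_h = np.array([u, v, 1.0])
    ray_cam = np.linalg.inv(K) @ pixel_h      # camera-space direction with z = 1
    point_cam = ray_cam * depth               # scale so that z equals the depth
    point_h = np.append(point_cam, 1.0)       # homogeneous coordinates
    return (cam_to_world @ point_h)[:3]

# Hypothetical camera: focal length 500 px, principal point at (320, 240),
# positioned 1 unit along the world x-axis with no rotation.
K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
cam_to_world = np.eye(4)
cam_to_world[:3, 3] = [1.0, 0.0, 0.0]

point = unproject_pixel(320, 240, 2.0, K, cam_to_world)
print(point)
```

The center pixel looks straight down the optical axis, so a depth of 2 places the point 2 units in front of the camera.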

2. 3D tracking algorithm for flat canvas

Users can place flat canvases in the 3D scene, which are equivalent to the "drawing board" for drawing hand-drawn animations in the video. The system supports placing these canvases on static or dynamic objects and automatically adjusts them according to the depth and perspective effects of the scene.

Keyframe-driven 3D tracking: The user places a flat canvas in one frame of the video and sets its position and orientation. The system then infers the 3D trajectory of the canvas in the other frames. To do so, it treats the user-specified keyframes as hard constraints and uses the scene motion and depth information from the preprocessing stage to compute the 3D position of the canvas in each frame through optimization.

3D trajectory optimization based on scene flow: The system uses scene flow estimation to preliminarily determine the 3D trajectory of the canvas, and then further stabilizes and refines the trajectory using the Poisson integration method to reduce jitter and drift.
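The idea of integrating per-frame motion while honoring keyframes can be sketched as a small least-squares problem. The Python example below is a simplified illustration, not the system's actual solver: it assumes one 3D displacement per frame (standing in for scene flow) and finds positions whose frame-to-frame differences match those displacements as closely as possible, with keyframed positions held fixed. The name `integrate_trajectory` and the input layout are hypothetical.

```python
import numpy as np

def integrate_trajectory(velocities, keyframes):
    """Least-squares (1D Poisson) integration of per-frame displacements.

    velocities: (T-1, 3) array, estimated displacement from frame t to t+1.
    keyframes:  dict {frame_index: xyz}, enforced exactly (at least one needed).
    Returns a (T, 3) array of positions.
    """
    velocities = np.asarray(velocities, dtype=float)
    T = len(velocities) + 1

    # Finite-difference operator: (D @ p)[t] = p[t+1] - p[t].
    D = np.zeros((T - 1, T))
    idx = np.arange(T - 1)
    D[idx, idx] = -1.0
    D[idx, idx + 1] = 1.0

    fixed = sorted(keyframes)
    free = [t for t in range(T) if t not in keyframes]
    positions = np.zeros((T, 3))
    for t in fixed:
        positions[t] = keyframes[t]

    if free:
        # Move the known keyframe terms to the right-hand side, then solve
        # for the remaining frames in the least-squares sense.
        rhs = velocities - D[:, fixed] @ positions[fixed]
        solution, *_ = np.linalg.lstsq(D[:, free], rhs, rcond=None)
        positions[free] = solution
    return positions

# The flow says "1 unit per frame along x", but the keyframes are 8 units apart.
velocities = np.tile([1.0, 0.0, 0.0], (4, 1))
keyframes = {0: [0.0, 0.0, 0.0], 4: [8.0, 0.0, 0.0]}
trajectory = integrate_trajectory(velocities, keyframes)
print(trajectory[:, 0])
```

In this toy case the mismatch between the flow and the keyframes is spread evenly over all frames rather than accumulating as drift at the end, which is the stabilizing behavior the paragraph above describes.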


3. Orientation tracking of dynamic canvas

VideoDoodles uses a custom tracking algorithm to associate the canvas with an object in the video. The algorithm tracks the movement and rotation of the object, so the canvas can update its position and angle and the animation stays consistent with the video content. Tracking covers not only simple translation but also rotation and scaling.

The system supports dynamic canvases, that is, canvases that change as the object in the video moves. To achieve this, the system optimizes the rotation matrix of the canvas according to the orientation keyframes set by the user and the direction of movement in the scene, so that the canvas stays consistent with the movement trajectory of the object.

VideoDoodles leverages existing computer vision techniques, such as depth estimation and object tracking, to obtain scene information in videos. This information is used to generate accurate 3D reconstruction and scene perception, thus enabling more accurate animation placement and tracking.

Pose optimization: The system uses a manifold-based trust-region algorithm (Riemannian Trust Region) to optimize the rotation matrix so that the orientation of the canvas smoothly follows the direction of movement in the scene. Users can also fine-tune the orientation by setting keyframes.

4. Occlusion processing

The system can identify the occlusion relationship between different objects in the video and automatically handle the occlusion effect of hand-drawn animation. This means that the animation on the canvas will be occluded by the foreground object when necessary, enhancing the realism of the animation and video scene.
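One common way to obtain this kind of occlusion is a per-pixel depth test. The following Python sketch assumes that the scene depth map and the rendered canvas depth are available at the same resolution; the function name `composite_with_occlusion` and the array layout are assumptions for illustration and may differ from how VideoDoodles composites its output.

```python
import numpy as np

def composite_with_occlusion(frame, scene_depth, canvas_rgba, canvas_depth):
    """Blend a rendered canvas over a video frame, hiding occluded pixels.

    frame:        (H, W, 3) float RGB in [0, 1]
    scene_depth:  (H, W) depth of the video scene per pixel
    canvas_rgba:  (H, W, 4) float RGBA rendering of the doodle
    canvas_depth: (H, W) depth of the canvas plane per pixel
    """
    # The doodle is visible only where the canvas is closer than the scene.
    visible = canvas_depth < scene_depth
    alpha = canvas_rgba[..., 3:4] * visible[..., None]
    return alpha * canvas_rgba[..., :3] + (1.0 - alpha) * frame

# Two-pixel example: a red doodle at depth 2 over a black frame.
frame = np.zeros((1, 2, 3))
scene_depth = np.array([[1.0, 3.0]])      # left pixel: foreground object at depth 1
canvas_depth = np.full((1, 2), 2.0)
canvas_rgba = np.zeros((1, 2, 4))
canvas_rgba[..., 0] = 1.0                 # red
canvas_rgba[..., 3] = 1.0                 # fully opaque

result = composite_with_occlusion(frame, scene_depth, canvas_rgba, canvas_depth)
print(result)
```

The left pixel stays black because the foreground object is in front of the canvas, while the right pixel shows the doodle.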

5. User interaction and 2D drawing interface

VideoDoodles provides an intuitive 2D interface where users can directly manipulate the canvas in the 3D scene. By simply dragging and rotating, users can set the position and angle of the animation. In addition, the system also supports real-time preview, allowing users to see the animation effect immediately.

Although the system performs complex 3D calculations in the background, users only need to draw through a simple 2D interface on the front end. The system aligns the canvas with the video content and displays perspective correction and occlusion effects in real time during the drawing process. Users can directly draw frame-by-frame animations in this interface without having to consider complex 3D transformations.

Automatic perspective correction: When users draw on the 2D interface, the system will automatically correct the content on the canvas to be consistent with the perspective of the video scene.
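The perspective effect itself follows from projecting the 3D canvas into the image. As a hedged sketch, the Python code below builds the four corners of a square canvas from an assumed center and two in-plane axes, then projects them with a pinhole camera. The helper names `canvas_corners` and `project_points`, the intrinsics `K`, and the world-to-camera convention are hypothetical.

```python
import numpy as np

def canvas_corners(center, x_axis, y_axis, half_size):
    """Return the 4 corners of a square canvas as a (4, 3) array of world points."""
    center = np.asarray(center, dtype=float)
    x_axis = np.asarray(x_axis, dtype=float)
    y_axis = np.asarray(y_axis, dtype=float)
    signs = [(-1, -1), (1, -1), (1, 1), (-1, 1)]
    return np.array([center + half_size * (sx * x_axis + sy * y_axis)
                     for sx, sy in signs])

def project_points(points_world, K, world_to_cam):
    """Project (N, 3) world points to (N, 2) pixel coordinates."""
    points_world = np.asarray(points_world, dtype=float)
    ones = np.ones((len(points_world), 1))
    points_cam = (world_to_cam @ np.hstack([points_world, ones]).T).T[:, :3]
    pixels_h = (K @ points_cam.T).T
    return pixels_h[:, :2] / pixels_h[:, 2:3]   # perspective divide

K = np.array([[500.0, 0.0, 320.0],
              [0.0, 500.0, 240.0],
              [0.0, 0.0, 1.0]])
world_to_cam = np.eye(4)

# A canvas 4 units in front of the camera, turned so its right edge is farther away.
corners = canvas_corners(center=[0.0, 0.0, 4.0],
                         x_axis=[0.6, 0.0, 0.8],
                         y_axis=[0.0, 1.0, 0.0],
                         half_size=1.0)
pixels = project_points(corners, K, world_to_cam)
near_height = pixels[3, 1] - pixels[0, 1]   # left edge, closer to the camera
far_height = pixels[2, 1] - pixels[1, 1]    # right edge, farther away
print(pixels, near_height, far_height)
```

The nearer edge of the canvas covers more pixels than the farther edge, which is the foreshortening users would otherwise have to draw by hand.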

6. Keyframe-based user control

Users can set the position and orientation keyframes of the canvas at different frames of the video. The system will automatically interpolate smoothly between these keyframes to generate a continuous animation effect. This method greatly reduces the workload of manual adjustment and improves the efficiency of animation production.

The system allows users to control the 3D trajectory and rotation of animations through a small number of keyframe settings, thereby achieving complex animation effects. Users can freely adjust the position, direction and depth of the canvas between keyframes, and the system will automatically calculate the smooth transition between these keyframes.
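For intuition, the simplest form of keyframe interpolation blends positions linearly and orientations with spherical linear interpolation (slerp). The Python sketch below shows only that baseline; the actual system also takes scene motion into account, and the keyframe tuple layout and function names here are assumptions for illustration.

```python
import numpy as np

def slerp(q0, q1, t):
    """Spherical linear interpolation between unit quaternions (w, x, y, z)."""
    q0 = np.asarray(q0, dtype=float) / np.linalg.norm(q0)
    q1 = np.asarray(q1, dtype=float) / np.linalg.norm(q1)
    dot = np.dot(q0, q1)
    if dot < 0.0:                 # take the shorter arc
        q1, dot = -q1, -dot
    if dot > 0.9995:              # nearly identical: fall back to normalized lerp
        result = q0 + t * (q1 - q0)
        return result / np.linalg.norm(result)
    theta = np.arccos(dot)
    return (np.sin((1.0 - t) * theta) * q0 + np.sin(t * theta) * q1) / np.sin(theta)

def interpolate_keyframes(frame, key_a, key_b):
    """Interpolate canvas pose at `frame` between two keyframes.

    Each keyframe is a hypothetical tuple (frame_index, position_xyz, quaternion_wxyz).
    """
    frame_a, pos_a, quat_a = key_a
    frame_b, pos_b, quat_b = key_b
    t = (frame - frame_a) / (frame_b - frame_a)
    position = (1.0 - t) * np.asarray(pos_a, dtype=float) + t * np.asarray(pos_b, dtype=float)
    return position, slerp(quat_a, quat_b, t)

# Keyframe at frame 0 (no rotation) and frame 10 (90 degrees about the y-axis).
key_a = (0, [0.0, 0.0, 0.0], [1.0, 0.0, 0.0, 0.0])
key_b = (10, [2.0, 0.0, 0.0], [np.cos(np.pi / 4), 0.0, np.sin(np.pi / 4), 0.0])
position, orientation = interpolate_keyframes(5, key_a, key_b)
print(position, orientation)
```

Halfway between the two keyframes, the canvas sits at the midpoint and is rotated by 45 degrees about the y-axis.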

System Workflow of VideoDoodles

Input video processing

The system first receives an input video and preprocesses it. This step includes calculating the camera pose of each frame, generating a depth map, and calculating optical flow. These preprocessing steps provide the necessary geometric information for subsequent 3D tracking and canvas embedding.

Canvas placement and configuration

Users can place static or dynamic canvases in any video frame. Placing a static canvas only requires the user to drag and rotate the canvas, while placing a dynamic canvas requires the user to specify the position and orientation of the canvas in one or more keyframes. The system automatically calculates the 3D trajectory and rotation of the canvas based on the user's input.

Drawing on canvas

Users draw animations on the canvas through the 2D drawing interface provided by the system. To simplify drawing, the system displays the canvas in an orthographic, fronto-parallel view, so users do not need to consider perspective distortion. The completed animation is automatically embedded in the 3D scene of the video, and perspective and occlusion are handled according to the video content.

Keyframe and trajectory optimization

For a dynamic canvas, the system computes the 3D motion trajectory of the canvas from the keyframes specified by the user. It first uses a shortest-path algorithm to compute a preliminary trajectory, then refines that trajectory with Poisson integration to make it smoother and more stable in 3D space. Users can adjust or add keyframes at any time to precisely control the movement of the canvas.
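A shortest-path formulation of this kind can be sketched with dynamic programming over candidate 3D positions per frame. The article does not specify how candidates or costs are defined, so in the Python example below the candidate lists, the per-candidate cost, and the distance-based transition cost are all assumptions; a keyframe is modeled simply as a frame with a single candidate.

```python
import numpy as np

def shortest_path_trajectory(candidates, unary_costs, smoothness=1.0):
    """Pick one candidate per frame minimizing unary cost plus 3D jump length.

    candidates:  list over frames of (N_t, 3) candidate positions
    unary_costs: list over frames of (N_t,) costs (lower is better)
    """
    cands = [np.asarray(c, dtype=float) for c in candidates]
    costs = [np.asarray(c, dtype=float) for c in unary_costs]

    best_cost = costs[0].copy()
    back_pointers = []
    for t in range(1, len(cands)):
        # jump[i, j]: distance from candidate i at frame t-1 to candidate j at frame t
        jump = np.linalg.norm(cands[t][None, :, :] - cands[t - 1][:, None, :], axis=2)
        total = best_cost[:, None] + smoothness * jump
        back_pointers.append(np.argmin(total, axis=0))
        best_cost = total.min(axis=0) + costs[t]

    # Backtrack from the cheapest final candidate.
    index = int(np.argmin(best_cost))
    path = [index]
    for pointers in reversed(back_pointers):
        index = int(pointers[index])
        path.append(index)
    path.reverse()
    return np.array([cands[t][i] for t, i in enumerate(path)])

# Frames 0 and 2 are keyframes (one candidate each); frame 1 has two candidates.
candidates = [
    [[0.0, 0.0, 0.0]],
    [[1.0, 0.0, 0.0], [9.0, 0.0, 0.0]],
    [[2.0, 0.0, 0.0]],
]
unary_costs = [[0.0], [0.5, 0.0], [0.0]]
preliminary = shortest_path_trajectory(candidates, unary_costs)
print(preliminary)
```

Even though the far candidate at frame 1 has the lower individual cost, the path through the near candidate is cheaper overall because it avoids two long jumps.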

Animation generation and output

When the user has finished drawing and adjusting, the system will combine the animation of the canvas with the video content and generate the final video output. The system handles all occlusion and perspective deformation to ensure that the output animation is natural and realistic.

User Testing Results of VideoDoodles

The test included 7 participants: 5 beginners and 2 professional animators. Participants used different input devices (a Wacom Cintiq 16 tablet, a mouse, or a trackpad). The test was divided into two tasks:

Task 1 – Goal-oriented task: Participants had to reproduce a sample video doodle within 10 minutes. The sample contained a static doodle and a dynamic doodle, and its purpose was to let users experience the main functions of the system.

Task 2 – Open-ended task: Participants had to choose two of 15 short video clips and create their own video doodles within 30 minutes.

Test Results

Quick start and creative efficiency

When using the system for the first time, novices took only about 10 minutes to complete a medium-complexity doodle task (mean 10 minutes 30 seconds, standard deviation 3 minutes 7 seconds).

In the open-ended task, users spent an average of 14 minutes (standard deviation 7 minutes 5 seconds) to create a video doodle, which shows the system's support for rapid creation.

Professional users particularly appreciated the system's speed and simplicity, finding that it let them achieve satisfying results more quickly than with traditional tools.

Balance between automation and manual control

The system's automated tracking feature significantly reduces the user's workload, allowing users to focus on creativity and drawing itself. On average, participants spent 49% of their time drawing (with a standard deviation of 17%, and minimum/maximum values of 24% and 89% respectively).

Nevertheless, users were still able to precisely control the trajectory and orientation of the canvas through keyframes. Participants set an average of 3.3 position keyframes (standard deviation 3.0) and 2.5 orientation keyframes (standard deviation 1.8) per dynamic canvas.

Appeal to both beginners and professional users

The beginners demonstrated their ability to quickly master keyframe control and successfully create multi-frame animations. All participants were very satisfied with their creations (4.7 out of 5).

Professional users said that although the system is simpler than traditional tools, it still provides enough control, especially for setting keyframes and adjusting the depth of the canvas.

User feedback and improvement suggestions

Some participants wanted the system to be able to draw directly on the video, rather than being limited to a drawing panel with an orthographic view. Other users wanted to be able to select different brush shapes and textures to enhance the expressiveness of their drawings.

Professional users noted that while the system's tracking capabilities are already excellent, more advanced tracking options, such as the fine-grained, pixel-level control available in After Effects, would still be desirable.

System limitations and user solutions

The system's flat canvases limited the creation of non-planar doodles, and some participants worked around this limitation by placing multiple canvases or switching between doodles drawn from different perspectives. For example, one participant placed two orthogonal canvases to depict a cloud of steam.

Project address: https://em-yu.github.io/research/videodoodles

Paper: https://www-sop.inria.fr/reves/Basilic/2023/YBNWKB23/VideoDoodles.pdf

GitHub: https://github.com/adobe-research/VideoDoodles
