Omniverse & Audio2Face

Demo Omniverse File

NVIDIA Omniverse is an open platform built for virtual collaboration and real-time, physically accurate simulation. It was released as an open beta in December 2020. Designers, animators, creators, and VFX teams connect major design tools, assets, and projects for collaborative iteration in a shared virtual space, all built on the core foundation of USD (Universal Scene Description), originally developed by Pixar and now being adopted across multiple industries using 3D.

“USD provides a common foundation for describing everything in a 3D scene,” according to Richard Kerris, General Manager of Omniverse and Head of Developer Relations at NVIDIA, and a former CTO at Lucasfilm. While the Alembic file format (co-developed by Industrial Light & Magic and Sony Pictures Imageworks, and launched at SIGGRAPH in 2011) contained most of the file information about a 3D model, with USD, “you get all of the elements of the 3D environment”. “You can think of it as the HTML of 3D,” he adds. “It brings with it the lighting, the shading, the materials, and so on.” Alembic was a step towards that, and a precursor to USD. “But what was really needed is the ability to bring everything from one 3D project to another 3D project,” Kerris explains. USD is at the very core of Omniverse, and NVIDIA has built an entire set of tools and core technologies around USD that is jaw-droppingly impressive.
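Kerris’s “HTML of 3D” analogy is easy to see in practice: a USD stage can be a plain-text .usda file in which geometry, material bindings, and lights all live side by side in one scene description. A hypothetical minimal layer (the prim names here are purely illustrative, not from NVIDIA) might look like:

```usda
#usda 1.0
(
    defaultPrim = "Scene"
)

def Xform "Scene"
{
    def Mesh "Hero"
    {
        rel material:binding = </Scene/Looks/HeroMat>
    }

    def Scope "Looks"
    {
        def Material "HeroMat"
        {
        }
    }

    def SphereLight "Key"
    {
        float inputs:intensity = 500
    }
}
```

Because every connected application reads and writes layers like this, an entire scene – not just a single model – can round-trip between tools, which is the point Kerris is making about going from one 3D project to another.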

NVIDIA continues to advance state-of-the-art graphics hardware, and with their AI R&D strength and the depth of Omniverse, the company is now able to really show what is possible with real-time ray tracing and intelligent systems. “The potential to improve the creative process through all stages of VFX and animation pipelines will be transformative,” comments Francois Chardavoine, the current VP of Technology at Lucasfilm & ILM.

Audio2Face (Early access)

One of the applications built as part of Omniverse that has just been released in open beta is Audio2Face, a tool that simplifies the complex process of animating a face to match an audio input. Audio2Face was developed as an Omniverse App, which sits on the platform and brings its capabilities to other applications integrated into the workflow. The input is any popular audio file format such as .wav or .mp3, and the output is realistic facial animation – either as a geometry cache or as a live stream.
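For those curious what such an input actually carries, the metadata of a .wav file can be inspected with nothing more than Python’s standard library. This is a generic sketch of the kind of audio data Audio2Face ingests, not an NVIDIA API:

```python
# Inspect the basic properties of a .wav audio input using only the
# Python standard library (illustrative; not part of Audio2Face).
import wave

def describe_wav(path):
    """Return channel count, sample rate, and duration of a .wav file."""
    with wave.open(path, "rb") as w:
        frames = w.getnframes()
        rate = w.getframerate()
        return {
            "channels": w.getnchannels(),
            "sample_rate_hz": rate,
            "duration_s": frames / rate,
        }
```

A tool driving facial animation from audio needs exactly this kind of information (sample rate and duration) to map waveform samples onto animation frames.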

Audio2Face is an AI-based technology that generates facial motion and lip-sync entirely from an audio source. Audio2Face offers various ways to exploit the technology – it can be used at runtime or to generate facial animation for more traditional content creation pipelines. Audio2Face also provides a full character transfer pipeline providing the user a simplified workflow that enables them to drive their own characters with Audio2Face technologies.

Above: Fxguide’s Mike Seymour tests Audio2Face with his Australian accent!

Audio2Face simplifies the animation of a 3D character to match any voice/audio track. It is not aimed at the highest-end animation of feature films, yet it provides excellent lip-sync animation for characters in games, NPC background characters, or real-time digital assistants. Users can use the app to make interactive real-time applications or work offline and export caches for use in a traditional facial animation pipeline. Audio2Face can be run live, or baked and exported out; it’s up to the user.

Audio2Face arose from a set of meetings around 2018, when multiple game developers came to NVIDIA and said, “Hey, you know, the scale of our games is now so huge. What can NVIDIA do – maybe with the help of deep learning or other technology – to help accelerate and meet these production demands?”, explained Simon Yuen, Director of Graphics and AI at ‎NVIDIA. At that time it was not uncommon for a game development team to be in motion capture for two and a half years straight, just for one game. These real production problems, plus the general difficulty of creating high-quality 3D facial animation (a time-consuming, labor-intensive process), “led us to start thinking about how to solve these problems. We want to come up with a solution that can accelerate and simplify speech-based facial animations,” he adds.

The straight output of Mike’s audio to the default face; note this version does not yet have active eyes.

The application uses the RTX Renderer, as Audio2Face is part of the Omniverse framework. It is not intended as a demo application; it is a tool to allow developers to build serious applications. For example, it is to be expected that this will be integrated with Epic’s UE4 MetaHumans, allowing Audio2Face to drive UE4 characters in real-time. Omniverse facilitates the round-tripping of assets in and out of programs such as UE4 and Autodesk Maya. Audio2Face can currently export either as a Maya cache or a USD cache. Richard Kerris points out that before committing to specific pipelines with partners such as Epic Games, NVIDIA wants beta customer feedback, which plays a critical role in the development of the Omniverse platform. For Audio2Face to succeed, it needs to allow developers to build it into more complex pipelines. Kerris sees this close feedback from customers as a primary goal for the team following the early beta release.

The Audio2Face program is now in open beta, but this is only a preliminary preview; there are many features still to come or just about to be released. For example, Audio2Face has a set of post-inference controls that allow you to adjust the resulting poses of the output animation based on trained data; this allows you to fine-tune the amplitude of your performance, among other things. There will also be ways to allow the user to change or combine different emotions to fine-tune the character’s expressions and responses. These will all be able to be animated on a timeline, allowing users to fade in and out of emotions such as amazement, anger, disgust, joy, and pain in a future release.

Later releases will include more detailed audio-driven features including:

  • eyes and gaze
  • teeth and tongue
  • head motion (nods etc.), using a joint structure – all driven by audio
  • plus high-level controls so you can quickly direct the mood and emotion of the character

All driven by audio with additional user controls. The goal is to use a very simple interface to get a complete high-quality performance without a lot of the work you have to do traditionally in CG.

Some of the leaps of plausible behavior are nothing short of remarkable. Just to underscore this point: NVIDIA Research is using machine learning to produce plausible animation of what a character’s eyes are doing based on audio waveforms alone! “We’ve trained the eyes so that they do a natural saccadic movement, and you can then both control or amp that up if you need. You can have offsets if you want on top of that to control where the character looks,” explains Yuen.

Emotions can be mixed in the UI

In a future release, the emotions of the characters can also be mixed. “Think of a combination of FACS poses embedded in a neural network, so you don’t have to drive each FACS shape individually as in the common laborious, expert-required workflow,” says Yuen. “We are looking at how we can mix and match emotions to get the performance we need.” The system animates not just the face skin of a character but the full face: eyeballs, eyelids, head motion, teeth, tongue, and neck. All of these aspects are controlled by the audio-driven animation and moderated by the emotional vectors.
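Conceptually, that mixing can be pictured as a weighted blend of per-emotion FACS activation vectors. The sketch below is purely illustrative: the emotion names, action-unit values, and function are hypothetical stand-ins, not NVIDIA’s implementation.

```python
# Toy sketch of mixing emotions as weighted blends of FACS-style
# activation vectors (illustrative only; not NVIDIA's system).

# Hypothetical per-emotion FACS action-unit activations, e.g.
# AU12 = lip corner puller, AU4 = brow lowerer. Values are invented.
EMOTION_POSES = {
    "joy":   {"AU6": 0.8, "AU12": 0.9},
    "anger": {"AU4": 0.9, "AU7": 0.6, "AU23": 0.5},
}

def mix_emotions(weights):
    """Blend emotion poses into one FACS activation dict,
    normalising the emotion weights so they sum to one."""
    total = sum(weights.values())
    blended = {}
    for emotion, w in weights.items():
        for au, value in EMOTION_POSES[emotion].items():
            blended[au] = blended.get(au, 0.0) + value * (w / total)
    return blended

# 70% joy, 30% anger -> a smile with a touch of lowered brow.
pose = mix_emotions({"joy": 0.7, "anger": 0.3})
```

Fading emotions on a timeline, as described above, then amounts to animating these weights over time.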

Yeongho Seol, Senior Developer Technology Engineer, demonstrated emotion control at GTC 21. The data collection for the ML training data was based on 4D facial performance capture with synced audio. This included speech performances with various emotional states. Along with reference images and casts of the actor’s teeth, the NVIDIA team built a complex, high-end baseline 3D model to train the deep neural network.

Audio2Face is one of many applications of NVIDIA’s AI SDKs. Another team is working on new advanced text-to-speech (TTS) APIs and building blocks, which NVIDIA CEO Jensen Huang flagged in his keynote address at the 2021 GTC conference, remarking that computer-sounding voices would soon be a thing of the past. NVIDIA has an entire framework for conversational AI, from Automatic Speech Recognition and Natural Language Processing to Text-to-Speech (TTS) with voice synthesis using Mel spectrograms. At the end of February 2021, NVIDIA released Jarvis 1.0 Beta, which includes an end-to-end workflow for building and deploying real-time conversational AI apps, such as transcription, virtual assistants, and chatbots.


Developed by the NVIDIA AI Research Lab in Toronto, led by Sanja Fidler, the GANverse3D application inflates flat images into realistic 3D models that can be visualized and controlled in virtual environments. This new deep learning engine for creating intelligent 3D object models from standard 2D images recently brought the iconic car K.I.T.T. from Knight Rider back to life in NVIDIA’s Omniverse. This research is from the same team that last year presented GameGAN (see our fxguide story), an AI-powered version of PAC-MAN that can be played without an underlying game engine.

Differentiable rendering has paved the way for training neural networks to perform “inverse graphics” tasks such as predicting 3D geometry from monocular photographs. A single JPEG does not produce a highly complex model by VFX standards, but it does produce a fully 3D car, with no 3D modelling software or experience required. As it is done via machine learning, the model has moving wheels and one can animate or ‘drive it’ around a virtual scene, complete with realistic headlights, taillights, and blinkers (note: headlight effects were a post-process step). The process of going from images to 3D is called “inverse graphics” because the problem is the inverse of rendering, in which a 2D image is produced by taking into account the geometry and material properties of objects and the light sources present in the scene. This means that certain properties of the car are inferred even when not seen. For example, the far side of the car is plausibly created even though it is hidden in the photo.
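The core trick of differentiable rendering can be shown with a deliberately tiny toy: render a one-parameter “scene”, compare the render to a target image, and gradient-descend the scene parameter until the render matches. This is only an illustration of the principle; GANverse3D predicts full textured meshes with a trained network, not a single scalar.

```python
# Toy "inverse graphics": recover a scene parameter (a blob's position)
# from a rendered target image by gradient descent through a
# differentiable renderer. Illustrative only.
import math

def render(x, width=10):
    """Differentiable 'renderer': a soft blob centred at position x."""
    return [math.exp(-((i - x) ** 2) / 32.0) for i in range(width)]

def loss_and_grad(x, target):
    """Squared-pixel loss vs target, plus its analytic gradient in x."""
    img = render(x)
    loss = sum((a - b) ** 2 for a, b in zip(img, target))
    # d/dx exp(-(i-x)^2/32) = exp(-(i-x)^2/32) * (i-x)/16
    grad = sum(2.0 * (a - b) * a * (i - x) / 16.0
               for i, (a, b) in enumerate(zip(img, target)))
    return loss, grad

target = render(6.0)   # the "photograph" of the true scene
x = 2.0                # initial guess at the scene parameter
for _ in range(200):
    _, g = loss_and_grad(x, target)
    x -= 2.0 * g       # gradient step
# x has now converged close to the true position 6.0
```

Replace the scalar with mesh vertices, textures, and lighting coefficients, and the gradient with backpropagation through a neural renderer, and you have the shape of the actual approach.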

[embedded content]

To generate a dataset for training, the researchers harnessed a generative adversarial network, or GAN, to synthesize images depicting the same object from multiple viewpoints — like a photographer who walks around a parked vehicle, taking shots from different angles. These multi-view images were plugged into a rendering framework for inverse graphics, the process of inferring 3D mesh models from 2D images. NVIDIA GANverse3D uses the previously published NVIDIA StyleGAN as a synthetic data generator, and the process labels this data extremely efficiently. This “dataset” is then used to train an inverse graphics network to predict 3D properties of the objects in the images. As it uses StyleGAN, this approach produces higher quality 3D reconstruction results while requiring 10,000× less annotation effort for the training data, making it very useful in production.

William Daniels, the original actor from Knight Rider, was re-hired by NVIDIA to do the voice for their K.I.T.T. test demo piece. (Knight Rider content courtesy of Universal Studios Licensing LLC.)

To recreate K.I.T.T., the researchers simply fed the trained model an image of the car, letting GANverse3D predict a corresponding 3D textured mesh, as well as different parts of the vehicle such as wheels and headlights. They then used NVIDIA Omniverse Kit and NVIDIA PhysX tools to convert the predicted texture into high-quality materials that give the digital K.I.T.T. a more realistic look and feel, allowing it to be placed in a dynamic driving simulation sequence.

Once trained on multi-view images, GANverse3D needs only a single 2D image to predict a 3D mesh model. This model can be used with a 3D neural renderer that gives developers control to customize objects and swap out backgrounds. For example, the tool has been used to produce horses (quadrupeds) and birds. But the program has limits, it would not be able to produce a complex quadruped walk cycle for example. “We would have to infer a bone system to do that,” explained Professor Sanja Fidler who worked on the project and leads NVIDIA’s Toronto Research. Fidler also outlined that the team is looking next to do a similar process for human faces, “but that problem is considerably more difficult.”

The ability to infer 3D properties such as geometry, texture, material, and light from photographs may prove key in many domains such as AR/VR, computer vision, and previz/scene mock-ups. The current system was trained on 55,000 images of cars, with fewer images for the birds and horses.

Given input images (1st column above), GANverse3D predicts 3D shape and texture, and renders the object from the same basic point of view (2nd column). The image is now 3D, as shown in the Multiple Views (3rd) column. The process is able to reconstruct hard surfaces with specular highlights and also more difficult articulated objects, such as birds and horses. But while some specular highlights can be accounted for, some StyleGAN-generated images contain advanced lighting effects such as complex reflections, window transparency, and shadows, and the spherical harmonic lighting model is incapable of dealing with all such cases successfully.

When imported as an extension in the NVIDIA Omniverse platform and run on NVIDIA RTX GPUs, GANverse3D can be used to recreate any 2D image in 3D. Users can use it to create objects the program is already trained on, such as cars, or train GANverse3D on their own new datasets.

The research papers supporting GANverse3D will be presented at two upcoming conferences: the International Conference on Learning Representations (ICLR) in May, and the Conference on Computer Vision and Pattern Recognition (CVPR) in June.

Creators in gaming, architecture, and design rely on virtual environments like the NVIDIA Omniverse simulation and collaboration platform to test out new ideas and visualize prototypes before creating their final products.

Omniverse Enterprise

NVIDIA also announced the coming general availability of the high-end NVIDIA Omniverse Enterprise platform that enables globally distributed teams working across multiple software suites to collaborate in real-time in a shared virtual space.

WPP, the world’s largest marketing services organisation, is using the NVIDIA Omniverse platform to reinvent the way they make advertising content.

NVIDIA Omniverse Enterprise makes it possible for virtual production teams – which are often large, diverse in skills, and geographically dispersed thanks to COVID – to work seamlessly together on complex projects. Rather than requiring in-person meetings or exchanging and iterating on massive files, designers, artists, and reviewers can work simultaneously in a virtual world from anywhere, on any device.


RTC Digital Humans Preview

The April 26-28, 2021 RealTime Conference has just begun, featuring three 16-hour days of live presentations, discussions, interviews, and real-time live demos. The virtual event brings together some of the leading voices from a diverse set of industries, all united by a common goal: finding ways to take advantage of real-time technology.
The three-day event contains 100 sessions spread across 19 separate tracks, with each track focusing on a different industry or topic. The event includes more than 150 speakers from some of the world’s most innovative companies.
Sessions will be hosted by experts from companies that are leading the way in real-time technology. RTC’s April event will also feature several live demos highlighting some of the most advanced real-time tools available.

The stage is set for RTC Day 2

Digital Humans Curated by Christophe Hery and Mike Seymour

The Digital Humans track kicks off with

MetaHuman – Human Behind the Character (LA time 1pm Tuesday / Sydney: 6am Wednesday)

with Vladimir Mastilovic (3lateral/Epic Games) and Matt Workman (Digital DOP).

Creating believable and relatable digital humans is one of the hardest things to do in game and film productions. With the release of MetaHuman Creator this has changed, and now it’s easier to imagine future workflows where digital humans are much more accessible to a wider group of creatives. In this session we’ll explore the short- and longer-term implications for games, film, and virtual productions.

This is followed by a panel discussion with the presenters, joined by Kim Libreri, CTO of Epic Games. This panel will discuss the impact of MetaHuman Creator on Virtual Production, Storytelling & Prototyping.

Human Creation In Blender (LA time  4.30pm / Sydney 9.30am)

This is presented by Alexander Lashko, who is a 3D character artist specializing in high/low poly modeling and sculpting, as well as texturing.

Real-Time Humans With 4D Faces & Body Sims (LA time  4.50pm / Sydney 9.50am)

James Jacob will discuss how Ziva is bringing offline digital human bodies and faces into real-time environments using Machine Learning, 4D capture, and extensive simulation data.

The Future Of Volumetric Capture (LA time  5.10pm / Sydney 10.10am)

Christina Heller will present how volumetric capture brings the power of real human performances into virtual mediums.

Then follows a panel discussion with the presenters above. (LA time  5.30pm / Sydney 10.30am)

Talking To Douglas- The Challenges In Creating An Autonomous Digital Human (+ Live Demo)  (LA time  6.05pm / Sydney 11.05am)

Matthias Wittmann will be showing a live demo of DD’s latest progress in autonomous digital humans.

From Alita To Gemini Man- A Journey Of How To Make A Digi A Human (LA time  6.25pm / Sydney 11.25 am)

Weta’s Andrea Weidlich will explain how digital humans are often regarded as a key remaining challenge for computer graphics, as the success of a movie can rest on their ability to carry the heart and soul of the story.

Identity Protection With Digital Veils (LA time  6.45pm / Sydney 11.45 am)

Ryan Laney will present how his team masked witnesses using Neural Rendering in a documentary film to protect their identities.

Roundtable Discussion (IRL & Digital Beings) (LA time  7.05pm / Sydney 12.05 pm)

A brief roundtable discussion with Kathleen Cohen, Christophe Hery and Mike Seymour about Digital Twins, Digital Beings and Humanity.

Panel Discussion (Digital Humans) (LA time  7.20pm / Sydney 12.20 pm)

Adding A Touch Of Soul To The Metaverse (LA time  7.55pm / Sydney 12.55 pm)

In this presentation, Mark Sagar of Soul Machines will present examples of the types of characters the metaverse can be populated with from digital twins to digital helpers, dig into the need for a digital brain, and discuss BabyX, a virtual animated baby that learns and reacts in real-time.

The RealTime Conference

The RealTime Conference (RTC) is designed for Real-Time Communities, with real-time technologies growing at an unprecedented pace and shaping countless industries in their wake.

In 2020, the event drew 6,740 unique registrants from 103 countries, hundreds of keynote speakers and visionaries, and insightful tech companies, opening up the conversation across industries too often siloed − from Architecture to Automotive, Design & Manufacturing, Virtual Production, Digital Humans, and more…

RTC is supported by the RTC Advisory Board, a group dedicated to bringing people together while offering recommendations to strengthen the program; its members include fxguide’s own Mike Seymour. The full list of RTC Board members can be found here.

The fully virtual events give participants the opportunity to engage with experts across multiple industries to see the enabling technologies and present-day applications. Focused collaborative sessions offer creative collisions across industries and help spark new ideas.

The conference was founded by Jean-Michel Blottière, Dave Gougé and Thomas Haegele. Blottière and Haegele previously collaborated in overseeing FMX before founding the RTC, while Gougé has put together over 100 events as curator and host. They are joined by Manny Francisco, the former VP of advanced creative technology for NBCU/DreamWorks Animation, and more.


Congrats to the Oscar Winners!

And the Oscar goes to…

A huge congratulations to all the winners and all the nominated films’ teams of artists at this year’s 93rd Oscars.

The Winners for Best Visual Effects:

Andrew Jackson, David Lee, Andrew Lockley and Scott Fisher

Congratulations to all the artists behind Tenet.

The DNEG team also won a well-deserved Special Visual Effects BAFTA, in the UK.

See our fxguide coverage here:

fxguide’s Mike Seymour is also hosting a Tenet panel discussion with Andrew Jackson and Scott Fisher next week at FMX.

Oscar’s night around the world.

Best Animated Feature:


Soul – Pete Docter and Dana Murray

“This film started as a love letter to jazz. But we had no idea how much jazz would teach us about life,” said Pixar’s Pete Docter.

Short Film (Animated)


If Anything Happens I Love You – Will McCormack and Michael Govier

Rolling out the red carpet for the 93rd Oscars last night at Union Station.

441 days after the last Oscars, the 93rd Academy Awards were presented at Union Station in Los Angeles, CA, and televised live by the ABC Television Network.


Galactic footsteps in the cosmic rewind: Scanline & the Flash

Zack Snyder’s Justice League, often referred to as the “Snyder Cut”, is the 2021 director’s cut of the 2017 film Justice League. Like the theatrical release, Zack Snyder’s Justice League follows the Justice League – Batman (Ben Affleck), Superman (Henry Cavill), Wonder Woman (Gal Gadot), Cyborg (Ray Fisher), Aquaman (Jason Momoa), and the Flash (Ezra Miller) – as they attempt to save the world from a catastrophic threat: Steppenwolf and his Parademons return after eons to capture Earth. Batman, with the help of Wonder Woman, recruits and assembles the Flash, Cyborg, and Aquaman to fight the powerful new enemy.

Unlike the theatrical release, the film is much longer, a different aspect ratio, darker, and a much bigger hit with fans of the DC universe. Snyder had originally stepped down during post-production following the death of his daughter, and Joss Whedon was hired to finish the theatrical version, completing it in ways quite different to Snyder’s script.

Bryan Hirota, VFX Supervisor at Scanline, has worked extensively before with both director Zack Snyder and VFX Supervisor John Des Jardin, but in the case of the director’s version of Justice League, the revisions to the film were almost a film in themselves. Of the new 4-hour film, only one hour of untouched footage from the first film was used. Scanline completed their work in just 7 months to help Snyder bring his version of the film to an audience; the first month of that was spent just trying to uncompress and bring back online the archive of the original files from 2017.

One of the interesting sequences in the new version was the rolling back of time by the Flash, undoing the apocalyptic nightmare by moving faster than the speed of light. This posed a range of problems. For the audience, the Flash needs to move forward while the world is moving backwards. To denote speed, the Flash is moving in slow motion, so the world is unexploding backwards in slow motion, reforming around him. Additionally, the director had a very clear idea of what he hoped to see in the final sequence. “Zack had the idea of some sort of representation of the unity explosion, and energy, as Flash was rewinding all the destruction backwards,” comments Hirota. “He also had this idea of his footsteps creating like a galaxy with each step, which he described as being ‘like a mini big bang’ on every step.”

Hirota and the team were keen to explore this new visual problem and build on past work they had done with the character. The team started doing effects tests and simulations. “We were exploring what it meant to do a mini big bang on each footfall,” he adds. The team nicknamed the effect ‘the galactic footsteps in the cosmic rewind‘. But the Scanline team quickly discovered that this rewinding of explosions around the Flash was not as simple as playing a simulation backwards. To raise the stakes further, the idea seemed so strong that the director decided to accelerate the process by putting two or three of these shots in the original ‘hallelujah’ trailer. (See below @ 2:00) “That whole period is a gigantic blur,” jokes Hirota. “From the time we got the go-ahead to do the (revision) project, to the time that the first trailer came out wasn’t that long, maybe it was a couple of months!”

[embedded content]

Shocking Frankenstein’s Monster

Projects at Scanline are well archived and carefully catalogued, but even so, with the changes in both staff and technology since the original 2017 project, “we spent the first month just trying to shock Frankenstein’s monster back to life,” mused Hirota. Scanline VFX, as a facility, had not worked on the original footage shot in Russia, but in reality, as most of this was digitally created from a clean-sheet design, it was easier than some of the revisions to previously crafted shots. For example, Scanline faced not being able to load some old assets, as the newer versions of the pipeline would not read the files completely accurately. Even on a normal project, done under the best conditions, it is exceptionally hard to pick up a complex shot from another artist and take over blind – let alone when the first artist is no longer on your team and used a different version of the software. “In general, if there were an old shot that got brought back, then it required some forensics to understand even what the aim was of the shot,” spells out Hirota. “We would start by gauging where Zack was with the shot. If Zack was 90% happy with it, but we needed to fix X, Y or Z on it, then we tried to figure out the minimally invasive surgery we had to do. But still, sometimes we would get stymied by some software that just was no longer compatible or some scene file.”

If all of this was not enough, the Zack Snyder version was mastered in a different format. The new film is in a 1.33:1 aspect ratio versus the original’s 1.85:1. This meant that some shots that might have been perfectly acceptable inside the 1.85 region needed to be worked on to bring the parts of the frame outside the 1.85, but now seen in 1.33, up to final quality.

Luckily for Hirota, while some team members had moved on to other projects, the senior team leaders in Scanline’s VFX team were mainly the same on the new version as on the old. The project was comped in Nuke, with the animation in Maya and most of the effects done in 3ds Max, “using Houdini as well to destroy things. And then we used Scanline’s Flowline to do a lot of fluid dynamics simulations on top of that,” he outlines. The team grew to over 400 artists at its peak, a third again larger than the team on the original.

Sims Backwards

Scanline is known for its outstanding work in simulation, but the sims needed to look good in reverse, and that required more than just playing forward sims backwards. The normal approach with destruction sims is to do the major simulation and then build tertiary sims onto it with very bespoke elements. What the team found was that the dust passes in particular needed to respect that, while the explosion was running one way in time, the Flash was going the other, and for him to sit in the shot, his forward motion needed to also be reflected. “You could argue that time and thus direction was ambiguous. But the backwards-traveling effects could end up in inconvenient or unattractive places in the shot,” he says. “And also the ripping apart of geometry – in reverse it can look too neat.” The fact that the audience can see where the destruction is going changes how the eye reads the scene; naturally, there is no sudden surprise as large amounts of debris fall back into place. If pieces seemed too predictable in their reassembly, the shot was less cinematic, especially as this was happening in reverse.

The performance of the Flash was key, as the plot turned on his ability to save the day at this point and overcome his guilt at allowing the death of his friends. The actor, Ezra Miller, was shot in a suit with interactive lighting, but because the digital environment – and thus the lighting – changed, Scanline “ended up replacing his suit almost entirely in this scene, so that it could react to whatever Flash speed effects we were creating and adding to the world around him,” says Hirota. “And in a handful of wider shots, the character was fully digital.” Scanline took great care to preserve as much as they could of the original actor and to match Miller’s great performance.

There are some shots in this sequence where Miller was delivering dialogue, but still in slow motion. For these, Miller mimed the lines, which were sped up on playback in the studio. This meant that when the footage is played back in slow motion, the words appear to be said at the correct speed, while Miller moves in slow motion. Most of this style of footage was shot at 48fps for 24fps playback, so it plays at half speed. As effective as this is, there were still some shots where the director noticed the ‘mime’ and the Scanline team would correct the issue.
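The arithmetic behind the mime trick is straightforward: the actor must deliver dialogue faster by exactly the factor the camera is overcranked, so the slowed-down playback restores normal pace. A quick check of the numbers:

```python
# Overcrank/dialogue arithmetic for slow-motion mime shots.

def mime_speed_factor(capture_fps, playback_fps):
    """How much faster the actor must perform dialogue so it sounds
    normal once the overcranked footage is slowed in playback."""
    return capture_fps / playback_fps

# Shot at 48fps, played back at 24fps: action runs at half speed on
# screen, so lines must be performed (and cued on set) at 2x pace.
factor = mime_speed_factor(48, 24)              # 2.0
line_real_duration = 3.0                        # seconds at normal pace
on_set_duration = line_real_duration / factor   # 1.5 seconds on set
```

The helper name here is just for illustration; the underlying relationship (capture rate over playback rate) is standard overcranking math.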

And much more….

Flash was only one of many sequences Scanline VFX worked on, for this new four-hour version. For example, they redesigned the costume or skin of Steppenwolf, making his skin ripple and become a reactive armour that reflected his moods with spikes, metal scales, and feathers as well as changing his dialogue delivery.

In all, Scanline VFX delivered over 1,000 shots across 22 sequences, with their work encompassing everything from hero creatures, character builds, and digi-doubles to large-scale environment work and epic battle/FX destruction sequences.


Flame Learns (Even More)

Autodesk has updated Flame with powerful new machine learning (ML) features and a more tightly integrated toolset, with enhanced support for remote review sessions.

Autodesk has released the latest version of Flame with more than 40 user-driven improvements that aim to simplify the artist’s day-to-day work and boost creative collaboration. Central to the release is a build-out of Flame’s existing ML capabilities: where earlier versions applied ML to specific tasks, this update brings ML to Flame’s core toolset in the form of next-generation camera tracking technology. The new camera tracker delivers remarkable automatic camera solves and 3D geometry output.

Flame artists can now experience next-generation camera tracking using new scene reconstruction algorithms similar to autonomous vehicle smart ‘vision’ and reality-capture-style point cloud reconstruction. Flame’s new camera tracker ‘auto masks’, or ignores, moveable objects like people, cars, bodies of water, and skies, and focuses the solve on the static scene environment only, so that artists no longer have to spend time manually masking out moving objects to get a reliable result. In many cases, the new camera tracker provides artists with a one-click solution, delivering high-quality results with over 5,000 points in far less time than it would take using previous workflows.
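To see why auto-masking matters, a crude classical stand-in for it is simple frame differencing: flag pixels that change between frames so a solver can ignore them. Flame’s actual tracker uses ML scene reconstruction, not this; the sketch only illustrates the idea of excluding moving pixels from a solve.

```python
# Crude stand-in for "auto masking" moving objects: mark pixels that
# change between two frames so a camera solver could ignore them.
# (Illustrative only; Flame's tracker uses ML scene reconstruction.)

def motion_mask(frame_a, frame_b, threshold=0.1):
    """Per-pixel mask: True where the pixel changed (likely moving)."""
    return [
        [abs(a - b) > threshold for a, b in zip(row_a, row_b)]
        for row_a, row_b in zip(frame_a, frame_b)
    ]

frame_1 = [[0.2, 0.5], [0.8, 0.1]]
frame_2 = [[0.2, 0.9], [0.8, 0.1]]   # one pixel changed: a "moving object"
mask = motion_mask(frame_1, frame_2)
```

A tracker would then solve the camera only from pixels where the mask is False, which is the static scene environment.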

There is also a new integrated finishing toolset. This includes a new creative lookup table (LUT) loader that lets users import an external file-based LUT or color transform from a wide variety of file formats (.3dl, .cube, .ctf, .ccc) directly inside the Action and Image toolsets, and apply the ‘look’ to the entire picture or part of it. Additional enhancements include broadened GMask Tracer functionality, expanded support for industry-standard tactile colorist control panels (Arc, Element, Wave 2, Ripple, Element-Vs), and Blackmagic RAW media compatibility.
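Of those formats, .cube is plain text and easy to picture. A minimal parser and apply for a 1D .cube LUT, with linear interpolation, might look like the following sketch (this is not Autodesk’s loader, and real .cube files are often full 3D LUTs):

```python
# Minimal 1D .cube LUT parse-and-apply sketch (illustrative only).

def parse_cube_1d(text):
    """Parse a 1D .cube LUT: LUT_1D_SIZE header plus R G B rows."""
    size, rows = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or line.startswith("TITLE"):
            continue
        if line.startswith("LUT_1D_SIZE"):
            size = int(line.split()[1])
            continue
        r, g, b = (float(v) for v in line.split())
        rows.append((r, g, b))
    assert size == len(rows), "entry count must match LUT_1D_SIZE"
    return rows

def apply_lut(rows, value):
    """Look up a normalised [0,1] code value with linear interpolation."""
    pos = value * (len(rows) - 1)
    i = min(int(pos), len(rows) - 2)
    t = pos - i
    return tuple(a + (b - a) * t for a, b in zip(rows[i], rows[i + 1]))

LUT = """LUT_1D_SIZE 3
0.0 0.0 0.0
0.6 0.5 0.4
1.0 1.0 1.0
"""
rows = parse_cube_1d(LUT)
shadow = apply_lut(rows, 0.25)   # halfway between the first two rows
```

Applying the ‘look’ to an image is then just running every pixel’s code values through `apply_lut`.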

Additionally, there is NDI video preview streaming so artists can now share high-quality full screen video with creative stakeholders remotely, either over a closed network or public internet, using any NewTek NDI receiver software or device. This new feature is also compatible with webcasting software like OBS Studio and streaming services like YouTube Live, Facebook Live, and Twitch.

“The latest updates to Flame – including new ML-powered camera tracking, an integrated finishing toolset, and enhanced support for remote workflows – are a direct reflection of these efforts and feedback from the Flame community of artists,” commented Will Harris, Flame Family Product Manager, Autodesk.

“Flame has brought forth some fantastic features that I’m very excited to bring to my VFX workflow, including the update to the camera tracker, which will be a game-changer. The ability to generate geometry from point clouds and leverage machine learning in order to accomplish more consistent and powerful results is also a leap that I’m so excited for,” shared VFX Supervisor, Bilali Mack.

For ‘What’s New in Flame’ tune into Autodesk’s Catch Up with Flame Webinar on Wednesday, April 28 from 12-1:30 pm ET/9:30-10:30 am PT. The virtual event will take audiences behind the scenes of the latest release with insight from Autodesk Flame Family Product Manager Will Harris and Alkemy X VFX Supervisor Bilali Mack. LOGIK’s Andy Milkis will also present the 2021 Flame Award nominees and reveal this year’s winner.


CopyCat, Inference & Machine Learning in Nuke

Nuke 13 includes a flexible machine learning (ML) toolset. Developed by Foundry’s A.I. Research team (AIR), it enables artists to create bespoke effects, with applications including upres, removing motion blur, tracking marker removal, beauty work, garbage matting, and more.

The key components of the ML toolset include:

  • CopyCat – an artist can create an effect on a small number of frames in a sequence and train a network to replicate this effect with the CopyCat node. This artist-focused shot-specific approach enables the creation of high-quality, bespoke models relatively quickly within Nuke without custom training environments, complex network permissions, or sending data to the cloud.
  • Inference – the node that runs the neural networks produced by CopyCat, applying the model to your image sequence or another sequence.
  • Upscale and Deblur – two new tools for common compositing tasks were developed using the ML methodology behind CopyCat and open-source ML-Server. The ML networks for these nodes can be refined using CopyCat to create even higher-quality shots or studio-specific versions in addition to their primary use for resizing footage and removing motion blur.
The reduction of blur can be seen here split screen inside NUKE

Two years ago, fxguide published a story on the Foundry’s open-source ML-Server client/server system that enabled rapid prototyping, experimentation and development of ML models on a separate server, with the aim of introducing a way to have ML tools in Nuke, but developed in parallel.

With Nuke v13, the Foundry’s AIR team now offers native nodes inside Nuke. Primary amongst these is the CopyCat node. As with the ML-Server, the core ML tool is a Multi-Scale Recurrent Network (MSRN). “We do believe that the MSRN is a magic network, it solves a huge variety of challenges, and it does it well,” comments Dr. Dan Ring, Head of Research at the Foundry. There is no doubt that ML brings a whole new world of solutions to visual effects problems. What makes it so exciting is that ML represents a new way to solve problems, not just a new tool or node in Nuke. The approach of providing training material to an ML node, which then infers a result, is truly revolutionary and exceeds even the current hype surrounding the general AI buzz in the press. There is now little doubt that while such AI tools will not replace artists, those who fail to understand them may fall away as a new generation of complex AI solutions is deployed.

Beauty work here removing the actor’s beard

Supervised Learning

Not all AI or ML requires training data examples, but the CopyCat node does, because it is part of a class of ML called supervised learning. Importantly, there are also ML solutions classified as Unsupervised and Reinforcement Learning; more on those below.

Using CopyCat is relatively easy: sample frames of roto or beauty work are provided as before-and-after frames to Nuke. The system then infers what has been done and applies it to a clip. To really master ML, it is worth understanding how this works and what is really happening under the hood. It is easy to anthropomorphize the process and imagine the computer ‘sees’ the frame as we do. The output seems so logical and sensible that when it does not work (and it most certainly can fail), such failures seem to make almost no sense.

The first thing to understand is that the computer has no understanding of the image; it appears to, but it does not. It works on statistical inference, and while we clearly see a person or a car, the computer never does. It just sees pixels, and it tries to reduce errors so its output aligns mathematically with the training data. The further you ask the computer to infer outside the space of the training examples, the worse the results: it infers best inside that space, not extrapolating from the examples. It also does not view the whole frame as one thing; it works in patches.
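The patch idea can be made concrete in a few lines of code. This is an illustrative sketch only (not Foundry’s implementation), assuming the 256×256 patch size Ring mentions later in this article: training samples are random crops, so the network never operates on the whole frame at once.

```python
import numpy as np

# Sketch of the idea that the network never "sees" the whole frame:
# training samples are random patches cropped from the plate. The 256x256
# patch size is the one quoted for CopyCat; the rest is illustrative.

def random_patches(frame, n=10, size=256, rng=None):
    """Crop n random size x size patches from an H x W x C float image."""
    rng = rng or np.random.default_rng(0)
    h, w = frame.shape[:2]
    patches = []
    for _ in range(n):
        y = rng.integers(0, h - size + 1)
        x = rng.integers(0, w - size + 1)
        patches.append(frame[y:y + size, x:x + size])
    return np.stack(patches)

frame = np.zeros((1080, 1920, 3), dtype=np.float32)  # one HD "plate"
batch = random_patches(frame)  # batch.shape == (10, 256, 256, 3)
```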

What makes the computer so powerful is its ability to try things a million different ways; if it starts getting better results, it continues with that line of inference. In maths terms, this is directly related to the reduction of errors in the ML network. The network is called a Deep Network as it has layers. To reduce the errors, it ripples improvements to its approach back through the network, changing the weights of nodes inside its neural network. This rippling back is called backpropagation. If the error is high, then via backpropagation it is reduced; this declining error rate is mapped visually in Nuke and is called gradient descent. As with most things in life, it is easy to take a wrong turn or ‘bark up the wrong tree’. In maths terms, this means there is a local minimum, and one of the greatest technological feats of CopyCat in Nuke is how the AIR team have produced a general tool that overcomes wrong turns or local minima and manages to drive the gradient descent effectively on a host of different visual problems. While Foundry did not invent Multi-Scale Recurrent Networks (MSRN), the AIR team have produced a remarkable implementation that produces better results over time in a variety of non-task-specific applications and avoids many possible problems and divergent results during the gradient descent or error reduction.
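The gradient descent and local-minimum ideas above can be illustrated with a toy one-dimensional example. This is purely illustrative – CopyCat’s network has millions of weights, not one – but the mechanics are the same: follow the slope of the error downhill, and note how a bad starting point leaves you trapped in the wrong valley.

```python
# Toy illustration of gradient descent and local minima (not Foundry code).
# The "error" f(w) has two valleys; plain gradient descent settles into
# whichever valley the starting weight falls in -- the "wrong turn" the
# article describes.

def f(w):  # error surface: global valley near w = -1, local valley near w = +1
    return (w * w - 1) ** 2 + 0.3 * w

def grad(w, eps=1e-6):  # numerical slope (backprop computes this analytically)
    return (f(w + eps) - f(w - eps)) / (2 * eps)

def descend(w, lr=0.01, steps=2000):
    for _ in range(steps):
        w -= lr * grad(w)  # step downhill against the slope
    return w

w_local = descend(0.5)    # starts in the right basin -> trapped near w = +1
w_global = descend(-0.5)  # starts in the left basin  -> finds the deeper valley
```

Both runs stop where the slope is (near) zero, but only one of them reaches the lower error; overcoming exactly this is what makes a general-purpose trainer hard.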

The error reduction can be seen graphically on the right of the image

One key part of the Foundry’s highly effective solution is over-fitting. Essentially, overfitting a model for a VFX task exploits the massive within-shot redundancy while side-stepping significant inter-shot differences that traditional ML aims to capture. Although the MSRN network is not task-specific, Nuke’s encoder-decoder networks have proven themselves very effective in a wide variety of tasks. “But in each case where they excel, it’s because they apply domain-specific knowledge, and that’s precisely how we tuned our MSRN,” points out Ring. The AIR team tunes their MSRN not for a task, but for VFX, where the demands for pixel accuracy, color fidelity, and performance are very high. One of the interesting challenges Foundry faces is that while ML is a hot topic in academic research, the focus of general researchers is not a great match to the demands of the VFX world. “There are so many areas that have been completely ignored by academia. And I think that is because they’re just different objectives between an academic paper and getting a shot finished and delivered on time. They are so very different,” reasons Ring. “A prime example is that we applied the same process needed to make SmartVector useful; we paid very close attention to filtering and preserving information between filters. This is crucial in mitigating the typical artifacts you get in a lot of ML models when you stop training too soon,” he adds. Pre-processing the data is also vitally important to Nuke’s ML tools, particularly for HDR imagery, which needs special attention. “We still have more to do here: ML frameworks have been historically focused on 8-bit sRGB images and shifting to production-level images is not trivial.”

This digital makeup example remarkably only needed two training frames due to the way the ML works

One impressive aspect of the ML nodes such as CopyCat is just how few training frames they need to be effective. Once again this is due to the computer working not in whole frames but in patches. While a user may only offer five or fewer frames of supervised ‘correct’ training frames, this is enough to train and improve the inference given enough time.

The MSRN is a layered network with a set of weights, and adjusting these weights is at the core of ML. The default network is roughly 42 layers deep, which is ~7M weights. You can vary the size of the parameters in CopyCat; there are other network sizes, depending on whether one wants to prioritize speed, quality, or complexity. The default 7M weights are adjusted, and this is what is stored in a .cat file (typically about 26MB per .cat file).

What matters to the ML is the pixels vs parameters ratio. CopyCat typically uses ~10 patches, where each patch is at most 256×256 pixels (262,144 floating values). “For ~10 patches, we’re talking of a total of ~2M floats, which is dramatically lower than the 7M weights,” explains Ring.
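Ring’s numbers can be sanity-checked with some quick arithmetic. The assumption that the 262,144 floats per patch correspond to 4 channels of a 256×256 patch is ours, but it makes the figures line up:

```python
# Quick sanity check of the pixels-vs-parameters numbers quoted above.
patch_pixels = 256 * 256            # pixels per patch
floats_per_patch = patch_pixels * 4  # assuming 4 channels of float data -> 262,144
patches = 10
floats_total = patches * floats_per_patch  # ~2.6M floats of training data

weights = 7_000_000                 # default ~42-layer network
cat_bytes = weights * 4             # float32 -> ~28 MB, consistent with ~26MB .cat files

ratio = weights / floats_total      # more weights than training floats
```

With more parameters than training floats, the network can essentially memorize the shot, which is the overfitting strategy discussed below.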

GPU card advice

The ML tools work well on modern GPUs; the internal workhorse used by the AIR team for development is NVIDIA’s Titan RTX, as “the extra memory has been great,” Ring comments. Generally, users need a minimum amount of memory to run the inference: “at the moment you need about 7 or 8 Gigs VRAM in order to do anything, that’s the minimum. We haven’t used the 3080, but we have been using the A6000 and that has been phenomenal for our performance transfer work. The sweet spot is 24Gig/ Titan RTX for training … as we move forward, we are seeing the need for a lot of VRAM… I think if you’re thinking about trying stuff now, then the current level of cards, like the 2080s, are great. If you’re looking forward, the 3080 is going to be great, or the 3090, if you can get your hands on it – absolutely.”

Upscaling is another ML application inside NUKE

ML Server is not dead (just yet).

Nuke is a platform as much as an application. Many facilities build out from the core of Nuke using the C++ SDK and Python, and it is Foundry’s desire for this to extend to ML. High-end users are encouraged to build ML processes that create their own .cat files to feed the Inference node. Five years from now, the Foundry expects teams to be building their own models and sharing them on something like Nukepedia. Already a user can take a model from CopyCat and extend it with their own work. This does not, however, exclude the use of ML-Server.

The ML-Server has been providing a lot of offline support to customers who have been working with it and using it in production. Ring points out that “if you look at the open ‘issues’ re ML-Server, you can see what has been done extends far beyond the original remit of ‘experiment with ML in Nuke’ and is precisely why our focus has transitioned to our Inference tool.”

For Inference and custom .cat files, Foundry has already started sharing a script that allows teams to package their PyTorch models into a .cat file, but there are some limitations. For example, only 4-channel images are supported, and the image dimensions must be the same for both input and output. “The plan is to add support for a wider variety of models and include the conversion script and training templates natively. This will be part of the next release, likely Nuke 13.1,” Ring explains.

If one looks at the GitHub for the ML-Server, the software has not been updated for 11 months, so current development is less than active, but the AIR team themselves still use it to quickly share models and ideas with customers and to prototype their own internal investigations. “I’m not ready to pull the plug on it just yet,” jokes Ring.

Deblur output example

Unsupervised Learning and Reinforcement Learning

Unsupervised Learning is another extremely powerful and popular area of ML; it has wide and impressive applications in classification problems, for example. Reinforcement Learning (RL) is a third area that can be powerful in improving and personalizing processes, and perhaps RL could be applied to the issues of Nuke user experience – in other words, there could be a role for RL in how Nuke presents itself to the user. We put to Dan Ring that RL could be a tool for personalizing one’s Nuke workflow based on understanding the tasks a user does. The challenge for the AIR team would be overcoming what is known as the Alignment problem: Nuke may have too much variation in tasks for RL tools. “We’re very keen to explore RL for VFX, and two of our recent hires have RL backgrounds! It is certainly on a longer timeline for us, but we (and our customers) are starting to think about what it means,” he responded.

Two examples often come up. The first is what Ring called the artist’s “MS Clippy / auto-comp my shot” task. Can a system see what a user is trying to do and ‘auto-complete’ the shot? “This is where your Value Alignment problem crops up immediately. In the artist’s case, it’s often hard to come up with the ‘real value/reward’ signal without the artist (supervisor) imparting knowledge or direction. Ideally, you want to know how well an artist or a system pulled a key from a green screen, and it’s not objective,” he explains. “There’s a lot of value in that sort of online semi-supervised learning, but it’s not strictly RL. One objective signal we can measure is time, or ‘time spent in a node’. An RL system could use that to decide an optimum set of knob values to minimize the time spent tweaking a node. Again, it’s hard to say whether a system that reduces the artist’s time spent aligns well with the task the artist is doing, but I’m looking forward to finding out!”

The second possible application is not focused on helping the Nuke artist but on helping the VFX Producer answer the question “how much is this shot going to cost?” The general bidding problem is based on having several shots, each requiring a variety of tasks at various quality levels. The AIR team is now asking: “Can a system infer a list of steps to deliver your shots and reduce your costs? The alignment here is much clearer and can be measured objectively.” But Ring is also very quick to point out that this is to help producers, not replace them. “A good VFX Producer is more than the sum of their experience and ability to process data.”

The volume of work a single artist has to accomplish has exploded over the last year. To help artists, things need to go faster, and that means scaling, either across more machines on-premises or with more use of processing in the cloud. “Once you start getting into serious competition for resources, your studio’s performance suffers. We’ve started thinking about how Q-learning (a branch of RL) could help,” says Ring. “In particular: for a given compute environment with a given load, can a system be used to infer the best order of Katana’s compute and data transfer operations to minimize graph evaluation and render times?” The AIR team is still investigating this, but already it seems to the team that any large-scale or out-of-core compute-heavy application should be designed with clever AI or ML scheduling.
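As a flavour of what Q-learning involves, here is a toy tabular example – emphatically not Foundry’s system, just the textbook mechanics on a made-up five-step pipeline where every operation costs a unit of time and the agent learns the ordering that reaches the finished result cheapest.

```python
import random

# Toy tabular Q-learning sketch (illustrative only -- not Foundry's system).
# States 0..4 are stages of a tiny pipeline; action 1 advances, action 0 goes
# back. Each operation costs 1 "unit of time"; finishing pays a reward.

random.seed(0)
N_STATES, GOAL = 5, 4
Q = [[0.0, 0.0] for _ in range(N_STATES)]  # Q[state][action]; 0=back, 1=advance

def step(s, a):
    s2 = max(0, s - 1) if a == 0 else min(N_STATES - 1, s + 1)
    r = 10.0 if s2 == GOAL else -1.0       # time cost per op, payoff at the end
    return s2, r

alpha, gamma, eps = 0.5, 0.9, 0.1          # learning rate, discount, exploration
for _ in range(500):                       # training episodes
    s = 0
    while s != GOAL:
        a = random.randrange(2) if random.random() < eps else Q[s].index(max(Q[s]))
        s2, r = step(s, a)
        # Q-learning update: move Q[s][a] toward r + gamma * max_a' Q[s'][a']
        Q[s][a] += alpha * (r + gamma * max(Q[s2]) - Q[s][a])
        s = s2

policy = [Q[s].index(max(Q[s])) for s in range(N_STATES - 1)]  # greedy choices
```

After training, the greedy policy always advances, i.e. the agent has learned the cheapest ordering; a real scheduler would face a vastly larger state space, but the update rule is the same.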


Perhaps the most sought-after ML solution is the one that would solve roto. CopyCat can produce a very good alpha or matte, although, as Ring points out, you still need a good artist to generate the high-quality training example for the network to learn from; CopyCat “won’t make a bad matte better, it will just give you more bad mattes sooner.” While CopyCat mattes can be brilliant, fully solving roto requires genius-level AI. Foundry’s launch demos include CopyCat processing several roto shots to B/W mattes, but a general solution to roto is still a little way off. The reason that roto is so complex comes down to three key points:

  1. Most ML solutions such as segmentation assume the output is a B/W matte. CopyCat can be used effectively to produce a matte, but the roto problem is not targeted at a matte as the final output, but rather an editable splined shape that artists can adjust and vary. As such, the matte is expressed as a keyframed spline shape that sensibly moves over time. For the Nuke artist to be able to adjust the roto, the solution cannot be just a pixel-mask matte output, nor can it be a stand-alone spline keyframed on every single frame.
  2. Most ML solutions are not temporal. This may not seem obvious from the brilliant digital makeup examples, but the solution is not tracked patches but frame-by-frame independent solutions that just happen to be so close to each other that they don’t flicker. But the ML logic is not to solve a clip; it is to solve a series of frames. This is not a great match for the roto problem. Good roto places keyframes on the correct apex of a motion, not just, say, every 10 frames.
  3. Roto artists don’t just want the current silhouette or outline as the roto output. The roto of someone walking has shapes overlapping as arms move over the body. Any good roto artist animates shapes that make sense for the object, even when on a given frame the outline of the combined rotos is an odd-shaped blob.

Some good news is that Foundry knows that solving roto fully requires a holistic picture: an ability to produce a spline tool that artists can manipulate. It has had an internal project called Smart Roto (Roto++), a funded research program with the University of Bath and University College London, since 2016. For now, CopyCat does a great job, but it is very much a pixel solution. There are advantages to such an approach. As CopyCat does not care about smooth temporal splines, it is easy to make synthetic data for training. Synthetic data is when a roto is hand-created for a frame and then duplicated at random angles over different possible backgrounds. Since the ML node is just looking to learn about the transition from matte to non-matte, it can learn from fake frames which make no sense as a sequence but provide more matte vs. non-matte training data.
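The synthetic-data trick is easy to sketch. The example below is illustrative only: one hand-made matte is duplicated at random orientations (approximated here with 90° rotations and flips to keep the sketch dependency-light) over random backgrounds, multiplying the matte vs. non-matte examples with no temporal coherence at all.

```python
import numpy as np

# Sketch of the synthetic-data idea described above (illustrative only):
# one hand-rotoed matte is duplicated at random orientations over random
# backgrounds, yielding many (plate, matte) training pairs that make no
# sense as a clip but teach the matte/non-matte transition.

def synthesize(matte, n=8, size=256, rng=None):
    """Yield (plate, target_matte) training pairs from one source matte."""
    rng = rng or np.random.default_rng(1)
    pairs = []
    for _ in range(n):
        m = np.rot90(matte, k=int(rng.integers(4)))  # random orientation
        if rng.random() < 0.5:
            m = np.fliplr(m)
        bg = rng.random((size, size))                # random background
        fg = rng.random() * np.ones_like(bg)         # flat foreground value
        plate = m * fg + (1 - m) * bg                # composite fg over bg
        pairs.append((plate, m))
    return pairs

matte = np.zeros((256, 256))
matte[64:192, 96:160] = 1.0   # stand-in for a hand-rotoed shape
pairs = synthesize(matte)     # 8 fake "frames" that make no sense as a clip
```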

Roto temporal shape animation may also benefit from other areas of AI. The Foundry’s own research estimates that just over half of all roto involves people. People are also hard to roto due to limbs, and especially finger and hand occlusions, along with issues related to hair and loose clothes. As such, it is worth considering developing special-case people roto tools. There are already some great research papers on estimating 3D human bone and joint movement from just flat 2D video. There is also strong work in ML volume reconstruction; both areas may end up as additional inputs to a more powerful people-specific roto AI solution in the future.

Helping Artists (not replacing them)

Thinking longer-term, CopyCat is a powerful tool for Nuke artists, and it also helps lay the foundations and set expectations around ML. The vertical headroom for ML in VFX is very high, and the impact of things like RL and Q-learning can be huge when applied to the right task with the right thinking.

It is also important to remember that while ML takes time to train, the solutions – the inference – can be lightning fast, which opens up Nuke to more use in the real-time space. People are starting to use CopyCat to solve the whole host of new comp problems generated by in-camera VFX, such as fixing moiré, removing rigs in the shot, grazing-angle color aberrations in LED capture volumes, and other real-time related virtual production work. Due to the nature of the LED screens themselves, such as limited dynamic range and interactions between physical lights and walls, there is a host of areas in VP where Nuke is not being used. “Studios, VP (Virtual Production) & VFX vendors have been promised they can ‘take it home on the day’, but these new problems mean shots need to go to post,” points out Ring. “We’re investigating toolsets to mitigate these problems as close to set as possible. We’ve already started this with our GENIO (Nuke / Unreal bridge) work, allowing you to easily pull UE render passes and cameras into Nuke, and you can also imagine a world where a real-time CopyCat is trained during the wall set-up and used to generate live mattes. All with the shared goal of giving you your final image that day.”

Foundry is also looking at workarounds for VP & Nuke around ‘persisting’ decisions, wrangling on-set data, and conforming to a master shared timeline (“Timeline of Truth”). “As you might imagine, assembling and managing data is a more challenging problem than ML (possibly one of the most difficult problems in our industry) and is farther off,” Ring concludes.

For more…

Ben Kent, Research Engineering Manager, is presenting Machine Learning for VFX with Nuke and CopyCat today, Apr 15, 2021, 3:00–3:40 PM BST, as part of GTC. Dan Ring will also be presenting on Nuke & ML at the RTC conference later this month.


Fake Tom Cruise

Chris Ume is a European VFX artist living in Bangkok who has shot to international attention with his Tom Cruise deep fake videos, posted as DeepTomCruise. Ume has demonstrated a level of identity swapping that has surprised and delighted the community in equal measure. Since he started posting the videos of Miles Fisher’s face swapped with Tom Cruise’s, his email inbox has been swamped with requests for advice, help, and work. What has caught the imagination of so many fellow artists is how the TikTok videos have Fisher breaking the ‘rules’ of neural rendering or deep fakes: DeepTomCruise pulls jumpers over his face, takes glasses and hats on and off without any seeming concern about occlusion, and regularly has his hair or his hand partially over his face.

Ume uses the free AI or machine learning (ML) software DeepFaceLab 2.0 (DFL) as his backbone, but the process is far from fully automated. For each short video, Ume spends 15 to 20 hours working to perfect the shot and sell the illusion. While anyone can download the software, the final clip is anything but a one-button-press solution. As with all VFX, the artist’s role is central, and what looks easy and effortless on-screen is actually complex and oftentimes challenging.

Each video starts with a conversation with Tom Cruise impersonator Miles Fisher. It is actually Fisher who films himself and sends the videos to Ume. There is never a tight script; Ume has explained the known limits and invited Fisher to push the boundaries. Ume does not direct the actor, and to date only one video has had to be reshot: in the original version of the lollipop clip, Fisher too often came very close to the camera, turned, and dropped in and out of frame.

Ume uses DFL 2.0, which no longer supports AMD GPUs/OpenCL; the only way to use it is with an NVIDIA GPU (a minimum CUDA compute level of 3.0 is required) or a CPU. Ume uses an NVIDIA A6000 card. The actual version of DFL 2.0 that Ume uses is faceshiftlabs, a GitHub fork of the DFL code.


Fisher films the base clips on his iPhone and sends the files to Ume. The resolution is not high – similar to 720p – but at the end of each process, Ume performs an up-res. He prefers to do this on the combined, comped clip, as he feels it is often a mismatch in sharpness and perceived resolution that makes a deep fake look unrealistic.

A key part of Ume’s process is Machine Video Editor (MVE), a free, community-supported tool for deepfake project management. It helps with everything from data gathering to compositing and fully supports DeepFaceLab and its data format; Ume uses it extensively for the supporting mattes that are required for the later compositing work.

When doing any such ML, the training stage is time-consuming, and Ume normally allows “2 to 3 days at least, maybe more, depending on how quickly the shot clears up” to tackle a new subject such as DeepTomCruise. While it is his work on DeepTomCruise that most people know, Ume has done many similar projects with different subjects and targets.

The focus of MVE is neural rendering project management. It allows Ume to keep all his DFL training material in a single project folder, with tools for data scraping and extraction, advanced sorting methods, set analysis, augmentation, and manual face and mask editing.

The program helps with automatic face tagging, avoiding the need for manual identification of eyebrows, eyes, noses, mouths, or chins. The program is not open source, but it is free.

DFL 2.0 has improved and optimized the process, which means Ume can train higher-resolution models or train existing ones faster. But the new version only supports two models – SAEHD and Quick96. The H128/H64/DF/LIAEF/SAE models are no longer available, and any pre-trained models (SAE/SAEHD) from 1.0 are not compatible. Ume only uses SAEHD; he sees Quick96 as just a fast rough test model, and while he has explored it, DeepTomCruise uses SAEHD.


All the compositing is currently done in After Effects. Ume is interested in exploring Nuke, especially with its new ML nodes such as CopyCat, but for now he knows AE so well that it is hard to shift applications. Some of the software in his pipeline only runs on PC, so this is the platform on which Ume does all his work.

As part of the compositing, Ume has experimented with changing hair color and patching skin textures, and has noticed interesting artifacts carrying from the training space into the solution space of DFL. For example, when Fisher leans very close to the camera, the lens distortion is sometimes not reflected in the solution. This means the new DeepTomCruise has a jaw that is the wrong apparent width and is not receding fully with its distance to the lens. A face close to the camera at eye level will have a relatively thinner chin due to the wide-angle effect, but this is rare to see in actual Tom Cruise footage, as the actor is seldom shot this way. In these cases, Ume uses the jaw much more from Fisher than from DeepTomCruise.

Ume is very collaborative, working with VFX houses and with all the major artists in the deep fake space. A group including users such as ctrl shift face, futuring machine, deephomage, dr fakenstein, the fakening, shamook, next face, and derpfakes – who collectively represent some of the best-known users on GitHub – all share ideas and work to demonstrate the sort of amazing work that can be done with neural rendering technology.

Miles Fisher has sent respectful emails to Cruise’s management explaining that his and Ume’s work is just to explore and educate about deep fakes and neural rendering technology, and he has vowed never to use DeepTomCruise to promote a product or cause. Ume’s primary aim is to educate people about what is possible and to build his own career in visual effects. “My goal was to work with Matt & Trey (South Park), which I am now doing. My next goal is to work with ‘The Lord of the Rings‘ team. I grew up watching those movies over and over again,” Ume explains, admiringly referring to Weta Digital.


Twin Peaks Meets Fargo Meets Alf: Resident Alien

Syfy’s Resident Alien stars Alan Tudyk (Firefly) as “Dr. Harry Vanderspeigle,” an alien who has taken on the identity of a small-town Colorado doctor. CoSA VFX is the primary visual effects vendor for the 10-episode series, and VFX house Artifex Studios added 685 shots, amid COVID workflow adjustments in 2020.

CoSA VFX Animation

Resident Alien has offered multiple opportunities for CoSA VFX‘s Animation team to shine, from environments to CG characters and everything in between, and also for the team to have quite a bit of fun in the process.

“It was fun. They let us experiment to find the character. We did a lot of things that were probably even a little outlandish, looking back at it,” comments CoSA animator Roger Vizard, adding that Harry the alien had a lot of emotional values that they tried to touch upon in the animation. “He’s a brilliant character to work with.”

Getting the alien on a horse was probably the biggest challenge the animators faced, as the nine-foot-tall alien was certainly larger than the stunt actor riding the horse in the raw footage. To accomplish this, the team watched and studied how the horse would react and animated Harry with matching interactions. Additionally, the horse was match-moved for seamless integration with dynamic elements and the animated character. This featured in a sequence for the third episode, where an ordinarily classic Western moment becomes something else entirely.

CoSA’s lead animator Teri Shellen commented that he hopes this kind of work continues to come to the studio, as these types of shots are very rewarding for the animators to tackle. “When we get episodes like this, they’re treasured, because we actually really get to get into character development and really push that field in our studio.”

CoSA also worked on many of the environment shots and in particular the pilot episode’s ship crashing to earth after being hit by lightning.


Artifex was involved early in setting key environments for “Resident Alien” and continued to add embellishments or build-outs depending on scene requirements. In episode 6, the studio augmented stock plates to add sweeping snow-covered mountain ranges, while episode 8 saw a build-out of practical glaciers into a full environment.

The glacier sequence in episode 8 in particular demanded that virtually every moment be touched in some way by the VFX team. Artifex used matte painting, CG extensions, smoothing and alteration of the set, and texture work to subtly add snow and ice.

Artifex also did creature animation: in episode 7, their team created a CGI octopus with which Alan Tudyk interacts through aquarium glass. Their conversation suggests that Harry’s species and octopuses are closely related, something which Harry himself later states to Asta Twelvetrees. Nathan Fillion previously co-starred with Tudyk in the 2003 series Firefly and its concluding film Serenity. Fillion is not the only fan favourite guest star on the show; Sci-Fi acting legend Linda Hamilton plays General McCallister, a high-ranking U.S. military officer.

The photo-real octopod inspired a later scene in episode 9, in which Artifex had to supplant Tudyk’s leg with a tentacle. For the scene, the team painted out what was visible of Alan Tudyk’s leg and added the CG tentacle, complete with flailing animation and interaction with the bacon.

“The animation had to find a sweet spot that suited the vocal performance accompanying it,” said Artifex VFX Supervisor Rob Geddes. “We wanted to be careful to provide a grabbing visual without taking the viewer out of the moment by being too intentionally cartoonish or farcical.”

For the Day for Night (DFN) sequence above, the scene was shot in full daylight but needed to shift in the edit into a night sequence. This required extensive roto, with matte-painted elements to introduce lit building interiors and streetlights.

Rounding out the work was the inside of the spaceship in episode 10, the season finale. Artifex designed and integrated the spaceship interior inside and around the green screen set.

Inside the Spaceship

The project spanned roughly a year due to delays imposed by COVID, with both internal and external adjustments being made to reflect the realities of working remotely.

Software used on the project included Maya and V-Ray for modeling, animation, and rendering; SynthEyes for tracking; Photoshop for matte painting; Nuke for compositing; ftrack for scheduling and production tracking; and Meshroom for photogrammetry.

Season 2

Executive producer and showrunner Chris Sheridan (Family Guy) and his talented creative staff have announced that the show has just been picked up for a second season and will return soon to Syfy.

Congrats to the Winners of the VES awards

The Visual Effects Society held the 19th Annual VES Awards, the prestigious awards that recognize outstanding visual effects artistry and innovation in film, animation, television, commercials, and video games, and celebrate the work of VFX supervisors, VFX producers, and all the artists who bring the work to life.

Winners of the 19th Annual VES Awards are as follows:

Outstanding Visual Effects in a Photoreal Feature
Matt Kasmir
Greg Baxter
Chris Lawrence
Max Solomon
David Watkins

Outstanding Supporting Visual Effects in a Photoreal Feature
Wei Zheng
Peter Mavromates
Simon Carr
James Pastorius

Outstanding Visual Effects in an Animated Feature
Pete Docter
Dana Murray
Michael Fong
Bill Watral

Outstanding Visual Effects in a Photoreal Episode
Joe Bauer
Abbigail Keller
Hal Hickel
Richard Bluff
Roy Cancino

Outstanding Supporting Visual Effects in a Photoreal Episode
THE CROWN; Gold Stick
Ben Turner
Reece Ewing
Andrew Scrase
Jonathan Wood

Outstanding Visual Effects in a Real-Time Project
Jason Connell
Matt Vainio
Jasmin Patry
Joanna Wang

Outstanding Visual Effects in a Commercial
WALMART; Famous Visitors
Chris “Badger” Knight
Lori Talley
Yarin Manes
Matt Fuller

Outstanding Visual Effects in a Special Venue Project
Salvador Zalvidea
Tracey Gibbons
George Allan
Matthías Bjarnason
Scott Smith

Outstanding Animated Character in a Photoreal Feature
Valentina Rosselli
Thomas Huizer
Andrea De Martis
William Bell

Outstanding Animated Character in an Animated Feature
SOUL; Terry
Jonathan Hoffman
Jonathan Page
Peter Tieryas
Ron Zorman

Outstanding Animated Character in an Episode or Real-Time Project
THE MANDALORIAN; The Jedi; The Child
John Rosengrant
Peter Clarke
Scott Patton
Hal Hickel

Outstanding Animated Character in a Commercial
ARM & HAMMER; Once Upon a Time; Tuxedo Tom
Shiny Rajan
Silvia Bartoli
Matías Heker
Tiago Dias Mota

Outstanding Created Environment in a Photoreal Feature
MULAN; Imperial City
Jeremy Fort
Matt Fitzgerald
Ben Walker
Adrian Vercoe

Outstanding Created Environment in an Animated Feature
SOUL; You Seminar
Hosuk Chang
Sungyeon Joh
Peter Roe
Frank Tai

Outstanding Created Environment in an Episode, Commercial, or Real-Time Project
THE MANDALORIAN; The Believer; Morak Jungle
Enrico Damm
Johanes Kurnia
Phi Tran
Tong Tran

Outstanding Virtual Cinematography in a CG Project
Matt Aspbury
Ian Megibben

Outstanding Model in a Photoreal or Animated Project
Michael Balthazart
Jonathan Opgenhaffen
John-Peter Li
Simon Aluze

Outstanding Effects Simulations in a Photoreal Feature
Yin Lai Jimmy Leung
Jonathan Edward Lyddon-Towl
Pierpaolo Navarini
Michelle Lee

Outstanding Effects Simulations in an Animated Feature
Alexis Angelidis
Keith Daniel Klohn
Aimei Kutt
Melissa Tseng

Outstanding Effects Simulations in an Episode, Commercial, or Real-Time Project
LOVECRAFT COUNTRY; Strange Case; Chrysalis
Federica Foresti
Johan Gabrielsson
Hugo Medda
Andreas Krieg

Outstanding Compositing in a Feature
Russell Horth
Matthew Patience
Julien Rousseau

Outstanding Compositing in an Episode
LOVECRAFT COUNTRY; Strange Case; Chrysalis
Viktor Andersson
Linus Lindblom
Mattias Sandelius
Crawford Reilly

Outstanding Compositing in a Commercial
BURBERRY; “Festive”
Alex Lovejoy
Mithun Alex
David Filipe
Amresh Kumar

Outstanding Special (Practical) Effects in a Photoreal or Animated Project
FEAR THE WALKING DEAD; Bury Her Next to Jasper’s Leg
Frank Iudica
Scott Roark
Daniel J. Yates

Outstanding Visual Effects in a Student Project
Antoine Dupriez
Hugo Caby
Lucas Lermytte
Zoé Devise

VES Special Awards

Cate Blanchett presented the VES Lifetime Achievement Award to award-winning filmmaker Sir Peter Jackson – along with a star-studded tribute from Andy Serkis, Naomi Watts, Elijah Wood, Sir Ian McKellen, James Cameron and Gollum.

Sacha Baron Cohen presented the VES Award for Creative Excellence to acclaimed visual effects supervisor, second unit director, and director of photography Robert Legato, ASC.

VP with Digital Humans & Darren Hendler

Epic Games has released the second volume of its Virtual Production Field Guide, a free in-depth resource for creators at any stage of the virtual production process in film and television. This latest volume dives into workflow evolutions, including remote multi-user collaboration, recently released features and what is coming this year in Unreal Engine 5, and two dozen new interviews with industry leaders about their hands-on experiences with virtual production.

One such contributor is Darren Hendler at Digital Domain.

Darren Hendler

Hendler is the Director of Digital Domain’s Digital Humans Group. His job includes researching and spearheading new technologies for the creation of photoreal characters. Hendler’s credits include Pirates of the Caribbean, FF7, Maleficent, Beauty and the Beast, and Avengers: Infinity War.

Can you talk about your role at Digital Domain?

Hendler: My background is in visual effects for feature films. I’ve done an enormous amount of virtual production, especially in turning actors into digital characters. On Avengers: Infinity War I was primarily responsible for our work turning Josh Brolin into Thanos. I’m still very much involved in the feature film side, which I love, and also now the real-time side of things.

Josh Brolin from the Thanos shoot

Digital humans are one of the key components in the holy grail of virtual production. We’re trying to accurately get the actor’s performance to drive their creature or character. There’s a whole series of steps of scanning the actor’s face in super-high-resolution, down to their pore-level details and their fine wrinkles. We’re even scanning their blood flow in their face to get this representation of what their skin looks like as they’re busy talking and moving.

The trick to virtual production is how you get your actor’s performance naturally. The primary technique is helmet cameras with markers on their face and mocap markers on their body, or an accelerometer suit to capture their body motion. That setup allows your actors to live on set with the other actors, interacting, performing, and getting everything live, and that’s the key to the performance.

The biggest problem has been the quality of the data coming out, not necessarily the body motion but the facial motion. That’s where the expressive performance is coming from. Seated capture systems get much higher-quality data. Unfortunately, that’s the most unnatural position, and their face doesn’t match their body movement. So, that’s where things are really starting to change recently on the virtual production side.

Where does Unreal Engine enter the pipeline?

Hendler: Up until this moment, everything has been offline with some sort of real-time form for body motion. About two or three years ago, we were looking at what Unreal Engine was able to do. It was getting pretty close to the quality we see on a feature film, so we wondered how far we could push it with a different mindset.

We didn’t need to build a game, but we just wanted a few of these things to look amazing. So, we started putting some of our existing digital humans into the engine and experimenting with the look, quality, and lighting to see what kind of feedback we could get in real-time. It has been an eye-opening experience, especially when running some of the stats on the characters.

At the moment, a single frame generated in Unreal Engine doesn’t produce the same visual results as a five-hour render. But it’s a million times faster, and the results are getting pretty close. We’ve been showing versions of this to a lot of different studios. The look is good enough to use real-time virtual production performances and go straight into editorial with them as a proxy.

The facial performance is not 100 percent of what we can get from our offline system. But now we see a route where our filmmakers and actors on set can look at these versions and say, “Okay, I can see how this performance came through. I can see how this would work or not work on this character.”

How challenging is it to map the human face to non-human characters, where there’s not always a one-to-one correlation between features?

Hendler: We’ve had a fantastic amount of success with that. First, we get an articulate capture from the actor and map out their anatomy and structures. We map out the structures on the other character, and then we have techniques to map the data from one to the other. We always run our actors through a range of motions, different expressions, and various emotions. Then we see how it looks on the character and make adjustments. Finally, the system learns from our changes and tells the network to adjust the character to a specific look and feel whenever it gets facial input close to a specific expression.
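The retargeting workflow Hendler describes can be sketched roughly as follows. Everything here is an illustrative assumption, not Digital Domain's actual system: the shape names, the actor-to-character mapping table, and the simple distance-based correction blend all stand in for what in production would be anatomical mappings and trained networks.

```python
# Hypothetical sketch of expression retargeting: actor blendshape weights
# are mapped onto a non-human character rig, then nudged by per-expression
# corrections "learned" from animator adjustments. All names are invented.

# Actor-to-character mapping: each character shape is a weighted blend of
# actor shapes (the character's "snarl" draws on several actor movements).
RETARGET = {
    "char_browRaise": {"act_browUpL": 0.5, "act_browUpR": 0.5},
    "char_snarl":     {"act_noseWrinkle": 0.7, "act_upperLipUp": 0.3},
}

# Corrections recorded from animator fixes: when the incoming actor pose is
# close to a stored reference pose, blend toward the corrected weights.
CORRECTIONS = [
    # (reference actor pose, corrected character weights)
    ({"act_noseWrinkle": 1.0, "act_upperLipUp": 1.0}, {"char_snarl": 1.0}),
]

def retarget(actor_weights, corrections=CORRECTIONS, radius=0.5):
    """Map actor blendshape weights to character blendshape weights."""
    char = {}
    for char_shape, sources in RETARGET.items():
        char[char_shape] = sum(w * actor_weights.get(src, 0.0)
                               for src, w in sources.items())
    # Apply a learned correction when the actor pose is near its reference.
    for ref_pose, target in corrections:
        keys = set(ref_pose) | set(actor_weights)
        dist = max(abs(ref_pose.get(k, 0.0) - actor_weights.get(k, 0.0))
                   for k in keys)
        if dist < radius:
            blend = 1.0 - dist / radius
            for shape, w in target.items():
                char[shape] = char.get(shape, 0.0) * (1 - blend) + w * blend
    return char
```

A real pipeline replaces the lookup table and nearest-pose blend with a network trained on the actor's range-of-motion session, but the structure is the same: map, compare against known expressions, correct.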

At some point, the actors aren’t even going to need to wear motion capture suits. We’ll be able to translate the live main unit camera to get their body and facial motion and swap them out to the digital character. From there, we’ll get a live representation of what that emotive performance on the character will look like. It’s accelerating to the point where it’s going to change a lot about how we do things because we’ll get these much better previews.

How do you create realistic eye movement?

Hendler: We start with an actor tech day and capture all these different scans, including capturing an eye scan and eye range of motion. We take a 4K or 8K camera and frame it right on their eyes. Then we have them do a range of motions and look-around tests. We try to impart as much of the anatomy of the eye as possible in a similar form to the digital character.

Thanos is an excellent example of that. We want to get a lot of the curvature and the shape of the eyes and those details correct. The more you do that, the quicker the eye performance falls into place.

We’re also starting to see results from new capture techniques. For the longest time, helmet-mounted capture systems were just throwing away the eye data. Now we can capture subtle shifts and micro eye darts at 60 frames a second, sometimes higher. We’ve got that rich data set combined with newer deep learning techniques and even deep fake techniques in the future.

Another thing that we’ve been working on is the shape of the body and the clothing. We’ve started to generate real-time versions of anatomy and clothing. We run sample capture data through a series of high-powered machines to simulate the anatomy and the clothing. Then, with deep learning, we can play 90 percent of the simulation in real-time. With all of that running in Unreal Engine, we’re starting to complete the final look in real-time.
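The simulate-offline-then-learn approach Hendler outlines can be sketched as below. The one-parameter "solver" and the linear least-squares fit are deliberately toy stand-ins for the high-powered cloth and anatomy simulations and the deep-learning models he describes; only the overall pattern (expensive solver offline, cheap learned predictor at runtime) reflects the text.

```python
# Hypothetical sketch: run a costly solver offline on sampled poses, fit a
# cheap model to its output, and evaluate only the model in real time.

def cloth_solver(pose):
    # Stand-in for an expensive offline simulation step: here the vertex
    # offset happens to be a simple function of one pose parameter.
    return 2.0 * pose + 0.5

def fit_linear(samples):
    """Least-squares fit of offset = a * pose + b from offline samples."""
    n = len(samples)
    sx = sum(p for p, _ in samples)
    sy = sum(y for _, y in samples)
    sxx = sum(p * p for p, _ in samples)
    sxy = sum(p * y for p, y in samples)
    a = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    b = (sy - a * sx) / n
    return a, b

# Offline: sample the solver across the pose range and fit the model.
samples = [(p / 10.0, cloth_solver(p / 10.0)) for p in range(11)]
a, b = fit_linear(samples)

# Runtime: the fitted model replaces the solver entirely.
def predict(pose):
    return a * pose + b
```

In production the predictor is a neural network over thousands of pose parameters and mesh vertices, which is why Hendler quotes "90 percent of the simulation" rather than an exact reproduction.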

What advice would you give someone interested in a career in digital humans?

Hendler: I like websites like ArtStation, where you’ve got students and other artists just creating the most amazing work and talking about how they did it. There are so many classes, like Gnomon and others, out there too. There are also so many resources online for people just to pick up a copy of ZBrush and Maya and start building their digital human or their digital self-portrait.

You can also bring those characters into Unreal Engine. Even for us, as we were jumping into the engine, it was super helpful because it comes primed with digital human assets that you can already use. So you can immediately go from sculpting into the real-time version of that character.

The tricky part is some of the motion, but even there you can hook up your iPhone with ARKit to Unreal Engine. So much of this has been a democratization of the process, where somebody at home can now put up a realistically rendered talking head. Even five years ago, that would’ve taken us a long time to get to.

Where do you see digital humans evolving next?

Hendler: You’re going to see an explosion of virtual YouTube and Instagram celebrities. We see them already in a single frame, and soon, they will start to move and perform. You’ll have a live actor transforming into an artificial human, creature, or character delivering blogs. That’s the distillation of virtual production in finding this whole new avenue—content delivery.

We’re also starting to see a lot more discussion related to COVID-19 over how we capture people virtually. We’re already doing projects and can actually get a huge amount of the performance from a Zoom call. We’re also building autonomous human agents for more realistic meetings and all that kind of stuff.

What makes this work well is us working together with the actors and the actors understanding this. We’re building a tool for you to deliver your performance. When we do all these things right, and you’re able to perform as a digital character, that’s when it’s incredible.

Digital Domain

Matthias Wittman, VFX Supervisor at Digital Domain, will also be part of the upcoming Real-Time Conference Digital Human talks, co-hosted by fxguide’s Mike Seymour (April 26/27). He will be presenting “Talking to Douglas, Creating an Autonomous Digital Human”. Also presenting will be Marc Petit, General Manager of Unreal Engine at Epic Games.

Digital Domain was also recently honored at the Advanced Imaging Society’s 11th annual awards for technical achievements. Masquerade 2.0, the company’s facial capture system, was recognized for its distinguished technical achievement. Masquerade generates high-quality moving 3D meshes of an actor’s facial performance from a helmet capture system (HMC). This data can then be transformed into a digital character’s face, the actor’s digital double, or a completely different digital person. With Masquerade, the actor is free to move around on set, interacting live with other actors to create a more natural performance. The images from the HMC worn by the actors are processed using machine learning into a high-quality, per-frame moving mesh that contains the actor’s nuanced performance, complete with wrinkle detail, skin sliding, subtle eye motion, etc. We posted an in-depth story on Masquerade 2.0 in 2020.

Field Guide Vol II

The first volume of the Virtual Production Field Guide was released in July 2019, designed as a foundational roadmap for the industry as the adoption of virtual production techniques was poised to explode. Since then, a number of additional high-profile virtual productions have been completed, with new methodologies developed and tangible lessons ready to share with the industry. The second volume expands upon the first with over 100 pages of all-new content, covering a variety of virtual production workflows including remote collaboration, visualization, in-camera VFX, and animation.

This new volume of the Virtual Production Field Guide was put together by Noah Kadner who wrote the first volume in 2019. It features interviews with directors Jon Favreau and Rick Famuyiwa, Netflix’s Girish Balakrishnan and Christina Lee Storm, VFX supervisor Rob Legato, cinematographer Greig Fraser, Digital Domain’s Darren Hendler, DNEG’s George Murphy, Sony Pictures Imageworks’ Jerome Chen, ILM’s Andrew Jones, Richard Bluff, and Charmaine Chan, and many more.

As the guide comments, what really altered filmmaking and its relationship with virtual production was the worldwide pandemic. “Although the pandemic brought an undeniable level of loss to the world, it has also caused massive changes in how we interact and work together. Many of these changes will be felt in filmmaking forever.” Remote collaboration and using tools from the evolving virtual production toolbox went from a nice-to-have to a must-have for almost all filmmakers. The Guide examines a variety of workflow scenarios, the impact of COVID-19 on production, and the growing ecosystem of virtual production service providers.

Click here to download the Virtual Production Field Guide as a PDF, or visit Epic’s Virtual Production Hub to learn more about virtual production and the craft of filmmaking.
