Computing optical flow is an important part of video understanding. There are many ways to train a model to compute it, but one of the more compelling methods is to:
1. Feed a model an image pair.
2. Have it predict the optical flow between the two images.
3. Apply that optical flow to the first image.
4. Compute a pixel-wise loss against the second image.
To use this algorithm, however, you need a differentiable way to do step (3), typically called an “image warp”. TensorFlow has just such an operation in contrib, but to my knowledge PyTorch does not.
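To make steps (3) and (4) concrete, here is a minimal pure-Python sketch of a bilinear image warp followed by a pixel-wise L1 loss. It is only an illustration of the arithmetic: it is neither differentiable nor fast, and the names warp_bilinear and l1_loss are my own. It assumes the backward-sampling convention (each output pixel is read from the input at its own position plus the flow) and treats out-of-bounds samples as zero; a real warp operation may handle either detail differently.

```python
import math

def warp_bilinear(image, flow):
    """Backward-warp a grayscale image (a list of rows) with a flow field.

    flow[y][x] is a (u, v) pair; output[y][x] is sampled from the input at
    (x + u, y + v) using bilinear interpolation.
    """
    height, width = len(image), len(image[0])

    def pixel(x, y):
        # Assumption: reads outside the image return zero (black).
        if 0 <= x < width and 0 <= y < height:
            return image[y][x]
        return 0.0

    out = [[0.0] * width for _ in range(height)]
    for y in range(height):
        for x in range(width):
            u, v = flow[y][x]
            sx, sy = x + u, y + v                # sampling position
            x0, y0 = math.floor(sx), math.floor(sy)
            ax, ay = sx - x0, sy - y0            # fractional offsets in [0, 1)
            top = (1 - ax) * pixel(x0, y0) + ax * pixel(x0 + 1, y0)
            bottom = (1 - ax) * pixel(x0, y0 + 1) + ax * pixel(x0 + 1, y0 + 1)
            out[y][x] = (1 - ay) * top + ay * bottom
    return out

def l1_loss(a, b):
    """Mean absolute pixel difference between two same-sized images."""
    total = sum(abs(p - q) for ra, rb in zip(a, b) for p, q in zip(ra, rb))
    return total / (len(a) * len(a[0]))

# Toy example: the bright pixel moves one step to the right between frames.
frame1 = [[0.0, 1.0, 0.0, 0.0]]
frame2 = [[0.0, 0.0, 1.0, 0.0]]
flow = [[(-1.0, 0.0)] * 4]  # every output pixel looks one step to its left
warped = warp_bilinear(frame1, flow)
loss = l1_loss(warped, frame2)
```

In training, the same computation would have to be built from differentiable tensor operations so that the loss can propagate gradients back into the flow prediction.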
After digging around for a while today, I found what I needed in one of NVIDIA’s open source repositories:
https://github.com/NVIDIA/flownet2-pytorch
In this repository, the author has implemented a custom CUDA operation called “resample2d”. Although the operation is undocumented, it is exactly what is needed to compute an image warp from an optical flow field.
Suppose you have an image and a matching .flo file (these are available from several places). Here is how you would use this operation to compute the second image:
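import torch
import torchvision
from PIL import Image
from torchvision.transforms.functional import to_tensor
from utils.flow_utils import readFlow
from networks.resample2d_package.resample2d import Resample2d
# Load the first image as a float 1xCxHxW tensor on the GPU.
im1 = to_tensor(Image.open('image.png').convert('RGB')).unsqueeze(0).to('cuda')
# readFlow returns an HxWx2 array; rearrange it into a 1x2xHxW tensor.
flow = torch.tensor(readFlow('flow.flo')).permute(2,0,1).unsqueeze(0).contiguous().to('cuda')
resample = Resample2d()
synth = resample(im1, flow)  # warp the loaded image with the flow field
torchvision.utils.save_image(synth, "flowed_image.png")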
You’ll need the code from the repository linked above to run this. Note that resample2d must run on the GPU: on the CPU it does not work and just returns all zeros.
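If you are curious what is inside a .flo file, the sketch below reads and writes one using only the Python standard library. It follows the Middlebury layout as I understand it: a float32 magic number (202021.25, whose bytes spell "PIEH"), an int32 width, an int32 height, then interleaved float32 (u, v) values in row-major order, all little-endian. The repository's readFlow already handles this for you; read_flo and write_flo are hypothetical names used only for illustration.

```python
import os
import struct
import tempfile

FLO_MAGIC = 202021.25  # the bytes b"PIEH" read as a little-endian float32

def write_flo(path, flow):
    """Write flow, a list of rows of (u, v) pairs, as a .flo file."""
    height, width = len(flow), len(flow[0])
    values = [c for row in flow for uv in row for c in uv]
    with open(path, "wb") as f:
        f.write(struct.pack("<fii", FLO_MAGIC, width, height))
        f.write(struct.pack(f"<{len(values)}f", *values))

def read_flo(path):
    """Read a .flo file into a list of rows of (u, v) pairs."""
    with open(path, "rb") as f:
        magic, width, height = struct.unpack("<fii", f.read(12))
        if magic != FLO_MAGIC:
            raise ValueError("not a .flo file (bad magic number)")
        count = 2 * width * height
        values = struct.unpack(f"<{count}f", f.read(4 * count))
    return [
        [(values[2 * (y * width + x)], values[2 * (y * width + x) + 1])
         for x in range(width)]
        for y in range(height)
    ]

# Round-trip a tiny 2x1 flow field through a temporary file.
original = [[(1.5, -2.0), (0.0, 0.25)]]
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "tiny.flo")
    write_flo(path, original)
    restored = read_flo(path)
    with open(path, "rb") as f:
        header = f.read(4)
```

The nested-list result has the same height x width x 2 layout that the snippet above permutes into a 1x2xHxW tensor.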