API Reference

mmedit.core

class mmedit.core.DistEvalIterHook(dataloader, interval=1, gpu_collect=False, **eval_kwargs)[source]

Distributed evaluation hook.

Parameters
  • dataloader (DataLoader) – A PyTorch dataloader.

  • interval (int) – Evaluation interval. Default: 1.

  • tmpdir (str | None) – Temporary directory to save the results of all processes. Default: None.

  • gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.

  • eval_kwargs (dict) – Other eval kwargs. It may contain: save_image (bool): Whether to save images. save_path (str): The path to save images.

after_train_iter(runner)[source]

The behavior after each train iteration.

Parameters

runner (mmcv.runner.BaseRunner) – The runner.

class mmedit.core.EvalIterHook(dataloader, interval=1, **eval_kwargs)[source]

Non-Distributed evaluation hook for iteration-based runner.

This hook will regularly perform evaluation in a given interval when performing in non-distributed environment.

Parameters
  • dataloader (DataLoader) – A PyTorch dataloader.

  • interval (int) – Evaluation interval. Default: 1.

  • eval_kwargs (dict) – Other eval kwargs. It contains: save_image (bool): Whether to save images. save_path (str): The path to save images.
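For illustration, a minimal usage sketch is shown below; the dataloader, interval and save path are placeholder assumptions, not values prescribed by the API:

# A minimal sketch, assuming `val_dataloader` was built with
# mmedit.datasets.build_dataloader; interval and save_path are placeholders.
from mmedit.core import EvalIterHook

eval_hook = EvalIterHook(
    val_dataloader,
    interval=5000,          # evaluate every 5000 training iterations
    save_image=True,        # passed to the model through eval_kwargs
    save_path='./work_dirs/val_visuals')
# runner.register_hook(eval_hook)  # attach it to an mmcv runner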

after_train_iter(runner)[source]

The behavior after each train iteration.

Parameters

runner (mmcv.runner.BaseRunner) – The runner.

evaluate(runner, results)[source]

Evaluation function.

Parameters
  • runner (mmcv.runner.BaseRunner) – The runner.

  • results (dict) – Model forward results.

class mmedit.core.L1Evaluation[source]

L1 evaluation metric.

Parameters

data_dict (dict) – Must contain keys of ‘gt_img’ and ‘fake_res’. If ‘mask’ is given, the results will be computed with mask as weight.

class mmedit.core.LinearLrUpdaterHook(target_lr=0, start=0, interval=1, **kwargs)[source]

Linear learning rate scheduler for image generation.

In the beginning, the learning rate is ‘base_lr’ defined in mmcv. We give a target learning rate ‘target_lr’ and a start point ‘start’ (iteration / epoch). Before ‘start’, we fix the learning rate at ‘base_lr’; after ‘start’, we linearly update the learning rate towards ‘target_lr’.

Parameters
  • target_lr (float) – The target learning rate. Default: 0.

  • start (int) – The start point (iteration / epoch, specified by args ‘by_epoch’ in its parent class in mmcv) to update learning rate. Default: 0.

  • interval (int) – The interval to update the learning rate. Default: 1.

get_lr(runner, base_lr)[source]

Calculates the learning rate.

Parameters
  • runner (object) – The passed runner.

  • base_lr (float) – Base learning rate.

Returns

Current learning rate.

Return type

float
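As a rough sketch (not the exact implementation), the schedule described above can be written as follows; progress and max_progress stand for the current and maximum iteration/epoch, depending on by_epoch:

# Rough sketch of the linear schedule, assuming iteration-based progress.
def linear_lr(base_lr, target_lr, start, progress, max_progress):
    if progress <= start:
        return base_lr              # keep base_lr before `start`
    # linearly move from base_lr towards target_lr after `start`
    factor = (progress - start) / float(max_progress - start)
    return base_lr + (target_lr - base_lr) * factor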

class mmedit.core.VisualizationHook(output_dir, res_name_list, interval=-1, filename_tmpl='iter_{}.png', rerange=True, bgr2rgb=True, nrow=1, padding=4)[source]

Visualization hook.

In this hook, we use the official API save_image in torchvision to save the visualization results.

Parameters
  • output_dir (str) – The file path to store visualizations.

  • res_name_list (list[str]) – Names of the results in the outputs dict to visualize. Each corresponding result in the outputs dict must be a torch.Tensor with shape (n, c, h, w).

  • interval (int) – The interval of calling this hook. If set to -1, the visualization hook will not be called. Default: -1.

  • filename_tmpl (str) – Format string used to save images. The output file name will be formatted with this template. Default: ‘iter_{}.png’.

  • rerange (bool) – Whether to rerange the output value from [-1, 1] to [0, 1]. We highly recommend that users preprocess the visualization results on their own; here we just provide a simple interface. Default: True.

  • bgr2rgb (bool) – Whether to reformat the channel dimension from BGR to RGB. The saved image follows RGB order. Default: True.

  • nrow (int) – The number of samples in a row. Default: 1.

  • padding (int) – The number of padding pixels between samples. Default: 4.
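A hypothetical config sketch is given below; the result names, directory and interval are placeholders, and how the hook is registered depends on the training script:

visual_config = dict(
    type='VisualizationHook',
    output_dir='visual',
    interval=1000,
    res_name_list=['gt_img', 'fake_res'],   # placeholder result names
    rerange=True,
    bgr2rgb=True,
    nrow=2,
    padding=4)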

after_train_iter(runner)[source]

The behavior after each train iteration.

Parameters

runner (object) – The runner.

mmedit.core.build_optimizers(model, cfgs)[source]

Build multiple optimizers from configs.

If cfgs contains several dicts for optimizers, then a dict of the constructed optimizers will be returned. If cfgs only contains one optimizer config, the constructed optimizer itself will be returned.

For example,

  1. Multiple optimizer configs:

optimizer_cfg = dict(
    model1=dict(type='SGD', lr=lr),
    model2=dict(type='SGD', lr=lr))

The returned dict is {'model1': torch.optim.Optimizer, 'model2': torch.optim.Optimizer}.

  2. Single optimizer config:

optimizer_cfg = dict(type='SGD', lr=lr)

The return is torch.optim.Optimizer.

Parameters
  • model (nn.Module) – The model with parameters to be optimized.

  • cfgs (dict) – The config dict of the optimizer.

Returns

The initialized optimizers.

Return type

dict[torch.optim.Optimizer] | torch.optim.Optimizer
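A small usage sketch, assuming model has submodules named generator and discriminator that match the config keys; the optimizer settings are placeholders:

from mmedit.core import build_optimizers

# Multiple optimizer configs -> dict of optimizers, keyed by submodule name.
cfgs = dict(
    generator=dict(type='Adam', lr=1e-4, betas=(0.9, 0.999)),
    discriminator=dict(type='Adam', lr=4e-4, betas=(0.9, 0.999)))
optimizers = build_optimizers(model, cfgs)

# Single optimizer config -> a single optimizer for the whole model.
optimizer = build_optimizers(model, dict(type='Adam', lr=1e-4))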

mmedit.core.psnr(img1, img2, crop_border=0, input_order='HWC')[source]

Calculate PSNR (Peak Signal-to-Noise Ratio).

Ref: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio

Parameters
  • img1 (ndarray) – Images with range [0, 255].

  • img2 (ndarray) – Images with range [0, 255].

  • crop_border (int) – Cropped pixels in each edge of an image. These pixels are not involved in the PSNR calculation. Default: 0.

  • input_order (str) – Whether the input order is ‘HWC’ or ‘CHW’. Default: ‘HWC’.

Returns

psnr result.

Return type

float

mmedit.core.reorder_image(img, input_order='HWC')[source]

Reorder images to ‘HWC’ order.

If the input image is of shape (h, w), return it as (h, w, 1); if the input is of shape (c, h, w), return it as (h, w, c); if the input is already of shape (h, w, c), return it as is.

Parameters
  • img (ndarray) – Input image.

  • input_order (str) – Whether the input order is ‘HWC’ or ‘CHW’. If the input image shape is (h, w), input_order will not have effects. Default: ‘HWC’.

Returns

reordered image.

Return type

ndarray

mmedit.core.ssim(img1, img2, crop_border=0, input_order='HWC')[source]

Calculate SSIM (structural similarity).

Ref: Image quality assessment: From error visibility to structural similarity

The results are the same as that of the official released MATLAB code in https://ece.uwaterloo.ca/~z70wang/research/ssim/.

For three-channel images, SSIM is calculated for each channel and then averaged.

Parameters
  • img1 (ndarray) – Images with range [0, 255].

  • img2 (ndarray) – Images with range [0, 255].

  • crop_border (int) – Cropped pixels in each edge of an image. These pixels are not involved in the SSIM calculation. Default: 0.

  • input_order (str) – Whether the input order is ‘HWC’ or ‘CHW’. Default: ‘HWC’.

Returns

ssim result.

Return type

float
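A short usage sketch for the metric helpers above; the arrays are random placeholders in the range [0, 255] with ‘HWC’ order:

import numpy as np
from mmedit.core import psnr, reorder_image, ssim

img1 = np.random.randint(0, 256, (64, 64, 3)).astype(np.float64)
img2 = np.random.randint(0, 256, (64, 64, 3)).astype(np.float64)

print(psnr(img1, img2, crop_border=4))   # float (dB)
print(ssim(img1, img2, crop_border=4))   # float in [0, 1]

chw = img1.transpose(2, 0, 1)                  # (c, h, w)
hwc = reorder_image(chw, input_order='CHW')    # back to (h, w, c)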

mmedit.core.tensor2img(tensor, out_type=<class 'numpy.uint8'>, min_max=(0, 1))[source]

Convert torch Tensors into image numpy arrays.

After clamping to (min, max), image values will be normalized to [0, 1].

For different tensor shapes, this function will have different behaviors:

  1. 4D mini-batch Tensor of shape (N x 3/1 x H x W):

    Use make_grid to stitch images in the batch dimension, and then convert it to numpy array.

  2. 3D Tensor of shape (3/1 x H x W) and 2D Tensor of shape (H x W):

    Directly change to numpy array.

Note that the image channel in input tensors should be RGB order. This function will convert it to cv2 convention, i.e., (H x W x C) with BGR order.

Parameters
  • tensor (Tensor | list[Tensor]) – Input tensors.

  • out_type (numpy type) – Output types. If np.uint8, transform outputs to uint8 type with range [0, 255]; otherwise, float type with range [0, 1]. Default: np.uint8.

  • min_max (tuple) – min and max values for clamp.

Returns

3D ndarray of shape (H x W x C) or 2D ndarray of shape (H x W).

Return type

ndarray | list[ndarray]
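A short usage sketch; the tensors are random placeholders in [0, 1] with RGB channel order, as required:

import numpy as np
import torch
from mmedit.core import tensor2img

batch = torch.rand(4, 3, 32, 32)               # 4D mini-batch in [0, 1]
grid = tensor2img(batch, out_type=np.uint8)    # stitched (H, W, 3) uint8, BGR

single = torch.rand(3, 32, 32)                 # 3D tensor
img = tensor2img(single, min_max=(0, 1))       # (32, 32, 3) uint8, BGR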

mmedit.datasets

datasets

class mmedit.datasets.AdobeComp1kDataset(ann_file, pipeline, data_prefix=None, test_mode=False)[source]

Adobe composition-1k dataset.

The dataset loads (alpha, fg, bg) data and applies specified transforms to the data. You can specify whether to composite the merged image online or to load a pre-composited merged image in the pipeline.

Example for online comp-1k dataset:

[
    {
        "alpha": 'alpha/000.png',
        "fg": 'fg/000.png',
        "bg": 'bg/000.png'
    },
    {
        "alpha": 'alpha/001.png',
        "fg": 'fg/001.png',
        "bg": 'bg/001.png'
    },
]

Example for offline comp-1k dataset:

[
    {
        "alpha": 'alpha/000.png',
        "merged": 'merged/000.png',
        "fg": 'fg/000.png',
        "bg": 'bg/000.png'
    },
    {
        "alpha": 'alpha/001.png',
        "merged": 'merged/001.png',
        "fg": 'fg/001.png',
        "bg": 'bg/001.png'
    },
]
load_annotations()[source]

Load annotations for Adobe Composition-1k dataset.

It loads image paths from a json file.

Returns

Loaded dict.

Return type

dict

class mmedit.datasets.BaseDataset(pipeline, test_mode=False)[source]

Base class for datasets.

All datasets should subclass it. All subclasses should overwrite:

load_annotations, which loads information and generates image lists.

Parameters
  • pipeline (list[dict | callable]) – A sequence of data transforms.

  • test_mode (bool) – If True, the dataset will work in test mode. Otherwise, in train mode.

abstract load_annotations()[source]

Abstract function for loading annotation.

All subclasses should overwrite this function.

prepare_test_data(idx)[source]

Prepare testing data.

Parameters

idx (int) – Index for getting each testing batch.

Returns

Returned testing batch.

Return type

dict

prepare_train_data(idx)[source]

Prepare training data.

Parameters

idx (int) – Index of the training batch data.

Returns

Returned training batch.

Return type

dict

class mmedit.datasets.BaseGenerationDataset(pipeline, test_mode=False)[source]

Base class for generation datasets.

evaluate(results, logger=None)[source]

Evaluate by saving generated images (no metrics needed).

Parameters

results (list[tuple]) – The output of forward_test() of the model.

Returns

Evaluation results dict.

Return type

dict

static scan_folder(path)[source]

Obtain image path list (including sub-folders) from a given folder.

Parameters

path (str | Path) – Folder path.

Returns

Image list obtained from the given folder.

Return type

list[str]

class mmedit.datasets.BaseMattingDataset(ann_file, pipeline, data_prefix=None, test_mode=False)[source]

Base image matting dataset.

evaluate(results, logger=None)[source]

Evaluate with different metrics.

Parameters

results (list[tuple]) – The output of forward_test() of the model.

Returns

Evaluation results dict.

Return type

dict

class mmedit.datasets.BaseSRDataset(pipeline, scale, test_mode=False)[source]

Base class for super resolution datasets.

evaluate(results, logger=None)[source]

Evaluate with different metrics.

Parameters

results (list[tuple]) – The output of forward_test() of the model.

Returns

Evaluation results dict.

Return type

dict

static scan_folder(path)[source]

Obtain image path list (including sub-folders) from a given folder.

Parameters

path (str | Path) – Folder path.

Returns

Image list obtained from the given folder.

Return type

list[str]

class mmedit.datasets.GenerationPairedDataset(dataroot, pipeline, test_mode=False)[source]

General paired image folder dataset for image generation.

It assumes that the training directory is ‘/path/to/data/train’. During test time, the directory is ‘/path/to/data/test’. ‘/path/to/data’ can be specified by the arg ‘dataroot’. Each sample contains a pair of images concatenated in the w dimension (A|B).

Parameters
  • dataroot (str | Path) – Path to the folder root of paired images.

  • pipeline (List[dict | callable]) – A sequence of data transformations.

  • test_mode (bool) – Set to True when building the test dataset. Default: False.

load_annotations()[source]

Load paired image paths.

Returns

List that contains paired image paths.

Return type

list[dict]

class mmedit.datasets.GenerationUnpairedDataset(dataroot, pipeline, test_mode=False)[source]

General unpaired image folder dataset for image generation.

It assumes that the training directory of images from domain A is ‘/path/to/data/trainA’, and that of domain B is ‘/path/to/data/trainB’. ‘/path/to/data’ can be specified by the arg ‘dataroot’. During test time, the directories are ‘/path/to/data/testA’ and ‘/path/to/data/testB’, respectively.

Parameters
  • dataroot (str | Path) – Path to the folder root of unpaired images.

  • pipeline (List[dict | callable]) – A sequence of data transformations.

  • test_mode (bool) – Set to True when building the test dataset. Default: False.

load_annotations(dataroot)[source]

Load unpaired image paths of one domain.

Parameters

dataroot (str) – Path to the folder root for unpaired images of one domain.

Returns

List that contains unpaired image paths of one domain.

Return type

list[dict]

prepare_test_data(idx)[source]

Prepare unpaired test data.

Parameters

idx (int) – Index of current batch.

Returns

Prepared test data batch.

Return type

list[dict]

prepare_train_data(idx)[source]

Prepare unpaired training data.

Parameters

idx (int) – Index of current batch.

Returns

Prepared training data batch.

Return type

dict

class mmedit.datasets.ImgInpaintingDataset(ann_file, pipeline, data_prefix=None, test_mode=False)[source]

Image dataset for inpainting.

load_annotations()[source]

Load annotations for dataset.

Returns

Contain dataset annotations.

Return type

list[dict]

class mmedit.datasets.RepeatDataset(dataset, times)[source]

A wrapper of repeated dataset.

The length of repeated dataset will be times larger than the original dataset. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.

Parameters
  • dataset (Dataset) – The dataset to be repeated.

  • times (int) – Repeat times.
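A hypothetical config sketch of wrapping a small dataset; the inner dataset config and train_pipeline are placeholders:

data = dict(
    train=dict(
        type='RepeatDataset',
        times=1000,                      # reported length = 1000 * len(dataset)
        dataset=dict(
            type='SRFolderDataset',
            lq_folder='data/lq',
            gt_folder='data/gt',
            pipeline=train_pipeline,     # placeholder, defined elsewhere
            scale=4)))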

class mmedit.datasets.SRAnnotationDataset(lq_folder, gt_folder, ann_file, pipeline, scale, data_prefix=None, test_mode=False, filename_tmpl='{}')[source]

General paired image dataset with an annotation file for image restoration.

The dataset loads lq (Low Quality) and gt (Ground-Truth) image pairs, applies specified transforms and finally returns a dict containing paired data and other information.

This is the “annotation file mode”: Each line in the annotation file contains the image names and image shape (usually for gt), separated by a white space.

Example of an annotation file:

0001_s001.png (480,480,3)
0001_s002.png (480,480,3)
Parameters
  • lq_folder (str | Path) – Path to a lq folder.

  • gt_folder (str | Path) – Path to a gt folder.

  • ann_file (str | Path) – Path to the annotation file.

  • pipeline (list[dict | callable]) – A sequence of data transformations.

  • scale (int) – Upsampling scale ratio.

  • test_mode (bool) – Set to True when building the test dataset. Default: False.

  • filename_tmpl (str) – Template for each filename. Note that the template excludes the file extension. Default: ‘{}’.

load_annotations()[source]

Load annotations for SR dataset.

It loads the LQ and GT image paths from the annotation file. Each line in the annotation file contains the image names and image shape (usually for gt), separated by a white space.

Returns

Returned dict for LQ and GT pairs.

Return type

dict

class mmedit.datasets.SRFolderDataset(lq_folder, gt_folder, pipeline, scale, test_mode=False, filename_tmpl='{}')[source]

General paired image folder dataset for image restoration.

The dataset loads lq (Low Quality) and gt (Ground-Truth) image pairs, applies specified transforms and finally returns a dict containing paired data and other information.

This is the “folder mode”, which needs to specify the lq folder path and gt folder path, each folder containing the corresponding images. Image lists will be generated automatically. You can also specify the filename template to match the lq and gt pairs.

For example, we have two folders with the following structures:

data_root
├── lq
│   ├── 0001_x4.png
│   ├── 0002_x4.png
├── gt
│   ├── 0001.png
│   ├── 0002.png

then, you need to set:

lq_folder = data_root/lq
gt_folder = data_root/gt
filename_tmpl = '{}_x4'
Parameters
  • lq_folder (str | Path) – Path to a lq folder.

  • gt_folder (str | Path) – Path to a gt folder.

  • pipeline (List[dict | callable]) – A sequence of data transformations.

  • scale (int) – Upsampling scale ratio.

  • test_mode (bool) – Set to True when building the test dataset. Default: False.

  • filename_tmpl (str) – Template for each filename. Note that the template excludes the file extension. Default: ‘{}’.

load_annotations()[source]

Load annotations for SR dataset.

It loads the LQ and GT image paths from folders.

Returns

Returned dict for LQ and GT pairs.

Return type

dict

class mmedit.datasets.SRLmdbDataset(lq_folder, gt_folder, pipeline, scale, test_mode=False)[source]

General paired image lmdb dataset for image restoration.

The dataset loads lq (Low Quality) and gt (Ground-Truth) image pairs, applies specified transforms and finally returns a dict containing paired data and other information.

This is the “lmdb mode”. To speed up IO, we recommend using lmdb. First, you need to make lmdb files. Suppose the lmdb files are path_to_lq/lq.lmdb and path_to_gt/gt.lmdb, then you can just set:

lq_folder = path_to_lq/lq.lmdb
gt_folder = path_to_gt/gt.lmdb

Contents of lmdb. Taking the lq.lmdb for example, the file structure is:

lq.lmdb
├── data.mdb
├── lock.mdb
├── meta_info.txt

The data.mdb and lock.mdb are standard lmdb files and you can refer to https://lmdb.readthedocs.io/en/release/ for more details.

The meta_info.txt is a specified txt file to record the meta information of our datasets. It will be automatically created when preparing datasets by our provided dataset tools. Each line in the txt file records

  1. image name (with extension);

  2. image shape;

  3. compression level, separated by a white space.

For example, the meta information of the lq.lmdb is: baboon.png (120,125,3) 1, which means: 1) image name (with extension): baboon.png; 2) image shape: (120,125,3); and 3) compression level: 1

We use the image name without extension as the lmdb key. Note that we use the same key for the corresponding lq and gt images.

Parameters
  • lq_folder (str | Path) – Path to a lq lmdb file.

  • gt_folder (str | Path) – Path to a gt lmdb file.

  • pipeline (list[dict | callable]) – A sequence of data transformations.

  • scale (int) – Upsampling scale ratio.

  • test_mode (bool) – Set to True when building the test dataset. Default: False.

load_annotations()[source]

Load annotations for SR dataset.

It loads the LQ and GT image paths from the meta_info.txt in the LMDB files.

Returns

Returned dict for LQ and GT pairs.

Return type

dict

class mmedit.datasets.SRREDSDataset(lq_folder, gt_folder, ann_file, num_input_frames, pipeline, scale, val_partition='official', test_mode=False)[source]

REDS dataset for video super resolution.

The dataset loads several LQ (Low-Quality) frames and a center GT (Ground-Truth) frame. Then it applies specified transforms and finally returns a dict containing paired data and other information.

It reads REDS keys from the txt file. Each line contains: 1. image name; 2. image shape, separated by a white space. Examples:

000/00000000.png (720, 1280, 3)
000/00000001.png (720, 1280, 3)
Parameters
  • lq_folder (str | Path) – Path to a lq folder.

  • gt_folder (str | Path) – Path to a gt folder.

  • ann_file (str | Path) – Path to the annotation file.

  • num_input_frames (int) – Window size for input frames.

  • pipeline (list[dict | callable]) – A sequence of data transformations.

  • scale (int) – Upsampling scale ratio.

  • val_partition (str) – Validation partition mode. Choices: [‘official’, ‘REDS4’]. Default: ‘official’.

  • test_mode (bool) – Set to True when building the test dataset. Default: False.

load_annotations()[source]

Load annotations for REDS dataset.

Returns

Returned dict for LQ and GT pairs.

Return type

dict

class mmedit.datasets.SRVid4Dataset(lq_folder, gt_folder, ann_file, num_input_frames, pipeline, scale, filename_tmpl='{:08d}', test_mode=False)[source]

Vid4 dataset for video super resolution.

The dataset loads several LQ (Low-Quality) frames and a center GT (Ground-Truth) frame. Then it applies specified transforms and finally returns a dict containing paired data and other information.

It reads Vid4 keys from the txt file. Each line contains:

  1. folder name;

  2. number of frames in this clip (in the same folder);

  3. image shape, separated by a white space.

Examples:

calendar 40 (320,480,3)
city 34 (320,480,3)
Parameters
  • lq_folder (str | Path) – Path to a lq folder.

  • gt_folder (str | Path) – Path to a gt folder.

  • ann_file (str | Path) – Path to the annotation file.

  • num_input_frames (int) – Window size for input frames.

  • pipeline (list[dict | callable]) – A sequence of data transformations.

  • scale (int) – Upsampling scale ratio.

  • filename_tmpl (str) – Template for each filename. Note that the template excludes the file extension. Default: ‘{:08d}’.

  • test_mode (bool) – Set to True when building the test dataset. Default: False.

load_annotations()[source]

Load annotations for Vid4 dataset.

Returns

Returned dict for LQ and GT pairs.

Return type

dict

class mmedit.datasets.SRVimeo90KDataset(lq_folder, gt_folder, ann_file, num_input_frames, pipeline, scale, test_mode=False)[source]

Vimeo90K dataset for video super resolution.

The dataset loads several LQ (Low-Quality) frames and a center GT (Ground-Truth) frame. Then it applies specified transforms and finally returns a dict containing paired data and other information.

It reads Vimeo90K keys from the txt file. Each line contains: 1. image name; 2. image shape, separated by a white space. Examples:

00001/0266 (256, 448, 3)
00001/0268 (256, 448, 3)
Parameters
  • lq_folder (str | Path) – Path to a lq folder.

  • gt_folder (str | Path) – Path to a gt folder.

  • ann_file (str | Path) – Path to the annotation file.

  • num_input_frames (int) – Window size for input frames.

  • pipeline (list[dict | callable]) – A sequence of data transformations.

  • scale (int) – Upsampling scale ratio.

  • test_mode (bool) – Set to True when building the test dataset. Default: False.

load_annotations()[source]

Load annotations for Vimeo90K dataset.

Returns

Returned dict for LQ and GT pairs.

Return type

dict

mmedit.datasets.build_dataloader(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, drop_last=False, pin_memory=True, **kwargs)[source]

Build PyTorch DataLoader.

In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.

Parameters
  • dataset (Dataset) – A PyTorch dataset.

  • samples_per_gpu (int) – Number of samples on each GPU, i.e., batch size of each GPU.

  • workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.

  • num_gpus (int) – Number of GPUs. Only used in non-distributed training. Default: 1.

  • dist (bool) – Distributed training/test or not. Default: True.

  • shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.

  • seed (int | None) – Seed to be used. Default: None.

  • drop_last (bool) – Whether to drop the last incomplete batch in epoch. Default: False

  • pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True

  • kwargs (dict, optional) – Any keyword argument to be used to initialize DataLoader.

Returns

A PyTorch dataloader.

Return type

DataLoader
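A minimal non-distributed usage sketch; dataset is assumed to have been built already and the batch settings are placeholders:

from mmedit.datasets import build_dataloader

loader = build_dataloader(
    dataset,
    samples_per_gpu=16,
    workers_per_gpu=4,
    num_gpus=1,
    dist=False,
    shuffle=True,
    seed=0)

for data in loader:
    ...  # feed `data` to the model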

mmedit.datasets.build_dataset(cfg, default_args=None)[source]

Build a dataset from config dict.

It supports a variety of dataset config. If cfg is a Sequential (list or dict), it will be a concatenated dataset of the datasets specified by the Sequential. If it is a RepeatDataset, then it will repeat the dataset cfg['dataset'] for cfg['times'] times. If the ann_file of the dataset is a Sequential, then it will build a concatenated dataset with the same dataset type but different ann_file.

Parameters
  • cfg (dict) – Config dict. It should at least contain the key “type”.

  • default_args (dict, optional) – Default initialization arguments. Default: None.

Returns

The constructed dataset.

Return type

Dataset
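A minimal usage sketch; the dataset type, paths and train_pipeline are placeholders based on the dataset classes documented above:

from mmedit.datasets import build_dataset

cfg = dict(
    type='SRAnnotationDataset',
    lq_folder='data/train/lq',
    gt_folder='data/train/gt',
    ann_file='data/train/meta_info.txt',
    pipeline=train_pipeline,     # placeholder, defined elsewhere
    scale=4)
dataset = build_dataset(cfg)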

pipelines

class mmedit.datasets.pipelines.BinarizeImage(keys, binary_thr, to_int=False)[source]

Binarize image.

Parameters
  • keys (Sequence[str]) – The images to be binarized.

  • binary_thr (float) – Threshold for binarization.

  • to_int (bool) – If True, return image as int32, otherwise return image as float32.

class mmedit.datasets.pipelines.Collect(keys, meta_keys=None)[source]

Collect data from the loader relevant to the specific task.

This is usually the last stage of the data loader pipeline. Typically keys is set to some subset of “img”, “gt_labels”.

The “img_meta” item is always populated. The contents of the “meta” dictionary depend on “meta_keys”.

Parameters
  • keys (Sequence[str]) – Required keys to be collected.

  • meta_keys (Sequence[str]) – Required keys to be collected to “meta”. Default: None.

class mmedit.datasets.pipelines.Compose(transforms)[source]

Compose a data pipeline with a sequence of transforms.

Parameters

transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
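As a sketch, a pipeline can be composed either from config dicts or from transform objects; the transforms and keys below are illustrative placeholders:

from mmedit.datasets.pipelines import Compose

pipeline = Compose([
    dict(type='LoadImageFromFile', io_backend='disk', key='gt'),
    dict(type='RescaleToZeroOne', keys=['gt']),
    dict(type='ImageToTensor', keys=['gt']),
    dict(type='Collect', keys=['gt'], meta_keys=['gt_path']),
])
results = pipeline(dict(gt_path='path/to/image.png'))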

class mmedit.datasets.pipelines.CompositeFg(fg_dirs, alpha_dirs, interpolation='nearest')[source]

Composite foreground with a random foreground.

This class composites the current training sample with additional data randomly (could be from the same dataset). With probability 0.5, the sample will be composited with a random sample from the specified directory. The composition is performed as:

\[
\begin{aligned}
fg_{new} &= \alpha_1 \cdot fg_1 + (1 - \alpha_1) \cdot fg_2 \\
\alpha_{new} &= 1 - (1 - \alpha_1)(1 - \alpha_2)
\end{aligned}
\]

where \((fg_1, \alpha_1)\) is from the current sample and \((fg_2, \alpha_2)\) is the randomly loaded sample. With the above composition, \(\alpha_{new}\) is still in [0, 1].

Required keys are “alpha” and “fg”. Modified keys are “alpha” and “fg”.

Parameters
  • fg_dirs (str | list[str]) – Path of directories to load foreground images from.

  • alpha_dirs (str | list[str]) – Path of directories to load alpha mattes from.

  • interpolation (str) – Interpolation method of mmcv.imresize to resize the randomly loaded images.

class mmedit.datasets.pipelines.Crop(keys, crop_size, random_crop=True)[source]

Crop data to specific size for training.

Parameters
  • keys (Sequence[str]) – The images to be cropped.

  • crop_size (Tuple[int]) – Target spatial size (h, w).

  • random_crop (bool) – If set to True, it will randomly crop the images. Otherwise, it will work as center crop.

class mmedit.datasets.pipelines.CropAroundCenter(crop_size)[source]

Randomly crop the images around the unknown area in the center 1/4 of the images.

This cropping strategy is adopted in GCA matting. The unknown area is the same as semi-transparent area. https://arxiv.org/pdf/2001.04069.pdf

It retains the center 1/4 of the images and resizes the images to ‘crop_size’. Required keys are “fg”, “bg”, “trimap” and “alpha”, added or modified keys are “crop_bbox”, “fg”, “bg”, “trimap” and “alpha”.

Parameters

crop_size (int | tuple) – Desired output size. If int, square crop is applied.

class mmedit.datasets.pipelines.CropAroundFg(keys, bd_ratio_range=(0.1, 0.4), test_mode=False)[source]

Crop around the whole foreground in the segmentation mask.

Required keys are “seg” and the keys in argument keys. Meanwhile, “seg” must be in argument keys. Added or modified keys are “crop_bbox” and the keys in argument keys.

Parameters
  • keys (Sequence[str]) – The images to be cropped. It must contain ‘seg’.

  • bd_ratio_range (tuple, optional) – The range of the boundary (bd) ratio to select from. The boundary ratio is the ratio of the boundary to the minimal bbox that contains the whole foreground given by segmentation. Default to (0.1, 0.4).

  • test_mode (bool) – Whether to use test mode. In test mode, the tight crop area of the foreground will be extended to a square. Default to False.

class mmedit.datasets.pipelines.CropAroundUnknown(keys, crop_sizes, unknown_source='alpha', interpolations='bilinear')[source]

Crop around unknown area with a randomly selected scale.

Randomly select the w and h from a list of (w, h). Required keys are the keys in argument keys, added or modified keys are “crop_bbox” and the keys in argument keys. This class assumes value of “alpha” ranges from 0 to 255.

Parameters
  • keys (Sequence[str]) – The images to be cropped. It must contain ‘alpha’. If unknown_source is set to ‘trimap’, then it must also contain ‘trimap’.

  • crop_sizes (list[int | tuple[int]]) – List of (w, h) to be selected.

  • unknown_source (str, optional) – Unknown area to select from. It must be ‘alpha’ or ‘trimap’. Default to ‘alpha’.

  • interpolations (str | list[str], optional) – Interpolation method of mmcv.imresize. The interpolation operation will be applied when image size is smaller than the crop_size. If given as a list of str, it should have the same length as keys. Or if given as a str all the keys will be resized with the same method. Default to ‘bilinear’.

class mmedit.datasets.pipelines.FixedCrop(keys, crop_size, crop_pos=None)[source]

Crop paired data (at a specific position) to specific size for training.

Parameters
  • keys (Sequence[str]) – The images to be cropped.

  • crop_size (Tuple[int]) – Target spatial size (h, w).

  • crop_pos (Tuple[int]) – Specific position (x, y). If set to None, the crop position is randomly initialized for the paired data batch.

class mmedit.datasets.pipelines.Flip(keys, flip_ratio=0.5, direction='horizontal')[source]

Flip the input data with a probability.

Reverse the order of elements in the given data with a specific direction. The shape of the data is preserved, but the elements are reordered. Required keys are the keys in attributes “keys”, added or modified keys are “flip”, “flip_direction” and the keys in attributes “keys”. It also supports flipping a list of images with the same flip.

Parameters
  • keys (list[str]) – The images to be flipped.

  • flip_ratio (float) – The probability to flip the images.

  • direction (str) – Flip images horizontally or vertically. Options are “horizontal” | “vertical”. Default: “horizontal”.

class mmedit.datasets.pipelines.FormatTrimap(to_onehot=False)[source]

Convert trimap (tensor) to one-hot representation.

It transforms the trimap label from (0, 128, 255) to (0, 1, 2). If to_onehot is set to True, the trimap will be converted to a one-hot tensor of shape (3, H, W). Required key is “trimap”, added or modified keys are “trimap” and “to_onehot”.

Parameters

to_onehot (bool) – Whether to convert the trimap to a one-hot tensor. Default: False.

class mmedit.datasets.pipelines.GenerateFrameIndices(interval_list, frames_per_clip=99)[source]

Generate frame index for REDS datasets. It also performs temporal augmentation with a random interval.

Required keys: lq_path, gt_path, key, num_input_frames. Added or modified keys: lq_path, gt_path, interval, reverse.

Parameters
  • interval_list (list[int]) – Interval list for temporal augmentation. It will randomly pick an interval from interval_list and sample frame index with the interval.

  • frames_per_clip (int) – Number of frames per clip. Default: 99 for the REDS dataset.

class mmedit.datasets.pipelines.GenerateFrameIndiceswithPadding(padding, filename_tmpl='{:08d}')[source]

Generate frame index with padding for REDS dataset and Vid4 dataset during testing.

Required keys: lq_path, gt_path, key, num_input_frames, max_frame_num. Added or modified keys: lq_path, gt_path.

Parameters

padding (str) – Padding mode, one of ‘replicate’ | ‘reflection’ | ‘reflection_circle’ | ‘circle’.

Examples: current_idx = 0, num_input_frames = 5. The generated frame indices under different padding modes:

replicate: [0, 0, 0, 1, 2]
reflection: [2, 1, 0, 1, 2]
reflection_circle: [4, 3, 0, 1, 2]
circle: [3, 4, 0, 1, 2]

class mmedit.datasets.pipelines.GenerateSeg(kernel_size=5, erode_iter_range=(10, 20), dilate_iter_range=(15, 30), num_holes_range=(0, 3), hole_sizes=[(15, 15), (25, 25), (35, 35), (45, 45)], blur_ksizes=[(21, 21), (31, 31), (41, 41)])[source]

Generate segmentation mask from alpha matte.

Parameters
  • kernel_size (int, optional) – Kernel size for both erosion and dilation. The kernel will have the same height and width. Defaults to 5.

  • erode_iter_range (tuple, optional) – Iteration of erosion. Defaults to (10, 20).

  • dilate_iter_range (tuple, optional) – Iteration of dilation. Defaults to (15, 30).

  • num_holes_range (tuple, optional) – Range of number of holes to randomly select from. Defaults to (0, 3).

  • hole_sizes (list, optional) – List of (h, w) to be selected as the size of the rectangle hole. Defaults to [(15, 15), (25, 25), (35, 35), (45, 45)].

  • blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].

class mmedit.datasets.pipelines.GenerateSoftSeg(fg_thr=0.2, border_width=25, erode_ksize=3, dilate_ksize=5, erode_iter_range=(10, 20), dilate_iter_range=(3, 7), blur_ksizes=[(21, 21), (31, 31), (41, 41)])[source]

Generate soft segmentation mask from input segmentation mask.

Required key is “seg”, added key is “soft_seg”.

Parameters
  • fg_thr (float, optional) – Threshold of the foreground in the normalized input segmentation mask. Defaults to 0.2.

  • border_width (int, optional) – Width of border to be padded to the bottom of the mask. Defaults to 25.

  • erode_ksize (int, optional) – Fixed kernel size of the erosion. Defaults to 3.

  • dilate_ksize (int, optional) – Fixed kernel size of the dilation. Defaults to 5.

  • erode_iter_range (tuple, optional) – Iteration of erosion. Defaults to (10, 20).

  • dilate_iter_range (tuple, optional) – Iteration of dilation. Defaults to (3, 7).

  • blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].

class mmedit.datasets.pipelines.GenerateTrimap(kernel_size, iterations=1, random=True)[source]

Using random erode/dilate to generate trimap from alpha matte.

Required key is “alpha”, added key is “trimap”.

Parameters
  • kernel_size (int | tuple[int]) – The range of random kernel_size of erode/dilate; int indicates a fixed kernel_size. If random is set to False and kernel_size is a tuple of length 2, then it will be interpreted as (erode kernel_size, dilate kernel_size). It should be noted that the kernel of the erosion and dilation has the same height and width.

  • iterations (int | tuple[int], optional) – The range of random iterations of erode/dilate; int indicates a fixed iterations. If random is set to False and iterations is a tuple of length 2, then it will be interpreted as (erode iterations, dilate iterations). Default to 1.

  • random (bool, optional) – Whether to use random kernel_size and iterations when generating the trimap. See kernel_size and iterations for more information.

class mmedit.datasets.pipelines.GenerateTrimapWithDistTransform(dist_thr=20, random=True)[source]

Generate trimap with distance transform function.

Parameters
  • dist_thr (int, optional) – Distance threshold. Area with alpha value between (0, 255) will be considered as the initial unknown area. Then, area whose distance to the unknown area is smaller than the distance threshold will also be considered as unknown area. Defaults to 20.

  • random (bool, optional) – If True, use random distance threshold from [1, dist_thr). If False, use dist_thr as the distance threshold directly. Defaults to True.

class mmedit.datasets.pipelines.GetMaskedImage(img_name='gt_img', mask_name='mask')[source]

Get masked image.

Parameters
  • img_name (str) – Key for clean image.

  • mask_name (str) – Key for mask image. The mask shape should be (h, w, 1), where ‘1’ indicates holes and ‘0’ indicates valid regions.

class mmedit.datasets.pipelines.GetSpatialDiscountMask(gamma=0.99, beta=1.5)[source]

Get spatial discounting mask constant.

Spatial discounting mask is first introduced in: Generative Image Inpainting with Contextual Attention.

Parameters
  • gamma (float, optional) – Gamma for computing spatial discounting. Defaults to 0.99.

  • beta (float, optional) – Beta for computing spatial discounting. Defaults to 1.5.

spatial_discount_mask(mask_width, mask_height)[source]

Generate spatial discounting mask constant.

Parameters
  • mask_width (int) – The width of bbox hole.

  • mask_height (int) – The height of the bbox hole.

Returns

Spatial discounting mask.

Return type

np.ndarray

class mmedit.datasets.pipelines.ImageToTensor(keys, to_float32=True)[source]

Convert image type to torch.Tensor type.

Parameters
  • keys (Sequence[str]) – Required keys to be converted.

  • to_float32 (bool) – Whether to convert the numpy image array to np.float32 before converting it to a tensor. Default: True.

class mmedit.datasets.pipelines.LoadImageFromFile(io_backend='disk', key='gt', flag='color', channel_order='bgr', save_original_img=False, **kwargs)[source]

Load image from file.

Parameters
  • io_backend (str) – io backend where images are stored. Default: ‘disk’.

  • key (str) – Keys in results to find corresponding path. Default: ‘gt’.

  • flag (str) – Loading flag for images. Default: ‘color’.

  • channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.

  • save_original_img (bool) – If True, maintain a copy of the image in results dict with name of f’ori_{key}’. Default: False.

  • kwargs (dict) – Args for file client.

class mmedit.datasets.pipelines.LoadImageFromFileList(io_backend='disk', key='gt', flag='color', channel_order='bgr', save_original_img=False, **kwargs)[source]

Load image from file list.

It accepts a list of paths and reads each frame from each path. A list of frames will be returned.

Parameters
  • io_backend (str) – io backend where images are stored. Default: ‘disk’.

  • key (str) – Keys in results to find corresponding path. Default: ‘gt’.

  • flag (str) – Loading flag for images. Default: ‘color’.

  • save_original_img (bool) – If True, maintain a copy of the image in results dict with name of f’ori_{key}’. Default: False.

  • kwargs (dict) – Args for file client.

class mmedit.datasets.pipelines.LoadMask(mask_mode='bbox', mask_config=None)[source]

Load Mask for multiple types.

For different types of mask, users need to provide the corresponding config dict.

Example config for bbox:

config = dict(img_shape=(256, 256), max_bbox_shape=128)

Example config for irregular:

config = dict(
    img_shape=(256, 256),
    num_vertexes=(4, 12),
    max_angle=4.,
    length_range=(10, 100),
    brush_width=(10, 40),
    area_ratio_range=(0.15, 0.5))

Example config for ff:

config = dict(
    img_shape=(256, 256),
    num_vertexes=(4, 12),
    mean_angle=1.2,
    angle_range=0.4,
    brush_width=(12, 40))

Example config for set:

config = dict(
    mask_list_file='xxx/xxx/ooxx.txt',
    prefix='/xxx/xxx/ooxx/',
    io_backend='disk',
    flag='unchanged',
    file_client_kwargs=dict()
)

The mask_list_file contains the list of mask file names like this:
    test1.jpeg
    test2.jpeg
    ...
    ...

The prefix gives the data path.
Parameters
  • mask_mode (str) – Mask mode in [‘bbox’, ‘irregular’, ‘ff’, ‘set’, ‘file’].
    * bbox: square bounding box masks.
    * irregular: irregular holes.
    * ff: free-form holes from DeepFillv2.
    * set: randomly get a mask from a mask set.
    * file: get mask from ‘mask_path’ in results.

  • mask_config (dict) – Params for creating masks. Each type of mask needs different configs.

class mmedit.datasets.pipelines.LoadPairedImageFromFile(io_backend='disk', key='gt', flag='color', channel_order='bgr', save_original_img=False, **kwargs)[source]

Load a pair of images from file.

Each sample contains a pair of images, which are concatenated in the w dimension (a|b). This is a special loading class for generation paired dataset. It loads a pair of images as the common loader does and crops it into two images with the same shape in different domains.

Required key is “pair_path”. Added or modified keys are “pair”, “pair_ori_shape”, “ori_pair”, “img_a”, “img_b”, “img_a_path”, “img_b_path”, “img_a_ori_shape”, “img_b_ori_shape”, “ori_img_a” and “ori_img_b”.

Parameters
  • io_backend (str) – io backend where images are stored. Default: ‘disk’.

  • key (str) – Keys in results to find corresponding path. Default: ‘gt’.

  • flag (str) – Loading flag for images. Default: ‘color’.

  • channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.

  • save_original_img (bool) – If True, maintain a copy of the image in results dict with name of f’ori_{key}’. Default: False.

  • kwargs (dict) – Args for file client.

class mmedit.datasets.pipelines.MergeFgAndBg[source]

Composite foreground image and background image with alpha.

Required keys are “alpha”, “fg” and “bg”, added key is “merged”.

class mmedit.datasets.pipelines.ModCrop[source]

Mod crop gt images, used during testing.

Required keys are “scale” and “gt”, added or modified keys are “gt”.

class mmedit.datasets.pipelines.Normalize(keys, mean, std, to_rgb=False)[source]

Normalize images with the given mean and std value.

Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys” and these keys with postfix ‘_norm_cfg’. It also supports normalizing a list of images.

Parameters
  • keys (Sequence[str]) – The images to be normalized.

  • mean (np.ndarray) – Mean values of different channels.

  • std (np.ndarray) – Std values of different channels.

  • to_rgb (bool) – Whether to convert channels from BGR to RGB.

class mmedit.datasets.pipelines.Pad(keys, ds_factor=32, **kwargs)[source]

Pad the images to align with network downsample factor for testing.

See Reshape for more explanation. numpy.pad is used for the pad operation. Required keys are the keys in attribute “keys”, added or modified keys are “test_trans” and the keys in attribute “keys”. All keys in “keys” should have the same shape. “test_trans” is used to record the test transformation to align the input’s shape.

Parameters
  • keys (list[str]) – The images to be padded.

  • ds_factor (int) – Downsample factor of the network. The height and width will be padded to a multiple of ds_factor. Default: 32.

  • kwargs (option) – any keyword arguments to be passed to numpy.pad.

class mmedit.datasets.pipelines.PairedRandomCrop(gt_patch_size)[source]

Paired random crop.

It crops a pair of lq and gt images with corresponding locations. It also supports accepting lq list and gt list. Required keys are “scale”, “lq”, and “gt”, added or modified keys are “lq” and “gt”.

Parameters

gt_patch_size (int) – cropped gt patch size.

class mmedit.datasets.pipelines.PerturbBg(gamma_ratio=0.6)[source]

Randomly add gaussian noise or gamma change to background image.

Required key is “bg”, added key is “noisy_bg”.

Parameters

gamma_ratio (float, optional) – The probability to use gamma correction instead of gaussian noise. Defaults to 0.6.

class mmedit.datasets.pipelines.RandomAffine(keys, degrees, translate=None, scale=None, shear=None, flip_ratio=None)[source]

Apply random affine to input images.

This class is adopted from https://github.com/pytorch/vision/blob/v0.5.0/torchvision/transforms/transforms.py#L1015. It should be noted that, following https://github.com/Yaoyi-Li/GCA-Matting/blob/master/dataloader/data_generator.py#L70, a random flip is added. See the explanation of flip_ratio below. Required keys are the keys in attribute “keys”, modified keys are the keys in attribute “keys”.

Parameters
  • keys (Sequence[str]) – The images to be affined.

  • degrees (float | tuple[float]) – Range of degrees to select from. If it is a float instead of a tuple like (min, max), the range of degrees will be (-degrees, +degrees). Set to 0 to deactivate rotations.

  • translate (tuple, optional) – Tuple of maximum absolute fraction for horizontal and vertical translations. For example translate=(a, b), then horizontal shift is randomly sampled in the range -img_width * a < dx < img_width * a and vertical shift is randomly sampled in the range -img_height * b < dy < img_height * b. Default: None.

  • scale (tuple, optional) – Scaling factor interval, e.g (a, b), then scale is randomly sampled from the range a <= scale <= b. Default: None.

  • shear (float | tuple[float], optional) – Range of shear degrees to select from. If shear is a float, a shear parallel to the x axis and a shear parallel to the y axis in the range (-shear, +shear) will be applied. Else if shear is a tuple of 2 values, a x-axis shear and a y-axis shear in (shear[0], shear[1]) will be applied. Default: None.

  • flip_ratio (float, optional) – Probability of the image being flipped. The flips in horizontal direction and vertical direction are independent. The image may be flipped in both directions. Default: None.

class mmedit.datasets.pipelines.RandomJitter(hue_range=40)[source]

Randomly jitter the foreground in hsv space.

The jitter range of hue is adjustable while the jitter ranges of saturation and value are adaptive to the images. Side effect: the “fg” image will be converted to np.float32. Required keys are “fg” and “alpha”, modified key is “fg”.

Parameters

hue_range (float | tuple[float]) – Range of hue jittering. If it is a float instead of a tuple like (min, max), the range of hue jittering will be (-hue_range, +hue_range). Default: 40.

class mmedit.datasets.pipelines.RandomLoadResizeBg(bg_dir, io_backend='disk', flag='color', **kwargs)[source]

Randomly load a background image and resize it.

Required key is “fg”, added key is “bg”.

Parameters
  • bg_dir (str) – Path of directory to load background images from.

  • io_backend (str) – io backend where images are stored. Default: ‘disk’.

  • flag (str) – Loading flag for images. Default: ‘color’.

  • kwargs (dict) – Args for file client.

class mmedit.datasets.pipelines.RandomMaskDilation(keys, binary_thr=0.0, kernel_min=9, kernel_max=49)[source]

Randomly dilate binary masks.

Parameters
  • keys (Sequence[str]) – The images to be resized.

  • get_binary (bool) – If True, according to binary_thr, reset final output as binary mask. Otherwise, return masks directly.

  • binary_thr (float) – Threshold for obtaining binary mask.

  • kernel_min (int) – Min size of dilation kernel.

  • kernel_max (int) – Max size of dilation kernel.

class mmedit.datasets.pipelines.RandomTransposeHW(keys, transpose_ratio=0.5)[source]

Randomly transpose images in H and W dimensions with a probability.

(TransposeHW = horizontal flip + anti-clockwise rotation by 90 degrees) When used with horizontal/vertical flips, it serves as a way of rotation augmentation. It also supports randomly transposing a list of images.

Required keys are the keys in attributes “keys”, added or modified keys are “transpose” and the keys in attributes “keys”.

Parameters
  • keys (list[str]) – The images to be transposed.

  • transpose_ratio (float) – The probability to transpose the images.

class mmedit.datasets.pipelines.RescaleToZeroOne(keys)[source]

Transform the images into a range between 0 and 1.

Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys”. It also supports rescaling a list of images.

Parameters

keys (Sequence[str]) – The images to be transformed.

class mmedit.datasets.pipelines.Resize(keys, scale=None, keep_ratio=False, size_factor=None, max_size=None, interpolation='bilinear')[source]

Resize data to a specific size for training or resize the images to fit the network input regulation for testing.

When used for resizing images to fit the network input requirement, the typical case is that a network has several downsample and then upsample operations, so the input height and width should be divisible by the downsample factor of the network. For example, if the network downsamples the input 5 times with stride 2, then the downsample factor is 2^5 = 32 and the height and width should be divisible by 32.

Required keys are the keys in attribute “keys”, added or modified keys are “keep_ratio”, “scale_factor”, “interpolation” and the keys in attribute “keys”.

All keys in “keys” should have the same shape. “test_trans” is used to record the test transformation to align the input’s shape.

Parameters
  • keys (list[str]) – The images to be resized.

  • scale (float | Tuple[int]) – If scale is a Tuple(int), it is the target spatial size (h, w). Otherwise, the target spatial size is the input size scaled by this factor. If any element of scale is -1, the short edge will be rescaled. Note that when it is used, size_factor and max_size are useless. Default: None

  • keep_ratio (bool) – If set to True, images will be resized without changing the aspect ratio. Otherwise, images will be resized to the given size. Default: False. Note that it is used together with scale.

  • size_factor (int) – Let the output shape be a multiple of size_factor. Default:None. Note that when it is used, scale should be set to None and keep_ratio should be set to False.

  • max_size (int) – The maximum size of the longest side of the output. Default: None. Note that it is used together with size_factor.

  • interpolation (str) – Algorithm used for interpolation: “nearest” | “bilinear” | “bicubic” | “area” | “lanczos”. Default: “bilinear”.

class mmedit.datasets.pipelines.TemporalReverse(keys, reverse_ratio=0.5)[source]

Reverse frame lists for temporal augmentation.

Required keys are the keys in attributes “lq” and “gt”, added or modified keys are “lq”, “gt” and “reverse”.

Parameters
  • keys (list[str]) – The frame lists to be reversed.

  • reverse_ratio (float) – The probability to reverse the frame lists. Default: 0.5.

class mmedit.datasets.pipelines.ToTensor(keys)[source]

Convert some values in results dict to torch.Tensor type in data loader pipeline.

Parameters

keys (Sequence[str]) – Required keys to be converted.

mmedit.models

models

class mmedit.models.BaseMattor(backbone, refiner=None, train_cfg=None, test_cfg=None, pretrained=None)[source]

Base class for matting model.

A matting model must contain a backbone which produces alpha, a dense prediction with the same height and width as the input image. In some cases, the model will have a refiner which refines the prediction of the backbone.

The subclasses should overwrite the functions forward_train and forward_test, which define the output of the model and possibly the connection between the backbone and the refiner.

Parameters
  • backbone (dict) – Config of backbone.

  • refiner (dict) – Config of refiner.

  • train_cfg (dict) – Config of training. In train_cfg, train_backbone should be specified. If the model has a refiner, train_refiner should be specified.

  • test_cfg (dict) – Config of testing. In test_cfg, if the model has a refiner, train_refiner should be specified.

  • pretrained (str) – Path of pretrained model.

evaluate(pred_alpha, meta)[source]

Evaluate predicted alpha matte.

The evaluation metrics are determined by self.test_cfg.metrics.

Parameters
  • pred_alpha (np.ndarray) – The predicted alpha matte of shape (H, W).

  • meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported. Required keys in the meta dict are ori_alpha and ori_trimap.

Returns

The evaluation result.

Return type

dict

forward(merged, trimap, meta, alpha=None, test_mode=False, **kwargs)[source]

Defines the computation performed at every call.

Parameters
  • merged (Tensor) – Image to predict alpha matte.

  • trimap (Tensor) – Trimap of the input image.

  • meta (list[dict]) – Meta data about the current data batch. Defaults to None.

  • alpha (Tensor, optional) – Ground-truth alpha matte. Defaults to None.

  • test_mode (bool, optional) – Whether in test mode. If True, it will call forward_test of the model. Otherwise, it will call forward_train of the model. Defaults to False.

Returns

Return the output of self.forward_test if test_mode is set to True. Otherwise, return the output of self.forward_train.

Return type

dict

abstract forward_test(merged, trimap, meta, **kwargs)[source]

Defines the computation performed at every test call.

abstract forward_train(merged, trimap, alpha, **kwargs)[source]

Defines the computation performed at every training call.

Parameters
  • merged (Tensor) – Image to predict alpha matte.

  • trimap (Tensor) – Trimap of the input image.

  • alpha (Tensor) – Ground-truth alpha matte.

freeze_backbone()[source]

Freeze the backbone and only train the refiner.

init_weights(pretrained=None)[source]

Initialize the model network weights.

Parameters

pretrained (str, optional) – Path to the pretrained weight. Defaults to None.

restore_shape(pred_alpha, meta)[source]

Restore the predicted alpha to the original shape.

The shape of the predicted alpha may not be the same as the shape of original input image. This function restores the shape of the predicted alpha.

Parameters
  • pred_alpha (np.ndarray) – The predicted alpha.

  • meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported.

Returns

The reshaped predicted alpha.

Return type

np.ndarray

save_image(pred_alpha, meta, save_path, iteration)[source]

Save predicted alpha to file.

Parameters
  • pred_alpha (np.ndarray) – The predicted alpha matte of shape (H, W).

  • meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported. Required keys in the meta dict are merged_path.

  • save_path (str) – The directory to save predicted alpha matte.

  • iteration (int | None) – If given as None, the saved alpha matte will have the same file name as merged_path in the meta dict. If given as an int, the saved alpha matte will be named with the postfix _{iteration}.png.

train_step(data_batch, optimizer)[source]

Defines the computation and network update at every training call.

Parameters
  • data_batch (torch.Tensor) – Batch of data as input.

  • optimizer (torch.optim.Optimizer) – Optimizer of the model.

Returns

Output of train_step containing the logging variables of the current data batch.

Return type

dict

property with_refiner

Whether the matting model has a refiner.

class mmedit.models.BaseModel[source]

Base model.

All models should subclass it. All subclasses should overwrite:

init_weights, which initializes the model weights.

forward_train, which defines the forward pass during training.

forward_test, which defines the forward pass during testing.

train_step, which performs one training step.

forward(imgs, labels, test_mode, **kwargs)[source]

Forward function for base model.

Parameters
  • imgs (Tensor) – Input image(s).

  • labels (Tensor) – Ground-truth label(s).

  • test_mode (bool) – Whether in test mode.

  • kwargs (dict) – Other arguments.

Returns

Forward results.

Return type

Tensor

abstract forward_test(imgs)[source]

Abstract method for testing forward.

All subclasses should overwrite it.

abstract forward_train(imgs, labels)[source]

Abstract method for training forward.

All subclasses should overwrite it.

abstract init_weights()[source]

Abstract method for initializing weight.

All subclasses should overwrite it.

parse_losses(losses)[source]

Parse losses dict for different loss variants.

Parameters

losses (dict) – Loss dict.

Returns

A tuple (loss, log_vars), where loss is the sum of the total loss and log_vars is a loss dict for different loss variants.

Return type

tuple
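A rough sketch of what such a parsing step typically does is shown below; scalar losses are averaged, lists of losses are summed, and every key containing ‘loss’ contributes to the total. The real implementation also handles distributed reduction:

import torch

def parse_losses_sketch(losses):
    log_vars = {}
    for name, value in losses.items():
        if isinstance(value, torch.Tensor):
            log_vars[name] = value.mean()
        elif isinstance(value, list):
            log_vars[name] = sum(v.mean() for v in value)
    loss = sum(v for k, v in log_vars.items() if 'loss' in k)
    log_vars['loss'] = loss
    return loss, log_vars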

abstract train_step(data_batch, optimizer)[source]

Abstract method for one training step.

All subclasses should overwrite it.

val_step(data_batch, **kwargs)[source]

Abstract method for one validation step.

All subclasses should overwrite it.

class mmedit.models.BasicRestorer(generator, pixel_loss, train_cfg=None, test_cfg=None, pretrained=None)[source]

Basic model for image restoration.

It must contain a generator that takes an image as input and outputs a restored image. It also has a pixel-wise loss for training.

The subclasses should overwrite the functions forward_train, forward_test and train_step.

Parameters
  • generator (dict) – Config for the generator structure.

  • pixel_loss (dict) – Config for pixel-wise loss.

  • train_cfg (dict) – Config for training. Default: None.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path for pretrained model. Default: None.
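
A minimal, illustrative config fragment is sketched below. The registered component names ('MSRResNet', 'L1Loss'), the test_cfg keys and all values are assumptions for this sketch rather than the shipped configs:

model = dict(
    type='BasicRestorer',
    generator=dict(type='MSRResNet', in_channels=3, out_channels=3,
                   upscale_factor=4),
    pixel_loss=dict(type='L1Loss', loss_weight=1.0, reduction='mean'),
    pretrained=None)
train_cfg = None
test_cfg = dict(metrics=['PSNR', 'SSIM'], crop_border=4)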

evaluate(output, gt)[source]

Evaluation function.

Parameters
  • output (Tensor) – Model output with shape (n, c, h, w).

  • gt (Tensor) – GT Tensor with shape (n, c, h, w).

Returns

Evaluation results.

Return type

dict

forward(lq, gt=None, test_mode=False, **kwargs)[source]

Forward function.

Parameters
  • lq (Tensor) – Input lq images.

  • gt (Tensor) – Ground-truth image. Default: None.

  • test_mode (bool) – Whether in test mode or not. Default: False.

  • kwargs (dict) – Other arguments.

forward_dummy(img)[source]

Used for computing network FLOPs.

Parameters

img (Tensor) – Input image.

Returns

Output image.

Return type

Tensor

forward_test(lq, gt=None, meta=None, save_image=False, save_path=None, iteration=None)[source]

Testing forward function.

Parameters
  • lq (Tensor) – LQ Tensor with shape (n, c, h, w).

  • gt (Tensor) – GT Tensor with shape (n, c, h, w). Default: None.

  • save_image (bool) – Whether to save image. Default: False.

  • save_path (str) – Path to save image. Default: None.

  • iteration (int) – Iteration for the saving image name. Default: None.

Returns

Output results.

Return type

dict

forward_train(lq, gt)[source]

Training forward function.

Parameters
  • lq (Tensor) – LQ Tensor with shape (n, c, h, w).

  • gt (Tensor) – GT Tensor with shape (n, c, h, w).

Returns

Output tensor.

Return type

Tensor

init_weights(pretrained=None)[source]

Init weights for models.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

train_step(data_batch, optimizer)[source]

Train step.

Parameters
  • data_batch (dict) – A batch of data.

  • optimizer (obj) – Optimizer.

Returns

Returned output.

Return type

dict

val_step(data_batch, **kwargs)[source]

Validation step.

Parameters
  • data_batch (dict) – A batch of data.

  • kwargs (dict) – Other arguments for val_step.

Returns

Returned output.

Return type

dict

class mmedit.models.CycleGAN(generator, discriminator, gan_loss, cycle_loss, id_loss=None, train_cfg=None, test_cfg=None, pretrained=None)[source]

CycleGAN model for unpaired image-to-image translation.

Ref: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks

Parameters
  • generator (dict) – Config for the generator.

  • discriminator (dict) – Config for the discriminator.

  • gan_loss (dict) – Config for the gan loss.

  • cycle_loss (dict) – Config for the cycle-consistency loss.

  • id_loss (dict) – Config for the identity loss. Default: None.

  • train_cfg (dict) – Config for training. Default: None. You may change the training of the GAN by setting: disc_steps: how many discriminator updates after one generator update; disc_init_steps: how many discriminator updates at the start of the training (these two keys are useful when training with WGAN); direction: image-to-image translation direction (the model training direction): a2b | b2a; buffer_size: GAN image buffer size. See the sketch after this parameter list.

  • test_cfg (dict) – Config for testing. Default: None. You may change the testing of the GAN by setting: direction: image-to-image translation direction (the model training direction): a2b | b2a; show_input: whether to show input real images; test_direction: direction in the test mode (the model testing direction). CycleGAN has two generators, so this decides whether to perform forward or backward translation with respect to direction during testing: a2b | b2a.

  • pretrained (str) – Path for pretrained model. Default: None.
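
The following sketch spells out train_cfg and test_cfg using only the keys documented above; the values are placeholders, not recommended settings:

train_cfg = dict(
    disc_steps=1,        # discriminator updates per generator update
    disc_init_steps=0,   # extra discriminator updates at the start
    direction='a2b',     # training translation direction
    buffer_size=50)      # GAN image buffer size
test_cfg = dict(
    direction='a2b',       # model training direction
    test_direction='a2b',  # translation direction used at test time
    show_input=False)      # whether to show input real images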

backward_discriminators(outputs)[source]

Backward function for the discriminators.

Parameters

outputs (dict) – Dict of forward results.

Returns

Loss dict.

Return type

dict

backward_generators(outputs)[source]

Backward function for the generators.

Parameters

outputs (dict) – Dict of forward results.

Returns

Loss dict.

Return type

dict

forward(img_a, img_b, meta, test_mode=False, **kwargs)[source]

Forward function.

Parameters
  • img_a (Tensor) – Input image from domain A.

  • img_b (Tensor) – Input image from domain B.

  • meta (list[dict]) – Input meta data.

  • test_mode (bool) – Whether in test mode or not. Default: False.

  • kwargs (dict) – Other arguments.

forward_dummy(img)[source]

Used for computing network FLOPs.

Parameters

img (Tensor) – Dummy input used to compute FLOPs.

Returns

Dummy output produced by forwarding the dummy input.

Return type

Tensor

forward_test(img_a, img_b, meta, save_image=False, save_path=None, iteration=None)[source]

Forward function for testing.

Parameters
  • img_a (Tensor) – Input image from domain A.

  • img_b (Tensor) – Input image from domain B.

  • meta (list[dict]) – Input meta data.

  • save_image (bool, optional) – If True, results will be saved as images. Default: False.

  • save_path (str, optional) – If given a valid str path, the results will be saved in this path. Default: None.

  • iteration (int, optional) – Iteration number. Default: None.

Returns

Dict of forward and evaluation results for testing.

Return type

dict

forward_train(img_a, img_b, meta)[source]

Forward function for training.

Parameters
  • img_a (Tensor) – Input image from domain A.

  • img_b (Tensor) – Input image from domain B.

  • meta (list[dict]) – Input meta data.

Returns

Dict of forward results for training.

Return type

dict

get_module(module)[source]

Get nn.ModuleDict to fit the MMDistributedDataParallel interface.

Parameters

module (MMDistributedDataParallel | nn.ModuleDict) – The input module that needs processing.

Returns

The ModuleDict of multiple networks.

Return type

nn.ModuleDict

init_weights(pretrained=None)[source]

Initialize weights for the model.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Default: None.

setup(img_a, img_b, meta)[source]

Perform necessary pre-processing steps.

Parameters
  • img_a (Tensor) – Input image from domain A.

  • img_b (Tensor) – Input image from domain B.

  • meta (list[dict]) – Input meta data.

Returns

The real images from domain A/B, and the image path as the metadata.

Return type

Tensor, Tensor, list[str]

train_step(data_batch, optimizer)[source]

Training step function.

Parameters
  • data_batch (dict) – Dict of the input data batch.

  • optimizer (dict[torch.optim.Optimizer]) – Dict of optimizers for the generators and discriminators.

Returns

Dict of loss, information for logger, the number of samples and results for visualization.

Return type

dict

val_step(data_batch, **kwargs)[source]

Validation step function.

Parameters
  • data_batch (dict) – Dict of the input data batch.

  • kwargs (dict) – Other arguments.

Returns

Dict of evaluation results for validation.

Return type

dict

class mmedit.models.DIM(backbone, refiner=None, train_cfg=None, test_cfg=None, pretrained=None, loss_alpha=None, loss_comp=None, loss_refine=None)[source]

Deep Image Matting model.

https://arxiv.org/abs/1703.03872

Note

For (self.train_cfg.train_backbone, self.train_cfg.train_refiner):

  • (True, False) corresponds to the encoder-decoder stage in the paper.

  • (False, True) corresponds to the refinement stage in the paper.

  • (True, True) corresponds to the fine-tune stage in the paper.

Parameters
  • backbone (dict) – Config of backbone.

  • refiner (dict) – Config of refiner.

  • train_cfg (dict) – Config of training. In train_cfg, train_backbone should be specified. If the model has a refiner, train_refiner should be specified (a sketch of the stage settings follows this parameter list).

  • test_cfg (dict) – Config of testing. In test_cfg, if the model has a refiner, train_refiner should be specified.

  • pretrained (str) – Path of pretrained model.

  • loss_alpha (dict) – Config of the alpha prediction loss. Default: None.

  • loss_comp (dict) – Config of the composition loss. Default: None.

  • loss_refine (dict) – Config of the loss of the refiner. Default: None.
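
As a sketch of the three stages noted above, train_cfg could be set to one of the following; the exact key placement may differ in the shipped configs:

train_cfg = dict(train_backbone=True, train_refiner=False)   # (True, False): encoder-decoder stage
train_cfg = dict(train_backbone=False, train_refiner=True)   # (False, True): refinement stage
train_cfg = dict(train_backbone=True, train_refiner=True)    # (True, True): fine-tune stage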

forward_test(merged, trimap, meta, save_image=False, save_path=None, iteration=None)[source]

Defines the computation performed at every test call.

Parameters
  • merged (Tensor) – Image to predict alpha matte.

  • trimap (Tensor) – Trimap of the input image.

  • meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported. It may contain information needed to calculate metrics (ori_alpha and ori_trimap) or save predicted alpha matte (merged_path).

  • save_image (bool, optional) – Whether to save the predicted alpha matte. Defaults to False.

  • save_path (str, optional) – The directory to save the predicted alpha matte. Defaults to None.

  • iteration (int, optional) – If given as None, the saved alpha matte will have the same file name as merged_path in the meta dict. If given as an int, the saved alpha matte will be named with the postfix _{iteration}.png. Defaults to None.

Returns

Contains the predicted alpha and evaluation result.

Return type

dict

forward_train(merged, trimap, meta, alpha, ori_merged, fg, bg)[source]

Defines the computation performed at every training call.

Parameters
  • merged (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • trimap (Tensor) – of shape (N, 1, H, W). Tensor of trimap read by opencv.

  • meta (list[dict]) – Meta data about the current data batch.

  • alpha (Tensor) – of shape (N, 1, H, W). Tensor of alpha read by opencv.

  • ori_merged (Tensor) – of shape (N, C, H, W). Tensor of origin merged image read by opencv (not normalized).

  • fg (Tensor) – of shape (N, C, H, W). Tensor of fg read by opencv.

  • bg (Tensor) – of shape (N, C, H, W). Tensor of bg read by opencv.

Returns

Contains the loss items and batch information.

Return type

dict

class mmedit.models.DeepFillv1Inpaintor(*args, stage1_loss_type=('loss_l1_hole'), stage2_loss_type=('loss_l1_hole', 'loss_gan'), input_with_ones=True, disc_input_with_mask=False, **kwargs)[source]

calculate_loss_with_type(loss_type, fake_res, fake_img, gt, mask, prefix='stage1_', fake_local=None)[source]

Calculate multiple types of losses.

Parameters
  • loss_type (str) – Type of the loss.

  • fake_res (torch.Tensor) – Direct results from model.

  • fake_img (torch.Tensor) – Composited results from model.

  • gt (torch.Tensor) – Ground-truth tensor.

  • mask (torch.Tensor) – Mask tensor.

  • prefix (str, optional) – Prefix for loss name. Defaults to ‘stage1_’.

  • fake_local (torch.Tensor, optional) – Local results from model. Defaults to None.

Returns

Contain loss value with its name.

Return type

dict

forward_train_d(data_batch, is_real, is_disc)[source]

Forward function in discriminator training step.

In this function, we modify the default implementation, which uses only one discriminator. The DeepFillv1 model uses two separate discriminators for global and local consistency.

Parameters
  • data_batch (torch.Tensor) – Batch of real data or fake data.

  • is_real (bool) – If True, the gan loss will regard this batch as real data. Otherwise, the gan loss will regard this batch as fake data.

  • is_disc (bool) – If True, this function is called in discriminator training step. Otherwise, this function is called in generator training step. This will help us to compute different types of adversarial loss, like LSGAN.

Returns

Contains the loss items computed in this function.

Return type

dict

get_module(model, module_name)[source]

Get an inner module from model.

Since some models are wrapped with DDP, we have to check whether the module can be indexed directly.

Parameters
  • model (nn.Module) – This model may be wrapped with DDP or not.

  • module_name (str) – The name of specific module.

Returns

Returned sub module.

Return type

nn.Module

train_step(data_batch, optimizer)[source]

Train step function.

In this function, the inpaintor will finish the train step following the pipeline:

  1. get fake res/image

  2. optimize discriminator (if any)

  3. optimize generator

If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing the discriminator with different input data, and only one iteration for optimizing the generator after disc_step iterations for the discriminator.

Parameters
  • data_batch (torch.Tensor) – Batch of data as input.

  • optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).

Returns

Dict with loss, information for logger, the number of samples and results for visualization.

Return type

dict

two_stage_loss(stage1_data, stage2_data, data_batch)[source]

Calculate two-stage loss.

Parameters
  • stage1_data (dict) – Contain stage1 results.

  • stage2_data (dict) – Contain stage2 results.

  • data_batch (dict) – Contain data needed to calculate loss.

Returns

Contain losses with name.

Return type

dict

class mmedit.models.ESRGAN(generator, discriminator=None, gan_loss=None, pixel_loss=None, perceptual_loss=None, train_cfg=None, test_cfg=None, pretrained=None)[source]

Enhanced SRGAN model for single image super-resolution.

Ref: ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. It uses RaGAN for GAN updates: The relativistic discriminator: a key element missing from standard GAN.

Parameters
  • generator (dict) – Config for the generator.

  • discriminator (dict) – Config for the discriminator. Default: None.

  • gan_loss (dict) – Config for the gan loss. Note that the loss weight in gan loss is only for the generator.

  • pixel_loss (dict) – Config for the pixel loss. Default: None.

  • perceptual_loss (dict) – Config for the perceptual loss. Default: None.

  • train_cfg (dict) – Config for training. Default: None. You may change the training of the GAN by setting: disc_steps: how many discriminator updates after one generator update; disc_init_steps: how many discriminator updates at the start of the training. These two keys are useful when training with WGAN. A config sketch follows this parameter list.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path for pretrained model. Default: None.
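
An illustrative config fragment is sketched below. The registered component names ('RRDBNet', 'ModifiedVGG', 'L1Loss', 'GANLoss') and every value are assumptions made for this sketch:

model = dict(
    type='ESRGAN',
    generator=dict(type='RRDBNet', in_channels=3, out_channels=3),
    discriminator=dict(type='ModifiedVGG', in_channels=3, mid_channels=64),
    pixel_loss=dict(type='L1Loss', loss_weight=0.01),
    gan_loss=dict(type='GANLoss', gan_type='vanilla', loss_weight=0.005),
    pretrained=None)
train_cfg = dict(disc_steps=1, disc_init_steps=0)
test_cfg = None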

train_step(data_batch, optimizer)[source]

Train step.

Parameters
  • data_batch (dict) – A batch of data.

  • optimizer (obj) – Optimizer.

Returns

Returned output.

Return type

dict

class mmedit.models.GCA(backbone, train_cfg=None, test_cfg=None, pretrained=None, loss_alpha=None)[source]

Guided Contextual Attention image matting model.

https://arxiv.org/abs/2001.04069

Parameters
  • backbone (dict) – Config of backbone.

  • train_cfg (dict) – Config of training. In train_cfg, train_backbone should be specified. If the model has a refiner, train_refiner should be specified.

  • test_cfg (dict) – Config of testing. In test_cfg, if the model has a refiner, train_refiner should be specified.

  • pretrained (str) – Path of the pretrained model.

  • loss_alpha (dict) – Config of the alpha prediction loss. Default: None.

forward_test(merged, trimap, meta, save_image=False, save_path=None, iteration=None)[source]

Defines the computation performed at every test call.

Parameters
  • merged (Tensor) – Image to predict alpha matte.

  • trimap (Tensor) – Trimap of the input image.

  • meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported. It may contain information needed to calculate metrics (ori_alpha and ori_trimap) or save predicted alpha matte (merged_path).

  • save_image (bool, optional) – Whether to save the predicted alpha matte. Defaults to False.

  • save_path (str, optional) – The directory to save the predicted alpha matte. Defaults to None.

  • iteration (int, optional) – If given as None, the saved alpha matte will have the same file name as merged_path in the meta dict. If given as an int, the saved alpha matte will be named with the postfix _{iteration}.png. Defaults to None.

Returns

Contains the predicted alpha and evaluation result.

Return type

dict

forward_train(merged, trimap, meta, alpha)[source]

Forward function for training GCA model.

Parameters
  • merged (Tensor) – with shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.

  • trimap (Tensor) – with shape (N, C’, H, W). Tensor of trimap. C’ might be 1 or 3.

  • meta (list[dict]) – Meta data about the current data batch.

  • alpha (Tensor) – with shape (N, 1, H, W). Tensor of alpha.

Returns

Contains the loss items and batch information.

Return type

dict

class mmedit.models.GLInpaintor(encdec, disc=None, loss_gan=None, loss_gp=None, loss_disc_shift=None, loss_composed_percep=None, loss_out_percep=False, loss_l1_hole=None, loss_l1_valid=None, loss_tv=None, train_cfg=None, test_cfg=None, pretrained=None)[source]

Inpaintor for the Global&Local method.

This inpaintor is implemented according to the paper: Globally and Locally Consistent Image Completion

Importantly, this inpaintor is an example of using a custom training schedule based on OneStageInpaintor.

The training pipeline of Global&Local is as follows:

if cur_iter < iter_tc:
    update generator with only l1 loss
else:
    update discriminator
    if cur_iter > iter_td:
        update generator with l1 loss and adversarial loss

The new attribute cur_iter is added for recording the current iteration number. The train_cfg contains the settings of the training schedule:

train_cfg = dict(
    start_iter=0,
    disc_step=1,
    iter_tc=90000,
    iter_td=100000
)

iter_tc and iter_td correspond to the notation \(T_C\) and \(T_D\) in the original paper.

Parameters
  • generator (dict) – Config for encoder-decoder style generator.

  • disc (dict) – Config for discriminator.

  • loss_gan (dict) – Config for adversarial loss.

  • loss_gp (dict) – Config for gradient penalty loss.

  • loss_disc_shift (dict) – Config for discriminator shift loss.

  • loss_composed_percep (dict) – Config for perceptual and style loss with composed image as input.

  • loss_out_percep (dict) – Config for perceptual and style loss with direct output as input.

  • loss_l1_hole (dict) – Config for l1 loss in the hole.

  • loss_l1_valid (dict) – Config for l1 loss in the valid region.

  • loss_tv (dict) – Config for total variation loss.

  • train_cfg (dict) – Configs for training scheduler. disc_step must be contained, which indicates the number of discriminator update steps in each training step.

  • test_cfg (dict) – Configs for testing scheduler.

  • pretrained (str) – Path for pretrained model. Default None.

generator_loss(fake_res, fake_img, fake_local, data_batch)[source]

Forward function in generator training step.

In this function, we mainly compute the loss items for generator with the given (fake_res, fake_img). In general, the fake_res is the direct output of the generator and the fake_img is the composition of direct output and ground-truth image.

Parameters
  • fake_res (torch.Tensor) – Direct output of the generator.

  • fake_img (torch.Tensor) – Composition of fake_res and ground-truth image.

  • data_batch (dict) – Contain other elements for computing losses.

Returns

A tuple containing two dictionaries. The first one is the result dict, which contains the results computed within this function for visualization. The second one is the loss dict, containing loss items computed in this function.

Return type

tuple[dict]

train_step(data_batch, optimizer)[source]

Train step function.

In this function, the inpaintor will finish the train step following the pipeline:

  1. get fake res/image

  2. optimize discriminator (if in current schedule)

  3. optimize generator (if in current schedule)

If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing the discriminator with different input data, and only one iteration for optimizing the generator after disc_step iterations for the discriminator.

Parameters
  • data_batch (torch.Tensor) – Batch of data as input.

  • optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).

Returns

Dict with loss, information for logger, the number of samples and results for visualization.

Return type

dict

class mmedit.models.IndexNet(backbone, train_cfg=None, test_cfg=None, pretrained=None, loss_alpha=None, loss_comp=None)[source]

IndexNet matting model.

This implementation follows: Indices Matter: Learning to Index for Deep Image Matting

Parameters
  • backbone (dict) – Config of backbone.

  • train_cfg (dict) – Config of training. In ‘train_cfg’, ‘train_backbone’ should be specified.

  • test_cfg (dict) – Config of testing.

  • pretrained (str) – path of pretrained model.

  • loss_alpha (dict) – Config of the alpha prediction loss. Default: None.

  • loss_comp (dict) – Config of the composition loss. Default: None.

forward_test(merged, trimap, meta, save_image=False, save_path=None, iteration=None)[source]

Defines the computation performed at every test call.

Parameters
  • merged (Tensor) – Image to predict alpha matte.

  • trimap (Tensor) – Trimap of the input image.

  • meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported. It may contain information needed to calculate metrics (ori_alpha and ori_trimap) or save predicted alpha matte (merged_path).

  • save_image (bool, optional) – Whether to save the predicted alpha matte. Defaults to False.

  • save_path (str, optional) – The directory to save the predicted alpha matte. Defaults to None.

  • iteration (int, optional) – If given as None, the saved alpha matte will have the same file name as merged_path in the meta dict. If given as an int, the saved alpha matte will be named with the postfix _{iteration}.png. Defaults to None.

Returns

Contains the predicted alpha and evaluation result.

Return type

dict

forward_train(merged, trimap, meta, alpha, ori_merged, fg, bg)[source]

Forward function for training IndexNet model.

Parameters
  • merged (Tensor) – Input images tensor with shape (N, C, H, W). Typically these should be mean centered and std scaled.

  • trimap (Tensor) – Tensor of trimap with shape (N, 1, H, W).

  • meta (list[dict]) – Meta data about the current data batch.

  • alpha (Tensor) – Tensor of alpha with shape (N, 1, H, W).

  • ori_merged (Tensor) – Tensor of origin merged images (not normalized) with shape (N, C, H, W).

  • fg (Tensor) – Tensor of foreground with shape (N, C, H, W).

  • bg (Tensor) – Tensor of background with shape (N, C, H, W).

Returns

Contains the loss items and batch information.

Return type

dict

class mmedit.models.OneStageInpaintor(encdec, disc=None, loss_gan=None, loss_gp=None, loss_disc_shift=None, loss_composed_percep=None, loss_out_percep=False, loss_l1_hole=None, loss_l1_valid=None, loss_tv=None, train_cfg=None, test_cfg=None, pretrained=None)[source]

Standard one-stage inpaintor with commonly used losses.

An inpaintor must contain an encoder-decoder style generator to inpaint masked regions. A discriminator will be adopted when adversarial training is needed.

In this class, we provide a common interface for inpaintors. For other inpaintors, only some functions may need to be modified to fit the input style or training schedule.

Parameters
  • generator (dict) – Config for encoder-decoder style generator.

  • disc (dict) – Config for discriminator.

  • loss_gan (dict) – Config for adversarial loss.

  • loss_gp (dict) – Config for gradient penalty loss.

  • loss_disc_shift (dict) – Config for discriminator shift loss.

  • loss_composed_percep (dict) – Config for perceptual and style loss with composed image as input.

  • loss_out_percep (dict) – Config for perceptual and style loss with direct output as input.

  • loss_l1_hole (dict) – Config for l1 loss in the hole.

  • loss_l1_valid (dict) – Config for l1 loss in the valid region.

  • loss_tv (dict) – Config for total variation loss.

  • train_cfg (dict) – Configs for training scheduler. disc_step must be contained, which indicates the number of discriminator update steps in each training step.

  • test_cfg (dict) – Configs for testing scheduler.

  • pretrained (str) – Path for pretrained model. Default None.

forward(masked_img, mask, test_mode=True, **kwargs)[source]

Forward function.

Parameters
  • masked_img (torch.Tensor) – Image with hole as input.

  • mask (torch.Tensor) – Mask as input.

  • test_mode (bool, optional) – Whether use testing mode. Defaults to True.

Returns

Dict contains output results.

Return type

dict

forward_dummy(x)[source]

Forward dummy function for getting flops.

Parameters

x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

Returns

Results tensor with shape of (n, 3, h, w).

Return type

torch.Tensor

forward_test(masked_img, mask, save_image=False, save_path=None, iteration=None, **kwargs)[source]

Forward function for testing.

Parameters
  • masked_img (torch.Tensor) – Tensor with shape of (n, 3, h, w).

  • mask (torch.Tensor) – Tensor with shape of (n, 1, h, w).

  • save_image (bool, optional) – If True, results will be saved as image. Defaults to False.

  • save_path (str, optional) – If given a valid str, the results will be saved in this path. Defaults to None.

  • iteration (int, optional) – Iteration number. Defaults to None.

Returns

Contains output results and eval metrics (if any).

Return type

dict

forward_train(*args, **kwargs)[source]

Forward function for training.

In this version, we do not use this interface.

forward_train_d(data_batch, is_real, is_disc)[source]

Forward function in discriminator training step.

In this function, we compute the prediction for each data batch (real or fake). Meanwhile, the standard GAN loss will be computed with several proposed losses for stable training.

Parameters
  • data_batch (torch.Tensor) – Batch of real data or fake data.

  • is_real (bool) – If True, the gan loss will regard this batch as real data. Otherwise, the gan loss will regard this batch as fake data.

  • is_disc (bool) – If True, this function is called in discriminator training step. Otherwise, this function is called in generator training step. This will help us to compute different types of adversarial loss, like LSGAN.

Returns

Contains the loss items computed in this function.

Return type

dict

generator_loss(fake_res, fake_img, data_batch)[source]

Forward function in generator training step.

In this function, we mainly compute the loss items for generator with the given (fake_res, fake_img). In general, the fake_res is the direct output of the generator and the fake_img is the composition of direct output and ground-truth image.

Parameters
  • fake_res (torch.Tensor) – Direct output of the generator.

  • fake_img (torch.Tensor) – Composition of fake_res and ground-truth image.

  • data_batch (dict) – Contain other elements for computing losses.

Returns

A dict containing the results computed within this function for visualization, and a dict containing the loss items computed in this function.

Return type

tuple(dict)

init_weights(pretrained=None)[source]

Init weights for models.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

save_visualization(img, filename)[source]

Save visualization results.

Parameters
  • img (torch.Tensor) – Tensor with shape of (n, 3, h, w).

  • filename (str) – Path to save visualization.

train_step(data_batch, optimizer)[source]

Train step function.

In this function, the inpaintor will finish the train step following the pipeline:

  1. get fake res/image

  2. optimize discriminator (if any)

  3. optimize generator

If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing the discriminator with different input data, and only one iteration for optimizing the generator after disc_step iterations for the discriminator.

Parameters
  • data_batch (torch.Tensor) – Batch of data as input.

  • optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).

Returns

Dict with loss, information for logger, the number of samples and results for visualization.

Return type

dict

val_step(data_batch, **kwargs)[source]

Forward function for evaluation.

Parameters

data_batch (dict) – Contain data for forward.

Returns

Contain the results from model.

Return type

dict

class mmedit.models.PConvInpaintor(encdec, disc=None, loss_gan=None, loss_gp=None, loss_disc_shift=None, loss_composed_percep=None, loss_out_percep=False, loss_l1_hole=None, loss_l1_valid=None, loss_tv=None, train_cfg=None, test_cfg=None, pretrained=None)[source]

forward_dummy(x)[source]

Forward dummy function for getting flops.

Parameters

x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

Returns

Results tensor with shape of (n, 3, h, w).

Return type

torch.Tensor

forward_test(masked_img, mask, save_image=False, save_path=None, iteration=None, **kwargs)[source]

Forward function for testing.

Parameters
  • masked_img (torch.Tensor) – Tensor with shape of (n, 3, h, w).

  • mask (torch.Tensor) – Tensor with shape of (n, 1, h, w).

  • save_image (bool, optional) – If True, results will be saved as image. Defaults to False.

  • save_path (str, optional) – If given a valid str, the results will be saved in this path. Defaults to None.

  • iteration (int, optional) – Iteration number. Defaults to None.

Returns

Contains output results and eval metrics (if any).

Return type

dict

train_step(data_batch, optimizer)[source]

Train step function.

In this function, the inpaintor will finish the train step following the pipeline:

  1. get fake res/image

  2. optimize discriminator (if any)

  3. optimize generator

If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing the discriminator with different input data, and only one iteration for optimizing the generator after disc_step iterations for the discriminator.

Parameters
  • data_batch (torch.Tensor) – Batch of data as input.

  • optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).

Returns

Dict with loss, information for logger, the number of samples and results for visualization.

Return type

dict

class mmedit.models.Pix2Pix(generator, discriminator, gan_loss, pixel_loss=None, train_cfg=None, test_cfg=None, pretrained=None)[source]

Pix2Pix model for paired image-to-image translation.

Ref: Image-to-Image Translation with Conditional Adversarial Networks

Parameters
  • generator (dict) – Config for the generator.

  • discriminator (dict) – Config for the discriminator.

  • gan_loss (dict) – Config for the gan loss.

  • pixel_loss (dict) – Config for the pixel loss. Default: None.

  • train_cfg (dict) – Config for training. Default: None. You may change the training of gan by setting: disc_steps: how many discriminator updates after one generator update. disc_init_steps: how many discriminator updates at the start of the training. These two keys are useful when training with WGAN. direction: image-to-image translation direction (the model training direction): a2b | b2a.

  • test_cfg (dict) – Config for testing. Default: None. You may change the testing of gan by setting: direction: image-to-image translation direction (the model training direction, same as testing direction): a2b | b2a. show_input: whether to show input real images.

  • pretrained (str) – Path for pretrained model. Default: None.

backward_discriminator(outputs)[source]

Backward function for the discriminator.

Parameters

outputs (dict) – Dict of forward results.

Returns

Loss dict.

Return type

dict

backward_generator(outputs)[source]

Backward function for the generator.

Parameters

outputs (dict) – Dict of forward results.

Returns

Loss dict.

Return type

dict

forward(img_a, img_b, meta, test_mode=False, **kwargs)[source]

Forward function.

Parameters
  • img_a (Tensor) – Input image from domain A.

  • img_b (Tensor) – Input image from domain B.

  • meta (list[dict]) – Input meta data.

  • test_mode (bool) – Whether in test mode or not. Default: False.

  • kwargs (dict) – Other arguments.

forward_dummy(img)[source]

Used for computing network FLOPs.

Parameters

img (Tensor) – Dummy input used to compute FLOPs.

Returns

Dummy output produced by forwarding the dummy input.

Return type

Tensor

forward_test(img_a, img_b, meta, save_image=False, save_path=None, iteration=None)[source]

Forward function for testing.

Parameters
  • img_a (Tensor) – Input image from domain A.

  • img_b (Tensor) – Input image from domain B.

  • meta (list[dict]) – Input meta data.

  • save_image (bool, optional) – If True, results will be saved as images. Default: False.

  • save_path (str, optional) – If given a valid str path, the results will be saved in this path. Default: None.

  • iteration (int, optional) – Iteration number. Default: None.

Returns

Dict of forward and evaluation results for testing.

Return type

dict

forward_train(img_a, img_b, meta)[source]

Forward function for training.

Parameters
  • img_a (Tensor) – Input image from domain A.

  • img_b (Tensor) – Input image from domain B.

  • meta (list[dict]) – Input meta data.

Returns

Dict of forward results for training.

Return type

dict

init_weights(pretrained=None)[source]

Initialize weights for the model.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Default: None.

setup(img_a, img_b, meta)[source]

Perform necessary pre-processing steps.

Parameters
  • img_a (Tensor) – Input image from domain A.

  • img_b (Tensor) – Input image from domain B.

  • meta (list[dict]) – Input meta data.

Returns

The real images from domain A/B, and the image path as the metadata.

Return type

Tensor, Tensor, list[str]

train_step(data_batch, optimizer)[source]

Training step function.

Parameters
  • data_batch (dict) – Dict of the input data batch.

  • optimizer (dict[torch.optim.Optimizer]) – Dict of optimizers for the generator and discriminator.

Returns

Dict of loss, information for logger, the number of samples and results for visualization.

Return type

dict

val_step(data_batch, **kwargs)[source]

Validation step function.

Parameters
  • data_batch (dict) – Dict of the input data batch.

  • kwargs (dict) – Other arguments.

Returns

Dict of evaluation results for validation.

Return type

dict

class mmedit.models.SRGAN(generator, discriminator=None, gan_loss=None, pixel_loss=None, perceptual_loss=None, train_cfg=None, test_cfg=None, pretrained=None)[source]

SRGAN model for single image super-resolution.

Ref: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.

Parameters
  • generator (dict) – Config for the generator.

  • discriminator (dict) – Config for the discriminator. Default: None.

  • gan_loss (dict) – Config for the gan loss. Note that the loss weight in gan loss is only for the generator.

  • pixel_loss (dict) – Config for the pixel loss. Default: None.

  • perceptual_loss (dict) – Config for the perceptual loss. Default: None.

  • train_cfg (dict) – Config for training. Default: None. You may change the training of the GAN by setting: disc_steps: how many discriminator updates after one generator update; disc_init_steps: how many discriminator updates at the start of the training. These two keys are useful when training with WGAN.

  • test_cfg (dict) – Config for testing. Default: None.

  • pretrained (str) – Path for pretrained model. Default: None.

forward(lq, gt=None, test_mode=False, **kwargs)[source]

Forward function.

Parameters
  • lq (Tensor) – Input lq images.

  • gt (Tensor) – Ground-truth image. Default: None.

  • test_mode (bool) – Whether in test mode or not. Default: False.

  • kwargs (dict) – Other arguments.

init_weights(pretrained=None)[source]

Init weights for models.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

train_step(data_batch, optimizer)[source]

Train step.

Parameters
  • data_batch (dict) – A batch of data.

  • optimizer (obj) – Optimizer.

Returns

Returned output.

Return type

dict

class mmedit.models.TwoStageInpaintor(*args, stage1_loss_type=('loss_l1_hole'), stage2_loss_type=('loss_l1_hole', 'loss_gan'), input_with_ones=True, disc_input_with_mask=False, **kwargs)[source]

Two-Stage Inpaintor.

Currently, we support these loss types in each of the two stages of the inpaintor: [‘loss_gan’, ‘loss_l1_hole’, ‘loss_l1_valid’, ‘loss_composed_percep’, ‘loss_out_percep’, ‘loss_tv’]. The stage1_loss_type and stage2_loss_type should be chosen from these loss types (an example follows the parameter list below).

Parameters
  • stage1_loss_type (tuple[str]) – Contains the loss names used in the first stage model.

  • stage2_loss_type (tuple[str]) – Contains the loss names used in the second stage model.

  • input_with_ones (bool) – Whether to concatenate an extra ones tensor in input. Default: True.

  • disc_input_with_mask (bool) – Whether to add mask as input in discriminator. Default: False.
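
For illustration, a config fragment choosing per-stage losses from the supported list might look like the sketch below. The generator, discriminator and loss configs inherited from the one-stage inpaintor are omitted, and the chosen combinations are only an example, not a recommendation:

model = dict(
    type='TwoStageInpaintor',
    stage1_loss_type=('loss_l1_hole', 'loss_l1_valid'),
    stage2_loss_type=('loss_l1_hole', 'loss_gan', 'loss_composed_percep'),
    input_with_ones=True,
    disc_input_with_mask=True)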

calculate_loss_with_type(loss_type, fake_res, fake_img, gt, mask, prefix='stage1_')[source]

Calculate multiple types of losses.

Parameters
  • loss_type (str) – Type of the loss.

  • fake_res (torch.Tensor) – Direct results from model.

  • fake_img (torch.Tensor) – Composited results from model.

  • gt (torch.Tensor) – Ground-truth tensor.

  • mask (torch.Tensor) – Mask tensor.

  • prefix (str, optional) – Prefix for loss name. Defaults to ‘stage1_’.

Returns

Contain loss value with its name.

Return type

dict

forward_test(masked_img, mask, save_image=False, save_path=None, iteration=None, **kwargs)[source]

Forward function for testing.

Parameters
  • masked_img (torch.Tensor) – Tensor with shape of (n, 3, h, w).

  • mask (torch.Tensor) – Tensor with shape of (n, 1, h, w).

  • save_image (bool, optional) – If True, results will be saved as image. Defaults to False.

  • save_path (str, optional) – If given a valid str, the results will be saved in this path. Defaults to None.

  • iteration (int, optional) – Iteration number. Defaults to None.

Returns

Contains output results and eval metrics (if any).

Return type

dict

save_visualization(img, filename)[source]

Save visualization results.

Parameters
  • img (torch.Tensor) – Tensor with shape of (n, 3, h, w).

  • filename (str) – Path to save visualization.

train_step(data_batch, optimizer)[source]

Train step function.

In this function, the inpaintor will finish the train step following the pipeline:

  1. get fake res/image

  2. optimize discriminator (if any)

  3. optimize generator

If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing the discriminator with different input data, and only one iteration for optimizing the generator after disc_step iterations for the discriminator.

Parameters
  • data_batch (torch.Tensor) – Batch of data as input.

  • optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).

Returns

Dict with loss, information for logger, the number of samples and results for visualization.

Return type

dict

two_stage_loss(stage1_data, stage2_data, data_batch)[source]

Calculate two-stage loss.

Parameters
  • stage1_data (dict) – Contain stage1 results.

  • stage2_data (dict) – Contain stage2 results.

  • data_batch (dict) – Contain data needed to calculate loss.

Returns

Contain losses with name.

Return type

dict

mmedit.models.build(cfg, registry, default_args=None)[source]

Build module function.

Parameters
  • cfg (dict) – Configuration for building modules.

  • registry (obj) – registry object.

  • default_args (dict, optional) – Default arguments. Defaults to None.

mmedit.models.build_backbone(cfg)[source]

Build backbone.

Parameters

cfg (dict) – Configuration for building backbone.

mmedit.models.build_component(cfg)[source]

Build component.

Parameters

cfg (dict) – Configuration for building component.

mmedit.models.build_loss(cfg)[source]

Build loss.

Parameters

cfg (dict) – Configuration for building loss.

mmedit.models.build_model(cfg, train_cfg=None, test_cfg=None)[source]

Build model.

Parameters
  • cfg (dict) – Configuration for building model.

  • train_cfg (dict) – Training configuration. Default: None.

  • test_cfg (dict) – Testing configuration. Default: None.
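
A minimal usage sketch of the builder functions; the config path is a placeholder and the file is assumed to define model, train_cfg and test_cfg:

from mmcv import Config
from mmedit.models import build_model

cfg = Config.fromfile('configs/example_config.py')  # placeholder path
model = build_model(cfg.model, train_cfg=cfg.train_cfg, test_cfg=cfg.test_cfg)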

common

class mmedit.models.common.ASPP(in_channels, out_channels=256, mid_channels=256, dilations=(12, 24, 36), conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, separable_conv=False)[source]

ASPP module from DeepLabV3.

The code is adapted from https://github.com/pytorch/vision/blob/master/torchvision/models/segmentation/deeplabv3.py

For more information about the module: “Rethinking Atrous Convolution for Semantic Image Segmentation”.

Parameters
  • in_channels (int) – Input channels of the module.

  • out_channels (int) – Output channels of the module.

  • mid_channels (int) – Output channels of the intermediate ASPP conv modules.

  • dilations (Sequence[int]) – Dilation rate of three ASPP conv module. Default: [12, 24, 36].

  • conv_cfg (dict) – Config dict for convolution layer. If “None”, nn.Conv2d will be applied. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).

  • separable_conv (bool) – Whether to replace normal conv with depthwise separable conv, which is faster. Default: False.
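
A minimal instantiation sketch; shapes and values are placeholders:

import torch
from mmedit.models.common import ASPP

aspp = ASPP(in_channels=256, out_channels=256, dilations=(12, 24, 36))
feat = torch.rand(1, 256, 32, 32)
out = aspp(feat)  # expected shape: (1, 256, 32, 32)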

forward(x)[source]

Forward function for ASPP module.

Parameters

x (Tensor) – Input tensor with shape (N, C, H, W).

Returns

Output tensor.

Return type

Tensor

class mmedit.models.common.ContextualAttentionModule(unfold_raw_kernel_size=4, unfold_raw_stride=2, unfold_raw_padding=1, unfold_corr_kernel_size=3, unfold_corr_stride=1, unfold_corr_dilation=1, unfold_corr_padding=1, scale=0.5, fuse_kernel_size=3, softmax_scale=10, return_attenion_score=True)[source]

Contextual attention module.

The details of this module can be found in: Generative Image Inpainting with Contextual Attention

Parameters
  • unfold_raw_kernel_size (int) – Kernel size used in unfolding raw feature. Default: 4.

  • unfold_raw_stride (int) – Stride used in unfolding raw feature. Default: 2.

  • unfold_raw_padding (int) – Padding used in unfolding raw feature. Default: 1.

  • unfold_corr_kernel_size (int) – Kernel size used in unfolding context for computing correlation maps. Default: 3.

  • unfold_corr_stride (int) – Stride used in unfolding context for computing correlation maps. Default: 1.

  • unfold_corr_dilation (int) – Dilation used in unfolding context for computing correlation maps. Default: 1.

  • unfold_corr_padding (int) – Padding used in unfolding context for computing correlation maps. Default: 1.

  • scale (float) – The rescale factor used to resize input features. Default: 0.5.

  • fuse_kernel_size (int) – The kernel size used in fusion module. Default: 3.

  • softmax_scale (float) – The scale factor for softmax function. Default: 10.

  • return_attenion_score (bool) – If True, the attention score will be returned. Default: True.

calculate_overlap_factor(attention_score)[source]

Calculate the overlap factor after applying deconv.

Parameters

attention_score (torch.Tensor) – The attention score with shape of (n, c, h, w).

Returns

The overlap factor will be returned.

Return type

torch.Tensor

calculate_unfold_hw(input_size, kernel_size=3, stride=1, dilation=1, padding=0)[source]

Calculate (h, w) after unfolding.

The official implementation of unfold in PyTorch flattens the spatial dimensions (h, w) into L. Thus, this function simply calculates (h, w) according to the equation in: https://pytorch.org/docs/stable/nn.html#torch.nn.Unfold
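
A small sketch of that computation, following the output-size formula in the linked torch.nn.Unfold documentation:

def unfold_size(size, kernel_size=3, stride=1, dilation=1, padding=0):
    # Output length along one spatial dimension after unfolding.
    return (size + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

h_unfold = unfold_size(64, kernel_size=3, stride=1, padding=1)  # 64
w_unfold = unfold_size(48, kernel_size=3, stride=1, padding=1)  # 48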

forward(x, context, mask=None)[source]

Forward Function.

Parameters
  • x (torch.Tensor) – Tensor with shape (n, c, h, w).

  • context (torch.Tensor) – Tensor with shape (n, c, h, w).

  • mask (torch.Tensor) – Tensor with shape (n, 1, h, w). Default: None.

Returns

Features after contextual attention.

Return type

tuple(torch.Tensor)

fuse_correlation_map(correlation_map, h_unfold, w_unfold)[source]

Fuse correlation map.

This operation is to fuse correlation map for increasing large consistent correlation regions.

The mechanism behind this op is simple and easy to understand. A standard ‘Eye’ matrix will be applied as a filter on the correlation map in the horizontal and vertical directions.

The shape of the input correlation map is (n, h_unfold*w_unfold, h, w). When fusing, we apply a convolutional filter to the reshaped feature map with shape (n, 1, h_unfold*w_unfold, h*w).

A simple specification for horizontal direction is shown below:

       (h, (h, (h, (h,
        0)  1)  2)  3)  ...
(h, 0)
(h, 1)      1
(h, 2)          1
(h, 3)              1
...

im2col(img, kernel_size, stride=1, padding=0, dilation=1, normalize=False, return_cols=False)[source]

Reshape image-style feature to columns.

This function is used to unfold feature maps into columns. The details of this function can be found in: https://pytorch.org/docs/1.1.0/nn.html?highlight=unfold#torch.nn.Unfold

Parameters
  • img (torch.Tensor) – Features to be unfolded. The shape of this feature should be (n, c, h, w).

  • kernel_size (int) – In this function, we only support square kernel with same height and width.

  • stride (int) – Stride number in unfolding. Default: 1.

  • padding (int) – Padding number in unfolding. Default: 0.

  • dilation (int) – Dilation number in unfolding. Default: 1.

  • normalize (bool) – If True, the unfolded feature will be normalized. Default: False.

  • return_cols (bool) – The official implementation of unfolding in PyTorch will return features with shape of (n, c * kernel_size^2, L). If True, the features will be reshaped to (n, L, c, kernel_size, kernel_size). Otherwise, the results will maintain the shape as the official implementation.

Returns

Unfolded columns. If return_cols is True, the shape of the output tensor is (n, L, c, kernel_size, kernel_size). Otherwise, the shape will be (n, c * kernel_size^2, L).

Return type

torch.Tensor

mask_correlation_map(correlation_map, mask)[source]

Add mask weight for correlation map.

Add a negative infinity number to the masked regions so that softmax function will result in ‘zero’ in those regions.

Parameters
  • correlation_map (torch.Tensor) – Correlation map with shape of (n, h_unfold*w_unfold, h_map, w_map).

  • mask (torch.Tensor) – Mask tensor with shape of (n, c, h, w). ‘1’ in the mask indicates masked region while ‘0’ indicates valid region.

Returns

Updated correlation map with mask.

Return type

torch.Tensor

patch_copy_deconv(attention_score, context_filter)[source]

Copy patches using deconv.

Parameters
  • attention_score (torch.Tensor) – Tensor with shape of (n, l , h, w).

  • context_filter (torch.Tensor) – Filter kernel.

Returns

Tensor with shape of (n, c, h, w).

Return type

torch.Tensor

patch_correlation(x, kernel)[source]

Calculate patch correlation.

Parameters
  • x (torch.Tensor) – Input tensor.

  • kernel (torch.Tensor) – Kernel tensor.

Returns

Tensor with shape of (n, l, h, w).

Return type

torch.Tensor

class mmedit.models.common.DepthwiseSeparableConvModule(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, norm_cfg=None, act_cfg={'type': 'ReLU'}, dw_norm_cfg='default', dw_act_cfg='default', pw_norm_cfg='default', pw_act_cfg='default', **kwargs)[source]

Depthwise separable convolution module.

See https://arxiv.org/pdf/1704.04861.pdf for details.

This module can replace a ConvModule with its conv block split into two conv blocks: a depthwise conv block and a pointwise conv block. The depthwise conv block contains depthwise-conv/norm/activation layers. The pointwise conv block contains pointwise-conv/norm/activation layers. It should be noted that there will be norm/activation layers in the depthwise conv block if norm_cfg and act_cfg are specified.

Parameters
  • in_channels (int) – Same as nn.Conv2d.

  • out_channels (int) – Same as nn.Conv2d.

  • kernel_size (int or tuple[int]) – Same as nn.Conv2d.

  • stride (int or tuple[int]) – Same as nn.Conv2d. Default: 1.

  • padding (int or tuple[int]) – Same as nn.Conv2d. Default: 0.

  • dilation (int or tuple[int]) – Same as nn.Conv2d. Default: 1.

  • norm_cfg (dict) – Default norm config for both depthwise ConvModule and pointwise ConvModule. Default: None.

  • act_cfg (dict) – Default activation config for both depthwise ConvModule and pointwise ConvModule. Default: dict(type=’ReLU’).

  • dw_norm_cfg (dict) – Norm config of depthwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.

  • dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.

  • pw_norm_cfg (dict) – Norm config of pointwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.

  • pw_act_cfg (dict) – Activation config of pointwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.

  • kwargs (optional) – Other shared arguments for depthwise and pointwise ConvModule. See ConvModule for ref.
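
A minimal instantiation sketch; channel counts and configs are placeholders:

import torch
from mmedit.models.common import DepthwiseSeparableConvModule

conv = DepthwiseSeparableConvModule(
    32, 64, kernel_size=3, padding=1,
    norm_cfg=dict(type='BN'), act_cfg=dict(type='ReLU'))
x = torch.rand(1, 32, 64, 64)
out = conv(x)  # expected shape: (1, 64, 64, 64)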

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (N, C, H, W).

Returns

Output tensor.

Return type

Tensor

class mmedit.models.common.GANImageBuffer(buffer_size, buffer_ratio=0.5)[source]

This class implements an image buffer that stores previously generated images.

This buffer allows us to update the discriminator using a history of generated images rather than the ones produced by the latest generator to reduce model oscillation.

Parameters
  • buffer_size (int) – The size of image buffer. If buffer_size = 0, no buffer will be created.

  • buffer_ratio (float) – The probability of reusing images previously stored in the buffer.
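
A short usage sketch, where the fake batch is only a stand-in for generator output:

import torch
from mmedit.models.common import GANImageBuffer

image_buffer = GANImageBuffer(buffer_size=50, buffer_ratio=0.5)
fake_imgs = torch.rand(4, 3, 64, 64)           # stand-in for generated images
fake_for_disc = image_buffer.query(fake_imgs)  # mixes current batch with history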

query(images)[source]

Query current image batch using a history of generated images.

Parameters

images (Tensor) – Current image batch without history information.

class mmedit.models.common.GCAModule(in_channels, out_channels, kernel_size=3, stride=1, rate=2, pad_args={'mode': 'reflect'}, interpolation='nearest', penalty=- 10000.0, eps=0.0001)[source]

Guided Contextual Attention Module.

From https://arxiv.org/pdf/2001.04069.pdf. Based on https://github.com/nbei/Deep-Flow-Guided-Video-Inpainting. This module uses the image feature map to augment the alpha feature map with a guided contextual attention score.

Image feature and alpha feature are unfolded to small patches and later used as conv kernels. Thus, we refer to the unfolding size as the kernel size. Image feature patches have a default kernel size of 3, while the kernel size of alpha feature patches can be specified by rate (see rate below). The image feature patches are used to convolve with the image feature itself to calculate the contextual attention. Then the attention feature map is convolved with the alpha feature patches to obtain the attended alpha feature. At last, the attended alpha feature is added to the input alpha feature.

Parameters
  • in_channels (int) – Input channels of the guided contextual attention module.

  • out_channels (int) – Output channels of the guided contextual attention module.

  • kernel_size (int) – Kernel size of image feature patches. Default 3.

  • stride (int) – Stride when unfolding the image feature. Default 1.

  • rate (int) – The downsample rate of image feature map. The corresponding kernel size and stride of alpha feature patches will be rate x 2 and rate. It could be regarded as the granularity of the gca module. Default: 2.

  • pad_args (dict) – Parameters of padding when convolve image feature with image feature patches or alpha feature patches. Allowed keys are mode and value. See torch.nn.functional.pad() for more information. Default: dict(mode=’reflect’).

  • interpolation (str) – Interpolation method in upsampling and downsampling.

  • penalty (float) – Punishment hyperparameter to avoid a large correlation between each unknown patch and itself.

  • eps (float) – A small number to avoid dividing by 0 when calculating the normed image feature patch. Default: 1e-4.

compute_guided_attention_score(similarity_map, unknown_ps, scale, self_mask)[source]

Compute guided attention score.

Parameters
  • similarity_map (Tensor) – Similarity map of image feature with shape (1, img_h*img_w, img_h, img_w).

  • unknown_ps (Tensor) – Unknown area patches tensor of shape (1, img_h*img_w, 1, 1).

  • scale (Tensor) – Softmax scale of known and unknown area: [unknown_scale, known_scale].

  • self_mask (Tensor) – Self correlation mask of shape (1, img_h*img_w, img_h, img_w). At (1, i*i, i, i) mask value equals -1e4 for i in [1, img_h*img_w] and other area is all zero.

Returns

Similarity map between image feature patches with shape (1, img_h*img_w, img_h, img_w).

Return type

Tensor

compute_similarity_map(img_feat, img_ps)[source]

Compute similarity between image feature patches.

Parameters
  • img_feat (Tensor) – Image feature map of shape (1, img_c, img_h, img_w).

  • img_ps (Tensor) – Image feature patches tensor of shape (1, img_h*img_w, img_c, img_ks, img_ks).

Returns

Similarity map between image feature patches with shape (1, img_h*img_w, img_h, img_w).

Return type

Tensor

extract_feature_maps_patches(img_feat, alpha_feat, unknown)[source]

Extract image feature, alpha feature unknown patches.

Parameters
  • img_feat (Tensor) – Image feature map of shape (N, img_c, img_h, img_w).

  • alpha_feat (Tensor) – Alpha feature map of shape (N, alpha_c, ori_h, ori_w).

  • unknown (Tensor, optional) – Unknown area map generated by trimap of shape (N, 1, img_h, img_w).

Returns

3-tuple of

Tensor: Image feature patches of shape (N, img_h*img_w, img_c, img_ks, img_ks).

Tensor: Guided contextual attentioned alpha feature map. (N, img_h*img_w, alpha_c, alpha_ks, alpha_ks).

Tensor: Unknown mask of shape (N, img_h*img_w, 1, 1).

Return type

tuple

extract_patches(x, kernel_size, stride)[source]

Extract feature patches.

The feature map will be padded automatically to make sure the number of patches is equal to (H / stride) * (W / stride).

Parameters
  • x (Tensor) – Feature map of shape (N, C, H, W).

  • kernel_size (int) – Size of each patches.

  • stride (int) – Stride between patches.

Returns

Extracted patches of shape (N, (H / stride) * (W / stride) , C, kernel_size, kernel_size).

Return type

Tensor

forward(img_feat, alpha_feat, unknown=None, softmax_scale=1.0)[source]

Forward function of GCAModule.

Parameters
  • img_feat (Tensor) – Image feature map of shape (N, ori_c, ori_h, ori_w).

  • alpha_feat (Tensor) – Alpha feature map of shape (N, alpha_c, ori_h, ori_w).

  • unknown (Tensor, optional) – Unknown area map generated by trimap. If specified, this tensor should have shape (N, 1, ori_h, ori_w).

  • softmax_scale (float, optional) – The softmax scale of the attention if unknown area is not provided in forward. Default: 1.

Returns

The augmented alpha feature.

Return type

Tensor

process_unknown_mask(unknown, img_feat, softmax_scale)[source]

Process unknown mask.

Parameters
  • unknown (Tensor, optional) – Unknown area map generated by trimap of shape (N, 1, ori_h, ori_w)

  • img_feat (Tensor) – The interpolated image feature map of shape (N, img_c, img_h, img_w).

  • softmax_scale (float, optional) – The softmax scale of the attention if unknown area is not provided in forward. Default: 1.

Returns

2-tuple of

Tensor: Interpolated unknown area map of shape (N, img_h*img_w, img_h, img_w).

Tensor: Softmax scale tensor of known and unknown area of shape (N, 2).

Return type

tuple

propagate_alpha_feature(gca_score, alpha_ps)[source]

Propagate alpha feature based on guided attention score.

Parameters
  • gca_score (Tensor) – Guided attention score map of shape (1, img_h*img_w, img_h, img_w).

  • alpha_ps (Tensor) – Alpha feature patches tensor of shape (1, img_h*img_w, alpha_c, alpha_ks, alpha_ks).

Returns

Propagated alpha feature map of shape (1, alpha_c, alpha_h, alpha_w).

Return type

Tensor

class mmedit.models.common.LinearModule(in_features, out_features, bias=True, act_cfg={'type': 'ReLU'}, inplace=True, with_spectral_norm=False, order=('linear', 'act'))[source]

A linear block that contains linear/norm/activation layers.

For low-level vision, we add spectral norm and a padding layer.

Parameters
  • in_features (int) – Same as nn.Linear.

  • out_features (int) – Same as nn.Linear.

  • bias (bool) – Same as nn.Linear.

  • act_cfg (dict) – Config dict for activation layer, “relu” by default.

  • inplace (bool) – Whether to use inplace mode for activation.

  • with_spectral_norm (bool) – Whether to use spectral norm in the linear module.

  • order (tuple[str]) – The order of linear/activation layers. It is a sequence of “linear”, “norm” and “act”. Examples are (“linear”, “act”) and (“act”, “linear”).

forward(x, activate=True)[source]

Forward function.

Parameters
  • x (torch.Tensor) – Input tensor with shape of (n, *, c). Same as torch.nn.Linear.

  • activate (bool, optional) – Whether to use activation layer. Defaults to True.

Returns

Same as torch.nn.Linear.

Return type

torch.Tensor
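A minimal usage sketch of LinearModule, assuming mmedit is importable; the activate flag lets the caller skip the activation layer:

    import torch
    from mmedit.models.common import LinearModule

    layer = LinearModule(16, 32, act_cfg=dict(type='ReLU'), order=('linear', 'act'))
    x = torch.randn(4, 16)
    out = layer(x)                    # linear -> ReLU
    feat = layer(x, activate=False)   # linear only, activation skipped
    print(out.shape, feat.shape)      # torch.Size([4, 32]) torch.Size([4, 32])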

class mmedit.models.common.MaskConvModule(*args, **kwargs)[source]

Mask convolution module.

This is a simple wrapper for mask convolution like: ‘partial conv’. Convolutions in this module always need a mask as extra input.

Parameters
  • in_channels (int) – Same as nn.Conv2d.

  • out_channels (int) – Same as nn.Conv2d.

  • kernel_size (int or tuple[int]) – Same as nn.Conv2d.

  • stride (int or tuple[int]) – Same as nn.Conv2d.

  • padding (int or tuple[int]) – Same as nn.Conv2d.

  • dilation (int or tuple[int]) – Same as nn.Conv2d.

  • groups (int) – Same as nn.Conv2d.

  • bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.

  • conv_cfg (dict) – Config dict for convolution layer.

  • norm_cfg (dict) – Config dict for normalization layer.

  • act_cfg (dict) – Config dict for activation layer, “relu” by default.

  • inplace (bool) – Whether to use inplace mode for activation.

  • with_spectral_norm (bool) – Whether to use spectral norm in the conv module.

  • padding_mode (str) – If the padding_mode is not supported by the current Conv2d in PyTorch, we will use our own padding layer instead. Currently, we support [‘zeros’, ‘circular’] with the official implementation and [‘reflect’] with our own implementation. Default: ‘zeros’.

  • order (tuple[str]) – The order of conv/norm/activation layers. It is a sequence of “conv”, “norm” and “act”. Examples are (“conv”, “norm”, “act”) and (“act”, “conv”, “norm”).

forward(x, mask=None, activate=True, norm=True, return_mask=True)[source]

Forward function for partial conv2d.

Parameters
  • x (torch.Tensor) – Tensor with shape of (n, c, h, w).

  • mask (torch.Tensor) – Tensor with shape of (n, c, h, w) or (n, 1, h, w). If mask is not given, the function will work as standard conv2d. Default: None.

  • activate (bool) – Whether to use the activation layer.

  • norm (bool) – Whether to use the norm layer.

  • return_mask (bool) – If True and mask is not None, the updated mask will be returned. Default: True.

Returns

Result Tensor or 2-tuple of

Tensor: Results after partial conv.

Tensor: Updated mask will be returned if mask is given and return_mask is True.

Return type

Tensor or tuple

class mmedit.models.common.PartialConv2d(*args, multi_channel=False, eps=1e-08, **kwargs)[source]

Implementation for partial convolution.

Image Inpainting for Irregular Holes Using Partial Convolutions [https://arxiv.org/abs/1804.07723]

Parameters
  • multi_channel (bool) – If True, the mask is multi-channel. Otherwise, the mask is single-channel.

  • eps (float) – A small value for numerical stability. For mixed-precision training, you need to change it from 1e-8 to 1e-6.

forward(input, mask=None, return_mask=True)[source]

Forward function for partial conv2d.

Parameters
  • input (torch.Tensor) – Tensor with shape of (n, c, h, w).

  • mask (torch.Tensor) – Tensor with shape of (n, c, h, w) or (n, 1, h, w). If mask is not given, the function will work as standard conv2d. Default: None.

  • return_mask (bool) – If True and mask is not None, the updated mask will be returned. Default: True.

Returns

Results after partial conv. If mask is given and return_mask is True, the updated mask is also returned.

Return type

torch.Tensor
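A minimal usage sketch of PartialConv2d, assuming mmedit is importable; the hole region in the mask is illustrative:

    import torch
    from mmedit.models.common import PartialConv2d

    pconv = PartialConv2d(3, 16, kernel_size=3, padding=1, multi_channel=True)
    img = torch.randn(1, 3, 64, 64)
    mask = torch.ones(1, 3, 64, 64)
    mask[:, :, 16:48, 16:48] = 0.      # zeros mark the invalid (hole) region
    out, updated_mask = pconv(img, mask, return_mask=True)
    print(out.shape, updated_mask.shape)  # output features and updated validity mask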

class mmedit.models.common.PixelShufflePack(in_channels, out_channels, scale_factor, upsample_kernel)[source]

Pixel Shuffle upsample layer.

Parameters
  • in_channels (int) – Number of input channels.

  • out_channels (int) – Number of output channels.

  • scale_factor (int) – Upsample ratio.

  • upsample_kernel (int) – Kernel size of Conv layer to expand channels.

Returns

Upsampled feature map.

forward(x)[source]

Forward function for PixelShufflePack.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

init_weights()[source]

Initialize weights for PixelShufflePack.
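A rough functional equivalent of this layer, sketched with plain PyTorch ops (the class name below is illustrative, not the actual implementation): the conv expands channels by scale_factor**2 and pixel_shuffle rearranges them into a larger spatial grid.

    import torch
    import torch.nn as nn
    import torch.nn.functional as F

    class PixelShuffleUpsampleSketch(nn.Module):
        def __init__(self, in_channels, out_channels, scale_factor, upsample_kernel):
            super().__init__()
            self.scale_factor = scale_factor
            # Expand channels so pixel_shuffle can trade them for spatial resolution.
            self.upsample_conv = nn.Conv2d(
                in_channels, out_channels * scale_factor ** 2,
                upsample_kernel, padding=(upsample_kernel - 1) // 2)

        def forward(self, x):
            return F.pixel_shuffle(self.upsample_conv(x), self.scale_factor)

    up = PixelShuffleUpsampleSketch(64, 64, scale_factor=2, upsample_kernel=3)
    print(up(torch.randn(1, 64, 32, 32)).shape)  # torch.Size([1, 64, 64, 64])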

class mmedit.models.common.ResidualBlockNoBN(mid_channels=64, res_scale=1.0)[source]

Residual block without BN.

It has a style of:

---Conv-ReLU-Conv-+-
 |________________|
Parameters
  • mid_channels (int) – Channel number of intermediate features. Default: 64.

  • res_scale (float) – Used to scale the residual before addition. Default: 1.0.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

init_weights()[source]

Initialize weights for ResidualBlockNoBN.

Initialization methods like kaiming_init are for VGG-style modules. For modules with residual paths, using smaller std is better for stability and performance. We empirically use 0.1. See more details in “ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks”
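A minimal sketch of the Conv-ReLU-Conv block above, with the residual scaled by res_scale before the addition (the class name is illustrative):

    import torch
    import torch.nn as nn

    class ResBlockNoBNSketch(nn.Module):
        def __init__(self, mid_channels=64, res_scale=1.0):
            super().__init__()
            self.conv1 = nn.Conv2d(mid_channels, mid_channels, 3, padding=1)
            self.conv2 = nn.Conv2d(mid_channels, mid_channels, 3, padding=1)
            self.relu = nn.ReLU(inplace=True)
            self.res_scale = res_scale

        def forward(self, x):
            # identity + scaled residual, no final activation
            return x + self.conv2(self.relu(self.conv1(x))) * self.res_scale

    x = torch.randn(1, 64, 16, 16)
    print(ResBlockNoBNSketch()(x).shape)  # torch.Size([1, 64, 16, 16])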

class mmedit.models.common.ResidualBlockWithDropout(channels, padding_mode, norm_cfg={'type': 'BN'}, use_dropout=True)[source]

Define a Residual Block with dropout layers.

Ref: Deep Residual Learning for Image Recognition

A residual block is a conv block with skip connections. A dropout layer is added between two common conv modules.

Parameters
  • channels (int) – Number of channels in the conv layer.

  • padding_mode (str) – The name of padding layer: ‘reflect’ | ‘replicate’ | ‘zeros’.

  • norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’BN’).

  • use_dropout (bool) – Whether to use dropout layers. Default: True.

forward(x)[source]

Forward function. Add skip connections without final ReLU.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

class mmedit.models.common.SimpleGatedConvModule(in_channels, out_channels, kernel_size, feat_act_cfg={'type': 'ELU'}, gate_act_cfg={'type': 'Sigmoid'}, **kwargs)[source]

Simple Gated Convolutional Module.

This module is a simple gated convolutional module. The detailed formula is:

\[y = \phi(conv1(x)) * \sigma(conv2(x)),\]

where \(\phi\) is the feature activation function and \(\sigma\) is the gate activation function. By default, the gate activation function is sigmoid.

Parameters
  • in_channels (int) – Same as nn.Conv2d.

  • out_channels (int) – The number of channels of the output feature. Note that out_channels of the internal conv module is doubled, since this module contains two convolutions, one for the feature and one for the gate.

  • kernel_size (int or tuple[int]) – Same as nn.Conv2d.

  • feat_act_cfg (dict) – Config dict for feature activation layer.

  • gate_act_cfg (dict) – Config dict for gate activation layer.

  • kwargs (keyword arguments) – Same as ConvModule.

forward(x)[source]

Forward Function.

Parameters

x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

Returns

Output tensor with shape of (n, c, h’, w’).

Return type

torch.Tensor
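The gating formula above can be sketched with two plain convolutions (the real module uses a single conv with doubled out_channels and then splits the result); the names below are illustrative:

    import torch
    import torch.nn as nn

    class GatedConvSketch(nn.Module):
        def __init__(self, in_channels, out_channels, kernel_size):
            super().__init__()
            padding = kernel_size // 2
            self.feat_conv = nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding)
            self.gate_conv = nn.Conv2d(in_channels, out_channels, kernel_size, padding=padding)
            self.feat_act = nn.ELU()
            self.gate_act = nn.Sigmoid()

        def forward(self, x):
            # y = phi(conv1(x)) * sigma(conv2(x))
            return self.feat_act(self.feat_conv(x)) * self.gate_act(self.gate_conv(x))

    print(GatedConvSketch(3, 16, 3)(torch.randn(1, 3, 32, 32)).shape)
    # torch.Size([1, 16, 32, 32])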

class mmedit.models.common.UnetSkipConnectionBlock(outer_channels, inner_channels, in_channels=None, submodule=None, is_outermost=False, is_innermost=False, norm_cfg={'type': 'BN'}, use_dropout=False)[source]

Construct a Unet submodule with skip connections, with the following structure: downsampling - submodule - upsampling.

Parameters
  • outer_channels (int) – Number of channels at the outer conv layer.

  • inner_channels (int) – Number of channels at the inner conv layer.

  • in_channels (int) – Number of channels in input images/features. If is None, equals to outer_channels. Default: None.

  • submodule (UnetSkipConnectionBlock) – Previously constructed submodule. Default: None.

  • is_outermost (bool) – Whether this module is the outermost module. Default: False.

  • is_innermost (bool) – Whether this module is the innermost module. Default: False.

  • norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’BN’).

  • use_dropout (bool) – Whether to use dropout layers. Default: False.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

mmedit.models.common.default_init_weights(module, scale=1)[source]

Initialize network weights.

Parameters
  • module (nn.Module) – Module to be initialized.

  • scale (float) – Scale initialized weights, especially for residual blocks.

mmedit.models.common.extract_around_bbox(img, bbox, target_size, channel_first=True)[source]

Extract patches around the given bbox.

Parameters
  • bbox (np.ndarray | torch.Tensor) – Bboxes to be modified. Bbox can be in batch or not.

  • target_size (List(int)) – Target size of final bbox.

Returns

Extracted patches. The dimension of the output should be the same as img.

Return type

(torch.Tensor | numpy.array)

mmedit.models.common.extract_bbox_patch(bbox, img, channel_first=True)[source]

Extract a patch from a given bbox.

Parameters
  • bbox (torch.Tensor | numpy.array) – Bbox with (top, left, h, w). If img has batch dimension, the bbox must be stacked at first dimension. The shape should be (4,) or (n, 4).

  • img (torch.Tensor | numpy.array) – Image data to be extracted. If organized in batch dimension, the batch dimension must be the first order like (n, h, w, c) or (n, c, h, w).

  • channel_first (bool) – If True, the channel dimension of img is before height and width, e.g. (c, h, w). Otherwise, the img shape (samples in the batch) is like (h, w, c).

Returns

Extracted patches. The dimension of the output should be the same as img.

Return type

(torch.Tensor | numpy.array)

mmedit.models.common.flow_warp(x, flow, interpolation='bilinear', padding_mode='zeros', align_corners=True)[source]

Warp an image or a feature map with optical flow.

Parameters
  • x (Tensor) – Tensor with size (n, c, h, w).

  • flow (Tensor) – Tensor with size (n, h, w, 2). The last dimension holds two channels, denoting the relative offsets in width and height. Note that the values are not normalized to [-1, 1].

  • interpolation (str) – Interpolation mode: ‘nearest’ or ‘bilinear’. Default: ‘bilinear’.

  • padding_mode (str) – Padding mode: ‘zeros’ or ‘border’ or ‘reflection’. Default: ‘zeros’.

  • align_corners (bool) – Whether to align corners. Default: True.

Returns

Warped image or feature map.

Return type

Tensor
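A compact sketch of flow warping with torch.nn.functional.grid_sample, following the convention above that flow stores un-normalized (x, y) offsets; the function name is illustrative and details may differ from the actual implementation:

    import torch
    import torch.nn.functional as F

    def flow_warp_sketch(x, flow, interpolation='bilinear',
                         padding_mode='zeros', align_corners=True):
        n, _, h, w = x.shape
        # Base sampling grid in pixel coordinates, shape (h, w, 2) ordered as (x, y).
        grid_y, grid_x = torch.meshgrid(torch.arange(h), torch.arange(w), indexing='ij')
        grid = torch.stack((grid_x, grid_y), dim=2).float()
        # Add flow offsets, then normalize to [-1, 1] for grid_sample.
        vgrid = grid.unsqueeze(0) + flow
        vgrid_x = 2.0 * vgrid[..., 0] / max(w - 1, 1) - 1.0
        vgrid_y = 2.0 * vgrid[..., 1] / max(h - 1, 1) - 1.0
        return F.grid_sample(x, torch.stack((vgrid_x, vgrid_y), dim=3),
                             mode=interpolation, padding_mode=padding_mode,
                             align_corners=align_corners)

    x = torch.randn(1, 3, 8, 8)
    zero_flow = torch.zeros(1, 8, 8, 2)
    print(torch.allclose(flow_warp_sketch(x, zero_flow), x, atol=1e-5))  # True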

mmedit.models.common.generation_init_weights(module, init_type='normal', init_gain=0.02)[source]

Default initialization of network weights for image generation.

By default, we use normal init, but xavier and kaiming might work better for some applications.

Parameters
  • module (nn.Module) – Module to be initialized.

  • init_type (str) – The name of an initialization method: normal | xavier | kaiming | orthogonal.

  • init_gain (float) – Scaling factor for normal, xavier and orthogonal.

mmedit.models.common.make_layer(block, num_blocks, **kwarg)[source]

Make layers by stacking the same blocks.

Parameters
  • block (nn.Module) – nn.Module class of the basic block.

  • num_blocks (int) – Number of blocks.

Returns

Stacked blocks in nn.Sequential.

Return type

nn.Sequential
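A usage sketch of make_layer, stacking the ResidualBlockNoBN documented above; keyword arguments are forwarded to each block's constructor:

    import torch.nn as nn
    from mmedit.models.common import make_layer, ResidualBlockNoBN

    trunk = make_layer(ResidualBlockNoBN, num_blocks=4, mid_channels=64)
    print(isinstance(trunk, nn.Sequential), len(trunk))  # True 4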

mmedit.models.common.scale_bbox(bbox, target_size)[source]

Modify bbox to target size.

The original bbox will be enlarged to the target size with the original bbox in the center of the new bbox.

Parameters
  • bbox (np.ndarray | torch.Tensor) – Bboxes to be modified. Bbox can be in batch or not. The shape should be (4,) or (n, 4).

  • target_size (tuple[int]) – Target size of final bbox.

Returns

Modified bboxes.

Return type

(np.ndarray | torch.Tensor)

mmedit.models.common.set_requires_grad(nets, requires_grad=False)[source]

Set requires_grad for all the networks.

Parameters
  • nets (nn.Module | list[nn.Module]) – A list of networks or a single network.

  • requires_grad (bool) – Whether the networks require gradients or not.
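A typical GAN-style usage sketch: freeze the discriminator while the generator is updated, then re-enable its gradients (the discriminator below is a placeholder):

    import torch.nn as nn
    from mmedit.models.common import set_requires_grad

    disc = nn.Sequential(nn.Conv2d(3, 8, 3), nn.ReLU(), nn.Conv2d(8, 1, 3))
    set_requires_grad(disc, False)   # generator step: no gradients for the discriminator
    print(any(p.requires_grad for p in disc.parameters()))  # False
    set_requires_grad(disc, True)    # discriminator step: gradients enabled again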

backbones

class mmedit.models.backbones.ContextualAttentionNeck(in_channels, conv_type='conv', conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ELU'}, contextual_attention_args={'softmax_scale': 10.0}, **kwargs)[source]

Neck with contextual attention module.

Parameters
  • in_channels (int) – The number of input channels.

  • conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.

  • conv_cfg (dict | None) – Config of conv module. Default: None.

  • norm_cfg (dict | None) – Config of norm module. Default: None.

  • act_cfg (dict | None) – Config of activation layer. Default: dict(type=’ELU’).

  • contextual_attention_args (dict) – Config of contextual attention module. Default: dict(softmax_scale=10.).

  • kwargs (keyword arguments) –

forward(x, mask)[source]

Forward Function.

Parameters
  • x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

  • mask (torch.Tensor) – Input tensor with shape of (n, 1, h, w).

Returns

Output tensor with shape of (n, c, h’, w’).

Return type

torch.Tensor

class mmedit.models.backbones.DeepFillDecoder(in_channels, conv_type='conv', norm_cfg=None, act_cfg={'type': 'ELU'}, out_act_cfg={'max': 1.0, 'min': - 1.0, 'type': 'clip'}, channel_factor=1.0, **kwargs)[source]

Decoder used in DeepFill model.

This implementation follows: Generative Image Inpainting with Contextual Attention

Parameters
  • in_channels (int) – The number of input channels.

  • conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.

  • norm_cfg (dict) – Config dict to build norm layer. Default: None.

  • act_cfg (dict) – Config dict for activation layer, “elu” by default.

  • out_act_cfg (dict) – Config dict for output activation layer. Here, we provide commonly used clamp or clip operation.

  • channel_factor (float) – The scale factor for channel size. Default: 1.

  • kwargs (keyword arguments) –

forward(input_dict)[source]

Forward Function.

Parameters

input_dict (dict | torch.Tensor) – Input dict with middle features or torch.Tensor.

Returns

Output tensor with shape of (n, c, h, w).

Return type

torch.Tensor

class mmedit.models.backbones.DeepFillEncoder(in_channels=5, conv_type='conv', norm_cfg=None, act_cfg={'type': 'ELU'}, encoder_type='stage1', channel_factor=1.0, **kwargs)[source]

Encoder used in DeepFill model.

This implementation follows: Generative Image Inpainting with Contextual Attention

Parameters
  • in_channels (int) – The number of input channels. Default: 5.

  • conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.

  • norm_cfg (dict) – Config dict to build norm layer. Default: None.

  • act_cfg (dict) – Config dict for activation layer, “elu” by default.

  • encoder_type (str) – Type of the encoder. Should be one of [‘stage1’, ‘stage2_conv’, ‘stage2_attention’]. Default: ‘stage1’.

  • channel_factor (float) – The scale factor for channel size. Default: 1.

  • kwargs (keyword arguments) –

forward(x)[source]

Forward Function.

Parameters

x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

Returns

Output tensor with shape of (n, c, h’, w’).

Return type

torch.Tensor

class mmedit.models.backbones.DeepFillEncoderDecoder(stage1={'decoder': {'in_channels': 128, 'type': 'DeepFillDecoder'}, 'dilation_neck': {'act_cfg': {'type': 'ELU'}, 'in_channels': 128, 'type': 'GLDilationNeck'}, 'encoder': {'type': 'DeepFillEncoder'}, 'type': 'GLEncoderDecoder'}, stage2={'type': 'DeepFillRefiner'}, return_offset=False)[source]

Two-stage encoder-decoder structure used in DeepFill model.

The details are in: Generative Image Inpainting with Contextual Attention

Parameters
  • stage1 (dict) – Config dict for building stage1 model. As DeepFill model uses Global&Local model as baseline in first stage, the stage1 model can be easily built with GLEncoderDecoder.

  • stage2 (dict) – Config dict for building stage2 model.

  • return_offset (bool) – Whether to return offset feature in contextual attention module. Default: False.

forward(x)[source]

Forward function.

Parameters

x (torch.Tensor) – This input tensor has the shape of (n, 5, h, w). In channel dimension, we concatenate [masked_img, ones, mask] as DeepFillv1 models do.

Returns

The first two item is the results from first and second stage. If set return_offset as True, the offset will be returned as the third item.

Return type

tuple[torch.Tensor]

init_weights(pretrained=None)[source]

Init weights for models.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

class mmedit.models.backbones.DepthwiseIndexBlock(in_channels, norm_cfg={'type': 'BN'}, use_context=False, use_nonlinear=False, mode='o2o')[source]

Depthwise index block.

From https://arxiv.org/abs/1908.00672.

Parameters
  • in_channels (int) – Input channels of the holistic index block.

  • kernel_size (int) – Kernel size of the conv layers. Default: 2.

  • padding (int) – Padding number of the conv layers. Default: 0.

  • mode (str) – Mode of index block. Should be ‘o2o’ or ‘m2o’. In ‘o2o’ mode, the group of the conv layers is 1; In ‘m2o’ mode, the group of the conv layer is in_channels.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • use_nonlinear (bool) – Whether to add a non-linear conv layer in the index blocks. Default: False.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input feature map with shape (N, C, H, W).

Returns

Encoder index feature and decoder index feature.

Return type

tuple(Tensor)

class mmedit.models.backbones.EDSR(in_channels, out_channels, mid_channels=64, num_blocks=16, upscale_factor=4, res_scale=1, rgb_mean=(0.4488, 0.4371, 0.404), rgb_std=(1.0, 1.0, 1.0))[source]

EDSR network structure.

Paper: Enhanced Deep Residual Networks for Single Image Super-Resolution. Ref repo: https://github.com/thstkdgus35/EDSR-PyTorch

Parameters
  • in_channels (int) – Channel number of inputs.

  • out_channels (int) – Channel number of outputs.

  • mid_channels (int) – Channel number of intermediate features. Default: 64.

  • num_blocks (int) – Block number in the trunk network. Default: 16.

  • upscale_factor (int) – Upsampling factor. Support 2^n and 3. Default: 4.

  • res_scale (float) – Used to scale the residual in residual block. Default: 1.

  • rgb_mean (tuple[float]) – Image mean in RGB orders. Default: (0.4488, 0.4371, 0.4040), calculated from DIV2K dataset.

  • rgb_std (tuple[float]) – Image std in RGB orders. In EDSR, it uses (1.0, 1.0, 1.0). Default: (1.0, 1.0, 1.0).

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

init_weights(pretrained=None, strict=True)[source]

Init weights for models.

Parameters
  • pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

  • strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.
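A construction sketch for EDSR with the documented defaults, assuming mmedit is importable; the expected output size follows from the x4 upscale factor:

    import torch
    from mmedit.models.backbones import EDSR

    net = EDSR(in_channels=3, out_channels=3, mid_channels=64,
               num_blocks=16, upscale_factor=4)
    lr = torch.randn(1, 3, 32, 32)
    print(net(lr).shape)  # expected torch.Size([1, 3, 128, 128])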

class mmedit.models.backbones.EDVRNet(in_channels, out_channels, mid_channels=64, num_frames=5, deform_groups=8, num_blocks_extraction=5, num_blocks_reconstruction=10, center_frame_idx=2, with_tsa=True)[source]

EDVR network structure for video super-resolution.

Currently, only the x4 upsampling factor is supported. Paper: EDVR: Video Restoration with Enhanced Deformable Convolutional Networks.

Parameters
  • in_channels (int) – Channel number of inputs.

  • out_channels (int) – Channel number of outputs.

  • mid_channels (int) – Channel number of intermediate features. Default: 64.

  • num_frames (int) – Number of input frames. Default: 5.

  • deform_groups (int) – Deformable groups. Defaults: 8.

  • num_blocks_extraction (int) – Number of blocks for feature extraction. Default: 5.

  • num_blocks_reconstruction (int) – Number of blocks for reconstruction. Default: 10.

  • center_frame_idx (int) – The index of center frame. Frame counting from 0. Default: 2.

  • with_tsa (bool) – Whether to use TSA module. Default: True.

forward(x)[source]

Forward function for EDVRNet.

Parameters

x (Tensor) – Input tensor with shape (n, t, c, h, w).

Returns

SR center frame with shape (n, c, h, w).

Return type

Tensor

init_weights(pretrained=None, strict=True)[source]

Init weights for models.

Parameters
  • pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

  • strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.

class mmedit.models.backbones.GLDecoder(in_channels=256, norm_cfg=None, act_cfg={'type': 'ReLU'}, out_act='clip')[source]

Decoder used in Global&Local model.

This implementation follows: Globally and locally Consistent Image Completion

Parameters
  • in_channels (int) – Channel number of input feature.

  • norm_cfg (dict) – Config dict to build norm layer.

  • act_cfg (dict) – Config dict for activation layer, “relu” by default.

  • out_act (str) – Output activation type, “clip” by default. Note that in our implementation, we clip the output to the range [-1, 1].

forward(x)[source]

Forward Function.

Parameters

x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

Returns

Output tensor with shape of (n, c, h’, w’).

Return type

torch.Tensor

class mmedit.models.backbones.GLDilationNeck(in_channels=256, conv_type='conv', norm_cfg=None, act_cfg={'type': 'ReLU'}, **kwargs)[source]

Dilation Backbone used in Global&Local model.

This implementation follows: Globally and locally Consistent Image Completion

Parameters
  • in_channels (int) – Channel number of input feature.

  • conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.

  • norm_cfg (dict) – Config dict to build norm layer.

  • act_cfg (dict) – Config dict for activation layer, “relu” by default.

  • kwargs (keyword arguments) –

forward(x)[source]

Forward Function.

Parameters

x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

Returns

Output tensor with shape of (n, c, h’, w’).

Return type

torch.Tensor

class mmedit.models.backbones.GLEncoder(norm_cfg=None, act_cfg={'type': 'ReLU'})[source]

Encoder used in Global&Local model.

This implementation follows: Globally and locally Consistent Image Completion

Parameters
  • norm_cfg (dict) – Config dict to build norm layer.

  • act_cfg (dict) – Config dict for activation layer, “relu” by default.

forward(x)[source]

Forward Function.

Parameters

x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

Returns

Output tensor with shape of (n, c, h’, w’).

Return type

torch.Tensor

class mmedit.models.backbones.GLEncoderDecoder(encoder={'type': 'GLEncoder'}, decoder={'type': 'GLDecoder'}, dilation_neck={'type': 'GLDilationNeck'})[source]

Encoder-Decoder used in Global&Local model.

This implementation follows: Globally and locally Consistent Image Completion

The architecture of the encoder-decoder is: (conv2d x 6) -> (dilated conv2d x 4) -> (conv2d or deconv2d x 7)

Parameters
  • encoder (dict) – Config dict to build encoder.

  • decoder (dict) – Config dict to build decoder.

  • dilation_neck (dict) – Config dict to build dilation neck.

forward(x)[source]

Forward Function.

Parameters

x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

Returns

Output tensor with shape of (n, c, h’, w’).

Return type

torch.Tensor

init_weights(pretrained=None)[source]

Init weights for models.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

class mmedit.models.backbones.HolisticIndexBlock(in_channels, norm_cfg={'type': 'BN'}, use_context=False, use_nonlinear=False)[source]

Holistic Index Block.

From https://arxiv.org/abs/1908.00672.

Parameters
  • in_channels (int) – Input channels of the holistic index block.

  • kernel_size (int) – Kernel size of the conv layers. Default: 2.

  • padding (int) – Padding number of the conv layers. Default: 0.

  • norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).

  • use_nonlinear (bool) – Whether to add a non-linear conv layer in the index block. Default: False.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input feature map with shape (N, C, H, W).

Returns

Encoder index feature and decoder index feature.

Return type

tuple(Tensor)

class mmedit.models.backbones.IndexNetDecoder(in_channels, kernel_size=5, norm_cfg={'type': 'BN'}, separable_conv=False)[source]
forward(inputs)[source]

Forward function.

Parameters

inputs (dict) – Output dict of IndexNetEncoder.

Returns

Predicted alpha matte of the current batch.

Return type

Tensor

init_weights()[source]

Init weights for the module.

class mmedit.models.backbones.IndexNetEncoder(in_channels, out_stride=32, width_mult=1, index_mode='m2o', aspp=True, norm_cfg={'type': 'BN'}, freeze_bn=False, use_nonlinear=True, use_context=True)[source]

Encoder for IndexNet.

Please refer to https://arxiv.org/abs/1908.00672.

Parameters
  • in_channels (int, optional) – Input channels of the encoder.

  • out_stride (int, optional) – Output stride of the encoder. For example, if out_stride is 32, the input feature map or image will be downsampled to 1/32 of the original size. Defaults to 32.

  • width_mult (int, optional) – Width multiplication factor of channel dimension in MobileNetV2. Defaults to 1.

  • index_mode (str, optional) – Index mode of the index network. It must be one of {‘holistic’, ‘o2o’, ‘m2o’}. If it is set to ‘holistic’, the holistic index network will be used as the index network. If it is set to ‘o2o’ (or ‘m2o’), the O2O (or M2O) depthwise index network will be used as the index network. Defaults to ‘m2o’.

  • aspp (bool, optional) – Whether to use the ASPP module to augment the output feature. Defaults to True.

  • norm_cfg (None | dict, optional) – Config dict for normalization layer. Defaults to dict(type=’BN’).

  • freeze_bn (bool, optional) – Whether to freeze the batch norm layers. Defaults to False.

  • use_nonlinear (bool, optional) – Whether to use nonlinearity in the index network. Refer to the paper for more information. Defaults to True.

  • use_context (bool, optional) – Whether to use a larger kernel size in the index network. Refer to the paper for more information. Defaults to True.

Raises
  • ValueError – out_stride must be 16 or 32.

  • NameError – Supported index_mode are {‘holistic’, ‘o2o’, ‘m2o’}.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input feature map with shape (N, C, H, W).

Returns

Output tensor, shortcut feature and decoder index feature.

Return type

dict

freeze_bn()[source]

Set BatchNorm modules in the model to evaluation mode.

init_weights(pretrained=None)[source]

Init weights for the model.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

class mmedit.models.backbones.IndexedUpsample(in_channels, out_channels, kernel_size=5, norm_cfg={'type': 'BN'}, conv_module=ConvModule)[source]

Indexed upsample module.

Parameters
  • in_channels (int) – Input channels.

  • out_channels (int) – Output channels.

  • kernel_size (int, optional) – Kernel size of the convolution layer. Defaults to 5.

  • norm_cfg (dict, optional) – Config dict for normalization layer. Defaults to dict(type=’BN’).

  • conv_module (ConvModule | DepthwiseSeparableConvModule, optional) – Conv module. Defaults to ConvModule.

forward(x, shortcut, dec_idx_feat=None)[source]

Forward function.

Parameters
  • x (Tensor) – Input feature map with shape (N, C, H, W).

  • shortcut (Tensor) – The shortcut connection with shape (N, C, H’, W’).

  • dec_idx_feat (Tensor, optional) – The decode index feature map with shape (N, C, H’, W’). Defaults to None.

Returns

Output tensor with shape (N, C, H’, W’).

Return type

Tensor

init_weights()[source]

Init weights for the module.

class mmedit.models.backbones.MSRResNet(in_channels, out_channels, mid_channels=64, num_blocks=16, upscale_factor=4)[source]

Modified SRResNet.

A compact version modified from the SRResNet in “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”.

It uses residual blocks without BN, similar to EDSR. Currently, it supports x2, x3 and x4 upsampling scale factor.

Parameters
  • in_channels (int) – Channel number of inputs.

  • out_channels (int) – Channel number of outputs.

  • mid_channels (int) – Channel number of intermediate features. Default: 64.

  • num_blocks (int) – Block number in the trunk network. Default: 16.

  • upscale_factor (int) – Upsampling factor. Support x2, x3 and x4. Default: 4.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

init_weights(pretrained=None, strict=True)[source]

Init weights for models.

Parameters
  • pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

  • strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.

class mmedit.models.backbones.PConvDecoder(num_layers=7, interpolation='nearest', conv_cfg={'multi_channel': True, 'type': 'PConv'}, norm_cfg={'type': 'BN'})[source]

Decoder with partial conv.

For details of this architecture, please see: Image Inpainting for Irregular Holes Using Partial Convolutions.

Parameters
  • num_layers (int) – The number of convolutional layers. Default: 7.

  • interpolation (str) – The upsample mode. Default: ‘nearest’.

  • conv_cfg (dict) – Config for convolution module. Default: {‘type’: ‘PConv’, ‘multi_channel’: True}.

  • norm_cfg (dict) – Config for norm layer. Default: {‘type’: ‘BN’}.

forward(input_dict)[source]

Forward Function.

Parameters

input_dict (dict | torch.Tensor) – Input dict with middle features or torch.Tensor.

Returns

Output tensor with shape of (n, c, h, w).

Return type

torch.Tensor

class mmedit.models.backbones.PConvEncoder(in_channels=3, num_layers=7, conv_cfg={'multi_channel': True, 'type': 'PConv'}, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False)[source]

Encoder with partial conv.

For details of this architecture, please see: Image Inpainting for Irregular Holes Using Partial Convolutions.

Parameters
  • in_channels (int) – The number of input channels. Default: 3.

  • num_layers (int) – The number of convolutional layers. Default: 7.

  • conv_cfg (dict) – Config for convolution module. Default: {‘type’: ‘PConv’, ‘multi_channel’: True}.

  • norm_cfg (dict) – Config for norm layer. Default: {‘type’: ‘BN’}.

  • norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effective on Batch Norm and its variants only.

forward(x, mask)[source]

Forward function for partial conv encoder.

Parameters
  • x (torch.Tensor) – Masked image with shape (n, c, h, w).

  • mask (torch.Tensor) – Mask tensor with shape (n, c, h, w).

Returns

Contains the results and middle-level features of this module: hidden_feats contains the middle feature maps and hidden_masks stores the updated masks.

Return type

dict

train(mode=True)[source]

Sets the module in training mode.

This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.

Parameters

mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.

Returns

self

Return type

Module

class mmedit.models.backbones.PConvEncoderDecoder(encoder, decoder)[source]

Encoder-Decoder with partial conv module.

Parameters
  • encoder (dict) – Config of the encoder.

  • decoder (dict) – Config of the decoder.

forward(x, mask_in)[source]

Forward Function.

Parameters
  • x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

  • mask_in (torch.Tensor) – Input tensor with shape of (n, c, h, w).

Returns

Output tensor with shape of (n, c, h’, w’).

Return type

torch.Tensor

init_weights(pretrained=None)[source]

Init weights for models.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

class mmedit.models.backbones.PlainDecoder(in_channels)[source]

Simple decoder from Deep Image Matting.

Parameters

in_channels (int) – Channel num of input features.

forward(inputs)[source]

Forward function of PlainDecoder.

Parameters

inputs (dict) –

Output dictionary of the VGG encoder containing:

  • out (Tensor): Output of the VGG encoder.

  • max_idx_1 (Tensor): Index of the first maxpooling layer in the VGG encoder.

  • max_idx_2 (Tensor): Index of the second maxpooling layer in the VGG encoder.

  • max_idx_3 (Tensor): Index of the third maxpooling layer in the VGG encoder.

  • max_idx_4 (Tensor): Index of the fourth maxpooling layer in the VGG encoder.

  • max_idx_5 (Tensor): Index of the fifth maxpooling layer in the VGG encoder.

Returns

Output tensor.

Return type

Tensor

init_weights()[source]

Init weights for the module.

class mmedit.models.backbones.RRDBNet(in_channels, out_channels, mid_channels=64, num_blocks=23, growth_channels=32)[source]

Networks consisting of Residual in Residual Dense Block, which is used in ESRGAN.

ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. Currently, it supports x4 upsampling scale factor.

Parameters
  • in_channels (int) – Channel number of inputs.

  • out_channels (int) – Channel number of outputs.

  • mid_channels (int) – Channel number of intermediate features. Default: 64

  • num_blocks (int) – Block number in the trunk network. Defaults: 23

  • growth_channels (int) – Channels for each growth. Default: 32.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

init_weights(pretrained=None, strict=True)[source]

Init weights for models.

Parameters
  • pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

  • strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.

class mmedit.models.backbones.ResGCADecoder(block, layers, in_channels, kernel_size=3, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'inplace': True, 'negative_slope': 0.2, 'type': 'LeakyReLU'}, with_spectral_norm=False, late_downsample=False)[source]

ResNet decoder with shortcut connection and gca module.

feat1 ---------------------------------------- conv2 --- out
                                            |
feat2 ----------------------------------- conv1
                                       |
feat3 ------------------------------ layer4
                                  |
feat4, img_feat -- gca_module - layer3
                |
feat5 ------- layer2
           |
out ---  layer1
  • The gca module also requires the unknown tensor generated by trimap, which is omitted in the above graph.

Parameters
  • block (str) – Type of residual block. Currently only BasicBlockDec is implemented.

  • layers (list[int]) – Number of layers in each block.

  • in_channels (int) – Channel number of input features.

  • kernel_size (int) – Kernel size of the conv layers in the decoder.

  • conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. “BN” by default.

  • act_cfg (dict) – Config dict for activation layer, “ReLU” by default.

  • late_downsample (bool) – Whether to adopt the late downsample strategy. Default: False.

forward(inputs)[source]

Forward function of resnet shortcut decoder.

Parameters

inputs (dict) –

Output dictionary of the ResGCAEncoder containing:

  • out (Tensor): Output of the ResGCAEncoder.

  • feat1 (Tensor): Shortcut connection from input image.

  • feat2 (Tensor): Shortcut connection from conv2 of ResGCAEncoder.

  • feat3 (Tensor): Shortcut connection from layer1 of ResGCAEncoder.

  • feat4 (Tensor): Shortcut connection from layer2 of ResGCAEncoder.

  • feat5 (Tensor): Shortcut connection from layer3 of ResGCAEncoder.

  • img_feat (Tensor): Image feature extracted by guidance head.

  • unknown (Tensor): Unknown tensor generated by trimap.

Returns

Output tensor.

Return type

Tensor

class mmedit.models.backbones.ResGCAEncoder(block, layers, in_channels, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_spectral_norm=False, late_downsample=False, order=('conv', 'act', 'norm'))[source]

ResNet backbone with shortcut connection and gca module.

image ---------------- shortcut[0] -------------- feat1
 |
conv1-conv2 ---------- shortcut[1] -------------- feat2
       |
     conv3-layer1 ---- shortcut[2] -------------- feat3
             |
             | image - guidance_conv ------------ img_feat
             |             |
            layer2 --- gca_module - shortcut[4] - feat4
                            |
                          layer3 -- shortcut[5] - feat5
                             |
                           layer4 --------------- out
  • The gca module also requires the unknown tensor generated by trimap, which is omitted in the above graph.

Implementation of Natural Image Matting via Guided Contextual Attention https://arxiv.org/pdf/2001.04069.pdf.

Parameters
  • block (str) – Type of residual block. Currently only BasicBlock is implemented.

  • layers (list[int]) – Number of layers in each block.

  • in_channels (int) – Number of input channels.

  • conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. “BN” by default.

  • act_cfg (dict) – Config dict for activation layer, “ReLU” by default.

  • late_downsample (bool) – Whether to adopt late downsample strategy. Default: False.

  • order (tuple[str]) – Order of conv, norm and act layer in shortcut convolution module. Default: (‘conv’, ‘act’, ‘norm’).

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (N, C, H, W).

Returns

Contains the output tensor, shortcut feature and intermediate feature.

Return type

dict

class mmedit.models.backbones.ResNetDec(block, layers, in_channels, kernel_size=3, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'inplace': True, 'negative_slope': 0.2, 'type': 'LeakyReLU'}, with_spectral_norm=False, late_downsample=False)[source]

ResNet decoder for image matting.

This class is adopted from https://github.com/Yaoyi-Li/GCA-Matting.

Parameters
  • block (str) – Type of residual block. Currently only BasicBlockDec is implemented.

  • layers (list[int]) – Number of layers in each block.

  • in_channels (int) – Channel num of input features.

  • kernel_size (int) – Kernel size of the conv layers in the decoder.

  • conv_cfg (dict) – dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. “BN” by default.

  • act_cfg (dict) – Config dict for activation layer, “ReLU” by default.

  • with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.

  • late_downsample (bool) – Whether to adopt the late downsample strategy. Default: False.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (N, C, H, W).

Returns

Output tensor.

Return type

Tensor

init_weights()[source]

Init weights for the module.

class mmedit.models.backbones.ResNetEnc(block, layers, in_channels, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_spectral_norm=False, late_downsample=False)[source]

ResNet encoder for image matting.

This class is adopted from https://github.com/Yaoyi-Li/GCA-Matting. It is implemented and pre-trained on ImageNet with the tricks from https://arxiv.org/abs/1812.01187, without the mix-up part.

Parameters
  • block (str) – Type of residual block. Currently only BasicBlock is implemented.

  • layers (list[int]) – Number of layers in each block.

  • in_channels (int) – Number of input channels.

  • conv_cfg (dict) – dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. “BN” by default.

  • act_cfg (dict) – Config dict for activation layer, “ReLU” by default.

  • with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.

  • late_downsample (bool) – Whether to adopt the late downsample strategy. Default: False.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (N, C, H, W).

Returns

Output tensor.

Return type

Tensor

class mmedit.models.backbones.ResShortcutDec(block, layers, in_channels, kernel_size=3, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'inplace': True, 'negative_slope': 0.2, 'type': 'LeakyReLU'}, with_spectral_norm=False, late_downsample=False)[source]

ResNet decoder for image matting with shortcut connection.

feat1 --------------------------- conv2 --- out
                               |
feat2 ---------------------- conv1
                          |
feat3 ----------------- layer4
                     |
feat4 ------------ layer3
                |
feat5 ------- layer2
           |
out ---  layer1
Parameters
  • block (str) – Type of residual block. Currently only BasicBlockDec is implemented.

  • layers (list[int]) – Number of layers in each block.

  • in_channels (int) – Channel number of input features.

  • kernel_size (int) – Kernel size of the conv layers in the decoder.

  • conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. “BN” by default.

  • act_cfg (dict) – Config dict for activation layer, “ReLU” by default.

  • late_downsample (bool) – Whether to adopt the late downsample strategy. Default: False.

forward(inputs)[source]

Forward function of resnet shortcut decoder.

Parameters

inputs (dict) –

Output dictionary of the ResNetEnc containing:

  • out (Tensor): Output of the ResNetEnc.

  • feat1 (Tensor): Shortcut connection from input image.

  • feat2 (Tensor): Shortcut connection from conv2 of ResNetEnc.

  • feat3 (Tensor): Shortcut connection from layer1 of ResNetEnc.

  • feat4 (Tensor): Shortcut connection from layer2 of ResNetEnc.

  • feat5 (Tensor): Shortcut connection from layer3 of ResNetEnc.

Returns

Output tensor.

Return type

Tensor

class mmedit.models.backbones.ResShortcutEnc(block, layers, in_channels, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_spectral_norm=False, late_downsample=False, order=('conv', 'act', 'norm'))[source]

ResNet backbone for image matting with shortcut connection.

image ---------------- shortcut[0] --- feat1
  |
conv1-conv2 ---------- shortcut[1] --- feat2
       |
      conv3-layer1 --- shortcut[2] --- feat3
              |
             layer2 -- shortcut[4] --- feat4
               |
              layer3 - shortcut[5] --- feat5
                |
               layer4 ---------------- out

Baseline model of Natural Image Matting via Guided Contextual Attention https://arxiv.org/pdf/2001.04069.pdf.

Parameters
  • block (str) – Type of residual block. Currently only BasicBlock is implemented.

  • layers (list[int]) – Number of layers in each block.

  • in_channels (int) – Number of input channels.

  • conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.

  • norm_cfg (dict) – Config dict for normalization layer. “BN” by default.

  • act_cfg (dict) – Config dict for activation layer, “ReLU” by default.

  • with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.

  • late_downsample (bool) – Whether to adopt late downsample strategy. Default: False.

  • order (tuple[str]) – Order of conv, norm and act layer in shortcut convolution module. Default: (‘conv’, ‘act’, ‘norm’).

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (N, C, H, W).

Returns

Contains the output tensor and shortcut feature.

Return type

dict

class mmedit.models.backbones.ResnetGenerator(in_channels, out_channels, base_channels=64, norm_cfg={'type': 'IN'}, use_dropout=False, num_blocks=9, padding_mode='reflect', init_cfg={'gain': 0.02, 'type': 'normal'})[source]

Construct a Resnet-based generator that consists of residual blocks between a few downsampling/upsampling operations.

Parameters
  • in_channels (int) – Number of channels in input images.

  • out_channels (int) – Number of channels in output images.

  • base_channels (int) – Number of filters at the last conv layer. Default: 64.

  • norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’IN’).

  • use_dropout (bool) – Whether to use dropout layers. Default: False.

  • num_blocks (int) – Number of residual blocks. Default: 9.

  • padding_mode (str) – The name of padding layer in conv layers: ‘reflect’ | ‘replicate’ | ‘zeros’. Default: ‘reflect’.

  • init_cfg (dict) – Config dict for initialization. type: The name of our initialization method. Default: ‘normal’. gain: Scaling factor for normal, xavier and orthogonal. Default: 0.02.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

init_weights(pretrained=None, strict=True)[source]

Initialize weights for the model.

Parameters
  • pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Default: None.

  • strict (bool, optional) – Whether to allow different params for the model and checkpoint. Default: True.

class mmedit.models.backbones.SRCNN(channels=(3, 64, 32, 3), kernel_sizes=(9, 1, 5), upscale_factor=4)[source]

SRCNN network structure for image super resolution.

SRCNN has three conv layers. For each layer, we can define the in_channels, out_channels and kernel_size. The input image will first be upsampled with a bicubic upsampler, and then super-resolved in the HR spatial size.

Paper: Learning a Deep Convolutional Network for Image Super-Resolution.

Parameters
  • channels (tuple[int]) – A tuple of channel numbers for each layer, including the channels of the input and output. Default: (3, 64, 32, 3).

  • kernel_sizes (tuple[int]) – A tuple of kernel sizes for each conv layer. Default: (9, 1, 5).

  • upscale_factor (int) – Upsampling factor. Default: 4.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

init_weights(pretrained=None, strict=True)[source]

Init weights for models.

Parameters
  • pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

  • strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.
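A usage sketch of SRCNN with the default configuration; since the input is first upsampled bicubically by upscale_factor, the output is expected to be 4x larger spatially:

    import torch
    from mmedit.models.backbones import SRCNN

    net = SRCNN(channels=(3, 64, 32, 3), kernel_sizes=(9, 1, 5), upscale_factor=4)
    lr = torch.randn(1, 3, 24, 24)
    print(net(lr).shape)  # expected torch.Size([1, 3, 96, 96])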

class mmedit.models.backbones.SimpleEncoderDecoder(encoder, decoder)[source]

Simple encoder-decoder model from matting.

Parameters
  • encoder (dict) – Config of the encoder.

  • decoder (dict) – Config of the decoder.

forward(*args, **kwargs)[source]

Forward function.

Returns

The output tensor of the decoder.

Return type

Tensor

class mmedit.models.backbones.TOFlow(adapt_official_weights=False)[source]

PyTorch implementation of TOFlow.

In TOFlow, the LR frames are pre-upsampled and have the same size as the GT frames.

Paper: Xue et al., Video Enhancement with Task-Oriented Flow, IJCV 2018 Code reference:

  1. https://github.com/anchen1011/toflow

  2. https://github.com/Coldog2333/pytoflow

Parameters

adapt_official_weights (bool) – Whether to adapt the weights translated from the official implementation. Set to False if you want to train from scratch. Default: False.

denormalize(img)[source]

Denormalize the output image.

Parameters

img (Tensor) – Output image.

Returns

Denormalized image.

Return type

Tensor

forward(lrs)[source]
Parameters

lrs – Input lr frames: (b, 7, 3, h, w).

Returns

SR frame: (b, 3, h, w).

Return type

Tensor

init_weights(pretrained=None, strict=True)[source]

Init weights for models.

Parameters
  • pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

  • strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.

normalize(img)[source]

Normalize the input image.

Parameters

img (Tensor) – Input image.

Returns

Normalized image.

Return type

Tensor

class mmedit.models.backbones.UnetGenerator(in_channels, out_channels, num_down=8, base_channels=64, norm_cfg={'type': 'BN'}, use_dropout=False, init_cfg={'gain': 0.02, 'type': 'normal'})[source]

Construct the Unet-based generator from the innermost layer to the outermost layer, which is a recursive process.

Parameters
  • in_channels (int) – Number of channels in input images.

  • out_channels (int) – Number of channels in output images.

  • num_down (int) – Number of downsamplings in Unet. If num_down is 8, the image with size 256x256 will become 1x1 at the bottleneck. Default: 8.

  • base_channels (int) – Number of channels at the last conv layer. Default: 64.

  • norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’BN’).

  • use_dropout (bool) – Whether to use dropout layers. Default: False.

  • init_cfg (dict) – Config dict for initialization. type: The name of our initialization method. Default: ‘normal’. gain: Scaling factor for normal, xavier and orthogonal. Default: 0.02.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

init_weights(pretrained=None, strict=True)[source]

Initialize weights for the model.

Parameters
  • pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Default: None.

  • strict (bool, optional) – Whether to allow different params for the model and checkpoint. Default: True.
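A usage sketch for UnetGenerator with the documented defaults; with num_down=8, a 256x256 input is reduced to 1x1 at the bottleneck and restored to 256x256 at the output:

    import torch
    from mmedit.models.backbones import UnetGenerator

    net = UnetGenerator(in_channels=3, out_channels=3, num_down=8, base_channels=64)
    img = torch.randn(1, 3, 256, 256)
    print(net(img).shape)  # expected torch.Size([1, 3, 256, 256])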

class mmedit.models.backbones.VGG16(in_channels, batch_norm=False, aspp=False, dilations=None)[source]

Customized VGG16 encoder.

A 1x1 conv is added after the original VGG16 conv layers. The indices of max pooling layers are returned for unpooling layers in decoders.

Parameters
  • in_channels (int) – Number of input channels.

  • batch_norm (bool, optional) – Whether to use nn.BatchNorm2d. Defaults to False.

  • aspp (bool, optional) – Whether to use the ASPP module after the last conv layer. Defaults to False.

  • dilations (list[int], optional) – Atrous rates of the ASPP module. Defaults to None.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (N, C, H, W).

Returns

Dict containing output tensor and maxpooling indices.

Return type

dict

components

class mmedit.models.components.DeepFillRefiner(encoder_attention={'encoder_type': 'stage2_attention', 'type': 'DeepFillEncoder'}, encoder_conv={'encoder_type': 'stage2_conv', 'type': 'DeepFillEncoder'}, dilation_neck={'act_cfg': {'type': 'ELU'}, 'in_channels': 128, 'type': 'GLDilationNeck'}, contextual_attention={'in_channels': 128, 'type': 'ContextualAttentionNeck'}, decoder={'in_channels': 256, 'type': 'DeepFillDecoder'})[source]

Refiner used in DeepFill model.

This implementation follows: Generative Image Inpainting with Contextual Attention.

Parameters
  • encoder_attention (dict) – Config dict for encoder used in branch with contextual attention module.

  • encoder_conv (dict) – Config dict for encoder used in branch with just convolutional operation.

  • dilation_neck (dict) – Config dict for dilation neck in branch with just convolutional operation.

  • contextual_attention (dict) – Config dict for contextual attention neck.

  • decoder (dict) – Config dict for decoder used to fuse and decode features.

forward(x, mask)[source]

Forward Function.

Parameters
  • x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

  • mask (torch.Tensor) – Input tensor with shape of (n, 1, h, w).

Returns

Output tensor with shape of (n, c, h’, w’).

Return type

torch.Tensor

class mmedit.models.components.DeepFillv1Discriminators(global_disc_cfg, local_disc_cfg)[source]

Discriminators used in DeepFillv1 model.

In the DeepFillv1 model, the discriminators are independent, without the concatenation used in the Global&Local model. Thus, we call this model DeepFillv1Discriminators. There are a global discriminator and a local discriminator, taking the global and local inputs respectively.

The details can be found in: Generative Image Inpainting with Contextual Attention.

Parameters
  • global_disc_cfg (dict) – Config dict for global discriminator.

  • local_disc_cfg (dict) – Config dict for local discriminator.

forward(x)[source]

Forward function.

Parameters

x (tuple[torch.Tensor]) – Contains global image and the local image patch.

Returns

Contains the prediction from discriminators in global image and local image patch.

Return type

tuple[torch.Tensor]

init_weights(pretrained=None)[source]

Init weights for models.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

class mmedit.models.components.GLDiscs(global_disc_cfg, local_disc_cfg)[source]

Discriminators in Global&Local

This discriminator contains a local discriminator and a global discriminator as described in the original paper: Globally and locally Consistent Image Completion

Parameters
  • global_disc_cfg (dict) – Config dict to build global discriminator.

  • local_disc_cfg (dict) – Config dict to build local discriminator.

forward(x)[source]

Forward function.

Parameters

x (tuple[torch.Tensor]) – Contains global image and the local image patch.

Returns

Contains the prediction from discriminators in global image and local image patch.

Return type

tuple[torch.Tensor]

init_weights(pretrained=None)[source]

Init weights for models.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

class mmedit.models.components.ModifiedVGG(in_channels, mid_channels)[source]

A modified VGG discriminator with input size 128 x 128.

It is used to train SRGAN and ESRGAN.

Parameters
  • in_channels (int) – Channel number of inputs. Default: 3.

  • mid_channels (int) – Channel number of base intermediate features. Default: 64.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

init_weights(pretrained=None, strict=True)[source]

Init weights for models.

Parameters
  • pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.

  • strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.

class mmedit.models.components.MultiLayerDiscriminator(in_channels, max_channels, num_convs=5, fc_in_channels=None, fc_out_channels=1024, kernel_size=5, conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, out_act_cfg={'type': 'ReLU'}, with_input_norm=True, with_out_convs=False, with_spectral_norm=False, **kwargs)[source]

Multilayer Discriminator.

This is a commonly used structure with multiple stacked convolution layers.

Parameters
  • in_channels (int) – Input channel of the first input convolution.

  • max_channels (int) – The maximum channel number in this structure.

  • num_convs (int) – Number of stacked intermediate convs (including the input conv but excluding the output conv).

  • fc_in_channels (int | None) – Input dimension of the fully connected layer. If fc_in_channels is None, the fully connected layer will be removed.

  • fc_out_channels (int) – Output dimension of the fully connected layer.

  • kernel_size (int) – Kernel size of the conv modules. Default to 5.

  • conv_cfg (dict) – Config dict to build conv layer.

  • norm_cfg (dict) – Config dict to build norm layer.

  • act_cfg (dict) – Config dict for activation layer, “relu” by default.

  • out_act_cfg (dict) – Config dict for output activation, “relu” by default.

  • with_input_norm (bool) – Whether to add normalization after the input conv. Default: True.

  • with_out_convs (bool) – Whether to add output convs to the discriminator. The output convs contain two convs: the first has the same settings as the intermediate convs but uses a stride of 1 instead of 2; the second is similar to the first but reduces the number of channels to 1 and has no activation layer. Default: False.

  • with_spectral_norm (bool) – Whether to use spectral norm after the conv layers. Default: False.

  • kwargs (keyword arguments) –

forward(x)[source]

Forward Function.

Parameters

x (torch.Tensor) – Input tensor with shape of (n, c, h, w).

Returns

Output tensor with shape of (n, c, h’, w’) or (n, c).

Return type

torch.Tensor

init_weights(pretrained=None)[source]

Init weights for models.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
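
Example (a minimal sketch; setting fc_in_channels=None removes the fully connected layer, so the output is a feature map and no flattened-size bookkeeping is needed):

    import torch
    from mmedit.models.components import MultiLayerDiscriminator

    disc = MultiLayerDiscriminator(
        in_channels=3,
        max_channels=256,
        num_convs=5,
        fc_in_channels=None,  # drop the fully connected layer
        norm_cfg=None)
    feat = disc(torch.rand(1, 3, 256, 256))  # (n, c, h', w') feature map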

class mmedit.models.components.PatchDiscriminator(in_channels, base_channels=64, num_conv=3, norm_cfg={'type': 'BN'}, init_cfg={'gain': 0.02, 'type': 'normal'})[source]

A PatchGAN discriminator.

Parameters
  • in_channels (int) – Number of channels in input images.

  • base_channels (int) – Number of channels at the first conv layer. Default: 64.

  • num_conv (int) – Number of stacked intermediate convs (excluding input and output conv). Default: 3.

  • norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’BN’).

  • init_cfg (dict) – Config dict for initialization, containing: type (str), the name of the initialization method, Default: ‘normal’; and gain (float), the scaling factor for ‘normal’, ‘xavier’ and ‘orthogonal’, Default: 0.02.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

init_weights(pretrained=None)[source]

Initialize weights for the model.

Parameters

pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Default: None.
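
Example (a minimal sketch; each spatial location of the output scores one patch of the input image):

    import torch
    from mmedit.models.components import PatchDiscriminator

    disc = PatchDiscriminator(in_channels=3, base_channels=64, num_conv=3)
    patch_scores = disc(torch.rand(1, 3, 256, 256))  # map of per-patch scores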

class mmedit.models.components.PlainRefiner(conv_channels=64, pretrained=None)[source]

Simple refiner from Deep Image Matting.

Parameters
  • conv_channels (int) – Number of channels produced by the three main convolutional layers. Default: 64.

  • loss_refine (dict) – Config of the loss of the refiner. Default: None.

  • pretrained (str) – Name of pretrained model. Default: None.

forward(x, raw_alpha)[source]

Forward function.

Parameters
  • x (Tensor) – The input feature map of refiner.

  • raw_alpha (Tensor) – The raw predicted alpha matte.

Returns

The refined alpha matte.

Return type

Tensor

losses

class mmedit.models.losses.CharbonnierCompLoss(loss_weight=1.0, reduction='mean', sample_wise=False, eps=1e-12)[source]

Charbonnier composition loss.

Parameters
  • loss_weight (float) – Loss weight for L1 loss. Default: 1.0.

  • reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.

  • sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss will first be reduced per-sample with ‘mean’, and then averaged over all samples. Default: False.

  • eps (float) – A value used to control the curvature near zero. Default: 1e-12.

forward(pred_alpha, fg, bg, ori_merged, weight=None, **kwargs)[source]
Parameters
  • pred_alpha (Tensor) – of shape (N, 1, H, W). Predicted alpha matte.

  • fg (Tensor) – of shape (N, 3, H, W). Tensor of foreground object.

  • bg (Tensor) – of shape (N, 3, H, W). Tensor of background object.

  • ori_merged (Tensor) – of shape (N, 3, H, W). Tensor of the original merged image before being normalized by ImageNet mean and std.

  • weight (Tensor, optional) – of shape (N, 1, H, W). It is an indicating matrix: weight[trimap == 128] = 1. Default: None.
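
Conceptually, a composition loss re-composites the image with the predicted alpha, comp = pred_alpha * fg + (1 - pred_alpha) * bg, and penalizes its difference from ori_merged. A minimal usage sketch with random tensors (purely illustrative):

    import torch
    from mmedit.models.losses import CharbonnierCompLoss

    loss_fn = CharbonnierCompLoss(loss_weight=1.0, eps=1e-12)
    fg = torch.rand(2, 3, 64, 64)
    bg = torch.rand(2, 3, 64, 64)
    gt_alpha = torch.rand(2, 1, 64, 64)
    ori_merged = gt_alpha * fg + (1 - gt_alpha) * bg  # composed with the ground-truth alpha
    pred_alpha = torch.rand(2, 1, 64, 64)
    loss = loss_fn(pred_alpha, fg, bg, ori_merged)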

class mmedit.models.losses.CharbonnierLoss(loss_weight=1.0, reduction='mean', sample_wise=False, eps=1e-12)[source]

Charbonnier loss (one variant of Robust L1Loss, a differentiable variant of L1Loss).

Described in “Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution”.

Parameters
  • loss_weight (float) – Loss weight for L1 loss. Default: 1.0.

  • reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.

  • sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss will first be reduced per-sample with ‘mean’, and then averaged over all samples. Default: False.

  • eps (float) – A value used to control the curvature near zero. Default: 1e-12.

forward(pred, target, weight=None, **kwargs)[source]

Forward Function.

Parameters
  • pred (Tensor) – of shape (N, C, H, W). Predicted tensor.

  • target (Tensor) – of shape (N, C, H, W). Ground truth tensor.

  • weight (Tensor, optional) – of shape (N, C, H, W). Element-wise weights. Default: None.
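
The Charbonnier penalty is sqrt((pred - target)^2 + eps), a smooth, differentiable approximation of the absolute error. A minimal usage sketch:

    import torch
    from mmedit.models.losses import CharbonnierLoss

    loss_fn = CharbonnierLoss(loss_weight=1.0, reduction='mean', eps=1e-12)
    pred = torch.rand(4, 3, 32, 32)
    target = torch.rand(4, 3, 32, 32)
    loss = loss_fn(pred, target)
    # Roughly the same quantity computed by hand:
    manual = torch.sqrt((pred - target) ** 2 + 1e-12).mean()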

class mmedit.models.losses.DiscShiftLoss(loss_weight=0.1)[source]

Disc shift loss.

Parameters

loss_weight (float, optional) – Loss weight. Defaults to 0.1.

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Tensor with shape (n, c, h, w)

Returns

Loss.

Return type

Tensor

class mmedit.models.losses.GANLoss(gan_type, real_label_val=1.0, fake_label_val=0.0, loss_weight=1.0)[source]

Define GAN loss.

Parameters
  • gan_type (str) – Type of GAN loss. Supported types are ‘vanilla’, ‘lsgan’, ‘wgan’ and ‘hinge’.

  • real_label_val (float) – The value for real label. Default: 1.0.

  • fake_label_val (float) – The value for fake label. Default: 0.0.

  • loss_weight (float) – Loss weight. Default: 1.0. Note that loss_weight is only applied to generators; it is always 1.0 for discriminators.

forward(input, target_is_real, is_disc=False)[source]
Parameters
  • input (Tensor) – The input for the loss module, i.e., the network prediction.

  • target_is_real (bool) – Whether the target is real or fake.

  • is_disc (bool) – Whether the loss is computed for discriminators or not. Default: False.

Returns

GAN loss value.

Return type

Tensor

get_target_label(input, target_is_real)[source]

Get target label.

Parameters
  • input (Tensor) – Input tensor.

  • target_is_real (bool) – Whether the target is real or fake.

Returns

Target tensor. Return bool for wgan; otherwise, return Tensor.

Return type

(bool | Tensor)
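
A minimal usage sketch showing the typical generator and discriminator calls:

    import torch
    from mmedit.models.losses import GANLoss

    gan_loss = GANLoss('vanilla', real_label_val=1.0, fake_label_val=0.0,
                       loss_weight=1.0)
    real_pred = torch.randn(4, 1)  # discriminator outputs (logits)
    fake_pred = torch.randn(4, 1)

    # Generator step: fake samples should look real; loss_weight is applied.
    loss_g = gan_loss(fake_pred, target_is_real=True, is_disc=False)

    # Discriminator step: loss_weight is ignored (always 1.0).
    loss_d = (gan_loss(real_pred, target_is_real=True, is_disc=True)
              + gan_loss(fake_pred, target_is_real=False, is_disc=True))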

class mmedit.models.losses.GradientLoss(loss_weight=1.0, reduction='mean')[source]

Gradient loss.

Parameters
  • loss_weight (float) – Loss weight for L1 loss. Default: 1.0.

  • reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.

forward(pred, target, weight=None)[source]
Parameters
  • pred (Tensor) – of shape (N, C, H, W). Predicted tensor.

  • target (Tensor) – of shape (N, C, H, W). Ground truth tensor.

  • weight (Tensor, optional) – of shape (N, C, H, W). Element-wise weights. Default: None.

class mmedit.models.losses.GradientPenaltyLoss(loss_weight=1.0)[source]

Gradient penalty loss for wgan-gp.

Parameters

loss_weight (float) – Loss weight. Default: 1.0.

forward(discriminator, real_data, fake_data, mask=None)[source]

Forward function.

Parameters
  • discriminator (nn.Module) – Network for the discriminator.

  • real_data (Tensor) – Real input data.

  • fake_data (Tensor) – Fake input data.

  • mask (Tensor) – Masks for inpainting. Default: None.

Returns

Loss.

Return type

Tensor
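
A minimal usage sketch; the tiny discriminator below is a hypothetical stand-in used only to make the snippet self-contained:

    import torch
    import torch.nn as nn
    from mmedit.models.losses import GradientPenaltyLoss

    disc = nn.Sequential(
        nn.Conv2d(3, 8, 3, padding=1), nn.AdaptiveAvgPool2d(1),
        nn.Flatten(), nn.Linear(8, 1))
    gp_loss = GradientPenaltyLoss(loss_weight=10.0)
    real = torch.rand(2, 3, 64, 64)
    fake = torch.rand(2, 3, 64, 64)
    loss = gp_loss(disc, real, fake)  # penalty on gradients w.r.t. interpolated samples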

class mmedit.models.losses.L1CompositionLoss(loss_weight=1.0, reduction='mean', sample_wise=False)[source]

L1 composition loss.

Parameters
  • loss_weight (float) – Loss weight for L1 loss. Default: 1.0.

  • reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.

  • sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss will first be reduced per-sample with ‘mean’, and then averaged over all samples. Default: False.

forward(pred_alpha, fg, bg, ori_merged, weight=None, **kwargs)[source]
Parameters
  • pred_alpha (Tensor) – of shape (N, 1, H, W). Predicted alpha matte.

  • fg (Tensor) – of shape (N, 3, H, W). Tensor of foreground object.

  • bg (Tensor) – of shape (N, 3, H, W). Tensor of background object.

  • ori_merged (Tensor) – of shape (N, 3, H, W). Tensor of the original merged image before being normalized by ImageNet mean and std.

  • weight (Tensor, optional) – of shape (N, 1, H, W). It is an indicating matrix: weight[trimap == 128] = 1. Default: None.

class mmedit.models.losses.L1Loss(loss_weight=1.0, reduction='mean', sample_wise=False)[source]

L1 (mean absolute error, MAE) loss.

Parameters
  • loss_weight (float) – Loss weight for L1 loss. Default: 1.0.

  • reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.

  • sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss will first be reduced per-sample with ‘mean’, and then averaged over all samples. Default: False.

forward(pred, target, weight=None, **kwargs)[source]

Forward Function.

Parameters
  • pred (Tensor) – of shape (N, C, H, W). Predicted tensor.

  • target (Tensor) – of shape (N, C, H, W). Ground truth tensor.

  • weight (Tensor, optional) – of shape (N, C, H, W). Element-wise weights. Default: None.
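
A minimal usage sketch, including an element-wise weight that masks out part of the image:

    import torch
    from mmedit.models.losses import L1Loss

    loss_fn = L1Loss(loss_weight=1.0, reduction='mean', sample_wise=False)
    pred = torch.rand(2, 3, 16, 16)
    target = torch.rand(2, 3, 16, 16)
    weight = torch.ones_like(pred)
    weight[:, :, :8, :] = 0  # ignore the top half of every image
    loss = loss_fn(pred, target, weight=weight)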

class mmedit.models.losses.MSECompositionLoss(loss_weight=1.0, reduction='mean', sample_wise=False)[source]

MSE (L2) composition loss.

Parameters
  • loss_weight (float) – Loss weight for MSE loss. Default: 1.0.

  • reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.

  • sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss will first be reduced per-sample with ‘mean’, and then averaged over all samples. Default: False.

forward(pred_alpha, fg, bg, ori_merged, weight=None, **kwargs)[source]
Parameters
  • pred_alpha (Tensor) – of shape (N, 1, H, W). Predicted alpha matte.

  • fg (Tensor) – of shape (N, 3, H, W). Tensor of foreground object.

  • bg (Tensor) – of shape (N, 3, H, W). Tensor of background object.

  • ori_merged (Tensor) – of shape (N, 3, H, W). Tensor of the original merged image before being normalized by ImageNet mean and std.

  • weight (Tensor, optional) – of shape (N, 1, H, W). It is an indicating matrix: weight[trimap == 128] = 1. Default: None.

class mmedit.models.losses.MSELoss(loss_weight=1.0, reduction='mean', sample_wise=False)[source]

MSE (L2) loss.

Parameters
  • loss_weight (float) – Loss weight for MSE loss. Default: 1.0.

  • reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.

  • sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss will first be reduced per-sample with ‘mean’, and then averaged over all samples. Default: False.

forward(pred, target, weight=None, **kwargs)[source]

Forward Function.

Parameters
  • pred (Tensor) – of shape (N, C, H, W). Predicted tensor.

  • target (Tensor) – of shape (N, C, H, W). Ground truth tensor.

  • weight (Tensor, optional) – of shape (N, C, H, W). Element-wise weights. Default: None.

class mmedit.models.losses.MaskedTVLoss(loss_weight=1.0)[source]

Masked TV loss.

Parameters

loss_weight (float, optional) – Loss weight. Defaults to 1.0.

forward(pred, mask=None)[source]

Forward function.

Parameters
  • pred (torch.Tensor) – Tensor with shape of (n, c, h, w).

  • mask (torch.Tensor, optional) – Tensor with shape of (n, 1, h, w). Defaults to None.

Returns

Masked TV loss value.

Return type

torch.Tensor
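
Total-variation loss penalizes differences between neighbouring pixels; the optional mask restricts the penalty to the masked region, e.g. the inpainting hole. A minimal usage sketch:

    import torch
    from mmedit.models.losses import MaskedTVLoss

    tv_loss = MaskedTVLoss(loss_weight=1.0)
    pred = torch.rand(1, 3, 64, 64)
    mask = torch.zeros(1, 1, 64, 64)
    mask[..., 16:48, 16:48] = 1.  # only penalize variation inside the hole
    loss = tv_loss(pred, mask=mask)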

class mmedit.models.losses.PerceptualLoss(layer_weights, vgg_type='vgg19', use_input_norm=True, perceptual_weight=1.0, style_weight=1.0, norm_img=True, pretrained='torchvision://vgg19', criterion='l1')[source]

Perceptual loss with commonly used style loss.

Parameters
  • layer_weights (dict) – The weight for each layer of VGG features. Here is an example: {‘4’: 1., ‘9’: 1., ‘18’: 1.}, which means that the 5th, 10th and 19th feature layers will be extracted with weight 1.0 when calculating losses.

  • vgg_type (str) – The type of vgg network used as feature extractor. Default: ‘vgg19’.

  • use_input_norm (bool) – If True, normalize the input image in vgg. Default: True.

  • perceptual_weight (float) – If perceptual_weight > 0, the perceptual loss will be calculated and multiplied by the weight. Default: 1.0.

  • style_weight (float) – If style_weight > 0, the style loss will be calculated and multiplied by the weight. Default: 1.0.

  • norm_img (bool) – If True, the image will be normalized to [0, 1]. Note that this is different from use_input_norm, which normalizes the input inside the forward function of the VGG network according to the dataset statistics. Importantly, the input image must be in the range [-1, 1]. Default: True.

  • pretrained (str) – Path for pretrained weights. Default: ‘torchvision://vgg19’.

  • criterion (str) – Criterion used to compute the perceptual loss. Default: ‘l1’.

forward(x, gt)[source]

Forward function.

Parameters
  • x (Tensor) – Input tensor with shape (n, c, h, w).

  • gt (Tensor) – Ground-truth tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor
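
A minimal usage sketch (constructing the loss downloads the torchvision VGG19 weights on first use; with norm_img=True the inputs are expected in [-1, 1]):

    import torch
    from mmedit.models.losses import PerceptualLoss

    loss_fn = PerceptualLoss(
        layer_weights={'4': 1., '9': 1., '18': 1.},
        vgg_type='vgg19',
        perceptual_weight=1.0,
        style_weight=0.,  # disable the style term
        norm_img=True)
    x = torch.rand(1, 3, 128, 128) * 2 - 1
    gt = torch.rand(1, 3, 128, 128) * 2 - 1
    out = loss_fn(x, gt)  # loss term(s) computed on the VGG features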

class mmedit.models.losses.PerceptualVGG(layer_name_list, vgg_type='vgg19', use_input_norm=True, pretrained='torchvision://vgg19')[source]

VGG network used in calculating perceptual loss.

In this implementation, we allow users to choose whether to normalize the input feature and which type of VGG network to use. Note that the pretrained path must match the vgg type.

Parameters
  • layer_name_list (list[str]) – According to the names in this list, the forward function will return the corresponding features. This list contains the name of each layer in vgg.feature. An example of this list is [‘4’, ‘10’].

  • vgg_type (str) – Set the type of vgg network. Default: ‘vgg19’.

  • use_input_norm (bool) – If True, normalize the input image. Importantly, the input feature must be in the range [0, 1]. Default: True.

  • pretrained (str) – Path for pretrained weights. Default: ‘torchvision://vgg19’

forward(x)[source]

Forward function.

Parameters

x (Tensor) – Input tensor with shape (n, c, h, w).

Returns

Forward results.

Return type

Tensor

init_weights(model, pretrained)[source]

Init weights.

Parameters
  • model (nn.Module) – Model whose weights will be initialized.

  • pretrained (str) – Path for pretrained weights.

mmedit.models.losses.mask_reduce_loss(loss, weight=None, reduction='mean', sample_wise=False)[source]

Apply element-wise weight and reduce loss.

Parameters
  • loss (Tensor) – Element-wise loss.

  • weight (Tensor) – Element-wise weights. Default: None.

  • reduction (str) – Same as built-in losses of PyTorch. Options are “none”, “mean” and “sum”. Default: ‘mean’.

  • sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight is not None. The loss will first be reduced per-sample with ‘mean’, and then averaged over all samples. Default: False.

Returns

Processed loss values.

Return type

Tensor
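
A minimal sketch of how these helpers are typically applied to an element-wise loss map:

    import torch
    from mmedit.models.losses import mask_reduce_loss, reduce_loss

    elementwise = torch.abs(torch.rand(2, 3, 8, 8) - torch.rand(2, 3, 8, 8))
    mean_loss = reduce_loss(elementwise, 'mean')  # plain scalar mean

    weight = torch.ones(2, 3, 8, 8)
    weight[..., :4, :] = 0  # ignore the top half of every sample
    masked = mask_reduce_loss(
        elementwise, weight=weight, reduction='mean', sample_wise=True)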

mmedit.models.losses.reduce_loss(loss, reduction)[source]

Reduce loss as specified.

Parameters
  • loss (Tensor) – Elementwise loss tensor.

  • reduction (str) – Options are “none”, “mean” and “sum”.

Returns

Reduced loss tensor.

Return type

Tensor

mmedit.utils

mmedit.utils.get_root_logger(log_file=None, log_level=20)[source]

Get the root logger.

The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., “mmedit”.

Parameters
  • log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.

  • log_level (int) – The root logger level. Note that only the process of rank 0 is affected, while other processes will set the level to “ERROR” and thus be silent most of the time. Default: 20 (logging.INFO).

Returns

The root logger.

Return type

logging.Logger
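
A minimal usage sketch:

    import logging
    from mmedit.utils import get_root_logger

    # Logs to the console and, because log_file is given, also to ./train.log.
    logger = get_root_logger(log_file='train.log', log_level=logging.INFO)
    logger.info('Start training')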