API Reference¶
mmedit.core¶
-
class
mmedit.core.
DistEvalIterHook
(dataloader, interval=1, gpu_collect=False, **eval_kwargs)[source]¶ Distributed evaluation hook.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval. Default: 1.
tmpdir (str | None) – Temporary directory to save the results of all processes. Default: None.
gpu_collect (bool) – Whether to use gpu or cpu to collect results. Default: False.
eval_kwargs (dict) – Other eval kwargs. It may contain: save_image (bool): Whether to save the image. save_path (str): The path to save the image.
-
class
mmedit.core.
EvalIterHook
(dataloader, interval=1, **eval_kwargs)[source]¶ Non-Distributed evaluation hook for iteration-based runner.
This hook runs evaluation at a fixed interval when training in a non-distributed environment.
- Parameters
dataloader (DataLoader) – A PyTorch dataloader.
interval (int) – Evaluation interval. Default: 1.
eval_kwargs (dict) – Other eval kwargs. It may contain: save_image (bool): Whether to save the image. save_path (str): The path to save the image.
-
class
mmedit.core.
L1Evaluation
[source]¶ L1 evaluation metric.
- Parameters
data_dict (dict) – Must contain keys of ‘gt_img’ and ‘fake_res’. If ‘mask’ is given, the results will be computed with mask as weight.
-
class
mmedit.core.
LinearLrUpdaterHook
(target_lr=0, start=0, interval=1, **kwargs)[source]¶ Linear learning rate scheduler for image generation.
Initially, the learning rate is ‘base_lr’ as defined in mmcv. Given a target learning rate ‘target_lr’ and a start point ‘start’ (iteration / epoch), the learning rate is fixed at ‘base_lr’ before ‘start’ and updated linearly toward ‘target_lr’ after ‘start’.
- Parameters
target_lr (float) – The target learning rate. Default: 0.
start (int) – The start point (iteration / epoch, specified by args ‘by_epoch’ in its parent class in mmcv) to update learning rate. Default: 0.
interval (int) – The interval to update the learning rate. Default: 1.
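The schedule described above can be sketched in plain Python. This is a minimal illustration, not the hook's actual code: the function name `linear_lr` and the `max_progress` argument (the total number of iterations or epochs, at which the target is assumed to be reached) are assumptions introduced here.

```python
def linear_lr(base_lr, target_lr, start, progress, max_progress, interval=1):
    """Hold base_lr until `start`, then move linearly toward target_lr.

    Assumption: target_lr is reached at `max_progress`.
    """
    if max_progress <= start:
        return base_lr
    # Completed update steps since `start` (0 while progress <= start).
    steps_done = max(0, progress - start) // interval
    # Guard against an interval longer than the remaining schedule.
    total_steps = max(1, (max_progress - start) // interval)
    factor = min(1.0, steps_done / total_steps)
    return base_lr + (target_lr - base_lr) * factor


lr_before = linear_lr(0.1, 0.0, start=10, progress=5, max_progress=20)
lr_middle = linear_lr(0.1, 0.0, start=10, progress=15, max_progress=20)
lr_final = linear_lr(0.1, 0.0, start=10, progress=20, max_progress=20)
```

With these inputs the rate stays at `base_lr` before the start point, is halfway to the target at progress 15, and equals `target_lr` at the end.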
-
class
mmedit.core.
VisualizationHook
(output_dir, res_name_list, interval=-1, filename_tmpl='iter_{}.png', rerange=True, bgr2rgb=True, nrow=1, padding=4)[source]¶ Visualization hook.
In this hook, we use the official api save_image in torchvision to save the visualization results.
- Parameters
output_dir (str) – The file path to store visualizations.
res_name_list (list[str]) – The names of the results in the outputs dict to visualize. Each of these results must be a torch.Tensor with shape (n, c, h, w).
interval (int) – The interval of calling this hook. If set to -1, the visualization hook will not be called. Default: -1.
filename_tmpl (str) – Format string used to save images. The output file name is produced by formatting this template. Default: ‘iter_{}.png’.
rerange (bool) – Whether to rerange the output value from [-1, 1] to [0, 1]. We highly recommend that users preprocess the visualization results themselves; this option only provides a simple interface. Default: True.
bgr2rgb (bool) – Whether to reformat the channel dimension from BGR to RGB. The saved image follows RGB order. Default: True.
nrow (int) – The number of samples in a row. Default: 1.
padding (int) – The number of padding pixels between samples. Default: 4.
-
mmedit.core.
build_optimizers
(model, cfgs)[source]¶ Build multiple optimizers from configs.
If cfgs contains several dicts for optimizers, then a dict of the constructed optimizers will be returned. If cfgs only contains one optimizer config, the constructed optimizer itself will be returned.
For example,
Multiple optimizer configs:
optimizer_cfg = dict( model1=dict(type='SGD', lr=lr), model2=dict(type='SGD', lr=lr))
The return dict is
dict('model1': torch.optim.Optimizer, 'model2': torch.optim.Optimizer)
Single optimizer config:
optimizer_cfg = dict(type='SGD', lr=lr)
The return is
torch.optim.Optimizer.
- Parameters
model (nn.Module) – The model with parameters to be optimized.
cfgs (dict) – The config dict of the optimizer.
- Returns
The initialized optimizers.
- Return type
dict[torch.optim.Optimizer] | torch.optim.Optimizer
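The single-versus-multiple rule described above can be sketched without PyTorch. This is only an illustration of the dispatch: `build_one_or_many` and the stand-in builder `describe` are hypothetical names, and the check "every value is a dict" is an assumption about how the two cases are told apart.

```python
def build_one_or_many(cfgs, build_one):
    """Return a dict of built objects for a dict of configs, else a single one.

    `build_one` is a hypothetical callable that turns one config dict into
    an optimizer; it stands in for the real construction step.
    """
    # Treat cfgs as "multiple configs" only when every value is itself a dict.
    if cfgs and all(isinstance(v, dict) for v in cfgs.values()):
        return {name: build_one(cfg) for name, cfg in cfgs.items()}
    return build_one(cfgs)


def describe(cfg):
    """Stand-in builder that just records the optimizer type and learning rate."""
    return (cfg['type'], cfg['lr'])


multi = build_one_or_many(
    dict(model1=dict(type='SGD', lr=0.01), model2=dict(type='SGD', lr=0.02)),
    describe)
single = build_one_or_many(dict(type='SGD', lr=0.01), describe)
```

Here `multi` is keyed by `model1` and `model2`, while `single` is the bare result of the builder.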
-
mmedit.core.
psnr
(img1, img2, crop_border=0, input_order='HWC')[source]¶ Calculate PSNR (Peak Signal-to-Noise Ratio).
Ref: https://en.wikipedia.org/wiki/Peak_signal-to-noise_ratio
- Parameters
img1 (ndarray) – Images with range [0, 255].
img2 (ndarray) – Images with range [0, 255].
crop_border (int) – Number of pixels cropped from each edge of an image. These pixels are not involved in the PSNR calculation. Default: 0.
input_order (str) – Whether the input order is ‘HWC’ or ‘CHW’. Default: ‘HWC’.
- Returns
psnr result.
- Return type
float
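For reference, the standard PSNR formula for 8-bit images is 20 * log10(255 / sqrt(MSE)). The sketch below applies it to flat lists of pixel values so that it stays dependency-free; `psnr_flat` is a hypothetical helper, it ignores crop_border and input_order, and returning infinity for identical inputs is an assumption rather than documented behavior.

```python
import math


def psnr_flat(pixels1, pixels2, max_value=255.0):
    """PSNR in dB between two equally sized flat sequences of pixel values."""
    if len(pixels1) != len(pixels2) or not pixels1:
        raise ValueError('inputs must be non-empty and of the same length')
    # Mean squared error over all pixel positions.
    mse = sum((a - b) ** 2 for a, b in zip(pixels1, pixels2)) / len(pixels1)
    if mse == 0:
        return float('inf')  # identical inputs (assumed convention)
    return 20.0 * math.log10(max_value / math.sqrt(mse))


same = psnr_flat([0, 128, 255], [0, 128, 255])
off_by_one = psnr_flat([10, 20], [11, 21])
worst = psnr_flat([0, 0], [255, 255])
```

An MSE of 1 gives 20 * log10(255), roughly 48.13 dB, and the maximum possible error gives 0 dB.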
-
mmedit.core.
reorder_image
(img, input_order='HWC')[source]¶ Reorder images to ‘HWC’ order.
If the input shape is (h, w), return (h, w, 1); if it is (c, h, w), return (h, w, c); if it is (h, w, c), return it unchanged.
- Parameters
img (ndarray) – Input image.
input_order (str) – Whether the input order is ‘HWC’ or ‘CHW’. If the input image shape is (h, w), input_order has no effect. Default: ‘HWC’.
- Returns
reordered image.
- Return type
ndarray
-
mmedit.core.
ssim
(img1, img2, crop_border=0, input_order='HWC')[source]¶ Calculate SSIM (structural similarity).
Ref: Image quality assessment: From error visibility to structural similarity
The results match those of the officially released MATLAB code at https://ece.uwaterloo.ca/~z70wang/research/ssim/.
For three-channel images, SSIM is calculated for each channel and then averaged.
- Parameters
img1 (ndarray) – Images with range [0, 255].
img2 (ndarray) – Images with range [0, 255].
crop_border (int) – Number of pixels cropped from each edge of an image. These pixels are not involved in the SSIM calculation. Default: 0.
input_order (str) – Whether the input order is ‘HWC’ or ‘CHW’. Default: ‘HWC’.
- Returns
ssim result.
- Return type
float
-
mmedit.core.
tensor2img
(tensor, out_type=<class 'numpy.uint8'>, min_max=(0, 1))[source]¶ Convert torch Tensors into image numpy arrays.
After clamping to (min, max), image values will be normalized to [0, 1].
For different tensor shapes, this function behaves differently:
- 4D mini-batch Tensor of shape (N x 3/1 x H x W):
Use make_grid to stitch images in the batch dimension, and then convert it to numpy array.
- 3D Tensor of shape (3/1 x H x W) and 2D Tensor of shape (H x W):
Directly change to numpy array.
Note that the image channel in input tensors should be RGB order. This function will convert it to cv2 convention, i.e., (H x W x C) with BGR order.
- Parameters
tensor (Tensor | list[Tensor]) – Input tensors.
out_type (numpy type) – Output types. If np.uint8, transform outputs to uint8 type with range [0, 255]; otherwise, float type with range [0, 1]. Default: np.uint8.
min_max (tuple) – min and max values for clamp.
- Returns
3D ndarray of shape (H x W x C) or 2D ndarray of shape (H x W).
- Return type
ndarray | list[ndarray]
mmedit.datasets¶
datasets¶
-
class
mmedit.datasets.
AdobeComp1kDataset
(ann_file, pipeline, data_prefix=None, test_mode=False)[source]¶ Adobe composition-1k dataset.
The dataset loads (alpha, fg, bg) data and applies the specified transforms to the data. You can either composite the merged image online or load a pre-composited merged image in the pipeline.
Example for online comp-1k dataset:
[
    {
        "alpha": 'alpha/000.png',
        "fg": 'fg/000.png',
        "bg": 'bg/000.png'
    },
    {
        "alpha": 'alpha/001.png',
        "fg": 'fg/001.png',
        "bg": 'bg/001.png'
    },
]
Example for offline comp-1k dataset:
[
    {
        "alpha": 'alpha/000.png',
        "merged": 'merged/000.png',
        "fg": 'fg/000.png',
        "bg": 'bg/000.png'
    },
    {
        "alpha": 'alpha/001.png',
        "merged": 'merged/001.png',
        "fg": 'fg/001.png',
        "bg": 'bg/001.png'
    },
]
-
class
mmedit.datasets.
BaseDataset
(pipeline, test_mode=False)[source]¶ Base class for datasets.
All datasets should subclass it. All subclasses should override load_annotations, which loads annotation information and generates image lists.
- Parameters
pipeline (list[dict | callable]) – A sequence of data transforms.
test_mode (bool) – If True, the dataset will work in test mode. Otherwise, in train mode.
-
abstract
load_annotations
()[source]¶ Abstract function for loading annotation.
All subclasses should override this function.
-
class
mmedit.datasets.
BaseGenerationDataset
(pipeline, test_mode=False)[source]¶ Base class for generation datasets.
-
class
mmedit.datasets.
BaseMattingDataset
(ann_file, pipeline, data_prefix=None, test_mode=False)[source]¶ Base image matting dataset.
-
class
mmedit.datasets.
BaseSRDataset
(pipeline, scale, test_mode=False)[source]¶ Base class for super resolution datasets.
-
class
mmedit.datasets.
GenerationPairedDataset
(dataroot, pipeline, test_mode=False)[source]¶ General paired image folder dataset for image generation.
It assumes that the training directory is ‘/path/to/data/train’. During test time, the directory is ‘/path/to/data/test’. ‘/path/to/data’ can be initialized by args ‘dataroot’. Each sample contains a pair of images concatenated in the w dimension (A|B).
- Parameters
dataroot (str | Path) – Path to the folder root of paired images.
pipeline (List[dict | callable]) – A sequence of data transformations.
test_mode (bool) – Store True when building test dataset. Default: False.
-
class
mmedit.datasets.
GenerationUnpairedDataset
(dataroot, pipeline, test_mode=False)[source]¶ General unpaired image folder dataset for image generation.
It assumes that the training directory of images from domain A is ‘/path/to/data/trainA’, and that from domain B is ‘/path/to/data/trainB’, respectively. ‘/path/to/data’ can be initialized by args ‘dataroot’. During test time, the directory is ‘/path/to/data/testA’ and ‘/path/to/data/testB’, respectively.
- Parameters
dataroot (str | Path) – Path to the folder root of unpaired images.
pipeline (List[dict | callable]) – A sequence of data transformations.
test_mode (bool) – Store True when building test dataset. Default: False.
-
load_annotations
(dataroot)[source]¶ Load unpaired image paths of one domain.
- Parameters
dataroot (str) – Path to the folder root for unpaired images of one domain.
- Returns
List that contains unpaired image paths of one domain.
- Return type
list[dict]
-
class
mmedit.datasets.
ImgInpaintingDataset
(ann_file, pipeline, data_prefix=None, test_mode=False)[source]¶ Image dataset for inpainting.
-
class
mmedit.datasets.
RepeatDataset
(dataset, times)[source]¶ A wrapper of repeated dataset.
The length of the repeated dataset is ‘times’ times the length of the original dataset. This is useful when the data loading time is long but the dataset is small. Using RepeatDataset can reduce the data loading time between epochs.
- Parameters
dataset (Dataset) – The dataset to be repeated.
times (int) – Repeat times.
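One common way to build such a wrapper is to multiply the reported length and wrap indices with a modulo. The class below is a hypothetical stand-in (`RepeatedView` is not an mmedit name), shown only to illustrate that idea on any object supporting `len()` and indexing.

```python
class RepeatedView:
    """Hypothetical stand-in that presents a dataset as `times` copies of itself."""

    def __init__(self, dataset, times):
        self.dataset = dataset
        self.times = times
        self._ori_len = len(dataset)

    def __len__(self):
        return self.times * self._ori_len

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        # Wrap the index back into the original dataset.
        return self.dataset[idx % self._ori_len]


repeated = RepeatedView(['a', 'b', 'c'], times=2)
items = [repeated[i] for i in range(len(repeated))]
```

Because one pass over the wrapper covers the underlying data `times` times, an epoch boundary (and whatever per-epoch setup the loader performs) occurs less often.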
-
class
mmedit.datasets.
SRAnnotationDataset
(lq_folder, gt_folder, ann_file, pipeline, scale, data_prefix=None, test_mode=False, filename_tmpl='{}')[source]¶ General paired image dataset with an annotation file for image restoration.
The dataset loads lq (Low Quality) and gt (Ground-Truth) image pairs, applies specified transforms and finally returns a dict containing paired data and other information.
This is the “annotation file mode”: Each line in the annotation file contains the image names and image shape (usually for gt), separated by a white space.
Example of an annotation file:
0001_s001.png (480,480,3)
0001_s002.png (480,480,3)
- Parameters
lq_folder (str | Path) – Path to a lq folder.
gt_folder (str | Path) – Path to a gt folder.
ann_file (str | Path) – Path to the annotation file.
pipeline (list[dict | callable]) – A sequence of data transformations.
scale (int) – Upsampling scale ratio.
test_mode (bool) – Store True when building test dataset. Default: False.
filename_tmpl (str) – Template for each filename. Note that the template excludes the file extension. Default: ‘{}’.
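A line in the annotation format described above can be parsed as follows. This is a minimal sketch: `parse_annotation_line` is a hypothetical helper, and it assumes the image name itself contains no spaces.

```python
def parse_annotation_line(line):
    """Split 'name (h,w,c)' into the file name and an integer shape tuple."""
    # Split only on the first space so shapes written as '(720, 1280, 3)' survive.
    name, shape_text = line.strip().split(' ', 1)
    shape = tuple(int(v) for v in shape_text.strip('() ').split(','))
    return name, shape


compact = parse_annotation_line('0001_s001.png (480,480,3)')
spaced = parse_annotation_line('000/00000000.png (720, 1280, 3)')
```

The same helper handles both the compact shape notation and the spaced notation used in the video dataset annotation files.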
-
class
mmedit.datasets.
SRFolderDataset
(lq_folder, gt_folder, pipeline, scale, test_mode=False, filename_tmpl='{}')[source]¶ General paired image folder dataset for image restoration.
The dataset loads lq (Low Quality) and gt (Ground-Truth) image pairs, applies specified transforms and finally returns a dict containing paired data and other information.
This is the “folder mode”, which needs to specify the lq folder path and gt folder path, each folder containing the corresponding images. Image lists will be generated automatically. You can also specify the filename template to match the lq and gt pairs.
For example, we have two folders with the following structures:
data_root
├── lq
│   ├── 0001_x4.png
│   ├── 0002_x4.png
├── gt
│   ├── 0001.png
│   ├── 0002.png
Then you need to set:
lq_folder = data_root/lq
gt_folder = data_root/gt
filename_tmpl = '{}_x4'
- Parameters
lq_folder (str | Path) – Path to a lq folder.
gt_folder (str | Path) – Path to a gt folder.
pipeline (List[dict | callable]) – A sequence of data transformations.
scale (int) – Upsampling scale ratio.
test_mode (bool) – Store True when building test dataset. Default: False.
filename_tmpl (str) – Template for each filename. Note that the template excludes the file extension. Default: ‘{}’.
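The role of filename_tmpl in pairing files can be illustrated with a small helper. This is a sketch under assumptions: `lq_name_for` is a hypothetical function, and it assumes the lq file shares the gt file's extension, with the template applied to the name without its extension.

```python
import os.path as osp


def lq_name_for(gt_name, filename_tmpl='{}'):
    """Derive the lq file name from a gt file name using the template."""
    basename, ext = osp.splitext(osp.basename(gt_name))
    # The template is applied to the name without its extension.
    return filename_tmpl.format(basename) + ext


paired = lq_name_for('0001.png', filename_tmpl='{}_x4')
unchanged = lq_name_for('0002.png')
```

With the folder layout from the example, `'{}_x4'` maps gt/0001.png to lq/0001_x4.png, while the default template leaves the name unchanged.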
-
class
mmedit.datasets.
SRLmdbDataset
(lq_folder, gt_folder, pipeline, scale, test_mode=False)[source]¶ General paired image lmdb dataset for image restoration.
The dataset loads lq (Low Quality) and gt (Ground-Truth) image pairs, applies specified transforms and finally returns a dict containing paired data and other information.
This is the “lmdb mode”. To speed up IO, we recommend using lmdb. First, you need to make the lmdb files. Suppose the lmdb files are path_to_lq/lq.lmdb and path_to_gt/gt.lmdb; then you can just set:
lq_folder = path_to_lq/lq.lmdb
gt_folder = path_to_gt/gt.lmdb
Contents of lmdb. Taking lq.lmdb as an example, the file structure is:
lq.lmdb
├── data.mdb
├── lock.mdb
├── meta_info.txt
The data.mdb and lock.mdb are standard lmdb files and you can refer to https://lmdb.readthedocs.io/en/release/ for more details.
The meta_info.txt is a text file that records the meta information of our datasets. It is created automatically when preparing datasets with our provided dataset tools. Each line in the txt file records the following fields, separated by a white space:
image name (with extension);
image shape;
compression level.
For example, the meta information of the lq.lmdb is: baboon.png (120,125,3) 1, which means: 1) image name (with extension): baboon.png; 2) image shape: (120,125,3); and 3) compression level: 1.
We use the image name without extension as the lmdb key. Note that we use the same key for the corresponding lq and gt images.
- Parameters
lq_folder (str | Path) – Path to a lq lmdb file.
gt_folder (str | Path) – Path to a gt lmdb file.
pipeline (list[dict | callable]) – A sequence of data transformations.
scale (int) – Upsampling scale ratio.
test_mode (bool) – Store True when building test dataset. Default: False.
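A meta_info.txt line in the format described above could be read as shown here. This is a sketch only: `parse_meta_info_line` is a hypothetical helper, and it assumes the image name contains no spaces. The lmdb key is the image name with its extension removed, as stated above.

```python
import os.path as osp


def parse_meta_info_line(line):
    """Parse 'name.ext (h,w,c) level' into (lmdb_key, shape, compression_level)."""
    name, rest = line.strip().split(' ', 1)
    # The compression level is the last field; the shape is whatever lies between.
    shape_text, level = rest.rsplit(' ', 1)
    shape = tuple(int(v) for v in shape_text.strip('() ').split(','))
    key = osp.splitext(name)[0]  # image name without extension
    return key, shape, int(level)


entry = parse_meta_info_line('baboon.png (120,125,3) 1')
```

For the example line, the key is `baboon`, the shape is (120, 125, 3), and the compression level is 1.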
-
class
mmedit.datasets.
SRREDSDataset
(lq_folder, gt_folder, ann_file, num_input_frames, pipeline, scale, val_partition='official', test_mode=False)[source]¶ REDS dataset for video super resolution.
The dataset loads several LQ (Low-Quality) frames and a center GT (Ground-Truth) frame. Then it applies specified transforms and finally returns a dict containing paired data and other information.
It reads REDS keys from the txt file. Each line contains: 1. image name; 2. image shape, separated by a white space. Examples:
000/00000000.png (720, 1280, 3)
000/00000001.png (720, 1280, 3)
- Parameters
lq_folder (str | Path) – Path to a lq folder.
gt_folder (str | Path) – Path to a gt folder.
ann_file (str | Path) – Path to the annotation file.
num_input_frames (int) – Window size for input frames.
pipeline (list[dict | callable]) – A sequence of data transformations.
scale (int) – Upsampling scale ratio.
val_partition (str) – Validation partition mode. Choices: ‘official’ or ‘REDS4’. Default: ‘official’.
test_mode (bool) – Store True when building test dataset. Default: False.
-
class
mmedit.datasets.
SRVid4Dataset
(lq_folder, gt_folder, ann_file, num_input_frames, pipeline, scale, filename_tmpl='{:08d}', test_mode=False)[source]¶ Vid4 dataset for video super resolution.
The dataset loads several LQ (Low-Quality) frames and a center GT (Ground-Truth) frame. Then it applies specified transforms and finally returns a dict containing paired data and other information.
It reads Vid4 keys from the txt file. Each line contains the following fields, separated by a white space:
folder name;
number of frames in this clip (in the same folder);
image shape.
Examples:
calendar 40 (320,480,3)
city 34 (320,480,3)
- Parameters
lq_folder (str | Path) – Path to a lq folder.
gt_folder (str | Path) – Path to a gt folder.
ann_file (str | Path) – Path to the annotation file.
num_input_frames (int) – Window size for input frames.
pipeline (list[dict | callable]) – A sequence of data transformations.
scale (int) – Upsampling scale ratio.
filename_tmpl (str) – Template for each filename. Note that the template excludes the file extension. Default: ‘{:08d}’.
test_mode (bool) – Store True when building test dataset. Default: False.
-
class
mmedit.datasets.
SRVimeo90KDataset
(lq_folder, gt_folder, ann_file, num_input_frames, pipeline, scale, test_mode=False)[source]¶ Vimeo90K dataset for video super resolution.
The dataset loads several LQ (Low-Quality) frames and a center GT (Ground-Truth) frame. Then it applies specified transforms and finally returns a dict containing paired data and other information.
It reads Vimeo90K keys from the txt file. Each line contains: 1. image name; 2. image shape, separated by a white space. Examples:
00001/0266 (256, 448, 3)
00001/0268 (256, 448, 3)
- Parameters
lq_folder (str | Path) – Path to a lq folder.
gt_folder (str | Path) – Path to a gt folder.
ann_file (str | Path) – Path to the annotation file.
num_input_frames (int) – Window size for input frames.
pipeline (list[dict | callable]) – A sequence of data transformations.
scale (int) – Upsampling scale ratio.
test_mode (bool) – Store True when building test dataset. Default: False.
-
mmedit.datasets.
build_dataloader
(dataset, samples_per_gpu, workers_per_gpu, num_gpus=1, dist=True, shuffle=True, seed=None, drop_last=False, pin_memory=True, **kwargs)[source]¶ Build PyTorch DataLoader.
In distributed training, each GPU/process has a dataloader. In non-distributed training, there is only one dataloader for all GPUs.
- Parameters
dataset (Dataset) – A PyTorch dataset.
samples_per_gpu (int) – Number of samples on each GPU, i.e., batch size of each GPU.
workers_per_gpu (int) – How many subprocesses to use for data loading for each GPU.
num_gpus (int) – Number of GPUs. Only used in non-distributed training. Default: 1.
dist (bool) – Distributed training/test or not. Default: True.
shuffle (bool) – Whether to shuffle the data at every epoch. Default: True.
seed (int | None) – Seed to be used. Default: None.
drop_last (bool) – Whether to drop the last incomplete batch in epoch. Default: False
pin_memory (bool) – Whether to use pin_memory in DataLoader. Default: True
kwargs (dict, optional) – Any keyword argument to be used to initialize DataLoader.
- Returns
A PyTorch dataloader.
- Return type
DataLoader
-
mmedit.datasets.
build_dataset
(cfg, default_args=None)[source]¶ Build a dataset from config dict.
It supports a variety of dataset configs. If cfg is a Sequential (list or dict), it will be a concatenated dataset of the datasets specified by the Sequential. If it is a RepeatDataset, then it will repeat the dataset cfg['dataset'] for cfg['times'] times. If the ann_file of the dataset is a Sequential, then it will build a concatenated dataset with the same dataset type but different ann_file.
- Parameters
cfg (dict) – Config dict. It should at least contain the key “type”.
default_args (dict, optional) – Default initialization arguments. Default: None.
- Returns
The constructed dataset.
- Return type
Dataset
pipelines¶
-
class
mmedit.datasets.pipelines.
BinarizeImage
(keys, binary_thr, to_int=False)[source]¶ Binarize image.
- Parameters
keys (Sequence[str]) – The images to be binarized.
binary_thr (float) – Threshold for binarization.
to_int (bool) – If True, return image as int32, otherwise return image as float32.
-
class
mmedit.datasets.pipelines.
Collect
(keys, meta_keys=None)[source]¶ Collect data from the loader relevant to the specific task.
This is usually the last stage of the data loader pipeline. Typically keys is set to some subset of “img”, “gt_labels”.
The “img_meta” item is always populated. The contents of the “meta” dictionary depend on “meta_keys”.
- Parameters
keys (Sequence[str]) – Required keys to be collected.
meta_keys (Sequence[str]) – Required keys to be collected to “meta”. Default: None.
-
class
mmedit.datasets.pipelines.
Compose
(transforms)[source]¶ Compose a data pipeline with a sequence of transforms.
- Parameters
transforms (list[dict | callable]) – Either config dicts of transforms or transform objects.
-
class
mmedit.datasets.pipelines.
CompositeFg
(fg_dirs, alpha_dirs, interpolation='nearest')[source]¶ Composite foreground with a random foreground.
This class composites the current training sample with additional data randomly (could be from the same dataset). With probability 0.5, the sample will be composited with a random sample from the specified directory. The composition is performed as:
\[
\begin{aligned}
fg_{new} &= \alpha_1 * fg_1 + (1 - \alpha_1) * fg_2 \\
\alpha_{new} &= 1 - (1 - \alpha_1) * (1 - \alpha_2)
\end{aligned}
\]
where \((fg_1, \alpha_1)\) is from the current sample and \((fg_2, \alpha_2)\) is the randomly loaded sample. With the above composition, \(\alpha_{new}\) is still in [0, 1].
Required keys are “alpha” and “fg”. Modified keys are “alpha” and “fg”.
- Parameters
fg_dirs (str | list[str]) – Path of directories to load foreground images from.
alpha_dirs (str | list[str]) – Path of directories to load alpha mattes from.
interpolation (str) – Interpolation method of mmcv.imresize to resize the randomly loaded images.
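The composition formulas can be checked on single pixel values. The sketch below assumes alpha and foreground values are normalized to [0, 1]; `composite_pixel` is a hypothetical helper and operates on scalars rather than images.

```python
def composite_pixel(fg1, alpha1, fg2, alpha2):
    """Composite one pixel of the current sample (1) over a random sample (2)."""
    fg_new = alpha1 * fg1 + (1 - alpha1) * fg2
    alpha_new = 1 - (1 - alpha1) * (1 - alpha2)
    return fg_new, alpha_new


opaque = composite_pixel(1.0, 1.0, 0.0, 0.0)
transparent = composite_pixel(0.2, 0.0, 0.8, 0.5)

# alpha_new stays within [0, 1] for any pair of alphas in [0, 1].
grid = [i / 10 for i in range(11)]
alphas_in_range = all(
    0.0 <= composite_pixel(0.0, a1, 0.0, a2)[1] <= 1.0
    for a1 in grid for a2 in grid)
```

A fully opaque current pixel keeps its own foreground, while a fully transparent one takes the foreground and alpha of the randomly loaded sample.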
-
class
mmedit.datasets.pipelines.
Crop
(keys, crop_size, random_crop=True)[source]¶ Crop data to specific size for training.
- Parameters
keys (Sequence[str]) – The images to be cropped.
crop_size (Tuple[int]) – Target spatial size (h, w).
random_crop (bool) – If set to True, the image is cropped at a random position. Otherwise, a center crop is applied.
-
class
mmedit.datasets.pipelines.
CropAroundCenter
(crop_size)[source]¶ Randomly crop the images around unknown area in the center 1/4 images.
This cropping strategy is adopted in GCA matting. The unknown area is the same as semi-transparent area. https://arxiv.org/pdf/2001.04069.pdf
It retains the center 1/4 images and resizes the images to ‘crop_size’. Required keys are “fg”, “bg”, “trimap” and “alpha”, added or modified keys are “crop_bbox”, “fg”, “bg”, “trimap” and “alpha”.
- Parameters
crop_size (int | tuple) – Desired output size. If int, square crop is applied.
-
class
mmedit.datasets.pipelines.
CropAroundFg
(keys, bd_ratio_range=(0.1, 0.4), test_mode=False)[source]¶ Crop around the whole foreground in the segmentation mask.
Required keys are “seg” and the keys in argument keys. Meanwhile, “seg” must be in argument keys. Added or modified keys are “crop_bbox” and the keys in argument keys.
- Parameters
keys (Sequence[str]) – The images to be cropped. It must contain ‘seg’.
bd_ratio_range (tuple, optional) – The range of the boundary (bd) ratio to select from. The boundary ratio is the ratio of the boundary to the minimal bbox that contains the whole foreground given by segmentation. Default to (0.1, 0.4).
test_mode (bool) – Whether to use test mode. In test mode, the tight crop area of the foreground is extended to a square. Default to False.
-
class
mmedit.datasets.pipelines.
CropAroundUnknown
(keys, crop_sizes, unknown_source='alpha', interpolations='bilinear')[source]¶ Crop around unknown area with a randomly selected scale.
Randomly select the w and h from a list of (w, h). Required keys are the keys in argument keys, added or modified keys are “crop_bbox” and the keys in argument keys. This class assumes value of “alpha” ranges from 0 to 255.
- Parameters
keys (Sequence[str]) – The images to be cropped. It must contain ‘alpha’. If unknown_source is set to ‘trimap’, then it must also contain ‘trimap’.
crop_sizes (list[int | tuple[int]]) – List of (w, h) to be selected.
unknown_source (str, optional) – Unknown area to select from. It must be ‘alpha’ or ‘trimap’. Default to ‘alpha’.
interpolations (str | list[str], optional) – Interpolation method of mmcv.imresize. The interpolation operation will be applied when image size is smaller than the crop_size. If given as a list of str, it should have the same length as keys. Or if given as a str all the keys will be resized with the same method. Default to ‘bilinear’.
-
class
mmedit.datasets.pipelines.
FixedCrop
(keys, crop_size, crop_pos=None)[source]¶ Crop paired data (at a specific position) to specific size for training.
- Parameters
keys (Sequence[str]) – The images to be cropped.
crop_size (Tuple[int]) – Target spatial size (h, w).
crop_pos (Tuple[int]) – Specific position (x, y). If set to None, the position used to crop the paired data batch is initialized randomly.
-
class
mmedit.datasets.pipelines.
Flip
(keys, flip_ratio=0.5, direction='horizontal')[source]¶ Flip the input data with a probability.
Reverse the order of elements in the given data with a specific direction. The shape of the data is preserved, but the elements are reordered. Required keys are the keys in attributes “keys”, added or modified keys are “flip”, “flip_direction” and the keys in attributes “keys”. It also supports flipping a list of images with the same flip.
- Parameters
keys (list[str]) – The images to be flipped.
flip_ratio (float) – The probability of flipping the images.
direction (str) – Flip images horizontally or vertically. Options are “horizontal” | “vertical”. Default: “horizontal”.
-
class
mmedit.datasets.pipelines.
FormatTrimap
(to_onehot=False)[source]¶ Convert trimap (tensor) to one-hot representation.
It transforms the trimap label from (0, 128, 255) to (0, 1, 2). If to_onehot is set to True, the trimap will be converted to a one-hot tensor of shape (3, H, W). Required key is “trimap”, added or modified keys are “trimap” and “to_onehot”.
- Parameters
to_onehot (bool) – Whether to convert the trimap to a one-hot tensor. Default: False.
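The label mapping and the one-hot layout can be illustrated on nested lists instead of tensors. This is a sketch of the transformation only: `format_trimap` here is a hypothetical pure-Python helper, not the pipeline class.

```python
def format_trimap(trimap, to_onehot=False):
    """Map trimap values (0, 128, 255) to labels (0, 1, 2) on a nested H x W list.

    With to_onehot=True, return three binary H x W planes, one per label.
    """
    mapping = {0: 0, 128: 1, 255: 2}
    labels = [[mapping[v] for v in row] for row in trimap]
    if not to_onehot:
        return labels
    # Plane k holds 1 where the label equals k, giving a (3, H, W) layout.
    return [[[int(v == k) for v in row] for row in labels] for k in range(3)]


labels = format_trimap([[0, 128], [255, 0]])
onehot = format_trimap([[0, 128], [255, 0]], to_onehot=True)
```

In the one-hot form, exactly one of the three planes is 1 at each pixel position.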
-
class
mmedit.datasets.pipelines.
GenerateFrameIndices
(interval_list, frames_per_clip=99)[source]¶ Generate frame index for REDS datasets. It also performs temporal augmentation with random interval.
Required keys: lq_path, gt_path, key, num_input_frames Added or modified keys: lq_path, gt_path, interval, reverse
- Parameters
interval_list (list[int]) – Interval list for temporal augmentation. It will randomly pick an interval from interval_list and sample frame index with the interval.
frames_per_clip (int) – Number of frames per clip. Default: 99 for REDS dataset.
-
class
mmedit.datasets.pipelines.
GenerateFrameIndiceswithPadding
(padding, filename_tmpl='{:08d}')[source]¶ Generate frame index with padding for REDS dataset and Vid4 dataset during testing.
Required keys: lq_path, gt_path, key, num_input_frames, max_frame_num Added or modified keys: lq_path, gt_path
- Parameters
padding – Padding mode, one of ‘replicate’ | ‘reflection’ | ‘reflection_circle’ | ‘circle’.
Examples: current_idx = 0, num_input_frames = 5. The generated frame indices under different padding modes:
replicate: [0, 0, 0, 1, 2]
reflection: [2, 1, 0, 1, 2]
reflection_circle: [4, 3, 0, 1, 2]
circle: [3, 4, 0, 1, 2]
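The four padding modes can be reproduced with a short function. This is a sketch, not the class's actual code: `padded_frame_indices` and its `num_frames_total` argument are hypothetical names. The left-edge rules are chosen to match the documented examples, and the right-edge rules are an assumption that mirrors them.

```python
def padded_frame_indices(current_idx, num_frames_total, num_input_frames, padding):
    """Return indices of a window centered on current_idx, padded at clip edges."""
    modes = ('replicate', 'reflection', 'reflection_circle', 'circle')
    if padding not in modes:
        raise ValueError(f'padding must be one of {modes}, got {padding!r}')
    last = num_frames_total - 1      # largest valid frame index
    half = num_input_frames // 2     # frames on each side of the center
    indices = []
    for i in range(current_idx - half, current_idx + half + 1):
        if i < 0:                    # window runs past the first frame
            if padding == 'replicate':
                idx = 0
            elif padding == 'reflection':
                idx = -i
            elif padding == 'reflection_circle':
                idx = current_idx + half - i
            else:                    # 'circle'
                idx = num_input_frames + i
        elif i > last:               # window runs past the last frame (assumed rules)
            if padding == 'replicate':
                idx = last
            elif padding == 'reflection':
                idx = 2 * last - i
            elif padding == 'reflection_circle':
                idx = (current_idx - half) - (i - last)
            else:                    # 'circle'
                idx = i - num_input_frames
        else:
            idx = i
        indices.append(idx)
    return indices


results = {
    mode: padded_frame_indices(0, 100, 5, mode)
    for mode in ('replicate', 'reflection', 'reflection_circle', 'circle')
}
interior = padded_frame_indices(50, 100, 5, 'replicate')
```

Away from the clip boundaries every mode yields the plain window, for example frames 48 to 52 around frame 50.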
-
class
mmedit.datasets.pipelines.
GenerateSeg
(kernel_size=5, erode_iter_range=(10, 20), dilate_iter_range=(15, 30), num_holes_range=(0, 3), hole_sizes=[(15, 15), (25, 25), (35, 35), (45, 45)], blur_ksizes=[(21, 21), (31, 31), (41, 41)])[source]¶ Generate segmentation mask from alpha matte.
- Parameters
kernel_size (int, optional) – Kernel size for both erosion and dilation. The kernel will have the same height and width. Defaults to 5.
erode_iter_range (tuple, optional) – Iteration of erosion. Defaults to (10, 20).
dilate_iter_range (tuple, optional) – Iteration of dilation. Defaults to (15, 30).
num_holes_range (tuple, optional) – Range of number of holes to randomly select from. Defaults to (0, 3).
hole_sizes (list, optional) – List of (h, w) to be selected as the size of the rectangle hole. Defaults to [(15, 15), (25, 25), (35, 35), (45, 45)].
blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].
-
class
mmedit.datasets.pipelines.
GenerateSoftSeg
(fg_thr=0.2, border_width=25, erode_ksize=3, dilate_ksize=5, erode_iter_range=(10, 20), dilate_iter_range=(3, 7), blur_ksizes=[(21, 21), (31, 31), (41, 41)])[source]¶ Generate soft segmentation mask from input segmentation mask.
Required key is “seg”, added key is “soft_seg”.
- Parameters
fg_thr (float, optional) – Threshold of the foreground in the normalized input segmentation mask. Defaults to 0.2.
border_width (int, optional) – Width of border to be padded to the bottom of the mask. Defaults to 25.
erode_ksize (int, optional) – Fixed kernel size of the erosion. Defaults to 3.
dilate_ksize (int, optional) – Fixed kernel size of the dilation. Defaults to 5.
erode_iter_range (tuple, optional) – Iteration of erosion. Defaults to (10, 20).
dilate_iter_range (tuple, optional) – Iteration of dilation. Defaults to (3, 7).
blur_ksizes (list, optional) – List of (h, w) to be selected as the kernel_size of the gaussian blur. Defaults to [(21, 21), (31, 31), (41, 41)].
-
class
mmedit.datasets.pipelines.
GenerateTrimap
(kernel_size, iterations=1, random=True)[source]¶ Using random erode/dilate to generate trimap from alpha matte.
Required key is “alpha”, added key is “trimap”.
- Parameters
kernel_size (int | tuple[int]) – The range of random kernel_size of erode/dilate; int indicates a fixed kernel_size. If random is set to False and kernel_size is a tuple of length 2, then it will be interpreted as (erode kernel_size, dilate kernel_size). It should be noted that the kernel of the erosion and dilation has the same height and width.
iterations (int | tuple[int], optional) – The range of random iterations of erode/dilate; int indicates a fixed iterations. If random is set to False and iterations is a tuple of length 2, then it will be interpreted as (erode iterations, dilate iterations). Default to 1.
random (bool, optional) – Whether use random kernel_size and iterations when generating trimap. See kernel_size and iterations for more information.
-
class
mmedit.datasets.pipelines.
GenerateTrimapWithDistTransform
(dist_thr=20, random=True)[source]¶ Generate trimap with distance transform function.
- Parameters
dist_thr (int, optional) – Distance threshold. Area with alpha value between (0, 255) will be considered as the initial unknown area. Then area whose distance to the unknown area is smaller than the distance threshold will also be considered as unknown area. Defaults to 20.
random (bool, optional) – If True, use random distance threshold from [1, dist_thr). If False, use dist_thr as the distance threshold directly. Defaults to True.
-
class
mmedit.datasets.pipelines.
GetMaskedImage
(img_name='gt_img', mask_name='mask')[source]¶ Get masked image.
- Parameters
img_name (str) – Key for clean image.
mask_name (str) – Key for mask image. The mask shape should be (h, w, 1), where ‘1’ indicates holes and ‘0’ indicates valid regions.
-
class
mmedit.datasets.pipelines.
GetSpatialDiscountMask
(gamma=0.99, beta=1.5)[source]¶ Get spatial discounting mask constant.
Spatial discounting mask is first introduced in: Generative Image Inpainting with Contextual Attention.
- Parameters
gamma (float, optional) – Gamma for computing spatial discounting. Defaults to 0.99.
beta (float, optional) – Beta for computing spatial discounting. Defaults to 1.5.
-
class
mmedit.datasets.pipelines.
ImageToTensor
(keys, to_float32=True)[source]¶ Convert image type to torch.Tensor type.
- Parameters
keys (Sequence[str]) – Required keys to be converted.
to_float32 (bool) – Whether to convert the numpy image array to np.float32 before converting it to a tensor. Default: True.
-
class
mmedit.datasets.pipelines.
LoadImageFromFile
(io_backend='disk', key='gt', flag='color', channel_order='bgr', save_original_img=False, **kwargs)[source]¶ Load image from file.
- Parameters
io_backend (str) – IO backend where images are stored. Default: ‘disk’.
key (str) – Keys in results to find corresponding path. Default: ‘gt’.
flag (str) – Loading flag for images. Default: ‘color’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.
save_original_img (bool) – If True, maintain a copy of the image in results dict with name of f’ori_{key}’. Default: False.
kwargs (dict) – Args for file client.
-
class
mmedit.datasets.pipelines.
LoadImageFromFileList
(io_backend='disk', key='gt', flag='color', channel_order='bgr', save_original_img=False, **kwargs)[source]¶ Load image from file list.
It accepts a list of paths and reads a frame from each path. A list of frames will be returned.
- Parameters
io_backend (str) – IO backend where images are stored. Default: ‘disk’.
key (str) – Keys in results to find corresponding path. Default: ‘gt’.
flag (str) – Loading flag for images. Default: ‘color’.
save_original_img (bool) – If True, maintain a copy of the image in results dict with name of f’ori_{key}’. Default: False.
kwargs (dict) – Args for file client.
-
class
mmedit.datasets.pipelines.
LoadMask
(mask_mode='bbox', mask_config=None)[source]¶ Load Mask for multiple types.
For different types of mask, users need to provide the corresponding config dict.
Example config for bbox:
config = dict(img_shape=(256, 256), max_bbox_shape=128)
Example config for irregular:
config = dict( img_shape=(256, 256), num_vertexes=(4, 12), max_angle=4., length_range=(10, 100), brush_width=(10, 40), area_ratio_range=(0.15, 0.5))
Example config for ff:
config = dict( img_shape=(256, 256), num_vertexes=(4, 12), mean_angle=1.2, angle_range=0.4, brush_width=(12, 40))
Example config for set:
config = dict( mask_list_file='xxx/xxx/ooxx.txt', prefix='/xxx/xxx/ooxx/', io_backend='disk', flag='unchanged', file_client_kwargs=dict() )
The mask_list_file contains the list of mask file names, like this:
test1.jpeg
test2.jpeg
...
...
The prefix gives the data path.
- Parameters
mask_mode (str) – Mask mode in [‘bbox’, ‘irregular’, ‘ff’, ‘set’, ‘file’].
* bbox: square bounding box masks.
* irregular: irregular holes.
* ff: free-form holes from DeepFillv2.
* set: randomly get a mask from a mask set.
* file: get mask from ‘mask_path’ in results.
mask_config (dict) – Params for creating masks. Each type of mask needs different configs.
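As a sketch of how the two parameters fit together, a pipeline step for bbox masks might be written as follows. The 'type' key follows the registry-style config convention and is an assumption here; check_mask_step is a hypothetical helper that only validates the mode name.

```python
# Mask modes listed in the parameter description above.
VALID_MASK_MODES = ('bbox', 'irregular', 'ff', 'set', 'file')

# Hypothetical pipeline step; mask_config reuses the bbox example config.
load_mask_step = dict(
    type='LoadMask',  # assumption: registry-style 'type' key
    mask_mode='bbox',
    mask_config=dict(img_shape=(256, 256), max_bbox_shape=128),
)


def check_mask_step(step):
    """Reject mask modes that are not in the documented list."""
    if step['mask_mode'] not in VALID_MASK_MODES:
        raise ValueError(f"unsupported mask_mode: {step['mask_mode']}")
    return step
```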
-
class
mmedit.datasets.pipelines.
LoadPairedImageFromFile
(io_backend='disk', key='gt', flag='color', channel_order='bgr', save_original_img=False, **kwargs)[source]¶ Load a pair of images from file.
Each sample contains a pair of images, which are concatenated in the w dimension (a|b). This is a special loading class for generation paired dataset. It loads a pair of images as the common loader does and crops it into two images with the same shape in different domains.
Required key is “pair_path”. Added or modified keys are “pair”, “pair_ori_shape”, “ori_pair”, “img_a”, “img_b”, “img_a_path”, “img_b_path”, “img_a_ori_shape”, “img_b_ori_shape”, “ori_img_a” and “ori_img_b”.
- Parameters
io_backend (str) – IO backend where images are stored. Default: ‘disk’.
key (str) – Keys in results to find corresponding path. Default: ‘gt’.
flag (str) – Loading flag for images. Default: ‘color’.
channel_order (str) – Order of channel, candidates are ‘bgr’ and ‘rgb’. Default: ‘bgr’.
save_original_img (bool) – If True, maintain a copy of the image in results dict with name of f’ori_{key}’. Default: False.
kwargs (dict) – Args for file client.
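Splitting a pair that is concatenated in the w dimension (a|b) can be sketched as below. The helper name split_pair and the rejection of odd widths are assumptions for illustration.

```python
import numpy as np


def split_pair(pair):
    """Split an (h, 2*w, c) image into two (h, w, c) halves: img_a | img_b."""
    width = pair.shape[1]
    if width % 2 != 0:
        # Assumption: an odd width cannot be split into equal halves.
        raise ValueError(f'paired image width must be even, got {width}')
    half = width // 2
    return pair[:, :half], pair[:, half:]
```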
-
class
mmedit.datasets.pipelines.
MergeFgAndBg
[source]¶ Composite foreground image and background image with alpha.
Required keys are “alpha”, “fg” and “bg”, added key is “merged”.
-
class
mmedit.datasets.pipelines.
ModCrop
[source]¶ Mod crop gt images, used during testing.
Required keys are “scale” and “gt”, added or modified keys are “gt”.
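A mod crop trims the image so that its height and width are divisible by scale. A minimal sketch, assuming the remainder is removed from the bottom and right edges (mod_crop is a hypothetical helper):

```python
import numpy as np


def mod_crop(img, scale):
    """Crop img so that height and width are multiples of scale."""
    h, w = img.shape[:2]
    # Assumption: the remainder is dropped from the bottom and right edges.
    return img[:h - h % scale, :w - w % scale]
```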
-
class
mmedit.datasets.pipelines.
Normalize
(keys, mean, std, to_rgb=False)[source]¶ Normalize images with the given mean and std value.
Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys” and these keys with postfix ‘_norm_cfg’. It also supports normalizing a list of images.
- Parameters
keys (Sequence[str]) – The images to be normalized.
mean (np.ndarray) – Mean values of different channels.
std (np.ndarray) – Std values of different channels.
to_rgb (bool) – Whether to convert channels from BGR to RGB.
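The per-channel normalization can be sketched with NumPy as below. It assumes the channel flip for to_rgb happens before the mean is subtracted; normalize is a hypothetical stand-in for the transform.

```python
import numpy as np


def normalize(img, mean, std, to_rgb=False):
    """Return (img - mean) / std per channel, optionally flipping BGR to RGB."""
    img = img.astype(np.float32)
    if to_rgb:
        # Assumption: channels are reordered before normalization.
        img = img[..., ::-1]
    return (img - mean) / std
```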
-
class
mmedit.datasets.pipelines.
Pad
(keys, ds_factor=32, **kwargs)[source]¶ Pad the images to align with network downsample factor for testing.
See Reshape for more explanation. numpy.pad is used for the pad operation. Required keys are the keys in attribute “keys”, added or modified keys are “test_trans” and the keys in attribute “keys”. All keys in “keys” should have the same shape. “test_trans” is used to record the test transformation to align the input’s shape.
- Parameters
keys (list[str]) – The images to be padded.
ds_factor (int) – Downsample factor of the network. The height and width will be padded to a multiple of ds_factor. Default: 32.
kwargs (optional) – Any keyword arguments to be passed to numpy.pad.
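Padding to a multiple of ds_factor can be sketched with numpy.pad as below. Padding only the bottom and right edges is an assumption, and pad_to_multiple is a hypothetical helper name.

```python
import numpy as np


def pad_to_multiple(img, ds_factor=32, **kwargs):
    """Pad height and width up to the next multiple of ds_factor."""
    h, w = img.shape[:2]
    new_h = ds_factor * ((h - 1) // ds_factor + 1)
    new_w = ds_factor * ((w - 1) // ds_factor + 1)
    # Assumption: padding goes on the bottom and right; channels are untouched.
    pad_width = ((0, new_h - h), (0, new_w - w)) + ((0, 0),) * (img.ndim - 2)
    # Extra keyword arguments (e.g. mode='reflect') are forwarded to numpy.pad.
    return np.pad(img, pad_width, **kwargs)
```

For example, a (100, 65, 3) input becomes (128, 96, 3) with ds_factor=32, while a (64, 64) input is left unchanged.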
-
class
mmedit.datasets.pipelines.
PairedRandomCrop
(gt_patch_size)[source]¶ Paired random crop.
It crops a pair of lq and gt images with corresponding locations. It also supports accepting lq list and gt list. Required keys are “scale”, “lq”, and “gt”, added or modified keys are “lq” and “gt”.
- Parameters
gt_patch_size (int) – cropped gt patch size.
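Cropping "with corresponding locations" means the gt crop is the lq crop scaled by scale. A minimal sketch for single images, assuming the lq patch size is gt_patch_size // scale; paired_random_crop and the rng argument are hypothetical.

```python
import random

import numpy as np


def paired_random_crop(lq, gt, gt_patch_size, scale, rng=random):
    """Crop matching patches from an lq image and its scale-times-larger gt."""
    lq_patch = gt_patch_size // scale
    h, w = lq.shape[:2]
    # Pick the top-left corner in lq coordinates.
    top = rng.randint(0, h - lq_patch)
    left = rng.randint(0, w - lq_patch)
    lq_crop = lq[top:top + lq_patch, left:left + lq_patch]
    # The same location in gt coordinates is scaled by `scale`.
    gt_top, gt_left = top * scale, left * scale
    gt_crop = gt[gt_top:gt_top + gt_patch_size, gt_left:gt_left + gt_patch_size]
    return lq_crop, gt_crop
```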
-
class
mmedit.datasets.pipelines.
PerturbBg
(gamma_ratio=0.6)[source]¶ Randomly add gaussian noise or gamma change to background image.
Required key is “bg”, added key is “noisy_bg”.
- Parameters
gamma_ratio (float, optional) – The probability to use gamma correction instead of gaussian noise. Defaults to 0.6.
-
class
mmedit.datasets.pipelines.
RandomAffine
(keys, degrees, translate=None, scale=None, shear=None, flip_ratio=None)[source]¶ Apply random affine to input images.
This class is adapted from torchvision v0.5.0: https://github.com/pytorch/vision/blob/v0.5.0/torchvision/transforms/transforms.py#L1015
Note that a random flip is added, following GCA-Matting: https://github.com/Yaoyi-Li/GCA-Matting/blob/master/dataloader/data_generator.py#L70
See the explanation of flip_ratio below. Required keys are the keys in attribute “keys”, modified keys are keys in attribute “keys”.
- Parameters
keys (Sequence[str]) – The images to be affined.
degrees (float | tuple[float]) – Range of degrees to select from. If it is a float instead of a tuple like (min, max), the range of degrees will be (-degrees, +degrees). Set to 0 to deactivate rotations.
translate (tuple, optional) – Tuple of maximum absolute fraction for horizontal and vertical translations. For example translate=(a, b), then horizontal shift is randomly sampled in the range -img_width * a < dx < img_width * a and vertical shift is randomly sampled in the range -img_height * b < dy < img_height * b. Default: None.
scale (tuple, optional) – Scaling factor interval, e.g (a, b), then scale is randomly sampled from the range a <= scale <= b. Default: None.
shear (float | tuple[float], optional) – Range of shear degrees to select from. If shear is a float, a shear parallel to the x axis and a shear parallel to the y axis in the range (-shear, +shear) will be applied. Else if shear is a tuple of 2 values, a x-axis shear and a y-axis shear in (shear[0], shear[1]) will be applied. Default: None.
flip_ratio (float, optional) – Probability of the image being flipped. The flips in horizontal direction and vertical direction are independent. The image may be flipped in both directions. Default: None.
-
class
mmedit.datasets.pipelines.
RandomJitter
(hue_range=40)[source]¶ Randomly jitter the foreground in hsv space.
The jitter range of hue is adjustable while the jitter ranges of saturation and value are adaptive to the images. Side effect: the “fg” image will be converted to np.float32. Required keys are “fg” and “alpha”, modified key is “fg”.
- Parameters
hue_range (float | tuple[float]) – Range of hue jittering. If it is a float instead of a tuple like (min, max), the range of hue jittering will be (-hue_range, +hue_range). Default: 40.
-
class
mmedit.datasets.pipelines.
RandomLoadResizeBg
(bg_dir, io_backend='disk', flag='color', **kwargs)[source]¶ Randomly load a background image and resize it.
Required key is “fg”, added key is “bg”.
- Parameters
bg_dir (str) – Path of directory to load background images from.
io_backend (str) – IO backend where images are stored. Default: ‘disk’.
flag (str) – Loading flag for images. Default: ‘color’.
kwargs (dict) – Args for file client.
-
class
mmedit.datasets.pipelines.
RandomMaskDilation
(keys, binary_thr=0.0, kernel_min=9, kernel_max=49)[source]¶ Randomly dilate binary masks.
- Parameters
keys (Sequence[str]) – The masks to be dilated.
get_binary (bool) – If True, according to binary_thr, reset final output as binary mask. Otherwise, return masks directly.
binary_thr (float) – Threshold for obtaining binary mask.
kernel_min (int) – Min size of dilation kernel.
kernel_max (int) – Max size of dilation kernel.
-
class
mmedit.datasets.pipelines.
RandomTransposeHW
(keys, transpose_ratio=0.5)[source]¶ Randomly transpose images in H and W dimensions with a probability.
(TransposeHW = horizontal flip + anti-clockwise rotation by 90 degrees) When used with horizontal/vertical flips, it serves as a way of rotation augmentation. It also supports randomly transposing a list of images.
Required keys are the keys in attributes “keys”, added or modified keys are “transpose” and the keys in attributes “keys”.
- Parameters
keys (list[str]) – The images to be transposed.
transpose_ratio (float) – The probability to transpose the images.
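The identity "TransposeHW = horizontal flip + anti-clockwise rotation by 90 degrees" can be checked with NumPy. The helper transpose_hw below is a hypothetical illustration that swaps the first two axes and leaves any channel axis in place.

```python
import numpy as np


def transpose_hw(img):
    """Swap the H and W axes of an (h, w) or (h, w, c) image."""
    return img.transpose(1, 0, *range(2, img.ndim))


img = np.arange(24).reshape(2, 4, 3)
# Horizontal flip followed by a 90-degree anti-clockwise rotation.
flipped_then_rotated = np.rot90(np.fliplr(img))
same = np.array_equal(transpose_hw(img), flipped_then_rotated)
```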
-
class
mmedit.datasets.pipelines.
RescaleToZeroOne
(keys)[source]¶ Transform the images into a range between 0 and 1.
Required keys are the keys in attribute “keys”, added or modified keys are the keys in attribute “keys”. It also supports rescaling a list of images.
- Parameters
keys (Sequence[str]) – The images to be transformed.
-
class
mmedit.datasets.pipelines.
Resize
(keys, scale=None, keep_ratio=False, size_factor=None, max_size=None, interpolation='bilinear')[source]¶ Resize data to a specific size for training or resize the images to fit the network input regulation for testing.
When used for resizing images to fit network input regulation, the case is that a network may have several downsample and then upsample operation, then the input height and width should be divisible by the downsample factor of the network. For example, the network would downsample the input for 5 times with stride 2, then the downsample factor is 2^5 = 32 and the height and width should be divisible by 32.
Required keys are the keys in attribute “keys”, added or modified keys are “keep_ratio”, “scale_factor”, “interpolation” and the keys in attribute “keys”.
All keys in “keys” should have the same shape. “test_trans” is used to record the test transformation to align the input’s shape.
- Parameters
keys (list[str]) – The images to be resized.
scale (float | Tuple[int]) – If scale is Tuple(int), target spatial size (h, w). Otherwise, the target spatial size is the input size multiplied by scale. If any of scale is -1, we will rescale the short edge. Note that when it is used, size_factor and max_size are ignored. Default: None.
keep_ratio (bool) – If set to True, images will be resized without changing the aspect ratio. Otherwise, it will resize images to a given size. Default: False. Note that it is used together with scale.
size_factor (int) – Let the output shape be a multiple of size_factor. Default: None. Note that when it is used, scale should be set to None and keep_ratio should be set to False.
max_size (int) – The maximum size of the longest side of the output. Default: None. Note that it is used together with size_factor.
interpolation (str) – Algorithm used for interpolation: “nearest” | “bilinear” | “bicubic” | “area” | “lanczos”. Default: “bilinear”.
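The size_factor and max_size rules can be sketched in plain Python. This is an illustration under the assumption that sizes are rounded down to the nearest multiple; fit_to_size_factor is a hypothetical helper.

```python
def fit_to_size_factor(h, w, size_factor, max_size=None):
    """Round (h, w) down to multiples of size_factor, optionally capped."""
    new_h = h - h % size_factor
    new_w = w - w % size_factor
    if max_size is not None:
        # Cap each side at the largest multiple of size_factor <= max_size.
        cap = max_size - max_size % size_factor
        new_h = min(cap, new_h)
        new_w = min(cap, new_w)
    return new_h, new_w


# Five stride-2 downsamples give a downsample factor of 2**5 = 32.
downsample_factor = 2 ** 5
```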
-
class
mmedit.datasets.pipelines.
TemporalReverse
(keys, reverse_ratio=0.5)[source]¶ Reverse frame lists for temporal augmentation.
Required keys are the keys in attributes “lq” and “gt”, added or modified keys are “lq”, “gt” and “reverse”.
- Parameters
keys (list[str]) – The frame lists to be reversed.
reverse_ratio (float) – The probability to reverse the frame lists. Default: 0.5.
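A minimal sketch of the temporal reversal, assuming a single random draw decides the direction for all keys so that lq and gt stay aligned. temporal_reverse and the rng argument are hypothetical.

```python
import random


def temporal_reverse(results, keys, reverse_ratio=0.5, rng=random):
    """Reverse every frame list in results[keys] with the given probability."""
    # One draw for all keys keeps the lq and gt sequences in sync.
    reverse = rng.random() < reverse_ratio
    if reverse:
        for key in keys:
            results[key] = results[key][::-1]
    # Record the decision, matching the "reverse" key mentioned above.
    results['reverse'] = reverse
    return results
```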
mmedit.models¶
models¶
-
class
mmedit.models.
BaseMattor
(backbone, refiner=None, train_cfg=None, test_cfg=None, pretrained=None)[source]¶ Base class for matting model.
A matting model must contain a backbone which produces alpha, a dense prediction with the same height and width as the input image. In some cases, the model will have a refiner which refines the prediction of the backbone.
The subclasses should overwrite the functions forward_train and forward_test, which define the output of the model and possibly the connection between the backbone and the refiner.
- Parameters
backbone (dict) – Config of backbone.
refiner (dict) – Config of refiner.
train_cfg (dict) – Config of training. In train_cfg, train_backbone should be specified. If the model has a refiner, train_refiner should be specified.
test_cfg (dict) – Config of testing. In test_cfg, if the model has a refiner, train_refiner should be specified.
pretrained (str) – Path of pretrained model.
-
evaluate
(pred_alpha, meta)[source]¶ Evaluate predicted alpha matte.
The evaluation metrics are determined by self.test_cfg.metrics.
- Parameters
pred_alpha (np.ndarray) – The predicted alpha matte of shape (H, W).
meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported. Required keys in the meta dict are ori_alpha and ori_trimap.
- Returns
The evaluation result.
- Return type
dict
-
forward
(merged, trimap, meta, alpha=None, test_mode=False, **kwargs)[source]¶ Defines the computation performed at every call.
- Parameters
merged (Tensor) – Image to predict alpha matte.
trimap (Tensor) – Trimap of the input image.
meta (list[dict]) – Meta data about the current data batch. Defaults to None.
alpha (Tensor, optional) – Ground-truth alpha matte. Defaults to None.
test_mode (bool, optional) – Whether in test mode. If True, it will call forward_test of the model. Otherwise, it will call forward_train of the model. Defaults to False.
- Returns
Return the output of self.forward_test if test_mode is set to True. Otherwise return the output of self.forward_train.
- Return type
dict
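The test_mode dispatch can be sketched with a toy class. ToyMattor is a hypothetical stand-in with dummy outputs; the exact arguments forwarded to forward_train and forward_test are an assumption.

```python
class ToyMattor:
    """Hypothetical stand-in showing only the test_mode dispatch."""

    def forward(self, merged, trimap, meta, alpha=None, test_mode=False,
                **kwargs):
        # test_mode selects which of the two subclass hooks is called.
        if test_mode:
            return self.forward_test(merged, trimap, meta, **kwargs)
        return self.forward_train(merged, trimap, meta, alpha, **kwargs)

    def forward_test(self, merged, trimap, meta, **kwargs):
        # Dummy output; a real model would predict the alpha matte here.
        return {'mode': 'test'}

    def forward_train(self, merged, trimap, meta, alpha, **kwargs):
        # Dummy output; a real model would compute losses here.
        return {'mode': 'train', 'has_alpha': alpha is not None}
```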
-
abstract
forward_test
(merged, trimap, meta, **kwargs)[source]¶ Defines the computation performed at every test call.
-
abstract
forward_train
(merged, trimap, alpha, **kwargs)[source]¶ Defines the computation performed at every training call.
- Parameters
merged (Tensor) – Image to predict alpha matte.
trimap (Tensor) – Trimap of the input image.
alpha (Tensor) – Ground-truth alpha matte.
-
init_weights
(pretrained=None)[source]¶ Initialize the model network weights.
- Parameters
pretrained (str, optional) – Path to the pretrained weight. Defaults to None.
-
restore_shape
(pred_alpha, meta)[source]¶ Restore the predicted alpha to the original shape.
The shape of the predicted alpha may not be the same as the shape of original input image. This function restores the shape of the predicted alpha.
- Parameters
pred_alpha (np.ndarray) – The predicted alpha.
meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported.
- Returns
The reshaped predicted alpha.
- Return type
np.ndarray
-
save_image
(pred_alpha, meta, save_path, iteration)[source]¶ Save predicted alpha to file.
- Parameters
pred_alpha (np.ndarray) – The predicted alpha matte of shape (H, W).
meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported. The required key in the meta dict is merged_path.
save_path (str) – The directory to save predicted alpha matte.
iteration (int | None) – If given as None, the saved alpha matte will have the same file name as merged_path in the meta dict. If given as an int, the saved alpha matte will be named with the postfix _{iteration}.png.
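The naming rule for iteration can be sketched as follows. The helper alpha_save_path is hypothetical, and saving with a .png extension in both cases is an assumption; only the _{iteration}.png postfix comes from the description above.

```python
import os.path as osp


def alpha_save_path(merged_path, save_path, iteration=None):
    """Build the output path for a predicted alpha matte."""
    stem = osp.splitext(osp.basename(merged_path))[0]
    if iteration is None:
        # Assumption: the file stem is kept and the matte is saved as PNG.
        filename = f'{stem}.png'
    else:
        filename = f'{stem}_{iteration}.png'
    return osp.join(save_path, filename)
```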
-
train_step
(data_batch, optimizer)[source]¶ Defines the computation and network update at every training call.
- Parameters
data_batch (torch.Tensor) – Batch of data as input.
optimizer (torch.optim.Optimizer) – Optimizer of the model.
- Returns
Output of train_step containing the logging variables of the current data batch.
- Return type
dict
-
property
with_refiner
¶ Whether the matting model has a refiner.
-
class
mmedit.models.
BaseModel
[source]¶ Base model.
All models should subclass it. All subclasses should overwrite:
init_weights, which initializes the model.
forward_train, which performs the forward pass when training.
forward_test, which performs the forward pass when testing.
train_step, which trains for one step.
-
forward
(imgs, labels, test_mode, **kwargs)[source]¶ Forward function for base model.
- Parameters
imgs (Tensor) – Input image(s).
labels (Tensor) – Ground-truth label(s).
test_mode (bool) – Whether in test mode.
kwargs (dict) – Other arguments.
- Returns
Forward results.
- Return type
Tensor
-
abstract
forward_test
(imgs)[source]¶ Abstract method for testing forward.
All subclasses should overwrite it.
-
abstract
forward_train
(imgs, labels)[source]¶ Abstract method for training forward.
All subclasses should overwrite it.
-
abstract
init_weights
()[source]¶ Abstract method for initializing weight.
All subclasses should overwrite it.
-
parse_losses
(losses)[source]¶ Parse losses dict for different loss variants.
- Parameters
losses (dict) – Loss dict.
- Returns
loss (float): Sum of the total loss.
log_vars (dict): Loss dict for different variants.
- Return type
tuple
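A simplified sketch of the parsing idea, using plain floats instead of tensors. It assumes that only entries whose key contains 'loss' contribute to the total and that list values are summed; the real method may differ in details such as averaging and distributed reduction.

```python
def parse_losses(losses):
    """Sum all loss terms and collect every entry for logging."""
    log_vars = {}
    for name, value in losses.items():
        if isinstance(value, (list, tuple)):
            # Assumption: a list of partial losses is reduced by summing.
            log_vars[name] = sum(value)
        else:
            log_vars[name] = value
    # Assumption: only keys containing 'loss' count toward the total.
    loss = sum(v for k, v in log_vars.items() if 'loss' in k)
    log_vars['loss'] = loss
    return loss, log_vars
```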
-
-
class
mmedit.models.
BasicRestorer
(generator, pixel_loss, train_cfg=None, test_cfg=None, pretrained=None)[source]¶ Basic model for image restoration.
It must contain a generator that takes an image as inputs and outputs a restored image. It also has a pixel-wise loss for training.
The subclasses should overwrite the function forward_train, forward_test and train_step.
- Parameters
generator (dict) – Config for the generator structure.
pixel_loss (dict) – Config for pixel-wise loss.
train_cfg (dict) – Config for training. Default: None.
test_cfg (dict) – Config for testing. Default: None.
pretrained (str) – Path for pretrained model. Default: None.
-
evaluate
(output, gt)[source]¶ Evaluation function.
- Parameters
output (Tensor) – Model output with shape (n, c, h, w).
gt (Tensor) – GT Tensor with shape (n, c, h, w).
- Returns
Evaluation results.
- Return type
dict
-
forward
(lq, gt=None, test_mode=False, **kwargs)[source]¶ Forward function.
- Parameters
lq (Tensor) – Input lq images.
gt (Tensor) – Ground-truth image. Default: None.
test_mode (bool) – Whether in test mode or not. Default: False.
kwargs (dict) – Other arguments.
-
forward_dummy
(img)[source]¶ Used for computing network FLOPs.
- Parameters
img (Tensor) – Input image.
- Returns
Output image.
- Return type
Tensor
-
forward_test
(lq, gt=None, meta=None, save_image=False, save_path=None, iteration=None)[source]¶ Testing forward function.
- Parameters
lq (Tensor) – LQ Tensor with shape (n, c, h, w).
gt (Tensor) – GT Tensor with shape (n, c, h, w). Default: None.
save_image (bool) – Whether to save image. Default: False.
save_path (str) – Path to save image. Default: None.
iteration (int) – Iteration for the saving image name. Default: None.
- Returns
Output results.
- Return type
dict
-
forward_train
(lq, gt)[source]¶ Training forward function.
- Parameters
lq (Tensor) – LQ Tensor with shape (n, c, h, w).
gt (Tensor) – GT Tensor with shape (n, c, h, w).
- Returns
Output tensor.
- Return type
Tensor
-
init_weights
(pretrained=None)[source]¶ Init weights for models.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
-
class
mmedit.models.
CycleGAN
(generator, discriminator, gan_loss, cycle_loss, id_loss=None, train_cfg=None, test_cfg=None, pretrained=None)[source]¶ CycleGAN model for unpaired image-to-image translation.
Ref: Unpaired Image-to-Image Translation using Cycle-Consistent Adversarial Networks
- Parameters
generator (dict) – Config for the generator.
discriminator (dict) – Config for the discriminator.
gan_loss (dict) – Config for the gan loss.
cycle_loss (dict) – Config for the cycle-consistency loss.
id_loss (dict) – Config for the identity loss. Default: None.
train_cfg (dict) – Config for training. Default: None. You may change the training of gan by setting: disc_steps: how many discriminator updates after one generator update. disc_init_steps: how many discriminator updates at the start of the training. These two keys are useful when training with WGAN. direction: image-to-image translation direction (the model training direction): a2b | b2a. buffer_size: GAN image buffer size.
test_cfg (dict) – Config for testing. Default: None. You may change the testing of gan by setting: direction: image-to-image translation direction (the model training direction): a2b | b2a. show_input: whether to show input real images. test_direction: direction in the test mode (the model testing direction). CycleGAN has two generators. It decides whether to perform forward or backward translation with respect to direction during testing: a2b | b2a.
pretrained (str) – Path for pretrained model. Default: None.
-
backward_discriminators
(outputs)[source]¶ Backward function for the discriminators.
- Parameters
outputs (dict) – Dict of forward results.
- Returns
Loss dict.
- Return type
dict
-
backward_generators
(outputs)[source]¶ Backward function for the generators.
- Parameters
outputs (dict) – Dict of forward results.
- Returns
Loss dict.
- Return type
dict
-
forward
(img_a, img_b, meta, test_mode=False, **kwargs)[source]¶ Forward function.
- Parameters
img_a (Tensor) – Input image from domain A.
img_b (Tensor) – Input image from domain B.
meta (list[dict]) – Input meta data.
test_mode (bool) – Whether in test mode or not. Default: False.
kwargs (dict) – Other arguments.
-
forward_dummy
(img)[source]¶ Used for computing network FLOPs.
- Parameters
img (Tensor) – Dummy input used to compute FLOPs.
- Returns
Dummy output produced by forwarding the dummy input.
- Return type
Tensor
-
forward_test
(img_a, img_b, meta, save_image=False, save_path=None, iteration=None)[source]¶ Forward function for testing.
- Parameters
img_a (Tensor) – Input image from domain A.
img_b (Tensor) – Input image from domain B.
meta (list[dict]) – Input meta data.
save_image (bool, optional) – If True, results will be saved as images. Default: False.
save_path (str, optional) – If given a valid str path, the results will be saved in this path. Default: None.
iteration (int, optional) – Iteration number. Default: None.
- Returns
Dict of forward and evaluation results for testing.
- Return type
dict
-
forward_train
(img_a, img_b, meta)[source]¶ Forward function for training.
- Parameters
img_a (Tensor) – Input image from domain A.
img_b (Tensor) – Input image from domain B.
meta (list[dict]) – Input meta data.
- Returns
Dict of forward results for training.
- Return type
dict
-
get_module
(module)[source]¶ Get nn.ModuleDict to fit the MMDistributedDataParallel interface.
- Parameters
module (MMDistributedDataParallel | nn.ModuleDict) – The input module that needs processing.
- Returns
The ModuleDict of multiple networks.
- Return type
nn.ModuleDict
-
init_weights
(pretrained=None)[source]¶ Initialize weights for the model.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Default: None.
-
setup
(img_a, img_b, meta)[source]¶ Perform necessary pre-processing steps.
- Parameters
img_a (Tensor) – Input image from domain A.
img_b (Tensor) – Input image from domain B.
meta (list[dict]) – Input meta data.
- Returns
The real images from domain A/B, and the image path as the metadata.
- Return type
Tensor, Tensor, list[str]
-
train_step
(data_batch, optimizer)[source]¶ Training step function.
- Parameters
data_batch (dict) – Dict of the input data batch.
optimizer (dict[torch.optim.Optimizer]) – Dict of optimizers for the generators and discriminators.
- Returns
Dict of loss, information for logger, the number of samples and results for visualization.
- Return type
dict
-
class
mmedit.models.
DIM
(backbone, refiner=None, train_cfg=None, test_cfg=None, pretrained=None, loss_alpha=None, loss_comp=None, loss_refine=None)[source]¶ Deep Image Matting model.
https://arxiv.org/abs/1703.03872
Note
For (self.train_cfg.train_backbone, self.train_cfg.train_refiner):
(True, False) corresponds to the encoder-decoder stage in the paper.
(False, True) corresponds to the refinement stage in the paper.
(True, True) corresponds to the fine-tune stage in the paper.
- Parameters
backbone (dict) – Config of backbone.
refiner (dict) – Config of refiner.
train_cfg (dict) – Config of training. In train_cfg, train_backbone should be specified. If the model has a refiner, train_refiner should be specified.
test_cfg (dict) – Config of testing. In test_cfg, if the model has a refiner, train_refiner should be specified.
pretrained (str) – Path of pretrained model.
loss_alpha (dict) – Config of the alpha prediction loss. Default: None.
loss_comp (dict) – Config of the composition loss. Default: None.
loss_refine (dict) – Config of the loss of the refiner. Default: None.
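The mapping from the (train_backbone, train_refiner) flags to the training stages in the note can be written as a small lookup. dim_stage is a hypothetical helper; rejecting (False, False) is an assumption, since that combination trains nothing.

```python
# Stage names follow the note in the class description.
DIM_STAGES = {
    (True, False): 'encoder-decoder',
    (False, True): 'refinement',
    (True, True): 'fine-tune',
}


def dim_stage(train_backbone, train_refiner):
    """Return the paper's stage name for a train_cfg flag combination."""
    try:
        return DIM_STAGES[(train_backbone, train_refiner)]
    except KeyError:
        # Assumption: (False, False) is treated as an invalid config.
        raise ValueError('at least one of backbone/refiner must be trained')
```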
-
forward_test
(merged, trimap, meta, save_image=False, save_path=None, iteration=None)[source]¶ Defines the computation performed at every test call.
- Parameters
merged (Tensor) – Image to predict alpha matte.
trimap (Tensor) – Trimap of the input image.
meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported. It may contain information needed to calculate metrics (ori_alpha and ori_trimap) or save predicted alpha matte (merged_path).
save_image (bool, optional) – Whether to save the predicted alpha matte. Defaults to False.
save_path (str, optional) – The directory to save predicted alpha matte. Defaults to None.
iteration (int, optional) – If given as None, the saved alpha matte will have the same file name as merged_path in the meta dict. If given as an int, the saved alpha matte will be named with the postfix _{iteration}.png. Defaults to None.
- Returns
Contains the predicted alpha and evaluation result.
- Return type
dict
-
forward_train
(merged, trimap, meta, alpha, ori_merged, fg, bg)[source]¶ Defines the computation performed at every training call.
- Parameters
merged (Tensor) – of shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.
trimap (Tensor) – of shape (N, 1, H, W). Tensor of trimap read by opencv.
meta (list[dict]) – Meta data about the current data batch.
alpha (Tensor) – of shape (N, 1, H, W). Tensor of alpha read by opencv.
ori_merged (Tensor) – of shape (N, C, H, W). Tensor of origin merged image read by opencv (not normalized).
fg (Tensor) – of shape (N, C, H, W). Tensor of fg read by opencv.
bg (Tensor) – of shape (N, C, H, W). Tensor of bg read by opencv.
- Returns
Contains the loss items and batch information.
- Return type
dict
-
class
mmedit.models.
DeepFillv1Inpaintor
(*args, stage1_loss_type=('loss_l1_hole'), stage2_loss_type=('loss_l1_hole', 'loss_gan'), input_with_ones=True, disc_input_with_mask=False, **kwargs)[source]¶ -
calculate_loss_with_type
(loss_type, fake_res, fake_img, gt, mask, prefix='stage1_', fake_local=None)[source]¶ Calculate multiple types of losses.
- Parameters
loss_type (str) – Type of the loss.
fake_res (torch.Tensor) – Direct results from model.
fake_img (torch.Tensor) – Composited results from model.
gt (torch.Tensor) – Ground-truth tensor.
mask (torch.Tensor) – Mask tensor.
prefix (str, optional) – Prefix for loss name. Defaults to ‘stage1_’.
fake_local (torch.Tensor, optional) – Local results from model. Defaults to None.
- Returns
Contain loss value with its name.
- Return type
dict
-
forward_train_d
(data_batch, is_real, is_disc)[source]¶ Forward function in discriminator training step.
In this function, we modify the default implementation with only one discriminator. In DeepFillv1 model, they use two separated discriminators for global and local consistency.
- Parameters
data_batch (torch.Tensor) – Batch of real data or fake data.
is_real (bool) – If True, the gan loss will regard this batch as real data. Otherwise, the gan loss will regard this batch as fake data.
is_disc (bool) – If True, this function is called in discriminator training step. Otherwise, this function is called in generator training step. This will help us to compute different types of adversarial loss, like LSGAN.
- Returns
Contains the loss items computed in this function.
- Return type
dict
-
get_module
(model, module_name)[source]¶ Get an inner module from model.
Since some models are wrapped with DDP, we have to check whether the module can be indexed directly.
- Parameters
model (nn.Module) – This model may or may not be wrapped with DDP.
module_name (str) – The name of specific module.
- Returns
Returned sub module.
- Return type
nn.Module
-
train_step
(data_batch, optimizer)[source]¶ Train step function.
In this function, the inpaintor will finish the train step following the pipeline:
get fake res/image
optimize discriminator (if present)
optimize generator
If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing the discriminator with different input data, and only one iteration for optimizing the generator after disc_step iterations for the discriminator.
- Parameters
data_batch (torch.Tensor) – Batch of data as input.
optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if present).
- Returns
Dict with loss, information for logger, the number of samples and results for visualization.
- Return type
dict
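The disc_step behaviour described above can be sketched as a counter that lets the generator update only on every disc_step-th call. StepScheduler is a hypothetical illustration of the scheduling alone; no networks or optimizers are involved.

```python
class StepScheduler:
    """Track which networks would be optimized on each train_step call."""

    def __init__(self, disc_step):
        self.disc_step = disc_step
        self.disc_step_count = 0

    def next_updates(self):
        # The discriminator is optimized on every call.
        updates = ['discriminator']
        self.disc_step_count = (self.disc_step_count + 1) % self.disc_step
        # The generator follows only after disc_step discriminator updates.
        if self.disc_step_count == 0:
            updates.append('generator')
        return updates
```

With disc_step=1 both networks are updated on every call.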
-
-
class
mmedit.models.
ESRGAN
(generator, discriminator=None, gan_loss=None, pixel_loss=None, perceptual_loss=None, train_cfg=None, test_cfg=None, pretrained=None)[source]¶ Enhanced SRGAN model for single image super-resolution.
Ref: ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. It uses RaGAN for GAN updates: The relativistic discriminator: a key element missing from standard GAN.
- Parameters
generator (dict) – Config for the generator.
discriminator (dict) – Config for the discriminator. Default: None.
gan_loss (dict) – Config for the gan loss. Note that the loss weight in gan loss is only for the generator.
pixel_loss (dict) – Config for the pixel loss. Default: None.
perceptual_loss (dict) – Config for the perceptual loss. Default: None.
train_cfg (dict) – Config for training. Default: None. You may change the training of gan by setting: disc_steps: how many discriminator updates after one generator update; disc_init_steps: how many discriminator updates at the start of the training. These two keys are useful when training with WGAN.
test_cfg (dict) – Config for testing. Default: None.
pretrained (str) – Path for pretrained model. Default: None.
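One way to read disc_steps and disc_init_steps is as a schedule that decides on which iterations the generator is updated, with the discriminator updated on every iteration. The sketch below is an assumption about that schedule, not the model's code; generator_update_schedule is a hypothetical helper.

```python
def generator_update_schedule(num_steps, disc_steps=1, disc_init_steps=0):
    """Return, per iteration, whether the generator would be updated."""
    return [
        # Assumption: the generator is updated once every disc_steps
        # iterations, and never during the first disc_init_steps iterations.
        step % disc_steps == 0 and step >= disc_init_steps
        for step in range(num_steps)
    ]
```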
-
class
mmedit.models.
GCA
(backbone, train_cfg=None, test_cfg=None, pretrained=None, loss_alpha=None)[source]¶ Guided Contextual Attention image matting model.
https://arxiv.org/abs/2001.04069
- Parameters
backbone (dict) – Config of backbone.
train_cfg (dict) – Config of training. In train_cfg, train_backbone should be specified. If the model has a refiner, train_refiner should be specified.
test_cfg (dict) – Config of testing. In test_cfg, if the model has a refiner, train_refiner should be specified.
pretrained (str) – Path of the pretrained model.
loss_alpha (dict) – Config of the alpha prediction loss. Default: None.
-
forward_test
(merged, trimap, meta, save_image=False, save_path=None, iteration=None)[source]¶ Defines the computation performed at every test call.
- Parameters
merged (Tensor) – Image to predict alpha matte.
trimap (Tensor) – Trimap of the input image.
meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported. It may contain information needed to calculate metrics (ori_alpha and ori_trimap) or save predicted alpha matte (merged_path).
save_image (bool, optional) – Whether to save the predicted alpha matte. Defaults to False.
save_path (str, optional) – The directory to save predicted alpha matte. Defaults to None.
iteration (int, optional) – If given as None, the saved alpha matte will have the same file name as merged_path in the meta dict. If given as an int, the saved alpha matte will be named with the postfix _{iteration}.png. Defaults to None.
- Returns
Contains the predicted alpha and evaluation result.
- Return type
dict
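The naming rule described for the iteration argument can be sketched as follows. The helper alpha_save_path is hypothetical and only mirrors the wording of the parameter description; the library may format the postfix differently (for example with zero padding).

```python
import os.path as osp


def alpha_save_path(save_path, merged_path, iteration=None):
    """Build the output path for a predicted alpha matte (hypothetical helper)."""
    if iteration is None:
        # Same file name as merged_path, placed in the save directory.
        return osp.join(save_path, osp.basename(merged_path))
    # Otherwise append the iteration number as a postfix and save as PNG.
    stem = osp.splitext(osp.basename(merged_path))[0]
    return osp.join(save_path, f'{stem}_{iteration}.png')


print(alpha_save_path('results', 'data/merged/cat.png'))
print(alpha_save_path('results', 'data/merged/cat.png', iteration=100))
```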
-
forward_train
(merged, trimap, meta, alpha)[source]¶ Forward function for training GCA model.
- Parameters
merged (Tensor) – with shape (N, C, H, W) encoding input images. Typically these should be mean centered and std scaled.
trimap (Tensor) – with shape (N, C’, H, W). Tensor of trimap. C’ might be 1 or 3.
meta (list[dict]) – Meta data about the current data batch.
alpha (Tensor) – with shape (N, 1, H, W). Tensor of alpha.
- Returns
Contains the loss items and batch information.
- Return type
dict
-
class
mmedit.models.
GLInpaintor
(encdec, disc=None, loss_gan=None, loss_gp=None, loss_disc_shift=None, loss_composed_percep=None, loss_out_percep=False, loss_l1_hole=None, loss_l1_valid=None, loss_tv=None, train_cfg=None, test_cfg=None, pretrained=None)[source]¶ Inpaintor for global&local method.
This inpaintor is implemented according to the paper: Globally and Locally Consistent Image Completion
Importantly, this inpaintor is an example of using a custom training schedule based on OneStageInpaintor.
The training pipeline of global&local is as follows:
if cur_iter < iter_tc:
    update generator with only l1 loss
else:
    update discriminator
    if cur_iter > iter_td:
        update generator with l1 loss and adversarial loss
The new attribute cur_iter is added for recording the current number of iterations. The train_cfg contains the setting of the training schedule:
train_cfg = dict(
    start_iter=0,
    disc_step=1,
    iter_tc=90000,
    iter_td=100000
)
iter_tc and iter_td correspond to the notation \(T_C\) and \(T_D\) of the original paper.
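The schedule can be written as a small runnable function. This is a sketch that only reports which updates would run at a given iteration; the function name scheduled_updates and the string labels are made up for illustration.

```python
def scheduled_updates(cur_iter, iter_tc, iter_td):
    """List the updates performed at ``cur_iter`` under the global&local schedule."""
    if cur_iter < iter_tc:
        # Phase 1: generator only, trained with the l1 loss.
        return ['generator: l1']
    # Phase 2 and later: the discriminator is always updated.
    updates = ['discriminator']
    if cur_iter > iter_td:
        # Phase 3: the generator joins again, now with the adversarial loss.
        updates.append('generator: l1 + adversarial')
    return updates


train_cfg = dict(start_iter=0, disc_step=1, iter_tc=90000, iter_td=100000)
for cur_iter in (0, 95000, 100001):
    print(cur_iter,
          scheduled_updates(cur_iter, train_cfg['iter_tc'], train_cfg['iter_td']))
```

Between iter_tc and iter_td only the discriminator is trained, so the generator is frozen for that interval.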
- Parameters
encdec (dict) – Config for encoder-decoder style generator.
disc (dict) – Config for discriminator.
loss_gan (dict) – Config for adversarial loss.
loss_gp (dict) – Config for gradient penalty loss.
loss_disc_shift (dict) – Config for discriminator shift loss.
loss_composed_percep (dict) – Config for perceptual and style loss with composed image as input.
loss_out_percep (dict) – Config for perceptual and style loss with direct output as input.
loss_l1_hole (dict) – Config for l1 loss in the hole.
loss_l1_valid (dict) – Config for l1 loss in the valid region.
loss_tv (dict) – Config for total variation loss.
train_cfg (dict) – Configs for training scheduler. It must contain disc_step, which indicates the number of discriminator updates in each training step.
test_cfg (dict) – Configs for testing scheduler.
pretrained (str) – Path for pretrained model. Default None.
-
generator_loss
(fake_res, fake_img, fake_local, data_batch)[source]¶ Forward function in generator training step.
In this function, we mainly compute the loss items for generator with the given (fake_res, fake_img). In general, the fake_res is the direct output of the generator and the fake_img is the composition of direct output and ground-truth image.
- Parameters
fake_res (torch.Tensor) – Direct output of the generator.
fake_img (torch.Tensor) – Composition of fake_res and ground-truth image.
data_batch (dict) – Contains other elements for computing losses.
- Returns
A tuple containing two dictionaries. The first one is the result dict, which contains the results computed within this function for visualization. The second one is the loss dict, containing loss items computed in this function.
- Return type
tuple[dict]
-
train_step
(data_batch, optimizer)[source]¶ Train step function.
In this function, the inpaintor will finish the train step following the pipeline:
get fake res/image
optimize discriminator (if in current schedule)
optimize generator (if in current schedule)
If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing discriminator with different input data and only one iteration for optimizing generator after disc_step iterations for discriminator.
- Parameters
data_batch (torch.Tensor) – Batch of data as input.
optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).
- Returns
Dict with loss, information for logger, the number of samples and results for visualization.
- Return type
dict
-
class
mmedit.models.
IndexNet
(backbone, train_cfg=None, test_cfg=None, pretrained=None, loss_alpha=None, loss_comp=None)[source]¶ IndexNet matting model.
This implementation follows: Indices Matter: Learning to Index for Deep Image Matting
- Parameters
backbone (dict) – Config of backbone.
train_cfg (dict) – Config of training. In ‘train_cfg’, ‘train_backbone’ should be specified.
test_cfg (dict) – Config of testing.
pretrained (str) – Path of the pretrained model.
loss_alpha (dict) – Config of the alpha prediction loss. Default: None.
loss_comp (dict) – Config of the composition loss. Default: None.
-
forward_test
(merged, trimap, meta, save_image=False, save_path=None, iteration=None)[source]¶ Defines the computation performed at every test call.
- Parameters
merged (Tensor) – Image to predict alpha matte.
trimap (Tensor) – Trimap of the input image.
meta (list[dict]) – Meta data about the current data batch. Currently only batch_size 1 is supported. It may contain information needed to calculate metrics (ori_alpha and ori_trimap) or save predicted alpha matte (merged_path).
save_image (bool, optional) – Whether to save the predicted alpha matte. Defaults to False.
save_path (str, optional) – The directory to save predicted alpha matte. Defaults to None.
iteration (int, optional) – If given as None, the saved alpha matte will have the same file name as merged_path in the meta dict. If given as an int, the saved alpha matte will be named with the postfix _{iteration}.png. Defaults to None.
- Returns
Contains the predicted alpha and evaluation result.
- Return type
dict
-
forward_train
(merged, trimap, meta, alpha, ori_merged, fg, bg)[source]¶ Forward function for training IndexNet model.
- Parameters
merged (Tensor) – Input images tensor with shape (N, C, H, W). Typically these should be mean centered and std scaled.
trimap (Tensor) – Tensor of trimap with shape (N, 1, H, W).
meta (list[dict]) – Meta data about the current data batch.
alpha (Tensor) – Tensor of alpha with shape (N, 1, H, W).
ori_merged (Tensor) – Tensor of origin merged images (not normalized) with shape (N, C, H, W).
fg (Tensor) – Tensor of foreground with shape (N, C, H, W).
bg (Tensor) – Tensor of background with shape (N, C, H, W).
- Returns
Contains the loss items and batch information.
- Return type
dict
-
class
mmedit.models.
OneStageInpaintor
(encdec, disc=None, loss_gan=None, loss_gp=None, loss_disc_shift=None, loss_composed_percep=None, loss_out_percep=False, loss_l1_hole=None, loss_l1_valid=None, loss_tv=None, train_cfg=None, test_cfg=None, pretrained=None)[source]¶ Standard one-stage inpaintor with commonly used losses.
An inpaintor must contain an encoder-decoder style generator to inpaint masked regions. A discriminator will be adopted when adversarial training is needed.
In this class, we provide a common interface for inpaintors. For other inpaintors, only some functions may need to be modified to fit the input style or training schedule.
- Parameters
encdec (dict) – Config for encoder-decoder style generator.
disc (dict) – Config for discriminator.
loss_gan (dict) – Config for adversarial loss.
loss_gp (dict) – Config for gradient penalty loss.
loss_disc_shift (dict) – Config for discriminator shift loss.
loss_composed_percep (dict) – Config for perceptual and style loss with composed image as input.
loss_out_percep (dict) – Config for perceptual and style loss with direct output as input.
loss_l1_hole (dict) – Config for l1 loss in the hole.
loss_l1_valid (dict) – Config for l1 loss in the valid region.
loss_tv (dict) – Config for total variation loss.
train_cfg (dict) – Configs for training scheduler. It must contain disc_step, which indicates the number of discriminator updates in each training step.
test_cfg (dict) – Configs for testing scheduler.
pretrained (str) – Path for pretrained model. Default None.
-
forward
(masked_img, mask, test_mode=True, **kwargs)[source]¶ Forward function.
- Parameters
masked_img (torch.Tensor) – Image with hole as input.
mask (torch.Tensor) – Mask as input.
test_mode (bool, optional) – Whether to use testing mode. Defaults to True.
- Returns
Dict containing output results.
- Return type
dict
-
forward_dummy
(x)[source]¶ Forward dummy function for getting flops.
- Parameters
x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
- Returns
Results tensor with shape of (n, 3, h, w).
- Return type
torch.Tensor
-
forward_test
(masked_img, mask, save_image=False, save_path=None, iteration=None, **kwargs)[source]¶ Forward function for testing.
- Parameters
masked_img (torch.Tensor) – Tensor with shape of (n, 3, h, w).
mask (torch.Tensor) – Tensor with shape of (n, 1, h, w).
save_image (bool, optional) – If True, results will be saved as image. Defaults to False.
save_path (str, optional) – If given a valid str, the results will be saved in this path. Defaults to None.
iteration (int, optional) – Iteration number. Defaults to None.
- Returns
Contains output results and evaluation metrics (if any).
- Return type
dict
-
forward_train
(*args, **kwargs)[source]¶ Forward function for training.
In this version, we do not use this interface.
-
forward_train_d
(data_batch, is_real, is_disc)[source]¶ Forward function in discriminator training step.
In this function, we compute the prediction for each data batch (real or fake). Meanwhile, the standard gan loss will be computed with several proposed losses for stable training.
- Parameters
data_batch (torch.Tensor) – Batch of real data or fake data.
is_real (bool) – If True, the gan loss will regard this batch as real data. Otherwise, the gan loss will regard this batch as fake data.
is_disc (bool) – If True, this function is called in discriminator training step. Otherwise, this function is called in generator training step. This will help us to compute different types of adversarial loss, like LSGAN.
- Returns
Contains the loss items computed in this function.
- Return type
dict
-
generator_loss
(fake_res, fake_img, data_batch)[source]¶ Forward function in generator training step.
In this function, we mainly compute the loss items for generator with the given (fake_res, fake_img). In general, the fake_res is the direct output of the generator and the fake_img is the composition of direct output and ground-truth image.
- Parameters
fake_res (torch.Tensor) – Direct output of the generator.
fake_img (torch.Tensor) – Composition of fake_res and ground-truth image.
data_batch (dict) – Contains other elements for computing losses.
- Returns
A tuple of two dicts. The first contains the results computed within this function for visualization, and the second contains the loss items computed in this function.
- Return type
tuple(dict)
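To make the relation between fake_res and fake_img concrete, here is a minimal sketch of the composition step using flat Python lists in place of tensors. It assumes that a mask value of 1 marks the hole and 0 marks the valid region; the function name compose is hypothetical.

```python
def compose(fake_res, gt_img, mask):
    """Blend the generator output into the hole region of the ground truth.

    All arguments are flat lists of equal length standing in for tensors.
    Assumption: mask == 1 marks the hole, mask == 0 the valid region.
    """
    return [m * f + (1 - m) * g for f, g, m in zip(fake_res, gt_img, mask)]


fake_res = [0.9, 0.8, 0.7, 0.6]  # direct output of the generator
gt_img = [0.1, 0.2, 0.3, 0.4]    # ground-truth pixels
mask = [0, 1, 1, 0]              # 1 = hole, 0 = valid
fake_img = compose(fake_res, gt_img, mask)
print(fake_img)  # [0.1, 0.8, 0.7, 0.4]
```

Only the masked positions come from the generator, so losses computed on fake_img ignore whatever the generator produced in the valid region.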
-
init_weights
(pretrained=None)[source]¶ Init weights for models.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
-
save_visualization
(img, filename)[source]¶ Save visualization results.
- Parameters
img (torch.Tensor) – Tensor with shape of (n, 3, h, w).
filename (str) – Path to save visualization.
-
train_step
(data_batch, optimizer)[source]¶ Train step function.
In this function, the inpaintor will finish the train step following the pipeline:
get fake res/image
optimize discriminator (if any)
optimize generator
If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing discriminator with different input data and only one iteration for optimizing generator after disc_step iterations for discriminator.
- Parameters
data_batch (torch.Tensor) – Batch of data as input.
optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).
- Returns
Dict with loss, information for logger, the number of samples and results for visualization.
- Return type
dict
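The effect of disc_step on a sequence of train steps can be sketched with a toy scheduler that only records which networks would be optimized. The class StepScheduler and its counter attribute are assumptions for illustration, and the sketch presumes that a discriminator is configured.

```python
class StepScheduler:
    """Toy stand-in that records which networks a train step would optimize."""

    def __init__(self, disc_step=1):
        self.disc_step = disc_step
        self.disc_step_count = 0  # discriminator updates since the last generator update

    def train_step(self):
        # Every call consumes a new batch and updates the discriminator.
        updated = ['discriminator']
        self.disc_step_count = (self.disc_step_count + 1) % self.disc_step
        if self.disc_step_count == 0:
            # After disc_step discriminator updates, update the generator once.
            updated.append('generator')
        return updated


scheduler = StepScheduler(disc_step=3)
history = [scheduler.train_step() for _ in range(6)]
for step, updated in enumerate(history):
    print(step, updated)
```

With disc_step=3 the generator is optimized on the third and sixth calls only.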
-
class
mmedit.models.
PConvInpaintor
(encdec, disc=None, loss_gan=None, loss_gp=None, loss_disc_shift=None, loss_composed_percep=None, loss_out_percep=False, loss_l1_hole=None, loss_l1_valid=None, loss_tv=None, train_cfg=None, test_cfg=None, pretrained=None)[source]¶ -
forward_dummy
(x)[source]¶ Forward dummy function for getting flops.
- Parameters
x (torch.Tensor) – Input tensor with shape of (n, c, h, w).
- Returns
Results tensor with shape of (n, 3, h, w).
- Return type
torch.Tensor
-
forward_test
(masked_img, mask, save_image=False, save_path=None, iteration=None, **kwargs)[source]¶ Forward function for testing.
- Parameters
masked_img (torch.Tensor) – Tensor with shape of (n, 3, h, w).
mask (torch.Tensor) – Tensor with shape of (n, 1, h, w).
save_image (bool, optional) – If True, results will be saved as image. Defaults to False.
save_path (str, optional) – If given a valid str, the results will be saved in this path. Defaults to None.
iteration (int, optional) – Iteration number. Defaults to None.
- Returns
Contains output results and evaluation metrics (if any).
- Return type
dict
-
train_step
(data_batch, optimizer)[source]¶ Train step function.
In this function, the inpaintor will finish the train step following the pipeline:
get fake res/image
optimize discriminator (if any)
optimize generator
If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing discriminator with different input data and only one iteration for optimizing generator after disc_step iterations for discriminator.
- Parameters
data_batch (torch.Tensor) – Batch of data as input.
optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).
- Returns
Dict with loss, information for logger, the number of samples and results for visualization.
- Return type
dict
-
-
class
mmedit.models.
Pix2Pix
(generator, discriminator, gan_loss, pixel_loss=None, train_cfg=None, test_cfg=None, pretrained=None)[source]¶ Pix2Pix model for paired image-to-image translation.
Ref: Image-to-Image Translation with Conditional Adversarial Networks
- Parameters
generator (dict) – Config for the generator.
discriminator (dict) – Config for the discriminator.
gan_loss (dict) – Config for the gan loss.
pixel_loss (dict) – Config for the pixel loss. Default: None.
train_cfg (dict) – Config for training. Default: None. You may change the training of gan by setting: disc_steps: how many discriminator updates after one generator update. disc_init_steps: how many discriminator updates at the start of the training. These two keys are useful when training with WGAN. direction: image-to-image translation direction (the model training direction): a2b | b2a.
test_cfg (dict) – Config for testing. Default: None. You may change the testing of gan by setting: direction: image-to-image translation direction (the model training direction, same as testing direction): a2b | b2a. show_input: whether to show input real images.
pretrained (str) – Path for pretrained model. Default: None.
-
backward_discriminator
(outputs)[source]¶ Backward function for the discriminator.
- Parameters
outputs (dict) – Dict of forward results.
- Returns
Loss dict.
- Return type
dict
-
backward_generator
(outputs)[source]¶ Backward function for the generator.
- Parameters
outputs (dict) – Dict of forward results.
- Returns
Loss dict.
- Return type
dict
-
forward
(img_a, img_b, meta, test_mode=False, **kwargs)[source]¶ Forward function.
- Parameters
img_a (Tensor) – Input image from domain A.
img_b (Tensor) – Input image from domain B.
meta (list[dict]) – Input meta data.
test_mode (bool) – Whether in test mode or not. Default: False.
kwargs (dict) – Other arguments.
-
forward_dummy
(img)[source]¶ Used for computing network FLOPs.
- Parameters
img (Tensor) – Dummy input used to compute FLOPs.
- Returns
Dummy output produced by forwarding the dummy input.
- Return type
Tensor
-
forward_test
(img_a, img_b, meta, save_image=False, save_path=None, iteration=None)[source]¶ Forward function for testing.
- Parameters
img_a (Tensor) – Input image from domain A.
img_b (Tensor) – Input image from domain B.
meta (list[dict]) – Input meta data.
save_image (bool, optional) – If True, results will be saved as images. Default: False.
save_path (str, optional) – If given a valid str path, the results will be saved in this path. Default: None.
iteration (int, optional) – Iteration number. Default: None.
- Returns
Dict of forward and evaluation results for testing.
- Return type
dict
-
forward_train
(img_a, img_b, meta)[source]¶ Forward function for training.
- Parameters
img_a (Tensor) – Input image from domain A.
img_b (Tensor) – Input image from domain B.
meta (list[dict]) – Input meta data.
- Returns
Dict of forward results for training.
- Return type
dict
-
init_weights
(pretrained=None)[source]¶ Initialize weights for the model.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Default: None.
-
setup
(img_a, img_b, meta)[source]¶ Perform necessary pre-processing steps.
- Parameters
img_a (Tensor) – Input image from domain A.
img_b (Tensor) – Input image from domain B.
meta (list[dict]) – Input meta data.
- Returns
The real images from domain A/B, and the image path as the metadata.
- Return type
Tensor, Tensor, list[str]
-
train_step
(data_batch, optimizer)[source]¶ Training step function.
- Parameters
data_batch (dict) – Dict of the input data batch.
optimizer (dict[torch.optim.Optimizer]) – Dict of optimizers for the generator and discriminator.
- Returns
Dict of loss, information for logger, the number of samples and results for visualization.
- Return type
dict
-
class
mmedit.models.
SRGAN
(generator, discriminator=None, gan_loss=None, pixel_loss=None, perceptual_loss=None, train_cfg=None, test_cfg=None, pretrained=None)[source]¶ SRGAN model for single image super-resolution.
Ref: Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network.
- Parameters
generator (dict) – Config for the generator.
discriminator (dict) – Config for the discriminator. Default: None.
gan_loss (dict) – Config for the gan loss. Note that the loss weight in gan loss is only for the generator.
pixel_loss (dict) – Config for the pixel loss. Default: None.
perceptual_loss (dict) – Config for the perceptual loss. Default: None.
train_cfg (dict) – Config for training. Default: None. You may change the training of gan by setting: disc_steps: how many discriminator updates after one generator update; disc_init_steps: how many discriminator updates at the start of the training. These two keys are useful when training with WGAN.
test_cfg (dict) – Config for testing. Default: None.
pretrained (str) – Path for pretrained model. Default: None.
-
forward
(lq, gt=None, test_mode=False, **kwargs)[source]¶ Forward function.
- Parameters
lq (Tensor) – Input lq images.
gt (Tensor) – Ground-truth image. Default: None.
test_mode (bool) – Whether in test mode or not. Default: False.
kwargs (dict) – Other arguments.
-
class
mmedit.models.
TwoStageInpaintor
(*args, stage1_loss_type=('loss_l1_hole'), stage2_loss_type=('loss_l1_hole', 'loss_gan'), input_with_ones=True, disc_input_with_mask=False, **kwargs)[source]¶ Two-Stage Inpaintor.
Currently, we support these loss types in each of the two stages: [‘loss_gan’, ‘loss_l1_hole’, ‘loss_l1_valid’, ‘loss_composed_percep’, ‘loss_out_percep’, ‘loss_tv’]. The stage1_loss_type and stage2_loss_type should be chosen from these loss types.
- Parameters
stage1_loss_type (tuple[str]) – Contains the loss names used in the first stage model.
stage2_loss_type (tuple[str]) – Contains the loss names used in the second stage model.
input_with_ones (bool) – Whether to concatenate an extra ones tensor in input. Default: True.
disc_input_with_mask (bool) – Whether to add mask as input in discriminator. Default: False.
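A small validation sketch for the two loss-type arguments is shown below. The helper check_loss_types is hypothetical and not part of the library. It also guards against a common Python pitfall: a one-element tuple needs a trailing comma, otherwise the parentheses yield a plain str.

```python
SUPPORTED_LOSS_TYPES = (
    'loss_gan', 'loss_l1_hole', 'loss_l1_valid',
    'loss_composed_percep', 'loss_out_percep', 'loss_tv',
)


def check_loss_types(loss_types):
    """Validate the loss names of one stage (hypothetical helper)."""
    if isinstance(loss_types, str):
        # ('loss_l1_hole') is a str; ('loss_l1_hole', ) is a tuple.
        raise TypeError('expected a tuple of names; is a trailing comma missing?')
    unknown = [name for name in loss_types if name not in SUPPORTED_LOSS_TYPES]
    if unknown:
        raise ValueError(f'unsupported loss types: {unknown}')
    return tuple(loss_types)


stage1 = check_loss_types(('loss_l1_hole', ))            # one-element tuple
stage2 = check_loss_types(('loss_l1_hole', 'loss_gan'))
print(stage1, stage2)
```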
-
calculate_loss_with_type
(loss_type, fake_res, fake_img, gt, mask, prefix='stage1_')[source]¶ Calculate multiple types of losses.
- Parameters
loss_type (str) – Type of the loss.
fake_res (torch.Tensor) – Direct results from model.
fake_img (torch.Tensor) – Composited results from model.
gt (torch.Tensor) – Ground-truth tensor.
mask (torch.Tensor) – Mask tensor.
prefix (str, optional) – Prefix for loss name. Defaults to ‘stage1_’.
- Returns
Contains the loss value with its name.
- Return type
dict
-
forward_test
(masked_img, mask, save_image=False, save_path=None, iteration=None, **kwargs)[source]¶ Forward function for testing.
- Parameters
masked_img (torch.Tensor) – Tensor with shape of (n, 3, h, w).
mask (torch.Tensor) – Tensor with shape of (n, 1, h, w).
save_image (bool, optional) – If True, results will be saved as image. Defaults to False.
save_path (str, optional) – If given a valid str, the results will be saved in this path. Defaults to None.
iteration (int, optional) – Iteration number. Defaults to None.
- Returns
Contains output results and evaluation metrics (if any).
- Return type
dict
-
save_visualization
(img, filename)[source]¶ Save visualization results.
- Parameters
img (torch.Tensor) – Tensor with shape of (n, 3, h, w).
filename (str) – Path to save visualization.
-
train_step
(data_batch, optimizer)[source]¶ Train step function.
In this function, the inpaintor will finish the train step following the pipeline:
get fake res/image
optimize discriminator (if any)
optimize generator
If self.train_cfg.disc_step > 1, the train step will contain multiple iterations for optimizing discriminator with different input data and only one iteration for optimizing generator after disc_step iterations for discriminator.
- Parameters
data_batch (torch.Tensor) – Batch of data as input.
optimizer (dict[torch.optim.Optimizer]) – Dict with optimizers for generator and discriminator (if any).
- Returns
Dict with loss, information for logger, the number of samples and results for visualization.
- Return type
dict
-
mmedit.models.
build
(cfg, registry, default_args=None)[source]¶ Build module function.
- Parameters
cfg (dict) – Configuration for building modules.
registry (obj) – registry object.
default_args (dict, optional) – Default arguments. Defaults to None.
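The general idea of building a module from a config dict and a registry can be sketched as follows. This is a simplified stand-in rather than the library's implementation: a plain dict plays the role of the registry object, the 'type' key is assumed to name the class, and ToyBackbone and build_from_cfg are made up for the example.

```python
def build_from_cfg(cfg, registry, default_args=None):
    """Instantiate ``registry[cfg['type']]`` with the remaining keys as kwargs."""
    args = dict(cfg)  # copy so the caller's cfg is left untouched
    for key, value in (default_args or {}).items():
        args.setdefault(key, value)  # values in cfg win over defaults
    cls = registry[args.pop('type')]
    return cls(**args)


class ToyBackbone:
    """Hypothetical module used only for this example."""

    def __init__(self, in_channels, mid_channels=64):
        self.in_channels = in_channels
        self.mid_channels = mid_channels


registry = {'ToyBackbone': ToyBackbone}  # a plain dict stands in for the registry
model = build_from_cfg(dict(type='ToyBackbone', in_channels=3), registry,
                       default_args=dict(mid_channels=32))
print(model.in_channels, model.mid_channels)  # 3 32
```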
-
mmedit.models.
build_backbone
(cfg)[source]¶ Build backbone.
- Parameters
cfg (dict) – Configuration for building backbone.
-
mmedit.models.
build_component
(cfg)[source]¶ Build component.
- Parameters
cfg (dict) – Configuration for building component.
common¶
-
class
mmedit.models.common.
ASPP
(in_channels, out_channels=256, mid_channels=256, dilations=(12, 24, 36), conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, separable_conv=False)[source]¶ ASPP module from DeepLabV3.
The code is adapted from https://github.com/pytorch/vision/blob/master/torchvision/models/segmentation/deeplabv3.py
For more information about the module: “Rethinking Atrous Convolution for Semantic Image Segmentation”.
- Parameters
in_channels (int) – Input channels of the module.
out_channels (int) – Output channels of the module.
mid_channels (int) – Output channels of the intermediate ASPP conv modules.
dilations (Sequence[int]) – Dilation rates of the three ASPP conv modules. Default: [12, 24, 36].
conv_cfg (dict) – Config dict for convolution layer. If “None”, nn.Conv2d will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
act_cfg (dict) – Config dict for activation layer. Default: dict(type=’ReLU’).
separable_conv (bool) – Whether to replace normal conv with depthwise separable conv, which is faster. Default: False.
-
class
mmedit.models.common.
ContextualAttentionModule
(unfold_raw_kernel_size=4, unfold_raw_stride=2, unfold_raw_padding=1, unfold_corr_kernel_size=3, unfold_corr_stride=1, unfold_corr_dilation=1, unfold_corr_padding=1, scale=0.5, fuse_kernel_size=3, softmax_scale=10, return_attenion_score=True)[source]¶ Contextual attention module.
The details of this module can be found in: Generative Image Inpainting with Contextual Attention
- Parameters
unfold_raw_kernel_size (int) – Kernel size used in unfolding raw feature. Default: 4.
unfold_raw_stride (int) – Stride used in unfolding raw feature. Default: 2.
unfold_raw_padding (int) – Padding used in unfolding raw feature. Default: 1.
unfold_corr_kernel_size (int) – Kernel size used in unfolding context for computing correlation maps. Default: 3.
unfold_corr_stride (int) – Stride used in unfolding context for computing correlation maps. Default: 1.
unfold_corr_dilation (int) – Dilation used in unfolding context for computing correlation maps. Default: 1.
unfold_corr_padding (int) – Padding used in unfolding context for computing correlation maps. Default: 1.
scale (float) – The rescale factor used to resize input features. Default: 0.5.
fuse_kernel_size (int) – The kernel size used in fusion module. Default: 3.
softmax_scale (float) – The scale factor for softmax function. Default: 10.
return_attenion_score (bool) – If True, the attention score will be returned. Default: True.
-
calculate_overlap_factor
(attention_score)[source]¶ Calculate the overlap factor after applying deconv.
- Parameters
attention_score (torch.Tensor) – The attention score with shape of (n, c, h, w).
- Returns
The overlap factor will be returned.
- Return type
torch.Tensor
-
calculate_unfold_hw
(input_size, kernel_size=3, stride=1, dilation=1, padding=0)[source]¶ Calculate (h, w) after unfolding.
The official implementation of unfold in pytorch will put the dimension (h, w) into L. Thus, this function is just to calculate the (h, w) according to the equation in: https://pytorch.org/docs/stable/nn.html#torch.nn.Unfold
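The equation from the torch.nn.Unfold documentation can be written out directly. The sketch below is an independent helper (the name unfold_hw is chosen for this example) that returns the number of sliding-block positions along each spatial dimension; their product is L.

```python
def unfold_hw(input_size, kernel_size=3, stride=1, dilation=1, padding=0):
    """Number of sliding-block positions along height and width."""
    h_in, w_in = input_size

    def out_len(length):
        # floor((length + 2*padding - dilation*(kernel_size - 1) - 1) / stride + 1)
        return (length + 2 * padding - dilation * (kernel_size - 1) - 1) // stride + 1

    return out_len(h_in), out_len(w_in)


print(unfold_hw((32, 32)))                                      # (30, 30)
print(unfold_hw((32, 48), kernel_size=4, stride=2, padding=1))  # (16, 24)
```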
-
forward
(x, context, mask=None)[source]¶ Forward Function.
- Parameters
x (torch.Tensor) – Tensor with shape (n, c, h, w).
context (torch.Tensor) – Tensor with shape (n, c, h, w).
mask (torch.Tensor) – Tensor with shape (n, 1, h, w). Default: None.
- Returns
Features after contextual attention.
- Return type
tuple(torch.Tensor)
-
fuse_correlation_map
(correlation_map, h_unfold, w_unfold)[source]¶ Fuse correlation map.
This operation is to fuse correlation map for increasing large consistent correlation regions.
The mechanism behind this op is simple and easy to understand. A standard ‘Eye’ matrix will be applied as a filter on the correlation map in horizontal and vertical direction.
The shape of the input correlation map is (n, h_unfold*w_unfold, h, w). When fusing, we apply a convolutional filter to the reshaped feature map with shape of (n, 1, h_unfold*w_unfold, h*w).
A simple specification for horizontal direction is shown below:
            (h, 0)  (h, 1)  (h, 2)  (h, 3)  ...
    (h, 0)
    (h, 1)     1
    (h, 2)             1
    (h, 3)                     1
    ...
-
im2col
(img, kernel_size, stride=1, padding=0, dilation=1, normalize=False, return_cols=False)[source]¶ Reshape image-style feature to columns.
This function is used to unfold feature maps to columns. The details of this function can be found in: https://pytorch.org/docs/1.1.0/nn.html?highlight=unfold#torch.nn.Unfold
- Parameters
img (torch.Tensor) – Features to be unfolded. The shape of this feature should be (n, c, h, w).
kernel_size (int) – In this function, we only support square kernel with same height and width.
stride (int) – Stride number in unfolding. Default: 1.
padding (int) – Padding number in unfolding. Default: 0.
dilation (int) – Dilation number in unfolding. Default: 1.
normalize (bool) – If True, the unfolded feature will be normalized. Default: False.
return_cols (bool) – The official implementation in PyTorch of unfolding will return features with shape of (n, \(c \times \prod(\mathrm{kernel\_size})\), L). If True, the features will be reshaped to (n, L, c, kernel_size, kernel_size). Otherwise, the results will maintain the shape as the official implementation.
- Returns
Unfolded columns. If return_cols is True, the shape of output tensor is (n, L, c, kernel_size, kernel_size). Otherwise, the shape will be (n, \(c \times \prod(\mathrm{kernel\_size})\), L).
- Return type
torch.Tensor
-
mask_correlation_map
(correlation_map, mask)[source]¶ Add mask weight for correlation map.
Add a negative infinity number to the masked regions so that softmax function will result in ‘zero’ in those regions.
- Parameters
correlation_map (torch.Tensor) – Correlation map with shape of (n, h_unfold*w_unfold, h_map, w_map).
mask (torch.Tensor) – Mask tensor with shape of (n, c, h, w). ‘1’ in the mask indicates masked region while ‘0’ indicates valid region.
- Returns
Updated correlation map with mask.
- Return type
torch.Tensor
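Why adding negative infinity produces zeros after softmax can be seen in a few lines of plain Python. The function masked_softmax below is an illustrative sketch on 1-D lists, not the module's tensor code, and it assumes at least one position is left unmasked.

```python
import math


def masked_softmax(scores, mask):
    """Softmax over ``scores`` where positions with mask == 1 are suppressed.

    Assumes at least one position has mask == 0.
    """
    # Adding -inf to masked positions drives their exponent to exactly 0.
    masked = [s + float('-inf') if m == 1 else s for s, m in zip(scores, mask)]
    peak = max(masked)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in masked]
    total = sum(exps)
    return [e / total for e in exps]


probs = masked_softmax([2.0, 1.0, 3.0], mask=[0, 1, 0])
print(probs)
```

The masked position receives probability 0.0, and the remaining probabilities still sum to 1.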
-
class
mmedit.models.common.
DepthwiseSeparableConvModule
(in_channels, out_channels, kernel_size, stride=1, padding=0, dilation=1, norm_cfg=None, act_cfg={'type': 'ReLU'}, dw_norm_cfg='default', dw_act_cfg='default', pw_norm_cfg='default', pw_act_cfg='default', **kwargs)[source]¶ Depthwise separable convolution module.
See https://arxiv.org/pdf/1704.04861.pdf for details.
This module can replace a ConvModule with the conv block replaced by two conv blocks: a depthwise conv block and a pointwise conv block. The depthwise conv block contains depthwise-conv/norm/activation layers. The pointwise conv block contains pointwise-conv/norm/activation layers. It should be noted that there will be norm/activation layers in the depthwise conv block if norm_cfg and act_cfg are specified.
- Parameters
in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
stride (int or tuple[int]) – Same as nn.Conv2d. Default: 1.
padding (int or tuple[int]) – Same as nn.Conv2d. Default: 0.
dilation (int or tuple[int]) – Same as nn.Conv2d. Default: 1.
norm_cfg (dict) – Default norm config for both depthwise ConvModule and pointwise ConvModule. Default: None.
act_cfg (dict) – Default activation config for both depthwise ConvModule and pointwise ConvModule. Default: dict(type=’ReLU’).
dw_norm_cfg (dict) – Norm config of depthwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.
dw_act_cfg (dict) – Activation config of depthwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.
pw_norm_cfg (dict) – Norm config of pointwise ConvModule. If it is ‘default’, it will be the same as norm_cfg. Default: ‘default’.
pw_act_cfg (dict) – Activation config of pointwise ConvModule. If it is ‘default’, it will be the same as act_cfg. Default: ‘default’.
kwargs (optional) – Other shared arguments for depthwise and pointwise ConvModule. See ConvModule for ref.
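The saving from splitting a convolution into depthwise and pointwise parts can be checked with simple arithmetic. The two helper functions below are written for this example and count convolution weights only, ignoring bias, norm and activation layers.

```python
def conv_params(in_channels, out_channels, kernel_size):
    """Weights in a standard 2D convolution (bias ignored)."""
    return in_channels * out_channels * kernel_size * kernel_size


def separable_conv_params(in_channels, out_channels, kernel_size):
    """Weights in a depthwise conv followed by a 1x1 pointwise conv (bias ignored)."""
    depthwise = in_channels * kernel_size * kernel_size  # one k x k filter per channel
    pointwise = in_channels * out_channels               # 1x1 conv mixing channels
    return depthwise + pointwise


print(conv_params(256, 256, 3))            # 589824
print(separable_conv_params(256, 256, 3))  # 67840
```

For 256 input and output channels with a 3x3 kernel, the separable version needs roughly 11% of the weights of the standard convolution.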
-
class
mmedit.models.common.
GANImageBuffer
(buffer_size, buffer_ratio=0.5)[source]¶ This class implements an image buffer that stores previously generated images.
This buffer allows us to update the discriminator using a history of generated images rather than the ones produced by the latest generator to reduce model oscillation.
- Parameters
buffer_size (int) – The size of image buffer. If buffer_size = 0, no buffer will be created.
buffer_ratio (float) – The probability of using the images previously stored in the buffer.
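A history buffer of this kind is commonly implemented along the following lines. ToyImageBuffer is an illustrative sketch, not the library's GANImageBuffer: the query method name, the rng argument and the replacement policy are assumptions, and arbitrary Python objects stand in for image tensors.

```python
import random


class ToyImageBuffer:
    """Illustrative history buffer for generated images."""

    def __init__(self, buffer_size, buffer_ratio=0.5, rng=None):
        self.buffer_size = buffer_size
        self.buffer_ratio = buffer_ratio
        self.images = []
        self.rng = rng or random.Random()

    def query(self, images):
        if self.buffer_size == 0:  # no buffer: pass the images through
            return list(images)
        out = []
        for image in images:
            if len(self.images) < self.buffer_size:
                # Buffer not full yet: store the image and return it.
                self.images.append(image)
                out.append(image)
            elif self.rng.random() < self.buffer_ratio:
                # Return a stored image and replace it with the new one.
                idx = self.rng.randrange(self.buffer_size)
                out.append(self.images[idx])
                self.images[idx] = image
            else:
                # Return the new image and leave the buffer unchanged.
                out.append(image)
        return out


buffer = ToyImageBuffer(buffer_size=2, buffer_ratio=0.5)
print(buffer.query(['img0', 'img1']))  # buffer is filling: inputs are returned
print(buffer.query(['img2']))          # either 'img2' or an older image
```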
-
class
mmedit.models.common.
GCAModule
(in_channels, out_channels, kernel_size=3, stride=1, rate=2, pad_args={'mode': 'reflect'}, interpolation='nearest', penalty=- 10000.0, eps=0.0001)[source]¶ Guided Contextual Attention Module.
From https://arxiv.org/pdf/2001.04069.pdf. Based on https://github.com/nbei/Deep-Flow-Guided-Video-Inpainting. This module uses the image feature map to augment the alpha feature map with a guided contextual attention score.
Image feature and alpha feature are unfolded into small patches that are later used as conv kernels. Thus, we refer to the unfolding size as the kernel size. Image feature patches have a default kernel size of 3, while the kernel size of alpha feature patches can be specified by rate (see rate below). The image feature patches are convolved with the image feature itself to calculate the contextual attention. Then the attention feature map is convolved with the alpha feature patches to obtain the attentioned alpha feature. Finally, the attentioned alpha feature is added to the input alpha feature.
- Parameters
in_channels (int) – Input channels of the guided contextual attention module.
out_channels (int) – Output channels of the guided contextual attention module.
kernel_size (int) – Kernel size of image feature patches. Default 3.
stride (int) – Stride when unfolding the image feature. Default 1.
rate (int) – The downsample rate of image feature map. The corresponding kernel size and stride of alpha feature patches will be rate x 2 and rate. It could be regarded as the granularity of the gca module. Default: 2.
pad_args (dict) – Parameters of padding when convolving the image feature with image feature patches or alpha feature patches. Allowed keys are mode and value. See torch.nn.functional.pad() for more information. Default: dict(mode=’reflect’).
interpolation (str) – Interpolation method in upsampling and downsampling.
penalty (float) – Punishment hyperparameter to avoid a large correlation between each unknown patch and itself.
eps (float) – A small number to avoid dividing by 0 when calculating the normed image feature patch. Default: 1e-4.
-
compute_guided_attention_score
(similarity_map, unknown_ps, scale, self_mask)[source]¶ Compute guided attention score.
- Parameters
similarity_map (Tensor) – Similarity map of image feature with shape (1, img_h*img_w, img_h, img_w).
unknown_ps (Tensor) – Unknown area patches tensor of shape (1, img_h*img_w, 1, 1).
scale (Tensor) – Softmax scale of known and unknown area: [unknown_scale, known_scale].
self_mask (Tensor) – Self correlation mask of shape (1, img_h*img_w, img_h, img_w). At (1, i*i, i, i) mask value equals -1e4 for i in [1, img_h*img_w] and other area is all zero.
- Returns
Similarity map between image feature patches with shape (1, img_h*img_w, img_h, img_w).
- Return type
Tensor
-
compute_similarity_map
(img_feat, img_ps)[source]¶ Compute similarity between image feature patches.
- Parameters
img_feat (Tensor) – Image feature map of shape (1, img_c, img_h, img_w).
img_ps (Tensor) – Image feature patches tensor of shape (1, img_h*img_w, img_c, img_ks, img_ks).
- Returns
Similarity map between image feature patches with shape (1, img_h*img_w, img_h, img_w).
- Return type
Tensor
-
extract_feature_maps_patches
(img_feat, alpha_feat, unknown)[source]¶ Extract image feature patches, alpha feature patches and unknown patches.
- Parameters
img_feat (Tensor) – Image feature map of shape (N, img_c, img_h, img_w).
alpha_feat (Tensor) – Alpha feature map of shape (N, alpha_c, ori_h, ori_w).
unknown (Tensor, optional) – Unknown area map generated by trimap of shape (N, 1, img_h, img_w).
- Returns
3-tuple of
Tensor: Image feature patches of shape (N, img_h*img_w, img_c, img_ks, img_ks).
Tensor: Guided contextual attentioned alpha feature map of shape (N, img_h*img_w, alpha_c, alpha_ks, alpha_ks).
Tensor: Unknown mask of shape (N, img_h*img_w, 1, 1).
- Return type
tuple
-
extract_patches
(x, kernel_size, stride)[source]¶ Extract feature patches.
The feature map will be padded automatically to make sure the number of patches is equal to (H / stride) * (W / stride).
- Parameters
x (Tensor) – Feature map of shape (N, C, H, W).
kernel_size (int) – Size of each patch.
stride (int) – Stride between patches.
- Returns
Extracted patches of shape (N, (H / stride) * (W / stride) , C, kernel_size, kernel_size).
- Return type
Tensor
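The patch count stated above can be checked with a little shape arithmetic. The sketch below is an assumption-laden illustration rather than the library code: patch_tensor_shape is a hypothetical helper, the channel and spatial sizes are made up, and it only covers the case where stride divides the height and width.

```python
def patch_tensor_shape(n, c, h, w, kernel_size, stride):
    """Shape of the patch tensor described above, assuming stride divides h and w."""
    if h % stride or w % stride:
        raise ValueError('this sketch assumes h and w are multiples of stride')
    num_patches = (h // stride) * (w // stride)
    return (n, num_patches, c, kernel_size, kernel_size)


# Image feature patches with the default kernel_size=3 and stride=1.
img_ps_shape = patch_tensor_shape(1, 128, 32, 32, kernel_size=3, stride=1)

# Alpha feature patches for rate=2: kernel size is rate x 2, stride is rate.
rate = 2
alpha_ps_shape = patch_tensor_shape(
    1, 128, 64, 64, kernel_size=rate * 2, stride=rate)
```

With these illustrative sizes both tensors contain 32 * 32 = 1024 patches, which is the img_h*img_w dimension that appears in the shapes documented for GCAModule.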
-
forward
(img_feat, alpha_feat, unknown=None, softmax_scale=1.0)[source]¶ Forward function of GCAModule.
- Parameters
img_feat (Tensor) – Image feature map of shape (N, ori_c, ori_h, ori_w).
alpha_feat (Tensor) – Alpha feature map of shape (N, alpha_c, ori_h, ori_w).
unknown (Tensor, optional) – Unknown area map generated by trimap. If specified, this tensor should have shape (N, 1, ori_h, ori_w).
softmax_scale (float, optional) – The softmax scale of the attention if unknown area is not provided in forward. Default: 1.
- Returns
The augmented alpha feature.
- Return type
Tensor
-
process_unknown_mask
(unknown, img_feat, softmax_scale)[source]¶ Process unknown mask.
- Parameters
unknown (Tensor, optional) – Unknown area map generated by trimap of shape (N, 1, ori_h, ori_w)
img_feat (Tensor) – The interpolated image feature map of shape (N, img_c, img_h, img_w).
softmax_scale (float, optional) – The softmax scale of the attention if unknown area is not provided in forward. Default: 1.
- Returns
2-tuple of
Tensor: Interpolated unknown area map of shape (N, img_h*img_w, img_h, img_w).
Tensor: Softmax scale tensor of known and unknown area of shape (N, 2).
- Return type
tuple
-
propagate_alpha_feature
(gca_score, alpha_ps)[source]¶ Propagate alpha feature based on guided attention score.
- Parameters
gca_score (Tensor) – Guided attention score map of shape (1, img_h*img_w, img_h, img_w).
alpha_ps (Tensor) – Alpha feature patches tensor of shape (1, img_h*img_w, alpha_c, alpha_ks, alpha_ks).
- Returns
Propagated alpha feature map of shape (1, alpha_c, alpha_h, alpha_w).
- Return type
Tensor
-
class
mmedit.models.common.
LinearModule
(in_features, out_features, bias=True, act_cfg={'type': 'ReLU'}, inplace=True, with_spectral_norm=False, order=('linear', 'act'))[source]¶ A linear block that contains linear/norm/activation layers.
For low-level vision, we add spectral norm and padding layer.
- Parameters
in_features (int) – Same as nn.Linear.
out_features (int) – Same as nn.Linear.
bias (bool) – Same as nn.Linear.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
inplace (bool) – Whether to use inplace mode for activation.
with_spectral_norm (bool) – Whether to use spectral norm in linear module.
order (tuple[str]) – The order of linear/activation layers. It is a sequence of “linear”, “norm” and “act”. Examples are (“linear”, “act”) and (“act”, “linear”).
-
class
mmedit.models.common.
MaskConvModule
(*args, **kwargs)[source]¶ Mask convolution module.
This is a simple wrapper for mask convolution like: ‘partial conv’. Convolutions in this module always need a mask as extra input.
- Parameters
in_channels (int) – Same as nn.Conv2d.
out_channels (int) – Same as nn.Conv2d.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
stride (int or tuple[int]) – Same as nn.Conv2d.
padding (int or tuple[int]) – Same as nn.Conv2d.
dilation (int or tuple[int]) – Same as nn.Conv2d.
groups (int) – Same as nn.Conv2d.
bias (bool or str) – If specified as auto, it will be decided by the norm_cfg. Bias will be set as True if norm_cfg is None, otherwise False.
conv_cfg (dict) – Config dict for convolution layer.
norm_cfg (dict) – Config dict for normalization layer.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
inplace (bool) – Whether to use inplace mode for activation.
with_spectral_norm (bool) – Whether to use spectral norm in conv module.
padding_mode (str) – If the padding_mode is not supported by the current Conv2d in PyTorch, we will use our own padding layer instead. Currently, we support [‘zeros’, ‘circular’] with the official implementation and [‘reflect’] with our own implementation. Default: ‘zeros’.
order (tuple[str]) – The order of conv/norm/activation layers. It is a sequence of “conv”, “norm” and “act”. Examples are (“conv”, “norm”, “act”) and (“act”, “conv”, “norm”).
-
forward
(x, mask=None, activate=True, norm=True, return_mask=True)[source]¶ Forward function for partial conv2d.
- Parameters
x (torch.Tensor) – Tensor with shape of (n, c, h, w).
mask (torch.Tensor) – Tensor with shape of (n, c, h, w) or (n, 1, h, w). If mask is not given, the function will work as standard conv2d. Default: None.
activate (bool) – Whether to use the activation layer.
norm (bool) – Whether to use the norm layer.
return_mask (bool) – If True and mask is not None, the updated mask will be returned. Default: True.
- Returns
Result Tensor or 2-tuple of
Tensor: Results after partial conv.
Tensor: Updated mask will be returned if mask is given and return_mask is True.
- Return type
Tensor or tuple
-
class
mmedit.models.common.
PartialConv2d
(*args, multi_channel=False, eps=1e-08, **kwargs)[source]¶ Implementation for partial convolution.
Image Inpainting for Irregular Holes Using Partial Convolutions [https://arxiv.org/abs/1804.07723]
- Parameters
multi_channel (bool) – If True, the mask is multi-channel. Otherwise, the mask is single-channel.
eps (float) – Needs to be changed for mixed precision training. For mixed precision training, change 1e-8 to 1e-6.
-
forward
(input, mask=None, return_mask=True)[source]¶ Forward function for partial conv2d.
- Parameters
input (torch.Tensor) – Tensor with shape of (n, c, h, w).
mask (torch.Tensor) – Tensor with shape of (n, c, h, w) or (n, 1, h, w). If mask is not given, the function will work as standard conv2d. Default: None.
return_mask (bool) – If True and mask is not None, the updated mask will be returned. Default: True.
- Returns
Results after partial conv. torch.Tensor : Updated mask will be returned if mask is given and return_mask is True.
- Return type
torch.Tensor
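To make the mask handling concrete, here is a plain-Python sketch of partial convolution at a single kernel position, following the formulation in the paper cited above: the response is computed over valid pixels only and rescaled by the ratio of window size to valid pixel count. partial_conv_window is a hypothetical helper operating on flat lists; it is not the PartialConv2d implementation, and the exact placement of eps is an assumption.

```python
def partial_conv_window(values, mask, weights, bias=0.0, eps=1e-8):
    """Apply partial convolution to one window given as flat lists.

    Returns (output, updated_mask) for that window position.
    """
    valid = sum(mask)
    if valid == 0:
        # No valid input pixel under the kernel: output 0, mask stays 0.
        return 0.0, 0.0
    # Convolve only the valid (mask == 1) inputs.
    weighted = sum(w * x * m for w, x, m in zip(weights, values, mask))
    # Rescale by (window size / number of valid pixels).
    ratio = len(mask) / (valid + eps)
    return weighted * ratio + bias, 1.0
```

For example, a 2x2 window with values [1, 2, 3, 4], all-ones weights and mask [1, 1, 0, 0] gives roughly 6: the two valid pixels sum to 3, and the result is scaled by 4 / 2. The updated mask for that position becomes 1 because at least one input pixel was valid.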
-
class
mmedit.models.common.
PixelShufflePack
(in_channels, out_channels, scale_factor, upsample_kernel)[source]¶ Pixel Shuffle upsample layer.
- Parameters
in_channels (int) – Number of input channels.
out_channels (int) – Number of output channels.
scale_factor (int) – Upsample ratio.
upsample_kernel (int) – Kernel size of Conv layer to expand channels.
- Returns
Upsampled feature map.
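The rearrangement behind a pixel-shuffle upsample can be sketched on nested lists. This assumes the usual design in which a conv layer first expands the features to out_channels * scale_factor ** 2 channels, and it follows the channel ordering of torch.nn.PixelShuffle; pixel_shuffle here is an illustrative pure-Python function, not the layer itself.

```python
def pixel_shuffle(feat, scale):
    """Rearrange a (C * scale**2, H, W) nested list into (C, H * scale, W * scale)."""
    in_c, h, w = len(feat), len(feat[0]), len(feat[0][0])
    if in_c % (scale * scale):
        raise ValueError('channel count must be divisible by scale ** 2')
    out_c = in_c // (scale * scale)
    out = [[[0] * (w * scale) for _ in range(h * scale)] for _ in range(out_c)]
    for c in range(out_c):
        for i in range(scale):
            for j in range(scale):
                # Each group of scale**2 input channels fills one output channel.
                src = feat[c * scale * scale + i * scale + j]
                for y in range(h):
                    for x in range(w):
                        out[c][y * scale + i][x * scale + j] = src[y][x]
    return out
```

Four 1x1 input channels therefore become one 2x2 output channel when scale is 2.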
-
class
mmedit.models.common.
ResidualBlockNoBN
(mid_channels=64, res_scale=1.0)[source]¶ Residual block without BN.
It has a style of:
---Conv-ReLU-Conv-+-
 |________________|
- Parameters
mid_channels (int) – Channel number of intermediate features. Default: 64.
res_scale (float) – Used to scale the residual before addition. Default: 1.0.
-
forward
(x)[source]¶ Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
-
init_weights
()[source]¶ Initialize weights for ResidualBlockNoBN.
Initialization methods like kaiming_init are for VGG-style modules. For modules with residual paths, using a smaller std is better for stability and performance. We empirically use 0.1. See more details in “ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks”.
-
class
mmedit.models.common.
ResidualBlockWithDropout
(channels, padding_mode, norm_cfg={'type': 'BN'}, use_dropout=True)[source]¶ Define a Residual Block with dropout layers.
Ref: Deep Residual Learning for Image Recognition
A residual block is a conv block with skip connections. A dropout layer is added between two common conv modules.
- Parameters
channels (int) – Number of channels in the conv layer.
padding_mode (str) – The name of padding layer: ‘reflect’ | ‘replicate’ | ‘zeros’.
norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’BN’).
use_dropout (bool) – Whether to use dropout layers. Default: True.
-
class
mmedit.models.common.
SimpleGatedConvModule
(in_channels, out_channels, kernel_size, feat_act_cfg={'type': 'ELU'}, gate_act_cfg={'type': 'Sigmoid'}, **kwargs)[source]¶ Simple Gated Convolutional Module.
This module is a simple gated convolutional module. The detailed formula is:
\[y = \phi(conv1(x)) * \sigma(conv2(x)),\]
where phi is the feature activation function and sigma is the gate activation function. By default, the gate activation function is sigmoid.
- Parameters
in_channels (int) – Same as nn.Conv2d.
out_channels (int) – The number of channels of the output feature. Note that out_channels in the conv module is doubled since this module contains two convolutions for feature and gate separately.
kernel_size (int or tuple[int]) – Same as nn.Conv2d.
feat_act_cfg (dict) – Config dict for feature activation layer.
gate_act_cfg (dict) – Config dict for gate activation layer.
kwargs (keyword arguments) – Same as ConvModule.
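The gating formula above can be evaluated for a single pair of responses with the default activations (ELU for the feature branch, sigmoid for the gate). The function names below are illustrative only, and the two scalar inputs stand in for the outputs of conv1 and conv2 at one location.

```python
import math


def elu(v, alpha=1.0):
    """ELU activation for a scalar."""
    return v if v > 0 else alpha * (math.exp(v) - 1.0)


def sigmoid(v):
    """Sigmoid activation for a scalar."""
    return 1.0 / (1.0 + math.exp(-v))


def gated_output(feat, gate):
    """Combine one feature response and one gate response: phi(feat) * sigma(gate)."""
    return elu(feat) * sigmoid(gate)
```

A gate response of 0 passes half of the activated feature, while a large positive gate response passes nearly all of it.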
-
class
mmedit.models.common.
UnetSkipConnectionBlock
(outer_channels, inner_channels, in_channels=None, submodule=None, is_outermost=False, is_innermost=False, norm_cfg={'type': 'BN'}, use_dropout=False)[source]¶ Construct a Unet submodule with skip connections, with the following structure: downsampling - submodule - upsampling.
- Parameters
outer_channels (int) – Number of channels at the outer conv layer.
inner_channels (int) – Number of channels at the inner conv layer.
in_channels (int) – Number of channels in input images/features. If None, it equals outer_channels. Default: None.
submodule (UnetSkipConnectionBlock) – Previously constructed submodule. Default: None.
is_outermost (bool) – Whether this module is the outermost module. Default: False.
is_innermost (bool) – Whether this module is the innermost module. Default: False.
norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’BN’).
use_dropout (bool) – Whether to use dropout layers. Default: False.
-
mmedit.models.common.
default_init_weights
(module, scale=1)[source]¶ Initialize network weights.
- Parameters
module (nn.Module) – Module to be initialized.
scale (float) – Scale initialized weights, especially for residual blocks.
-
mmedit.models.common.
extract_around_bbox
(img, bbox, target_size, channel_first=True)[source]¶ Extract patches around the given bbox.
- Parameters
bbox (np.ndarray | torch.Tensor) – Bboxes to be modified. Bbox can be in batch or not.
target_size (List(int)) – Target size of final bbox.
- Returns
Extracted patches. The dimension of the output should be the same as img.
- Return type
(torch.Tensor | numpy.array)
-
mmedit.models.common.
extract_bbox_patch
(bbox, img, channel_first=True)[source]¶ Extract a patch from a given bbox.
- Parameters
bbox (torch.Tensor | numpy.array) – Bbox with (top, left, h, w). If img has batch dimension, the bbox must be stacked at first dimension. The shape should be (4,) or (n, 4).
img (torch.Tensor | numpy.array) – Image data to be extracted. If organized in batch dimension, the batch dimension must be the first order like (n, h, w, c) or (n, c, h, w).
channel_first (bool) – If True, the channel dimension of img is before height and width, e.g. (c, h, w). Otherwise, the img shape (samples in the batch) is like (h, w, c).
- Returns
Extracted patches. The dimension of the output should be the same as img.
- Return type
(torch.Tensor | numpy.array)
-
mmedit.models.common.
flow_warp
(x, flow, interpolation='bilinear', padding_mode='zeros', align_corners=True)[source]¶ Warp an image or a feature map with optical flow.
- Parameters
x (Tensor) – Tensor with size (n, c, h, w).
flow (Tensor) – Tensor with size (n, h, w, 2). The last dimension has two channels, denoting the relative offsets along width and height. Note that the values are not normalized to [-1, 1].
interpolation (str) – Interpolation mode: ‘nearest’ or ‘bilinear’. Default: ‘bilinear’.
padding_mode (str) – Padding mode: ‘zeros’ or ‘border’ or ‘reflection’. Default: ‘zeros’.
align_corners (bool) – Whether to align corners. Default: True.
- Returns
Warped image or feature map.
- Return type
Tensor
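Because the flow is given in pixels while torch.nn.functional.grid_sample expects sampling coordinates in [-1, 1], a warp of this kind needs a normalization step. The sketch below shows one plausible conversion for a single coordinate under the align_corners=True convention; normalized_sampling_coord is a hypothetical helper, and whether flow_warp uses exactly this expression is an assumption.

```python
def normalized_sampling_coord(pixel, offset, size):
    """Map pixel + offset (in pixels) to [-1, 1], align_corners=True convention."""
    return 2.0 * (pixel + offset) / max(size - 1, 1) - 1.0
```

With a width of 5 and zero flow, the first column maps to -1, the middle column to 0 and the last column to 1; a flow of one pixel to the right shifts column 1 onto the middle sampling position.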
-
mmedit.models.common.
generation_init_weights
(module, init_type='normal', init_gain=0.02)[source]¶ Default initialization of network weights for image generation.
By default, we use normal init, but xavier and kaiming might work better for some applications.
- Parameters
module (nn.Module) – Module to be initialized.
init_type (str) – The name of an initialization method: normal | xavier | kaiming | orthogonal.
init_gain (float) – Scaling factor for normal, xavier and orthogonal.
-
mmedit.models.common.
make_layer
(block, num_blocks, **kwarg)[source]¶ Make layers by stacking the same blocks.
- Parameters
block (nn.Module) – nn.Module class for basic block.
num_blocks (int) – number of blocks.
- Returns
Stacked blocks in nn.Sequential.
- Return type
nn.Sequential
-
mmedit.models.common.
scale_bbox
(bbox, target_size)[source]¶ Modify bbox to target size.
The original bbox will be enlarged to the target size with the original bbox in the center of the new bbox.
- Parameters
bbox (np.ndarray | torch.Tensor) – Bboxes to be modified. Bbox can be in batch or not. The shape should be (4,) or (n, 4).
target_size (tuple[int]) – Target size of final bbox.
- Returns
Modified bboxes.
- Return type
(np.ndarray | torch.Tensor)
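A rough sketch of the enlargement for a single box, assuming the (top, left, h, w) layout documented for extract_bbox_patch. scale_bbox_centered is a hypothetical name, and both the integer rounding and the clamping at the image origin are assumptions made for illustration rather than confirmed behaviour of scale_bbox.

```python
def scale_bbox_centered(bbox, target_size):
    """Enlarge a (top, left, h, w) box to target_size, keeping it centered."""
    top, left, h, w = bbox
    target_h, target_w = target_size
    if target_h < h or target_w < w:
        raise ValueError('target_size must not be smaller than the bbox')
    # Move the top-left corner up/left by half of the size difference,
    # clamping at the image origin (an assumption of this sketch).
    new_top = max(0, top - (target_h - h) // 2)
    new_left = max(0, left - (target_w - w) // 2)
    return (new_top, new_left, target_h, target_w)
```

A 4x6 box at (10, 20) enlarged to 8x10 moves two pixels up and two pixels left, so the original box sits in the middle of the new one.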
backbones¶
-
class
mmedit.models.backbones.
ContextualAttentionNeck
(in_channels, conv_type='conv', conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ELU'}, contextual_attention_args={'softmax_scale': 10.0}, **kwargs)[source]¶ Neck with contextual attention module.
- Parameters
in_channels (int) – The number of input channels.
conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.
conv_cfg (dict | None) – Config of conv module. Default: None.
norm_cfg (dict | None) – Config of norm module. Default: None.
act_cfg (dict | None) – Config of activation layer. Default: dict(type=’ELU’).
contextual_attention_args (dict) – Config of contextual attention module. Default: dict(softmax_scale=10.).
kwargs (keyword arguments) –
-
class
mmedit.models.backbones.
DeepFillDecoder
(in_channels, conv_type='conv', norm_cfg=None, act_cfg={'type': 'ELU'}, out_act_cfg={'max': 1.0, 'min': - 1.0, 'type': 'clip'}, channel_factor=1.0, **kwargs)[source]¶ Decoder used in DeepFill model.
This implementation follows: Generative Image Inpainting with Contextual Attention
- Parameters
in_channels (int) – The number of input channels.
conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.
norm_cfg (dict) – Config dict to build norm layer. Default: None.
act_cfg (dict) – Config dict for activation layer, “elu” by default.
out_act_cfg (dict) – Config dict for output activation layer. Here, we provide commonly used clamp or clip operation.
channel_factor (float) – The scale factor for channel size. Default: 1.
kwargs (keyword arguments) –
-
class
mmedit.models.backbones.
DeepFillEncoder
(in_channels=5, conv_type='conv', norm_cfg=None, act_cfg={'type': 'ELU'}, encoder_type='stage1', channel_factor=1.0, **kwargs)[source]¶ Encoder used in DeepFill model.
This implementation follows: Generative Image Inpainting with Contextual Attention
- Parameters
in_channels (int) – The number of input channels. Default: 5.
conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.
norm_cfg (dict) – Config dict to build norm layer. Default: None.
act_cfg (dict) – Config dict for activation layer, “elu” by default.
encoder_type (str) – Type of the encoder. Should be one of [‘stage1’, ‘stage2_conv’, ‘stage2_attention’]. Default: ‘stage1’.
channel_factor (float) – The scale factor for channel size. Default: 1.
kwargs (keyword arguments) –
-
class
mmedit.models.backbones.
DeepFillEncoderDecoder
(stage1={'decoder': {'in_channels': 128, 'type': 'DeepFillDecoder'}, 'dilation_neck': {'act_cfg': {'type': 'ELU'}, 'in_channels': 128, 'type': 'GLDilationNeck'}, 'encoder': {'type': 'DeepFillEncoder'}, 'type': 'GLEncoderDecoder'}, stage2={'type': 'DeepFillRefiner'}, return_offset=False)[source]¶ Two-stage encoder-decoder structure used in DeepFill model.
The details are in: Generative Image Inpainting with Contextual Attention
- Parameters
stage1 (dict) – Config dict for building stage1 model. As DeepFill model uses Global&Local model as baseline in first stage, the stage1 model can be easily built with GLEncoderDecoder.
stage2 (dict) – Config dict for building stage2 model.
return_offset (bool) – Whether to return offset feature in contextual attention module. Default: False.
-
forward
(x)[source]¶ Forward function.
- Parameters
x (torch.Tensor) – This input tensor has the shape of (n, 5, h, w). In channel dimension, we concatenate [masked_img, ones, mask] as DeepFillv1 models do.
- Returns
The first two items are the results from the first and second stages. If return_offset is True, the offset will be returned as the third item.
- Return type
tuple[torch.Tensor]
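Written out as a config dict, the default arguments from the signature above look as follows. The variable name backbone_cfg is arbitrary, and the channel breakdown assumes a 3-channel masked image, which is consistent with the documented 5-channel input but is not stated explicitly.

```python
# Default two-stage configuration, copied from the class signature.
backbone_cfg = dict(
    type='DeepFillEncoderDecoder',
    stage1=dict(
        type='GLEncoderDecoder',
        encoder=dict(type='DeepFillEncoder'),
        decoder=dict(type='DeepFillDecoder', in_channels=128),
        dilation_neck=dict(
            type='GLDilationNeck', in_channels=128, act_cfg=dict(type='ELU'))),
    stage2=dict(type='DeepFillRefiner'),
    return_offset=False)

# The input has 5 channels: masked image (3, assumed RGB) + ones (1) + mask (1).
input_channels = 3 + 1 + 1
```

Setting return_offset=True would add the offset from the contextual attention module as a third item in the returned tuple.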
-
class
mmedit.models.backbones.
DepthwiseIndexBlock
(in_channels, norm_cfg={'type': 'BN'}, use_context=False, use_nonlinear=False, mode='o2o')[source]¶ Depthwise index block.
From https://arxiv.org/abs/1908.00672.
- Parameters
in_channels (int) – Input channels of the depthwise index block.
kernel_size (int) – Kernel size of the conv layers. Default: 2.
padding (int) – Padding number of the conv layers. Default: 0.
mode (str) – Mode of index block. Should be ‘o2o’ or ‘m2o’. In ‘o2o’ mode, the group of the conv layers is 1; In ‘m2o’ mode, the group of the conv layer is in_channels.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
use_nonlinear (bool) – Whether to add a non-linear conv layer in the index blocks. Default: False.
-
class
mmedit.models.backbones.
EDSR
(in_channels, out_channels, mid_channels=64, num_blocks=16, upscale_factor=4, res_scale=1, rgb_mean=(0.4488, 0.4371, 0.404), rgb_std=(1.0, 1.0, 1.0))[source]¶ EDSR network structure.
Paper: Enhanced Deep Residual Networks for Single Image Super-Resolution. Ref repo: https://github.com/thstkdgus35/EDSR-PyTorch
- Parameters
in_channels (int) – Channel number of inputs.
out_channels (int) – Channel number of outputs.
mid_channels (int) – Channel number of intermediate features. Default: 64.
num_blocks (int) – Block number in the trunk network. Default: 16.
upscale_factor (int) – Upsampling factor. Support 2^n and 3. Default: 4.
res_scale (float) – Used to scale the residual in residual block. Default: 1.
rgb_mean (tuple[float]) – Image mean in RGB orders. Default: (0.4488, 0.4371, 0.4040), calculated from DIV2K dataset.
rgb_std (tuple[float]) – Image std in RGB orders. In EDSR, it uses (1.0, 1.0, 1.0). Default: (1.0, 1.0, 1.0).
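The “2^n and 3” restriction on upscale_factor suggests an upsampler built from repeated x2 stages or a single x3 stage. The following sketch shows that decomposition; upsample_stages is a hypothetical helper, and the staged design is an assumption based on common EDSR implementations rather than something this page states.

```python
def upsample_stages(upscale_factor):
    """Split an upscale factor into per-stage factors (2, 2, ... or a single 3)."""
    if upscale_factor == 3:
        return [3]
    stages = []
    remaining = upscale_factor
    while remaining > 1 and remaining % 2 == 0:
        stages.append(2)
        remaining //= 2
    if remaining != 1 or not stages:
        # Anything that is neither 3 nor a power of two (>= 2) is rejected.
        raise ValueError(f'unsupported upscale_factor: {upscale_factor}')
    return stages
```

Under this reading, the default upscale_factor=4 corresponds to two x2 stages.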
-
forward
(x)[source]¶ Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
-
init_weights
(pretrained=None, strict=True)[source]¶ Init weights for models.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.
-
class
mmedit.models.backbones.
EDVRNet
(in_channels, out_channels, mid_channels=64, num_frames=5, deform_groups=8, num_blocks_extraction=5, num_blocks_reconstruction=10, center_frame_idx=2, with_tsa=True)[source]¶ EDVR network structure for video super-resolution.
Currently, only the x4 upsampling factor is supported. Paper: EDVR: Video Restoration with Enhanced Deformable Convolutional Networks.
- Parameters
in_channels (int) – Channel number of inputs.
out_channels (int) – Channel number of outputs.
mid_channels (int) – Channel number of intermediate features. Default: 64.
num_frames (int) – Number of input frames. Default: 5.
deform_groups (int) – Deformable groups. Default: 8.
num_blocks_extraction (int) – Number of blocks for feature extraction. Default: 5.
num_blocks_reconstruction (int) – Number of blocks for reconstruction. Default: 10.
center_frame_idx (int) – The index of center frame. Frame counting from 0. Default: 2.
with_tsa (bool) – Whether to use TSA module. Default: True.
-
forward
(x)[source]¶ Forward function for EDVRNet.
- Parameters
x (Tensor) – Input tensor with shape (n, t, c, h, w).
- Returns
SR center frame with shape (n, c, h, w).
- Return type
Tensor
-
init_weights
(pretrained=None, strict=True)[source]¶ Init weights for models.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.
-
class
mmedit.models.backbones.
GLDecoder
(in_channels=256, norm_cfg=None, act_cfg={'type': 'ReLU'}, out_act='clip')[source]¶ Decoder used in Global&Local model.
This implementation follows: Globally and locally Consistent Image Completion
- Parameters
in_channels (int) – Channel number of input feature.
norm_cfg (dict) – Config dict to build norm layer.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
out_act (str) – Output activation type, “clip” by default. Note that in our implementation, we clip the output to the range [-1, 1].
-
class
mmedit.models.backbones.
GLDilationNeck
(in_channels=256, conv_type='conv', norm_cfg=None, act_cfg={'type': 'ReLU'}, **kwargs)[source]¶ Dilation Backbone used in Global&Local model.
This implementation follows: Globally and locally Consistent Image Completion
- Parameters
in_channels (int) – Channel number of input feature.
conv_type (str) – The type of conv module. In DeepFillv1 model, the conv_type should be ‘conv’. In DeepFillv2 model, the conv_type should be ‘gated_conv’.
norm_cfg (dict) – Config dict to build norm layer.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
kwargs (keyword arguments) –
-
class
mmedit.models.backbones.
GLEncoder
(norm_cfg=None, act_cfg={'type': 'ReLU'})[source]¶ Encoder used in Global&Local model.
This implementation follows: Globally and locally Consistent Image Completion
- Parameters
norm_cfg (dict) – Config dict to build norm layer.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
-
class
mmedit.models.backbones.
GLEncoderDecoder
(encoder={'type': 'GLEncoder'}, decoder={'type': 'GLDecoder'}, dilation_neck={'type': 'GLDilationNeck'})[source]¶ Encoder-Decoder used in Global&Local model.
This implementation follows: Globally and locally Consistent Image Completion
The architecture of the encoder-decoder is: (conv2d x 6) –> (dilated conv2d x 4) –> (conv2d or deconv2d x 7)
- Parameters
encoder (dict) – Config dict to build encoder.
decoder (dict) – Config dict to build decoder.
dilation_neck (dict) – Config dict to build dilation neck.
-
class
mmedit.models.backbones.
HolisticIndexBlock
(in_channels, norm_cfg={'type': 'BN'}, use_context=False, use_nonlinear=False)[source]¶ Holistic Index Block.
From https://arxiv.org/abs/1908.00672.
- Parameters
in_channels (int) – Input channels of the holistic index block.
kernel_size (int) – Kernel size of the conv layers. Default: 2.
padding (int) – Padding number of the conv layers. Default: 0.
norm_cfg (dict) – Config dict for normalization layer. Default: dict(type=’BN’).
use_nonlinear (bool) – Whether to add a non-linear conv layer in the index block. Default: False.
-
class
mmedit.models.backbones.
IndexNetDecoder
(in_channels, kernel_size=5, norm_cfg={'type': 'BN'}, separable_conv=False)[source]¶
-
class
mmedit.models.backbones.
IndexNetEncoder
(in_channels, out_stride=32, width_mult=1, index_mode='m2o', aspp=True, norm_cfg={'type': 'BN'}, freeze_bn=False, use_nonlinear=True, use_context=True)[source]¶ Encoder for IndexNet.
Please refer to https://arxiv.org/abs/1908.00672.
- Parameters
in_channels (int, optional) – Input channels of the encoder.
out_stride (int, optional) – Output stride of the encoder. For example, if out_stride is 32, the input feature map or image will be downsampled to 1/32 of the original size. Defaults to 32.
width_mult (int, optional) – Width multiplication factor of channel dimension in MobileNetV2. Defaults to 1.
index_mode (str, optional) – Index mode of the index network. It must be one of {‘holistic’, ‘o2o’, ‘m2o’}. If it is set to ‘holistic’, then Holistic index network will be used as the index network. If it is set to ‘o2o’ (or ‘m2o’), then O2O (or M2O) Depthwise index network will be used as the index network. Defaults to ‘m2o’.
aspp (bool, optional) – Whether to use ASPP module to augment output feature. Defaults to True.
norm_cfg (None | dict, optional) – Config dict for normalization layer. Defaults to dict(type=’BN’).
freeze_bn (bool, optional) – Whether to freeze batch norm layer. Defaults to False.
use_nonlinear (bool, optional) – Whether to use nonlinearity in index network. Refer to the paper for more information. Defaults to True.
use_context (bool, optional) – Whether to use larger kernel size in index network. Refer to the paper for more information. Defaults to True.
- Raises
ValueError – out_stride must be 16 or 32.
NameError – Supported index_mode are {‘holistic’, ‘o2o’, ‘m2o’}.
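The two error conditions listed above can be illustrated with a small validation sketch. check_indexnet_args is a hypothetical helper, not part of the encoder; it only mirrors the documented constraints and the documented mapping from index_mode to the kind of index network.

```python
def check_indexnet_args(out_stride=32, index_mode='m2o'):
    """Validate the two arguments listed under Raises above."""
    if out_stride not in (16, 32):
        raise ValueError(f'out_stride must be 16 or 32, got {out_stride}')
    if index_mode not in ('holistic', 'o2o', 'm2o'):
        raise NameError(f'unsupported index_mode: {index_mode}')
    # Per the parameter description: 'holistic' selects the holistic index
    # network, 'o2o' / 'm2o' select the depthwise index network.
    return 'holistic' if index_mode == 'holistic' else 'depthwise'
```

Calling it with the documented defaults returns 'depthwise', since ‘m2o’ selects the M2O depthwise index network.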
-
class
mmedit.models.backbones.
IndexedUpsample
(in_channels, out_channels, kernel_size=5, norm_cfg={'type': 'BN'}, conv_module=<class 'mmcv.cnn.bricks.conv_module.ConvModule'>)[source]¶ Indexed upsample module.
- Parameters
in_channels (int) – Input channels.
out_channels (int) – Output channels.
kernel_size (int, optional) – Kernel size of the convolution layer. Defaults to 5.
norm_cfg (dict, optional) – Config dict for normalization layer. Defaults to dict(type=’BN’).
conv_module (ConvModule | DepthwiseSeparableConvModule, optional) – Conv module. Defaults to ConvModule.
-
forward
(x, shortcut, dec_idx_feat=None)[source]¶ Forward function.
- Parameters
x (Tensor) – Input feature map with shape (N, C, H, W).
shortcut (Tensor) – The shortcut connection with shape (N, C, H’, W’).
dec_idx_feat (Tensor, optional) – The decode index feature map with shape (N, C, H’, W’). Defaults to None.
- Returns
Output tensor with shape (N, C, H’, W’).
- Return type
Tensor
-
class
mmedit.models.backbones.
MSRResNet
(in_channels, out_channels, mid_channels=64, num_blocks=16, upscale_factor=4)[source]¶ Modified SRResNet.
A compact version modified from SRResNet in “Photo-Realistic Single Image Super-Resolution Using a Generative Adversarial Network”.
It uses residual blocks without BN, similar to EDSR. Currently, it supports x2, x3 and x4 upsampling scale factor.
- Parameters
in_channels (int) – Channel number of inputs.
out_channels (int) – Channel number of outputs.
mid_channels (int) – Channel number of intermediate features. Default: 64.
num_blocks (int) – Block number in the trunk network. Default: 16.
upscale_factor (int) – Upsampling factor. Support x2, x3 and x4. Default: 4.
-
forward
(x)[source]¶ Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
-
init_weights
(pretrained=None, strict=True)[source]¶ Init weights for models.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.
-
class
mmedit.models.backbones.
PConvDecoder
(num_layers=7, interpolation='nearest', conv_cfg={'multi_channel': True, 'type': 'PConv'}, norm_cfg={'type': 'BN'})[source]¶ Decoder with partial conv.
For details of this architecture, please see: Image Inpainting for Irregular Holes Using Partial Convolutions
- Parameters
num_layers (int) – The number of convolutional layers. Default: 7.
interpolation (str) – The upsample mode. Default: ‘nearest’.
conv_cfg (dict) – Config for convolution module. Default: {‘type’: ‘PConv’, ‘multi_channel’: True}.
norm_cfg (dict) – Config for norm layer. Default: {‘type’: ‘BN’}.
-
class
mmedit.models.backbones.
PConvEncoder
(in_channels=3, num_layers=7, conv_cfg={'multi_channel': True, 'type': 'PConv'}, norm_cfg={'requires_grad': True, 'type': 'BN'}, norm_eval=False)[source]¶ Encoder with partial conv.
For details of this architecture, see: Image Inpainting for Irregular Holes Using Partial Convolutions.
- Parameters
in_channels (int) – The number of input channels. Default: 3.
num_layers (int) – The number of convolutional layers. Default: 7.
conv_cfg (dict) – Config for convolution module. Default: {‘type’: ‘PConv’, ‘multi_channel’: True}.
norm_cfg (dict) – Config for norm layer. Default: {‘type’: ‘BN’, ‘requires_grad’: True}.
norm_eval (bool) – Whether to set norm layers to eval mode, namely, freeze running stats (mean and var). Note: Effective on Batch Norm and its variants only. Default: False.
-
forward
(x, mask)[source]¶ Forward function for partial conv encoder.
- Parameters
x (torch.Tensor) – Masked image with shape (n, c, h, w).
mask (torch.Tensor) – Mask tensor with shape (n, c, h, w).
- Returns
Results and intermediate features of this module. hidden_feats contains the intermediate feature maps and hidden_masks stores the updated masks.
- Return type
dict
-
train
(mode=True)[source]¶ Sets the module in training mode.
This has any effect only on certain modules. See documentations of particular modules for details of their behaviors in training/evaluation mode, if they are affected, e.g. Dropout, BatchNorm, etc.
- Parameters
mode (bool) – whether to set training mode (True) or evaluation mode (False). Default: True.
- Returns
self
- Return type
Module
-
class
mmedit.models.backbones.
PConvEncoderDecoder
(encoder, decoder)[source]¶ Encoder-Decoder with partial conv module.
- Parameters
encoder (dict) – Config of the encoder.
decoder (dict) – Config of the decoder.
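A minimal sketch of what the two config dicts might look like, using only the class names and default values documented above. That the 'type' key selects the class to build is an assumption, based on how the documented conv_cfg and norm_cfg defaults are written; the variable name backbone_cfg is illustrative.

```python
# Illustrative config only; every value below is a documented default.
backbone_cfg = dict(
    type='PConvEncoderDecoder',
    encoder=dict(
        type='PConvEncoder',
        in_channels=3,
        num_layers=7,
        conv_cfg=dict(type='PConv', multi_channel=True),
        norm_cfg=dict(type='BN', requires_grad=True),
        norm_eval=False,
    ),
    decoder=dict(
        type='PConvDecoder',
        num_layers=7,
        interpolation='nearest',
        conv_cfg=dict(type='PConv', multi_channel=True),
        norm_cfg=dict(type='BN'),
    ),
)
```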
-
class
mmedit.models.backbones.
PlainDecoder
(in_channels)[source]¶ Simple decoder from Deep Image Matting.
- Parameters
in_channels (int) – Channel num of input features.
-
forward
(inputs)[source]¶ Forward function of PlainDecoder.
- Parameters
inputs (dict) –
Output dictionary of the VGG encoder containing:
out (Tensor): Output of the VGG encoder.
max_idx_1 (Tensor): Index of the first maxpooling layer in the VGG encoder.
max_idx_2 (Tensor): Index of the second maxpooling layer in the VGG encoder.
max_idx_3 (Tensor): Index of the third maxpooling layer in the VGG encoder.
max_idx_4 (Tensor): Index of the fourth maxpooling layer in the VGG encoder.
max_idx_5 (Tensor): Index of the fifth maxpooling layer in the VGG encoder.
- Returns
Output tensor.
- Return type
Tensor
-
class
mmedit.models.backbones.
RRDBNet
(in_channels, out_channels, mid_channels=64, num_blocks=23, growth_channels=32)[source]¶ Networks consisting of Residual in Residual Dense Block, which is used in ESRGAN.
Paper: ESRGAN: Enhanced Super-Resolution Generative Adversarial Networks. Currently, it supports the x4 upsampling scale factor.
- Parameters
in_channels (int) – Channel number of inputs.
out_channels (int) – Channel number of outputs.
mid_channels (int) – Channel number of intermediate features. Default: 64.
num_blocks (int) – Block number in the trunk network. Default: 23.
growth_channels (int) – Channels for each growth. Default: 32.
-
forward
(x)[source]¶ Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
-
init_weights
(pretrained=None, strict=True)[source]¶ Init weights for models.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.
-
class
mmedit.models.backbones.
ResGCADecoder
(block, layers, in_channels, kernel_size=3, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'inplace': True, 'negative_slope': 0.2, 'type': 'LeakyReLU'}, with_spectral_norm=False, late_downsample=False)[source]¶ ResNet decoder with shortcut connection and gca module.
feat1 ---------------------------------------- conv2 --- out
                                            |
feat2 ----------------------------------- conv1
                                       |
feat3 ------------------------------ layer4
                                  |
feat4, img_feat -- gca_module - layer3
                |
feat5 ------- layer2
          |
out --- layer1
The gca module also requires the unknown tensor generated from the trimap, which is omitted from the graph above.
- Parameters
block (str) – Type of residual block. Currently only BasicBlockDec is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Channel number of input features.
kernel_size (int) – Kernel size of the conv layers in the decoder. Default: 3.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. “BN” by default.
act_cfg (dict) – Config dict for activation layer. Default: dict(type=‘LeakyReLU’, negative_slope=0.2, inplace=True).
with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.
late_downsample (bool) – Whether to adopt late downsample strategy. Default: False.
-
forward
(inputs)[source]¶ Forward function of resnet shortcut decoder.
- Parameters
inputs (dict) –
Output dictionary of the ResGCAEncoder containing:
out (Tensor): Output of the ResGCAEncoder.
feat1 (Tensor): Shortcut connection from input image.
feat2 (Tensor): Shortcut connection from conv2 of ResGCAEncoder.
feat3 (Tensor): Shortcut connection from layer1 of ResGCAEncoder.
feat4 (Tensor): Shortcut connection from layer2 of ResGCAEncoder.
feat5 (Tensor): Shortcut connection from layer3 of ResGCAEncoder.
img_feat (Tensor): Image feature extracted by guidance head.
unknown (Tensor): Unknown tensor generated by trimap.
- Returns
Output tensor.
- Return type
Tensor
-
class
mmedit.models.backbones.
ResGCAEncoder
(block, layers, in_channels, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_spectral_norm=False, late_downsample=False, order=('conv', 'act', 'norm'))[source]¶ ResNet backbone with shortcut connection and gca module.
image ---------------- shortcut[0] -------------- feat1
  |
conv1-conv2 ---------- shortcut[1] -------------- feat2
       |
     conv3-layer1 ---- shortcut[2] -------------- feat3
             |
             | image - guidance_conv ------------ img_feat
             |             |
            layer2 --- gca_module - shortcut[4] - feat4
             |
            layer3 -- shortcut[5] - feat5
             |
            layer4 --------------- out
The gca module also requires the unknown tensor generated from the trimap, which is omitted from the graph above.
Implementation of Natural Image Matting via Guided Contextual Attention https://arxiv.org/pdf/2001.04069.pdf.
- Parameters
block (str) – Type of residual block. Currently only BasicBlock is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Number of input channels.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. “BN” by default.
act_cfg (dict) – Config dict for activation layer, “ReLU” by default.
with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.
late_downsample (bool) – Whether to adopt late downsample strategy. Default: False.
order (tuple[str]) – Order of conv, norm and act layer in shortcut convolution module. Default: (‘conv’, ‘act’, ‘norm’).
-
class
mmedit.models.backbones.
ResNetDec
(block, layers, in_channels, kernel_size=3, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'inplace': True, 'negative_slope': 0.2, 'type': 'LeakyReLU'}, with_spectral_norm=False, late_downsample=False)[source]¶ ResNet decoder for image matting.
This class is adapted from https://github.com/Yaoyi-Li/GCA-Matting.
- Parameters
block (str) – Type of residual block. Currently only BasicBlockDec is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Channel num of input features.
kernel_size (int) – Kernel size of the conv layers in the decoder. Default: 3.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. “BN” by default.
act_cfg (dict) – Config dict for activation layer. Default: dict(type=‘LeakyReLU’, negative_slope=0.2, inplace=True).
with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.
late_downsample (bool) – Whether to adopt late downsample strategy. Default: False.
-
class
mmedit.models.backbones.
ResNetEnc
(block, layers, in_channels, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_spectral_norm=False, late_downsample=False)[source]¶ ResNet encoder for image matting.
This class is adapted from https://github.com/Yaoyi-Li/GCA-Matting. It is implemented and pre-trained on ImageNet with the tricks from https://arxiv.org/abs/1812.01187, without the mix-up part.
- Parameters
block (str) – Type of residual block. Currently only BasicBlock is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Number of input channels.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. “BN” by default.
act_cfg (dict) – Config dict for activation layer, “ReLU” by default.
with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.
late_downsample (bool) – Whether to adopt late downsample strategy. Default: False.
-
class
mmedit.models.backbones.
ResShortcutDec
(block, layers, in_channels, kernel_size=3, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'inplace': True, 'negative_slope': 0.2, 'type': 'LeakyReLU'}, with_spectral_norm=False, late_downsample=False)[source]¶ ResNet decoder for image matting with shortcut connection.
feat1 --------------------------- conv2 --- out
                               |
feat2 ---------------------- conv1
                          |
feat3 ----------------- layer4
                     |
feat4 ------------ layer3
                |
feat5 ------- layer2
          |
out --- layer1
- Parameters
block (str) – Type of residual block. Currently only BasicBlockDec is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Channel number of input features.
kernel_size (int) – Kernel size of the conv layers in the decoder. Default: 3.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. “BN” by default.
act_cfg (dict) – Config dict for activation layer. Default: dict(type=‘LeakyReLU’, negative_slope=0.2, inplace=True).
with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.
late_downsample (bool) – Whether to adopt late downsample strategy. Default: False.
-
forward
(inputs)[source]¶ Forward function of resnet shortcut decoder.
- Parameters
inputs (dict) –
Output dictionary of the ResNetEnc containing:
out (Tensor): Output of the ResNetEnc.
feat1 (Tensor): Shortcut connection from input image.
feat2 (Tensor): Shortcut connection from conv2 of ResNetEnc.
feat3 (Tensor): Shortcut connection from layer1 of ResNetEnc.
feat4 (Tensor): Shortcut connection from layer2 of ResNetEnc.
feat5 (Tensor): Shortcut connection from layer3 of ResNetEnc.
- Returns
Output tensor.
- Return type
Tensor
-
class
mmedit.models.backbones.
ResShortcutEnc
(block, layers, in_channels, conv_cfg=None, norm_cfg={'type': 'BN'}, act_cfg={'type': 'ReLU'}, with_spectral_norm=False, late_downsample=False, order=('conv', 'act', 'norm'))[source]¶ ResNet backbone for image matting with shortcut connection.
image ---------------- shortcut[0] --- feat1
  |
conv1-conv2 ---------- shortcut[1] --- feat2
       |
      conv3-layer1 --- shortcut[2] --- feat3
              |
             layer2 -- shortcut[4] --- feat4
              |
             layer3 - shortcut[5] --- feat5
              |
             layer4 ---------------- out
Baseline model of Natural Image Matting via Guided Contextual Attention https://arxiv.org/pdf/2001.04069.pdf.
- Parameters
block (str) – Type of residual block. Currently only BasicBlock is implemented.
layers (list[int]) – Number of layers in each block.
in_channels (int) – Number of input channels.
conv_cfg (dict) – Dictionary to construct convolution layer. If it is None, 2d convolution will be applied. Default: None.
norm_cfg (dict) – Config dict for normalization layer. “BN” by default.
act_cfg (dict) – Config dict for activation layer, “ReLU” by default.
with_spectral_norm (bool) – Whether to use spectral norm after conv. Default: False.
late_downsample (bool) – Whether to adopt late downsample strategy. Default: False.
order (tuple[str]) – Order of conv, norm and act layer in shortcut convolution module. Default: (‘conv’, ‘act’, ‘norm’).
-
class
mmedit.models.backbones.
ResnetGenerator
(in_channels, out_channels, base_channels=64, norm_cfg={'type': 'IN'}, use_dropout=False, num_blocks=9, padding_mode='reflect', init_cfg={'gain': 0.02, 'type': 'normal'})[source]¶ Construct a Resnet-based generator that consists of residual blocks between a few downsampling/upsampling operations.
- Parameters
in_channels (int) – Number of channels in input images.
out_channels (int) – Number of channels in output images.
base_channels (int) – Number of filters at the last conv layer. Default: 64.
norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’IN’).
use_dropout (bool) – Whether to use dropout layers. Default: False.
num_blocks (int) – Number of residual blocks. Default: 9.
padding_mode (str) – The name of padding layer in conv layers: ‘reflect’ | ‘replicate’ | ‘zeros’. Default: ‘reflect’.
init_cfg (dict) – Config dict for initialization. type: The name of our initialization method. Default: ‘normal’. gain: Scaling factor for normal, xavier and orthogonal. Default: 0.02.
-
forward
(x)[source]¶ Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
-
init_weights
(pretrained=None, strict=True)[source]¶ Initialize weights for the model.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Default: None.
strict (bool, optional) – Whether to allow different params for the model and checkpoint. Default: True.
-
class
mmedit.models.backbones.
SRCNN
(channels=(3, 64, 32, 3), kernel_sizes=(9, 1, 5), upscale_factor=4)[source]¶ SRCNN network structure for image super resolution.
SRCNN has three conv layers. For each layer, we can define the in_channels, out_channels and kernel_size. The input image will first be upsampled with a bicubic upsampler, and then super-resolved in the HR spatial size.
Paper: Learning a Deep Convolutional Network for Image Super-Resolution.
- Parameters
channels (tuple[int]) – A tuple of channel numbers for each layer, including the channels of input and output. Default: (3, 64, 32, 3).
kernel_sizes (tuple[int]) – A tuple of kernel sizes for each conv layer. Default: (9, 1, 5).
upscale_factor (int) – Upsampling factor. Default: 4.
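To make the relationship between channels and kernel_sizes concrete, the sketch below expands the two tuples into one spec per conv layer. The helper srcnn_layer_specs is hypothetical, and the padding of kernel_size // 2 (which keeps the spatial size unchanged) is an assumption rather than something stated above.

```python
def srcnn_layer_specs(channels=(3, 64, 32, 3), kernel_sizes=(9, 1, 5)):
    """Hypothetical helper: one spec dict per conv layer of a 3-layer SRCNN."""
    if len(channels) != 4 or len(kernel_sizes) != 3:
        raise ValueError("expected 4 channel numbers and 3 kernel sizes")
    specs = []
    for i, kernel_size in enumerate(kernel_sizes):
        specs.append(dict(
            in_channels=channels[i],       # output of the previous layer
            out_channels=channels[i + 1],
            kernel_size=kernel_size,
            padding=kernel_size // 2,      # assumption: size-preserving padding
        ))
    return specs


if __name__ == "__main__":
    for spec in srcnn_layer_specs():
        print(spec)
```

With the defaults, the layers map 3 to 64, 64 to 32 and 32 to 3 channels with kernel sizes 9, 1 and 5.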
-
forward
(x)[source]¶ Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
-
init_weights
(pretrained=None, strict=True)[source]¶ Init weights for models.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.
-
class
mmedit.models.backbones.
SimpleEncoderDecoder
(encoder, decoder)[source]¶ Simple encoder-decoder model from matting.
- Parameters
encoder (dict) – Config of the encoder.
decoder (dict) – Config of the decoder.
-
class
mmedit.models.backbones.
TOFlow
(adapt_official_weights=False)[source]¶ PyTorch implementation of TOFlow.
In TOFlow, the LR frames are pre-upsampled and have the same size as the GT frames.
Paper: Xue et al., Video Enhancement with Task-Oriented Flow, IJCV 2018 Code reference:
- Parameters
adapt_official_weights (bool) – Whether to adapt the weights translated from the official implementation. Set to False if you want to train from scratch. Default: False.
-
denormalize
(img)[source]¶ Denormalize the output image.
- Parameters
img (Tensor) – Output image.
- Returns
Denormalized image.
- Return type
Tensor
-
forward
(lrs)[source]¶
- Parameters
lrs (Tensor) – Input LR frames with shape (b, 7, 3, h, w).
- Returns
SR frame: (b, 3, h, w).
- Return type
Tensor
-
init_weights
(pretrained=None, strict=True)[source]¶ Init weights for models.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.
-
class
mmedit.models.backbones.
UnetGenerator
(in_channels, out_channels, num_down=8, base_channels=64, norm_cfg={'type': 'BN'}, use_dropout=False, init_cfg={'gain': 0.02, 'type': 'normal'})[source]¶ Construct the Unet-based generator from the innermost layer to the outermost layer, which is a recursive process.
- Parameters
in_channels (int) – Number of channels in input images.
out_channels (int) – Number of channels in output images.
num_down (int) – Number of downsamplings in Unet. If num_down is 8, the image with size 256x256 will become 1x1 at the bottleneck. Default: 8.
base_channels (int) – Number of channels at the last conv layer. Default: 64.
norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’BN’).
use_dropout (bool) – Whether to use dropout layers. Default: False.
init_cfg (dict) – Config dict for initialization. type: The name of our initialization method. Default: ‘normal’. gain: Scaling factor for normal, xavier and orthogonal. Default: 0.02.
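The num_down description can be checked with simple arithmetic: each downsampling halves the spatial size, so 256 / 2^8 = 1. The sketch below assumes each step halves the size exactly; the helper name unet_bottleneck_size is hypothetical.

```python
def unet_bottleneck_size(image_size, num_down=8):
    """Hypothetical helper: spatial size left after num_down halvings."""
    size = image_size
    for step in range(num_down):
        # Assumption: every downsampling divides the size by exactly 2.
        if size % 2 != 0:
            raise ValueError(
                f"size {size} cannot be halved at downsampling step {step}")
        size //= 2
    return size


if __name__ == "__main__":
    print(unet_bottleneck_size(256, num_down=8))
```

Under the same assumption, a 256x256 input cannot be downsampled more than 8 times.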
-
forward
(x)[source]¶ Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
-
init_weights
(pretrained=None, strict=True)[source]¶ Initialize weights for the model.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Default: None.
strict (bool, optional) – Whether to allow different params for the model and checkpoint. Default: True.
-
class
mmedit.models.backbones.
VGG16
(in_channels, batch_norm=False, aspp=False, dilations=None)[source]¶ Customized VGG16 Encoder.
A 1x1 conv is added after the original VGG16 conv layers. The indices of max pooling layers are returned for unpooling layers in decoders.
- Parameters
in_channels (int) – Number of input channels.
batch_norm (bool, optional) – Whether to use nn.BatchNorm2d. Defaults to False.
aspp (bool, optional) – Whether to use the ASPP module after the last conv layer. Defaults to False.
dilations (list[int], optional) – Atrous rates of the ASPP module. Defaults to None.
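The reason the encoder returns max pooling indices is that a decoder can use them to put values back where each maximum was found. The following is a 1-D, pure-Python sketch of that idea; the function names are illustrative and this is not the VGG16 or decoder implementation.

```python
def max_pool_1d(values, kernel=2):
    """Non-overlapping max pooling that also returns the argmax positions."""
    pooled, indices = [], []
    for start in range(0, len(values) - kernel + 1, kernel):
        window = values[start:start + kernel]
        offset = max(range(kernel), key=window.__getitem__)
        pooled.append(window[offset])
        indices.append(start + offset)  # position in the original sequence
    return pooled, indices


def max_unpool_1d(pooled, indices, length):
    """Scatter pooled values back to their recorded positions; zeros elsewhere."""
    output = [0.0] * length
    for value, index in zip(pooled, indices):
        output[index] = value
    return output


if __name__ == "__main__":
    pooled, indices = max_pool_1d([1.0, 3.0, 2.0, 0.5])
    print(pooled, indices)
    print(max_unpool_1d(pooled, indices, length=4))
```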
components¶
-
class
mmedit.models.components.
DeepFillRefiner
(encoder_attention={'encoder_type': 'stage2_attention', 'type': 'DeepFillEncoder'}, encoder_conv={'encoder_type': 'stage2_conv', 'type': 'DeepFillEncoder'}, dilation_neck={'act_cfg': {'type': 'ELU'}, 'in_channels': 128, 'type': 'GLDilationNeck'}, contextual_attention={'in_channels': 128, 'type': 'ContextualAttentionNeck'}, decoder={'in_channels': 256, 'type': 'DeepFillDecoder'})[source]¶ Refiner used in DeepFill model.
This implementation follows: Generative Image Inpainting with Contextual Attention.
- Parameters
encoder_attention (dict) – Config dict for encoder used in branch with contextual attention module.
encoder_conv (dict) – Config dict for encoder used in branch with just convolutional operation.
dilation_neck (dict) – Config dict for dilation neck in branch with just convolutional operation.
contextual_attention (dict) – Config dict for contextual attention neck.
decoder (dict) – Config dict for decoder used to fuse and decode features.
-
class
mmedit.models.components.
DeepFillv1Discriminators
(global_disc_cfg, local_disc_cfg)[source]¶ Discriminators used in DeepFillv1 model.
In the DeepFillv1 model, the discriminators are independent, without the concatenation used in the Global&Local model. Thus, we call this model DeepFillv1Discriminators. There are a global discriminator and a local discriminator, which take global and local inputs respectively.
The details can be found in: Generative Image Inpainting with Contextual Attention.
- Parameters
global_disc_cfg (dict) – Config dict for global discriminator.
local_disc_cfg (dict) – Config dict for local discriminator.
-
class
mmedit.models.components.
GLDiscs
(global_disc_cfg, local_disc_cfg)[source]¶ Discriminators in Global&Local.
This discriminator contains a local discriminator and a global discriminator, as described in the original paper: Globally and Locally Consistent Image Completion.
- Parameters
global_disc_cfg (dict) – Config dict to build global discriminator.
local_disc_cfg (dict) – Config dict to build local discriminator.
-
class
mmedit.models.components.
ModifiedVGG
(in_channels, mid_channels)[source]¶ A modified VGG discriminator with input size 128 x 128.
It is used to train SRGAN and ESRGAN.
- Parameters
in_channels (int) – Channel number of inputs.
mid_channels (int) – Channel number of base intermediate features.
-
forward
(x)[source]¶ Forward function.
- Parameters
x (Tensor) – Input tensor with shape (n, c, h, w).
- Returns
Forward results.
- Return type
Tensor
-
init_weights
(pretrained=None, strict=True)[source]¶ Init weights for models.
- Parameters
pretrained (str, optional) – Path for pretrained weights. If given None, pretrained weights will not be loaded. Defaults to None.
strict (bool, optional) – Whether to strictly load the pretrained model. Defaults to True.
-
class
mmedit.models.components.
MultiLayerDiscriminator
(in_channels, max_channels, num_convs=5, fc_in_channels=None, fc_out_channels=1024, kernel_size=5, conv_cfg=None, norm_cfg=None, act_cfg={'type': 'ReLU'}, out_act_cfg={'type': 'ReLU'}, with_input_norm=True, with_out_convs=False, with_spectral_norm=False, **kwargs)[source]¶ Multilayer Discriminator.
This is a commonly used structure with multiple stacked convolution layers.
- Parameters
in_channels (int) – Input channel of the first input convolution.
max_channels (int) – The maximum channel number in this structure.
num_convs (int) – Number of stacked intermediate convs (including input conv but excluding output conv). Default: 5.
fc_in_channels (int | None) – Input dimension of the fully connected layer. If fc_in_channels is None, the fully connected layer will be removed.
fc_out_channels (int) – Output dimension of the fully connected layer.
kernel_size (int) – Kernel size of the conv modules. Default to 5.
conv_cfg (dict) – Config dict to build conv layer.
norm_cfg (dict) – Config dict to build norm layer.
act_cfg (dict) – Config dict for activation layer, “relu” by default.
out_act_cfg (dict) – Config dict for output activation, “relu” by default.
with_input_norm (bool) – Whether to add normalization after the input conv. Defaults to True.
with_out_convs (bool) – Whether to add output convs to the discriminator. The output convs contain two convs. The first out conv has the same setting as the intermediate convs but a stride of 1 instead of 2. The second out conv is similar to the first but reduces the number of channels to 1 and has no activation layer. Defaults to False.
with_spectral_norm (bool) – Whether to use spectral norm after the conv layers. Defaults to False.
kwargs (keyword arguments) –
-
class
mmedit.models.components.
PatchDiscriminator
(in_channels, base_channels=64, num_conv=3, norm_cfg={'type': 'BN'}, init_cfg={'gain': 0.02, 'type': 'normal'})[source]¶ A PatchGAN discriminator.
- Parameters
in_channels (int) – Number of channels in input images.
base_channels (int) – Number of channels at the first conv layer. Default: 64.
num_conv (int) – Number of stacked intermediate convs (excluding input and output conv). Default: 3.
norm_cfg (dict) – Config dict to build norm layer. Default: dict(type=’BN’).
init_cfg (dict) – Config dict for initialization. type: The name of our initialization method. Default: ‘normal’. gain: Scaling factor for normal, xavier and orthogonal. Default: 0.02.
-
class
mmedit.models.components.
PlainRefiner
(conv_channels=64, pretrained=None)[source]¶ Simple refiner from Deep Image Matting.
- Parameters
conv_channels (int) – Number of channels produced by the three main convolutional layers. Default: 64.
loss_refine (dict) – Config of the loss of the refiner. Default: None.
pretrained (str) – Name of pretrained model. Default: None.
losses¶
-
class
mmedit.models.losses.
CharbonnierCompLoss
(loss_weight=1.0, reduction='mean', sample_wise=False, eps=1e-12)[source]¶ Charbonnier composition loss.
- Parameters
loss_weight (float) – Loss weight for L1 loss. Default: 1.0.
reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.
sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss is first reduced with ‘mean’ per sample and then averaged over all samples. Default: False.
eps (float) – A value used to control the curvature near zero. Default: 1e-12.
-
forward
(pred_alpha, fg, bg, ori_merged, weight=None, **kwargs)[source]¶
- Parameters
pred_alpha (Tensor) – of shape (N, 1, H, W). Predicted alpha matte.
fg (Tensor) – of shape (N, 3, H, W). Tensor of foreground object.
bg (Tensor) – of shape (N, 3, H, W). Tensor of background object.
ori_merged (Tensor) – of shape (N, 3, H, W). Tensor of the original merged image before normalization by ImageNet mean and std.
weight (Tensor, optional) – of shape (N, 1, H, W). It is an indicating matrix: weight[trimap == 128] = 1. Default: None.
-
class
mmedit.models.losses.
CharbonnierLoss
(loss_weight=1.0, reduction='mean', sample_wise=False, eps=1e-12)[source]¶ Charbonnier loss (one variant of Robust L1Loss, a differentiable variant of L1Loss).
Described in “Deep Laplacian Pyramid Networks for Fast and Accurate Super-Resolution”.
- Parameters
loss_weight (float) – Loss weight for L1 loss. Default: 1.0.
reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.
sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss is first reduced with ‘mean’ per sample and then averaged over all samples. Default: False.
eps (float) – A value used to control the curvature near zero. Default: 1e-12.
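A minimal sketch of the Charbonnier penalty, sqrt((pred - target)^2 + eps), on flat lists of floats with ‘mean’ reduction. It illustrates the formula only, not the class above; element-wise weights and the other reduction modes are left out, and the function names are illustrative.

```python
import math


def charbonnier(pred, target, eps=1e-12):
    """Element-wise Charbonnier penalty: sqrt((p - t)^2 + eps)."""
    return [math.sqrt((p - t) ** 2 + eps) for p, t in zip(pred, target)]


def charbonnier_loss(pred, target, loss_weight=1.0, eps=1e-12):
    """Mean-reduced Charbonnier loss scaled by loss_weight."""
    values = charbonnier(pred, target, eps)
    return loss_weight * sum(values) / len(values)


if __name__ == "__main__":
    print(charbonnier_loss([1.0, 2.0], [0.0, 4.0]))
    # At zero error the penalty equals sqrt(eps), so it stays differentiable.
    print(charbonnier([0.0], [0.0]))
```

For large errors the value approaches the L1 loss, while eps smooths the kink at zero.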
-
class
mmedit.models.losses.
DiscShiftLoss
(loss_weight=0.1)[source]¶ Disc shift loss.
- Parameters
loss_weight (float, optional) – Loss weight. Defaults to 0.1.
-
class
mmedit.models.losses.
GANLoss
(gan_type, real_label_val=1.0, fake_label_val=0.0, loss_weight=1.0)[source]¶ Define GAN loss.
- Parameters
gan_type (str) – Support ‘vanilla’, ‘lsgan’, ‘wgan’, ‘hinge’.
real_label_val (float) – The value for real label. Default: 1.0.
fake_label_val (float) – The value for fake label. Default: 0.0.
loss_weight (float) – Loss weight. Default: 1.0. Note that loss_weight is only for generators; and it is always 1.0 for discriminators.
-
forward
(input, target_is_real, is_disc=False)[source]¶
- Parameters
input (Tensor) – The input for the loss module, i.e., the network prediction.
target_is_real (bool) – Whether the target is real or fake.
is_disc (bool) – Whether the loss is computed for discriminators. Default: False.
- Returns
GAN loss value.
- Return type
Tensor
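The sketch below shows one common formulation of the four supported gan_type values on a flat list of raw discriminator outputs, and how loss_weight is applied only when is_disc is False. It is written from the standard definitions of these losses and is an assumption about the behavior, not a copy of the implementation; the function name gan_loss_sketch is illustrative.

```python
import math


def _softplus(x):
    # Numerically stable log(1 + exp(x)).
    return max(x, 0.0) + math.log1p(math.exp(-abs(x)))


def gan_loss_sketch(preds, target_is_real, gan_type='vanilla', is_disc=False,
                    real_label_val=1.0, fake_label_val=0.0, loss_weight=1.0):
    label = real_label_val if target_is_real else fake_label_val
    if gan_type == 'vanilla':
        # Binary cross-entropy on logits: softplus(x) - label * x.
        terms = [_softplus(x) - label * x for x in preds]
    elif gan_type == 'lsgan':
        terms = [(x - label) ** 2 for x in preds]
    elif gan_type == 'wgan':
        terms = [-x if target_is_real else x for x in preds]
    elif gan_type == 'hinge':
        if is_disc:
            terms = [max(0.0, 1.0 - x) if target_is_real else max(0.0, 1.0 + x)
                     for x in preds]
        else:
            terms = [-x for x in preds]
    else:
        raise ValueError(f"unsupported gan_type: {gan_type}")
    loss = sum(terms) / len(terms)
    # loss_weight only scales the generator loss; discriminators use 1.0.
    return loss if is_disc else loss_weight * loss


if __name__ == "__main__":
    print(gan_loss_sketch([0.3, -1.2], target_is_real=True, gan_type='hinge',
                          is_disc=True))
```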
-
class
mmedit.models.losses.
GradientLoss
(loss_weight=1.0, reduction='mean')[source]¶ Gradient loss.
- Parameters
loss_weight (float) – Loss weight for L1 loss. Default: 1.0.
reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.
-
class
mmedit.models.losses.
GradientPenaltyLoss
(loss_weight=1.0)[source]¶ Gradient penalty loss for wgan-gp.
- Parameters
loss_weight (float) – Loss weight. Default: 1.0.
-
forward
(discriminator, real_data, fake_data, mask=None)[source]¶ Forward function.
- Parameters
discriminator (nn.Module) – Network for the discriminator.
real_data (Tensor) – Real input data.
fake_data (Tensor) – Fake input data.
mask (Tensor) – Masks for inpainting. Default: None.
- Returns
Loss.
- Return type
Tensor
-
class
mmedit.models.losses.
L1CompositionLoss
(loss_weight=1.0, reduction='mean', sample_wise=False)[source]¶ L1 composition loss.
- Parameters
loss_weight (float) – Loss weight for L1 loss. Default: 1.0.
reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.
sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss is first reduced with ‘mean’ per sample and then averaged over all samples. Default: False.
-
forward
(pred_alpha, fg, bg, ori_merged, weight=None, **kwargs)[source]¶
- Parameters
pred_alpha (Tensor) – of shape (N, 1, H, W). Predicted alpha matte.
fg (Tensor) – of shape (N, 3, H, W). Tensor of foreground object.
bg (Tensor) – of shape (N, 3, H, W). Tensor of background object.
ori_merged (Tensor) – of shape (N, 3, H, W). Tensor of the original merged image before normalization by ImageNet mean and std.
weight (Tensor, optional) – of shape (N, 1, H, W). It is an indicating matrix: weight[trimap == 128] = 1. Default: None.
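A composition loss compares the image recomposed from the predicted alpha, alpha * fg + (1 - alpha) * bg, with the original merged image. The sketch below does this on a flat list of pixels with RGB tuples and an optional per-pixel weight. It assumes the standard compositing equation and a weighted mean; the function names and the all-zero-weight guard are illustrative choices, not the implementation above.

```python
def composite(alpha, fg, bg):
    """Recompose each RGB pixel as alpha * fg + (1 - alpha) * bg."""
    return [tuple(a * f + (1.0 - a) * b for f, b in zip(fg_px, bg_px))
            for a, fg_px, bg_px in zip(alpha, fg, bg)]


def l1_composition_loss_sketch(alpha, fg, bg, ori_merged, weight=None,
                               loss_weight=1.0, eps=1e-12):
    merged = composite(alpha, fg, bg)
    if weight is None:
        weight = [1.0] * len(alpha)
    total, norm = 0.0, 0.0
    for pixel, reference, w in zip(merged, ori_merged, weight):
        total += w * sum(abs(p - r) for p, r in zip(pixel, reference))
        norm += w * len(pixel)  # the per-pixel weight applies to all channels
    # Guard against an all-zero weight map.
    return loss_weight * total / max(norm, eps)


if __name__ == "__main__":
    alpha = [0.5, 1.0]
    fg = [(1.0, 1.0, 1.0), (0.2, 0.4, 0.6)]
    bg = [(0.0, 0.0, 0.0), (9.0, 9.0, 9.0)]
    ori = [(0.5, 0.5, 0.5), (0.2, 0.4, 0.9)]
    print(l1_composition_loss_sketch(alpha, fg, bg, ori))
```

Passing a weight that is 1 only in the unknown trimap region restricts the loss to those pixels.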
-
class
mmedit.models.losses.
L1Loss
(loss_weight=1.0, reduction='mean', sample_wise=False)[source]¶ L1 (mean absolute error, MAE) loss.
- Parameters
loss_weight (float) – Loss weight for L1 loss. Default: 1.0.
reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.
sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss is first reduced with ‘mean’ per sample and then averaged over all samples. Default: False.
-
class
mmedit.models.losses.
MSECompositionLoss
(loss_weight=1.0, reduction='mean', sample_wise=False)[source]¶ MSE (L2) composition loss.
- Parameters
loss_weight (float) – Loss weight for MSE loss. Default: 1.0.
reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.
sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss is first reduced with ‘mean’ per sample and then averaged over all samples. Default: False.
-
forward
(pred_alpha, fg, bg, ori_merged, weight=None, **kwargs)[source]¶
- Parameters
pred_alpha (Tensor) – of shape (N, 1, H, W). Predicted alpha matte.
fg (Tensor) – of shape (N, 3, H, W). Tensor of foreground object.
bg (Tensor) – of shape (N, 3, H, W). Tensor of background object.
ori_merged (Tensor) – of shape (N, 3, H, W). Tensor of the original merged image before normalization by ImageNet mean and std.
weight (Tensor, optional) – of shape (N, 1, H, W). It is an indicating matrix: weight[trimap == 128] = 1. Default: None.
-
class
mmedit.models.losses.
MSELoss
(loss_weight=1.0, reduction='mean', sample_wise=False)[source]¶ MSE (L2) loss.
- Parameters
loss_weight (float) – Loss weight for MSE loss. Default: 1.0.
reduction (str) – Specifies the reduction to apply to the output. Supported choices are ‘none’ | ‘mean’ | ‘sum’. Default: ‘mean’.
sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight (argument of forward()) is not None. The loss is first reduced with ‘mean’ per sample and then averaged over all samples. Default: False.
-
class
mmedit.models.losses.
MaskedTVLoss
(loss_weight=1.0)[source]¶ Masked TV loss.
- Parameters
loss_weight (float, optional) – Loss weight. Defaults to 1.0.
-
class
mmedit.models.losses.
PerceptualLoss
(layer_weights, vgg_type='vgg19', use_input_norm=True, perceptual_weight=1.0, style_weight=1.0, norm_img=True, pretrained='torchvision://vgg19', criterion='l1')[source]¶ Perceptual loss with commonly used style loss.
- Parameters
layer_weights (dict) – The weight for each layer of vgg feature. Here is an example: {‘4’: 1., ‘9’: 1., ‘18’: 1.}, which means the 5th, 10th and 19th feature layers will be extracted with weight 1.0 in calculating losses.
vgg_type (str) – The type of vgg network used as feature extractor. Default: ‘vgg19’.
use_input_norm (bool) – If True, normalize the input image in vgg. Default: True.
perceptual_weight (float) – If perceptual_weight > 0, the perceptual loss will be calculated and multiplied by this weight. Default: 1.0.
style_weight (float) – If style_weight > 0, the style loss will be calculated and multiplied by this weight. Default: 1.0.
norm_img (bool) – If True, the image will be normalized to [0, 1]. Note that this is different from use_input_norm, which normalizes the input in the forward function of VGG according to the statistics of the dataset. Importantly, the input image must be in the range [-1, 1]. Default: True.
pretrained (str) – Path for pretrained weights. Default: ‘torchvision://vgg19’.
criterion (str) – The criterion used to compare features. Default: ‘l1’.
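The way layer weights and perceptual_weight combine can be sketched without any deep learning framework. The example below assumes an L1 criterion and represents each layer's features as a flat list of floats. The function names l1_mean and weighted_perceptual are hypothetical and this is an illustration of the weighting scheme described above, not the library's implementation.

```python
def l1_mean(a, b):
    """Mean absolute difference between two equal-length lists."""
    return sum(abs(x - y) for x, y in zip(a, b)) / len(a)


def weighted_perceptual(feats_x, feats_gt, layer_weights, perceptual_weight=1.0):
    """Sum per-layer L1 distances, each scaled by its layer weight.

    feats_x and feats_gt map layer names (strings) to flat feature lists.
    Hypothetical helper for illustration only.
    """
    total = 0.0
    for name, weight in layer_weights.items():
        total += weight * l1_mean(feats_x[name], feats_gt[name])
    return perceptual_weight * total


layer_weights = {'4': 1.0, '9': 0.5}
feats_x = {'4': [0.0, 2.0], '9': [1.0, 1.0, 1.0, 1.0]}
feats_gt = {'4': [1.0, 0.0], '9': [0.0, 0.0, 0.0, 0.0]}
loss = weighted_perceptual(feats_x, feats_gt, layer_weights)  # 1.0 * 1.5 + 0.5 * 1.0 = 2.0
```

Layers that are absent from layer_weights contribute nothing, so the dictionary doubles as the selection of layers to extract.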
-
class
mmedit.models.losses.
PerceptualVGG
(layer_name_list, vgg_type='vgg19', use_input_norm=True, pretrained='torchvision://vgg19')[source]¶ VGG network used in calculating perceptual loss.
In this implementation, users can choose whether to normalize the input and which type of VGG network to use. Note that the pretrained path must match the VGG type.
- Parameters
layer_name_list (list[str]) – The forward function returns the features of the layers named in this list. Each entry is the name of a layer in vgg.features. An example of this list is [‘4’, ‘10’].
vgg_type (str) – Set the type of vgg network. Default: ‘vgg19’.
use_input_norm (bool) – If True, normalize the input image. Importantly, the input must be in the range [0, 1]. Default: True.
pretrained (str) – Path for pretrained weights. Default: ‘torchvision://vgg19’
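Selecting features by layer name can be pictured as running the input through a sequence of layers and keeping the outputs whose names were requested. The sketch below uses plain Python callables in place of real VGG layers and assumes, as in a sequential container, that a layer's name is its zero-based index written as a string. The function collect_features is hypothetical.

```python
def collect_features(layers, layer_name_list, x):
    """Run x through layers in order and keep the requested outputs.

    layers is a list of callables; the name of each layer is assumed to be
    its zero-based index as a string. Hypothetical helper for illustration.
    """
    wanted = set(layer_name_list)
    outputs = {}
    for index, layer in enumerate(layers):
        x = layer(x)
        name = str(index)
        if name in wanted:
            outputs[name] = x
    return outputs


# Three toy "layers" standing in for convolution and activation modules.
layers = [lambda v: v + 1, lambda v: v * 2, lambda v: v - 3]
feats = collect_features(layers, ['0', '2'], 5)  # {'0': 6, '2': 9}
```

Because names are zero-based under this assumption, the name ‘4’ refers to the fifth layer in the sequence.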
-
mmedit.models.losses.
mask_reduce_loss
(loss, weight=None, reduction='mean', sample_wise=False)[source]¶ Apply element-wise weight and reduce loss.
- Parameters
loss (Tensor) – Element-wise loss.
weight (Tensor) – Element-wise weights. Default: None.
reduction (str) – Same as built-in losses of PyTorch. Options are ‘none’, ‘mean’ and ‘sum’. Default: ‘mean’.
sample_wise (bool) – Whether to calculate the loss sample-wise. This argument only takes effect when reduction is ‘mean’ and weight is not None. The loss is first averaged within each sample and then averaged over all samples. Default: False.
- Returns
Processed loss values.
- Return type
Tensor
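The difference between the two ‘mean’ modes is easiest to see on a small example. The sketch below is a framework-free illustration, assuming that the weighted mean divides the weighted loss by the sum of the weights. The function masked_mean and the eps guard are hypothetical and this is not the library's code.

```python
def masked_mean(loss, weight, sample_wise=False, eps=1e-12):
    """Weighted 'mean' reduction over a batch of per-element losses.

    loss and weight are lists of samples; each sample is a flat list of
    floats. Hypothetical helper for illustration only.
    """
    weighted = [[l * w for l, w in zip(ls, ws)] for ls, ws in zip(loss, weight)]
    if sample_wise:
        # Average inside each sample first, then average across samples.
        per_sample = [sum(ls) / (sum(ws) + eps) for ls, ws in zip(weighted, weight)]
        return sum(per_sample) / len(per_sample)
    # Otherwise average over all weighted elements of the batch at once.
    total_weight = sum(sum(ws) for ws in weight)
    return sum(sum(ls) for ls in weighted) / (total_weight + eps)


loss = [[1.0, 5.0, 9.0, 9.0], [1.0, 1.0, 1.0, 1.0]]
weight = [[1.0, 1.0, 0.0, 0.0], [1.0, 1.0, 1.0, 1.0]]
batch_mean = masked_mean(loss, weight)                      # about 10 / 6
sample_mean = masked_mean(loss, weight, sample_wise=True)   # about (3 + 1) / 2
```

In this example the first sample has only two weighted elements, so it counts for less in the batch-wide mean (about 1.67) than in the sample-wise mean (about 2.0), where every sample contributes equally.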
mmedit.utils¶
-
mmedit.utils.
get_root_logger
(log_file=None, log_level=20)[source]¶ Get the root logger.
The logger will be initialized if it has not been initialized. By default a StreamHandler will be added. If log_file is specified, a FileHandler will also be added. The name of the root logger is the top-level package name, e.g., “mmedit”.
- Parameters
log_file (str | None) – The log filename. If specified, a FileHandler will be added to the root logger.
log_level (int) – The root logger level. Note that only the process of rank 0 is affected; other processes set the level to “Error” and stay silent most of the time. Default: 20 (logging.INFO).
- Returns
The root logger.
- Return type
logging.Logger
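The behaviour described for get_root_logger can be approximated with the standard logging module. The sketch below is a minimal stand-in, assuming the process rank is passed in explicitly. The function make_logger and its rank parameter are hypothetical and are not part of mmedit.

```python
import logging


def make_logger(name, log_file=None, log_level=logging.INFO, rank=0):
    """Create a named logger once and reuse it on later calls.

    Hypothetical helper for illustration: rank stands in for the
    distributed process rank, which is passed explicitly here.
    """
    logger = logging.getLogger(name)
    if logger.handlers:
        # Already initialized; do not attach duplicate handlers.
        return logger

    handlers = [logging.StreamHandler()]
    if log_file is not None:
        handlers.append(logging.FileHandler(log_file))

    formatter = logging.Formatter('%(asctime)s - %(name)s - %(levelname)s - %(message)s')
    for handler in handlers:
        handler.setFormatter(formatter)
        logger.addHandler(handler)

    # Only rank 0 uses the requested level; other ranks are mostly silent.
    logger.setLevel(log_level if rank == 0 else logging.ERROR)
    return logger


logger = make_logger('demo')
logger.info('evaluation started')
```

The numeric default log_level=20 in the signature above corresponds to logging.INFO in the standard library.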