I Introduction
X-ray computed tomography (CT) is one of the most commonly used medical imaging modalities, offering high-resolution imaging in a short scan time. However, excessive X-ray radiation can potentially increase the incidence of cancer, so low-dose CT scanning has been extensively studied to minimize the radiation dose to patients. Unfortunately, various artifacts appear in low-dose CT images, which significantly reduces their diagnostic value.
Recently, deep learning approaches [1, 2, 3, 4, 5, 6] have been proposed for low-dose CT denoising with impressive performance. The majority of these works [1, 2, 3, 4] are based on supervised learning, where the neural network is trained with paired low-dose CT (LDCT) and standard-dose CT (SDCT) images. However, the simultaneous acquisition of low- and standard-dose images is often difficult, and it also increases the radiation exposure of the subjects.
Accordingly, unsupervised learning approaches that do not require matched LDCT and SDCT images have become a major focus of research in the CT community [5, 6, 7, 8]. In particular, the authors in [5, 6] proposed a CycleGAN approach [9] for low-dose CT denoising that trains the networks with unpaired LDCT and SDCT images. To enable such unpaired training, two generators are necessary: one for the forward mapping from LDCT to SDCT, and the other for the inverse mapping from SDCT to LDCT. Cycle consistency is then enforced so that an image that goes through the successive application of the forward and inverse mappings reverts to the original one. In fact, a recent theoretical study [10] reveals that this CycleGAN architecture emerges as a dual formulation of an optimal transport problem in which the statistical distances between the empirical and transported measures in both the source and target domains are simultaneously minimized. Although two generators are required for training, only the forward generator is used at inference time. Nonetheless, the inverse generator requires a similar number of learnable parameters and a similar amount of memory as the forward one, making the CycleGAN architecture inefficient. Furthermore, two generators and two discriminators must be trained simultaneously for convergence, which requires a high level of skill and know-how. To mitigate this problem, Gu et al. [6] proposed a tunable CycleGAN with adaptive instance normalization (AdaIN) [11]. The main idea is that a single generator can be switched to the forward or inverse generator by simply changing the AdaIN code produced by a lightweight AdaIN code generator. However, the architecture still requires two discriminators to distinguish the fake and real samples in the LDCT and SDCT domains, each of which is still as complex as a generator.
Therefore, one of the ultimate goals of the CycleGAN study for low-dose CT noise removal is to eliminate the unnecessary generator and discriminator while still maintaining the optimality of CycleGAN from the point of view of optimal transport. Indeed, one of the most important contributions of this paper is to show that using an invertible generator architecture automatically satisfies the cycle-consistency term and completely removes one of the discriminators without affecting the CycleGAN framework (see Fig. 1).
To meet the invertibility condition, our generator is implemented using the coupling layers originally proposed for normalizing flows [12, 13]. The generator is then trained with just a single discriminator that distinguishes fake SDCT from real SDCT images. To make the invertible generator sufficiently expressive for low-dose CT denoising, the network is trained in the wavelet residual domain. Despite the lack of explicit cycle consistency, our algorithm maintains the optimality of CycleGAN and offers state-of-the-art noise removal with only 10% of the trainable parameters of the conventional CycleGAN. Furthermore, training is two times faster. Since there is no explicit cycle consistency, our method is dubbed the cycle-free CycleGAN.
This paper is structured as follows. Section II reviews the existing theory of normalizing flows. Section III then explains the mathematical theory behind our cycle-free CycleGAN. Section IV explains the implementation issues, training and analysis details, and our low-dose CT datasets. Experimental results on various levels of low-dose CT denoising tasks are shown in Section V, followed by the discussion in Section VI and the conclusion in Section VII.
II Related Works
II-A Normalizing Flow
Our method is inspired by the normalizing flow (NF), or invertible flow [12, 13, 14], so we review it briefly to highlight the similarities and differences from our work. However, the original derivation of NF [15, 12, 16, 13, 14] does not readily reveal the link to our cycle-free CycleGAN, so here we present a new derivation, which is inspired by the variational autoencoder (VAE) [17].
Let $\mathcal{X}$ and $\mathcal{Z}$ denote the ambient space and the latent space, respectively. In classical variational inference, the model distribution $p(x)$ is obtained by combining a latent space distribution $q(z)$ with a family of conditional distributions $p(x|z)$, which leads to an interesting lower bound:

$$\log p(x) \geq \mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] - D_{KL}\big(q(z|x)\,\|\,q(z)\big) \tag{1}$$

or, equivalently, the minimization of the negative lower bound

$$\mathcal{L} := -\mathbb{E}_{q(z|x)}\big[\log p(x|z)\big] + D_{KL}\big(q(z|x)\,\|\,q(z)\big) \tag{2}$$

where $D_{KL}$ denotes the Kullback–Leibler (KL) divergence [15]. The lower bound in (1) is often called the evidence lower bound (ELBO) or the variational lower bound [18]. The goal of variational inference is then to find the conditional distribution $p(x|z)$ and the posterior $q(z|x)$ that maximize the lower bound.
Among the various choices of the posterior for the ELBO, the following form is most often used [17]:

$$q(z|x) = \int q(\epsilon)\,\delta\big(z - f_\theta(x,\epsilon)\big)\,d\epsilon \tag{3}$$

where $q(\epsilon)$ is a zero-mean unit-variance Gaussian, and $f_\theta(x,\epsilon)$ is the encoder function parameterized by $\theta$ for a given input $x$ in addition to the noise $\epsilon$. For the given encoder in (3), the ELBO loss in (2) can be simplified as [17]:

$$\mathcal{L} = \mathbb{E}_{\epsilon\sim q(\epsilon)}\left[-\log p\big(x\,|\,f_\theta(x,\epsilon)\big) + \log q(\epsilon) - \log\left|\det\frac{\partial f_\theta(x,\epsilon)}{\partial \epsilon}\right| - \log q\big(f_\theta(x,\epsilon)\big)\right] \tag{4}$$

where the first term in (4) is obtained from the first term in (2), which corresponds to the likelihood term. Assuming a Gaussian distribution, this term can be represented as:

$$-\log p(x|z) = \frac{1}{2\sigma^2}\big\|x - g_\phi(z)\big\|^2 + C \tag{5}$$

where $g_\phi$ is the decoder function parameterized by $\phi$ and $C$ is a constant. Furthermore, the VAE chooses the following form of the encoder function:

$$f_\theta(x,\epsilon) = \mu_\theta(x) + \sigma_\theta(x)\,\epsilon \tag{6}$$

where $\sigma_\theta(x)$ is the noise standard deviation; this choice is often called the reparametrization trick [15]. The normalizing flow then further enforces that $f_\theta$ is an invertible function of $x$ that no longer depends on $\epsilon$, such that

$$z = f_\theta(x), \qquad x = f_\theta^{-1}(z). \tag{7}$$
Thanks to the invertibility condition in (7), a very interesting phenomenon happens. More specifically, by choosing the decoder as $g_\phi = f_\theta^{-1}$, the likelihood term in (4) can be simplified as follows:

$$-\log p\big(x\,|\,f_\theta(x)\big) = \frac{1}{2\sigma^2}\big\|x - f_\theta^{-1}\big(f_\theta(x)\big)\big\|^2 + C = C \tag{8}$$

which becomes a constant. Therefore, the decoder part is no longer necessary in the parameter estimation. Accordingly, the ELBO loss in (4) can be simplified as

$$\mathcal{L} = \mathbb{E}_{x}\left[-\log q\big(f_\theta(x)\big) - \log\left|\det\frac{\partial f_\theta(x)}{\partial x}\right|\right] \tag{9}$$

where we have also removed the $\log q(\epsilon)$ term since it is also a constant. If we further assume a zero-mean unit-variance Gaussian measure for the latent space $\mathcal{Z}$, (9) can be further simplified as

$$\mathcal{L} = \mathbb{E}_{x}\left[\frac{1}{2}\big\|f_\theta(x)\big\|^2 - \log\left|\det\frac{\partial f_\theta(x)}{\partial x}\right|\right] \tag{10}$$

which is the final loss function for NF.
Now, the main technical difficulty of minimizing the loss function in (10) arises from the last term, which involves a complicated determinant calculation for a huge matrix. Aside from the invertible network architecture that satisfies (7), normalizing flow therefore focuses on an encoder function composed of a sequence of transformations:

$$z = f_\theta(x) = \big(f_K \circ f_{K-1} \circ \cdots \circ f_1\big)(x) \tag{11}$$

For this encoder function, the change of variables formula leads to

$$\log\left|\det\frac{\partial f_\theta(x)}{\partial x}\right| = \sum_{k=1}^{K}\log\left|\det\frac{\partial h_k}{\partial h_{k-1}}\right| \tag{12}$$

where $h_0 = x$, $h_k = f_k(h_{k-1})$, and $h_K = z$. Note that the complicated determinant computation in (10) can thus be replaced by a relatively easy computation for each step [12].
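The sum-of-log-determinants computation in (12) can be sketched numerically. The two elementwise affine maps below are illustrative stand-ins for the flow steps $f_k$ (not the paper's layers); the point is only that the total log-determinant of the composition is the sum of the per-step log-determinants, which then enters the NF loss in (10).

```python
import numpy as np

def affine(h, scale, shift):
    """One invertible elementwise step h -> scale*h + shift;
    its log|det Jacobian| is sum(log|scale|)."""
    return scale * h + shift, np.sum(np.log(np.abs(scale)))

x = np.array([0.5, -1.0, 2.0])
h, logdet_total = x, 0.0
# two illustrative steps; elementwise scales of 2 then 1/2 cancel exactly
steps = [(np.full(3, 2.0), np.full(3, 0.1)),
         (np.full(3, 0.5), np.full(3, -0.3))]
for scale, shift in steps:
    h, ld = affine(h, scale, shift)
    logdet_total += ld          # Eq. (12): accumulate per-step log-dets

z = h
nf_loss = 0.5 * np.sum(z ** 2) - logdet_total   # Eq. (10) for one sample
```

Because the second step exactly inverts the scaling of the first, the accumulated log-determinant here is zero, so the loss reduces to the Gaussian prior term alone.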
III Main Contribution
III-A Derivation of the Cycle-Free CycleGAN
Similar to the normalizing flow, which is concerned with the conversion between the latent space and the ambient space for image generation, the main goal of CycleGAN is the image transfer between two spaces, say $\mathcal{X}$ and $\mathcal{Y}$. Specifically, for the case of low-dose CT denoising, the target SDCT image space $\mathcal{X}$ is equipped with a probability measure $\mu$, whereas the LDCT image space $\mathcal{Y}$ is equipped with a probability measure $\nu$ (see Fig. 2(a)). The goal of the CycleGAN is then to transport the LDCT distribution to the SDCT distribution so that the LDCT distribution can follow the SDCT distribution. It turns out that this is closely related to optimal transport [19, 20]. In particular, the transport from $\mathcal{X}$ to $\mathcal{Y}$ is performed by the forward operator $F:\mathcal{X}\to\mathcal{Y}$, so that $F$ "pushes forward" the measure $\mu$ in $\mathcal{X}$ to $F_\#\mu$ in the space $\mathcal{Y}$ [19, 20]. On the other hand, the mass transportation from the measure space $(\mathcal{Y},\nu)$ to the measure space $(\mathcal{X},\mu)$ is done by a generator $G:\mathcal{Y}\to\mathcal{X}$, i.e., the generator pushes forward the measure $\nu$ in $\mathcal{Y}$ to the measure $G_\#\nu$ in the target space $\mathcal{X}$. Then, the optimal transport map for unsupervised learning can be achieved by simultaneously minimizing the statistical distances between $\mu$ and $G_\#\nu$, and between $\nu$ and $F_\#\mu$.
Although various forms of statistical distance could be used (for example, the KL divergence in the case of the VAE), in our prior work [10] and its extensions [21, 22, 23, 24, 25] we use the Wasserstein metric as the statistical distance. It was then shown that the simultaneous statistical distance minimization can be done by solving the following Kantorovich optimal transport problem:

$$\min_{\pi\in\Pi(\mu,\nu)} \int_{\mathcal{X}\times\mathcal{Y}} c\big(x,y;G,F\big)\,d\pi(x,y) \tag{13}$$

where $\Pi(\mu,\nu)$ refers to the set of the joint distributions with the marginals $\mu$ and $\nu$, and the transportation cost is defined by

$$c(x,y;G,F) = \|y - F(x)\| + \lambda\|x - G(y)\| \tag{14}$$

where $\lambda$ denotes a weighting parameter. In particular, the role of $\lambda$ in (14) was originally studied in the context of CycleGAN [25]. In many inverse problems, additional regularization is often used. For example, one could use the following [23, 24]:

$$c(x,y;G,F) = \|y - F(x)\| + \lambda\|x - G(y)\| + \eta\|y - G(y)\| \tag{15}$$

where $\eta$ is the regularization parameter and the last term penalizes the variation introduced by the generator. Note that the first two terms in (15) are computed using both $x$ and $y$, whereas the last term depends only on $y$. From the optimal transport perspective, this makes a huge difference, since the computation of the last term is trivial, whereas the first terms require the dual formulation [23, 24].
One of the most important contributions of our companion paper [10] is to show that the primal formulation of the unsupervised learning problem in (13) with the transport cost (15) can be represented by:

$$\min_{G,F}\max_{\varphi,\psi}\; \gamma\,\ell_{cycle}(G,F) + \ell_{Disc}(G,F;\varphi,\psi) \tag{16}$$

where $\gamma$ is a hyperparameter, the cycle-consistency term is given by

$$\ell_{cycle}(G,F) = \int_{\mathcal{Y}}\big\|y - F\big(G(y)\big)\big\|\,d\nu(y) + \int_{\mathcal{X}}\big\|x - G\big(F(x)\big)\big\|\,d\mu(x) \tag{17}$$

and

$$\ell_{Disc}(G,F;\varphi,\psi) = \int_{\mathcal{X}}\varphi(x)\,d\mu(x) - \int_{\mathcal{Y}}\varphi\big(G(y)\big)\,d\nu(y) + \int_{\mathcal{Y}}\psi(y)\,d\nu(y) - \int_{\mathcal{X}}\psi\big(F(x)\big)\,d\mu(x) \tag{18}$$

where the Kantorovich potentials, i.e. the discriminators, satisfy

$$\varphi \in \mathrm{Lip}(\mathcal{X}), \qquad \psi \in \mathrm{Lip}(\mathcal{Y}) \tag{19}$$

and $\mathrm{Lip}(\mathcal{X})$ denotes the space of Lipschitz functions with the domain $\mathcal{X}$. To make the paper self-contained, see the Appendix for the detailed derivation.
Similar to the key simplification step (8) in NF, a very interesting thing happens if we use an invertible generator satisfying (7) for the CycleGAN training. The following proposition is our key result.
Proposition 1. Suppose that the generator $G$ is invertible and the forward operator is chosen as $F = G^{-1}$. Then, the cycle-consistency term in (17) vanishes, and the dual formulation in (16) reduces to the following optimization problem with a single discriminator:

$$\min_{G}\max_{\varphi\in \mathrm{Lip}_1(\mathcal{X})} \int_{\mathcal{X}}\varphi(x)\,d\mu(x) - \int_{\mathcal{Y}}\varphi\big(G(y)\big)\,d\nu(y)$$

where $\mathrm{Lip}_1(\mathcal{X})$ denotes the space of 1-Lipschitz functions on $\mathcal{X}$.
Proof.
First, the invertibility condition in (7) implies that $F(G(y)) = y$ and $G(F(x)) = x$, so that we can easily see that $\ell_{cycle}$ in (17) vanishes. Second, thanks to the invertibility condition in (7), we have
(22) 
where the set is defined by
(23) 
and the last equality follows from the invertibility of $G$. Furthermore, since $\varphi$ is a Lipschitz function, we have
where the inequality (a) comes from the Lipschitz condition of $\varphi$, and (b) comes from the fact that $G$ is a Lipschitz function by the assumption. Accordingly, the composite function is a 1-Lipschitz function. Therefore, we can obtain the following upper bound
(24) 
by extending the function space from the set in (23) to all 1-Lipschitz functions. Next, we will show that the upper bound in (24) is tight. Suppose that $\varphi^*$ is the maximizer of (24). To show that the bound is tight, we need to show the existence of a potential in the original function space such that
Thanks to the invertibility condition (7), we can always find such a function. Accordingly,
which achieves the upper bound. Therefore, we have
This concludes the proof. ∎
Compared to NF, our cycle-free CycleGAN has several advantages. First, in NF, the latent space is usually assumed to follow a Gaussian distribution, so the main focus is image generation from noise in the latent space to the ambient space. In order to apply NF to image translation between two domains, we would need to implement two NF networks: one for conversion from the first domain to the latent space, and the other from the latent space to the second domain. During this image translation via the latent space, our empirical results show that information loss occurs due to the restriction to the Gaussian latent variable. On the other hand, in our cycle-free CycleGAN, the two spaces can carry arbitrary empirical distributions.
Additionally, our method has a very interesting geometric interpretation. By replacing the forward operator with the inverse of the invertible generator, the two statistical distance minimization problems of the original CycleGAN in Fig. 2(a) can be replaced by the single statistical distance minimization problem shown in Fig. 2(b).
III-B Invertible Generator
Various architectures have been proposed to construct invertible neural networks for flow-based generative models [12, 13, 14]. For example, Nonlinear Independent Component Estimation (NICE) [12] is based on an additive coupling layer that leads to a volume-preserving invertible mapping. Later, the method was further extended to the affine coupling layer, which increases the expressiveness of the model [13]. However, the additive architecture imposes constraints on the functions that the network can represent: for instance, it can only represent volume-preserving mappings. Follow-up works [13, 14] addressed this limitation by introducing new reversible transformations. More specifically, the authors in [13] proposed a coupling layer using real-valued non-volume preserving (Real NVP) transformations. On the other hand, Kingma et al. [14] proposed an invertible 1×1 convolution as a generalization of a permutation operation, which significantly improves the image generation quality of flow-based generative models.
In the following, we explain the specific components of the invertible blocks that are used in our method. Our network architecture is shown in Fig. 3; it is composed of repetitions of squeeze/unsqueeze blocks interleaved with invertible 1×1 convolutions and stable additive coupling layers. The detailed explanation follows.
III-B1 Squeeze and Unsqueeze Operations
The squeeze operation splits the input image into four sub-images, which are arranged along the channel direction as shown in Fig. 4. The squeeze operation is essential to build the coupling layer, which will become evident soon. The unsqueeze operation then rearranges the separated channels into one image as the inverse of the squeeze operation (see Fig. 4). This operation is applied to the output of the coupling layer, so that the unsqueezed output maintains the same spatial dimensions as the input image.
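A minimal numpy sketch of this pair of operations is shown below. The even/odd pixel interleaving is one plausible arrangement of the four sub-images; the paper's exact pixel ordering in Fig. 4 may differ, but any fixed ordering gives an exact round trip.

```python
import numpy as np

def squeeze(x):
    """Split an (H, W) image into four (H/2, W/2) sub-images stacked
    along a new channel axis (illustrative pixel ordering)."""
    return np.stack([x[0::2, 0::2], x[0::2, 1::2],
                     x[1::2, 0::2], x[1::2, 1::2]], axis=0)

def unsqueeze(c):
    """Inverse of squeeze: interleave the four channels back into one image."""
    _, h, w = c.shape
    x = np.empty((2 * h, 2 * w), dtype=c.dtype)
    x[0::2, 0::2], x[0::2, 1::2] = c[0], c[1]
    x[1::2, 0::2], x[1::2, 1::2] = c[2], c[3]
    return x

img = np.arange(16.0).reshape(4, 4)
assert np.array_equal(unsqueeze(squeeze(img)), img)  # exact round trip
```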
III-B2 Invertible 1×1 Convolution
The squeeze operation decomposes the input into four components along the channel dimension. With the resulting fixed channel arrangement, only limited spatial information passes through the neural network. Therefore, randomly shuffling or reversing the order of the channel dimension [12, 13] was proposed. On the other hand, Generative Flow with Invertible 1×1 Convolutions (Glow) [14] proposed an invertible 1×1 convolution with an equal number of input and output channels as a generalization of the permutation operation with learnable parameters.
Mathematically, the 1×1 convolution can be represented as multiplying each channel vector by a matrix $W\in\mathbb{R}^{c\times c}$:

$$y_{i,j} = W x_{i,j} \tag{25}$$

which is illustrated in Fig. 5, where $x_{i,j}\in\mathbb{R}^{c}$ denotes the channel vector at the spatial position $(i,j)$. By multiplying with a fully populated matrix, the channel-wise separated input information can be mixed together so that the subsequent operations can be applied more efficiently. The corresponding inverse operation can then be written as [14]

$$x_{i,j} = W^{-1} y_{i,j} \tag{26}$$

if $W$ is invertible.
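The forward map (25), its inverse (26), and the log-determinant of (29) can be sketched in a few lines; the random well-conditioned matrix $W$ below is an illustrative choice, not a trained parameter.

```python
import numpy as np

rng = np.random.default_rng(0)
c, h, w = 4, 8, 8
W = rng.standard_normal((c, c)) + 4.0 * np.eye(c)   # well-conditioned, invertible
x = rng.standard_normal((c, h, w))

# Forward, Eq. (25): multiply every spatial position's channel vector by W.
y = np.einsum('ij,jhw->ihw', W, x)

# Inverse, Eq. (26): multiply by W^{-1}.
x_rec = np.einsum('ij,jhw->ihw', np.linalg.inv(W), y)
assert np.allclose(x_rec, x)

# Log-determinant of the whole layer, per Eq. (29): h * w * log|det W|.
logdet = h * w * np.log(abs(np.linalg.det(W)))
```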
III-B3 Stable Additive Coupling Layer
The coupling layer is the essential component that provides invertibility while also giving the neural network its expressiveness. The additive coupling layer in NICE [12] is based on an even and odd decomposition of the sequence, after which neural networks are applied in an alternating manner. Tomczak et al. [26] further extended the additive coupling layer to a general coupling layer, where the input image is split into four channel blocks and neural networks are applied at every step. By applying this general invertible transformation, the separated input can be handled more efficiently.
Specifically, the stable coupling layer is given by

$$y_1 = x_1 + t_1(x_2, x_3, x_4), \quad y_2 = x_2 + t_2(y_1, x_3, x_4), \quad y_3 = x_3 + t_3(y_1, y_2, x_4), \quad y_4 = x_4 + t_4(y_1, y_2, y_3) \tag{27}$$

where $x_1,\ldots,x_4$ are the four channel blocks and $t_1,\ldots,t_4$ are neural networks. The block inversion can then be readily done by

$$x_4 = y_4 - t_4(y_1, y_2, y_3), \quad x_3 = y_3 - t_3(y_1, y_2, x_4), \quad x_2 = y_2 - t_2(y_1, x_3, x_4), \quad x_1 = y_1 - t_1(x_2, x_3, x_4) \tag{28}$$
For example, the additive operation and its inverse are illustrated in Fig. 6.
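A sketch of a four-block additive coupling step in the spirit of Eqs. (27)-(28) is given below. The "networks" t1..t4 are stand-in nonlinearities rather than the paper's trained CNNs, and the block wiring is one plausible instantiation of the general coupling; the point is that the additions can be undone exactly in reverse order.

```python
import numpy as np

# Stand-in "networks": any functions work, since invertibility comes from
# the additive structure, not from the t_i themselves.
t = [lambda *a: np.tanh(sum(a)) for _ in range(4)]

def couple(x1, x2, x3, x4):
    """Forward coupling, Eq. (27)-style: each block is shifted by a
    function of the other (already-updated) blocks."""
    y1 = x1 + t[0](x2, x3, x4)
    y2 = x2 + t[1](y1, x3, x4)
    y3 = x3 + t[2](y1, y2, x4)
    y4 = x4 + t[3](y1, y2, y3)
    return y1, y2, y3, y4

def invert(y1, y2, y3, y4):
    """Inverse coupling, Eq. (28)-style: undo the additions in reverse order."""
    x4 = y4 - t[3](y1, y2, y3)
    x3 = y3 - t[2](y1, y2, x4)
    x2 = y2 - t[1](y1, x3, x4)
    x1 = y1 - t[0](x2, x3, x4)
    return x1, x2, x3, x4

rng = np.random.default_rng(1)
blocks = [rng.standard_normal((4, 4)) for _ in range(4)]
recon = invert(*couple(*blocks))
assert all(np.allclose(a, b) for a, b in zip(recon, blocks))
```

Because each step only adds a function of other blocks, the Jacobian is triangular with unit diagonal, which is why the layer has a unit determinant.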
III-B4 Lipschitz Constant Computation
It is easy to see that the Jacobian of the stable coupling layer has a unit determinant [12]. In fact, among the aforementioned modules of the invertible network, the only module that does not have a unit determinant is the 1×1 convolution layer. Specifically, the log-determinant of the step (25) is determined by that of $W$ [14]:

$$\log\left|\det\frac{\partial y}{\partial x}\right| = h\cdot w\cdot \log\big|\det W\big| \tag{29}$$

where $h$ and $w$ denote the height and width of the feature map. Similarly, the Lipschitz constant of the invertible generator can be easily determined from the matrix norm of $W$.
III-C Wavelet Residual Learning
Unlike image generation from noise, one of the important observations in image denoising is that noisy and clean images share structural similarities. Accordingly, rather than learning all components of the images, the authors in [27, 6] proposed a wavelet residual domain learning approach, and we follow the same procedure. Specifically, as shown in Fig. 7(a), the wavelet decomposition separates the high-frequency and low-frequency components; then, by nulling only the low-frequency (LL) component at the last decomposition level, we obtain wavelet residual images that contain the high-frequency components. As shown in Fig. 7(b), our network is then trained using only these high-frequency components. This makes it much easier for the network to handle CT noise, because most of the CT noise is concentrated in the high frequencies and the common low-pass image is not processed by the neural network.
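The LL-nulling idea can be illustrated with a one-level Haar sketch: decompose, discard the low-frequency band, and keep only the high-frequency residual. The paper uses db3 wavelets at decomposition level 6, so this simplified single-level Haar version only illustrates the principle, not the actual pipeline.

```python
import numpy as np

def haar_residual(img):
    """Return the high-frequency residual: the image minus its one-level
    Haar low-pass (LL) reconstruction."""
    a, b = img[0::2, 0::2], img[0::2, 1::2]
    c, d = img[1::2, 0::2], img[1::2, 1::2]
    ll = (a + b + c + d) / 4.0                 # low-frequency (LL) band
    lowpass = np.kron(ll, np.ones((2, 2)))     # reconstruct from LL only
    return img - lowpass                       # high-frequency residual

img = np.arange(16.0).reshape(4, 4)
res = haar_residual(img)
# with the LL band nulled, every 2x2 block of the residual averages to zero
assert np.allclose(res.reshape(2, 2, 2, 2).mean(axis=(1, 3)), 0.0)
```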
IV Method
IV-A Dataset
To verify the denoising performance of our framework, we use two datasets, one for quantitative analysis and the other for qualitative analysis. For the quantitative analysis, we use the paired low-dose and standard-dose CT image dataset that was used in the study by Kang et al. [3]. Specifically, the data are abdominal CT projection data from the AAPM 2016 Low Dose CT Grand Challenge. For the qualitative experiments, we use the unpaired 20% dose cardiac multiphase CT scan dataset that was used in the study by Kang et al. [5]. The details are as follows.
IV-A1 AAPM CT Dataset
The AAPM CT dataset is a reconstructed CT image dataset from the abdominal CT projection data of the AAPM 2016 Low Dose CT Grand Challenge, which was used in the study by Kang et al. [3]. In total, 10 patients' data were obtained after approval by the institutional review board of the Mayo Clinic. CT images were reconstructed using a conventional filtered backprojection algorithm. Poisson noise was inserted into the projection data so that the noise level corresponded to 25% of the standard dose. As the low-dose CT images were simulated from the standard-dose CT images, they form a paired dataset. For the training, every value of the dataset is converted into Hounsfield units [HU], and values lower than −1000 HU are truncated to −1000 HU. Then, we divide the data values by 4000 to normalize them to [−1, 1]. To train our network, we use 3839 CT images, while the other 421 images were used to test our network.
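The preprocessing described above can be sketched in a few lines. This is a minimal sketch assuming the truncation threshold is −1000 HU (a minus sign appears dropped in some renderings of the text) and the scale factor is 4000, as stated for the AAPM dataset.

```python
import numpy as np

def normalize_hu(img_hu, floor=-1000.0, scale=4000.0):
    """Truncate HU values below `floor`, then scale toward [-1, 1]."""
    return np.clip(img_hu, floor, None) / scale

# illustrative HU values: air below the floor, floor, water, soft tissue, metal
slice_hu = np.array([-2000.0, -1000.0, 0.0, 400.0, 3000.0])
normalized = normalize_hu(slice_hu)  # -2000 HU is truncated to -1000 HU first
```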
IV-A2 20% Dose Multiphase Cardiac CT Scan Dataset
The 20% dose cardiac multiphase CT scan dataset was acquired from 50 CT scans of mitral valve prolapse patients and 50 CT scans of coronary artery disease patients. The dataset was collected at the University of Ulsan College of Medicine and used in the studies by Kang et al. [5] and Gu et al. [6]. Detailed information on the CT scan protocol is described in previous reports [28, 29]. Electrocardiography (ECG)-gated cardiac CT scanning with a second-generation dual-source CT scanner was performed. For the low-dose CT scan, the tube current was reduced to 20% of the standard-dose CT scan. For the training, every value of the dataset is converted into Hounsfield units [HU], and values lower than −1024 HU are truncated to −1024 HU. After that, we divide the data values by 4096 to normalize them to [−1, 1]. To train our network, we use 4684 CT images, while the other 772 images were used to test our model.
IV-B Implementation Details
The invertible generator is constructed based on the invertible flow shown in Fig. 3. To extract the wavelet residual, we use Daubechies-3 (db3) wavelets, and the level of wavelet decomposition was set to 6 for all datasets.
The architecture of the neural networks in the coupling layer (see Section III-B3) is shown in Fig. 8(a). The architecture is composed of three convolution layers with spectral normalization [30, 31], followed by a multi-channel input, single-channel output convolution. The first and last convolution layers use 3×3 kernels with a stride of 1, and the second convolution layer uses a 1×1 kernel with a stride of 1. The latent feature map has 256 channels. Zero-padding is applied to the first and last convolution layers so that at each stage the height and width of the feature map are equal to those of the previous feature map.
The discriminator is constructed based on a PatchGAN architecture [32]. The overall architecture of the discriminator is shown in Fig. 8(b); it is composed of four convolution layers rather than five. The first two convolution layers use a stride of 2, and the rest use a stride of 1. Batch normalization is not applied after the first and last convolution layers. Except for the last convolution layer, we apply LeakyReLU with a slope of 0.2 after the batch normalization; at the first convolution layer, which does not have batch normalization, LeakyReLU is applied directly after the convolution. The discriminator loss is calculated with the LSGAN loss [33]. For all datasets, the network was trained with the loss in (19), using the Adam optimizer [34] with a minibatch size of 1. The learning rate was halved every 50,000 iterations, and we trained the network for 150,000 iterations on an NVIDIA GeForce RTX 2080 Ti. Our code was implemented with PyTorch v1.6.0 and CUDA 10.1.
IV-C Quantitative Metrics
For the quantitative analysis, we use the peak signal-to-noise ratio (PSNR) and the structural similarity index metric (SSIM) [35]. The PSNR is defined as follows:

$$\mathrm{PSNR}(x,y) = 10\log_{10}\left(\frac{MAX_y^2}{\frac{1}{N}\|x-y\|_2^2}\right) \tag{30}$$

where $x$ is the input image, $y$ is the target image, $N$ is the number of pixels, and $MAX_y$ is the maximum possible pixel value of the image $y$.
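A direct implementation of the PSNR in Eq. (30), assuming the standard $10\log_{10}(MAX^2/\mathrm{MSE})$ form:

```python
import numpy as np

def psnr(x, y, max_val=None):
    """PSNR in dB between input x and target y, per Eq. (30)."""
    max_val = np.max(y) if max_val is None else max_val
    mse = np.mean((x.astype(np.float64) - y.astype(np.float64)) ** 2)
    return 10.0 * np.log10(max_val ** 2 / mse)

y = np.zeros((8, 8)); y[0, 0] = 255.0   # target with MAX = 255
x = y + 16.0                            # uniform error, so MSE = 256
value = psnr(x, y, max_val=255.0)       # ~24.05 dB
```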
The SSIM is defined as follows:

$$\mathrm{SSIM}(x,y) = \frac{\big(2\mu_x\mu_y + c_1\big)\big(2\sigma_{xy} + c_2\big)}{\big(\mu_x^2 + \mu_y^2 + c_1\big)\big(\sigma_x^2 + \sigma_y^2 + c_2\big)} \tag{31}$$

where $\mu_x$ is the average of the image $x$, $\sigma_x^2$ is its variance, $\sigma_{xy}$ is the covariance of $x$ and $y$, and $c_1 = (k_1 L)^2$ and $c_2 = (k_2 L)^2$ with $k_1 = 0.01$, $k_2 = 0.03$, and dynamic range $L$, as in the original paper [35].
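A single-window (global) evaluation of Eq. (31) is sketched below. The reference SSIM implementation applies the formula over local Gaussian windows and averages, so this global version is only an illustrative sketch of the formula itself.

```python
import numpy as np

def ssim_global(x, y, L=1.0, k1=0.01, k2=0.03):
    """Evaluate Eq. (31) once over the whole image (no local windowing)."""
    c1, c2 = (k1 * L) ** 2, (k2 * L) ** 2
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()
    cov = ((x - mx) * (y - my)).mean()
    return ((2 * mx * my + c1) * (2 * cov + c2)) / \
           ((mx ** 2 + my ** 2 + c1) * (vx + vy + c2))

img = np.linspace(0.0, 1.0, 64).reshape(8, 8)
score = ssim_global(img, img)   # identical images give SSIM = 1
```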
IV-D Comparative Methods
We compared our method with existing unsupervised LDCT denoising networks [9, 6]. For the AAPM dataset, we compared our network with the conventional CycleGAN [9], whose generator is based on the U-Net [36] architecture. We also compared with the AdaIN-based tunable CycleGAN [6], which shows state-of-the-art performance for LDCT denoising. For the unpaired 20% dose CT scan dataset, we compare our method with the AdaIN-based tunable CycleGAN.
For the training of the conventional CycleGAN, the images are cropped into patches, the network is trained for 200 epochs, and the other training settings are the same as for the proposed network. For the AdaIN-based tunable CycleGAN, the same patch size is used, the network is trained for 200 epochs, and the other training settings are also the same as for the proposed network. For both comparative methods, we used a PatchGAN discriminator consisting of five convolution layers.

V Experimental Results
V-A AAPM CT Dataset
For the AAPM CT dataset, we first quantitatively compare the noise reduction performance with the conventional CycleGAN and the AdaIN-based CycleGAN in terms of PSNR and SSIM. As shown in Table I, our network shows the highest PSNR and comparable SSIM values.
TABLE I
Network                 PSNR    SSIM
LDCT input              30.468  0.695
Conventional CycleGAN   34.621  0.818
AdaIN-based CycleGAN    34.801  0.824
Proposed                34.940  0.821
TABLE II
Component       Conventional CycleGAN   AdaIN CycleGAN [6]     Proposed
Generator       6,251,392               5,900,865              1,204,320
Generator       6,251,392               274,560 (AdaIN code)   —
Discriminator   2,766,209               2,766,209              662,401
Discriminator   2,766,209               2,766,209              —
Total           18,035,202              11,707,843             1,866,721
TABLE III
Coupling layer   1×1 conv   PSNR     SSIM
✓                ✗          29.931   0.691
✗                ✓          30.706   0.704
✓                ✓          34.940   0.821
Fig. 9(a) shows representative denoising results by the various methods. The resulting images are cropped to more clearly visualize the denoising performance. The intensity window of the CT images shown is (−1000, 1000) [HU], and that of the difference images is (−200, 200) [HU]. Our CycleGAN with an invertible generator removes more noise components than the AdaIN-based CycleGAN method without losing any information. As can be seen from Fig. 9(a), the proposed network (Fig. 9(a-iii)) removes noise components around high-intensity metals more evenly than the AdaIN-based CycleGAN (Fig. 9(a-ii)).
To verify that the invertible generator can also properly perform the inverse mapping, we provide inversely mapped outputs from SDCT images, as shown in Fig. 9(b). The resulting images are cropped in order to more clearly visualize the image quality. The intensity window of the CT images shown is (−1000, 1000) [HU]. Even though the proposed method does not apply a discriminator or loss for the inverse mapping, it adds a reasonable level of noise to the SDCT, which makes the output appear closer to real LDCT than that of the AdaIN-based CycleGAN.
Fig. 10 shows three representative denoising results by the proposed CycleGAN with an invertible generator, which qualitatively verify the noise reduction performance. The gray boxes in the input low-dose CT images and the denoised results are enlarged in order to more clearly visualize the noise reduction performance, and their differences from the input are also visualized. The difference images clearly show the removed noise components. As can be seen from Fig. 10, the proposed method removes noise components evenly without any structural information loss, so that bone and each soft tissue are distinguished more clearly.
V-B 20% Dose Cardiac CT Scan Dataset
The dataset does not have paired reference data, so a quantitative comparison using PSNR and SSIM is not possible. Therefore, we qualitatively compared the denoising performance. The intensity window of the CT images is (−1024, 1024) [HU], whereas the difference images are shown with a (−200, 200) [HU] window for the 20% dose.
Fig. 11(a) shows the denoising results by the various methods. Note that the target is not perfectly aligned with the input, since there are no perfectly aligned high-dose images in in vivo experiments. Still, visual inspection and the difference images from the input show that our cycle-free CycleGAN with an invertible generator removes various noise components more uniformly than the AdaIN-based CycleGAN method without incurring any structural distortion. In Fig. 11(b), SDCT images are successfully converted to noisy images. Even though the proposed method does not apply any discriminator or loss for the inverse mapping, it adds an appropriate level of noise to the SDCT.
Fig. 12 also shows three representative denoising results by the proposed cycle-free CycleGAN with an invertible generator. The gray boxes in the low-dose inputs and denoised outputs are enlarged. The proposed method properly removes noise components from the input low-dose CT images, so that each soft tissue in the resulting denoised images is clearly distinguished.
VI Discussion
As shown in Table II, our network uses only 10% of the parameters of the conventional CycleGAN and 15% of those of the AdaIN-based CycleGAN. This is because we use a single generator and a single discriminator thanks to the invertibility. In addition, the networks in the stable coupling layer are relatively light. Accordingly, the discriminator also requires relatively few parameters.
Thanks to its efficient parameter and memory requirements, the cycle-free CycleGAN also reduces the training time. When training with input images of 256×256 resolution, the cycle-free CycleGAN runs at 12.2 iterations per second, whereas the comparative AdaIN-based tunable CycleGAN runs at 6.8 iterations per second. Accordingly, training is approximately twice as fast as that of the AdaIN-based CycleGAN.
To investigate the optimality of our network architecture, we performed ablation studies. In particular, we investigated the effects of the invertible 1×1 convolution and the stable coupling layer, as these are the critical parts of the network design. As shown in Table III, both modules are critical. In particular, the results without the 1×1 convolution layers show inferior performance compared to using the invertible 1×1 convolution.
VII Conclusion
In this paper, we proposed a cycle-free CycleGAN architecture with an invertible generator. Thanks to the invertibility, only a single pair of a generator and a discriminator is necessary, which significantly reduces the complexity. Although the number of trainable parameters is only 10% of that of the conventional CycleGAN and 15% of that of the AdaIN-based CycleGAN, extensive experimental results confirmed that the proposed method shows better low-dose CT denoising performance.
VIII Acknowledgement
This work was supported by the National Research Foundation (NRF) of Korea grant NRF2020R1A2B5B03001980. The authors would like to thank Dr. Dong Hyun Yang from the University of Ulsan College of Medicine for providing the multiphase cardiac CT scan dataset. The authors also thank the Mayo Clinic, the American Association of Physicists in Medicine (AAPM), and the National Institute of Biomedical Imaging and Bioengineering for providing the LowDose CT Grand Challenge dataset.
Appendix

The derivation of the dual formula is a simple modification of the technique in [10]. Consider the primal OT problem in (13), where $\Pi(\mu,\nu)$ refers to the set of the joint distributions with the marginals $\mu$ and $\nu$, and the transportation cost is defined as in (15).
We can easily show that
where the corresponding terms are defined in (19), and
Let us now denote the optimal joint measure for the primal problem. Using the Kantorovich dual formulations, we have the following two equalities:
(32)  
(33) 
Using the Lipschitz continuity of the Kantorovich potential $\varphi$, we have
Similarly, using the Lipschitz continuity of the Kantorovich potential $\psi$, we have
This leads to two lower bounds, and by taking the average of the two, we have
where $\ell_{Disc}$ is defined in (18). For an upper bound, instead of finding the maximizer, we choose a particular potential in (32); similarly, we choose a particular potential in (33). By taking the average of the two upper bounds, we have
where $\ell_{cycle}$ is defined in (17). The remaining part of the proof of the dual formula is a simple repetition of the techniques in [10].
References

[1] H. Chen, Y. Zhang, M. K. Kalra, F. Lin, Y. Chen, P. Liao, J. Zhou, and G. Wang, “Low-dose CT with a residual encoder-decoder convolutional neural network,” IEEE Transactions on Medical Imaging, vol. 36, no. 12, pp. 2524–2535, 2017.
[2] Q. Yang, P. Yan, Y. Zhang, H. Yu, Y. Shi, X. Mou, M. K. Kalra, Y. Zhang, L. Sun, and G. Wang, “Low-dose CT image denoising using a generative adversarial network with Wasserstein distance and perceptual loss,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1348–1357, 2018.
[3] E. Kang, J. Min, and J. C. Ye, “A deep convolutional neural network using directional wavelets for low-dose X-ray CT reconstruction,” Medical Physics, vol. 44, no. 10, pp. e360–e375, 2017.
[4] E. Kang, W. Chang, J. Yoo, and J. C. Ye, “Deep convolutional framelet denosing for low-dose CT via wavelet residual network,” IEEE Transactions on Medical Imaging, vol. 37, no. 6, pp. 1358–1369, 2018.
[5] E. Kang, H. J. Koo, D. H. Yang, J. B. Seo, and J. C. Ye, “Cycle-consistent adversarial denoising network for multiphase coronary CT angiography,” Medical Physics, vol. 46, no. 2, pp. 550–562, 2019.
[6] J. Gu and J. C. Ye, “AdaIN-based tunable CycleGAN for efficient unsupervised low-dose CT denoising,” IEEE Transactions on Computational Imaging, vol. 7, pp. 73–85, 2021.
[7] C. You, G. Li, Y. Zhang, X. Zhang, H. Shan, M. Li, S. Ju, Z. Zhao, Z. Zhang, W. Cong et al., “CT super-resolution GAN constrained by the identical, residual, and cycle learning ensemble (GAN-CIRCLE),” IEEE Transactions on Medical Imaging, vol. 39, no. 1, pp. 188–203, 2019.
[8] K. Kim, S. Soltanayev, and S. Y. Chun, “Unsupervised training of denoisers for low-dose CT reconstruction without full-dose ground truth,” IEEE Journal of Selected Topics in Signal Processing, vol. 14, no. 6, pp. 1112–1125, 2020.
[9] J.-Y. Zhu, T. Park, P. Isola, and A. A. Efros, “Unpaired image-to-image translation using cycle-consistent adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2223–2232.
[10] B. Sim, G. Oh, J. Kim, C. Jung, and J. C. Ye, “Optimal transport driven CycleGAN for unsupervised learning in inverse problems,” SIAM Journal on Imaging Sciences, vol. 13, no. 4, pp. 2281–2306, 2020.
[11] X. Huang and S. Belongie, “Arbitrary style transfer in real-time with adaptive instance normalization,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 1501–1510.
[12] L. Dinh, D. Krueger, and Y. Bengio, “NICE: Non-linear independent components estimation,” arXiv preprint arXiv:1410.8516, 2014.
[13] L. Dinh, J. Sohl-Dickstein, and S. Bengio, “Density estimation using Real NVP,” arXiv preprint arXiv:1605.08803, 2016.
[14] D. P. Kingma and P. Dhariwal, “Glow: Generative flow with invertible 1x1 convolutions,” in Advances in Neural Information Processing Systems, 2018.
[15] D. P. Kingma and M. Welling, “Auto-encoding variational Bayes,” arXiv preprint arXiv:1312.6114, 2013.
[16] D. Rezende and S. Mohamed, “Variational inference with normalizing flows,” in International Conference on Machine Learning. PMLR, 2015, pp. 1530–1538.
[17] J. Su and G. Wu, “f-VAEs: Improve VAEs with conditional flows,” arXiv preprint arXiv:1809.05861, 2018.
[18] M. J. Wainwright and M. I. Jordan, Graphical Models, Exponential Families, and Variational Inference. Now Publishers Inc, 2008.
[19] C. Villani, Optimal Transport: Old and New. Springer Science & Business Media, 2008, vol. 338.
[20] G. Peyré, M. Cuturi et al., “Computational optimal transport,” Foundations and Trends in Machine Learning, vol. 11, no. 5–6, pp. 355–607, 2019.
[21] S. Lim, H. Park, S.-E. Lee, S. Chang, B. Sim, and J. C. Ye, “CycleGAN with a blur kernel for deconvolution microscopy: Optimal transport geometry,” IEEE Transactions on Computational Imaging, vol. 6, pp. 1127–1138, 2020.
[22] G. Oh, B. Sim, H. Chung, L. Sunwoo, and J. C. Ye, “Unpaired deep learning for accelerated MRI using optimal transport driven CycleGAN,” IEEE Transactions on Computational Imaging, vol. 6, pp. 1285–1296, 2020.
[23] E. Cha, H. Chung, E. Y. Kim, and J. C. Ye, “Unpaired training of deep learning tMRA for flexible spatiotemporal resolution,” IEEE Transactions on Medical Imaging, vol. 40, no. 1, pp. 166–179, 2021.
[24] H. Chung, E. Cha, L. Sunwoo, and J. C. Ye, “Two-stage deep learning for accelerated 3D time-of-flight MRA without matched training data,” Medical Image Analysis, p. 102047, 2021.
[25] J. Lee, J. Gu, and J. C. Ye, “Unsupervised CT metal artifact learning using attention-guided beta-CycleGAN,” arXiv preprint arXiv:2007.03480, 2020.
[26] J. M. Tomczak, “General invertible transformations for flow-based generative modeling,” arXiv preprint arXiv:2011.15056, 2020.
[27] J. Song, J.-H. Jeong, D.-S. Park, H.-H. Kim, D.-C. Seo, and J. C. Ye, “Unsupervised denoising for satellite imagery using wavelet directional CycleGAN,” IEEE Transactions on Geoscience and Remote Sensing, 2020.
[28] H. J. Koo, D. H. Yang, S. Y. Oh, J.-W. Kang, D.-H. Kim, J.-K. Song, J. W. Lee, C. H. Chung, and T.-H. Lim, “Demonstration of mitral valve prolapse with CT for planning of mitral valve repair,” Radiographics, vol. 34, no. 6, pp. 1537–1552, 2014.
[29] D. H. Yang, Y.-H. Kim, J.-H. Roh, J.-W. Kang, D. Han, J. Jung, N. Kim, J. B. Lee, J.-M. Ahn, J.-Y. Lee et al., “Stress myocardial perfusion CT in patients suspected of having coronary artery disease: visual and quantitative analysis—validation by using fractional flow reserve,” Radiology, vol. 276, no. 3, pp. 715–723, 2015.
[30] T. Miyato, T. Kataoka, M. Koyama, and Y. Yoshida, “Spectral normalization for generative adversarial networks,” in International Conference on Learning Representations, 2018.
[31] J. Behrmann, P. Vicol, K.-C. Wang, R. Grosse, and J.-H. Jacobsen, “Understanding and mitigating exploding inverses in invertible neural networks,” in International Conference on Artificial Intelligence and Statistics. PMLR, 2021, pp. 1792–1800.
[32] P. Isola, J.-Y. Zhu, T. Zhou, and A. A. Efros, “Image-to-image translation with conditional adversarial networks,” in Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, 2017, pp. 1125–1134.
[33] X. Mao, Q. Li, H. Xie, R. Y. Lau, Z. Wang, and S. Paul Smolley, “Least squares generative adversarial networks,” in Proceedings of the IEEE International Conference on Computer Vision, 2017, pp. 2794–2802.
[34] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.
[35] Z. Wang, A. C. Bovik, H. R. Sheikh, and E. P. Simoncelli, “Image quality assessment: from error visibility to structural similarity,” IEEE Transactions on Image Processing, vol. 13, no. 4, pp. 600–612, 2004.
[36] O. Ronneberger, P. Fischer, and T. Brox, “U-net: Convolutional networks for biomedical image segmentation,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2015, pp. 234–241.