Fangda HAN | publications

2021

BMVC
Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN

Han, Fangda, Hao, Guoyao, Guerrero, Ricardo, and Pavlovic, Vladimir

2021

Multi-attribute conditional image generation is a challenging problem in computer vision. We propose Multi-attribute Pizza Generator (MPG), a conditional Generative Adversarial Network (GAN) framework for synthesizing images from a trichotomy of attributes: content, view-geometry, and implicit visual style. We design MPG by extending the state-of-the-art StyleGAN2, using a new conditioning technique that guides the intermediate feature maps to learn multi-scale multi-attribute entangled representations of controlling attributes. Because of the complex nature of the multi-attribute image generation problem, we regularize the image generation by predicting the explicit conditioning attributes (ingredients and view). To synthesize pizza images with view attributes outside the range of natural training images, we design a CGI pizza dataset, PizzaView, using 3D pizza models and employ it to train a view attribute regressor that regularizes the generation process, bridging the real and CGI training datasets. To verify the efficacy of MPG, we test it on Pizza10, a carefully annotated multi-ingredient pizza image dataset. MPG can successfully generate photo-realistic pizza images with desired ingredients and view attributes, beyond the range of those observed in real-world training data.
@misc{han2021multiattribute, abbr = {BMVC}, bibtex_show = {true}, selected = {true}, html = {https://arxiv.org/abs/2110.11830}, title = {Multi-attribute Pizza Generator: Cross-domain Attribute Control with Conditional StyleGAN}, author = {Han, Fangda and Hao, Guoyao and Guerrero, Ricardo and Pavlovic, Vladimir}, year = {2021}, eprint = {2110.11830}, archiveprefix = {arXiv}, primaryclass = {cs.CV} }
AAAI
Cross-Modal Coherence for Text-to-Image Retrieval

Han, Fangda, Alikhani, Malihe, Ravi, Hareesh, Kapadia, Mubbasir, Pavlovic, Vladimir, and Stone, Matthew

2021

Common image-text joint understanding techniques presume that images and the associated text can universally be characterized by a single implicit model. However, co-occurring images and text can be related in qualitatively different ways, and explicitly modeling these relations could improve the performance of current joint understanding models. In this paper, we train a Cross-Modal Coherence Model for the text-to-image retrieval task. Our analysis shows that models trained with image–text coherence relations can retrieve images originally paired with target text more often than coherence-agnostic models. We also show via human evaluation that images retrieved by the proposed coherence-aware model are preferred over a coherence-agnostic baseline by a large margin. Our findings provide insights into the ways that different modalities communicate and the role of coherence relations in capturing commonsense inferences in text and imagery.
@misc{alikhani2021crossmodal, bibtex_show = {true}, abbr = {AAAI}, selected = {true}, html = {https://arxiv.org/abs/2109.11047}, title = {Cross-Modal Coherence for Text-to-Image Retrieval}, author = {Han, Fangda and Alikhani, Malihe and Ravi, Hareesh and Kapadia, Mubbasir and Pavlovic, Vladimir and Stone, Matthew}, year = {2021}, eprint = {2109.11047}, archiveprefix = {arXiv}, primaryclass = {cs.CV}, journal = {arXiv preprint arXiv:2109.11047} }
ICPR
Picture-to-amount (PITA): Predicting relative ingredient amounts from food images

Li, Jiatong, Han, Fangda, Guerrero, Ricardo, and Pavlovic, Vladimir

In 2020 25th International Conference on Pattern Recognition (ICPR) 2021

Increased awareness of the impact of food consumption on health and lifestyle has given rise to novel data-driven food analysis systems. Although these systems may recognize the ingredients, a detailed analysis of their amounts in the meal, which is paramount for estimating the correct nutrition, is usually ignored. In this paper, we study the novel and challenging problem of predicting the relative amount of each ingredient from a food image. We propose PITA, the Picture-to-Amount deep learning architecture, to solve this problem. More specifically, we predict the ingredient amounts using a domain-driven Wasserstein loss from image-to-recipe cross-modal embeddings learned to align the two views of food data. Experiments on a dataset of recipes collected from the Internet show that the model generates promising results and improves on the baselines for this challenging task. A demo of our system and our data are available at foodai.cs.rutgers.edu.
@inproceedings{li2021picture, bibtex_show = {true}, abbr = {ICPR}, html = {https://ieeexplore.ieee.org/abstract/document/9412828}, title = {Picture-to-amount ({PITA}): Predicting relative ingredient amounts from food images}, author = {Li, Jiatong and Han, Fangda and Guerrero, Ricardo and Pavlovic, Vladimir}, booktitle = {2020 25th International Conference on Pattern Recognition (ICPR)}, pages = {10343--10350}, year = {2021}, organization = {IEEE} }

2020

Arxiv
MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs

Han, Fangda, Hao, Guoyao, Guerrero, Ricardo, and Pavlovic, Vladimir

arXiv preprint arXiv:2012.02821 2020

Multilabel conditional image generation is a challenging problem in computer vision. In this work, we propose Multi-ingredient Pizza Generator (MPG), a conditional Generative Adversarial Network (GAN) framework for synthesizing multilabel images. We design MPG based on a state-of-the-art GAN architecture called StyleGAN2, in which we develop a new conditioning technique that enforces intermediate feature maps to learn scale-wise label information. Because of the complex nature of the multilabel image generation problem, we also regularize the synthetic images by predicting their corresponding ingredients, and we encourage the discriminator to distinguish between images that match their ingredient labels and images that do not. To verify the efficacy of MPG, we test it on Pizza10, a carefully annotated multi-ingredient pizza image dataset. MPG can successfully generate photo-realistic pizza images with desired ingredients. The framework can be easily extended to other multilabel image generation scenarios.
@article{han2020mpg, bibtex_show = {true}, abbr = {Arxiv}, selected = {true}, html = {https://arxiv.org/abs/2012.02821}, title = {MPG: A Multi-ingredient Pizza Image Generator with Conditional StyleGANs}, author = {Han, Fangda and Hao, Guoyao and Guerrero, Ricardo and Pavlovic, Vladimir}, journal = {arXiv preprint arXiv:2012.02821}, year = {2020} }

WACV
CookGAN: Meal Image Synthesis from Ingredients

Han, Fangda, Guerrero, Ricardo, and Pavlovic, Vladimir

In The IEEE Winter Conference on Applications of Computer Vision 2020

@inproceedings{han2020cookgan,
  bibtex_show = {true},
  abbr = {WACV},
  selected = {true},
  html = {https://openaccess.thecvf.com/content_WACV_2020/papers/Han_CookGAN_Meal_Image_Synthesis_from_Ingredients_WACV_2020_paper.pdf},
  title = {CookGAN: Meal Image Synthesis from Ingredients},
  author = {Han, Fangda and Guerrero, Ricardo and Pavlovic, Vladimir},
  booktitle = {The IEEE Winter Conference on Applications of Computer Vision},
  pages = {1450--1458},
  year = {2020}
}

2019

Arxiv
The Art of Food: Meal Image Synthesis from Ingredients

Han, Fangda, Guerrero, Ricardo, and Pavlovic, Vladimir

arXiv preprint arXiv:1905.13149 2019

@article{han2019art, bibtex_show = {true}, abbr = {Arxiv}, html = {https://arxiv.org/abs/2002.11493}, title = {The Art of Food: Meal Image Synthesis from Ingredients}, author = {Han, Fangda and Guerrero, Ricardo and Pavlovic, Vladimir}, journal = {arXiv preprint arXiv:1905.13149}, year = {2019}, url = {https://arxiv.org/abs/1905.13149} }
C&G
Cartoonish sketch-based face editing in videos using identity deformation transfer

Zhao, Long, Han, Fangda, Peng, Xi, Zhang, Xun, Kapadia, Mubbasir, Pavlovic, Vladimir, and Metaxas, Dimitris N

Computers & Graphics 2019

@article{zhao2019cartoonish, bibtex_show = {true}, abbr = {C&G}, selected = {false}, html = {https://www.sciencedirect.com/science/article/pii/S0097849319300147}, title = {Cartoonish sketch-based face editing in videos using identity deformation transfer}, author = {Zhao, Long and Han, Fangda and Peng, Xi and Zhang, Xun and Kapadia, Mubbasir and Pavlovic, Vladimir and Metaxas, Dimitris N}, journal = {Computers \& Graphics}, volume = {79}, pages = {58--68}, year = {2019}, publisher = {Elsevier} }

2017

MST
Correct block artifacts by differential projection for a dynamic computed tomography system

Xiao, Yongshun, Han, Fangda, and Chen, Zhiqiang

Measurement Science and Technology 2017

@article{xiao2017correct,
  bibtex_show = {true},
  abbr = {MST},
  html = {https://iopscience.iop.org/article/10.1088/1361-6501/aa79cb},
  title = {Correct block artifacts by differential projection for a dynamic computed tomography system},
  author = {Xiao, Yongshun and Han, Fangda and Chen, Zhiqiang},
  journal = {Measurement Science and Technology},
  volume = {28},
  number = {9},
  pages = {094001},
  year = {2017},
  publisher = {IOP Publishing}
}

2015

NSS/MIC
A block-eliminating method by limited-view scan in a dynamic CT system for running aero-engine

Han, Fangda, Xiao, Yongshun, and Chang, Ming

In Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2015 IEEE 2015

@inproceedings{han2015block, bibtex_show = {true}, html = {https://ieeexplore.ieee.org/document/7581934}, abbr = {NSS/MIC}, title = {A block-eliminating method by limited-view scan in a dynamic CT system for running aero-engine}, author = {Han, Fangda and Xiao, Yongshun and Chang, Ming}, booktitle = {Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2015 IEEE}, pages = {1--5}, year = {2015}, organization = {IEEE} }
NSS/MIC
Improve spatial resolution by projection restoration for CT reconstruction

Chang, Ming, Xiao, Yongshun, Chen, Zhiqiang, Han, Fangda, and YangDai, Tian-yi

In Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2015 IEEE 2015

@inproceedings{chang2015improve, bibtex_show = {true}, abbr = {NSS/MIC}, html = {https://ieeexplore.ieee.org/document/7582111}, title = {Improve spatial resolution by projection restoration for CT reconstruction}, author = {Chang, Ming and Xiao, Yongshun and Chen, Zhiqiang and Han, Fangda and YangDai, Tian-yi}, booktitle = {Nuclear Science Symposium and Medical Imaging Conference (NSS/MIC), 2015 IEEE}, pages = {1--4}, year = {2015}, organization = {IEEE} }

2014

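A review of measurement methods and standards for X-ray source focal spot size (X 射线源焦点尺寸测量方法和标准综述)

Han, Fangda, Xiao, Yongshun, Chang, Ming, and Zhu, Xiaohua

Chinese Journal of Stereology and Image Analysis (中国体视学与图像分析) 2014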

@article{韩放达2014x, bibtex_show = {true}, html = {https://www.cnki.com.cn/Article/CJFDTotal-ZTSX201404001.htm}, title = {X 射线源焦点尺寸测量方法和标准综述}, author = {韩放达 and 肖永顺 and 常铭 and 朱晓骅}, journal = {中国体视学与图像分析}, number = {4}, pages = {321--329}, year = {2014} }