Computer Vision + Generative AI

Tony Ng

AI Research Scientist at Meta, building diffusion systems for image, video, and audio generation at scale.

PhD at MatchLab, Imperial College London. Previously at Synthesia and Scape Technologies.

Research Focus

Generative Systems

Diffusion models for image, video, and audio generation with real-world quality and reliability constraints.

Visual Localization

Learning-based localization that blends geometry with deep representations for AR/VR at scale.

Privacy + Security

Content-concealing descriptors and robust perception for privacy-preserving visual systems.

Now

I build and evaluate diffusion systems for ad creatives at Meta AI Research. I’m interested in controllable generation, scalable data curation, and evaluation frameworks that move beyond surface-level metrics.

I’m open to collaborations on generative media systems, privacy-preserving perception, and robust evaluation.

Selected Publications

  1. NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning
     Tony Ng, Hyo Jin Kim, Vincent Lee, Daniel DeTone, Tsun-Yi Yang, Tianwei Shen, Eddy Ilg, Vassileios Balntas, Krystian Mikolajczyk, Chris Sweeney
     In CVPR, 2022
  2. Reassessing the Limitations of CNN Methods for Camera Pose Regression
     Tony Ng, Adrian Lopez-Rodriguez, Vassileios Balntas, Krystian Mikolajczyk
     arXiv preprint, 2021
  3. SOLAR: Second-Order Loss and Attention for Image Retrieval
     Tony Ng, Vassileios Balntas, Yurun Tian, Krystian Mikolajczyk
     In ECCV, 2020

News

Dec 10, 2025 New preprint: TUNA — Taming Unified Visual Representations for Native Unified Multimodal Models (arXiv:2512.02014).
Aug 1, 2024 Started a new role as an AI Research Scientist at Meta, focusing on diffusion models for image, video, and audio generation.
Feb 1, 2023 Joined Synthesia as a Research Engineer, working on controllable video diffusion models for AI dubbing on avatars.
Oct 7, 2022 Completed a second research internship at Reality Labs, this time working on multi-modal understanding (text & geometry) using language models.
Jun 24, 2022 Our paper NinjaDesc: Content-Concealing Visual Descriptors via Adversarial Learning was presented at CVPR 2022, New Orleans LA.