Sajad Hamzenejadi

I'm a PhD student in Information Systems at the University of Geneva (UNIGE). Previously, I was a research intern at Nokia Bell Labs in Paris, France, and I completed my MSc in Telecommunications Engineering at Politecnico di Milano (Polimi).

E-mail  /  Google Scholar  /  Github  /  LinkedIn

profile photo

Figure 1: Sajad in the AI world!

📑 Selected Publications

* denotes equal contribution; highlighted papers are representative first-author works.

arXiv 2025 Stable Video Infinity: Infinite-Length Video Generation with Error Recycling
Wuyang Li, Wentao Pan, Po-Chien Luan, Yang Gao, Alexandre Alahi
project page / paper / youtube / code
Key Words: Long Video Generation; End-to-end Filming; Human Talking/Dancing Animation
Summary: Stable Video Infinity (SVI) generates videos of any length with high temporal consistency, plausible scene transitions, and controllable streaming storylines in any domain. SVI introduces Error-Recycling Fine-Tuning, a new, efficient training scheme that recycles the Diffusion Transformer (DiT)'s self-generated errors into supervisory prompts, thereby encouraging the DiT to actively correct its own errors.
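A minimal sketch of the error-recycling idea as I read it from this summary (the toy `TinyDiT`, the conditioning interface, and the denoising objective below are illustrative assumptions, not the paper's implementation):

```python
# Hypothetical sketch of Error-Recycling Fine-Tuning (not the authors' code).
# Idea from the summary: the model's own generation errors are fed back as an
# extra conditioning signal, so the DiT learns to correct its own mistakes.
import torch
import torch.nn as nn

class TinyDiT(nn.Module):
    """Stand-in for a video Diffusion Transformer over latent clips."""
    def __init__(self, dim=64):
        super().__init__()
        self.net = nn.Sequential(nn.Linear(dim * 2, 128), nn.GELU(), nn.Linear(128, dim))
    def forward(self, noisy_latent, error_cond):
        return self.net(torch.cat([noisy_latent, error_cond], dim=-1))

dit = TinyDiT()
opt = torch.optim.AdamW(dit.parameters(), lr=1e-4)

for step in range(100):
    clean = torch.randn(8, 64)                 # ground-truth clip latents (dummy data)
    noise = torch.randn_like(clean)
    noisy = clean + noise                      # toy "diffused" latents
    with torch.no_grad():                      # 1) roll the model out on its own
        self_gen = noisy - dit(noisy, torch.zeros(8, 64))
        error = self_gen - clean               # 2) its self-generated error
    pred_noise = dit(noisy, error)             # 3) recycle the error as conditioning
    loss = ((noisy - pred_noise) - clean).pow(2).mean()  # denoising objective
    opt.zero_grad(); loss.backward(); opt.step()
```

The point is the loop structure: the model's own rollout error becomes an extra input, so correcting it is part of the training signal.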
arXiv 2025 Factorized Video Generation: Decoupling Scene Construction and Temporal Synthesis in Text-to-Video Diffusion Models
Mariam Hassan, Bastien Van Delft, Wuyang Li, Alexandre Alahi
project page / paper / code (coming)
Key Words: Video Factorization; Text-to-Video Diffusion Models
Summary: We propose Factorized Video Generation (FVG), a simple yet effective pipeline that decomposes text-to-video generation into three stages: reasoning, composition, and temporal synthesis.
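A toy sketch of that three-stage decomposition under stated assumptions (the stage stubs below stand in for an LLM, a text-to-image model, and an image-to-video model; none of this is the released pipeline):

```python
# Hypothetical sketch of the FVG three-stage decomposition (placeholder models,
# not the authors' pipeline): an LLM plans the scene, a text-to-image model
# composes a keyframe, and an image-to-video model adds motion.
from dataclasses import dataclass

@dataclass
class ScenePlan:
    layout_prompt: str   # static scene description from the reasoning stage
    motion_prompt: str   # motion description left for temporal synthesis

def reason(prompt: str) -> ScenePlan:
    # Stage 1 (reasoning): split the user prompt into scene vs. motion.
    # A real system would call an LLM here; this stub uses a naive split.
    parts = prompt.split(" while ")
    return ScenePlan(layout_prompt=parts[0],
                     motion_prompt=parts[1] if len(parts) > 1 else "subtle camera motion")

def compose(plan: ScenePlan) -> str:
    # Stage 2 (composition): build the static keyframe with any T2I model.
    return f"<keyframe image for: {plan.layout_prompt}>"

def synthesize(keyframe: str, plan: ScenePlan) -> str:
    # Stage 3 (temporal synthesis): animate the keyframe with any I2V model.
    return f"<video: {keyframe} animated with '{plan.motion_prompt}'>"

plan = reason("a red fox sitting on a snowy hill while snow falls gently")
print(synthesize(compose(plan), plan))
```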
arXiv 2025 RAP: 3D Rasterization Augmented End-to-End Planning
Lan Feng, Yang Gao, Éloi Zablocki, Quanyi Li, Wuyang Li, Sichao Liu, Matthieu Cord, Alexandre Alahi
project page / paper / code
Key Words: End-to-End Planning; 3D Rasterization; Data Scaling
Summary: We propose RAP, a Raster-to-Real feature-space alignment that bridges the sim-to-real gap without requiring pixel-level realism. RAP ranks 1st on the Waymo Open Dataset Vision-based End-to-End Driving Challenge 2025 (UniPlan entry), the Waymo Open Dataset Vision-based E2E Driving Leaderboard, NAVSIM v1 navtest, and NAVSIM v2 navhard.
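A hedged sketch of what feature-space (rather than pixel-space) alignment can look like, with a toy encoder and a made-up loss weighting; the actual RAP alignment is defined in the paper and code:

```python
# Hypothetical sketch of Raster-to-Real feature-space alignment (my reading of
# the summary, not the released code): instead of making rasterized renders
# photorealistic, pull their *features* toward real-image features.
import torch
import torch.nn as nn
import torch.nn.functional as F

encoder = nn.Sequential(nn.Conv2d(3, 16, 3, 2, 1), nn.ReLU(),
                        nn.Conv2d(16, 32, 3, 2, 1), nn.AdaptiveAvgPool2d(1),
                        nn.Flatten())                    # shared image encoder
planner = nn.Linear(32, 2)                               # toy planning head
opt = torch.optim.Adam(list(encoder.parameters()) + list(planner.parameters()), 1e-3)

for step in range(50):
    raster = torch.rand(4, 3, 64, 64)    # rasterized (simulated) frames
    real = torch.rand(4, 3, 64, 64)      # unpaired real frames
    target = torch.rand(4, 2)            # toy waypoint labels for raster data
    f_raster, f_real = encoder(raster), encoder(real)
    plan_loss = F.mse_loss(planner(f_raster), target)        # supervise on cheap sim data
    align_loss = F.mse_loss(f_raster.mean(0), f_real.mean(0))  # match feature statistics
    loss = plan_loss + 0.1 * align_loss  # weighting is a made-up hyperparameter
    opt.zero_grad(); loss.backward(); opt.step()
```

The first-moment matching here is only a stand-in for whatever alignment objective the paper actually uses; the design point is that supervision comes from cheap rasterized data while the feature distribution is tied to real images.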
NeurIPS 2025 Spotlight VoxDet: Rethinking 3D Semantic Occupancy Prediction as Dense Object Detection
Wuyang Li, Zhu Yu, Alexandre Alahi
project page / paper / code
Key Words: 3D Semantic Occupancy Prediction; Dense Object Detection
Summary: 3D semantic occupancy prediction aims to reconstruct the 3D geometry and semantics of the surrounding environment. Given dense voxel labels, prior works typically formulate it as a dense segmentation task, classifying each voxel independently without instance-level perception. In contrast, VoxDet addresses semantic occupancy prediction with an instance-centric formulation inspired by dense object detection, using a VoxNT trick that freely transfers voxel-level class labels to instance-level offset labels.
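A small sketch of that label-transfer flavor, assuming connected components of each class act as pseudo-instances (the function name and the centroid-offset target are my illustration, not necessarily the paper's VoxNT):

```python
# Hypothetical sketch of turning voxel-level class labels into instance-level
# offset labels. Connected components of each class become pseudo-instances,
# and every voxel gets the offset to its instance centroid, as in dense detection.
import numpy as np
from scipy import ndimage

def voxel_labels_to_offsets(sem: np.ndarray, free_id: int = 0) -> np.ndarray:
    """sem: (X, Y, Z) integer class labels. Returns (X, Y, Z, 3) centroid offsets."""
    offsets = np.zeros(sem.shape + (3,), dtype=np.float32)
    coords = np.stack(np.meshgrid(*map(np.arange, sem.shape), indexing="ij"), -1)
    for cls in np.unique(sem):
        if cls == free_id:
            continue                              # skip empty space
        inst, n = ndimage.label(sem == cls)       # pseudo-instances per class
        for i in range(1, n + 1):
            mask = inst == i
            center = coords[mask].mean(axis=0)    # instance centroid
            offsets[mask] = center - coords[mask] # voxel -> center offset
    return offsets

sem = np.zeros((8, 8, 8), dtype=np.int64)
sem[1:3, 1:3, 1:3] = 2                            # a small "car" blob
print(voxel_labels_to_offsets(sem)[1, 1, 1])      # offset toward the blob center
```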
NeurIPS 2025 See&Trek: Training-Free Spatial Prompting for Multimodal Large Language Model
Pengteng Li, Pinhao Song, Wuyang Li, Weiyu Guo, Huizai Yao, Yijie Xu, Dugang Liu, Hui Xiong
paper
Key Words: Spatial Understanding; Multimodal Large Language Model
Summary: We introduce SEE&TREK, the first training-free prompting framework tailored to enhance the spatial understanding of Multimodal Large Language Models (MLLMs) under vision-only constraints. While prior efforts have incorporated modalities like depth or point clouds to improve spatial reasoning, purely visual spatial understanding remains underexplored. SEE&TREK addresses this gap by focusing on two core principles: increasing visual diversity and motion reconstruction.
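A rough sketch of what a training-free prompt built on those two principles could look like; the frame-selection heuristic and prompt wording below are invented for illustration, not taken from the paper:

```python
# Hypothetical sketch in the SEE&TREK spirit (my reading of the two principles):
# pick visually diverse frames, then ask the MLLM to reconstruct camera/object
# motion before answering the spatial question. No training involved.
import numpy as np

def pick_diverse_frames(frames: np.ndarray, k: int = 4):
    """Greedy max-min selection on flattened-pixel distance (visual diversity)."""
    feats = frames.reshape(len(frames), -1).astype(np.float32)
    chosen = [0]
    while len(chosen) < k:
        d = np.min([np.linalg.norm(feats - feats[c], axis=1) for c in chosen], axis=0)
        chosen.append(int(d.argmax()))            # farthest from the chosen set
    return sorted(chosen)

def build_prompt(question: str, frame_ids) -> str:
    return (f"Frames {frame_ids} are ordered in time. First, describe how the "
            f"camera and objects move between them (motion reconstruction). "
            f"Then answer: {question}")

video = np.random.rand(32, 16, 16, 3)             # dummy 32-frame clip
ids = pick_diverse_frames(video)
print(build_prompt("Which object is closest to the camera?", ids))
```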
ICCV 2025 Highlight MetaScope: Optics-Driven Neural Network for Ultra-Micro Metalens Endoscopy
Wuyang Li*, Wentao Pan*, Xiaoyuan Liu*, Zhendong Luo, Chenxin Li, Hengyu Liu, Din Ping Tsai, Mu Ku Chen, Yixuan Yuan
project page / paper / code (coming)
Key Words: Metalens; Computational Photography; Endoscopy; Optical Imaging
Summary: Unlike conventional endoscopes limited by millimeter-scale thickness, metalenses operate at the micron scale, serving as a promising solution for ultra-miniaturized endoscopy. However, metalenses suffer from intensity decay and chromatic aberration. To address this, we developed MetaScope, an optics-driven neural network for metalens-based endoscopy, offering a promising pathway for next-generation ultra-miniaturized medical imaging devices.
NeurIPS 2025 IR3D-Bench: Evaluating Vision-Language Model Scene Understanding as Agentic Inverse Rendering
Parker Liu, Chenxin Li, Zhengxin Li, Yipeng Wu, Wuyang Li, Zhiqin Yang, Zhenyue Zhang, Yunlong Lin, Sirui Han, Brandon Y. Feng
project page / paper / code
Key Words: 3D Scene Understanding; Vision-Language Model; Inverse Rendering
Summary: We propose IR3D-Bench, a benchmark that challenges VLMs to demonstrate real scene understanding by actively recreating 3D structures from images using tools. This "understanding-by-creating" approach probes the generative and tool-using capacity of vision-language agents (VLAs), moving beyond the descriptive or conversational capacity measured by traditional scene-understanding benchmarks.
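A toy sketch of an understanding-by-creating evaluation loop under heavy assumptions (the scene JSON format, the matching metric, and the VLM stub are all made up, not the benchmark's protocol):

```python
# Hypothetical sketch of "understanding-by-creating" evaluation: the agent must
# *recreate* the scene as a program, and fidelity of the recreation is the score.
import json

def vlm_recreate(image_desc: str) -> str:
    # Stand-in for a VLM tool-use call that emits a scene program as JSON.
    return json.dumps([{"shape": "cube", "color": "red", "pos": [0, 0, 0]}])

def score(pred_scene, gt_scene) -> float:
    # Toy metric: fraction of ground-truth objects matched by shape and color.
    matched = sum(any(p["shape"] == g["shape"] and p["color"] == g["color"]
                      for p in pred_scene) for g in gt_scene)
    return matched / len(gt_scene)

gt = [{"shape": "cube", "color": "red", "pos": [0, 0, 0]},
      {"shape": "sphere", "color": "blue", "pos": [1, 0, 0]}]
pred = json.loads(vlm_recreate("a red cube left of a blue sphere"))
print(f"recreation score: {score(pred, gt):.2f}")  # 0.50 with this stub
```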
AAAI 2025 Top-1 most influential paper U-KAN Makes Strong Backbone for Medical Image Segmentation and Generation
Chenxin Li*, Xinyu Liu*, Wuyang Li*, Cheng Wang*, Hengyu Liu, Yifan Liu, Zhen Chen, Yixuan Yuan
project page / paper / code
Key Words: Kolmogorov-Arnold Networks; Medical Image Segmentation/Generation; Medical Backbone
Summary: We propose the first KAN-based medical backbone, U-KAN, which can be seamlessly integrated with existing medical image segmentation and generation models to boost their performance with minimal computational overhead. This work has been cited more than 250 times in one year.
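A minimal sketch of the KAN-layer idea and where a U-Net could swap one in, assuming a simplified Fourier basis in place of the paper's spline parameterization (the layer and class names here are hypothetical):

```python
# Hypothetical sketch of a KAN-style layer inside a U-Net bottleneck (simplified
# Fourier-basis variant, not U-KAN's implementation): each input-output edge
# learns its own 1-D function instead of a scalar weight.
import torch
import torch.nn as nn

class FourierKANLayer(nn.Module):
    """y_j = sum_i f_ij(x_i), with each f_ij a learnable Fourier series."""
    def __init__(self, d_in, d_out, n_freq=4):
        super().__init__()
        self.register_buffer("freqs", torch.arange(1, n_freq + 1).float())
        self.coef = nn.Parameter(torch.randn(d_in, d_out, 2 * n_freq) * 0.1)
    def forward(self, x):                          # x: (B, d_in)
        ang = x.unsqueeze(-1) * self.freqs         # (B, d_in, n_freq)
        basis = torch.cat([torch.sin(ang), torch.cos(ang)], dim=-1)
        return torch.einsum("bif,iof->bo", basis, self.coef)

class TinyUKANBottleneck(nn.Module):
    """U-Net bottleneck where the usual MLP block is swapped for KAN layers."""
    def __init__(self, channels=32):
        super().__init__()
        self.kan = nn.Sequential(FourierKANLayer(channels, channels),
                                 FourierKANLayer(channels, channels))
    def forward(self, feat):                       # feat: (B, C, H, W)
        b, c, h, w = feat.shape
        tokens = feat.flatten(2).transpose(1, 2).reshape(-1, c)  # per-pixel tokens
        out = self.kan(tokens).reshape(b, h * w, c).transpose(1, 2)
        return out.reshape(b, c, h, w) + feat      # residual connection

print(TinyUKANBottleneck()(torch.randn(2, 32, 8, 8)).shape)  # torch.Size([2, 32, 8, 8])
```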

I stole this guy’s source code! See the original.