CV (Computer Vision) Daily Open-Source Code: Papers with Code Roundup

2026/5/7 17:16:59

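Highly recommended: for more cutting-edge papers and algorithm-optimization ideas to help you target top conferences or file patents, see https://qcno08je5sgu.feishu.cn/. Topics covered include object detection, object tracking, image segmentation, video segmentation, Visual Grounding, visible-infrared fusion, multi-task learning, multimodal foundation models, text-to-image, autonomous driving, BEV, occupancy prediction, embodied-intelligence VLA, depth estimation, action recognition, facial expression recognition, 3D reconstruction, point-cloud 3D detection, medical image segmentation, medical image object detection, medical large models, defect detection, anomaly detection, remote-sensing image segmentation, remote-sensing change detection, digital humans, knowledge distillation, video understanding, 3D generation, pose estimation, image enhancement, crowd/object counting, video editing, image deraining, and many more.

1. [Image Fusion] UniFusion: A Unified Image Fusion Framework with Robust Representation and Source-Aware Preservation
Paper: https://arxiv.org//pdf/2603.14214
Code: https://github.com/dusongcheng/UniFusion

2. [Multimodal LLM] UAVBench and UAVIT-1M: Benchmarking and Enhancing MLLMs for Low-Altitude UAV Vision-Language Understanding
Paper: https://arxiv.org//pdf/2603.14336
Code: https://github.com/ZhanYang-nwpu/UAVBench-and-UAVIT-1M

3. [Multimodal LLM] Deeper Thought, Weaker Aim: Understanding and Mitigating Perceptual Impairment during Reasoning in Multimodal Large Language Models
Paper: https://arxiv.org//pdf/2603.14184
Code (coming soon): https://github.com/Ivine11/VRGA

4. [Medical Large Model] (ICLR 2026) How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
Paper: https://arxiv.org//pdf/2603.14323
Project page: How Do Medical MLLMs Fail? A Study on Visual Grounding in Medical Images
Code: https://github.com/Guimeng-Leo-Liu/Medical-MLLMs-Fail

5. [Person Re-Identification] (CVPR 2026) BIT: Matching-based Bi-directional Interaction Transformation Network for Visible-Infrared Person Re-Identification
Paper: https://arxiv.org//pdf/2603.14243
Code (coming soon): https://github.com/Xuan266/BIT

6. [Digital Human] AvatarForcing: One-Step Streaming Talking Avatars via Local-Future Sliding-Window Denoising
Paper: https://arxiv.org//pdf/2603.14331
Project page: https://cuiliyuan121.github.io/AvatarForcing/
Code: https://github.com/KlingAIResearch/AvatarForcing/tree/main

7. [Vision-Language Navigation] AerialVLA: A Vision-Language-Action Model for UAV Navigation via Minimalist End-to-End Control
Paper: https://arxiv.org//pdf/2603.14363
Code: https://github.com/XuPeng23/AerialVLA

8. [Vision-Language Navigation] (ICLR 2026) All-day Multi-scenes Lifelong Vision-and-Language Navigation with Tucker Adaptation
Paper: https://arxiv.org//pdf/2603.14276
Project page: All-Day Multi-Scenes Lifelong Vision-And-Language Navigation With Tucker-Adaption
Code: https://github.com/Ganvin-Li/AlldayWalker

9. [Text-to-Image] Fair Benchmarking of Emerging One-Step Generative Models Against Multistep Diffusion and Flow Models
Paper: https://arxiv.org//pdf/2603.14186
Code: https://github.com/Harvard-AI-and-Robotics-Lab/FairBenchmarkingFlow

10. [Text-to-Video] Early Failure Detection and Intervention in Video Diffusion Models
Paper: https://arxiv.org//pdf/2603.14320
Code (coming soon): https://github.com/kaist-ami/Early-failure-video-diffusion

11. [Text-to-Video] Seeking Physics in Diffusion Noise
Paper: https://arxiv.org//pdf/2603.14294
Project page: Seeking Physics in Diffusion Noise
Code: coming soon

12. [Image Generation] Representation Alignment for Just Image Transformers is not Easier than You Think
Paper: https://arxiv.org//pdf/2603.14366
Code: https://github.com/kaist-cvml/PixelREPA

The group brings together experts from many areas, including object detection, image segmentation, object tracking, Transformer, multimodal learning, NeRF, GAN, defect detection, salient object detection, keypoint detection, super-resolution reconstruction, SLAM, face analysis, OCR, biomedical imaging, 3D reconstruction, pose estimation, autonomous-driving perception, depth estimation, video understanding, action recognition, image dehazing, image deraining, image inpainting, image retrieval, lane detection, point-cloud object detection, point-cloud segmentation, image compression, motion prediction, neural network quantization, and model deployment. Members share technical knowledge, interview tips, and referral and job-opening information from time to time.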
