BenchSeg: A Large-Scale Dataset and Benchmark for Multi-View Food Video Segmentation

Published: January 12, 2026 | arXiv ID: 2601.07581v1

By: Ahmad AlMughrabi, Guillermo Rivo, Carlos Jiménez-Farfán and more

Food image segmentation is a critical task for dietary analysis, enabling accurate estimation of food volume and nutrients. However, current methods suffer from limited multi-view data and poor generalization to new viewpoints. We introduce BenchSeg, a multi-view food video segmentation dataset and benchmark. BenchSeg aggregates 55 dish scenes (from Nutrition5k, Vegetables & Fruits, MetaFood3D, and FoodKit) with 25,284 annotated frames, capturing each dish under free 360° camera motion. We evaluate 20 state-of-the-art segmentation models (e.g., SAM-based, transformer, CNN, and large multimodal) on the existing FoodSeg103 dataset and on BenchSeg; on BenchSeg, the models are tested both alone and combined with video-memory modules. Quantitative and qualitative results show that standard image segmenters degrade sharply under novel viewpoints, whereas memory-augmented methods maintain temporal consistency across frames. Our best model, the combination SeTR-MLA+XMem2, outperforms prior work (e.g., improving over FoodMem by ~2.63% mAP), offering new insights into food segmentation and tracking for dietary analysis. We release BenchSeg to foster future research. The project page, including the dataset annotations and the food segmentation models, can be found at https://amughrabi.github.io/benchseg.
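To make the notion of per-frame degradation concrete, the following is a minimal sketch of how segmentation quality could be tracked across the frames of a single dish video. It is not the paper's evaluation code, and it does not compute the mAP figure quoted above; it uses plain intersection-over-union (IoU) on binary masks as a stand-in. The names `frame_iou`, `sequence_iou`, and `mean` are hypothetical, as is the convention that two empty masks score 1.0.

```python
from typing import List, Sequence

# A binary mask is assumed to be a grid of 0/1 values (rows of pixels).
Mask = Sequence[Sequence[int]]


def frame_iou(pred: Mask, truth: Mask) -> float:
    """Intersection-over-union of two binary masks of equal shape."""
    inter = 0
    union = 0
    for pred_row, truth_row in zip(pred, truth):
        for p, t in zip(pred_row, truth_row):
            inter += 1 if (p and t) else 0
            union += 1 if (p or t) else 0
    # Assumed convention: two empty masks agree perfectly.
    return 1.0 if union == 0 else inter / union


def sequence_iou(preds: Sequence[Mask], truths: Sequence[Mask]) -> List[float]:
    """Per-frame IoU for one video; a drop at some frames hints at viewpoint sensitivity."""
    if len(preds) != len(truths):
        raise ValueError("prediction and ground-truth sequences differ in length")
    return [frame_iou(p, t) for p, t in zip(preds, truths)]


def mean(values: Sequence[float]) -> float:
    """Arithmetic mean of a non-empty sequence."""
    return sum(values) / len(values)


if __name__ == "__main__":
    # Toy two-frame video with 2x2 masks (illustrative data only).
    preds = [[[1, 1], [0, 0]], [[0, 1], [0, 1]]]
    truths = [[[1, 0], [0, 0]], [[0, 1], [0, 1]]]
    scores = sequence_iou(preds, truths)
    print(scores, mean(scores))
```

Inspecting the per-frame list rather than only its mean is one way to see where along the camera path a model loses the object, which is the kind of behavior the abstract contrasts between image-only and memory-augmented methods.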

RobotSeg: A Model and Dataset for Segmenting Robots in Image and Video

CV and Pattern Recognition

Helps robots see and understand themselves better.

28 Nov 2025 2

87%

January Food Benchmark (JFB): A Public Benchmark Dataset and Evaluation Suite for Multimodal Food Analysis

CV and Pattern Recognition

Helps computers guess food nutrition from pictures.

13 Aug 2025 0

87%

SFOOD: A Multimodal Benchmark for Comprehensive Food Attribute Analysis Beyond RGB with Spectral Insights

CV and Pattern Recognition

Helps computers know food's taste and weight.

6 Jul 2025 0
