The RoboFlow 100 (RF100) benchmark consists of 34 diverse object detection datasets organized into 6 collections. This vignette provides a comprehensive catalog to help you find the right dataset for your task.
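The RF100 datasets cover a wide range of domains: microscopy and biological imaging, medical imaging (X-ray and MRI), thermal and infrared imaging, infrastructure damage and defect detection, underwater and marine imaging, and document analysis and OCR.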
The easiest way to find datasets is using the search functions:
library(torchvision)
# Search for specific topics
search_rf100("cell") # Find cell-related datasets
search_rf100("solar") # Find solar panel datasets
search_rf100("x-ray") # Find X-ray datasets
# List all datasets in a collection
search_rf100(collection = "biology")
search_rf100(collection = "medical")
# View complete catalog
catalog <- get_rf100_catalog()
View(catalog)

One of the motivations for this catalog was answering questions like “Is there a photovoltaic dataset in torchvision?”
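Conceptually, answering that kind of question is a keyword search over a data frame. The base-R sketch below illustrates the idea only. It is not how torchvision implements `search_rf100()`, and it does not call the package. The `toy_catalog` rows are copied from the descriptions in this vignette. Its column names (`dataset`, `collection`, `description`) and the `keyword_search()` helper are hypothetical.

```r
# Hypothetical toy catalog built from descriptions in this vignette.
# Column names are assumed; collection labels other than "biology" and
# "medical" are placeholders.
toy_catalog <- data.frame(
  dataset = c("blood_cell", "solar_panel", "fracture", "coral"),
  collection = c("biology", "infrared", "medical", "underwater"),
  description = c(
    "Blood cell detection (RBC, WBC, platelets)",
    "Solar panel detection in infrared imagery",
    "Bone fracture detection in X-rays",
    "Coral reef detection and monitoring"
  ),
  stringsAsFactors = FALSE
)

# Hypothetical helper (not part of torchvision): case-insensitive regex
# match on dataset name plus description, optionally filtered by collection.
keyword_search <- function(catalog, pattern = NULL, collection = NULL) {
  keep <- rep(TRUE, nrow(catalog))
  if (!is.null(pattern)) {
    text <- paste(catalog$dataset, catalog$description)
    keep <- keep & grepl(pattern, text, ignore.case = TRUE)
  }
  if (!is.null(collection)) {
    keep <- keep & catalog$collection == collection
  }
  catalog[keep, , drop = FALSE]
}

# The literal word "photovoltaic" appears in no toy description, so it
# matches nothing; a regex alternation with a synonym finds solar_panel.
nrow(keyword_search(toy_catalog, "photovoltaic"))
keyword_search(toy_catalog, "solar|photovoltaic")$dataset
keyword_search(toy_catalog, collection = "medical")$dataset
```

In this toy version, a regular-expression alternation such as `"solar|photovoltaic"` covers terms that do not appear verbatim in the descriptions.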
Here’s the complete catalog of all RF100 datasets:
Microscopy and biological imaging datasets for research and diagnostics:
Available datasets:
stomata_cell: Plant stomata cells for biology research
blood_cell: Blood cell detection (RBC, WBC, platelets)
parasite: Parasite detection in microscopy images
cell: General cell detection in microscopy
bacteria: Bacteria detection in microscopy images
cotton_disease: Cotton plant disease detection
mitosis: Mitosis phase detection in cell images
phage: Bacteriophage detection in microscopy
liver_disease: Liver disease pathology detection

Medical imaging datasets for clinical and research applications:
Available datasets:
radio_signal: Radio signal detection in medical imaging
rheumatology: Rheumatology X-ray abnormality detection
knee: ACL and knee X-ray analysis
abdomen_mri: Abdomen MRI organ detection
brain_axial_mri: Brain axial MRI structure detection
gynecology_mri: Gynecology MRI structure detection
brain_tumor: Brain tumor detection in MRI scans
fracture: Bone fracture detection in X-rays

Thermal and infrared imaging datasets:
Available datasets:
thermal_dog_and_people: Thermal imaging of dogs and people
solar_panel: Solar panel detection in infrared imagery
thermal_cheetah: Thermal imaging of cheetahs
ir_object: FLIR camera object detection

Infrastructure damage and defect detection datasets:
Available datasets:
liquid_crystals: 4-fold defect detection in LCD displays
solar_panel: Solar panel defect and damage detection
asbestos: Asbestos detection for safety inspection

Marine and underwater imaging datasets:
Available datasets:
pipes: Underwater pipe detection for infrastructure
aquarium: Aquarium fish and species detection
objects: Underwater object detection
coral: Coral reef detection and monitoring

Document analysis and OCR datasets:
Available datasets:
tweeter_post: Twitter post element detection
tweeter_profile: Twitter profile element detection
document_part: Document structure and part detection
activity_diagram: Activity diagram element detection
signature: Signature detection in documents
paper_part: Academic paper structure detection

Once you’ve found a dataset, loading it is straightforward:
library(torchvision)
# Search for blood cell dataset
search_rf100("blood")
# Load the dataset
ds <- rf100_biology_collection(
dataset = "blood_cell",
split = "train",
download = TRUE
)
# Inspect a sample
item <- ds[1]
print(item$y$labels) # Object classes
print(item$y$boxes) # Bounding boxes
# Visualize with bounding boxes
boxed <- draw_bounding_boxes(item)
tensor_image_browse(boxed)

The catalog also makes it easy to compute summary statistics, such as dataset sizes:

catalog <- get_rf100_catalog()
# Total size of all datasets
sum(catalog$total_size_mb) / 1024 # In GB
# Datasets by size
catalog[order(-catalog$total_size_mb), c("dataset", "collection", "total_size_mb")]
# Smallest and largest datasets
catalog[which.min(catalog$total_size_mb), ]
catalog[which.max(catalog$total_size_mb), ]
# Average size by collection
aggregate(total_size_mb ~ collection, data = catalog, FUN = mean)

The catalog is a regular data frame, so you can filter it with standard R operations:
# Find small datasets (< 20 MB total)
subset(catalog, total_size_mb < 20)
# Find large datasets (> 200 MB total)
subset(catalog, total_size_mb > 200)
# Find datasets with specific keywords
subset(catalog, grepl("tumor|cancer|disease", description, ignore.case = TRUE))
# Datasets with all three splits
subset(catalog, has_train & has_test & has_valid)

See also:
?rf100_biology_collection, ?rf100_medical_collection, etc. for the collection loaders
?draw_bounding_boxes for visualizing detections

If you use RF100 datasets in your research, please cite:
@article{roboflow100,
title={Roboflow 100: A Rich, Multi-Domain Object Detection Benchmark},
author={Roboflow},
journal={arXiv preprint},
year={2022}
}