We finished training and evaluation of Oriented R-CNN 3× on DOTA le90. With it, the OrientedDet pretrained zoo on Hugging Face (dl4eo/oriented-det-pretrained) is complete for our three detector families: Oriented R-CNN, Rotated Faster R-CNN, and Rotated RetinaNet.

This post focuses on the new checkpoint — why it matters, how it compares to our 1× run and to MMRotate, how to load it, and how odet image-demo runs sliding-window inference on images larger than 1024×1024.


Headline result

| | Oriented R-CNN 1× | Oriented R-CNN 3× | Δ |
|---|---|---|---|
| eval-val mAP50 | 74.79% | 79.40% | +4.6 pp |
| Training | 12 epochs (~2 days) | 36 epochs (~6 days) | LR milestones 8/11 vs 24/33 |
| Hub slug | oriented_rcnn_dota_le90_1x | oriented_rcnn_dota_le90_3x | |

eval-val means the published protocol: all 7,669 DOTA val tiles (filter_empty_gt=false), rotated IoU ≥ 0.50 for matching, production decode from the experiment config. Same setup as make eval-val in the repo.
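The matching criterion above (rotated IoU ≥ 0.50) can be illustrated with a small pure-Python sketch. This is not the repo's implementation, which is not shown in this post. It assumes boxes given as (cx, cy, w, h, angle) with the angle in radians, and it computes the overlap by clipping one rectangle against the other (Sutherland-Hodgman). All function names are hypothetical.

```python
import math

def corners(cx, cy, w, h, angle):
    """Corner points of a rotated box; angle in radians (assumed convention)."""
    c, s = math.cos(angle), math.sin(angle)
    half = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in half]

def area(poly):
    """Polygon area via the shoelace formula (0.0 for fewer than 3 points)."""
    pairs = zip(poly, poly[1:] + poly[:1])
    return 0.5 * abs(sum(x1 * y2 - x2 * y1 for (x1, y1), (x2, y2) in pairs))

def clip(subject, a, b):
    """Keep the part of `subject` lying on the left of the directed edge a->b."""
    def side(p):
        return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])
    out = []
    for p, q in zip(subject, subject[1:] + subject[:1]):
        sp, sq = side(p), side(q)
        if sp >= 0:
            out.append(p)
        if (sp > 0 and sq < 0) or (sp < 0 and sq > 0):
            t = sp / (sp - sq)  # where the segment p->q crosses the clip edge
            out.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
    return out

def rotated_iou(box_a, box_b):
    """IoU of two rotated rectangles given as (cx, cy, w, h, angle)."""
    pa, pb = corners(*box_a), corners(*box_b)
    inter = pa
    for a, b in zip(pb, pb[1:] + pb[:1]):
        inter = clip(inter, a, b)
        if not inter:
            return 0.0
    inter_area = area(inter)
    union = area(pa) + area(pb) - inter_area
    return inter_area / union if union > 0 else 0.0

# Two 2x2 boxes shifted by one unit: IoU = 1/3, below the 0.50 matching threshold.
iou = rotated_iou((0, 0, 2, 2, 0.0), (1, 0, 2, 2, 0.0))
print(iou >= 0.5)  # False
```

Under this criterion a prediction only counts as a true positive when its overlap with an unmatched ground-truth box of the same class reaches 0.50.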

The 1× model already matched our training recipe; the 3× schedule adds +4.6 pp on the full val split and beats the MMRotate Oriented R-CNN 1× reference (75.69%) by +3.7 pp.

Training-time periodic mAP (non-empty val tiles only, tighter internal NMS) peaks at 82.0% at epoch 36 — useful for monitoring, but not the number we publish on the Hub.
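One plausible contributor to the gap between the two numbers is that tiles without ground truth can only add false positives. The sketch below shows this with an all-point interpolated AP on toy data. The interpolation scheme, the function name, and the toy detections are assumptions for illustration; they are not taken from the repo's evaluator or from real predictions.

```python
def average_precision(scored_hits, num_gt):
    """All-point interpolated AP.

    scored_hits: list of (score, is_true_positive) for one class.
    num_gt: number of ground-truth boxes for that class.
    """
    tp = fp = 0
    precisions, recalls = [], []
    for _, hit in sorted(scored_hits, key=lambda item: -item[0]):
        tp += int(hit)
        fp += int(not hit)
        precisions.append(tp / (tp + fp))
        recalls.append(tp / num_gt)
    # Make precision monotonically non-increasing (precision envelope).
    for i in range(len(precisions) - 2, -1, -1):
        precisions[i] = max(precisions[i], precisions[i + 1])
    ap, prev_recall = 0.0, 0.0
    for p, r in zip(precisions, recalls):
        ap += (r - prev_recall) * p
        prev_recall = r
    return ap

# Toy detections on non-empty tiles: three ground-truth boxes, one false positive.
non_empty = [(0.9, True), (0.8, True), (0.6, False), (0.5, True)]
# Hypothetical extra false positives fired on empty tiles.
with_empty = non_empty + [(0.85, False), (0.7, False)]

ap_filtered = average_precision(non_empty, num_gt=3)
ap_full = average_precision(with_empty, num_gt=3)
print(ap_full < ap_filtered)  # True
```

The ground-truth count is unchanged by the empty tiles, so recall cannot improve while precision can only drop.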


How the three families compare (eval-val mAP50)

All models: ResNet50-FPN, 1024×1024 tiles, DOTA train+val pretrain, val-only evaluation.

| Model | Schedule | eval-val mAP50 | Hub slug |
|---|---|---|---|
| Oriented R-CNN | 1× | 74.79% | oriented_rcnn_dota_le90_1x |
| Oriented R-CNN | 3× | 79.40% | oriented_rcnn_dota_le90_3x |
| Rotated Faster R-CNN | 3× (+ ProbIoU aux) | 76.41% | rotated_faster_rcnn_dota_le90_3x |
| Rotated RetinaNet | 1× | 64.14% | rotated_retinanet_dota_le90_1x |
| Rotated RetinaNet | 3× | 71.52% | rotated_retinanet_dota_le90_3x |

Recommendation: use oriented_rcnn_dota_le90_3x for best accuracy on DOTA-style oriented detection. Keep 1× for faster iteration or when GPU time is limited.

Per-class reports: docs/eval-reports/ in the repo (markdown + analysis JSON; raw predictions.json stays local for the Gradio viewer).


Training curves (periodic validation mAP50)

During training we log mAP every 4 epochs on the filtered val split (3,121 non-empty tiles). The chart below shows how the 1× (12 epochs) and 3× (36 epochs) schedules behave for each architecture. 1× runs stop at epoch 12; 3× runs continue with LR decays at epochs 24 and 33 (Oriented R-CNN / RetinaNet) or similar milestones.

DOTA le90 — periodic validation mAP50 during training (1× dashed, 3× solid)

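| Epoch | ORC 1× | ORC 3× | RetinaNet 1× | RetinaNet 3× | Faster R-CNN 3× |
|---|---|---|---|---|---|
| 4 | 64.9 | 64.8 | 46.9 | 39.5 | 63.0 |
| 8 | 72.4 | 71.2 | 60.3 | 58.4 | 67.0 |
| 12 | 78.1 | 73.5 | 69.6 | 57.1 | 72.0 |
| 16 | | 74.3 | | 64.1 | 72.3 |
| 20 | | 74.3 | | 68.4 | 73.5 |
| 24 | | 75.3 | | 71.1 | 75.8 |
| 28 | | 81.0 | | 75.9 | 80.9 |
| 32 | | 81.8 | | 75.4 | 81.8 |
| 36 | | 82.0 | | 75.5 | 81.9 |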

Reading the Oriented R-CNN 3× curve: mAP stays near 74–75% through epoch 24, then jumps to ~81% after the first LR decay — most of the 3× gain comes from the low-LR phase, not from extra epochs at full LR. At epoch 12, the 1× run is actually ahead of 3× on this monitor (78% vs 74%) because the 1× schedule has already stepped LR down; the fair comparison is eval-val at convergence (table above).

We do not publish a Rotated Faster R-CNN 1× Hub weight; only the 3× line appears in the chart.
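The schedule effect on the Oriented R-CNN curves can be made concrete with a small sketch of a step LR schedule. The milestones (8/11 for 1×, 24/33 for 3×) come from this post; the decay factor of 0.1 and the function name are assumptions, since the recipe values are not quoted here.

```python
def lr_multiplier(epoch, milestones, gamma=0.1):
    """Factor applied to the base LR during `epoch` (1-indexed).

    Assumes the LR is multiplied by `gamma` once each milestone epoch
    has completed; gamma=0.1 is an assumed value, not read from the recipe.
    """
    return gamma ** sum(epoch > m for m in milestones)

SCHEDULE_1X = (8, 11)
SCHEDULE_3X = (24, 33)

for epoch in (12, 28, 36):
    print(epoch, lr_multiplier(epoch, SCHEDULE_1X), lr_multiplier(epoch, SCHEDULE_3X))
```

At epoch 12 the 1× run has already decayed twice while the 3× run is still at the base LR, which is why the periodic monitor favors 1× at that point.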


What improved in Oriented R-CNN 3× (eval-val)

Biggest per-class gains vs our 1× checkpoint (same eval protocol):

| Class | 1× AP | 3× AP | Δ |
|---|---|---|---|
| bridge | 56.7% | 70.8% | +14.1 pp |
| roundabout | 69.0% | 81.6% | +12.5 pp |
| soccer-ball-field | 76.2% | 86.7% | +10.4 pp |
| swimming-pool | 59.6% | 69.1% | +9.5 pp |
| small-vehicle | 71.3% | 75.4% | +4.1 pp |

Hardest classes (ship, storage-tank) move little; extra epochs mainly help rare or structurally diverse categories.


Using the new checkpoint

pip install oriented-det   # includes Hugging Face download helpers
odet pretrained download oriented_rcnn_dota_le90_3x

In a training or inference config:

"load_from_checkpoint": "hf://oriented_rcnn_dota_le90_3x"

Demo on a single image (uses bundled recipe + sidecar config from pretrained/):

odet image-demo demo.jpg hf://oriented_rcnn_dota_le90_3x \
  --out-file result.jpg --device cuda \
  --score-thr 0.7 --nms-thr 0.1

Full val evaluation (needs DOTA tiles locally):

make eval-val EXPERIMENT=runs/oriented_rcnn/20260621-092802

Weights, config sidecar, and training log: pretrained/oriented_rcnn_r50_fpn_dota_le90_3x-68957f98.* · Recipe: configs/oriented_rcnn/dota_le90_3x.json


Large images: sliding-window inference

DOTA training uses 1024×1024 tiles, but real scenes are often larger. When the input exceeds the model canvas, odet image-demo switches to pad/tile mode automatically: overlapping 1024×1024 windows, detections mapped back to full-image coordinates, then merge NMS.

Example — ship detection on demo/large.jpg (1299×1904) with the new 3× checkpoint:

odet image-demo demo/large.jpg hf://oriented_rcnn_dota_le90_3x \
  --out-file result.jpg \
  --score-thr 0.5 --nms-thr 0.1 \
  --window-batch-size 8 --classes ship

Typical CLI output:

Preprocessing: resize_mode=fixed, target_size=(1024, 1024) (model canvas 1024×1024)
Inference thresholds: score>=0.5, merge NMS IoU<=0.1, overlap_pixels=200, ignore_margin_pixels=0.0
  -> pad/tile (image 1299×1904 vs canvas 1024×1024, overlap_pixels=200, 6 windows)
  -> 310 detections (score >= 0.5, NMS <= 0.1)
  -> 280 after class filter ['ship'] (removed 30)
Saved visualization to result.jpg

Ship detections on demo/large.jpg — Oriented R-CNN 3×

280 oriented boxes on a dense marina scene — each docked vessel gets a rotated box aligned to its hull, with no visible seams at the six window boundaries.

What to notice:

  • 6 windows for this image size — modest overhead compared with a single tile.
  • --window-batch-size 8 batches window inference on GPU (all six windows in one pass here).
  • --classes ship keeps one DOTA class after detection (310 → 280 boxes).
  • overlap_pixels=200 comes from the bundled recipe default, which is fine for DOTA-scale objects; increase it if your targets are larger than the overlap band, otherwise they can be split across window boundaries.
  • --score-thr 0.5 works on this harbor scene; the bundled demo.jpg bus tile still prefers ~0.7 (see below).
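The window count reported for this image can be reproduced with a short sketch of the tiling arithmetic. It assumes a stride of canvas minus overlap and that the last window on each axis is padded past the image edge rather than shifted back inside it; that is a guess based on the "pad/tile" wording, not a description of the odet internals. The function names are hypothetical.

```python
import math

def window_starts(length, canvas=1024, overlap=200):
    """Start offsets of overlapping windows along one image axis."""
    if length <= canvas:
        return [0]
    stride = canvas - overlap
    count = math.ceil((length - overlap) / stride)
    return [i * stride for i in range(count)]

def tile_origins(width, height, canvas=1024, overlap=200):
    """Top-left (x0, y0) corner of every window, row by row."""
    return [(x0, y0)
            for y0 in window_starts(height, canvas, overlap)
            for x0 in window_starts(width, canvas, overlap)]

def to_full_image(box, origin):
    """Shift a window-local (cx, cy, w, h, angle) box into full-image coordinates."""
    cx, cy, w, h, angle = box
    x0, y0 = origin
    return (cx + x0, cy + y0, w, h, angle)

origins = tile_origins(1299, 1904)
print(len(origins))  # 6
```

With a 1024 canvas and 200 pixels of overlap the stride is 824, so the 1299-pixel side needs two windows and the 1904-pixel side needs three. Translating a box back only changes its center; width, height, and angle are unaffected.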

For a zero-shot maritime experiment on a Copernicus Sentinel-2 tile — zoom, overlap, and margin tuned for small ships — see Zero-shot ship detection on a Copernicus Sentinel-2 tile with Oriented R-CNN.


Demo thresholds (short note)

--score-thr and --nms-thr on odet image-demo are post-decode filters. Values tuned on one architecture do not transfer to the others: Faster R-CNN and RetinaNet typically need stricter score cutoffs on busy scenes. Oriented R-CNN 3× is still the best default on our demo.jpg bus scene at roughly --score-thr 0.7 --nms-thr 0.1.

Production NMS for Oriented R-CNN uses production.final_nms_iou_threshold: 0.3 in the recipe; 0.5 is the mAP matching IoU, not detection NMS.
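The order of the demo filters (score cutoff, then NMS, then class filter, as in the CLI log earlier in this post) can be sketched as follows. This is an illustration under stated assumptions, not the odet code: NMS is greedy and class-agnostic here, and an axis-aligned IoU stands in for the rotated IoU so the example stays short. All names and the toy detections are hypothetical.

```python
def axis_aligned_iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes; a stand-in for rotated IoU."""
    iw = min(a[2], b[2]) - max(a[0], b[0])
    ih = min(a[3], b[3]) - max(a[1], b[1])
    inter = max(iw, 0.0) * max(ih, 0.0)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0

def postprocess(dets, score_thr, nms_thr, classes=None, iou_fn=axis_aligned_iou):
    """dets: list of (box, score, label) tuples. Returns the surviving tuples."""
    # 1. Score cutoff, highest scores first.
    candidates = sorted((d for d in dets if d[1] >= score_thr),
                        key=lambda d: d[1], reverse=True)
    # 2. Greedy NMS: keep a box only if it overlaps every kept box by <= nms_thr.
    kept = []
    for det in candidates:
        if all(iou_fn(det[0], k[0]) <= nms_thr for k in kept):
            kept.append(det)
    # 3. Optional class filter, applied after detection.
    if classes is not None:
        kept = [d for d in kept if d[2] in classes]
    return kept

toy = [
    ((0, 0, 10, 10), 0.90, "ship"),
    ((1, 0, 11, 10), 0.80, "ship"),         # near-duplicate of the first box
    ((50, 50, 60, 60), 0.75, "harbor"),
    ((100, 100, 110, 110), 0.40, "ship"),   # below the score cutoff
]
print(len(postprocess(toy, score_thr=0.7, nms_thr=0.1)))                    # 2
print(len(postprocess(toy, score_thr=0.7, nms_thr=0.1, classes=["ship"])))  # 1
```

A low NMS threshold such as 0.1 removes almost any overlapping pair, which is a different role from the 0.50 IoU used to match predictions to ground truth during evaluation.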



June 29, 2026 — Jeff Faudi