Oriented object detection becomes concrete when you look at the boxes.
In this post we will walk through detections produced by Oriented R-CNN on the 15 original classes from DOTA v1.0, the benchmark dataset that shaped much of the modern work on rotated object detection in aerial imagery.
DOTA matters because it is not a neat toy dataset. Its images contain large scenes, dense object layouts, arbitrary object directions, and strong scale variation. Aircraft, ships, sports fields, bridges, harbors, storage tanks, vehicles, and helicopters do not line up with image axes just because our detectors prefer rectangles. DOTA forces the detector to care about orientation.
What is DOTA?
DOTA is a large-scale dataset for object detection in aerial images. DOTA v1.0 contains 2,806 images and 188,282 annotated object instances across 15 common categories. Objects are annotated as arbitrary quadrilaterals, which makes the dataset a natural benchmark for oriented bounding box detectors.
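To make the quadrilateral annotation concrete, here is a minimal Python sketch that parses a single object line and computes the area of its polygon. It assumes the plain-text label layout commonly distributed with DOTA (eight corner coordinates, then the category name, then a difficulty flag). The sample line is made up, and `DotaObject`, `parse_dota_line`, and `polygon_area` are illustrative names, not part of any official toolkit.

```python
from typing import List, NamedTuple, Tuple

Point = Tuple[float, float]


class DotaObject(NamedTuple):
    corners: List[Point]  # four (x, y) vertices, in annotation order
    category: str
    difficult: int


def parse_dota_line(line: str) -> DotaObject:
    """Parse one object line: x1 y1 x2 y2 x3 y3 x4 y4 category difficult.

    The field layout is an assumption about the label files; adjust it if
    your copy of the dataset differs.
    """
    parts = line.split()
    if len(parts) < 9:
        raise ValueError(f"expected at least 9 fields, got {len(parts)}")
    coords = [float(v) for v in parts[:8]]
    corners = list(zip(coords[0::2], coords[1::2]))
    # Treat a missing difficulty flag as 0 (not difficult).
    difficult = int(parts[9]) if len(parts) > 9 else 0
    return DotaObject(corners, parts[8], difficult)


def polygon_area(corners: List[Point]) -> float:
    """Area of a simple polygon via the shoelace formula."""
    total = 0.0
    for i in range(len(corners)):
        x0, y0 = corners[i]
        x1, y1 = corners[(i + 1) % len(corners)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


# Made-up sample line for illustration only.
sample = "100 100 200 100 200 150 100 150 plane 0"
obj = parse_dota_line(sample)
print(obj.category, obj.corners, polygon_area(obj.corners))
```

Real label files may also begin with metadata lines, which a loader would need to skip before calling the parser.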
The official dataset page lists imagery from several sources:
- Google Earth
- GF-2 and JL-1 satellite imagery provided by the China Centre for Resources Satellite Data and Application
- Aerial images provided by CycloMedia B.V.
The usage terms are important. According to the official DOTA dataset page, use of Google Earth images must respect the Google Earth terms of use, and all DOTA images and associated annotations can be used for academic purposes only; commercial use is prohibited.
For practical Earth observation (EO) work, that means DOTA is best treated as a public research benchmark: useful for comparing architectures, validating geometry, and building intuition, but not a source of commercial training imagery.
The 15 DOTA v1.0 classes
DOTA v1.0 defines the following original classes:
| Class | Why orientation matters |
|---|---|
| Plane | Aircraft heading and footprint are naturally rotated on aprons and runways. |
| Ship | Harbors and waterways create dense scenes where axis-aligned boxes overlap heavily. |
| Storage tank | Circular tanks have no meaningful heading, but dense tank farms still benefit from precise localization. |
| Baseball diamond | The object footprint is strongly geometric and rarely axis-aligned. |
| Tennis court | Courts appear in repeated grids and rotated urban layouts. |
| Basketball court | Small rectangular sports facilities are sensitive to box angle. |
| Ground track field | Large elongated sports fields need oriented extent rather than loose rectangles. |
| Harbor | Harbors are large, irregular, and often packed with ships and infrastructure. |
| Bridge | Long thin structures are one of the clearest cases for oriented boxes. |
| Large vehicle | Trucks and buses appear at arbitrary parking and road angles. |
| Small vehicle | Dense vehicle areas quickly become cluttered with horizontal boxes. |
| Helicopter | Small aircraft need accurate orientation when parked close to other assets. |
| Roundabout | Circular road structures are visually distinctive but context-dependent. |
| Soccer ball field | Field boundaries are rectangular and frequently rotated. |
| Swimming pool | Pools vary in size and orientation across dense urban scenes. |
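The crowding problem described in the ship and vehicle rows can be checked numerically. The sketch below places two hypothetical 100 x 20 pixel ships side by side at 45 degrees with a 10 pixel gap, then compares the IoU of their oriented boxes with the IoU of their enclosing horizontal boxes. The `(cx, cy, w, h, angle)` parameterization and all function names are assumptions made for illustration, not the oriented-det convention. Polygon intersection uses Sutherland-Hodgman clipping, which is valid here because both boxes are convex.

```python
import math


def obb_corners(cx, cy, w, h, angle_deg):
    """Corners of a w x h box centred at (cx, cy), rotated by angle_deg."""
    a = math.radians(angle_deg)
    c, s = math.cos(a), math.sin(a)
    local = ((-w / 2, -h / 2), (w / 2, -h / 2), (w / 2, h / 2), (-w / 2, h / 2))
    return [(cx + dx * c - dy * s, cy + dx * s + dy * c) for dx, dy in local]


def polygon_area(poly):
    """Shoelace area; returns 0.0 for fewer than three points."""
    total = 0.0
    for i in range(len(poly)):
        x0, y0 = poly[i]
        x1, y1 = poly[(i + 1) % len(poly)]
        total += x0 * y1 - x1 * y0
    return abs(total) / 2.0


def side(a, b, p):
    """Cross product (b - a) x (p - a); >= 0 means p is inside edge a->b."""
    return (b[0] - a[0]) * (p[1] - a[1]) - (b[1] - a[1]) * (p[0] - a[0])


def clip_convex(subject, clipper):
    """Sutherland-Hodgman clipping of one convex polygon by another."""
    output = list(subject)
    for i in range(len(clipper)):
        if not output:
            break
        a, b = clipper[i], clipper[(i + 1) % len(clipper)]
        points, output = output, []
        for j in range(len(points)):
            p, q = points[j], points[(j + 1) % len(points)]
            dp, dq = side(a, b, p), side(a, b, q)
            if (dp >= 0) != (dq >= 0):
                # The segment p->q crosses the clip edge: keep the crossing point.
                t = dp / (dp - dq)
                output.append((p[0] + t * (q[0] - p[0]), p[1] + t * (q[1] - p[1])))
            if dq >= 0:
                output.append(q)
    return output


def oriented_iou(box_a, box_b):
    pa, pb = obb_corners(*box_a), obb_corners(*box_b)
    inter = polygon_area(clip_convex(pa, pb))
    return inter / (polygon_area(pa) + polygon_area(pb) - inter)


def horizontal_iou(box_a, box_b):
    def bounds(box):
        xs, ys = zip(*obb_corners(*box))
        return min(xs), min(ys), max(xs), max(ys)

    ax0, ay0, ax1, ay1 = bounds(box_a)
    bx0, by0, bx1, by1 = bounds(box_b)
    iw = max(0.0, min(ax1, bx1) - max(ax0, bx0))
    ih = max(0.0, min(ay1, by1) - max(ay0, by0))
    inter = iw * ih
    union = (ax1 - ax0) * (ay1 - ay0) + (bx1 - bx0) * (by1 - by0) - inter
    return inter / union


# Two hypothetical ships, 30 px apart measured across their short axis.
offset = 30 / math.sqrt(2)
ship_a = (0.0, 0.0, 100.0, 20.0, 45.0)
ship_b = (-offset, offset, 100.0, 20.0, 45.0)
print("oriented IoU:  ", oriented_iou(ship_a, ship_b))
print("horizontal IoU:", horizontal_iou(ship_a, ship_b))
```

In this setup the oriented boxes do not overlap at all, while the horizontal boxes overlap substantially. An IoU-based suppression step run on horizontal boxes could therefore treat the two ships as duplicates, depending on its threshold.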
What we are showing
The detections below are produced with Oriented R-CNN, using the DOTA-style oriented bounding box setup from oriented-det. The ground truth provided by DOTA is displayed in green; predictions from oriented-det are displayed in red (all classes).
Oriented R-CNN is a good baseline for this visual tour because it combines a strong two-stage detector with oriented region prediction. Instead of returning horizontal boxes that include too much background, it predicts rotated boxes aligned with the object footprint. For EO imagery, that difference is not cosmetic: it affects duplicate suppression, dense-scene readability, footprint estimation, and downstream workflows where orientation is part of the information. These illustrations were created with the 1x pre-trained version.
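How much background a horizontal box pulls in can be estimated in closed form. A `w` x `h` rectangle rotated by an angle θ fits inside an axis-aligned box of size (w·|cos θ| + h·|sin θ|) x (w·|sin θ| + h·|cos θ|). The Python sketch below turns that into a fill ratio; `horizontal_fill_ratio` and the 200 x 10 pixel bridge are hypothetical, chosen only to illustrate the effect.

```python
import math


def horizontal_fill_ratio(w, h, angle_deg):
    """Fraction of the enclosing axis-aligned box covered by a rotated w x h box."""
    a = math.radians(angle_deg)
    c, s = abs(math.cos(a)), abs(math.sin(a))
    enclosing_w = w * c + h * s
    enclosing_h = w * s + h * c
    return (w * h) / (enclosing_w * enclosing_h)


# Hypothetical long, thin bridge: 200 px long, 10 px wide.
for angle in (0, 15, 45):
    fill = horizontal_fill_ratio(200.0, 10.0, angle)
    print(f"angle={angle:>2} deg  object fills {fill:.1%} of the horizontal box")
```

For this bridge at 45 degrees, the object covers only about 9% of its horizontal box; the rest is background. The oriented box, by construction, has a fill ratio of 1 at any angle.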
Each illustration highlights one DOTA class. The goal is not to claim perfect production performance from a benchmark checkpoint. The goal is simpler: show what oriented detections look like across the full original DOTA label set.
Detection gallery
Plane

Ship

Storage tank

Baseball diamond

Tennis court

Basketball court

Ground track field

Harbor

Bridge

Large vehicle

Small vehicle

Helicopter

Roundabout

Soccer ball field

Swimming pool

Takeaway
DOTA is a useful reminder that aerial object detection is not just object detection from above. Orientation is part of the visual signal. When objects are long, dense, rotated, or tightly packed, a horizontal rectangle often describes the image crop more than it describes the object.
That is why Oriented R-CNN remains a strong baseline for EO work: it makes object angle explicit, keeps detections readable in dense scenes, and gives downstream systems a geometry closer to the real footprint.
References
- DOTA dataset
- DOTA dataset page: image source, usage license, and object categories
- Google Earth terms of use
- oriented-det on GitHub
- Previous post: Oriented-Det v0.1.0 is out