Research notebook ‧ Hough → Radon v 1.0 ‧ interactive ‧ 9 figures

Mathematically Detect Geometry in Images

A visual deep-dive: how voting in parameter space detects lines, circles and planes — and why the same idea, taken to its continuous limit, is what makes a CT scanner work.

subjectcomputer vision · inverse problems prereqlinear algebra · basic calculus runtime~ 20 min

{/* ─── §1 The core reframe ─── */}

§ 01

Move the problem to a space where the answer is easy to see.

In an image, a line y = mx + b is a single equation satisfied by infinitely many points. Detecting lines from edge pixels naively — pick groups, test collinearity — is combinatorial and noise-fragile.

Hough’s insight is to invert the relationship between points and shapes. Treat the parameters (m, b) as the coordinates of a new plane. Then a single line in the image is a single point in parameter space; a single point in the image — which could lie on infinitely many lines — is a whole line of (m, b)s in parameter space.

Collinear pixels in the image become parameter-space lines that all pass through one point. Detect that intersection and you’ve detected the underlying line. Section 2 shows the duality directly — drag the points around and watch.

image-space line y = mx + b ⇔ parameter-space point (m, b)
image-space point (x₀, y₀) ⇔ parameter-space line b = −x₀·m + y₀

— a point–line duality

{/* ─── §2 m-b duality ─── */}

§ 02

Point–line duality, drawn.

Each image-space point (top) is rewritten as a line (bottom) via{" "} b = −x₀m + y₀. Drag a point and its parameter line tilts and shifts in lockstep. Press snap to collinear and the parameter lines all converge on a single (m*, b*).

The same duality lives inside parallel coordinates: plot each point's x value on a left axis and its y value on a right axis, then draw a line between them. Collinear points in image space produce PC lines that all cross at one point in the corridor between the axes — the crossing position encodes the slope. This is exactly how the IARC QR pipeline ran: accumulate crossings in a discretised corridor, find the peak, recover the border.

The trick is now visible in two coordinate dresses: a global, spatial property in the image (scattered pixels are collinear) has become a local, summable event in parameter space (many lines crossing at one point). That swap is what makes the algorithm work.

{/* ─── §3 Vertical fail ─── */}

§ 03

The Cartesian (m, b) plane has a fatal flaw.

Rotate a line slowly around the origin. Its slope m = tan θ{" "} moves smoothly while θ is small and then explodes as θ approaches 90°. A uniform grid on m wastes resolution near horizontal and starves near vertical; pure verticals can’t be represented at all.

The fix is to switch parameters. Encode each line by the angle of its normal{" "} θ and its perpendicular distance{" "} ρ from the origin. Both are bounded; every line has a well-defined representation, verticals included.

ρ = x cos θ + y sin θ

— θ ∈ [0, π), ρ ∈ [−D, D] where D is the image diagonal

{/* ─── §4 Polar duality ─── */}

§ 04

The polar fix: each image point becomes a sinusoid.

For a fixed image point (x₀, y₀), sweep θ from 0 to π and the corresponding ρ = x₀ cos θ + y₀ sin θ traces a sinusoid in parameter space. The point–line duality is now a point–sinusoid duality — same logic, bounded space.

Notice how the vertical-line preset (impossible to represent in Fig 2.1) sits comfortably at{" "} θ = 0 here — exactly the symmetry the polar parameterisation was invented for. This example cant handle multiple lines in the same photo, even adding points outside of a well defined line confuses the algorithm. We fix this in the next section

{/* ─── §5 Voting ─── */}

§ 05

Discretise the parameter plane and let the pixels vote.

The continuous picture above doesn’t need to stay continuous. Quantise{" "} θ into 1° bins and ρ into 1-pixel bins, build a 2-D array A[θ][ρ], and have every edge pixel add 1 to every cell its sinusoid passes through. Lines in the image appear as peaks in the accumulator.

The figure feeds in a slightly tilted rectangle and gets back four peaks — one per side. Read the (θ, ρ) of each peak and you’ve recovered the rectangle’s full geometry: the two near-vertical sides sit at θ ≈ 10° but very different ρ (they’re at different perpendicular distances from origin); the two near-horizontal sides sit at θ ≈ 100°, again split by ρ.

Why it’s robust

Occluded lines just lose a few votes (lower peak — still a peak). Noise pixels vote diffusely across the accumulator, so their contribution never piles up in any one cell. This is why a 1962 idea is still in OpenCV.

At IARC Mission 8, a drone photographed four separate fragments of a QR code and had to recover the complete symbol mid-flight. Each fragment's borders produce thousands of collinear edge pixels; their votes pile into the same four{" "} (θ, ρ) cells, giving unambiguous peaks against background noise. The key computational fact is that each pixel sweeps its sinusoid and increments cells independently — no data dependency between pixels. A CPU running the loop serially took ~10 s per frame; rewriting the accumulation in OpenGL and CUDA dropped that under 1 s, enough for real-time guidance.

Practical tweaks that show up everywhere

Probabilistic Hough samples a random subset of edge pixels — much faster, almost as accurate (this is OpenCV’s HoughLinesP).{" "} Gradient-direction prior: at each edge pixel the gradient already tells you the line’s normal, so only sweep θ in a small window around it. Kernel-based Hough: deposit a smooth blob, not a hard vote, so peaks are easier to detect on noisy data.

{/* ─── §6 Circles ─── */}

§ 06

One more parameter, one more dimension: circles.

A circle is (x − a)² + (y − b)² = r² — three parameters (a, b, r). Each edge pixel’s vote-locus is now a 2-D cone in 3-D parameter space. Slice that cone at a fixed{" "} r and each edge pixel votes for a circle of radius r{" "} around itself. When r equals the true radius, those vote-circles all pass through the true centre.

In practice the 3-D accumulator is small enough to enumerate, but using the local{" "} gradient direction at each edge pixel narrows each vote from a full circle to a single point on it — cheap, accurate, and the basis of OpenCV’s HoughCircles.

{/* ─── §7 3D ─── */}

§ 07

The same idea in 3-D: planes, cylinders, n-dimensional hyperplanes.

For any shape parameterised by k numbers, the recipe runs unchanged: every data point becomes a (k − 1)-D surface in k-D parameter space, and you vote in a{" "} k-D accumulator. Plane detection in 3-D point clouds is the workhorse case — a plane is{" "} ρ = x sin φ cos θ + y sin φ sin θ + z cos φ with three parameters (θ, φ, ρ).

Three parameters is roughly the ceiling for grid-based Hough. At usable resolution the accumulator stays under ~10⁸ cells, which fits in memory and you can scan for peaks. Above that, two failure modes appear together. The accumulator{" "} blows up: a 5-D cylinder grid at the same resolution is ~10¹² cells — around a terabyte. And vote-thinness: each datum’s vote spreads across a higher-dimensional surface, so the signal-to-noise at any single cell collapses. You build an enormous structure to find nothing.

The pivot at k ≈ 4 — RANSAC

The standard fix abandons the grid entirely and probes parameter space by{" "} sampling: pick the minimum number of points needed to instantiate the shape (2 for a line, 3 for a plane, 4 for a sphere, 5 for a cylinder), fit the candidate, count agreeing data points, repeat. Keep the candidate with the most agreement.

The conceptual move is the same as Hough — every datum still has a relationship with a region of parameter space — but instead of materialising the accumulator and voting into it, RANSAC implicitly samples it. This is the dominant method above two parameters: PCL, Open3D and COLMAP all fit planes and cylinders via RANSAC, not Hough.

{/* ─── §8 Radon ─── */}

§ 08

Take the limit and you get the Radon transform.

Replace the discrete edge-pixel sum with a continuous integral and you arrive at one of the most important transforms in inverse problems:

R(θ, ρ) = ∬ f(x, y) · δ(ρ − x cos θ − y sin θ) dx dy

— at each (θ, ρ), the line integral of f along the line ρ = x cos θ + y sin θ

Each value R(θ, ρ) is the line integral of{" "} f along the line at angle θ and offset ρ. A single bright point at (x₀, y₀) in the source image lights up exactly the cells where ρ = x₀ cos θ + y₀ sin θ — the same sinusoid that appeared in §4. An extended object is a sum of these sinusoids, one per source point, smeared together — which is why R viewed as a 2-D image is called a{" "} sinogram.

How CT inverts this

A CT scanner physically measures the sinogram. An X-ray source–detector pair rotates around the patient, and each rotation angle θ records one full row of{" "} R(θ, ρ) as the detector picks up the attenuation along every parallel line at that angle. You don’t see your body directly — you see the sinogram.

Reconstruction is the mirror image of Hough voting: for each measured{" "} (θ, ρ), smear that value back along the line it came from, and sum across all angles. Where many lines overlap, the original density builds back up; where few do, contributions cancel. Pre-multiplying each projection by a ramp filter (in the frequency domain) undoes the 1/r blur the naive smear creates — this is{" "} filtered backprojection, the workhorse method of CT. Toggle the ramp filter in the demo above and watch the star artefact collapse into a clean reconstruction.

Where this generalises

In n dimensions you integrate over{" "} (n − 1)-D hyperplanes. In 3-D this gives either the X-ray transform (line integrals through a volume — a CT geometry) or the full 3-D Radon transform (plane integrals — the natural form for some tomography problems). PET, SPECT, certain MRI sequences, ground-penetrating radar, and electron tomography all use variants. Cryo-EM is the 3-D X-ray transform with noise and unknown orientations.

{/* ─── §9 Duality family ─── */}

§ 09

One last reframe: Hough is one corner of a much bigger map.

The switch-to-a-dual-representation idea recurs across mathematics and physics under different names. Each one is a kernel; each one parameterises “the right axis” for a particular question.

Projective duality in computational geometry literally swaps points and lines — convex hulls become half-plane intersections, the k-set problem and ham-sandwich theorems are most naturally stated in dual form. Lifting to a paraboloid{" "} sends 2-D points (x, y) to{" "} (x, y, x² + y²): take the lower convex hull in 3-D, project back, and you have a Delaunay triangulation — with Voronoi diagrams as its dual.

The Legendre transform sends a function to its convex conjugate, f*(p) = sup_x (px − f(x)) — swapping position-velocity space for position-momentum in mechanics, swapping primal for dual in convex optimisation. The Fourier transform sends time to frequency, turning convolution into pointwise multiplication; in X-ray crystallography the diffraction pattern physically computes the Fourier transform of electron density.

The unity is that most of these are integral transforms — generalised inner products of the data against a parameterised family of kernels — and each is invertible under appropriate conditions. Hough is the cleanest pedagogical instance. Once you have the pattern in your head you start spotting it everywhere.

{/* ─── §10 For your research ─── */}

§ 10

For your research — which corner are you in?

If you want to…	Representation	Practical tool
find lines / circles in 2-D edge maps	(θ, ρ) / (a, b, r)	HoughLines, HoughCircles
segment 3-D point clouds into surfaces	(θ, φ, ρ) / cylinder params	RANSAC (PCL, Open3D)
detect a custom shape from a template	translation (× rotation × scale)	Generalised Hough (Ballard ’81)
reconstruct a density from line integrals	sinogram → image	filtered backprojection, iterative recon
reconstruct a 3-D structure from 2-D projections of unknown orientation	3-D X-ray transform + alignment	RELION, cryoSPARC, ASTRA
analyse periodic signals or spatial frequencies	frequency domain	FFT, wavelets
convert constrained min/max into something tractable	primal ↔ dual	Lagrangian / Legendre
analyse geometric incidence / range problems	point ↔ line dual	computational geometry libraries

The mental move is identical every time: identify the structure you want,{" "} write down a parameter space whose coordinates make that structure local,{" "} project your data into it, find peaks. Hough/Radon is the most visually intuitive instance — keep it in your head as the reference picture, and the higher-D and continuous versions follow from the same diagram.