Probability Density Estimation for Affordance Detection

A literature review and comparison of incremental PDE methods for online affordance detection from autoencoder-derived latent representations.

Overview

Affordances, originally proposed by James J. Gibson, describe the actions implied by objects and stimuli in an environment (“chair implies sit,” “food implies eat”). Encoding affordances into probabilistic models for real-time robotic decision-making remains an open problem, particularly when sensor data arrives as a stream and the underlying distribution shifts over time.

This project surveys and compares three incremental Probability Density Estimation (PDE) methods for use in an affordance detection pipeline. The downstream system uses Variational Convolutional Autoencoders (VCAEs) to obtain compact latent representations of environment images, and PDE is applied over those latent spaces to model the distribution of states in which a given affordance was successfully detected. Because the number of distributions to maintain scales with the action space, the chosen method must run fast enough to remain comparable to a no-affordance-detection baseline.

Background

Standard density estimation falls into two broad camps. Parametric Bayesian methods assume the data comes from a known family (Gaussian, Poisson, etc.) and incrementally adjust hyperparameters; mixture models extend this to multimodal distributions but require estimating the number of components, which is itself a hard problem under streaming data. Non-parametric methods like Kernel Density Estimators (KDEs) avoid distributional assumptions but traditionally require batch recomputation of the entire density on each update, making them expensive for streams.

Methods Compared

1. Incremental KDE for Data Stream Computation (He et al., 2020)

He and Jiang propose decomposing a KDE trained on the union of datasets into a weighted sum of KDEs trained on individual batches. When a new batch arrives, only its local optimal bandwidth needs to be computed rather than re-deriving bandwidths over all data. This drops time complexity from O((Σ Nₜ)² D) to O((Σ Nₜ²) D), with negligible loss in convergence quality measured by Jensen–Shannon divergence.

Left: runtime comparison between I-KDE and full retraining. Right: I-KDE convergence on univariate and multivariate distributions, with JS divergence values demonstrating fit quality.

2. Density Estimation Over Data Streams (Zhou et al., 2008)

Zhou’s approach updates the density per data point rather than per batch, and introduces M-Kernels — composite kernels that approximate the sum of two kernels with a single weighted kernel. A fixed-size buffer holds kernel descriptors {μ, h, p, m_cost}; when full, the two kernels with the lowest merge cost are combined, freeing space without storing raw data. This guarantees linear time and bounded memory.

Linear-time scaling, density adjustment over a randomly sampled stream, and the tradeoff between buffer size, accuracy, and runtime.

3. Incremental Variational Bayesian GMM with Decremental Optimization (Dai et al., 2023)

Dai and Zhao propose an incremental VBGMM that handles distribution shift by adding new Gaussian components when novel modes appear, then running a decremental optimization step that removes redundant components based on their latent assignment counts. The model uses Variational Bayesian Expectation-Maximization with KL divergence as a stopping criterion. While powerful for capturing distribution drift in fault detection settings, the per-step cost is significantly higher than the kernel-based methods.

The IncVBGMM pipeline: increment to incorporate new data, then decrement to prune redundant components.

Conclusions

For affordance detection, the system must maintain 𝒜 × f × 2 density estimators and update them simultaneously on each tick, so per-step cost is the dominant constraint. The I-KDE saves time over full retraining but still re-runs cross-validation for bandwidth selection on each batch and does not address memory bounds. The IncVBGMM is the most expressive but its KL-based EM loop is too expensive for the target setting. Zhou’s M-Kernel approach was selected for the affordance detection pipeline: it guarantees linear time, bounds memory through its buffer, and is straightforward to implement.

Resources