stdpipe.realbogus module

Real-Bogus Classifier for Astronomical Object Detection

This module provides a CNN-based classifier to distinguish real astronomical sources (stars, galaxies) from artifacts (cosmic rays, hot pixels, satellite trails) in detected object catalogs.

Key Features:
  • FWHM-invariant: hybrid rescaling (integer downscaling or upscaling) to a canonical PSF size, with no auxiliary FWHM input

  • Brightness-invariant: peak normalization generalizes to any flux level

  • Pure morphology: classification based solely on source shape from 2-channel images

  • 2-channel input: background-subtracted (linear) and asinh-scaled (dynamic range compression)

  • Lightweight 5-layer CNN (~100k parameters)

  • Batch processing for efficient inference

  • Optional TensorFlow dependency

Usage:

    from stdpipe import photometry, realbogus

    # Detect objects
    obj = photometry.get_objects_sep(image, thresh=3.0)

    # Classify and filter
    obj_clean = realbogus.classify_realbogus(obj, image, threshold=0.5)

    # Or add scores without filtering
    obj = realbogus.classify_realbogus(obj, image, add_score=True, flag_bogus=False)
    print(obj['rb_score'])

Author: STDPipe Contributors

stdpipe.realbogus.create_realbogus_model(input_shape=(31, 31, 2), filters=(32, 64, 128), dense_units=64, dropout_rate=0.5)

Create CNN architecture for real-bogus classification.

Architecture:
  • 3-5 convolutional layers with batch normalization

  • Global average pooling (handles variable input sizes)

  • Dense layer with dropout

  • Sigmoid output (binary classification)

Design Philosophy:
  • FWHM-invariant: Images downscaled to canonical FWHM, no auxiliary FWHM input needed

  • Brightness-invariant: Peak normalization allows generalization to any flux level

  • Pure morphology: Classification based solely on source shape

Input Channels:
  • Channel 0: Background-subtracted (linear scale), peak-normalized

  • Channel 1: Asinh-scaled background-subtracted, peak-normalized

Parameters:
input_shape : tuple, optional

Input shape (height, width, channels). Default: (31, 31, 2). Height and width can be None for variable-size inputs.

filters : tuple, optional

Number of filters in each conv layer. Default: (32, 64, 128)

dense_units : int, optional

Number of units in the dense layer. Default: 64

dropout_rate : float, optional

Dropout rate for regularization. Default: 0.5

Returns:
model : keras.Model

Compiled Keras model ready for training
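As a rough check of the "~100k parameters" figure, the sketch below counts weights for one plausible reading of the architecture. It assumes 3×3 kernels, one Conv2D layer per entry in filters, a BatchNormalization layer after each convolution, global average pooling, one hidden Dense layer and a single sigmoid unit. The helper count_realbogus_params is hypothetical and not part of stdpipe.

```python
def count_realbogus_params(in_channels=2, filters=(32, 64, 128), dense_units=64,
                           kernel_size=3, batchnorm=True):
    """Return (total, trainable) parameter counts for the ASSUMED architecture."""
    total = 0
    non_trainable = 0
    prev = in_channels
    for n in filters:
        # Conv2D: kernel weights plus one bias per output filter
        total += kernel_size * kernel_size * prev * n + n
        if batchnorm:
            # BatchNormalization: gamma and beta (trainable),
            # moving mean and moving variance (non-trainable)
            total += 4 * n
            non_trainable += 2 * n
        prev = n
    # Global average pooling has no weights: (H, W, prev) -> (prev,)
    total += prev * dense_units + dense_units  # hidden Dense layer
    total += dense_units + 1                   # Dense(1) sigmoid output
    return total, total - non_trainable


total, trainable = count_realbogus_params()
print(total, trainable)  # 102177 101729
```

Because global average pooling collapses the spatial axes, the count does not depend on the 31×31 input size, which is what allows height and width to be None.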

stdpipe.realbogus.preprocess_cutout(cutout_sci, cutout_bg=None, cutout_err=None, fwhm=None, target_fwhm=3.0, target_size=31, downscale_threshold=1.5, normalize=True, asinh_softening=None)

Preprocess cutout for CNN input.

Steps:
  1. Optional scaling (downscale or upscale) to canonical FWHM

  2. Create 2-channel input (background-subtracted linear, asinh-scaled)

  3. Peak normalization (each channel normalized by its own peak value)

  4. Pad/crop to target size
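Step 4 can be illustrated with a centered zero-pad or crop. This is a sketch only: the zero fill value and the helper name pad_or_crop are assumptions, not stdpipe's implementation. It works on 2-D arrays and on (H, W, C) stacks alike.

```python
import numpy as np


def pad_or_crop(img, size):
    """Center-pad with zeros or center-crop an (H, W[, C]) array to (size, size[, C])."""
    img = np.asarray(img, dtype=float)
    out = np.zeros((size, size) + img.shape[2:], dtype=float)
    h, w = img.shape[:2]
    # Extent of the region shared by input and output along each axis
    ch, cw = min(h, size), min(w, size)
    # Source offsets (non-zero when cropping) and destination offsets (when padding)
    sy, sx = (h - ch) // 2, (w - cw) // 2
    dy, dx = (size - ch) // 2, (size - cw) // 2
    out[dy:dy + ch, dx:dx + cw] = img[sy:sy + ch, sx:sx + cw]
    return out
```

For odd input and output sizes the central pixel stays central; for even-sized inputs the center shifts by half a pixel.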

FWHM Scaling Strategy (Symmetric):
  • Downscaling (FWHM > target_fwhm × threshold): Integer block averaging

  • No scaling (target_fwhm / threshold ≤ FWHM ≤ target_fwhm × threshold): Keep as-is

  • Upscaling (FWHM < target_fwhm / threshold): Integer pixel replication

Default: target_fwhm=3.0, threshold=1.5 → Downscale if FWHM > 4.5, upscale if FWHM < 2.0, else unchanged

This ensures that all PSFs are normalized to approximately the same size regardless of sharpness, eliminating FWHM as a confounding variable.
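The decision rule and the two integer resampling operations can be sketched in NumPy as follows. How the integer factor is chosen is not stated above; rounding the FWHM ratio is an assumption here, as are the helper names choose_scaling, block_average and replicate.

```python
import numpy as np


def choose_scaling(fwhm, target_fwhm=3.0, threshold=1.5):
    """Return ('down' | 'up' | 'none', integer factor) per the symmetric rule."""
    if fwhm > target_fwhm * threshold:
        # ASSUMPTION: factor is the rounded FWHM ratio
        return 'down', max(1, int(round(fwhm / target_fwhm)))
    if fwhm < target_fwhm / threshold:
        return 'up', max(1, int(round(target_fwhm / fwhm)))
    return 'none', 1


def block_average(img, k):
    """Downscale by averaging non-overlapping k x k blocks (ragged edges trimmed)."""
    h, w = (img.shape[0] // k) * k, (img.shape[1] // k) * k
    return img[:h, :w].reshape(h // k, k, w // k, k).mean(axis=(1, 3))


def replicate(img, k):
    """Upscale by repeating each pixel k times along both axes."""
    return np.repeat(np.repeat(img, k, axis=0), k, axis=1)
```

With the defaults, a 6 px PSF is block-averaged by 2 (to roughly 3 px) and a 1 px PSF is replicated by 3. Neither operation preserves total flux, which does not matter here because peak normalization follows.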

Channel Design:
  • Channel 0: Background-subtracted (linear scale), peak-normalized

  • Channel 1: Asinh-scaled background-subtracted, peak-normalized

Peak normalization makes the representation brightness-invariant: all sources, from faint to extremely bright, are scaled to the [-1, 1] range based on their peak value. This allows the CNN to learn pure morphological features that generalize to any brightness level, including sources far brighter than any in the training set.

The asinh channel complements the linear channel by providing compressed dynamic range information useful for distinguishing extended from compact sources.
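A minimal sketch of the channel construction. It assumes the "peak" is the maximum absolute value (which guarantees the [-1, 1] range) and uses a placeholder softening of 3 sigma in place of DEFAULT_ASINH_SOFTENING_SIGMA, whose actual value is not given here; make_channels is a hypothetical name.

```python
import numpy as np


def make_channels(cutout_sci, bg=0.0, err=1.0, softening_sigma=3.0, normalize=True):
    """Build an (H, W, 2) linear + asinh stack as described above (sketch).

    softening_sigma is a HYPOTHETICAL stand-in for DEFAULT_ASINH_SOFTENING_SIGMA.
    """
    lin = np.asarray(cutout_sci, dtype=float) - bg
    # Only the median of the error map is used as the noise level
    sigma = float(np.median(err))
    asinh = np.arcsinh(lin / (softening_sigma * sigma))
    chans = []
    for c in (lin, asinh):
        # ASSUMPTION: divide by the largest absolute value so values lie in [-1, 1]
        peak = np.max(np.abs(c))
        chans.append(c / peak if normalize and peak > 0 else c)
    return np.stack(chans, axis=-1)
```

In this sketch only the linear channel is exactly unchanged when a source is scaled in flux; the normalized asinh channel still varies with signal-to-noise, since asinh compression depends on the ratio of flux to the softening.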

Parameters:
cutout_sci : ndarray

Science image cutout (assumed to be background-subtracted if cutout_bg is None)

cutout_bg : ndarray or float, optional

Background cutout (or scalar value). If None, estimated from cutout edges.

cutout_err : ndarray or float, optional

Error/noise cutout (or scalar value). Used to estimate the noise level (sigma) for asinh softening. Only the median value is used.

fwhm : float, optional

Image FWHM in pixels. If provided, the cutout is rescaled toward target_fwhm according to the FWHM scaling strategy described above.

target_fwhm : float, optional

Target FWHM in pixels for scale normalization. Default: 3.0

target_size : int, optional

Target cutout size (square). Default: 31

downscale_threshold : float, optional

Scaling tolerance: downscale if fwhm/target_fwhm > threshold, upscale if target_fwhm/fwhm > threshold, otherwise leave unchanged. Default: 1.5

normalize : bool, optional

Apply peak normalization to each channel (scales to the [-1, 1] range). Default: True. This makes the representation brightness-invariant.

asinh_softening : float, optional

Asinh softening in units of background sigma. If None, uses DEFAULT_ASINH_SOFTENING_SIGMA. Actual softening is (asinh_softening * sigma), where sigma is estimated from cutout_err.

Returns:
preprocessed : ndarray

Preprocessed cutout of shape (target_size, target_size, 2)

scale_factor : float

Applied scale factor (for diagnostics)
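The cutout_bg description says the background is estimated from cutout edges when it is None. One simple way to do that is a median over a border strip, sketched below; the strip width and the use of a plain median (rather than, say, sigma clipping) are assumptions, and edge_background is a hypothetical name.

```python
import numpy as np


def edge_background(cutout, width=2):
    """Estimate a scalar background as the median of a border `width` pixels wide."""
    c = np.asarray(cutout, dtype=float)
    border = np.ones(c.shape, dtype=bool)
    # Exclude the interior so only edge pixels contribute (requires 2*width < size)
    border[width:-width, width:-width] = False
    return float(np.median(c[border]))
```

The median keeps a single bright edge pixel (for example a cosmic ray near the border) from biasing the estimate.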

stdpipe.realbogus.extract_cutouts(obj, image, bg=None, err=None, mask=None, radius=15, fwhm=None, target_fwhm=3.0, asinh_softening=None, verbose=False)

Extract and preprocess cutouts for all objects.

Parameters:
obj : astropy.table.Table

Object catalog with 'x' and 'y' columns

image : ndarray

Science image

bg : ndarray or float, optional

Background map or scalar value

err : ndarray or float, optional

Error/noise map or scalar value

mask : ndarray, optional

Boolean mask (True = masked)

radius : int, optional

Cutout radius in pixels. Default: 15 (31x31 cutouts)

fwhm : float, optional

Image FWHM in pixels. If None, estimated from the object catalog.

target_fwhm : float, optional

Target FWHM in pixels for scale normalization. Default: 3.0

asinh_softening : float, optional

Asinh softening in units of background sigma. If None, uses DEFAULT_ASINH_SOFTENING_SIGMA.

verbose : bool, optional

Print progress. Default: False

Returns:
cutouts : ndarray

Array of preprocessed cutouts (N, 2*radius+1, 2*radius+1, 2)

valid_indices : ndarray

Indices (into obj) of the objects whose cutouts were successfully extracted
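The extraction step can be pictured as slicing fixed-size stamps around each (x, y) position and recording which objects succeeded. This sketch assumes 0-based pixel coordinates with x as the column index, rounds positions to the nearest pixel, and simply skips objects too close to the image edge; the real function may pad such cutouts instead, and extract_raw_cutouts is a hypothetical name. Preprocessing (as in preprocess_cutout) would then be applied to each stamp.

```python
import numpy as np


def extract_raw_cutouts(image, x, y, radius=15):
    """Cut (2r+1) x (2r+1) stamps centred on rounded (x, y); skip edge objects."""
    cutouts, valid = [], []
    ny, nx = image.shape
    size = 2 * radius + 1
    for i, (xi, yi) in enumerate(zip(x, y)):
        cx, cy = int(round(xi)), int(round(yi))
        # ASSUMPTION: objects whose stamp would cross the boundary are dropped
        if cx - radius < 0 or cy - radius < 0 or cx + radius >= nx or cy + radius >= ny:
            continue
        cutouts.append(image[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1])
        valid.append(i)
    # reshape keeps the (0, size, size) shape when nothing was extracted
    stamps = np.asarray(cutouts, dtype=float).reshape(-1, size, size)
    return stamps, np.asarray(valid, dtype=int)
```

Returning the indices alongside the stamps lets scores computed on the stamps be written back to the matching catalog rows.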

stdpipe.realbogus.load_realbogus_model(model_file=None, verbose=False)

Load pre-trained real-bogus model.

Parameters:
model_file : str, optional

Path to model file (.h5 or SavedModel directory). If None, loads the default model from ~/.stdpipe/models/.

verbose : bool, optional

Print loading information. Default: False

Returns:
model : keras.Model

Loaded Keras model

stdpipe.realbogus.save_realbogus_model(model, model_file=None, verbose=False)

Save trained real-bogus model.

Parameters:
model : keras.Model

Trained model

model_file : str, optional

Output path. If None, saves to ~/.stdpipe/models/realbogus_default.h5

verbose : bool, optional

Print saving information. Default: False
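The default locations quoted for save_realbogus_model and load_realbogus_model can be resolved with os.path, as in this sketch. It assumes the default file read by load_realbogus_model is the same realbogus_default.h5 written by save_realbogus_model, and resolve_model_path is a hypothetical helper.

```python
import os

# Directory quoted in the load/save descriptions above
DEFAULT_MODEL_DIR = os.path.join(os.path.expanduser('~'), '.stdpipe', 'models')


def resolve_model_path(model_file=None):
    """Return model_file (with ~ expanded), or the default path when it is None."""
    if model_file is None:
        # ASSUMPTION: same default file name for loading and saving
        return os.path.join(DEFAULT_MODEL_DIR, 'realbogus_default.h5')
    return os.path.expanduser(model_file)
```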

stdpipe.realbogus.classify_realbogus(obj, image, model=None, model_file=None, bg=None, err=None, mask=None, fwhm=None, asinh_softening=None, threshold=0.5, add_score=True, flag_bogus=True, batch_size=128, verbose=False)

Classify detected objects as real or bogus using CNN.

This is the main entry point for real-bogus classification.

Parameters:
obj : astropy.table.Table

Object catalog with 'x' and 'y' columns (from photometry.get_objects_*)

image : ndarray

Science image

model : keras.Model, optional

Pre-loaded model. If None, loads from model_file.

model_file : str, optional

Path to model file. If None, uses default model.

bg : ndarray or float, optional

Background map or scalar value

err : ndarray or float, optional

Error/noise map or scalar value

mask : ndarray, optional

Boolean mask (True = masked pixels)

cutout size : derived

Cutout size is inferred from the model input shape. If the model has dynamic spatial dimensions, the cutout size defaults to 31x31 (radius 15).

fwhm : float, optional

Image FWHM in pixels. If None, estimated from the catalog.

asinh_softening : float, optional

Asinh softening in units of background sigma. If None, uses DEFAULT_ASINH_SOFTENING_SIGMA.

threshold : float, optional

Classification threshold (0-1). Objects with score > threshold are classified as real. Default: 0.5

add_score : bool, optional

Add 'rb_score' column to output catalog. Default: True

flag_bogus : bool, optional

Set flag 0x1000 for bogus objects and filter them out. Default: True

batch_size : int, optional

Batch size for inference. Default: 128

verbose : bool or callable, optional

Print progress. Can be a callable for custom logging. Default: False

Returns:
obj_filtered : astropy.table.Table

Filtered catalog containing only real sources (if flag_bogus=True), or the full catalog (if flag_bogus=False). The 'rb_score' column is present when add_score=True.
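The bookkeeping around the CNN (cutout size from the model input shape, batched inference, thresholding and flagging) might look roughly like the sketch below. It assumes a Keras-style input_shape tuple of the form (None, H, W, C), a predict callable returning one score per cutout, and that the 0x1000 value is OR-ed into an integer flags array; cutout_radius, batched_scores and apply_threshold are hypothetical names.

```python
import numpy as np

BOGUS_FLAG = 0x1000  # flag value quoted in the flag_bogus description


def cutout_radius(input_shape, default=15):
    """Infer the cutout radius from an input shape like (None, H, W, C)."""
    h = input_shape[1]
    return default if h is None else (h - 1) // 2


def batched_scores(cutouts, predict, batch_size=128):
    """Run `predict` over cutouts in chunks of batch_size; return flat scores."""
    parts = [np.ravel(predict(cutouts[i:i + batch_size]))
             for i in range(0, len(cutouts), batch_size)]
    return np.concatenate(parts) if parts else np.zeros(0)


def apply_threshold(flags, scores, threshold=0.5):
    """OR the bogus flag into `flags`; return (new_flags, keep_mask)."""
    real = scores > threshold  # strictly greater, as in the threshold description
    new_flags = np.where(real, flags, flags | BOGUS_FLAG)
    return new_flags, real
```

Filtering the catalog is then a matter of boolean indexing with the returned mask (for an astropy Table, obj[keep]).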

Examples

>>> from stdpipe import photometry, realbogus
>>> obj = photometry.get_objects_sep(image, thresh=3.0)
>>> obj_clean = realbogus.classify_realbogus(obj, image)
>>> print(f"Kept {len(obj_clean)}/{len(obj)} objects")

stdpipe.realbogus.train_realbogus_classifier(training_data=None, n_simulated=1000, image_size=(2048, 2048), fwhm_range=(1.5, 8.0), real_source_types=['star'], validation_split=0.15, model=None, model_file=None, epochs=50, batch_size=64, class_weight='balanced', callbacks=None, verbose=True)

Train real-bogus classifier on simulated or real data.

Parameters:
training_data : tuple or dict, optional

Pre-generated training data: (X, y, fwhm_features) tuple or dict with 'X', 'y', 'fwhm' keys. If None, generates simulated data using simulation.generate_realbogus_training_data().

n_simulated : int, optional

Number of simulated images to generate (if training_data=None). Default: 1000

image_size : tuple, optional

Size of simulated images (width, height). Default: (2048, 2048)

fwhm_range : tuple, optional

Range of FWHM values for simulated images. Default: (1.5, 8.0)

real_source_types : list, optional

List of source types to consider 'real' (if training_data=None). The default, ['star'], treats only stars as real and galaxies as bogus. Use ['star', 'galaxy'] to train a classifier that treats both as real.

validation_split : float, optional

Fraction of data for validation. Default: 0.15

model : keras.Model, optional

Model to train. If None, creates a new model.

model_file : str, optional

Path to save trained model. Default: ~/.stdpipe/models/realbogus_default.h5

epochs : int, optional

Training epochs. Default: 50

batch_size : int, optional

Batch size. Default: 64

class_weight : str or dict, optional

Class weights for imbalanced data: 'balanced' or a dict {0: w0, 1: w1}. Default: 'balanced'

callbacks : list, optional

Keras callbacks (e.g., early stopping, checkpoints)

verbose : bool, optional

Print training progress. Default: True

Returns:
model : keras.Model

Trained model

history : keras.callbacks.History

Training history
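The 'balanced' option presumably follows the common heuristic w_c = n_samples / (n_classes * n_c), as in scikit-learn; that formula is an assumption here. A NumPy sketch producing a dict in the {0: w0, 1: w1} form accepted by Keras Model.fit(class_weight=...), with balanced_class_weights as a hypothetical name:

```python
import numpy as np


def balanced_class_weights(y):
    """ASSUMED 'balanced' rule: n_samples / (n_classes * count_of_class)."""
    y = np.asarray(y).astype(int).ravel()
    classes, counts = np.unique(y, return_counts=True)
    weights = len(y) / (len(classes) * counts)
    return {int(c): float(w) for c, w in zip(classes, weights)}
```

With these weights each class contributes the same total weight to the loss; for y = [0, 0, 0, 1] the result is approximately {0: 0.667, 1: 2.0}.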

Examples

>>> from stdpipe import realbogus
>>> # Train on simulated data (stars and galaxies as real)
>>> model, history = realbogus.train_realbogus_classifier(
...     n_simulated=500,
...     real_source_types=['star', 'galaxy'],
...     epochs=30,
...     verbose=True
... )
>>> # Train stars-only classifier (galaxies as bogus)
>>> model, history = realbogus.train_realbogus_classifier(
...     n_simulated=500,
...     real_source_types=['star'],
...     epochs=30,
...     verbose=True
... )
>>> # Or use pre-generated data
>>> from stdpipe import simulation
>>> data = simulation.generate_realbogus_training_data(...)
>>> model, history = realbogus.train_realbogus_classifier(
...     training_data=(data['X'], data['y']),
...     epochs=30
... )