stdpipe.realbogus module
Real-Bogus Classifier for Astronomical Object Detection
This module provides a CNN-based classifier to distinguish real astronomical sources (stars, galaxies) from artifacts (cosmic rays, hot pixels, satellite trails) in detected object catalogs.
Key Features:
- FWHM-invariant: hybrid rescaling (block averaging or pixel replication) to a canonical PSF size, with no auxiliary FWHM input
- Brightness-invariant: peak normalization generalizes to any flux level
- Pure morphology: classification based solely on source shape from 2-channel images
- 2-channel input: background-subtracted (linear) and asinh-scaled (dynamic range compression)
- Lightweight 5-layer CNN (~100k parameters)
- Batch processing for efficient inference
- Optional TensorFlow dependency
- Usage:

    from stdpipe import photometry, realbogus

    # Detect objects
    obj = photometry.get_objects_sep(image, thresh=3.0)

    # Classify and filter
    obj_clean = realbogus.classify_realbogus(obj, image, threshold=0.5)

    # Or add scores without filtering
    obj = realbogus.classify_realbogus(obj, image, add_score=True, flag_bogus=False)
    print(obj['rb_score'])
Author: STDPipe Contributors
- stdpipe.realbogus.create_realbogus_model(input_shape=(31, 31, 2), filters=(32, 64, 128), dense_units=64, dropout_rate=0.5)
Create CNN architecture for real-bogus classification.
- Architecture:
One convolutional layer with batch normalization per entry in filters (3 with the default)
Global average pooling (handles variable input sizes)
Dense layer with dropout
Sigmoid output (binary classification)
- Design Philosophy:
FWHM-invariant: Images rescaled to a canonical FWHM, so no auxiliary FWHM input is needed
Brightness-invariant: Peak normalization allows generalization to any flux level
Pure morphology: Classification based solely on source shape
- Input Channels:
Channel 0: Background-subtracted (linear scale), peak-normalized
Channel 1: Asinh-scaled background-subtracted, peak-normalized
- Parameters:
- input_shape : tuple, optional
Input shape (height, width, channels). Default: (31, 31, 2). Height and width can be None for variable-size inputs.
- filters : tuple, optional
Number of filters in each convolutional layer. Default: (32, 64, 128)
- dense_units : int, optional
Number of units in the dense layer. Default: 64
- dropout_rate : float, optional
Dropout rate for regularization. Default: 0.5
- Returns:
- model : keras.Model
Compiled Keras model, ready for training
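As a rough check on the "~100k parameters" figure, the sketch below counts the weights of the default configuration by hand. It assumes details this page does not state: 3×3 kernels with bias, one convolution per entry in filters, batch normalization holding 4 values per channel (two of them non-trainable), global average pooling, one dense layer, and a single sigmoid output unit. `count_params` is a hypothetical helper, not part of stdpipe.

```python
def count_params(in_channels=2, filters=(32, 64, 128), dense_units=64, kernel=3):
    """Estimate the total parameter count of the assumed architecture."""
    total = 0
    c_in = in_channels
    for c_out in filters:
        total += kernel * kernel * c_in * c_out + c_out  # conv weights + bias
        total += 4 * c_out  # BN gamma, beta, moving mean, moving variance
        c_in = c_out
    # Global average pooling has no weights and outputs c_in features,
    # whatever the spatial size of the input.
    total += c_in * dense_units + dense_units  # dense layer
    total += dense_units + 1  # single sigmoid output unit
    return total


print(count_params())  # 102177
```

Because global average pooling removes the spatial dimensions, the count does not depend on the cutout size, which is consistent with the model accepting variable-size inputs.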
- stdpipe.realbogus.preprocess_cutout(cutout_sci, cutout_bg=None, cutout_err=None, fwhm=None, target_fwhm=3.0, target_size=31, downscale_threshold=1.5, normalize=True, asinh_softening=None)
Preprocess cutout for CNN input.
- Steps:
Optional scaling (downscale or upscale) to canonical FWHM
Create 2-channel input (background-subtracted linear, asinh-scaled)
Peak normalization (each channel normalized by its own peak value)
Pad/crop to target size
- FWHM Scaling Strategy (Symmetric):
Downscaling (FWHM > target_fwhm × threshold): integer block averaging
No scaling (target_fwhm / threshold ≤ FWHM ≤ target_fwhm × threshold): keep as-is
Upscaling (FWHM < target_fwhm / threshold): integer pixel replication
Default: target_fwhm=3.0, downscale_threshold=1.5 → downscale if FWHM > 4.5, upscale if FWHM < 2.0, else unchanged
This ensures that all PSFs are normalized to approximately the same size regardless of their sharpness, eliminating FWHM as a confounding variable.
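The decision rule and the two integer rescaling operations can be sketched with NumPy as follows. The way the integer factor is chosen (rounding the FWHM ratio) is an assumption, and `choose_scaling` and `rescale` are hypothetical names, not stdpipe functions.

```python
import numpy as np


def choose_scaling(fwhm, target_fwhm=3.0, threshold=1.5):
    """Return (mode, factor) for the symmetric strategy described above.

    Assumption: the integer factor is round(fwhm / target_fwhm) when
    downscaling and round(target_fwhm / fwhm) when upscaling.
    """
    if fwhm > target_fwhm * threshold:
        return "down", max(2, int(round(fwhm / target_fwhm)))
    if fwhm < target_fwhm / threshold:
        return "up", max(2, int(round(target_fwhm / fwhm)))
    return "none", 1


def rescale(cutout, mode, factor):
    """Integer block averaging ("down") or pixel replication ("up")."""
    if mode == "down":
        # Simplification: trailing rows/columns that do not fill a whole
        # block are dropped, which can shift the center by up to one pixel.
        h = cutout.shape[0] // factor * factor
        w = cutout.shape[1] // factor * factor
        blocks = cutout[:h, :w].reshape(h // factor, factor, w // factor, factor)
        return blocks.mean(axis=(1, 3))
    if mode == "up":
        return np.repeat(np.repeat(cutout, factor, axis=0), factor, axis=1)
    return cutout


for fwhm in (1.4, 3.0, 4.5, 6.0):
    mode, factor = choose_scaling(fwhm)
    print(fwhm, mode, factor, rescale(np.ones((31, 31)), mode, factor).shape)
# 1.4 up 2 (62, 62)
# 3.0 none 1 (31, 31)
# 4.5 none 1 (31, 31)
# 6.0 down 2 (15, 15)
```

With FWHM = 6.0 the factor is 2, so the PSF ends up at about 3 pixels; the resulting stamp still has to go through the pad/crop step to return to target_size.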
- Channel Design:
Channel 0: Background-subtracted (linear scale), peak-normalized
Channel 1: Asinh-scaled background-subtracted, peak-normalized
Peak normalization makes the representation brightness-invariant: all sources, from faint to extremely bright, are scaled to the [-1, 1] range based on their peak value. This allows the CNN to learn pure morphological features that generalize to any brightness level, including sources far brighter than those in the training set.
The asinh channel complements the linear channel by providing compressed dynamic range information useful for distinguishing extended from compact sources.
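A minimal sketch of the two-channel construction, assuming "peak" means the maximum pixel value and using a hypothetical softening of 1.0 sigma as a stand-in for DEFAULT_ASINH_SOFTENING_SIGMA, whose actual value is not given here. `make_channels` is an illustrative name only.

```python
import numpy as np

# Hypothetical stand-in for DEFAULT_ASINH_SOFTENING_SIGMA (value not documented here)
ASSUMED_SOFTENING_SIGMA = 1.0


def make_channels(data, sigma, softening_sigma=ASSUMED_SOFTENING_SIGMA):
    """Build an (H, W, 2) array from a background-subtracted cutout.

    Channel 0: linear data divided by its peak.
    Channel 1: asinh(data / softening) divided by its own peak,
    with softening = softening_sigma * sigma.
    """
    linear = np.asarray(data, dtype=float)
    stretched = np.arcsinh(linear / (softening_sigma * sigma))
    channels = []
    for ch in (linear, stretched):
        peak = np.max(ch)
        # Assumed guard: leave empty or all-negative cutouts unscaled
        channels.append(ch / peak if peak > 0 else ch)
    return np.stack(channels, axis=-1)


# Gaussian test source with FWHM of about 3 px at two very different fluxes
yy, xx = np.mgrid[-15:16, -15:16]
psf = np.exp(-(xx**2 + yy**2) / (2 * 1.3**2))
faint = make_channels(10 * psf, sigma=1.0)
bright = make_channels(1e5 * psf, sigma=1.0)
print(np.allclose(faint[..., 0], bright[..., 0]))  # True
print(np.allclose(faint[..., 1], bright[..., 1]))  # False
```

Under these assumptions only the linear channel is exactly flux-invariant: with the softening fixed in units of sigma, the normalized asinh profile still changes with signal-to-noise, which is the extra dynamic-range information that channel carries.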
- Parameters:
- cutout_sci : ndarray
Science image cutout (assumed to be background-subtracted if cutout_bg is None)
- cutout_bg : ndarray, optional
Background cutout (or scalar value). If None, estimated from the cutout edges.
- cutout_err : ndarray or float, optional
Error/noise cutout (or scalar value). Used to estimate the noise level (sigma) for asinh softening; only its median value is used.
- fwhm : float, optional
Image FWHM in pixels. If provided, the cutout is rescaled toward target_fwhm following the FWHM scaling strategy above.
- target_fwhm : float, optional
Target FWHM for scale normalization. Default: 3.0
- target_size : int, optional
Target cutout size (square). Default: 31
- downscale_threshold : float, optional
Scaling threshold: downscale if fwhm/target_fwhm > downscale_threshold, upscale if target_fwhm/fwhm > downscale_threshold. Default: 1.5
- normalize : bool, optional
Apply peak normalization to each channel (scales to the [-1, 1] range). Default: True. This makes the representation brightness-invariant.
- asinh_softening : float, optional
Asinh softening in units of background sigma. If None, uses DEFAULT_ASINH_SOFTENING_SIGMA. The actual softening is asinh_softening * sigma, where sigma is estimated from cutout_err.
- Returns:
- preprocessed : ndarray
Preprocessed cutout of shape (target_size, target_size, 2)
- scale_factor : float
Applied scale factor (for diagnostics)
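The helper steps mentioned above (background from the cutout edges, sigma as the median of cutout_err, and the final pad/crop to target_size) might look like this. The border width, the use of the median for the edge estimate, and zero padding are all assumptions; the function names are hypothetical.

```python
import numpy as np


def edge_background(cutout, width=2):
    """Median of a `width`-pixel border (hypothetical edge estimator)."""
    border = np.zeros(cutout.shape, dtype=bool)
    border[:width, :] = True
    border[-width:, :] = True
    border[:, :width] = True
    border[:, -width:] = True
    return float(np.median(cutout[border]))


def noise_level(cutout_err):
    """Scalar sigma from an error cutout or a scalar: the median, as documented."""
    return float(np.median(cutout_err))


def pad_or_crop(cutout, size):
    """Center-crop, or zero-pad, a 2-D cutout to (size, size)."""
    h, w = cutout.shape
    out = np.zeros((size, size), dtype=float)
    ny, nx = min(h, size), min(w, size)
    # Offsets into the source (when cropping) and destination (when padding)
    sy, sx = max(0, (h - size) // 2), max(0, (w - size) // 2)
    dy, dx = max(0, (size - h) // 2), max(0, (size - w) // 2)
    out[dy:dy + ny, dx:dx + nx] = cutout[sy:sy + ny, sx:sx + nx]
    return out


stamp = np.full((31, 31), 5.0)
stamp[10:20, 10:20] = 100.0  # bright square source on a flat sky of 5
bg = edge_background(stamp)
sigma = noise_level(np.full((31, 31), 2.0))
padded = pad_or_crop(stamp[:15, :15] - bg, 31)
print(bg, sigma, padded.shape)  # 5.0 2.0 (31, 31)
```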
- stdpipe.realbogus.extract_cutouts(obj, image, bg=None, err=None, mask=None, radius=15, fwhm=None, target_fwhm=3.0, asinh_softening=None, verbose=False)
Extract and preprocess cutouts for all objects.
- Parameters:
- obj : astropy.table.Table
Object catalog with ‘x’ and ‘y’ columns
- image : ndarray
Science image
- bg : ndarray or float, optional
Background map or scalar value
- err : ndarray or float, optional
Error/noise map or scalar value
- mask : ndarray, optional
Boolean mask (True = masked)
- radius : int, optional
Cutout radius in pixels. Default: 15 (31×31 cutouts)
- fwhm : float, optional
Image FWHM. If None, estimated from the object catalog.
- target_fwhm : float, optional
Target FWHM for rescaling. Default: 3.0
- asinh_softening : float, optional
Asinh softening in units of background sigma. If None, uses DEFAULT_ASINH_SOFTENING_SIGMA.
- verbose : bool, optional
Print progress. Default: False
- Returns:
- cutouts : ndarray
Array of preprocessed cutouts with shape (N, 2*radius+1, 2*radius+1, 2)
- valid_indices : ndarray
Indices of successfully extracted cutouts
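The relationship between the catalog, the cutouts array, and valid_indices can be illustrated with a raw (unpreprocessed) extraction loop. The rules for which objects are skipped (stamps crossing the image border, masked central pixel) and the 0-based x = column, y = row convention are assumptions, not documented behaviour; `extract_raw_cutouts` is a hypothetical name.

```python
import numpy as np


def extract_raw_cutouts(x, y, image, radius=15, mask=None):
    """Cut (2*radius+1)^2 stamps around (x, y); return stamps and valid indices."""
    size = 2 * radius + 1
    ny, nx = image.shape
    stamps, valid = [], []
    for i, (xi, yi) in enumerate(zip(x, y)):
        cx, cy = int(round(xi)), int(round(yi))
        if cx - radius < 0 or cy - radius < 0 or cx + radius >= nx or cy + radius >= ny:
            continue  # stamp would extend past the image edge
        if mask is not None and mask[cy, cx]:
            continue  # central pixel is masked
        stamps.append(image[cy - radius:cy + radius + 1, cx - radius:cx + radius + 1])
        valid.append(i)
    return np.array(stamps).reshape(-1, size, size), np.array(valid, dtype=int)


image = np.arange(100 * 100, dtype=float).reshape(100, 100)
x = [50.2, 5.0, 70.4]  # second object is too close to the left edge
y = [50.0, 50.0, 40.0]
stamps, valid = extract_raw_cutouts(x, y, image, radius=15)
print(stamps.shape, valid)  # (2, 31, 31) [0 2]
```

Scores computed on the stamps can then be written back to the catalog rows selected by valid.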
- stdpipe.realbogus.load_realbogus_model(model_file=None, verbose=False)
Load pre-trained real-bogus model.
- Parameters:
- model_file : str, optional
Path to model file (.h5 or SavedModel directory). If None, loads the default model from ~/.stdpipe/models/
- verbose : bool, optional
Print loading information. Default: False
- Returns:
- model : keras.Model
Loaded Keras model
- stdpipe.realbogus.save_realbogus_model(model, model_file=None, verbose=False)
Save trained real-bogus model.
- Parameters:
- model : keras.Model
Trained model
- model_file : str, optional
Output path. If None, saves to ~/.stdpipe/models/realbogus_default.h5
- verbose : bool, optional
Print saving information. Default: False
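Both functions fall back to the ~/.stdpipe/models/ directory when model_file is None. A sketch of how such a default could be resolved, assuming the loader uses the same realbogus_default.h5 file name that the saver documents; `resolve_model_file` is a hypothetical helper.

```python
import os

# Hypothetical constants mirroring the documented default location
DEFAULT_MODEL_DIR = os.path.join("~", ".stdpipe", "models")
DEFAULT_MODEL_NAME = "realbogus_default.h5"


def resolve_model_file(model_file=None, create_dir=False):
    """Return an absolute model path, falling back to the documented default."""
    if model_file is None:
        model_file = os.path.join(DEFAULT_MODEL_DIR, DEFAULT_MODEL_NAME)
    path = os.path.abspath(os.path.expanduser(model_file))
    if create_dir:
        # Saving needs the parent directory to exist; loading does not
        os.makedirs(os.path.dirname(path), exist_ok=True)
    return path


print(resolve_model_file())
```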
- stdpipe.realbogus.classify_realbogus(obj, image, model=None, model_file=None, bg=None, err=None, mask=None, fwhm=None, asinh_softening=None, threshold=0.5, add_score=True, flag_bogus=True, batch_size=128, verbose=False)
Classify detected objects as real or bogus using CNN.
This is the main entry point for real-bogus classification.
- Parameters:
- obj : astropy.table.Table
Object catalog with ‘x’ and ‘y’ columns (from photometry.get_objects_*)
- image : ndarray
Science image
- model : keras.Model, optional
Pre-loaded model. If None, loads from model_file.
- model_file : str, optional
Path to model file. If None, uses the default model.
- bg : ndarray or float, optional
Background map or scalar value
- err : ndarray or float, optional
Error/noise map or scalar value
- mask : ndarray, optional
Boolean mask (True = masked pixels)
- cutout size (derived, not a parameter)
Cutout size is inferred from the model input shape. If the model has dynamic spatial dimensions, the cutout size defaults to 31×31 (radius 15).
- fwhm : float, optional
Image FWHM. If None, estimated from the catalog.
- asinh_softening : float, optional
Asinh softening in units of background sigma. If None, uses DEFAULT_ASINH_SOFTENING_SIGMA.
- threshold : float, optional
Classification threshold (0-1). Objects with score > threshold are classified as real. Default: 0.5
- add_score : bool, optional
Add ‘rb_score’ column to the output catalog. Default: True
- flag_bogus : bool, optional
Set flags=0x1000 for bogus objects and filter them out. Default: True
- batch_size : int, optional
Batch size for inference. Default: 128
- verbose : bool or callable, optional
Print progress. Can also be a callable for custom logging. Default: False
- Returns:
- obj_filtered : astropy.table.Table
Filtered catalog with real sources only (if flag_bogus=True), or the full catalog with an ‘rb_score’ column (if flag_bogus=False)
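The inference-side logic described by these parameters (cutout radius from the model input shape, batched scoring, strict `score > threshold`, and the 0x1000 flag) is sketched below with NumPy only. `fake_predict` is a stand-in for a Keras model's `predict`, and OR-ing the bit into existing flags (rather than overwriting them) is an assumption; all function names are hypothetical.

```python
import numpy as np

BOGUS_FLAG = 0x1000  # flag value documented for flag_bogus


def radius_from_input_shape(input_shape, default_radius=15):
    """Cutout radius from a Keras-style input shape such as (None, 31, 31, 2)."""
    height = input_shape[1]
    return default_radius if height is None else (height - 1) // 2


def score_in_batches(predict, cutouts, batch_size=128):
    """Apply `predict` ((n, H, W, C) -> (n, 1) scores) batch by batch."""
    scores = [np.ravel(predict(cutouts[i:i + batch_size]))
              for i in range(0, len(cutouts), batch_size)]
    return np.concatenate(scores) if scores else np.zeros(0)


def apply_threshold(flags, scores, threshold=0.5, flag_bogus=True):
    """Return (is_real, new_flags); the bogus bit is OR-ed into existing flags."""
    is_real = np.asarray(scores) > threshold
    new_flags = np.array(flags, dtype=int, copy=True)
    if flag_bogus:
        new_flags[~is_real] |= BOGUS_FLAG
    return is_real, new_flags


def fake_predict(batch):
    """Stand-in for model.predict: mean of channel 0, shifted by 0.5."""
    return batch[..., 0].mean(axis=(1, 2))[:, None] + 0.5


rng = np.random.default_rng(0)
cutouts = rng.normal(size=(300, 31, 31, 2))
scores = score_in_batches(fake_predict, cutouts, batch_size=128)
is_real, flags = apply_threshold(np.zeros(300, dtype=int), scores)
kept = np.flatnonzero(is_real)  # rows that flag_bogus=True would keep
```

Batching only bounds memory use; the scores are identical to a single call on all cutouts.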
Examples
>>> from stdpipe import photometry, realbogus
>>> obj = photometry.get_objects_sep(image, thresh=3.0)
>>> obj_clean = realbogus.classify_realbogus(obj, image)
>>> print(f"Kept {len(obj_clean)}/{len(obj)} objects")
- stdpipe.realbogus.train_realbogus_classifier(training_data=None, n_simulated=1000, image_size=(2048, 2048), fwhm_range=(1.5, 8.0), real_source_types=['star'], validation_split=0.15, model=None, model_file=None, epochs=50, batch_size=64, class_weight='balanced', callbacks=None, verbose=True)
Train real-bogus classifier on simulated or real data.
- Parameters:
- training_data : tuple or dict, optional
Pre-generated training data: (X, y, fwhm_features) tuple or dict with ‘X’, ‘y’, ‘fwhm’ keys. If None, generates simulated data using simulation.generate_realbogus_training_data().
- n_simulated : int, optional
Number of simulated images to generate (if training_data=None). Default: 1000
- image_size : tuple, optional
Size of simulated images (width, height). Default: (2048, 2048)
- fwhm_range : tuple, optional
Range of FWHM values for simulated images. Default: (1.5, 8.0)
- real_source_types : list, optional
List of source types to consider ‘real’ (if training_data=None). Default: [‘star’] treats only stars as real and galaxies as bogus. Use [‘star’, ‘galaxy’] to train a classifier that treats both as real.
- validation_split : float, optional
Fraction of data used for validation. Default: 0.15
- model : keras.Model, optional
Model to train. If None, creates a new model.
- model_file : str, optional
Path to save the trained model. Default: ~/.stdpipe/models/realbogus_default.h5
- epochs : int, optional
Training epochs. Default: 50
- batch_size : int, optional
Batch size. Default: 64
- class_weight : str or dict, optional
Class weights for imbalanced data: ‘balanced’ or a dict {0: w0, 1: w1}. Default: ‘balanced’
- callbacks : list, optional
Keras callbacks (e.g., early stopping, checkpoints)
- verbose : bool, optional
Print training progress. Default: True
- Returns:
- model : keras.Model
Trained model
- history : keras.callbacks.History
Training history
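To make class_weight='balanced' and validation_split concrete, the sketch below assumes the same "balanced" rule as scikit-learn's `compute_class_weight` (n_samples / (n_classes × count per class)) and a shuffled hold-out split; the real implementation may differ, for example by letting Keras take the last fraction unshuffled. The resulting dict has the form accepted by the `class_weight` argument of Keras `Model.fit`. Both helper names are hypothetical.

```python
import numpy as np


def balanced_class_weight(y):
    """{class: n_samples / (n_classes * n_in_class)}, the usual 'balanced' rule."""
    y = np.asarray(y).astype(int)
    classes, counts = np.unique(y, return_counts=True)
    return {int(c): len(y) / (len(classes) * n) for c, n in zip(classes, counts)}


def split_train_val(X, y, validation_split=0.15, seed=0):
    """Shuffle, then hold out a `validation_split` fraction for validation."""
    order = np.random.default_rng(seed).permutation(len(y))
    n_val = int(round(validation_split * len(y)))
    train, val = order[:len(y) - n_val], order[len(y) - n_val:]
    return (X[train], y[train]), (X[val], y[val])


y = np.array([0] * 90 + [1] * 10)  # imbalanced: 90 bogus, 10 real
X = np.random.default_rng(1).normal(size=(100, 31, 31, 2))
weights = balanced_class_weight(y)  # class 1 gets 5.0, class 0 about 0.56
(X_train, y_train), (X_val, y_val) = split_train_val(X, y)
print(weights, len(y_train), len(y_val))
```

With these weights each class contributes equally to the loss: 90 × 0.556 ≈ 10 × 5.0 = 50.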
Examples
>>> from stdpipe import realbogus, simulation
>>> # Train on simulated data (stars and galaxies as real)
>>> model, history = realbogus.train_realbogus_classifier(
...     n_simulated=500,
...     real_source_types=['star', 'galaxy'],
...     epochs=30,
...     verbose=True
... )
>>> # Train stars-only classifier (galaxies as bogus; this is the default)
>>> model, history = realbogus.train_realbogus_classifier(
...     n_simulated=500,
...     real_source_types=['star'],
...     epochs=30,
...     verbose=True
... )
>>> # Or use pre-generated data
>>> data = simulation.generate_realbogus_training_data(...)
>>> model, history = realbogus.train_realbogus_classifier(
...     training_data=(data['X'], data['y']),
...     epochs=30
... )