API

You may find more practical guidance in example notebooks: nb.

HODDIES.hod based class

Base HOD class

class HODDIES.hod.HOD(param_file=None, args=None, hcat_file=None, path_to_abacus_sim=None, read_pinnochio=None, subsample=None, **kwargs)

Bases: object

Class with tools to generate HOD mock catalogs and plotting functions

Initialize HOD.

Parameters:
  • param_file (str, default=None) – Input parameter file to initialize the HOD class. If None, the default parameter file ‘default_HOD_parameters.yaml’ is used.

  • args (dict, default=None) – Optional Input dictonary to initialize the HOD class. Carefull args is prefered against param_file.

  • hcat (dict, ndarray, Catalog, default=None) – Optional Input halo catalog. The halo catalog must have at least these columns names: [‘x’, ‘y’, ‘z’, ‘vx’, ‘vy’, ‘vz’,’Mh’, ‘Rh’, ‘Rs’, ‘c’, ‘Vrms’, ‘halo_id’]. ‘x’, ‘y’, ‘z’, and ‘vx’, ‘vy’, ‘vz’ are halo positions and velocities. Mh and Rh are halo mass and radius (most of the time consider as Mvir and Rvir). ‘c’ is the halo concentration, ‘Vrms’ is the velocity dispersion of the halo particles (used for NFW satellites). ‘halo_id’ is a unique integer to identify each halo.

  • boxsize (int, default = None) – Simulation box size, to be set if hcat is provide. Prefered if boxsize value is also provided in the parameter file.

  • path_to_abacus_sim (str, default=None) – Optional, Path to Abacus simulation directory. In this case, it automatically load the Abacus box/LC at the corresponding redshift snapshots and initialze boxsize and cosmology.

  • read_pinnochio (bool, default=None) – Optional, load Pinnochio simulation catalog. Need to provide the path in the input parameter file.

  • subsample (dict, ndarray, Catalog, default=None) – Optional, Not yet ready! Input of particles or subhalo catalog. The subsample catalog must have at least these columns names: [‘x’, ‘y’, ‘z’, ‘vx’, ‘vy’, ‘vz’, ‘halo_id’].

  • kwargs (dict) – Optional arguments that can be added that will replace the one provided in the parameter file.

HOD_plot(tracer=None, fig=None)

Plot the HOD (Halo Occupation Distribution) for a given tracer or set of tracers.

This function generates a plot showing the Halo Occupation Distribution (HOD) for specified tracers. It uses different colors for each tracer, and can handle multiple tracers at once. If no tracer is specified, the function uses the default tracers defined in the arguments.

Parameters:
  • tracer (str or list of str, optional) – The name(s) of the tracer(s) for which to plot the HOD. If None, it uses the tracers defined in self.args[‘tracers’]. If a single tracer name is provided, it will be converted into a list.

  • fig (matplotlib.figure.Figure, optional) – An existing matplotlib figure object to which the plot will be added. If None, a new figure will be created.

Returns:

fig – The matplotlib figure object containing the plotted HOD.

Return type:

matplotlib.figure.Figure

Notes

  • The function uses predefined colors for each tracer: ‘ELG’ (deepskyblue), ‘QSO’ (seagreen), and ‘LRG’ (red).

  • The function calls __init_hod_param to initialize the HOD parameters and then uses plot_HOD to generate the plots.

  • If no fig is provided, a new figure is created and returned.

  • The plot is not displayed until plt.show() is called, which happens automatically after all tracers are plotted.

Example

HOD_plot(tracer=’ELG’) # Plot HOD for the ‘ELG’ tracer. HOD_plot(tracer=[‘ELG’, ‘QSO’]) # Plot HOD for both ‘ELG’ and ‘QSO’ tracers.

calc_env_factor(cellsize=5, resampler='cic')

Compute density around each halos on a mesh. Based on pypower CatalogMesh https://pypower.readthedocs.io/en/latest/api/api.html#pypower.mesh.CatalogMesh

Parameters:
  • cellsize (array, float, default=5) – Physical size of mesh cells.

  • resampler (string, ResampleWindow, default='tsc') – Resampler used to assign particles to the mesh. Choices are [‘ngp’, ‘cic’, ‘tcs’, ‘pcs’].

Returns:

Add column named env in the halo catalog self.hcat

Return type:

None

compute_training(nreal=20, training_points=None, start_point=0, verbose=False)

Generate and save training data for HOD model fitting by sampling parameter sets (training points), generating mock catalogs, and computing clustering statistics.

This method automates the process of training data generation for halo occupation distribution (HOD) modeling. It evaluates HOD parameter samples, generates mock galaxy catalogs, computes desired clustering statistics (e.g., wp, xi), and stores the results for each training point on disk.

Parameters:
  • nreal (int, optional) – Number of mock realizations to generate for each training point. Default is 20.

  • training_points (structured array or None, optional) – Array of training points with named fields corresponding to HOD parameters. If None, training points will be generated using genereate_training_points.

  • start_point (int, optional) – Starting index for training point numbering (useful when continuing interrupted runs). Default is 0.

  • verbose (bool, optional) – Whether to print progress messages during execution. Default is False.

Raises:

ValueError – If the defined tracers in self.args do not match those defined in the priors, or if the number of training parameters doesn’t match expectations.

Notes

  • The method checks consistency between tracers and prior parameter definitions.

  • For each training point, the HOD parameters are injected into the model and multiple

mock realizations are generated. - The 2PCF (xi) and/or wp (projected correlation function) are computed per realization, depending on fit_type. - Each result is saved as a .npy file with a name format based on sampling type and training point index.

Files Saved

  • One .npy file per training point is saved to path_to_training_point with structure:
    {

    tracer_1: <updated HOD params dict>, tracer_2: …, ‘wp’: […], ‘xi’: […], ‘hod_fit_param’: <parameter values used>

    }

Example

self.compute_training(nreal=10, verbose=True)

static downsample_mock_cat(cat, ds_fac=0.1, mask=None)

Downsample a mock catalog by randomly selecting a subset of galaxies based on the given downsampling factor.

This method randomly selects a subset of galaxies from the input catalog based on the downsampling factor ds_fac. If a mask is provided, it is used to select galaxies; otherwise, a random selection is made using ds_fac.

Parameters:
  • cat (dict) – The mock catalog to downsample. The catalog should be a dictionary containing the galaxy data, such as galaxy positions (x, y, z), and other associated properties.

  • ds_fac (float, optional) – The downsampling factor, representing the fraction of galaxies to retain in the downsampled catalog. A value between 0 and 1. Default is 0.1 (10% of the galaxies will be selected).

  • mask (array-like, optional) – A boolean mask array that can be used to specify which galaxies to retain in the downsampled catalog. If not provided, a random mask is generated based on the ds_fac downsampling factor.

Returns:

A downsampled catalog containing a subset of the galaxies from the original catalog, based on the mask.

Return type:

dict

Notes

  • If mask is provided, it should have the same length as the catalog (the number of galaxies).

  • The ds_fac parameter defines the probability for each galaxy to be selected; for example, ds_fac=0.1 means each galaxy has a 10% chance of being selected.

  • The downsampling is done independently for each galaxy.

Example

# Downsample a catalog to 10% of its original size downsampled_cat = downsample_mock_cat(cat, ds_fac=0.1)

# Downsample using a custom mask mask = np.array([True, False, True, True, False]) downsampled_cat = downsample_mock_cat(cat, mask=mask)

get_2PCF(cats, tracers=None, ells=None, R1R2=None, verbose=True)

Compute the two-point correlation function (2PCF) for a given mock catalog in a cubic box.

This function calculates the two-point correlation function (2PCF) for specified galaxy tracers in a mock catalog. It computes the correlation for multiple tracers if provided, and returns the 2PCF and its separation distance for each tracer.

Parameters:
  • cats (dict) – A dictionary of mock catalogs where the keys are the names of the tracers and the values are the corresponding catalogs (e.g., ‘LRG’, ‘ELG’).

  • tracers (list or str, optional) – A list of tracer names (keys in cats) for which the 2PCF should be computed. If None, all tracers in self.args[‘tracers’] are considered. Defaults to None.

  • ells (tuple of int, optional) – Multipoles to project onto. If None, ells in self.args[‘2PCF_settings’][‘multipole_index’] are considered. Default to None.

  • R1R2 (tuple or None, optional) – A tuple defining a range for R1 and R2 for the 2PCF computation. If None, the default values will be used. Defaults to None.

  • verbose (bool, optional) – If True, prints progress and computation time for each tracer. Defaults to True.

Returns:

  • s_all (list) – A list of separation distances corresponding to the computed 2PCF for each tracer.

  • xi_all (list) – A list of the two-point correlation function (2PCF) values for each tracer.

Notes

  • The function uses apply_rsd to account for redshift space distortions (RSD) if enabled and Cosmology set.

  • The separation distances s and the correlation values xi are calculated using the

compute_2PCF function, and the results are stored for each tracer. - The results are returned as lists (s_all and xi_all) when multiple tracers are provided. - The function supports a log-scale binning option for radial bins if bin_logscale is True. - The output is either the 2PCF for a single tracer (if only one tracer is given) or for all tracers provided in the list.

Example

s, xi = get_2PCF(cats, tracers=[‘LRG’, ‘ELG’])

get_cross2PCF(cats, tracers, ells=None, R1R2=None, verbose=True)

Compute the two-point correlation function (2PCF) multipoles for a given mock catalog in a cubic box.

This function computes the two-point correlation function (2PCF) and cross-correlations multipoles for pairs of tracers in the mock catalogs. It calculates the 2PCF for all combinations of tracers provided, handling redshift space distortions (RSD) if enabled.

Parameters:
  • cats (dict) – A dictionary of mock catalogs where each key is a tracer and its corresponding catalog is the value (e.g., ‘LRG’, ‘ELG’).

  • tracers (list of str) – A list of tracer names (keys in cats) for which the cross 2PCF should be computed. The function computes the 2PCF for all pairs of tracers in the list.

  • ells (tuple of int, optional) – Multipoles to project onto. If None, ells in self.args[‘2PCF_settings’][‘multipole_index’] are considered. Default to None.

  • R1R2 (tuple or None, optional) – A tuple defining a range for R1 and R2 for the 2PCF computation. If None, the default values will be used. Defaults to None.

  • verbose (bool, optional) – If True, prints progress and computation time for each pair of tracers. Defaults to True.

Returns:

res_dict – A dictionary where the keys are the concatenated names of tracer pairs (e.g., ‘LRG_ELG’) and the values are the average separations and the corresponding two-point correlation functions (2PCF) for each pair.

Return type:

dict

Notes

  • The function computes the cross-correlation 2PCF for all unique pairs of tracers from the input list.

  • If redshift space distortions (RSD) are enabled, the positions of galaxies in the catalogs are adjusted accordingly.

  • The results are stored in res_dict with keys in the format ‘tracer1_tracer2’, where each value is the 2PCF

corresponding to the pair of tracers. - The function uses compute_2PCF to calculate the 2PCF for each tracer pair.

Example

res = get_cross2PCF(cats, tracers=[‘LRG’, ‘ELG’, ‘QSO’])

get_crosswp(cats, tracers, R1R2=None, verbose=True)

Compute the projected correlation and cross-correlation functions (wp) for a given mock catalog in a cubic box.

This function computes the projected two-point correlation and cross-correlation function (wp) for pairs of tracers in the mock catalogs. It calculates wp for all combinations of tracers provided, handling redshift space distortions (RSD) if enabled.

Parameters:
  • cats (dict) – A dictionary of mock catalogs where each key is a tracer and its corresponding catalog is the value (e.g., ‘LRG’, ‘ELG’).

  • tracers (list of str) – A list of tracer names (keys in cats) for which the cross wp should be computed. The function computes the wp for all pairs of tracers in the list.

  • R1R2 (tuple or None, optional) – A tuple defining a range for R1 and R2 for the wp computation. If None, the default values will be used. Defaults to None.

  • verbose (bool, optional) – If True, prints progress and computation time for each pair of tracers. Defaults to True.

Returns:

res_dict – A dictionary where the keys are the concatenated names of tracer pairs (e.g., ‘LRG_ELG’) and the values are the corresponding projected two-point correlation functions (wp) for each pair.

Return type:

dict

Notes

  • The function computes the cross-correlation wp for all unique pairs of tracers from the input list.

  • If redshift space distortions (RSD) are enabled, the positions of galaxies in the catalogs are adjusted accordingly.

  • The results are stored in res_dict with keys in the format ‘tracer1_tracer2’, where each value is the wp

corresponding to the pair of tracers. - The function uses compute_wp to calculate the wp for each tracer pair.

Example

res = get_crosswp(cats, tracers=[‘LRG’, ‘ELG’, ‘QSO’])

get_ds_fac(tracer, verbose=False)

Calculate the density scaling factor based on the specified tracer and density value.

Parameters:
  • tracer (str) – The key identifying the specific tracer (e.g., galaxy type) for which the density scaling factor is calculated.

  • verbose (bool, optional) – If True, prints information about the density and scaling factor. Default is False.

Returns:

The density scaling factor. If no density is set, returns 1.

Return type:

float

Notes

This function uses the density specified in self.args[tracer][‘density’] to calculate the density scaling factor. If the density is specified as a float, it multiplies it by the cube of the box size and divides by the number of galaxies (obtained using the ngal method). If no density is specified, the function defaults to returning 1.

Example

If self.args[tracer][‘density’] is set to 0.01 and self.boxsize = 100: - The density scaling factor will be calculated as 0.01 * 100^3 / self.ngal(tracer)[0]. If no density is specified, the function simply returns 1.

If verbose is set to True, the following message will be printed: - “Set density to 0.01 gal/Mpc/h”.

get_wp(cats, tracers=None, R1R2=None, verbose=True)

Compute the projected two-point correlation function (wp) for a given mock catalog in a cubic box.

This function calculates the projected two-point correlation function (wp) for specified galaxy tracers in a mock catalog. It can compute wp for multiple tracers, returning the results for each.

Parameters:
  • cats (dict) – A dictionary of mock catalogs where the keys are the names of the tracers and the values are the corresponding catalogs (e.g., ‘LRG’, ‘ELG’).

  • tracers (list or str, optional) – A list of tracer names (keys in cats) for which the wp should be computed. If None, all tracers in self.args[‘tracers’] are considered. Defaults to None.

  • R1R2 (tuple or None, optional) – A tuple defining a range for R1 and R2 for the wp computation. If None, the default values will be used. Defaults to None.

  • verbose (bool, optional) – If True, prints progress and computation time for each tracer. Defaults to True.

Returns:

  • rp_all (list) – A list of projected separation distances corresponding to the computed wp for each tracer.

  • wp_all (list) – A list of the projected two-point correlation function (wp) values for each tracer.

Notes

  • The function uses apply_rsd to account for redshift space distortions (RSD) if enabled and Cosmology set.

  • The projected separation distances rp and the correlation values wp are calculated using the

compute_wp function, and the results are stored for each tracer. - The results are returned as lists (rp_all and wp_all) when multiple tracers are provided. - The function supports a log-scale binning option for radial bins if bin_logscale is True. - The output is either the wp for a single tracer (if only one tracer is given) or for all tracers provided in the list.

Example

rp, wp = get_wp(cats, tracers=[‘LRG’, ‘ELG’])

init_cosmology()

Initialize the cosmology model using Cosmoprimo.

This method attempts to set the self.cosmo attribute using cosmological parameters defined in the input configuration (self.args[‘cosmo’]). It supports loading:

  • AbacusSummit cosmologies if the simulation name (sim_name) is present

  • Custom cosmology from provided parameters if no Abacus name is specified

It uses the cosmoprimo package to construct the cosmology and raises a warning if the library is not available.

Returns:

Sets the self.cosmo attribute to an instance of cosmoprimo.Cosmology or leaves it as None if the initialization fails.

Return type:

None

Raises:

ImportWarning – If cosmoprimo is not installed, a warning is issued and no cosmology is set.

Notes

  • Required fields in self.args[‘cosmo’] are passed as keyword arguments to

cosmoprimo.fiducial.Cosmology. - For AbacusSummit cosmologies, the cosmology identifier is extracted from the sim_name string. - Cosmology is used later for computing RSD and distance-based statistics.

init_elg_sat_for_lrg(tracer, sat_cat, fix_seed=None)

Draft function to assign ELG satellites around LRG central galaxies in a NFW profile if they are in the same halo.

This function generates the positions and velocities of Emission Line Galaxies (ELGs) around Luminous Red Galaxies (LRGs) using a Navarro–Frenk–White (NFW) profile. It ensures that the ELG satellites are placed according to the halo’s mass distribution, and their velocities are assigned based on the chosen model (NFW or random normal).

Parameters:
  • tracer (str) – The name of the tracer (e.g., ‘LRG’) used to determine the velocity model and other simulation parameters for satellite galaxies.

  • sat_cat (Catalog) – A catalog of the satellite galaxies, which includes properties such as position, velocity, concentration, mass, and radius. The function will modify the positions and velocities of these satellites.

  • fix_seed (int, optional) – A seed for random number generation, ensuring reproducibility of the mock catalog. Defaults to None, meaning that a random seed will be generated internally.

Returns:

The input satellite catalog (sat_cat) is updated with new positions, velocities, and a Central flag (set to 0 for satellites).

Return type:

Catalog

Notes

  • The position of satellites is computed on a spherical shell using getPointsOnSphere_jit.

  • The velocity of the satellites can be generated in two ways:
    • ‘NFW’: Satellites are assigned velocities based on the NFW profile.

    • ‘rd_normal’: Satellites are assigned random velocities.

  • If the Vrms column is available in sat_cat, it is used to adjust the velocity distribution.

  • This function directly modifies the sat_cat and returns the updated catalog.

make_mock_cat(tracers=None, fix_seed=None, verbose=True)

Generate HOD mock catalogs.

This method creates mock galaxy catalogs based on the Halo Occupation Distribution (HOD) model for each specified tracer. It computes the central and satellite galaxies, assigns them to halos, and optionally includes assembly bias effects, conformity bias, and uses particle-level data for satellite galaxy positions. It handles multiple tracers and provides options for reproducibility and verbose output.

Parameters:
  • tracers (list or str, optional) – Name(s) of the galaxy tracers (e.g., ‘LRG’, ‘ELG’) to include in the mock catalog. If None, all tracers defined in self.tracers are considered. Defaults to None.

  • fix_seed (int, optional) – Fix the seed for reproducibility. This is useful for ensuring consistent results when using a fixed number of threads (nthreads). Defaults to None.

  • verbose (bool, optional) – If True, the function prints progress messages during execution. Defaults to True.

Returns:

final_cat – A dictionary containing mock catalogs for each tracer specified. Each catalog is represented by a Catalog object, which contains the generated galaxy data (centrals and satellites) for the corresponding tracer.

Return type:

dict

Notes

  • The method relies on HOD models defined for each tracer in self.args[tracer][‘HOD_model’] and self.args[tracer][‘sat_HOD_model’].

  • If self.args[‘assembly_bias’] is enabled, assembly bias columns will be computed and included in the mock catalog.

  • The fix_seed parameter ensures that the mock catalogs are generated in a reproducible manner, but it requires consistent

thread configurations (self.args[‘nthreads’]). - When tracers includes both ‘ELG’ and ‘LRG’, the method handles the case where both tracers share the same halo by placing one LRG at the center and positioning other galaxies (like ELGs) based on the NFW profile.

Example

To generate mock catalogs for both ‘LRG’ and ‘ELG’ tracers with fixed seed for reproducibility:

final_cat = mock_catalog.make_mock_cat(tracers=[‘LRG’, ‘ELG’], fix_seed=42)

ngal(tracer, verbose=False)

Return the number of galaxy and the satelitte fraction

Parameters:
  • tracer (str) – Name of the galaxy tracer in self.tracers

  • (bool (verbose) –

  • optional) (Defaults to False.) –

Returns:

  • ngal (float) – Total number of galaxies expected

  • fsat (float) – Expected fraction of satellite galaxy (n_sat/ngal)

plot_HMF(cats, show_sat=False, range=(10.8, 15), tracer=None, inital_HMF=None)

Plot the Halo Mass Function (HMF) for a given mock catalog.

This function generates a plot of the Halo Mass Function (HMF) using the halo mass values (log10_Mh) from the provided catalog(s). The plot can optionally include histograms for central and satellite galaxies, and can display an initial HMF for comparison.

Parameters:
  • cats (dict) – A dictionary of catalog data for different tracers. Each catalog should contain a log10_Mh array representing the halo mass in base-10 logarithmic form, and a Central array indicating whether the galaxy is central (1) or satellite (0).

  • show_sat (bool, optional) – Whether to show the histogram for satellite galaxies separately. Default is False.

  • range (tuple, optional) – The range for the satellite galaxy histogram. Default is (10.8, 15).

  • tracer (str or list of str, optional) – The tracer(s) for which to plot the HMF. If None, it uses the default tracers defined in self.args[‘tracers’]. If a single tracer is provided, it will be converted into a list.

  • inital_HMF (bool, optional) – Whether to plot the initial Halo Mass Function (HMF) for comparison. Default is None (does not plot the initial HMF).

Returns:

The function generates and displays the plot but does not return any value.

Return type:

None

Notes

  • The function uses different colors for each tracer: ‘ELG’ (deepskyblue), ‘QSO’ (seagreen), and ‘LRG’ (red).

  • For satellite galaxies, the histograms are plotted with different line styles ( for centrals, : for satellites).

  • The initial HMF (if provided) is plotted using a gray color.

  • The y-axis is displayed on a logarithmic scale, and the x-axis represents the logarithm of the halo mass in solar masses.

Example

plot_HMF(cats, show_sat=True, range=(10.8, 15), tracer=’ELG’) # Plot HMF for the ‘ELG’ tracer with satellite galaxies. plot_HMF(cats, inital_HMF=True) # Plot HMF with the initial HMF included.

read_training(data, inv_cov2)

Loads and processes HOD training samples, computes chi² statistics for each sample against a target dataset, and returns a structured array for Gaussian Process training.

Parameters:
  • data (array_like) – Observed data vector (e.g., wp or xi measurements) to compare against model predictions.

  • inv_cov2 (ndarray) – Inverse of the covariance matrix used in chi² computation.

Returns:

trainning_set – Structured array where each row corresponds to a training point, including: - HOD parameters - Mean chi² value for the realizations - Uncertainty on chi² (standard deviation / sqrt(N_real))

Return type:

structured ndarray

Notes

  • Reads all training .npy files from self.args[‘fit_param’][‘path_to_training_point’] with the given sampling type.

  • Applies covariance matrix adjustments if requested.

  • Supports either ‘wp’, ‘xi’, or both statistics depending on self.args[‘fit_param’][‘fit_type’].

  • Combines model realizations by flattening tracer combinations and statistics into a single vector.

  • Computes chi² using the compute_chi2() utility, which is assumed to match the data/model shape.

Example

>>> train_set = model.read_training(observed_data, inv_cov2)
run_fit(data_arr, inv_Cov2, training_point, resume_fit=False, verbose=True)

Execute the Gaussian Process MCMC fitting routine for HOD parameter inference.

This method performs iterative Gaussian Process-driven MCMC sampling to explore the Halo Occupation Distribution (HOD) parameter space, fitting mock catalog outputs to observed clustering statistics such as the 2-point correlation function.

For methodological details, see: https://arxiv.org/abs/2302.07056

Parameters:
  • data_arr (array_like) – Observed data vector used in chi-squared comparisons (e.g., wp, xi).

  • inv_Cov2 (ndarray) – Inverse of the covariance matrix used in the chi-squared computation. Must match the dimensionality of data_arr.

  • training_point (structured ndarray) – Existing training sample including HOD parameters and chi-squared values, used to condition the GP model.

  • resume_fit (bool, optional) – If True, resumes from a previously saved fit by loading logs and chains. Default is False.

  • verbose (bool, optional) – If True, displays detailed iteration-level logs. Default is True.

Returns:

All fitting results are saved to disk. No return value.

Return type:

None

Notes

  • Creates and updates files under dir_output_fit, including:
    • *.txt logs of sampled parameter values and chi² results

    • Chains of samples in chains/ directory

    • Diagnostic metrics such as KL divergence

  • Calls the following key internal methods:
    • make_mock_cat(): to generate mock catalogs

    • get_crosswp(), get_cross2PCF(): for 2PCF computation

    • compute_chi2(): to evaluate model-data fit

    • run_gp_mcmc(): for parameter sampling via GP-MCMC

  • Convergence is optionally monitored via KL divergence, but the stopping criterion is commented out.

  • Handles both projected (wp) and full-space (xi) correlation functions depending on fit_type.

  • Assumes the availability of emcee or zeus samplers for MCMC.

  • Results are appended to an evolving training set across iterations.

Example

>>> model.run_fit(data_arr, inv_cov2, training_set, resume_fit=True)
run_gp_mcmc(training_set, niter, logchi2=True, nb_points=1, remove_edges=0.9, random_state=None, verbose=True)

Performs Gaussian Process Regression (GPR) on a training set, runs MCMC sampling over the GPR-predicted posterior, and returns the next suggested parameter point(s) for exploration.

This function enables Bayesian optimization for halo model fitting by building a GP emulator on existing training data, sampling from the GP posterior using MCMC, and identifying the most promising regions in parameter space. Detail of the method in arxiv:2302.07056

Parameters:
  • training_set (structured array) – Training data containing parameters and corresponding chi² values (and uncertainties).

  • niter (int) – Current iteration index (used for file naming and logging).

  • logchi2 (bool, optional) – If True, the GP models log(chi²). Default is True.

  • nb_points (int, optional) – Number of new points to return from the GP+MCMC sampling. Default is 1.

  • remove_edges (float, optional) – Factor to shrink prior boundaries when enforcing parameter limits. Values egal to 1 keep the prior boundaries. Default is 0.9.

  • random_state (int or None, optional) – Seed for reproducibility. Default is None.

  • verbose (bool, optional) – If True, print progress and diagnostics. Default is True.

Returns:

new_points – Array of shape (nb_points, n_parameters) with newly suggested parameter values.

Return type:

ndarray

Notes

  • Trains a GP model using scikit-learn’s GaussianProcessRegressor.

  • Runs MCMC sampling using emcee or zeus. Default sampler is emcee.

  • Logs GPR and MCMC diagnostics to output_GP_*.txt.

  • Saves the full sampled chain with GP predictions to chains/chain_*.txt.

  • The GP kernel is configured based on self.args[‘fit_param’][‘kernel_gp’]. Default kernel is Matern 5/2.

  • Trained GP model and MCMC output are saved for post-analysis and reproducibility.

  • During MCMC, parameter boundaries are enforced via a likelihood mask.

  • GPR score, prediction at the prior mean, and best predicted chi² are logged.

Raises:

ValueError – If an unsupported GP kernel or sampler is specified.

Example

>>> new_pts = model.run_gp_mcmc(training_data, niter=5, nb_points=3, logchi2=True)
set_assembly_bias_values(col, bins=50)

Assign ranked values for assembly bias computation based on a specific column.

This method assigns ranked values linearly between -0.5 and 0.5 for the assembly bias computation. The values are based on a histogram of halo masses (log10_Mh), and the col parameter specifies the column in the halo catalog that will be used for the ranking. The method first checks if the required column (ab_{col}) already exists; if not, it computes the values based on the input column and adds them to the halo catalog.

Parameters:
  • col (str) – The column name in the halo catalog used to compute assembly bias values. This column is expected to be present in the halo catalog.

  • bins (int, optional) – The number of bins used for mass binning in the histogram of halo masses. Default is 50.

Return type:

None

Raises:

ValueError – If the specified column is not present in the halo catalog, a ValueError is raised.

Notes

The method uses log10_Mh values to bin the halos and ranks the halos in each bin based on the specified column. The assembly bias values are assigned linearly between -0.5 and 0.5 within each bin. If the column env is requested but not present, the calc_env_factor() method is called to compute it first. Additionally, the ab_{col} column is only added if it does not already exist in the halo catalog.

Example

If col = ‘env’, the method checks if an assembly bias column for env exists. If not, it computes and adds it. The halos are binned based on their mass, and each halo is ranked within its mass bin. A value between -0.5 and 0.5 is assigned to each halo in the catalog based on the ranking of the env column.

HODDIES.HOD_models module

HODDIES.HOD_models.GHOD(log10_Mh, Ac, Mc, sig_M)

Gaussian HOD model based on arXiv:1708.07628.

Parameters:

  • log10_Mh: float

    Logarithm (base 10) of halo mass.

  • Ac: float

    Normalization amplitude.

  • Mc: float

    Mean halo mass (log10).

  • sig_M: float

    Width of central galaxy mass distribution.

Returns:

  • float

    Expected number of galaxies in a halo of mass log10_Mh.

HODDIES.HOD_models.HMQ(log10_Mh, Ac, Mc, sig_M, gamma, Q, pmax)

Computes the HMQ (High Mass Quenched) Halo Occupation Distribution (HOD) model without normalization based on arxiv:1910.05095.

Parameters:

  • log10_Mh: float

    Logarithm (base 10) of halo mass.

  • Ac: float

    Normalization amplitude.

  • Mc: float

    Mean halo mass (log10).

  • sig_M: float

    Width of central galaxy mass distribution.

  • gamma: float

    Shape parameter controlling the asymetry.

  • Q: float

    Quenching factor.

  • pmax: float

    Maximum probability.

Returns:

  • float

    Expected number of galaxies in a halo of mass log10_Mh.

HODDIES.HOD_models.LNHOD(log10_Mh, Ac, Mc, sig_M)

Log-normal HOD model based on arxiv:2306.06319.

Parameters:

  • log10_Mh: float

    Logarithm (base 10) of halo mass.

  • Ac: float

    Normalization amplitude.

  • Mc: float

    Mean halo mass (log10).

  • sig_M: float

    Width of central galaxy mass distribution.

Returns:

  • float

    Expected number of galaxies in a halo of mass log10_Mh.

HODDIES.HOD_models.Nsat_pow_law(log10_Mh, As, M_0, M_1, alpha)

Power-law model for satellite galaxies (Zheng et al. 2005).

Parameters:

  • log10_Mh: float

    Logarithm (base 10) of halo mass.

  • As: float

    Normalization amplitude.

  • M_0: float

    Minimum halo mass to host a satellite galaxy.

  • M_1: float

    Characteristic halo mass to host a satellite (log10).

  • alpha: float

    Slope of the pozer law distribution.

Returns:

  • float

    Expected number of satellite galaxies in a halo of mass log10_Mh.

HODDIES.HOD_models.SFHOD(log10_Mh, Ac, Mc, sig_M, gamma)

Star-forming HOD model based on arXiv:1708.07628.

Parameters:

  • log10_Mh: float

    Logarithm (base 10) of halo mass.

  • Ac: float

    Normalization amplitude.

  • Mc: float

    Mean halo mass (log10).

  • sig_M: float

    Width of central galaxy mass distribution.

  • gamma: float

    Shape parameter controlling the asymetry.

Returns:

  • float

    Expected number of galaxies in a halo of mass log10_Mh.

HODDIES.HOD_models.SHOD(log10_Mh, Ac, Mc, sig_M)

Standard HOD model (Zheng et al. 2007).

Parameters:

  • log10_Mh: float

    Logarithm (base 10) of halo mass.

  • Ac: float

    Normalization amplitude.

  • Mc: float

    Halo mass cut (log10).

  • sig_M: float

    Stepness of the step function.

Returns:

  • float

    Expected number of galaxies in a halo of mass log10_Mh.

HODDIES.HOD_models.mHMQ(log10_Mh, Ac, Mc, sig_M, gamma)

Computes a simplified version of the HMQ model without normalization and quenching terms based on arxiv:2306.06319.

Parameters:

  • log10_Mh: float

    Logarithm (base 10) of halo mass.

  • Ac: float

    Normalization amplitude.

  • Mc: float

    Mean halo mass (log10).

  • sig_M: float

    Width of central galaxy mass distribution.

  • gamma: float

    Shape parameter controlling the asymetry.

Returns:

  • float

    Expected number of galaxies in a halo of mass log10_Mh.

HODDIES.utils module

HODDIES.utils.apply_rsd(cat, z, boxsize, cosmo, H_0=100, los='z', vsmear=0)

Apply redshift-space distortions (RSD) to a galaxy catalog.

Parameters:
  • cat (dict) – Dictionary containing particle positions and velocities. Keys should include ‘x’, ‘y’, ‘z’, ‘vx’, ‘vy’, ‘vz’.

  • z (float) – Redshift at which to apply the distortions.

  • boxsize (float) – Size of the simulation box in Mpc/h.

  • cosmo (object) – Cosmology object from cosmoprimo.

  • H_0 (float, optional) – Hubble constant in km/s/Mpc. Default is 100.

  • los ({'x', 'y', 'z'}, optional) – Line-of-sight axis. Default is ‘z’.

  • vsmear (float, optional) – Add redshift error using gaussian distribution in km/s. Default is 0.

Returns:

pos_rsd – List of arrays containing RSD-applied positions for x, y, and z.

Return type:

list of ndarray

HODDIES.utils.compute_2PCF(pos1, edges, boxsize, ells=(0, 2), los='z', nthreads=32, R1R2=None, pos2=None, mpicomm=None)

Compute the 2-point correlation function multipoles for a periodic box.

Parameters:
  • pos1 (array-like) – Positions of sample 1 (e.g., galaxies or halos).

  • edges (list of arrays) – Bin edges in separation (s, mu).

  • boxsize (float) – Size of the simulation box.

  • ells (tuple of int, optional) – Multipoles to project onto. Default is (0, 2).

  • los ({'x', 'y', 'z'}, optional) – Line-of-sight direction. Default is ‘z’.

  • nthreads (int, optional) – Number of threads for parallel computation. Default is 32.

  • R1R2 (array-like, optional) – Precomputed RR counts for normalization. Default is None.

  • pos2 (array-like, optional) – Positions of sample 2 (for cross-correlations). Default is None.

  • mpicomm (object, optional) – MPI communicator. Default is None.

Returns:

s, (multipoles) – Average separation for each s bin and computed multipoles of the 2PCF.

Return type:

tuple(array, ndarray)

HODDIES.utils.compute_N(log10_Mh, fun_cHOD, fun_sHOD, p_cen, p_sat, p_ab=None, Nthread=32, ab_arr=None, conformity=False, seed=None)

Compute the number of central and satellite galaxies in a halo using a HOD (Halo Occupation Distribution) model.

Parameters:
  • log10_Mh – ndarray Logarithmic mass of halos.

  • fun_cHOD – callable Function to compute central occupation based on halo mass.

  • fun_sHOD – callable Function to compute satellite occupation based on halo mass.

  • p_cen – ndarray Parameters for the central HOD function.

  • p_sat – ndarray Parameters for the satellite HOD function.

  • p_ab – ndarray or None, optional Assembly bias parameters (central and satellite). Default is None.

  • Nthread – int, optional Number of threads to use in parallel computation. Default is 32.

  • ab_arr – ndarray or None, optional Assembly bias array per halo. Default is None.

  • conformity – bool, optional Whether satellite number is correlated with central galaxy presence. Default is False.

  • seed – ndarray or None, optional Seed array for RNG per thread. Default is None.

Returns:

ndarray

Expected number of central galaxies.

N_satndarray

Expected number of satellite galaxies.

cond_centndarray

Boolean array indicating presence of a central galaxy.

proba_satndarray

Sampled number of satellites per halo.

Return type:

Ncent

HODDIES.utils.compute_fast_NFW(x_h, y_h, z_h, vx_h, vy_h, vz_h, c, M, Rvir, rd_pos, rd_vel, exp_frac=0, exp_scale=1, nfw_rescale=1, vrms_h=None, f_sigv=None, v_infall=None, vel_sat='NFW', Nthread=32, seed=None)

Compute satellite galaxy positions and velocities using the NFW profile.

Parameters:
  • x_h – ndarray Host halo positions.

  • y_h – ndarray Host halo positions.

  • z_h – ndarray Host halo positions.

  • vx_h – ndarray Host halo velocities.

  • vy_h – ndarray Host halo velocities.

  • vz_h – ndarray Host halo velocities.

  • c – ndarray Host halo concentration parameters.

  • M – ndarray Host halo masses.

  • Rvir – ndarray Host halo radii.

  • rd_pos – ndarray Random vectors for position and velocity.

  • rd_vel – ndarray Random vectors for position and velocity.

  • exp_frac – float, optional Fraction of satellites to sample using an exponential halo profile. Default is 0.

  • exp_scale – float, optional Scale of exponential halo profile. Default is 1.

  • nfw_rescale – float, optional Rescaling factor of the concentration parameter. Default is 1.

  • vrms_h – ndarray, optional RMS velocity of hots halo particles. Need to be definied if vel_sat is ‘rd_normal’ or ‘infall’. Default is None.

  • f_sigv – float, optional Velocity dispersion factor. Default is 1.

  • v_infall – float, optional Additional infall velocity toward the host halo center in km/s. Need to be definied if vel_sat is ‘infall’. Default is None.

  • vel_sat – str, optional Velocity model: ‘NFW’, ‘rd_normal’, or ‘infall’.

  • Nthread – int, optional Number of threads for computation. Default is 32.

  • seed – ndarray or None, optional RNG seeds per thread. Default is None.

Returns:

tuple of ndarrays

Satellite positions and velocities (x, y, z, vx, vy, vz).

HODDIES.utils.compute_ngal(log10_Mh, fun_cHOD, fun_sHOD, Nthread, p_cen, p_sat=None, conformity=False)

Compute total number of galaxies and satellite fraction from a given HOD parameter set.

Parameters:
  • log10_Mh – ndarray Logarithmic halo mass.

  • fun_cHOD – callable Function to compute central occupation.

  • fun_sHOD – callable Function to compute satellite occupation.

  • Nthread – int Number of threads for parallel computation.

  • p_cen – ndarray Parameters for central occupation.

  • p_sat – ndarray or None, optionnal Parameters for satellite occupation. Default is None.

  • conformity – bool, optionnal Use conformity when calculating satellites. Default is False.

Returns:

tuple(float, float)

Total number of galaxies, and satellite fraction.

HODDIES.utils.compute_power_spectrum(pos1, boxsize, kedges, pos2=None, los='z', nmesh=256, resampler='tsc', interlacing=2, ells=(0, 2), mpicomm=None)

Compute the power spectrum multipoles from a catalog using FFT-based methods.

Parameters:
  • pos1 (array-like) – Positions of catalog 1.

  • boxsize (float) – Size of the simulation box.

  • kedges (tuple) – k-bin edges for the power spectrum.

  • pos2 (array-like, optional) – Positions of catalog 2 (for cross-spectrum).

  • los (array-like, optional) – Line-of-sight direction.

  • nmesh (int, optional) – Number of mesh cells per dimension. Default is 256.

  • resampler (str, optional) – Mass assignment scheme. Default is ‘tsc’.

  • interlacing (int, optional) – Interlacing order for FFT. Default is 2.

  • ells (tuple of int, optional) – Multipoles to compute. Default is (0, 2, 4).

  • mpicomm (object, optional) – MPI communicator.

Returns:

Power spectrum multipoles.

Return type:

array

HODDIES.utils.compute_wp(pos1, edges, boxsize, pimax=40, los='z', nthreads=32, R1R2=None, pos2=None, mpicomm=None)

Compute the projected correlation function w_p(r_p).

Parameters:
  • pos1 (array-like) – Positions of sample 1 (e.g., galaxies or halos).

  • edges (list of arrays) – Bin edges for projected separation (r_p, pi).

  • boxsize (float) – Size of the simulation box.

  • pimax (float, optional) – Maximum line-of-sight separation for integration.

  • los ({'x', 'y', 'z'}, optional) – Line-of-sight direction. Default is ‘z’.

  • nthreads (int, optional) – Number of threads for parallel computation. Default is 32.

  • R1R2 (array-like, optional) – Precomputed RR counts for normalization. Default is None.

  • pos2 (array-like, optional) – Positions of sample 2 for cross-correlations. Default is None.

  • mpicomm (object, optional) – MPI communicator. Default is None.

Returns:

rp, wp – Seperation and projected correlation function.

Return type:

tuple(array, array)

HODDIES.utils.getPointsOnSphere_jit(nPoints, Nthread=32, seed=None)

Generate random points on a sphere surface using numba jit.

Parameters:
  • nPoints – int Number of points to generate.

  • Nthread – int, optional Number of parallel threads to use. Default is 32.

  • seed – ndarray or None Seed for RNG per thread. Default is None.

Returns:

ndarray

Array of shape (nPoints, 3) with unit vectors uniformly distributed on a sphere.

Return type:

ur

HODDIES.utils.get_etavir_nfw(c)

Draw a normalized NFW radius using inversion sampling of the cumulative mass profile.

Parameters:

c – float Concentration parameter of the NFW profile.

Returns:

float

Normalized radius (r/Rvir).

HODDIES.utils.interp_lambertw(x)
HODDIES.utils.plot_HOD(p_cen, p_sat, fun_cHOD, fun_sHOD, logM=array([10.8, 10.84242424, 10.88484848, 10.92727273, 10.96969697, 11.01212121, 11.05454545, 11.0969697, 11.13939394, 11.18181818, 11.22424242, 11.26666667, 11.30909091, 11.35151515, 11.39393939, 11.43636364, 11.47878788, 11.52121212, 11.56363636, 11.60606061, 11.64848485, 11.69090909, 11.73333333, 11.77575758, 11.81818182, 11.86060606, 11.9030303, 11.94545455, 11.98787879, 12.03030303, 12.07272727, 12.11515152, 12.15757576, 12.2, 12.24242424, 12.28484848, 12.32727273, 12.36969697, 12.41212121, 12.45454545, 12.4969697, 12.53939394, 12.58181818, 12.62424242, 12.66666667, 12.70909091, 12.75151515, 12.79393939, 12.83636364, 12.87878788, 12.92121212, 12.96363636, 13.00606061, 13.04848485, 13.09090909, 13.13333333, 13.17575758, 13.21818182, 13.26060606, 13.3030303, 13.34545455, 13.38787879, 13.43030303, 13.47272727, 13.51515152, 13.55757576, 13.6, 13.64242424, 13.68484848, 13.72727273, 13.76969697, 13.81212121, 13.85454545, 13.8969697, 13.93939394, 13.98181818, 14.02424242, 14.06666667, 14.10909091, 14.15151515, 14.19393939, 14.23636364, 14.27878788, 14.32121212, 14.36363636, 14.40606061, 14.44848485, 14.49090909, 14.53333333, 14.57575758, 14.61818182, 14.66060606, 14.7030303, 14.74545455, 14.78787879, 14.83030303, 14.87272727, 14.91515152, 14.95757576, 15.]), fig=None, color=None, label=None, figsize=(5, 4), show=True)

Plot HOD curves for central and satellite galaxies.

Parameters:
  • p_cen – ndarray Central HOD parameters.

  • p_sat – ndarray Satellite HOD parameters.

  • fun_cHOD – callable Central HOD function.

  • fun_sHOD – callable Satellite HOD function.

  • logM – ndarray, optional Mass bins for evaluation. Default is numpy.linspace(10.8,15,100).

  • fig – matplotlib.figure.Figure or None, optional Existing figure to plot into. Default is None.

  • color – str or None, optional Line color. Default is None.

  • label – str or None, optional Legend label. Default is None.

  • figsize – tuple, optional Figure size. Default is (5,4).

  • show – bool Whether to show the plot immediately. Default is True.

Returns:

matplotlib.figure.Figure

The plotted figure.

HODDIES.utils.rd_draw_NFW(nPoints, burn_in=100000)

Draw random samples from a 3D NFW profile using the Metropolis-Hastings algorithm.

Parameters:
  • nPoints – int Total number of samples to generate.

  • nPoints – int, optional Number of point to remove from the Metropolis-Hastings algorithm. Default is 100000.

Returns:

ndarray

Random samples from the NFW profile.

HODDIES.abacus_io module

Io tools to load AbacusSummit simualtions

HODDIES.abacus_io.compute_col_from_Abacus(N, pos, vel, ParticleMassHMsun, x, y, z, vx, vy, vz, Mvir, log10_Mh, Rs, Rvir, c, r25, r98, Nthread)

Compute derived physical columns for halos (mass, positions, velocities, radii, concentration).

Parameters:
  • N (array) – Number of particles per halo.

  • pos (arrays) – Position and velocity vectors.

  • vel (arrays) – Position and velocity vectors.

  • ParticleMassHMsun (float) – Mass of one simulation particle in Msun/h.

  • x (arrays) – Output arrays for halo positions.

  • y (arrays) – Output arrays for halo positions.

  • z (arrays) – Output arrays for halo positions.

  • vx (arrays) – Output arrays for halo velocities.

  • vy (arrays) – Output arrays for halo velocities.

  • vz (arrays) – Output arrays for halo velocities.

  • Mvir (arrays) – Output arrays for virial mass and log10(Mvir).

  • log10_Mh (arrays) – Output arrays for virial mass and log10(Mvir).

  • Rs (arrays) – Output arrays for scale radius and virial radius in kpc/h.

  • Rvir (arrays) – Output arrays for scale radius and virial radius in kpc/h.

  • c (array) – Output array for halo concentration.

  • r25 (arrays) – Input radius including 25 and 98 % of the halo particles

  • r98 (arrays) – Input radius including 25 and 98 % of the halo particles

  • Nthread (int) – Number of threads for parallel execution.

HODDIES.abacus_io.compute_sat_from_abacus_part(xp, yp, zp, vxp, vyp, vzp, x_sat, y_sat, z_sat, vx_sat, vy_sat, vz_sat, npout, npstart, nb_sat, cum_sum_sat, Nthread, seed=None)

Sample satellite galaxy positions and velocities from Abacus particle subsamples.

Parameters:
  • xp (arrays) – Particle positions in each axis.

  • yp (arrays) – Particle positions in each axis.

  • zp (arrays) – Particle positions in each axis.

  • vxp (arrays) – Particle velocities in each axis.

  • vyp (arrays) – Particle velocities in each axis.

  • vzp (arrays) – Particle velocities in each axis.

  • x_sat (arrays) – Host halo positions in each axis.

  • y_sat (arrays) – Host halo positions in each axis.

  • z_sat (arrays) – Host halo positions in each axis.

  • vx_sat (arrays) – Host halo velocities in each axis.

  • vy_sat (arrays) – Host halo velocities in each axis.

  • vz_sat (arrays) – Host halo velocities in each axis.

  • npout (array) – Number of available particles for each halo.

  • npstart (array) – Starting index in the global particle array for each halo.

  • nb_sat (array) – Number of satellites to assign per halo.

  • cum_sum_sat (array) – Cumulative sum array to locate output indices.

  • Nthread (int) – Number of threads to use in parallel loop.

  • seed (array, optional) – Array of seeds for reproducible random sampling across threads.

Returns:

mask_nfw – Boolean mask identifying entries with no enough particles (to be filled using NFW).

Return type:

array

HODDIES.abacus_io.load_CompaSO(path_to_sim, usecols, mass_cut=None, halo_lc=False, use_particles=None)

Load the CompaSO halo catalog from AbacusSummit simulations.

Parameters:
  • path_to_sim (str or list) – Path(s) to the simulation directory.

  • usecols (list of str) – Fields to load from the catalog.

  • mass_cut (float, optional) – Minimum halo mass threshold (in Msun/h). Halos with smaller mass are excluded.

  • halo_lc (bool, optional) – If True, load halo light cone catalogs from multiple redshifts.

  • use_particles (bool, optional) – Whether to load particle data or not.

Returns:

  • hcat_i (np.ndarray) – Halo catalog array after applying mass cut.

  • header (dict) – Metadata and simulation header.

  • part_subsamples (dict or None) – Dictionary of particle subsamples if applicable.

HODDIES.abacus_io.load_hcat_from_Abacus(path_to_sim, usecols, Lsuff, halo_lc=False, Nthread=64, mass_cut=None, verbose=True, use_particles=None)

Wrapper to load and process Abacus halo catalogs, used for HOD modeling.

Parameters:
  • path_to_sim (str or list) – Path(s) to the halo catalog directory.

  • usecols (list) – Fields to read from the catalog.

  • Lsuff (str) – Suffix (e.g., ‘L2’) used to determine which position/velocity columns to load.

  • halo_lc (bool, optional) – Whether to load from halo light cone catalogs. Default is False.

  • Nthread (int, optional) – Number of threads for computation. Default is 64.

  • mass_cut (float, optional) – Minimum mass threshold to include halos. Default is None.

  • verbose (bool, optional) – Whether to print progress messages. Default is True.

  • use_particles (bool, optional) – Whether to include particle-level data in the returned catalog.

Returns:

  • Catalog – Processed catalog of halo data with derived fields.

  • part_subsamples (dict or None) – Particle subsamples if available.

  • boxsize (float) – Simulation box size in comoving Mpc/h.

  • origin (ndarray or None) – Light cone origin(s) if applicable.

HODDIES.abacus_io.read_Abacus_hcat(args, dir_sim, use_L2=True)

Load an Abacus halo catalog using provided configuration parameters.

Parameters:
  • args (dict) – Dictionary from the parameter file including simulation name, redshift, etc.

  • dir_sim (str) – Path to the root simulation directory.

  • use_L2 (bool, optional) – Whether to use L2com statistic from Abacus simulation or not. Default is True.

  • halo_lc (bool, optional) – Whether to load the halo light cone catalog. Default is False.

Returns:

  • hcat (Catalog) – Halo catalog with derived physical quantities.

  • part_subsamples (dict or None) – Dictionary of particle subsamples if particles are loaded, otherwise None.

  • boxsize (float) – Size of the simulation box in Mpc/h.

  • origin (ndarray or None) – Origin(s) of light cone if halo_lc is True, otherwise None.

HODDIES.fits_functions module

HODDIES.fits_functions.compute_chi2(model, data, inv_Cov2=None, sig=None)

Compute the chi-squared (χ²) statistic between a model and observed data.

The function calculates χ² based on either the standard deviation of errors (sig) or the inverse covariance matrix (inv_Cov2), depending on which is provided.

Parameters:

modelarray_like

The predicted/model values.

dataarray_like

The observed data values.

inv_Cov2array_like, optional

The inverse of the covariance matrix of the data. Used if sig is not provided.

sigarray_like, optional

The standard deviation (1σ errors) of the data. If provided, chi-squared is computed as the sum of squared residuals normalized by the variance.

Returns:

chi2float

The computed chi-squared value.

Raises:

ValueError

If neither sig nor inv_Cov2 is provided.

HODDIES.fits_functions.genereate_training_points(nPoints, priors, sampling_type='lhs', path_to_save_training_point=None, rand_seed=None)

Generate points in a LHS or Hammersley sample for a given priors

Parameters:
  • nPoints (float) – Number of points for the training sample

  • priors (dict) – Dictionary of the priors range (min, max) for each parameter

  • sampling_type (str, default: lhs) – Sampling type: ‘lhs’ or ‘Hammersley’ only

  • path_to_save_training_point (bool, default: False) – Directory to save the sample of points. If path_to_save_training_point is provided the sample will be save with name file points_{sampling_type}.txt as npz file with keys ‘sample’ and ‘header’.

  • rand_seed (int, default: None) – Random seed for lhs sampling

Returns:

sample_points – Sample of points according to the sampling type.

Return type:

float array of shape (nPoints, len(priors))

HODDIES.fits_functions.multivariate_gelman_rubin(chains)

Compute the multivariate Gelman-Rubin convergence diagnostic (R̂) for MCMC chains.

This diagnostic checks for convergence across multiple Markov Chain Monte Carlo (MCMC) simulations by comparing the between-chain and within-chain variances. Values close to 1 indicate convergence.

This is based on the method described in: Brooks, S. P., & Gelman, A. (1998). “General methods for monitoring convergence of iterative simulations.” http://www.stat.columbia.edu/~gelman/research/published/brooksgelman2.pdf

Parameters:

chainsarray_like
A list or array of MCMC chains with shape (nchains, nsteps, ndim), where:
  • nchains is the number of parallel chains,

  • nsteps is the number of samples per chain,

  • ndim is the number of parameters per sample.

Returns:

float

The maximum eigenvalue of the matrix used in the multivariate R̂ computation. A value significantly greater than 1 indicates lack of convergence.

Raises:

AssertionError

If the inversion of Wn1 does not yield an identity matrix within numerical precision.

Module contents