Health care cost modeling can be challenging because cost data are rarely normally distributed. There are often many $0 observations, and costs among people who do use health care are right-skewed. Attributing costs to specific disease states adds further complexity. Zhou et al. (2023) offer a tutorial on estimating the costs associated with disease model states using generalized linear models (GLMs).
Step 1: Preparing the dataset
- Prepare data for discrete time periods and define disease states.
- Address issues like granular state definitions and multi-state scenarios.
- Handle censored data and missing cost data with appropriate methods.
- Map time periods to decision model cycles and transform data.
- Refer to the sample dataset below:
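
The authors' sample dataset is not reproduced here. As a stand-in, the following Python sketch shows one way the preparation steps above could yield a dataset with one row per patient per model cycle. Everything specific in it is an assumption made for illustration, not the authors' method: the record layout (`patient_id`, `day`, `cost`), the 365-day cycle, the two state labels, and the choice to drop partially observed cycles as a simple way of handling censoring.

```python
from collections import defaultdict

CYCLE_DAYS = 365  # assumed cycle length of the decision model


def to_person_cycles(cost_records, event_day, follow_up_days):
    """Aggregate dated cost records into one row per patient per cycle.

    cost_records: list of (patient_id, day, cost) tuples (hypothetical layout)
    event_day: dict patient_id -> day of first disease event, or None
    follow_up_days: dict patient_id -> days observed before censoring
    """
    totals = defaultdict(float)
    for pid, day, cost in cost_records:
        totals[(pid, day // CYCLE_DAYS)] += cost

    rows = []
    for pid, days in follow_up_days.items():
        # Keep only fully observed cycles; partial cycles are treated as censored.
        n_cycles = days // CYCLE_DAYS
        ev = event_day.get(pid)
        for cycle in range(n_cycles):
            in_event_state = ev is not None and ev // CYCLE_DAYS <= cycle
            rows.append({
                "patient_id": pid,
                "cycle": cycle,
                "state": "post_event" if in_event_state else "no_event",
                # Cycles with no recorded spending become explicit zeros.
                "cost": totals.get((pid, cycle), 0.0),
            })
    return rows


records = [("a", 10, 100.0), ("a", 20, 50.0), ("a", 400, 30.0), ("b", 5, 10.0)]
rows = to_person_cycles(records, {"a": 380, "b": None}, {"a": 730, "b": 500})
```

Writing zero-cost cycles out explicitly matters because the many $0 observations are exactly what the two-part model in Step 2 is meant to handle.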
 

Step 2: Model selection
- Use a two-part model within a generalized linear model (GLM) framework.
- Use the GLM's link function to relate expected cost nonlinearly to the covariates.
- Specify the link function and the distribution of the error term.
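
To make the link function and error distribution concrete, here is a minimal Python sketch of a GLM with a log link and gamma-distributed errors, a common choice for positive costs. The gamma/log pairing is an assumption for illustration, not necessarily the specification the authors arrive at. The fitting routine is a bare-bones iteratively reweighted least squares loop for a single covariate; a real analysis would use a statistics package.

```python
import math


def fit_gamma_log_glm(x, y, n_iter=25):
    """Fit E[y | x] = exp(b0 + b1 * x) by IRLS for a gamma GLM with log link.

    x: list of covariate values (e.g. a 0/1 disease-state indicator)
    y: list of strictly positive costs
    Returns (b0, b1).
    """
    n = len(y)
    b0, b1 = math.log(sum(y) / n), 0.0  # start from the overall mean
    for _ in range(n_iter):
        eta = [b0 + b1 * xi for xi in x]
        mu = [math.exp(e) for e in eta]
        # Working response. For gamma errors with a log link the IRLS
        # weights are all equal, so each step is ordinary least squares.
        z = [e + (yi - m) / m for e, yi, m in zip(eta, y, mu)]
        x_bar, z_bar = sum(x) / n, sum(z) / n
        sxx = sum((xi - x_bar) ** 2 for xi in x)
        sxz = sum((xi - x_bar) * (zi - z_bar) for xi, zi in zip(x, z))
        b1 = sxz / sxx
        b0 = z_bar - b1 * x_bar
    return b0, b1


# Made-up costs for patients without (0) and with (1) a disease event.
state = [0, 0, 0, 1, 1]
cost = [100.0, 200.0, 300.0, 1000.0, 3000.0]
b0, b1 = fit_gamma_log_glm(state, cost)
mean_no_event = math.exp(b0)    # fitted mean cost when state == 0
mean_event = math.exp(b0 + b1)  # fitted mean cost when state == 1
```

With a single binary covariate the fitted means reproduce the group averages (200 and 2000 in this toy data), and `b1` is the log of their ratio, which is why log-link coefficients are read as multiplicative effects on cost.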
 

- Combine the GLM with a two-part model using the equations below:
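
In its standard form, a two-part model writes expected cost as the probability of incurring any cost multiplied by the expected cost among those with positive cost. The notation below is an assumption of this summary rather than the authors' own, and the logistic first part is one common choice (a probit model is another):

```latex
\begin{aligned}
\Pr(Y_i > 0 \mid X_i) &= \operatorname{logit}^{-1}(X_i \alpha) \\
g\bigl(\mathrm{E}[Y_i \mid Y_i > 0, X_i]\bigr) &= X_i \beta \\
\mathrm{E}[Y_i \mid X_i] &= \Pr(Y_i > 0 \mid X_i) \times \mathrm{E}[Y_i \mid Y_i > 0, X_i]
\end{aligned}
```

Here $Y_i$ is the cost for observation $i$, $X_i$ its covariates (including disease state), $g$ the GLM link function, and $\alpha$ and $\beta$ the coefficients of the first and second parts.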
 

Step 3: Selecting the final model
- Consider which covariates are included and evaluate model fit.
- Explore covariate interactions and alternative selection techniques.
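
One simple way to compare candidate models is to score their predictions against observed costs. The Python sketch below does this with mean absolute error, root mean squared error, and the mean residual. These metrics are common for cost models, but picking them is an assumption of this sketch, and the observed costs, predictions, and model names are made up for illustration.

```python
import math


def fit_metrics(observed, predicted):
    """Return MAE, RMSE and mean residual for one candidate model."""
    residuals = [o - p for o, p in zip(observed, predicted)]
    n = len(residuals)
    return {
        "mae": sum(abs(r) for r in residuals) / n,
        "rmse": math.sqrt(sum(r * r for r in residuals) / n),
        "mean_residual": sum(residuals) / n,
    }


# Made-up held-out costs and predictions from two hypothetical candidates.
observed = [0.0, 0.0, 120.0, 480.0, 2400.0]
candidates = {
    "state_only": [100.0, 100.0, 100.0, 1200.0, 1200.0],
    "state_plus_age": [50.0, 100.0, 150.0, 900.0, 1800.0],
}
scores = {name: fit_metrics(observed, pred) for name, pred in candidates.items()}
best = min(scores, key=lambda name: scores[name]["rmse"])
```

A mean residual close to zero indicates that the model reproduces average cost, which matters when predictions feed a decision model that works with mean costs per state.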
 
Step 4: Model prediction
- Derive marginal effects using recycled predictions for one-part or two-part models.
- Calculate the difference in mean costs between the scenarios of interest.
- Review the illustrative example of modeling hospital costs associated with cardiovascular events in the UK.
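
Recycled prediction can be sketched as follows: predict each person's cost with the variable of interest switched on, predict again with it switched off, and compare the two averages. The Python code below applies this to a two-part model with a logistic first part and a log-link second part. The coefficients, covariate names (`event`, `age70plus`), and cohort are hypothetical values chosen for illustration, not results from the tutorial.

```python
import math


def predict_two_part(row, alpha, beta):
    """Expected cost = P(cost > 0) * E[cost | cost > 0] for one individual."""
    lin_a = sum(alpha[k] * row[k] for k in alpha)
    lin_b = sum(beta[k] * row[k] for k in beta)
    p_any_cost = 1.0 / (1.0 + math.exp(-lin_a))  # logistic first part
    mean_if_positive = math.exp(lin_b)           # log-link second part
    return p_any_cost * mean_if_positive


def recycled_difference(rows, alpha, beta, var="event"):
    """Average predicted cost with var set to 1 for everyone, minus the
    average with var set to 0, holding other covariates as observed."""
    with_var = [predict_two_part({**r, var: 1}, alpha, beta) for r in rows]
    without_var = [predict_two_part({**r, var: 0}, alpha, beta) for r in rows]
    n = len(rows)
    return sum(with_var) / n - sum(without_var) / n


# Hypothetical coefficients and cohort, for illustration only.
alpha = {"const": -1.0, "event": 2.0, "age70plus": 0.5}
beta = {"const": 6.0, "event": 1.2, "age70plus": 0.3}
cohort = [
    {"const": 1, "event": 0, "age70plus": 0},
    {"const": 1, "event": 1, "age70plus": 1},
    {"const": 1, "event": 0, "age70plus": 1},
]
marginal_cost = recycled_difference(cohort, alpha, beta)
```

Because every person is predicted under both scenarios, the result does not depend on who actually experienced the event in the data, only on the distribution of the other covariates in the cohort.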
 
The authors also provide R code for further exploration. Download it here.