PHD Discussions

Ask, Learn and Accelerate in your PhD Research



How to compute expectation and variance for constrained integer sets?

I’m analyzing a dataset where each entity has a non-negative integer count, and these counts must sum to a fixed total. Calculating the naive sample mean and variance ignores this constraint, biasing my model. I need to know the correct way to compute the expectation and variance that properly accounts for this inherent dependency in the data structure.

 

All Answers (1)

By Natasha Answered 1 year ago

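This is a classic issue in multivariate analysis and combinatorics. Because the counts k_i (the count for entity i) must add up to a fixed total N, you cannot treat them as independent samples: if one count goes up, the others must go down by the same amount. The key is to define the correct sample space. I would recommend modeling the whole vector of counts as a single draw from a multivariate distribution. If N is fixed and each of the N units falls into exactly one category, the multinomial Multinomial(N, p) is the natural choice. Its expectation for each category is N * p_i, its variance is N * p_i * (1 - p_i), and the covariance between two different categories i and j is -N * p_i * p_j. If p_i is unknown, use the observed proportion p_i = k_i / N, which is the maximum-likelihood estimate.

If you only have non-negativity and no fixed sum, independent Poisson counts are the usual starting model; conditioning independent Poisson counts on their total gives you back the multinomial. The constraint fundamentally changes the covariance structure: under the multinomial, each row of the covariance matrix sums to zero, so the variance of the sum of the k_i is zero, exactly as a fixed total requires.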
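A minimal sketch of the multinomial moments described above, assuming the observed counts are one Multinomial(N, p) draw and that NumPy is available. The function names `multinomial_moments` and `plug_in_moments` and the example counts are hypothetical, chosen only for illustration. It builds the mean vector N * p and the covariance matrix N * (diag(p) - p p^T), plugs in p_i = k_i / N, and then checks the analytic covariance against simulated multinomial draws.

```python
import numpy as np


def multinomial_moments(n_total, p):
    """Mean vector and covariance matrix of a Multinomial(n_total, p) count vector."""
    p = np.asarray(p, dtype=float)
    mean = n_total * p  # E[k_i] = N p_i
    # Diagonal: Var(k_i) = N p_i (1 - p_i); off-diagonal: Cov(k_i, k_j) = -N p_i p_j
    cov = n_total * (np.diag(p) - np.outer(p, p))
    return mean, cov


def plug_in_moments(counts):
    """Estimate p by the observed proportions k_i / N (the MLE), then return the moments."""
    k = np.asarray(counts, dtype=float)
    n_total = k.sum()
    return multinomial_moments(n_total, k / n_total)


counts = [12, 30, 8, 50]  # hypothetical counts, N = 100
mean, cov = plug_in_moments(counts)
print("mean:", mean)
print("variances:", np.diag(cov))
print("row sums of cov (numerically ~0):", cov.sum(axis=1))

# Monte Carlo sanity check: simulate many multinomial draws with the same p
# and compare their empirical covariance with the analytic matrix.
rng = np.random.default_rng(0)
p_hat = np.asarray(counts) / sum(counts)
draws = rng.multinomial(100, p_hat, size=200_000)
empirical_cov = np.cov(draws, rowvar=False)
print("max |empirical - analytic|:", np.abs(empirical_cov - cov).max())
```

Note that with a single observed vector the plug-in expectation is just the observed counts themselves; the part that actually encodes the fixed-sum constraint is the covariance matrix, whose negative off-diagonal entries and zero row sums make it singular (rank K - 1 for K categories with positive p_i).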
