We already dealt with hyperparameters in several previous posts. A hyperparameter is a parameter of the prior distribution, so the prior we end up with depends on how we choose these hyperparameters. There are two main strategies for choosing them.
Strategy 1
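In this strategy, we choose the hyperparameters based on our own prior knowledge. We can pick suitable values by asking how confident we would be after seeing n more data points, or equivalently how many data points' worth of information we want the prior to carry. For example, consider the average number of chocolate chips per cookie. The number of chips in a cookie follows a Poisson distribution with rate $\lambda$, and with a conjugate Gamma prior $\lambda \sim \Gamma(\alpha, \beta)$ we can write the prior summaries like this: $$\text{Prior mean : }\frac{\alpha}{\beta}$$ $$\text{Prior std.dev : }\frac{\sqrt{\alpha}}{\beta}$$ $$\text{Effective sample size : }\beta$$ $\beta$ acts as an effective sample size because, after observing $n$ cookies with counts $y_1, \dots, y_n$, the posterior is $\Gamma(\alpha + \Sigma y_i, \beta + n)$: the prior contributes $\beta$ to the same slot as the $n$ real observations.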
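As a minimal sketch, assuming the shape/rate parameterization used above (mean $\alpha/\beta$), we can invert the prior mean and std. dev. formulas to turn a belief into hyperparameters: since mean/sd $= \sqrt{\alpha}$, we get $\alpha = (\text{mean}/\text{sd})^2$ and $\beta = \text{mean}/\text{sd}^2$. The function names are hypothetical, and the belief "about 12 chips per cookie, give or take 6" is a made-up number for illustration only.

```python
import math

def gamma_hyperparameters(prior_mean, prior_sd):
    """Hypothetical helper: solve mean = alpha/beta and sd = sqrt(alpha)/beta
    for (alpha, beta), assuming a shape/rate Gamma prior."""
    if prior_mean <= 0 or prior_sd <= 0:
        raise ValueError("prior_mean and prior_sd must be positive")
    alpha = (prior_mean / prior_sd) ** 2  # mean/sd = sqrt(alpha)
    beta = prior_mean / prior_sd ** 2     # beta = alpha / mean
    return alpha, beta

def gamma_summary(alpha, beta):
    """Prior mean, std. dev. and effective sample size of Gamma(alpha, beta)."""
    return {
        "mean": alpha / beta,
        "sd": math.sqrt(alpha) / beta,
        "effective_sample_size": beta,
    }

# Hypothetical belief: about 12 chips per cookie, give or take 6.
alpha, beta = gamma_hyperparameters(12.0, 6.0)
print(alpha, beta)
print(gamma_summary(alpha, beta))
```

With these numbers $\alpha = 4$ and $\beta = 1/3$, so the prior carries only a third of one cookie's worth of information; a more confident belief (smaller std. dev.) would give a larger $\beta$.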
Strategy 2
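The goal of this strategy is to make the effective sample size as small as possible, so that the prior has minimal influence on the posterior. To do this, we use a vague prior with a small positive $\epsilon$: $$\lambda \sim \Gamma(\epsilon, \epsilon), \quad \epsilon \gt 0 \text{ small}$$ $$\text{Prior mean : } \frac{\epsilon}{\epsilon} = 1$$ $$\text{Prior std.dev : } \frac{\sqrt{\epsilon}}{\epsilon} = \frac{1}{\sqrt{\epsilon}}$$ $$\text{Effective sample size : } \epsilon$$ $$\text{Posterior mean : }\frac{\epsilon + \Sigma y_i}{\epsilon + n} \approx \frac{\Sigma y_i}{n}$$ As $\epsilon \to 0$, the prior std. dev. grows without bound and the posterior mean approaches the sample mean, so the data dominate the posterior.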
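A small sketch of this behaviour, assuming the standard conjugate update for Poisson data (a $\Gamma(\alpha, \beta)$ prior becomes a $\Gamma(\alpha + \Sigma y_i, \beta + n)$ posterior). The helper name and the chip counts are hypothetical, chosen only for illustration.

```python
import math

def poisson_gamma_posterior(alpha, beta, counts):
    """Hypothetical helper: Gamma(alpha, beta) prior (shape/rate) plus Poisson
    counts gives a Gamma(alpha + sum(counts), beta + len(counts)) posterior."""
    return alpha + sum(counts), beta + len(counts)

# Hypothetical chip counts from five cookies (illustrative only).
counts = [9, 12, 10, 14, 11]
sample_mean = sum(counts) / len(counts)  # 56 / 5 = 11.2

for eps in (1.0, 0.1, 0.001):
    prior_sd = 1 / math.sqrt(eps)  # sqrt(eps) / eps
    a_post, b_post = poisson_gamma_posterior(eps, eps, counts)
    post_mean = a_post / b_post    # (eps + sum y_i) / (eps + n)
    print(f"eps={eps}: prior sd={prior_sd:.2f}, "
          f"posterior mean={post_mean:.4f}, sample mean={sample_mean}")
```

For $\epsilon = 1$ the posterior mean is $57/6 = 9.5$, pulled toward the prior mean of 1; for $\epsilon = 0.1$ it is $11.0$, and for $\epsilon = 0.001$ it is about $11.198$, essentially the sample mean of $11.2$.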