# Evidence ratios and AICc

I have three questions related to AICc and evidence ratios. I hope someone can help me out.

1. How do I calculate evidence ratios from AICc? Is it the same way as with AIC, but with AICc instead of AIC, or is that inappropriate?

2. When would people suggest using AICc instead of AIC? I know that AICc is intended for 'small' sample sizes, but how small is small? And is there evidence that AICc is truly superior to AIC with small sample sizes? The only study of which I'm aware that tested AIC vs AICc performance (among other things) found AICc was not superior with small sample sizes. (reference: Raffalovich, L. E., G. D. Deane, D. Armstrong, and H.-s. Tsao. 2008. Model selection procedures in social research: Monte-Carlo simulation results. Journal of Applied Statistics 35:1093-1114)

3. When reporting evidence ratios, how do you report extremely large evidence ratios? My supervisor and I recently debated this because I had reported the evidence ratio values for each of several models including some very large ones (e.g., 6 x 10^30) for models that are highly unlikely given the data. He thought these values conveyed overconfidence in the statistics and should be truncated to ">1000" or some other similar value, similar to P-values truncated to <0.0001. I truncated the evidence ratios as >1000 to appease him, but I'd be curious what others think of this idea.

Thank you in advance to anyone who can offer help.

Jay Fitzsimmons

PhD student, Biology Dept., University of Ottawa, Canada

Hi Jay

I just logged in to BeStat for the first time in ages today and saw your query. You may have already solved your problems, but in answer to your questions:

1. The same way as with AIC, but with the delta AICc rather than the delta AIC: the evidence ratio of the best model over model i is exp(delta_i / 2), where delta_i is model i's AICc minus the minimum AICc in the set. Remember not to mix AIC and AICc in the same analysis of your candidate set of models - it has to be one or the other.

2. The rule of thumb for small sample sizes is that AICc should be used when n/k is less than 40, where n is your sample size and k is the number of fitted parameters (including the intercept). It is sometimes even suggested that, since AICc converges to AIC at large sample sizes anyway, AICc should be used as the default. Like the paper you cite (which I was unaware of - thanks!), Richards (2005, Ecology 86:2805-2814) also questions whether AICc actually improves inference. His analysis rejected models with delta AIC greater than 2, 4 or 7, and found that you were more likely to wrongly reject the best approximating model using AICc than AIC. The difference was small, though - and in any event, rejecting models on the basis of delta AIC thresholds is generally frowned upon (as Richards himself warns). I'd say it's too early to conclude that AICc is actually worse than AIC - so use whichever you feel comfortable with, I suppose (you can always cite Richards or Raffalovich et al. if you want to go with AIC). BTW, there are those (including Richards) who say that AICc can only be used with simple fixed-effect linear models - though others say this is not true. The original paper suggesting AICc for small sample sizes (Hurvich and Tsai 1989) based its simulations on those kinds of models, but it did not rule out more complex model formulations (though it's fair to say that AICc has not been well tested under such scenarios).

3. I don't know that it would matter if I were refereeing the paper. Whether you report >1000 or 6 x 10^30, either clearly implies that the alternative model is desperately unlikely, which at the end of the day is presumably the message you want to convey.
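To make points 1 and 2 concrete, here is a minimal sketch of the small-sample correction and the evidence-ratio calculation. The AICc values, sample size, and parameter counts below are hypothetical, not from any real dataset:

```python
import math

def aicc(aic, n, k):
    # Hurvich & Tsai (1989) small-sample correction:
    # AICc = AIC + 2k(k + 1) / (n - k - 1)
    return aic + 2 * k * (k + 1) / (n - k - 1)

def evidence_ratios(ic_values):
    # Evidence ratio of the best model over model i: exp(delta_i / 2),
    # where delta_i = IC_i - min(IC). The formula is the same for AIC
    # and AICc, as long as one criterion is used for the whole set.
    best = min(ic_values)
    return [math.exp((v - best) / 2) for v in ic_values]

# Hypothetical candidate set: AIC values, n = 25 observations, k parameters.
# n/k < 40 for every model, so the rule of thumb says use AICc here.
aics = [100.0, 102.0, 110.0]
ks = [3, 4, 5]
n = 25
aiccs = [aicc(a, n, k) for a, k in zip(aics, ks)]
print(evidence_ratios(aiccs))  # best model has ratio 1 by definition
```

The same `evidence_ratios` function applied to plain AIC values gives the AIC-based ratios; the point is simply never to mix the two criteria within one candidate set.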

Hope this helps (if you ever come back and read this!)

Matt

Thank you very much Matt for your help. You answered all of my questions. Thanks especially for your discussion of AICc - I'll read that Richards paper and keep my eyes open to new papers on the topic.

Jay

One aspect of AICc use that is rarely (if ever) explicitly acknowledged is that it directly ties the evaluation criterion to sample size, which in essence creates issues analogous to statistical power. I've always found this a bit ironic, since in some of the papers discussing the use of information criteria versus "conventional" statistics, it is exactly this connection between sample size and test criterion that is criticized. For example, consider the graph below:

http://wolfweb.unr.edu/homepage/mpeacock/Ned.Dochtermann/images/penalty.JPG

When using AICc, the penalty for additional parameters can be severe, so parameters may be excluded from consideration even if they explain a considerable amount of the variance. To me this is essentially an issue of "power", even though that word is usually reserved for conventional test statistics and their distributions.
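The severity of the penalty can be seen directly from the formulas: AIC's penalty term is 2k, while AICc's total penalty works out to 2kn/(n - k - 1), which blows up as k approaches n. A quick sketch with an arbitrary small sample size:

```python
def aic_penalty(k):
    # AIC penalty term: 2k
    return 2 * k

def aicc_penalty(k, n):
    # AICc penalty term: 2k + 2k(k+1)/(n-k-1), which simplifies
    # to 2kn / (n - k - 1)
    return 2 * k * n / (n - k - 1)

n = 30  # arbitrary small sample size for illustration
for k in (2, 5, 10, 15):
    print(f"k={k:2d}  AIC penalty={aic_penalty(k):3d}  "
          f"AICc penalty={aicc_penalty(k, n):6.1f}")
```

At n = 30 and k = 15, AIC charges 30 points while AICc charges over 60, so a large model must explain far more variance to survive under AICc.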

Given this, one way to proceed is to conduct your own simulations to determine the ranking behavior of each metric under different statistical models. I've used this approach a couple of times (partly outlined at: http://onlinelibrary.wiley.com/doi/10.1111/j.1439-0310.2010.01846.x/abstract;jsessionid=22271FCD62C6323110389D54FFEE8CD7.d03t01 ). With programming languages like R being free and powerful, this general approach can be used *a priori* to determine which metric is best suited to the question at hand.
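Such a simulation can be quite short. The sketch below (a toy setup, not the cited paper's method) generates data from a known linear model, fits the true model and an overfitted alternative by least squares, and counts how often AIC versus AICc ranks the true model first. The parameter count k here follows the common convention of counting the intercept, slopes, and residual variance:

```python
import numpy as np

rng = np.random.default_rng(1)

def fit_rss(X, y):
    # Residual sum of squares from an ordinary least-squares fit
    beta, rss, *_ = np.linalg.lstsq(X, y, rcond=None)
    return float(rss[0]) if rss.size else float(np.sum((y - X @ beta) ** 2))

def aic(rss, n, k):
    # Gaussian log-likelihood form, up to an additive constant
    return n * np.log(rss / n) + 2 * k

def aicc(rss, n, k):
    return aic(rss, n, k) + 2 * k * (k + 1) / (n - k - 1)

def simulate(n=20, n_sims=500):
    # Proportion of simulations in which each criterion picks the true model
    wins = {"AIC": 0, "AICc": 0}
    for _ in range(n_sims):
        x = rng.normal(size=n)
        z = rng.normal(size=n)          # spurious predictor
        y = 1.0 + 0.8 * x + rng.normal(size=n)  # true model uses x only
        X_true = np.column_stack([np.ones(n), x])
        X_full = np.column_stack([np.ones(n), x, z])
        rss_t, rss_f = fit_rss(X_true, y), fit_rss(X_full, y)
        if aic(rss_t, n, 3) < aic(rss_f, n, 4):
            wins["AIC"] += 1
        if aicc(rss_t, n, 3) < aicc(rss_f, n, 4):
            wins["AICc"] += 1
    return {m: w / n_sims for m, w in wins.items()}

print(simulate())
```

Because AICc always penalizes the larger model more heavily at this n, it selects the true model at least as often as AIC in this particular setup; swapping in your own data-generating model and candidate set is the point of the exercise.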