
Reproducibility of behavioral phenotypes in mouse models - a short history with critical and practical notes

Published on Aug 13, 2020


Progress in pre-clinical research is built on reproducible findings, yet reproducibility has different dimensions and even meanings. Indeed, the terms reproducibility, repeatability, and replicability are often used interchangeably, although each has a distinct definition. Moreover, reproducibility can be discussed at the level of methods, analysis, results, or conclusions. Despite these differences in definitions and dimensions, the main aim for an individual research group is the ability to develop new studies and hypotheses based on firm and reliable findings from previous experiments. In practice, this aim is often difficult to achieve. This review discusses issues affecting reproducibility in the field of mouse behavioral phenotyping.

Keywords: behavioral phenotyping, mouse, reproducibility, core facility


