It’s rare that reading a paper in PER (Physics Education Research) takes me on an emotional roller coaster, but I recently had that experience with a paper by Nathan Lasry and friends in the September AJP. I came across the title and abstract on Michael Wittmann’s Perticles blog, which gives early notice of papers in PER. The title of the paper is “The puzzling reliability of the Force Concept Inventory.” By “reliability” they mean test-retest consistency. Since I’ve been giving my students the FCI (or the similar FMCE) as a pre-post test in my algebra-based physics class for many years (and then occasionally putting some of those questions on my final), I was interested to see what they had found. The abstract says that the authors gave the test twice within a week to students in the second semester of an introductory physics class. The total scores were consistent from test to retest, but students’ responses on individual items were much less so. They conclude the abstract with the sentence, “The puzzling conclusion is that although individual FCI responses are not reliable, the FCI total score is highly reliable.”
My first reaction to this was a smug satisfaction followed by irritation. “Well! You might find this result puzzling, Nathan, but I don’t. If you had asked me, I would have predicted it.” I look at physics teaching and learning through the theoretical lens of the resource framework. This is a theory of student thinking based on careful educational research, teachers’ experience, and the growing understanding of cognition based in psychology and neuroscience. It began with Andy diSessa’s “knowledge in pieces” approach and has been extended and elaborated by many researchers over the past two decades, including some members of my research group at Maryland.
The heart of the resource idea is that student knowledge of physics (indeed, any knowledge) is made up of bits and pieces that are linked together in a structure whose activation is dynamic and highly context dependent. As students learn, they often make a transition from being highly confident about their answers (even though those answers may be inconsistent with each other when activated in different situations), to being confused, and finally to being more certain of answers that are both more stable and more in line with the physics they have been taught.
One of my former students, Lei Bao (now at the Ohio State University), developed a method for analyzing the FCI and other such tests by treating the state of student knowledge as a probabilistic variable. Bao’s Model Analysis measures the state of student confusion by presenting an “expert equivalent set (EES)” of items. (Such items may not appear equivalent to confused students.) His hypothesis is that each student has some probability of giving a particular answer to a particular item, and that this probability is what the set of questions measures.
So in the theoretical framework I use, it’s to be expected that a student may well be unstable enough in their knowledge to answer the same question differently on two successive tests. Bao’s model would be useful if there were a well-defined probability of answering the questions in an EES correctly, and if that probability were more stable than the answers to individual items. Lasry et al.’s result supports this idea.
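A toy simulation makes this expectation concrete. The sketch below is illustrative only and is not taken from Lasry et al. or from Bao’s Model Analysis. It assumes that each simulated student has a fixed probability of answering each of 30 items correctly, that those probabilities do not change between the two sittings, and that items are scored simply as right or wrong. The function names, the number of students, and the probability ranges are all invented for the example.

```python
import random
from statistics import mean


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def simulate(n_students=500, n_items=30, seed=0):
    """Return (test-retest correlation of total scores,
    fraction of item responses that differ between sittings).

    All parameters are hypothetical; nothing here is fitted to FCI data.
    """
    rng = random.Random(seed)
    test_totals, retest_totals = [], []
    flipped = 0
    for _ in range(n_students):
        # Assumed overall level of the student, fixed across both sittings.
        level = rng.uniform(0.2, 0.9)
        # Assumed per-item probabilities scattered around that level.
        probs = [min(0.95, max(0.05, level + rng.uniform(-0.15, 0.15)))
                 for _ in range(n_items)]
        # Each sitting is an independent draw from the same probabilities.
        test = [rng.random() < p for p in probs]
        retest = [rng.random() < p for p in probs]
        flipped += sum(a != b for a, b in zip(test, retest))
        test_totals.append(sum(test))
        retest_totals.append(sum(retest))
    r = pearson(test_totals, retest_totals)
    return r, flipped / (n_students * n_items)


if __name__ == "__main__":
    r, frac = simulate()
    print(f"total-score correlation: {r:.2f}")
    print(f"fraction of item responses that changed: {frac:.2f}")
```

With these invented parameters the total-score correlation should come out high even though a large share of individual responses change between sittings. The specific numbers depend entirely on the assumed distributions and say nothing about real FCI data; the point is only that stable probabilities are enough to produce the pattern.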
But let’s go meta for a second. Why should Lasry et al. find this result puzzling? Many teachers who are not well versed in education theories know that students’ knowledge fluctuates. I suspect that part of the problem is that the context of “an exam” activates “measurement” in the researcher, which in turn activates “measuring something that can be measured – uniquely.” That leads to the activation of what might be called the binary pedagogical misconception – the idea that the student either knows something or doesn’t and that a test measures which.
I waited with interest for my copy of the AJP to appear in the mail. When it did, I read Lasry et al. with anticipation. It was clear from the first half of the paper that they had done a careful and well-thought-out experiment with excellent statistical analyses. But my next emotional state was delight. In the discussion section I found the following paragraph:
From the perspective of a resources model, the FCI questions provide a context that activates concept-of-force related schema or a related set of resources. Given that the context for the test and retest was similar, the resources activated should be similar, and hence the probability of selecting a given FCI response should be similar. This similarity means that the probability of choosing an answer will be the same every time, not that they will choose the same answer every time. Hence, although individual responses fluctuate, the overall time-averaged mean-score is unchanged. In retrospect, our data provide good empirical support for the resource model.
They got it! They even cited us! Excellent!
But then my third emotion kicked in: dismay. One paragraph hidden in the middle of the discussion section, Nathan? No comment in the abstract or conclusion that points out that you came in with a theoretical expectation (even if it wasn’t explicitly stated) and your result strongly supported a competing theory? Why isn’t that the main point of the paper? Why isn’t the title something like, “Reliability of the FCI supports resources theory”? I suspect that the paragraph was put in as an afterthought in response to a comment from a referee (not me).
We in PER often make the claim that we are “applying the methods of science to the question of student learning.” One of the methods fundamental to science is developing hypotheses, testing them, and coordinating the validated hypotheses into theories. We don’t do nearly enough of this in PER. Isn’t it time we education researchers began to take ourselves seriously as scientists?
 N. Lasry et al., “The puzzling reliability of the Force Concept Inventory,” Am. J. Phys. 79(9), 909-912 (September, 2011).
 D. Hestenes, M. Wells and G. Swackhamer, “Force Concept Inventory,” Phys. Teach. 30, 141-158 (1992).
 R. K. Thornton and D. R. Sokoloff, “Assessing student learning of Newton’s laws: The Force and Motion Conceptual Evaluation,” Am. J. Phys. 66(4), 338-352 (1998).
 A. A. diSessa, “Knowledge in Pieces,” in Constructivism in the Computer Age, G. Forman and P. B. Pufall, eds. (Lawrence Erlbaum, 1988) 49-70.
 E. F. Redish, “A Theoretical Framework for Physics Education Research: Modeling student thinking,” in Proceedings of the International School of Physics, "Enrico Fermi" Course CLVI, E. F. Redish and M. Vicentini (eds.) (IOS Press, Amsterdam, 2004).
 D. Hammer, A. Elby, R. E. Scherr, & E. F. Redish, “Resources, framing, and transfer,” in Transfer of Learning: Research and Perspectives, J. Mestre, ed. (Information Age Publishing, 2004).
 M. Sabella and E. F. Redish, "Knowledge organization and activation in physics problem solving," Am. J. Phys. 75, 1017-1029 (2007).
 L. Bao and E. F. Redish, “Model analysis: Representing and assessing the dynamics of student learning,” Phys. Rev. ST-PER 2, 010103, 1-16 (2006).