The Unabashed Academic

09 October 2011

“Reliability of the FCI supports resources theory”

It’s rare that reading a paper in PER (Physics Education Research) takes me on an emotional roller coaster, but I recently had that experience with a paper by Nathan Lasry and friends in the September AJP. [1] I came across the title and abstract on Michael Wittmann’s Perticles blog, which gives early notice of new papers in PER. [2] The title of the paper is “The puzzling reliability of the Force Concept Inventory.” By “reliability” they mean test-retest consistency. Since I’ve been giving my students the FCI [3] (or the similar FMCE [4]) as a pre-post test in my algebra-based physics class for many years (and occasionally putting some of those questions on my final), I was interested to see what they had found. The abstract says that the authors gave the test twice, within a week, to students in the second semester of an introductory physics class. The total scores were consistent from test to retest, but students’ responses on individual items were much less so. They conclude the abstract with the sentence, “The puzzling conclusion is that although individual FCI responses are not reliable, the FCI total score is highly reliable.”
My first reaction to this was smug satisfaction, followed by irritation. “Well! You might find this result puzzling, Nathan, but I don’t. If you had asked me, I would have predicted it.” I look at physics teaching and learning through the theoretical lens of the resource framework. This is a theory of student thinking based on careful educational research, teachers’ experience, and the growing understanding of cognition coming from psychology and neuroscience. It began with Andy diSessa’s “knowledge in pieces” approach [5] and has been extended and elaborated by many researchers over the past two decades [6], including some members of my research group at Maryland. [7][8][9]
The heart of the resource idea is that student knowledge of physics (indeed, any knowledge) is made up of bits and pieces linked together in a structure whose activation is dynamic and highly context dependent. As students learn, they often move from being highly confident about their answers (which, however, may be inconsistent when activated in different situations), to being confused, and finally to being more certain of answers that are both more consistent with one another and consistent with the physics they have been taught.
One of my former students, Lei Bao (now at The Ohio State University), developed a method for analyzing the FCI and similar tests by treating the state of student knowledge as a probabilistic variable. Bao’s Model Analysis measures the state of student confusion using an “expert equivalent set” (EES) of items: questions an expert would see as probing the same idea. (Such items may not appear equivalent to confused students.) [10] His hypothesis is that a student has a probability of giving a particular answer to a particular item, and that this probability is what the set of questions measures.
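To make the bookkeeping concrete, here is a minimal Python sketch of the core of Model Analysis, assuming the formulation in [10]; the function names and the two example classes are hypothetical illustrations, not taken from the paper. Each answer on an m-item EES is assumed to be classified under exactly one model (here: Newtonian, a common non-Newtonian model, and a “null” catch-all). A student’s model state is the vector with components √(n_η/m), the class density matrix D averages the outer products of those vectors, and the largest eigenvalue of D indicates how strongly the class shares a common model state.

```python
import math

def model_state(counts):
    """Student model state vector: component eta is sqrt(n_eta / m).

    counts[eta] is the number of EES items this student answered using
    model eta; every answer is assumed to fall under exactly one model,
    so sum(counts) == m.
    """
    m = sum(counts)
    return [math.sqrt(n / m) for n in counts]

def density_matrix(class_counts):
    """Class density matrix D = (1/N) * sum over students of u u^T."""
    n_students = len(class_counts)
    w = len(class_counts[0])
    D = [[0.0] * w for _ in range(w)]
    for counts in class_counts:
        u = model_state(counts)
        for i in range(w):
            for j in range(w):
                D[i][j] += u[i] * u[j] / n_students
    return D

def primary_eigen(D, iterations=200):
    """Largest eigenvalue and eigenvector of D by power iteration.

    D is symmetric, positive semidefinite and has non-negative entries,
    so the all-ones start vector is not orthogonal to the dominant
    eigenvector and the iterates never collapse to zero.
    """
    w = len(D)
    v = [1.0 / math.sqrt(w)] * w
    for _ in range(iterations):
        Dv = [sum(D[i][j] * v[j] for j in range(w)) for i in range(w)]
        norm = math.sqrt(sum(x * x for x in Dv))
        v = [x / norm for x in Dv]
    # Rayleigh quotient of the unit vector v gives the eigenvalue.
    Dv = [sum(D[i][j] * v[j] for j in range(w)) for i in range(w)]
    return sum(v[i] * Dv[i] for i in range(w)), v

if __name__ == "__main__":
    # Hypothetical 4-item EES; models ordered (Newtonian, non-Newtonian, null).
    split = [[4, 0, 0]] * 10 + [[0, 4, 0]] * 10  # each student is consistent
    mixed = [[2, 2, 0]] * 20                     # each student uses both models
    for name, cls in (("split", split), ("mixed", mixed)):
        D = density_matrix(cls)
        lam, vec = primary_eigen(D)
        diag = [round(D[i][i], 3) for i in range(3)]
        print(name, "diagonal:", diag, "primary eigenvalue:", round(lam, 3))
```

Both hypothetical classes have the same diagonal (half the answers Newtonian, half not), so an average score cannot tell them apart. The primary eigenvalue can: it is 0.5 for the split class of internally consistent students and 1.0 for the class in which every student mixes the two models in the same proportion.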
So in the theoretical framework I use, it’s to be expected that a student may be unstable enough in their knowledge to answer the same question differently on two successive tests. Bao’s model would be useful if there were a well-defined probability of answering the questions in an EES correctly, and if that probability were more stable than the answers to individual items. Lasry et al.’s result supports this idea.
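A short calculation shows why this is plausible. Assume, as a simplification, that student k answers item i correctly with a fixed probability p_{ki} on both sittings, that the two sittings are independent draws, and that items are scored only right or wrong (the FCI’s multiple distractors are ignored). Writing X_{ki} and X'_{ki} for the test and retest responses and S_k = Σ_i X_{ki} for the total on an m-item test:

```latex
\begin{align}
P\left(X_{ki} = X'_{ki}\right) &= p_{ki}^2 + (1 - p_{ki})^2 = 1 - 2\,p_{ki}(1 - p_{ki}), \\
\mu_k &= \mathbb{E}[S_k] = \sum_{i=1}^{m} p_{ki},
\qquad
\sigma_k^2 = \operatorname{Var}(S_k \mid p_{k\cdot}) = \sum_{i=1}^{m} p_{ki}(1 - p_{ki}), \\
\rho(S, S') &= \frac{\operatorname{Var}_k(\mu_k)}{\operatorname{Var}_k(\mu_k) + \mathbb{E}_k\!\left[\sigma_k^2\right]}.
\end{align}
```

A student with p_{ki} = 0.5 repeats a given answer only half the time. Yet if each student’s probabilities are similar across items, Var_k(μ_k) grows like m^2 while the noise term E_k[σ_k^2] grows only like m, so the test-retest correlation ρ of total scores approaches 1 as the test gets longer.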
But let’s go meta for a second. Why should Lasry et al. find this result puzzling? Many teachers who are not well versed in education theory know that students’ knowledge fluctuates. I suspect that part of the problem is that the context of “an exam” activates “measurement” in the researcher, and this activates “measuring something that can be measured – uniquely.” That, in turn, activates what might be called the binary pedagogical misconception: the idea that a student either knows something or doesn’t, and that a test measures which.
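The binary misconception and the probabilistic picture make different predictions for a test-retest study. The Python sketch below is a toy simulation with parameters chosen purely for illustration, not a model of the Lasry et al. data: 500 hypothetical students, 30 right/wrong items, and a per-student success probability drawn uniformly between 0.2 and 0.9. In the “binary” run each student knows a fixed subset of items and answers identically on both sittings; in the “probabilistic” run each sitting is an independent draw with the same probability.

```python
import math
import random

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return cov / math.sqrt(vx * vy)

def simulate(n_students=500, n_items=30, binary=False, seed=1):
    """Return (item-level agreement, total-score correlation) for a toy test-retest.

    All parameters are illustrative assumptions, not values from Lasry et al.
    """
    rng = random.Random(seed)
    totals_1, totals_2 = [], []
    same = 0
    for _ in range(n_students):
        # Each hypothetical student gets one success probability for every item.
        p = rng.uniform(0.2, 0.9)
        if binary:
            # Binary model: the student "knows" a fixed subset of items
            # and gives identical answers on both sittings.
            known = [rng.random() < p for _ in range(n_items)]
            test_1 = test_2 = known
        else:
            # Probabilistic model: each sitting is a fresh draw with the same p.
            test_1 = [rng.random() < p for _ in range(n_items)]
            test_2 = [rng.random() < p for _ in range(n_items)]
        totals_1.append(sum(test_1))
        totals_2.append(sum(test_2))
        same += sum(a == b for a, b in zip(test_1, test_2))
    agreement = same / (n_students * n_items)
    return agreement, pearson(totals_1, totals_2)

if __name__ == "__main__":
    for binary in (True, False):
        agreement, r = simulate(binary=binary)
        label = "binary" if binary else "probabilistic"
        print(f"{label}: item agreement = {agreement:.2f}, total-score r = {r:.2f}")
```

With these assumed parameters the binary model gives perfect agreement on both measures, while the probabilistic model has an expected item-level agreement of about 0.59 and an expected total-score correlation of about 0.86: individual answers flip often, yet the totals stay stable, which is qualitatively the pattern Lasry et al. report.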
I waited with interest for my copy of the AJP to appear in the mail. When it did, I read Lasry et al. with anticipation. It was clear from the first half of the paper that they had done a careful, well-thought-out experiment with excellent statistical analyses. But my next emotional state was delight. In the discussion section I found the following paragraph:
From the perspective of a resources model, the FCI questions provide a context that activates concept-of-force related schema or a related set of resources. Given that the context for the test and retest was similar, the resources activated should be similar, and hence the probability of selecting a given FCI response should be similar. This similarity means that the probability of choosing an answer will be the same every time, not that they will choose the same answer every time. Hence, although individual responses fluctuate, the overall time-averaged mean-score is unchanged. In retrospect, our data provide good empirical support for the resource model.

They got it! They even cited us! Excellent!
But then my third emotion kicked in: dismay. One paragraph hidden in the middle of the discussion section, Nathan? No comment in the abstract or conclusion that points out that you came in with a theoretical expectation (even if it wasn’t explicitly stated) and your result strongly supported a competing theory? Why isn’t that the main point of the paper? Why isn’t the title something like, “Reliability of the FCI supports resources theory”? I suspect that the paragraph was put in as an afterthought in response to a comment from a referee (not me).
We in PER often claim that we are “applying the methods of science to the question of student learning.” One of the methods fundamental to science is developing hypotheses and testing them, and then coordinating validated hypotheses into theories. We don’t do nearly enough of this in PER. Isn’t it time we education researchers began to take ourselves seriously as scientists?
[1] N. Lasry et al., “The puzzling reliability of the Force Concept Inventory,” Am. J. Phys. 79(9), 909-912 (September, 2011).
[3] D. Hestenes, M. Wells and G. Swackhamer, “Force Concept Inventory,” Phys. Teach. 30, 141-158 (1992).
[4] R. K. Thornton and D. R. Sokoloff, “Assessing student learning of Newton’s laws: The Force and Motion Conceptual Evaluation,” Am. J. Phys. 66(4), 338-351 (1998).
[5] A. A. diSessa, “Knowledge in Pieces,” in Constructivism in the Computer Age, G. Forman and P. B. Pufall, eds. (Lawrence Erlbaum, 1988) 49-70.
[7] E. F. Redish, “A Theoretical Framework for Physics Education Research: Modeling student thinking,” in Proceedings of the International School of Physics, "Enrico Fermi" Course CLVI, E. F. Redish and M. Vicentini (eds.) (IOS Press, Amsterdam, 2004).
[8] D. Hammer, A. Elby, R. E. Scherr, & E. F. Redish, “Resources, framing, and transfer,” in Transfer of Learning: Research and Perspectives, J. Mestre, ed. (Information Age Publishing, 2004).
[9] M. Sabella and E. F. Redish, “Knowledge organization and activation in physics problem solving,” Am. J. Phys. 75, 1017-1029 (2007).
[10] L. Bao and E. F. Redish, “Model analysis: Representing and assessing the dynamics of student learning,” Phys. Rev. ST-PER 2, 010103, 1-16 (2006).


  • What is more confusing to me is that the paragraph that you cite about the resources model is between two paragraphs that contain ideas that seem to oppose the resources model.

    First, they conclude that the consistency of the total FCI score indicates that it measures the Newtonian "concept-of-force," which seems to fit with a more unitary model of cognition. Second, they talk about getting a "false positive" when a non-Newtonian thinker gets a correct response. This idea also seems to suppose that people either have or don't have a Newtonian concept-of-force. To me, both of these ideas seem to be inconsistent with a resources model of cognition.

    You mentioned your dismay that the impact of the paragraph on supporting the resources model was lessened by being buried in the discussion. I feel that the impact of this paragraph was lessened further by being presented as coherent with ideas that it explicitly opposes.


    By Anonymous, at 12:07 AM

  • Eric, yes, their use of binary assumptions is inconsistent with the resources model. But their data are not. I think the issue you raise has to be looked at through the lens of grain size. When you look at a block of ice without a microscope, it looks solid. Looked at more closely, you may find non-uniformities: bubbles of water and gas. Similarly, when you look at the behavior of a student on many questions, it may look stable, but looked at more closely you see fluctuations. They are assuming a kind of "phase transition" model, appropriate when you have a nearly infinite number of interacting objects. In the brain the number of resources is large, but not on the scale needed for a phase transition model. Fluctuations seem much more important, and the stable measure they get from repeated testing needs to be seen as a measure of a probability, not as "finding false positives" when students aren't "fully Newtonian thinkers."

    By An Unabashed Academic, at 11:06 AM
