From F-twist to N-twist
“For every complex problem there is an answer that is clear, simple, and wrong.” (H. L. Mencken)
On the F-twist: A Methodology of Positive Economics
1953 can be labeled a decisive year for the economics discipline due to the publication of “The Methodology of Positive Economics”, a Milton Friedman article that became the “most cited, most influential, and most controversial piece of methodological writing in twentieth-century economics” (Mäki 2009). Since its publication, Friedman’s paper (1953), hereafter F53, has received over 8500 citations, whereas “economic methodologists would not dream of having more than 2500 citations” (Reiss 2010), and it even had a 50th-anniversary celebration. After 70 years, the paper is still alive and accumulating citations, even though its reception has not been all that welcoming. Boland (1979), a supporter of F53, claims that while textbooks have welcomed and adopted the methodology, “virtually all the journal articles that have been written about that essay have been very critical.” From another perspective, Hands (2001) believes F53 is welcomed by practicing economists and criticized by non-economists such as methodologists and philosophers.
Nonetheless, what is F53 all about? F53 is supposed to be about the methodology of positive economics, i.e., how to study economic phenomena and do so in a positive, value-free manner. However, under any of the definitions of the term methodology (the study of methods in general, the study of the specific methods used in a research intervention, or a particular set of methods commonly used together in practice; Mingers 2006a), F53 is not about methodology but primarily about how to validate an economic model, theory, or hypothesis. The only validation method and judgment criterion that F53 recognizes is predictive power, to the extent that “the ultimate goal of a positive science is the development of a theory or hypothesis that yields valid and meaningful predictions” and “the only relevant test of the validity of a hypothesis is comparison of its predictions with experience”. Consequently, the realism of a model’s assumptions should not and cannot play any role in model validation, to the point of asserting the “impossibility of testing a theory by its assumptions”. Besides the central theme of model validation, F53 touches on other topics, including the goal of science (p. 3), the role of language in science (p. 7), the distinction between normative and positive economics (pp. 3-6), the merits of controlled experiments and observational studies (pp. 10-11), the role of empirical evidence in theoretical work (p. 12), and models and realism (p. 25), to name a few.
On the F-twist’s Shortcomings: Neither a Methodology nor Positive
While a full review of the F53 criticisms is beyond the scope of this paper, it is insightful to review two prominent critical works.1 One of these is Musgrave (1981), which highlights that F53 does not discern different types of assumptions and thus carelessly lumps them together. The author explains that economics has at least three types of assumptions: negligibility, domain, and heuristic. Negligibility assumptions state that a factor can safely be excluded from the model; domain assumptions specify the conditions under which the hypothesis holds; and heuristic assumptions are used in successive-approximation modeling to capture complex phenomena. According to Musgrave (1981), the validity of the latter two types does matter, in contrast to the F53 suggestion. Another prominent critical work is Mäki (2003), which finds F53 “a pool of ambiguous and inconsistent ingredients”, an F-mix rather than an F-twist, that encompasses “too many methodological doctrines, hence fails to capture a single doctrine,” and therefore fails to deliver what it is supposed to deliver.
From my perspective, the validation method of F53 is not a panacea that validates economic models once and for all. While predictive power is necessary, it is insufficient, and further steps are needed for model validation. I find the following issues with F53:
The goal of science is not prediction but the explanation of phenomena. A good explanation helps to make precise predictions, and a good prediction improves our insight into the system. However, it is always possible to have a predictive model that yields very little insight into the system’s mechanisms. If we want understanding, we have to employ models designed primarily for explanation rather than prediction.
Key terms in F53 are not clearly defined and are often used loosely. Theory and hypothesis are used interchangeably, but prediction is the most problematic term to me. Friedman (1953) uses prediction in the sense of predictive analytics as well as prescriptive analytics, while these two are different: the former is prediction based on historical data, and the latter is prediction of the outcome of a scenario based on the structure of the system. The two mandate different types of modeling.
Probably the most dangerous error of F53 is lowering the bar for “confirmation” of a hypothesis: ‘The constructed hypothesis is presumably valid, that is, it yields “sufficiently” accurate predictions’. The term sufficiently opens the door to accepting any model with some predictive power. In complex systems where all components are interconnected, each component exhibits some degree of associative behavior and thus has some predictive power about the other components, as the sketch after this list illustrates. Consequently, almost any hypothesis can be validated under the F53 minimal validation criterion. This caveat gives a free pass to ideological biases in econometrics, since economists can now choose sufficiently good models based on their taste and theory (Mehrara, Ghazanfari, and Majdzadeh 2014), i.e., ideology. Such practice undermines the aspiration of achieving positive economics, since empirical does not necessarily mean positive or value-free.
In contrast to F53’s lax approach to prediction error, in practice any amount of error matters and is meaningful. Error indicates either that explanatory variables are absent from the model, i.e., an insufficient model boundary, or that the functional form is inadequate (James, Witten, Hastie, and Tibshirani 2017).
A model with high predictive power does not necessarily yield understanding and knowledge of the system, yet such insight is needed for policy analysis. To change a system’s behavior, we need to find its leverage points and understand its counterintuitive behavior (Meadows 1999). These leverage points do not necessarily have a close association with the output behavior and thus might not have excellent predictive power.
For phenomena that are developing and unfolding at the time of the study, the single validation criterion of predictive power loses its practicality even more: when there is not enough data at hand, a myriad of models can be validated with the scarce data available.
This validation method is also blind to the possibility of incorrect models yielding correct predictions for an event or a period of time. An example is the prediction of the Global Financial Crisis (GFC) by Steve Keen, an MMT/post-Keynesian economist; Peter Schiff, a financial-asset trader of the Austrian school; and Michael Roberts, a Marxist economics commentator. These predictions were based on very different arguments. Were they equally valid? The only answer F53 provides is model selection based on simplicity and fruitfulness!
Finally, there is no good reason for antagonism toward efforts to build realistic models. F53 lumps together assumptions that contradict reality and assumptions that merely approximate it, calling both “unrealistic”; according to F53, any effort to make assumptions more realistic is futile.
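The interconnectedness point above can be made concrete with a minimal sketch using synthetic, hypothetical data; the latent-factor setup and all numbers below are my own assumptions for illustration, not drawn from any cited study:

```python
# Sketch: in an interconnected system, almost any component carries some
# predictive power about any other, so a "sufficiently accurate" criterion
# lets many rival hypotheses pass. All data here is synthetic.
import numpy as np

rng = np.random.default_rng(0)
n = 5000
latent = rng.normal(size=n)  # hidden common driver coupling the components
components = [latent + rng.normal(scale=1.0, size=n) for _ in range(5)]
target = latent + rng.normal(scale=1.0, size=n)

for i, x in enumerate(components):
    r2 = np.corrcoef(x, target)[0, 1] ** 2  # univariate predictive R^2
    print(f"component {i}: R^2 vs target = {r2:.2f}")
# Every component scores an R^2 of roughly 0.25 here, yet none of these
# "hypotheses" reveals the latent mechanism that actually drives the target.
```

Under F53's criterion, a modeler could declare any one of these five components a "sufficiently" validated explanation of the target, picking among them by taste.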
Nevertheless, the relationship between the influential F53 and mainstream economics methodology in practice is more complicated. In my experience, prediction is either not used at all or not used properly. Typical econometric models are validated using a variety of R² indices, and model selection is done using statistical tests such as the F-test, not based on comparative predictive power.2 Moreover, in the rare cases where prediction does appear, it is not based on a sound method such as cross-validation together with sound model comparison (Kuhn et al. 2013; Figueiredo Filho, Júnior, and Rocha 2011). A minimal sketch of the contrast follows.
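The sketch below, assuming scikit-learn and a synthetic data-generating process of my own choosing, contrasts in-sample R² with cross-validated predictive power; the polynomial degrees and noise level are illustrative assumptions:

```python
# Sketch: in-sample R^2 always rewards the more flexible model, while
# cross-validation ranks models by out-of-sample predictive power.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

rng = np.random.default_rng(1)
X = rng.uniform(-3, 3, size=(200, 1))
y = 0.5 * X[:, 0] + rng.normal(scale=1.0, size=200)  # true model is linear

for degree in (1, 5, 15):
    model = make_pipeline(PolynomialFeatures(degree), LinearRegression())
    in_sample = model.fit(X, y).score(X, y)  # R^2 on the training data
    cv = cross_val_score(model, X, y, cv=10, scoring="r2").mean()  # out-of-sample
    print(f"degree {degree:2d}: in-sample R^2 = {in_sample:.3f}, CV R^2 = {cv:.3f}")
# In-sample R^2 grows monotonically with flexibility; the cross-validated
# score typically peaks near the parsimonious (true) specification.
```

Selecting by in-sample fit alone, as the typical practice described above does, would always favor the most flexible specification regardless of its predictive merit.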
Precedented Shortcomings vs. Unprecedented Detrimental Phenomena
The lack of a comprehensive validation framework, and the failure to apply even the narrow validation method of F53, have led to the confirmation and publication of models whose validity is questionable at best. While such sloppy practice might not cost much in low-stakes areas such as stock-price prediction, relying on invalid models has put, and will continue to put, millions of lives in peril in large-scale events and policies such as austerity measures and climate change. The latter is a matter of life and death for the planet and its inhabitants (Kalmus 2022).
The legacy of F53 can be traced in studies of the mutual effects of climate change and the economy. The summary of temperature-GDP models in (Newell, Prest, and Sexton 2021) shows how the narrow F53 validation method, which is especially weak at dealing with unprecedented unfolding phenomena, has left its mark. According to the authors, who assessed the models using cross-validation, the out-of-sample performance of the 800 tested models is “statistically indistinguishable”, i.e., predictive power is unable to separate and rank the models and thus to identify the most valid one. Additionally, the authors find that “the 95% confidence interval that accounts for both sampling and model uncertainty across the best-performing models ranges from 84% GDP losses to 359% gains.” Such a wide range of predictions portrays the tragicomic situation of climate change modeling from an economic perspective.
N-twist: Nordhaus and the RICE Economic Climate Change Model
One such model, and probably the most famous one, is the RICE model of Nordhaus and Yang (W. D. Nordhaus and Yang 1996). The model is well-known not only because it is one of the first efforts at economic modeling of climate change but also because its lead author won the 2018 Nobel Prize (“2018 NobelPrize,” n.d.). The RICE model is essentially a dynamic non-linear optimization model whose objective function is regional welfare maximization over a set of economic and climate constraints (W. Nordhaus 2019; W. D. Nordhaus and Yang 1996). According to the RICE model, 3°C of global warming would damage output by about 2%, while the cost of net-zero emissions is a 3.5% drop in output (W. Nordhaus 2019); thus, 3°C of global heating would have much less impact on the economy than the GFC did.
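To see how such a small damage estimate can arise, here is a back-of-the-envelope sketch of a DICE/RICE-style quadratic damage function; the coefficient below is my own assumption, chosen only to reproduce the roughly 2% loss at 3°C cited above, and is not Nordhaus’s exact calibration:

```python
# Sketch of a quadratic damage function D(T) = a * T^2, with the
# coefficient a assumed here (not Nordhaus's exact value) so that
# D(3 degC) is about 2% of output, matching the figure cited above.
a = 0.00227  # assumed damage coefficient, fraction of output per degC^2

for T in (1, 2, 3, 4, 6):  # warming in degrees Celsius
    print(f"{T} degC warming -> {100 * a * T**2:.1f}% output loss")
# Even 6 degC of warming costs under 10% of output under this functional
# form, which is why critics such as Keen question the quadratic shape.
```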
The RICE model, its outputs, and its assumptions have been censured by a few scholars, including Keen (2021). Keen’s criticism centers on the unrealistic assumptions behind the RICE model, e.g., the extrapolation of economic disparities under the current climate to a world affected by global heating, the assumption that indoor activities are unaffected by climate change, the functional form of the damage global heating does to output, and the exclusion of energy from the model, to name a few. The author also highlights the inability of the reviewed models to capture tipping-point behaviors, i.e., exponential growth followed by exponential decay, as well as the low quality of the data behind these models.
The points that struck me while reading the original paper on the RICE model (W. D. Nordhaus and Yang 1996) are:
Formulating the problem as an optimization. Hard operations research methods, of which the RICE modeling approach is an example, have been under harsh criticism even by some of their pioneers due to the “mathematically sophisticated but contextually naive” approach of optimization methods and their “obsessed with techniques” practitioners who “decreasingly took problematic situations as they came, but increasingly sought, select, and distorted them so that favoured technique could be applied to them” (Ackoff 1979). While optimization modeling has proven useful for solving niche technical problems with homogeneous components, such as parameter optimization in machine learning models, it seems structurally unfit for human-involved complex problems (see Carissimo and Korecki 2023). The choice of such an optimization methodology might stem from the neoclassical understanding of entities, from humans to countries, as “optimizers” (Rappaport 1996).
The lack of any validation method, let alone prediction-based validation, in (W. D. Nordhaus and Yang 1996), a common problem with mainstream economics studies that I highlighted earlier. This might have something to do with the optimization tradition itself, as methodology textbooks, e.g., (Winston 2022), say very little about validating such models against real-world data.
The limited scope of the formulated problem, which disregards energy, greenhouse gases other than CO2, biodiversity loss and its feedback loops to global heating, etc.
The hard-coded parameters, e.g., the discount rate, set without any clear justification. It seems to me that such parameters should be decision variables of the optimization model, not constant coefficients; at a minimum, their influence deserves a sensitivity analysis, as the sketch after this list shows.
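The following sketch shows why a hard-coded discount rate matters so much: the present value of climate damages a century away swings enormously with the chosen rate. The damage figure, horizon, and candidate rates are illustrative assumptions of mine, not values from the RICE paper:

```python
# Sketch: sensitivity of discounted climate damages to the discount rate.
# All numbers are illustrative assumptions, not RICE calibrations.
damage = 1_000   # assumed damages occurring 100 years from now (arbitrary units)
horizon = 100    # years until the damage occurs

for r in (0.001, 0.015, 0.03, 0.045):  # candidate annual discount rates
    pv = damage / (1 + r) ** horizon   # standard present-value formula
    print(f"discount rate {r:5.1%}: present value = {pv:7.1f}")
# At 0.1% the future damage is worth ~905 today; at 4.5% it is worth ~12,
# a roughly 74-fold difference driven entirely by one fixed parameter.
```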
Conclusion: In Search of a Framework
In conclusion, the neoclassical methodology seems to suffer from a lack of tool diversity, misuse of the limited available tools, and insufficient model validation guidelines. These shortcomings are more blatant when facing unprecedented large-scale complex problems such as climate change. And since this toolset is deeply rooted in the neoclassical paradigm, it would be very difficult to radically improve it.
In contrast, heterodox economics is, paradigm-wise, in an advantageous position from both the toolbox and the validation perspectives. Nevertheless, an alternative paradigm should go beyond criticism of the neoclassical paradigm and suggest a comprehensive and versatile methodological framework. Perhaps a matrix of problem types, solution stages, and available tools for each stage of each problem, similar to what Mingers and Brocklesby (1997) and Mingers and Rosenhead (2011) suggested but for economic problems, might be necessary, with an emphasis on multi-methodology to tackle the multi-aspect complex problems of the 21st century (see Mingers 2006b).
NOTE: An us-versus-them categorization of methods is a very bad practice. All methods are useful within their own scope: machine learning, system dynamics (SD), causal discovery and causal inference, network analysis, agent-based modeling (ABM), etc.