
Statistics controversy: missing the p-oint.

There is a valuable discussion in Nature about the problems that have arisen related to the (mis)use of statistics for decision-making.  Simplified, the issue is that a rather subjectively chosen cutoff, or p-value, leads to dichotomizing our inferences, when the underlying phenomena may or may not be dichotomous.  For example, to explain things simplistically, if a study's results pass such a cutoff test, it means that the chance the observed result would arise if nothing is going on (as opposed to the hypothesized effect) is so small--less than a fraction p of the time--that we accept the data as showing that our suggested something is going on.  In other words, rare results (using our cutoff criterion for what 'rare' means) are considered to support our idea of what's afoot.  The chosen cutoff level is arbitrary and used by convention, and its use doesn't reflect the various aspects of uncertainty or alternative interpretations that may abound in the actual data.
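As a minimal sketch of that dichotomizing step (my own illustration, not from the Nature commentaries; the simulated data and the 0.05 threshold are assumptions), here is how a conventional cutoff turns a continuous measure of surprise into a yes/no verdict:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)

# Two simulated samples; the 'true' effect here is modest but real.
control = rng.normal(loc=0.0, scale=1.0, size=30)
treated = rng.normal(loc=0.3, scale=1.0, size=30)

# A continuous measure of how surprising the data are under the null hypothesis...
t_stat, p_value = stats.ttest_ind(treated, control)

# ...dichotomized by an arbitrary, conventional cutoff.
ALPHA = 0.05
verdict = "significant" if p_value < ALPHA else "not significant"
print(f"p = {p_value:.3f} -> {verdict} at alpha = {ALPHA}")
```

Whatever value p takes, the verdict is all-or-nothing, which is precisely the dichotomization at issue.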

The Nature commentaries address these issues in various ways, and suggestions are made.  These are helpful and thoughtful in themselves, but they miss what I think is a very important, indeed often the critical, point when it comes to their application in many areas of biology and social science.

Instrumentation errors
In these (as in other) sciences, various measurements and technologies are used to collect data.  These are mechanical, so to speak, and are always imperfect.  Sometimes it may be reasonable to assume that the errors are unrelated to what is being measured (for example, that their distribution is unrelated to the value of a given instance) and that they don't affect what is being measured (as quantum measurements can do); if so, then correcting for them in some reasonably systematic way, such as by assuming normally distributed errors, clearly helps adjust findings for the inadvertent but causally unconnected errors.
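A minimal sketch of what that assumption buys us (my own toy example, with an assumed true value and error size): if measurement noise is unbiased and unrelated to the quantity being measured, simply averaging repeated measurements corrects for it in a predictable way:

```python
import numpy as np

rng = np.random.default_rng(42)

true_value = 7.2   # the quantity Nature actually 'has' (assumed for the example)
noise_sd = 0.5     # instrument error, assumed normal and independent of the true value

for n in (1, 10, 100, 1000):
    measurements = true_value + rng.normal(0.0, noise_sd, size=n)
    estimate = measurements.mean()
    # With uncorrelated errors, the standard error of the mean shrinks as 1/sqrt(n).
    print(f"n={n:5d}  estimate={estimate:.3f}  expected SE={noise_sd / np.sqrt(n):.3f}")
```

Nothing in this correction, of course, tells us whether the assumption of causally unconnected errors holds in the first place.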

Such corrections seem to apply quite validly to social and biological, including evolutionary and genetic, sciences.  We'll never have perfect instrumentation or measurement, and often don't know the nature of our imperfections.  Assuming errors uncorrelated with what is being sought seems reasonable even if approximate to some unknown degree. It's worked so well in the past that this sort of probabilistic treatment of results seems wholly appropriate.

But instrumentation errors are not the only possible errors in some sciences.

Conceptual errors: you can't 'correct' for them in inappropriate studies
Statistics is, properly, a branch of mathematics.  That means it is an axiomatic system, an if-then way to make deductions or inductions.  When and if the 'if' conditions are met, the 'then' consequences must follow.  Statistics rests on probabilism rather than determinism, in the sense that it relates to, and is developed around, the idea that some phenomena only occur with a given probability, say p, and that such a value somehow exists in Nature.

It may have to do with the practicalities of sampling by us, or by some natural screening phenomenon (as in, say, mutation, Mendelian transmission, natural selection).  But it basically always rests on some version or other of an assumption that the sampling is parametric, that is, that our 'p' value somehow exists 'out there' in Nature.  If we are, say, sampling 10% of a population (and the latter is actually well-defined!), then each draw has the same properties.  For example, if it is a 'random' sample, then no property of a potential samplee affects whether or not it is actually sampled.
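Here is a toy version of that 'if' (my own illustration, with an assumed, well-defined population and an assumed true frequency): when a fixed parameter really does exist out there and every individual has the same chance of being drawn, sample estimates scatter around it exactly as the standard machinery expects:

```python
import numpy as np

rng = np.random.default_rng(0)

# An assumed population of 100,000 in which a trait truly has frequency 0.3.
TRUE_P = 0.3
population = rng.random(100_000) < TRUE_P

# Repeated 10% simple random samples: no property of an individual
# affects its chance of being included.
estimates = []
for _ in range(5):
    sample = rng.choice(population, size=10_000, replace=False)
    estimates.append(round(float(sample.mean()), 3))

print(f"true p = {TRUE_P}, sample estimates = {estimates}")
```

The argument that follows is about whether life supplies any such fixed TRUE_P for us to sample from in the first place.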

But note there is a big 'if' here: the sampling, or whatever process is treated as probabilistic, needs to have a parameter value!  That parameter is what is used to compute significance measures and so on, from which we draw conclusions based on the results of our sample.

Is the universe parametric?  Is life?
In physics, for example, the universe is assumed to be parametric.  It is, universally, assumed to have some properties, like the gravitational constant, Planck's constant, the speed of light, and so on.  We can estimate the parameters here on earth (as, for example, Newton himself suggested), but assume they're the same elsewhere.  If observation challenges that, we assume the cosmos is regular enough that there are at least some regularities, even if we've not figured them all out yet.

A key feature of a parametric universe is replicability.  When things are replicable, because they are parametric--that is, have fixed universal properties--then statistical estimates and their standard deviations etc. make sense and should reflect the human-introduced (e.g., measurement) sources of variation, not Nature's.  Statistics is a field largely developed for this sort of context, or others in which sampling was reasonably assumed to represent the major source of error.

In my view it is more than incidental, but profound, that 'science' as we know it was an enterprise developed to study the 'laws' of Nature.  Maybe this was the product of the theological beliefs that had preceded the Enlightenment or, as I think at least Newton said, 'science' was trying to understand God's laws.

In this spirit, in his Principia Mathematica (his most famous book), Newton stated the idea that if you understand how Nature works in some local example, what you learned would apply to the entire cosmos.  This is how science, usually implicitly, works today.  Chemistry here is assumed to be the same as chemistry on any distant galaxy, even those we cannot see.  Consistency is the foundation upon which our idea of the cosmos, and in that sense classical science, has been built.

Darwin was, in this sense, very clearly a Newtonian.  Natural selection was a 'force' he likened to gravity, and his idea of 'chance' was not the formal one we use today.  But what he did observe, though implicitly, was that evolution was about competing differences.  In this sense, evolution is inherently not parametric.

Not only does evolution rest heavily on probability--chance aspects of reproductive success, which Darwin only minimally acknowledged--but it also rests on each individual's own reproductive success being unique.  Without variation, and that means variation in the traits that affect success, not just 'neutral' ones, there would be no evolution.

In this sense, the application of statistics and statistical inference in life sciences is legitimate relative to measurement and sampling issues, but is not relevant in terms of the underlying assumptions of its inferences.  Each study subject is not identical except for randomly distributed 'noise', whether in our measurement or in its fate.

Life has properties we can measure and assign average values to, like the average reproductive success of AA, Aa, and aa genotypes at a given gene. But that is a retrospective average, and it is contrary to what we know about evolution to assume that, say, all AA's have the same fitness parameter and their reproductive variation is only due to chance sampling from that parameter.
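To make that concrete with a made-up sketch (the genotype labels and all numbers below are illustrative assumptions, not data): a retrospective mean fitness per genotype is easy to compute, but it averages over individuals whose reproductive outcomes differ for their own unique reasons:

```python
import numpy as np

rng = np.random.default_rng(7)

# Hypothetical offspring counts for 50 individuals of each genotype at one gene.
# Each individual gets its own expected success, crudely mimicking unique
# genomic backgrounds and environments, on top of the chanciness of reproduction itself.
offspring = {
    "AA": rng.poisson(lam=rng.uniform(1.0, 3.0, size=50)),
    "Aa": rng.poisson(lam=rng.uniform(1.5, 2.5, size=50)),
    "aa": rng.poisson(lam=rng.uniform(0.5, 3.5, size=50)),
}

for genotype, counts in offspring.items():
    # The retrospective 'fitness' is just an average over these unique outcomes.
    print(f"{genotype}: mean fitness = {counts.mean():.2f}, "
          f"sd among individuals = {counts.std():.2f}")
```

The single number per genotype is a convenient summary, not evidence that every carrier shares one underlying fitness parameter.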

Thinking of life in parametric terms is a convenience, but is an approximation of unknown and often unknowable inaccuracy.  Evolution occurs over countless millennia, in which the non-parametric aspects can be dominating.  We can estimate, say, recombination or mutation or fitness values from retrospective data, but they are not parameters that we can rigorously apply to the future and they typically are averages among sampled individuals.

Genetic effects are unique to each background and environmental experience, and we should honor that uniqueness as such!  The statistical crisis that many are trying valiantly to explain away, so they can return to business as usual (even if not reporting p values), is a crisis of convenience, because it makes us think that a bit of different reportage (confidence limits rather than p values, for example) will cure all ills.  That is a convenient port-in-a-storm, but an illusory fix.  It does not recognize the important, or even central, degree to which life is not a parametric phenomenon.

The Knowledge Factory Crisis: A different, anthropological way to view universities

Nothing we humans do lives up to its own mythology. We are fallible, social, competitive, acquisitive, our understanding is incomplete, and we have competing interests to address, in our lives and as a society.  I posted yesterday about universities as 'knowledge factories', reacting to a BBC radio program that discussed what is happening in universities, when research findings seem unrepeatable.

That program, and my discussion of what is going on at universities, took the generally expressed view of what universities are supposed to be, and examined how that is working.  The discussion concerned technical aspects that related to the nature of scientific information universities address or develop.  That is, in this context, their 'purpose' for being.  How well do they live up to what they are 'supposed' to be?

Many of my points in the post were about what faculty jobs are like these days, and the way in which pressures lead to the over-claiming of findings, and so on.  I made some suggestions that, in principle, could help science live up to its ideal.

Here in this post, however, I want to challenge what I have said about this.  Instead, I want to take a somewhat distanced viewpoint, looking at universities from the outside, in a standard kind of viewpoint that anthropologists take, rather than simply accepting universities' own assessments of what they are about.

Doing poorly by their ideal standard
My post noted ways in which universities have become not just a 'knowledge factory', but more crass business factories, as making money increasingly and blatantly over-rides their legitimate--or at least, stated--role as idea and talent engines for society.  Here's a story from a few years ago about that, which is still cogent.  The fiscal pursuit discussed in this post is part of the phenomenon.  As universities are run more and more as businesses, which happens even in state universities, they become more exclusive, belying their original objective which (as in the land-grant public universities) was to make higher education available to everyone.  In addition to becoming money makers themselves, academia has become a boon for student-loan bankers, too.

But this is a criticism of university-based science, and expressed as it relates to how universities are structured.  That structure, even in science, leads to problems of science.  One might think that something so fundamentally wrong would be easy to see and to correct.  But perhaps not, because universities are not isolated from society--they are of society, and therein lies some deep truth.

Excelling hugely as viewed anthropologically
If you stop examining how universities compare to their ideals, or to what most people would tell you universities were for, and instead look at them as parts of society, a rather different picture emerges.

Universities are a huge economic engine of society.  They garner their very large incomes from various sources: visitors to their football and basketball stadiums, students whose borrowed money pays tuition, and agencies private and public that pour in money for research.  Whether or not they are living up to some ideal function or nature, they are a major and rather independent part of our economy.

Their employees, from their wildly overpaid presidents, down to the building custodians, span every segment of society.  The money universities garner pays their salaries, and buys all sorts of things on the open commercial economy, thereby keeping many other people gainfully employed.  Their activities (such as the major breakthrough discoveries they announce almost daily) generate material and hence income for the media industries, print and electronic, which in turn helps feed those industries and their relevant commercial influences (such as customers, television sales, and more).

Human society is a collective way for us human organisms to extract our living from Nature.  We compete as individuals in doing this, and that leads to hierarchies.  Overall, over time, societies have evolved such that these structures extract ever more resources and energy.  Via various cultural ideologies we are able to keep things going smoothly enough, at least internally, so as not to disrupt this extractive activity.

Religion, ownership hierarchies, imperialism, militaries, and other groups have self-justifications that make people feel they belong.  This contributes to building pyramids--whether they be literal, or figurative such as religions, universities, armies, political entities, social classes, or companies.  Often the justification is religious--nobility by divine right, conquest as manifest destiny, and so on.  That not one of these resulting societal structures lives up to its own ideology has long been noted.  Why should we expect universities to be any different?  These are the cultural ways people organize themselves to extract resources for themselves.

Universities are parasites on society, very hierarchical with obscenely overpaid nobles at the top?  They show no limits on the trephining they do on those who depend on them, such as graduating students with life-burdening debt?  They churn through those who come to them for whom they claim to 'provide' the good things in life?  Of course!  Like it or not, by promising membership and a better life, they are just like religions or political classes or corporations!

Institutions may be so caught up in their belief systems that they don't adapt to the times or competitors, or they may change their actions (if not always their self-description).  If they don't adapt they eventually crumble and are replaced by new entities with new justifications to gain popular appeal or acceptance.  However, fear not, because relative to their actual (as opposed to symbolic) role in societies, universities are doing very well: at present, they very clearly show their adaptability.

In this anthropological sense, universities are doing exceedingly well, far better than ever before, churning resources and money over far faster than ever before.  Grumps (like us) may point out the failure to live up to our own purported principles--but how is that different from any other engine of society?

In that anthropological sense, whether educating people 'properly' or not, whether or not their claimed discoveries stand up to scrutiny, universities are doing very, very, very well.  And that, not the purported reason that an institution exists, is the measure of how and why societal institutions persist or expand.  Hypocrisy and self-justification, or even self-mythology, are always part of social organization.  A long-standing anthropological technique for understanding distinguishes what are called emics from etics: what people say they do, from what they actually do.

Yes, there will have to be some shrinkage with demographic changes, and fewer students attending college, but that doesn't change the fact that, by material measures, universities are incredibly successful parts of society.

What about the intended material aspect of the knowledge factory--knowledge?
But there is another side to all of this, which takes us back to science itself, and which I think is actually important, even if it is naive or pointless to crab at the hypocrisies of science that are explicable in deep societal terms.

This has to do with knowledge itself, and with science on its own terms and goals.  It relates to what could, at least in principle, advance the science itself (assuming such changes could happen without first threatening science's and scientists' and universities' assets).  That will be the subject of our next post.

The 'knowledge factory'

This post reflects much that is in the science news, in particular our current culture's romance with data (or, to be more market-savvy about it, Big Data).  I was led to write this after listening to a BBC Radio program, The Inquiry, an ongoing series of discussions of current topics.  This particular episode is titled Is The Knowledge Factory Broken?

Replicability: a problem and a symptom
The answer is pretty clearly yes.  One of the clearest bits of evidence is the now widespread recognition that too many scientific results, even those published in 'major' journals, are not replicable.  When even the same lab tries to reproduce previous results, they often fail.  The biggest recent noise on this has been in the social, psychological, and biomedical sciences, but The Inquiry suggests that chemistry and physics also have this problem.  If this is true, the bottom line is that we really do have a general problem!

But what is the nature of the problem?  If the world out there actually exists and is the result of physical properties of Nature, then properly done studies that aim to describe that world should mostly be replicable.  I say 'mostly' because measurement and other wholly innocent errors may lead to some false conclusions.  Surprise findings that are the luck of the draw, just innocent flukes, draw headlines and are selectively accepted by the top journals.  Properly applied, statistical methods are designed to account for these sorts of things.  Even then, in what is very well known as the 'winner's curse', there will always be flukes that survive the test, are touted by the major journals, but pass into history unrepeated (and often unrepentant).
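A small simulation makes the winner's curse concrete (this is my own illustrative sketch, with assumed sample sizes and a conventional 0.05 cutoff, not anything from the program): when many studies of a non-existent effect are run and only the 'significant' ones are headlined, the selected winners mostly fail to replicate:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(3)

ALPHA = 0.05
N_STUDIES, N_PER_ARM = 1000, 20

def run_study() -> float:
    # Assume nothing is actually going on: both arms come from the same distribution.
    a = rng.normal(0, 1, N_PER_ARM)
    b = rng.normal(0, 1, N_PER_ARM)
    return stats.ttest_ind(a, b).pvalue

winners = sum(run_study() < ALPHA for _ in range(N_STUDIES))

# Try to replicate each 'winner' with a fresh, identical study.
replicated = sum(run_study() < ALPHA for _ in range(winners))

print(f"{winners} 'discoveries' out of {N_STUDIES} null studies; "
      f"{replicated} replicated on a second try")
```

Roughly 5% of the null studies pass the cutoff, and only about 5% of those pass again, which is what 'pass into history unrepeated' looks like in miniature.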

This, however, is just the tip of the bad-luck iceberg.  Non-reproducibility is so much more widespread that what we face is more a symptom of underlying issues in the nature of the scientific enterprise itself today than an easily fixable problem.  The best fix is to own up to the underlying problem, and address it.

Is it rats, or scientists who are in the treadmill?
Scientists today are in a rat-race, self-developed and self-driven, out of insatiability for resources, ever-newer technology, faculty salaries, hungry universities....and this system can arguably be said to inhibit better ideas.  One can liken the problem to the famous skit in a candy factory, on the old TV show I Love Lucy.  That is how it feels to many of those in academic science today.

This Inquiry episode about the broken knowledge factory tells it like it is....almost.  Despite concluding that science is "sending careers down research dead-ends, wasting talent and massive resources, misleading all of us", in my view, this is not critical enough.  The program suggests what I think are plain-vanilla, clearly manipulable 'solutions'.  They suggest researchers should post their actual data and computer program code in public view so their claims could be scrutinized, that researchers should have better statistical training, and that we should stop publishing just flashy findings.  In my view, this doesn't stress the root and branch reform of the research system that is really necessary.

Indeed, some of this is being done already.  But the deeper practical realities are that scientific reports are typically very densely detailed, investigators can make weaknesses hard to spot (this can be done inadvertently, or sometimes intentionally as authors try to make their findings dramatically worthy of a major journal--and here I'm not referring to the relatively rare actual fraud).

A deeper reality is that everyone is far too busy on what amounts to a research treadmill. The tsunami of papers and their online supporting documentation is far too overwhelming, and other investigators, including readers, reviewers and even co-authors are far too busy with their own research to give adequate scrutiny to work they review. The reality is that open-publishing of raw data and computer code etc. will not generally be very useful, given the extent of the problem.

Science, like any system, will always be imperfect because it's run by us fallible humans.  But things can be reformed, at least, by clearing the money and job-security incentives out of the system--really digging out what the problem is.  How we can support research better, to get better research, when it certainly requires resources, is not so simple, but is what should be addressed, and seriously.

We've made some of these points before, but with apology, they really do bear stressing and repeating.  Appropriate measures should include:

     (1) Stop paying faculty salaries on grants (have the universities who employ them, pay them);

     (2) Stop using manipulable score- or impact-factor counting of papers or other counting-based items to evaluate faculty performance, and try instead to evaluate work in terms of better measures of quality rather than quantity;

     (3) Stop evaluators from considering grants secured when evaluating faculty members;

     (4) Place limits on money, numbers of projects, students or post-docs, and even a seniority cap, for any individual investigator;

     (5) Reduce university overhead costs, including the bevy of administrators, to reduce the incentive for securing grants by any means;

     (6) Hold researchers seriously accountable, in some way, for their published work in terms of its reproducibility or claims made for its 'transformative' nature.

     (7) Grants should be smaller in amount, but more numerous (helping more investigators) and for longer terms, so one doesn't have to start scrambling for the next grant just after having received the current one.

     (8) Every faculty position whose responsibilities include research should come with at least adequate baseline working funds, not limited to start-up funds.

     (9)  Faculty should be rewarded for doing good research that does not require external funding but does address an important problem.

     (10)  Reduce the number of graduate students, at least until the overpopulation ebbs as people retire, or, at least, remove such number-counts from faculty performance evaluation.

Well, these are perhaps snarky and repetitive bleats.  But real reform, beyond symbolic band-aids, is never easy, because so many people's lives depend on the system, one we've been building over more than a half-century to what it is today (some authors saw this coming decades ago and wrote with warnings). It can't be changed overnight, but it can be changed, and it can be done humanely.

The Inquiry program reflects things now more often being openly acknowledged. Collectively, we can work to form a more cooperative, substantial world of science.  I think we all know what the problems are.  The public deserves better.  We deserve better!

P.S.:  In the next post, I'll consider a more 'anthropological' way of viewing what is happening to our purported 'knowledge factory'.

Even deeper, in regard to the science itself, and underlying many of these issues are aspects of the modes of thought and the tools of inference in science.  These have to do with fundamental epistemological issues, and the very basic assumptions of scientific reasoning.  They involve ideas about whether the universe is actually universal, or is parametric, or its phenomena replicable.  We've discussed aspects of these many times, but will add some relevant thoughts in the near future.

The state of play in science

I've just read a new book that MT readers would benefit from reading as well.  It's Rigor Mortis, by Richard Harris (2017: Basic Books).  His subtitle is How sloppy science creates worthless cures, crushes hope, and wastes billions.  One might suspect that this title is stridently overstated, but while it is quite forthright--and its argument well-supported--I think the case is actually understated, for reasons I'll explain below.

Harris, a science reporter for National Public Radio, goes over many different problems that plague biomedical research.  At the core is the reproducibility problem, that is, the number of claims in research papers that are not reproducible by subsequent studies.  This particular problem made the news within the last couple of years in regard to using statistical criteria like p-values (significance cutoffs), and because of the major effort in psychology to replicate published studies, with a lot of failure to do so.  But there are other issues.

The typical scientific method assumes that there is a truth out there, and a good study should detect its features.  But if it's a truth, then some other study should get similar results.  But many, many times in biomedical research, despite huge media ballyhoo with cheerleading by the investigators as well as the media, studies' breakthrough!! findings can't be supported by further examination.

As Harris extensively documents, this phenomenon is seen in claims of treatments or cures, or use of animal models (e.g., lab mice), or antibodies, or cell lines, or statistical 'significance' values.  It isn't a long book, so you can quickly see the examples for yourself.  Harris also accounts for the problems, quite properly I think, by documenting sloppy science but also the careerist pressures on investigators to find things they can publish in 'major' journals, so they can get jobs, promotions, high 'impact factor' pubs, and grants. In our obviously over-crowded market, it can be no surprise to anyone that there is shading of the truth, a tad of downright dishonesty, conveniently imprecise work, and so on.

Since scientists feed at the public trough (or depend on profits from sales of biomedical products to grant-funded investigators), they naturally have to compete and don't want to be shown up, and they have to work fast to keep the funds flowing in.  Rigor Mortis properly homes in on an important fact, that if our jobs depend on 'productivity' and bringing in grants, we will do what it takes, shading the truth or whatever else (even the occasional outright cheating) to stay in the game.

Why share data with your potential competitors who might, after all, find fault with your work or use it to get the jump on you for the next stage?  For that matter, why describe what you did in enough actual detail that someone (a rival or enemy!) might attempt to replicate your work.....or fail to do so? Why wait to publish until you've got a really adequate explanation of what you suggest is going on, with all the i's dotted and t's crossed?  Haste makes credit!  Harris very clearly shows these issues in the all-too human arena of our science research establishment today.  He calls what we have now, appropriately enough, a "broken culture" of science.

Part of that I think is a 'Malthusian' problem.  We are credited, in score-counting ways, by chairs and deans, for how many graduate students we turn (or churn) out.  Is our lab 'productive' in that way?  Of course, we need that army of what often are treated as drones because real faculty members are too busy writing grants or traveling to present their (students') latest research to waste--er, spend--much time in their labs themselves.  The result is the cruel excess of PhDs who can't find good jobs, wandering from post-doc to post-doc (another form of labor pool), or to instructorships rather than tenure-track jobs, or who simply drop out of the system after their PhD and post-docs.  We know of many who are in that boat; don't you?  A recent report showed that the mean age of first grant from NIH was about 45: enough said.

A reproducibility mirage
If there were one central technical problem that Harris stresses, it is the number of results that fail to be reproducible in other studies.  Irreproducible results leave us in limbo-land: how are we to interpret them?   What are we supposed to believe?  Which study--if any of them--is correct?  Why are so many studies proudly claiming dramatic findings that can't be reproduced, and/or why are the news media and university PR offices so loudly proclaiming these reported results?  What's wrong with our practices and standards?

Rigor Mortis goes through many of these issues, forthrightly and convincingly--showing that there is a problem.  But a solution is not so easy to come by, because it would require major shifting of and reform in research funding.  Naturally, that would be greatly resisted by hungry universities and those whom they employ to set up a shopping-mall on their campus (i.e., faculty).

One purpose of this post is to draw attention to the wealth of reasons Harris presents for why we should be concerned about the state of play in biomedical research (and, indeed, in science more generally).  I do have some caveats, that I'll discuss below, but that is in no way intended to diminish the points Harris makes in his book.  What I want to add is a reason why I think that, if anything, Harris' presentation, strong and clear as it is, understates the problem.  I say this because to me, there is a deeper issue, beyond the many Harris enumerates: a deeper scientific problem.

Reproducibility is only the tip of the iceberg!
Harris stresses or even focuses on the problem of irreproducible results.  He suggests that if we were to hold far higher evidentiary standards, our work would be reproducible, and the next study down the line wouldn't routinely disagree with its predecessors.  From the point of view of careful science and proper inferential methods and the like, this is clearly true.  Many kinds of studies in biomedical and psychological sciences should have a standard of reporting that leads to at least some level of reproducibility.

However, I think that the situation is far more problematic than sloppy and hasty standards, or questionable statistics, even if these are clearly prominent ones.  My view is that no matter how high our methodological standards are, the expectation of reproducibility flies in the face of what we know about life.  That is because life is not a reproducible phenomenon in the way physics and chemistry are!

Life is the product of evolution.  Nobody with open eyes can fail to understand that, and this applies to biological, biomedical, psychological and social scientists.  Evolution is at its very core a phenomenon that rests essentially on variation--on not being reproducible.  Each organism, indeed each cell, is different. Not even 'identical' twins are identical.

One reason for this is that genetic mutations are always occurring, even among the cells within our bodies. Another reason is that no two organisms are experiencing the same environment, and environmental factors affect and interact with the genomes of each individual organism of any species.  Organisms affect their environments in turn. These are dynamic phenomena and are not replicable!

This means that, in general, we should not be expecting reproducibility of results.  But one shouldn't overstate this, because the fact that two humans are different obviously doesn't mean they are entirely different.  Similarity is correlated with kinship, from first-degree relatives to members of populations, species, and different species.  The problem is not that there is similarity, it is that we have no formal theory about how much similarity.  We know two samples of people will differ both among those in each sample and between samples.  And even the same people sampled at separate times will be different, due to aging, exposure to different environments and so on.  Proper statistical criteria and so on can answer questions about whether differences seem due only to sampling from variation or to causal differences.  But that is a traditional assumption from the origin of statistics and probability, and isn't entirely apt for biology: since we cannot assume identity of individuals, much less of samples or populations (or species, as in using mouse models for human disease), our work requires some understanding of how much difference, or what sort of difference, we should expect--and build into our models and tests etc.

Evolution is by its very nature an ad hoc phenomenon in both time and place, meaning that there are no fixed rules about this, as there are laws of gravity or of chemical reactions. That means that reproducibility is not, in itself, even a valid criterion for judging scientific results.  Some reproducibility should be expected, but we have no rule for how much and, indeed, evolution tells us that there is no real rule for that.

One obvious and not speculative exemplar of the problem is the redundancy in our systems. Genomewide mapping has documented this exquisitely well: if variation at tens, hundreds, or sometimes even thousands of genome sites affects a trait, like blood pressure, stature, or 'intelligence', and no two people have the same genotype, then no two people, even with the same trait measure, have that measure for the same reason.  And as is very well known, mapping only accounts for a fraction of the estimated heritability of the studied traits, meaning that much or usually most of the contributing genetic variation is unidentified.  And then there's the environment. . . . .
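A toy genetic-architecture sketch can illustrate the redundancy argument (all numbers here are made-up assumptions, not estimates from any mapping study): if a trait reflects small additive contributions from hundreds of variable sites, the two people in a sample whose trait values are most alike still differ at a large fraction of those sites:

```python
import numpy as np

rng = np.random.default_rng(11)

N_SITES, N_PEOPLE = 500, 200

# Assumed small additive effects and allele frequencies (a deliberate oversimplification).
effects = rng.normal(0, 0.1, N_SITES)
allele_freqs = rng.uniform(0.05, 0.5, N_SITES)

# Each person's genotype: 0, 1, or 2 copies of the 'effect' allele at each site.
genotypes = rng.binomial(2, allele_freqs, size=(N_PEOPLE, N_SITES))
trait = genotypes @ effects

# Find the pair of people with the most similar trait values...
order = np.argsort(trait)
k = int(np.argmin(np.diff(trait[order])))
i, j = order[k], order[k + 1]

# ...and count how many of the contributing sites nonetheless differ between them.
print(f"trait difference between the closest pair: {abs(trait[i] - trait[j]):.4f}")
print(f"genotype sites at which they differ: {int((genotypes[i] != genotypes[j]).sum())}")
```

That, in miniature, is the sense in which no two people have the same measure for the same reason.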

It's a major problem. It's an inconvenient truth.  The sausage-grinder system of science 'productivity' cannot deal with it.  We need reform.  Where can that come from?

The Law of No Restraint

There's a new law of science reporting or, perhaps more accurately put, of the science jungle.  The law is to feed any story, no matter how fantastic, to science journalists (including your university's PR spinners), and they will pick up whatever can be spun into a Big Story, and feed it to the eager mainstream media.  Caveats may appear somewhere in the stories, but not in the headlines, so that, however weak or tentative or incredible, the story gets its exposure anyway.  Then on to tomorrow's over-sell.

One rationale for this is that unexpected findings--typically presented breathlessly as 'discoveries'--sell: they rate the headline. The caveats and doubts that might un-headline the story may be reported as well, but often buried in minimal terms late in the report.  Even if the report balances skeptics and claimants, simply publishing the story is enough to give at least some credence to the discovery.

The science journalism industry is heavily inflated in our commercial, 24/7 news environment. It would be better for science, if not for sales, if all these hyped papers, rather than being publicized at the time the paper is published, first appeared in musty journals for specialists to argue over, and in the pop-sci news only after some mature judgments are made about them.  Of course, that's not good for commercial or academic business.

We have just seen a piece reporting that humans were in California something like 135,000 years ago, rather than the well-established continental dates of about 12,000 years.  The report, which I won't grace by citing here (and you've probably seen it anyway), then went on to speculate about what 'species' of our ancestors these early guys might have been.

Why is this so questionable?  If it were a finding on its own, it might seem credible, but given the plethora of skeletal and cultural archeological findings, up and down the Americas, such an ancient habitation seems a stretch.  There is no comparable trail of earlier settlements in northeast Asia or Alaska that might suggest it, and there are lots of animal and human archeological remains--all basically consistent with each other, so why has no earlier finding yet been made?  It is of course possible that this is the first and is a correct one, but it is far too soon for this to merit a headline story, even with caveats.

Another piece we saw today reported that a new analysis casts doubt on whether diets high in saturated fat are bad for you.  This was a meta-analysis of various other studies that have been done, and got some headline treatment because the authors report that, contrary to many findings over many years, saturated fats don't clog arteries. Instead, they say, coronary heart disease is a chronic inflammatory condition.  Naturally, the study's basic data are being challenged, as reflected in this story's discussion, by critiques of its data and method.  These get into details we're not qualified to judge, and we can't comment on the relative merits of the case.

However, one thing we can note is that with respect to coronary heart disease, study after study has reported more or less the same, or at least consistent findings about the correlation between saturated fats and risk. Still, despite so very much careful science, including physiological studies as well as statistical analysis of population samples, can we still apparently not be sure about a dietary component that we've been told for years should play a much reduced role in what we eat?  How on earth could we possibly still not know about saturated fat diets and disease risk?

If this very basic issue is unresolved after so long, and the story is similar for risk factors for many complex diseases, then what is all this promise of 'precise' medicine all about?  Causal explanations are still fundamentally unclear for many cancers, dementias, psychiatric disorders, heart disease, and so on.  So why isn't the most serious conclusion that our methods and approaches themselves are for some reason simply not adequate to answer such seemingly simple questions as 'is saturated fat bad for you?'  Were the plethora of previous studies all flawed in some way?  Is the current study?  Do the publicizing of the studies themselves change behaviors in ways that affects future studies?

There may be no better explanation than that diets and physiology are hard to measure and are complex, and that no simple answer is true.  We may all differ for genetic and other reasons to such an extent that population averages are untrustworthy, or our habits may change enough that studies don't get consistent answers.  Or asking about one such risk factor when diets and lifestyles are complex is a science modus operandi that developed for studying simpler things (like exposure to toxins or bacteria, the basis of classical epidemiology), and we simply need a better gestalt from which to work.

Clearly a contributory sociological factor is that the science industry has simply been cruising down the same rails despite constant popping of promise bubbles, for decades now.  It's always more money for more and bigger studies.  It's rarely let's stop and take a deep breath and think of some better way to understand (in this case) dietary relationships to physical traits.  In times past, at least, most stories like the ancient Californian didn't get ink so widely and rapidly.  But if I'm running a journal, or a media network, or am a journalist needing to earn my living, and I need to turn a buck, naturally I need to write about things that aren't yet understood.

Unfortunately, as we've noted before, the science industry is a hungry beast that needs its continual feeding, and (like our 3 cats) always demands more, more, and more.  There are ways we could reform things, at least up to a point.  We'll never end the fact that some scientists will claim almost anything to get attention, and we'll always be faced with data that suggest one thing that doesn't turn out that way.  But we should be able to temper the level of BS and get back more to sober science rather than sausage factory 'productivity'.  And educate the public that some questions can't be answered the way we'd like, or aren't being asked in the right way.  But that is something science might address effectively, if it weren't so rushed and pressured to 'produce'.

Reforming research funding and universities

Any aspect of society needs to be examined on a continual basis to see how it could be improved.  University research, such as that which depends on grants from the National Institutes of Health, is one area that needs reform. It has gradually become an enormous, money-directed, and largely self-serving industry, and its need for external grant funding turns science into a factory-like industry, which undermines what science should be about, advancing knowledge for the benefit of society.  

The Trump policy, if there is one, is unclear, as with much of what he says on the spur of the moment. He's threatened to reduce the NIH budget, but he's also said to favor an increase, so it's hard to know whether this represents whims du jour or policy.  But regardless of what comes from on high, it is clear to many of us with experience in the system that health and other science research has become very costly relative to its promise and too largely mechanical rather than inspired.

For these reasons, it is worth considering what reforms could be taken--knowing that changing the direction of a dependency behemoth like NIH research funding has to be slow because too many people's self-interests will be threatened--if we were to deliver in a more targeted and cost-efficient way on what researchers promise.  Here's a list of some changes that are long overdue.  In what follows, I have a few FYI asides for readers who are unfamiliar with the issues.

1.  Reduce grant overhead amounts
FYI:  Federal grants come with direct and indirect costs.  Direct costs pay the research staff, the supplies and equipment, travel and collecting data and so on.  Indirect costs are worked out for each university, and are awarded on top of the direct costs--and given to the university administrators.  If I get $100,000 on a grant, my university will get $50,000 or more, sometimes even more than $100K.  Their claim to this money is that they have to provide the labs, libraries, electricity, water, administrative support and so on, for the project, and that without the project they'd not have these expenses. Indeed, an indicator of the fat that is in overhead is that as an 'incentive' or 'reward', some overhead is returned as extra cash to the investigator who generated it.
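For readers who want to see how the arithmetic works, here is a trivial worked sketch (the 50% rate is an assumption for illustration; actual negotiated indirect-cost rates differ by university):

```python
def total_award(direct_costs: float, indirect_rate: float) -> float:
    """Total cost to the funder: direct research costs plus university overhead."""
    return direct_costs * (1 + indirect_rate)

direct = 100_000   # what the research itself is budgeted to cost
rate = 0.50        # an assumed negotiated indirect-cost rate

print(f"direct: ${direct:,.0f}, overhead: ${direct * rate:,.0f}, "
      f"total: ${total_award(direct, rate):,.0f}")
```

At that assumed rate, every grant dollar that reaches the lab brings the administration fifty cents on top of it.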

University administrations have notoriously been ballooning.  Administrators and their often fancy offices depend on individual grant overhead, which naturally puts intense pressure on faculty members to 'deliver'.  Educational institutions should be lean and efficient. Universities should pay for their own buildings and libraries and pare back bureaucracy. Some combination of state support, donations, and bloc grants could be developed to cover infrastructure, if not tied to individual projects or investigators' grants. 

2.  No faculty salaries on grants
FYI:  Federal grants, from NIH at least, allow faculty investigators' salaries to be paid from grant funds.  That means that in many health-science universities, the university itself is paying only a fraction, often tiny and perhaps sometimes none, of their faculty's salaries.  Faculty without salary-paying grants will be paid some fraction of their purported salaries, and often for a limited time only.  And salaries generate overhead, so there is every incentive to pay them well: higher pay, higher overhead for administrators!  Duh, a no-brainer!

Universities should pay their faculty's salaries from their own resources.  Originally, faculty investigators' salaries were, in my understanding, paid on grants so the university could hire temporary faculty to cover the PI's teaching and administrative obligations while s/he was doing the research.  Otherwise, if they're already paid to do research, what's the need?  Faculty salaries paid on grants should only be allowed to be used in this way, not just as a source of cash.  Faculty should not be paid on soft money, because the need to hustle one's salary steadily is an obvious corrupting force on scientific originality and creativity.

3.  Limit on how much external funding any faculty member or lab could have
There is far too much reward for empire-builders. Some do, or at least started out doing, really good work, but that's not always the case, and diminishing returns for expanding cost are typical.  One consequence is that new faculty are getting reduced teaching and administrative duties so they can (must!) write grant applications. Research empires are typically too large to be effective, often have absentee PIs off hustling, and are under pressure to keep the factory running.  That understandably generates intense pressure to play it safe (though claiming to be innovative); but good science is not a predictable factory product.

4.  A unified national health database
We need health care reform, and if we had a single national health database it would reduce medical costs and could be anonymized so research could be done, by any qualified person, without additional grants.  One can question the research value of such huge databases, as is true even of the current ad hoc database systems we pay for, but they would at least be cost-effective.

5. Temper the growth ethic 
We are over-producing PhDs, largely to satisfy the current faculty game in which status is gained by having large labs.  There are too many graduate students and post-docs for the long-term job market.  This is taking a heavy personal toll on aspiring scientists.  Meanwhile, there is inertia at the top, where we have been prevented from imposing mandatory retirement ages.  Amicably changing this system will be hard and will require creative thinking; but it won't be as cruel as the system we have now.

6. An end to deceptive publication characteristics  
We routinely see papers listing more authors than there are residents in the NY phone book.  This is pure careerism in our factory-production mode.  As once was the standard, every author should in principle be able to explain his/her paper on short notice.  I've heard 15 minutes. Those who helped on a paper, such as by providing some DNA samples, should be acknowledged, but not listed as authors. Dividing papers into least-publishable-units isn't new, but with the proliferation of journals, it's out of hand.  Limiting CV lengths (and not including grants on them) when it comes to promotion and tenure could focus researchers' attention on doing what's really important rather than chaff-building.  Chairs and Deans would have to recognize this, and move away from safe but gameable bean-counting.

FYI: We've moved towards judging people internally, and sometimes externally in grant applications, on the quantity of their publications rather than the quality, or on supposedly 'objective' (computer-tallied) citation counts.  This is play-it-safe bureaucracy and obviously encourages CV padding, which is reinforced by the proliferation of for-profit publishing.  Of course some people are both highly successful in the real scientific sense of making a major discovery, as well as in publishing their work.  But it is naive not to realize that many, often the big players grant-wise, manipulate any counting-based system.  For example, they can cite their own work in ways that increase the 'citation count' that Deans see.  Papers with very many authors also lead to credit-claiming that is highly exaggerated relative to the actual scientific contribution.  Scientists quickly learn how to manipulate such 'objective' evaluation systems.

7.  No more too-big-and-too-long-to-kill projects
The Manhattan Project and many others taught us that if we propose huge, open-ended projects we can have funding for life.  That's what the 'omics era and other epidemiological projects reflect today.  But projects that are so big they become politically invulnerable rarely continue to deliver the goods.  Of course, the PIs, the founders and subsequent generations, naturally cry that stopping their important project after having invested so much money will be wasteful!  But it's not as wasteful as continuing to invest in diminishing returns.  Project duration should be limited and known to all from the beginning.

8.  A re-recognition that science addressing focal questions is the best science
Really good science is risky because serious new findings can't be ordered up like hamburgers at McD's.  We have to allow scientists to try things.  Most ideas won't go anywhere.  But we don't have to allow open-ended 'projects' to scale up interminably as has been the case in the 'Big Data' era, where despite often-forced claims and PR spin, most of those projects don't go very far, either, though by their size alone they generate a blizzard of results. 

9. Stopping rules need to be in place  
For many multi-year or large-scale projects, an honest assessment part-way through would show that the original question or hypothesis was wrong or won't be answered.  Such a project (and its funds) should have to be ended when it is clear that its promise will not be met.  It should be a credit to an investigator who acknowledges that an idea just isn't working out, and those who don't should be barred for some years from further federal funding.  This is not a radical new idea: it is precedented in the drug trial area, and we should do the same in research.  

It should be routine for universities to provide continuity funding for productive investigators so they don't have to cling to go-nowhere projects. Faculty investigators should always have an operating budget so that they can do research without an active external grant.  Right now, they have to piggy-back their next idea by using funds in their current grant, and without internal continuity funding, this naturally leads to safe 'fundable' projects, rather than really innovative ones.  The reality is that truly innovative projects typically are not funded, because it's easy for grant review panels to fault-find and move on to the safer proposals.

10. Research funding should not be a university welfare program
Universities are important to society and need support.  Universities as well as scientists become entrenched.  It's natural.  But society deserves something for its funding generosity, and one of the facts of funding life could be that funds move.  Scientists shouldn't have a lock on funding any more than anybody else. Universities should be structured so they are not addicted to external funding on grants. Will this threaten jobs?  Most people in society have to deal with that, and scientists are generally very skilled people, so if one area of research shrinks others will expand.

11.  Rein in costly science publishing
Science publishing has become what one might call a greedy racket.  There are far too many journals, rushing out half-way reviewed papers for pay-as-you-go authors.  Papers are typically paid for on grant budgets (though one can ask how often young investigators shell out their own personal money to keep their careers going).  Profiteering journals are proliferating to serve the CV-padding, hyper-hasty, bean-counting science industry that we have established.  Yet the vast majority of papers have basically no impact.  That money should go to actual research.

12.  Other ways to trim budgets without harming the science 
Budgets could be trimmed in many other ways, too:  no buying journal subscriptions on a grant (universities have subscriptions), less travel to meetings (we have Skype and Hangout!), shared costly equipment rather than a sequencer in every lab.  Grants should be smaller but of longer duration, so investigators can spend their time on research rather than hustling new grants. Junk the use of 'impact' factors and other bean-counting ways of judging faculty.  It had a point once--to reduce discrimination and be more objective--but it's long been strategized and manipulated, substituting quantity for quality.  Better evaluation means are needed.

These suggestions are perhaps rather radical, but to the extent that they can somehow be implemented, it would have to be done humanely.  After all, people playing the game today are only doing what they were taught they must do.  Real reform is hard because science is now an entrenched part of society.  Nonetheless, a fair-minded (but determined!) phase-out of the abuses that have gradually developed would be good for science, and hence for the society that pays for it.

***NOTES:  As this was being edited, NY state has apparently just made its universities tuition-free for those whose families are not wealthy.  If true, what a step back towards sanity and public good!  The more states can get off the hooks of grants and strings-attached private donations, the more independent they should be able to be.

Also, the Apr 12 Wall St Journal has a story (paywall, unless you search for it on Twitter) showing the faults of an over-stressed health research system, including some of the points made here.  The article points out problems of non-replicability and other technical mistakes that are characteristic of our heavily over-burdened system.  But it doesn't go after the System as such, the bureaucracy and wastefulness and the pressure for 'big data' studies rather than focused research, and the need to be hasty and 'productive' in order to survive.

Paid To Prey (PTP) journals

In the bad old days if you as a scientist had something worth saying, a journal would (after vetting through a mainly fair confidential review system) publish it.  If you had good things to say, whether or not you had grants, your ideas were heard, and you could make a career on the basis of the depth of your thought, your careful results, and so on.

If you needed funds to do your research, such as to travel or run a laboratory, well, you needed a grant to do your work.  This was the system we all knew.  You had to have funding, but you couldn't just pay your way through to publishing.  Also, if you were junior, start-up funds were typically made available if you needed them, to give you a leg up and a chance to get your career going.

Publishing has always had costs, of course, but the journals survived by library and personal subscriptions, often based on professional society memberships, where the fees were modest, especially for the most junior members.

Now what we have is a large pay-to-play (PTP) industry.  Pay-to-play journals are almost synonymous with corruption.  The mass of nearly-criminal ones prey on the career fears of desperate students, post-docs, and faculty (especially junior faculty, perhaps).  Even the honest PTP journals, of which there are many, essentially prey on investigators, and taxpayers, but the horde of dishonorable ones are no better than highwaymen, robbing the most vulnerable.  A story in the NY Times exposes some of the schemes and scams of the dishonorable PTPers.  But it doesn't go nearly far enough.

How cruel is this rat race?  Where does the PTP money come from?
We have every moral as well as fiscal right to ask where the PTP money is coming from.  Are low-paid, struggling post-docs, students, junior or even more senior faculty members using their own personal funds to keep in the publication score-counting game?  How much taxpayer money goes, even via legitimate grants, to these open-access publishers rather than to the research costs for which these grants were intended?  In the past, you might have had to pay for color figures, or for reprints, and these costs did come generally from grant funds, but they were not very expensive.  And of course grants often pay for faculty salaries (a major corruption of the system that nobody seems able to fix and on which too many depend to criticize).

The idea of open-access journals sounded good, and not like a private-profiteering scam.  But too many have turned out to be the latter, chickens laying golden eggs even for the better journals, when there is profit to be made. The original, or at least more publicly proclaimed, open-access idea was that even if you couldn't afford a subscription or didn't have access to a university library--especially, for example, if you were in a country with a paucity of science resources--you would have access to the world's top science anyway.  But even if the best of the open-access organizations are non-profit, non-predatory PTP operations (and how would we know?), we are clearly preying on the fears of those desperate for careers in heavily oversubscribed, Malthusian-overpopulated science industries.

There is no secret about that, but too many depend on the growth model for there to be an easy fix, except the painful one of budget cuts.  The system is overloaded and overworked and that suggests that even if everyone were doing his/her best, sloppy or even corrupt work would make it through the minimal PTP quality control sieve.  And that makes it easy to see why many may be paying with personal funds or submitting sloppy (or worse) work--and too much of it, too fast.

There isn't any obvious solution in an overheated, hyper-competitive system.  We do have the web, however, and one might suggest shutting down the PTP industry, or at least somehow closing its predatory members, and using the web to publicize new findings.  Perhaps some of the open preprint and review platforms, like arXiv, can deal with some of the peer-reviewing issues and maintain a quality standard.

Of course, Deans and Chairs would have to actually do the work of evaluating the quality of their faculty members' work (beyond 'impact factors', grant totals, paper counts, and so on) to reward quality of thought rather than quantity-based measures.  That would require administrators to actually think, know their fields, and take the time to do their jobs.  Perhaps that's too much to ask of a system that now sometimes proudly proclaims it runs on the 'business model'.

But what we're seeing is what we deserve because we've let it happen.

Post-truth science?

This year was one that shook normal politics to its core.  Our beliefs in free and fair elections, in the idea that politicians strive to tell the truth and are ashamed to be caught lying, in real news versus fake, in the importance of tradition and precedent, indeed in the importance of science in shaping our world, have all been challenged.  This has served to remind us that we can't take progress, world view, or even truth and the importance of truth for granted.  The world is changing, like it or not.  And, for scientists who assume that truth actually exists and whose lives are devoted to searching for it, the changes are not in familiar directions.  We can disagree with our neighbors about many things, but when we can't even agree on what's true, this is not the 'normal' world we know.

To great fanfare, Oxford Dictionaries chose "post-truth" as its international word of the year.
The use of “post-truth” — defined as “relating to or denoting circumstances in which objective facts are less influential in shaping public opinion than appeals to emotion and personal belief” — increased by 2,000 percent over last year, according to analysis of the Oxford English Corpus, which collects roughly 150 million words of spoken and written English from various sources each month.  New York Times
I introduce this into a science blog because, well, I see some parallels with science.  As most of us know, Thomas Kuhn, in his iconic book, The Structure of Scientific Revolutions, wrote about "normal science", how scientists go about their work on a daily basis, theorizing, experimenting, and synthesizing based on a paradigm, a world view that is agreed upon by the majority of scientists.  (Although not well recognized, Kuhn was preceded in this by Ludwik Fleck, Polish and Israeli physician and biologist who, way back in the 1930s, used the term 'thought collective' for the same basic idea.)

When thoughtful observers recognize that an unwieldy number of facts no longer fit the prevailing paradigm, and develop a new synthesis of current knowledge, a 'scientific revolution' occurs and matures into a new normal science.  In the 5th post in Ken's recent thought-provoking series on genetics as metaphysics, he reminded us of some major 'paradigm shifts' in the history of science -- plate tectonics, relativity and the theory of evolution itself.

We have learned a lot in the last century, but there are 'facts' that don't fit into the prevailing gene-centered, enumerative, reductive approach to understanding prediction and causation, our current paradigm.  If you've read the MT for a while, you know that this is an idea we've often kicked around.  In 2013 Ken made a list of 'strange facts' in a post he called "Are we there yet or do strange things about life require new thinking?" I repost that list below because I think it's worth considering again the kinds of facts that should challenge our current paradigm.

As scientists, our world view is supposed to be based on truth.  We know that climate change is happening, that it's automation, not immigration, that's threatening jobs in the US, that fossil fuels are in many places now more costly than wind or solar.  But by and large we know these things not because we personally do research into them all -- we can't -- but because we believe the scientists who do carry out the research and who tell us what they find.  In that sense, our world views are faith-based.  Scientists are human; they have vested interests and personal world views, they seek credit, and so on.  But generally they are trustworthy about reporting facts and the nature of the actual evidence, even if they advocate their preferred interpretation of those facts and, like anyone else, do their best to support their own views and even their biases.

Closer to home, as geneticists, our world view is also faith-based in that we interpret our observations based on a theory or paradigm that we can't possibly test every time we invoke it, but that we simply accept.  The current 'normal' biology is couched in an evolutionary paradigm often based on ideas of strongly specific natural selection, and in a genetics based on the primacy of the gene.

The US Congress just passed a massive bill in support of normal science, the "21st Century Cures Act", with funding for the blatant marketing ploys of the brain connectome project, the push for "Precision Medicine" (first "Personalized Medicine", this endeavor has been rebranded -- cynically? -- yet again as "All of Us"), and the new war on cancer.  These projects are nothing if not born of our current paradigm in the life sciences: reductive enumeration of causation and the promise of predicting disease.  But the many well-known challenges to this paradigm lead us to predict that, like the Human Genome Project, which among other things was supposed to lead to the cure of all disease by 2020, these endeavors can't fulfill their promise.

To a great, if not fundamental, extent this branding is about securing societal resources for projects too big and costly to kill, in a way similar to any advertising, or even to the way churches promise heaven when they pass the plate.  But it relies on widespread acceptance of contemporary 'normal science', despite the unwieldy number of well-known, misfitting facts.  Even science is now perilously close to being 'post-truth'.  This sort of dissembling is deeply built into our culture at present.

We've got brilliant scientists doing excellent work, turning out interesting results every day, and brilliant science journalists who describe and publicize their new findings. But it's almost all done within, and accepting, the working paradigm. Too few scientists, and even fewer writers who communicate their science, are challenging that paradigm and pushing our understanding forward. Scientists, insecure and scrambling not just for insight but for their very jobs, are pressed explicitly or implicitly to toe the current party line. In a very real sense, we're becoming more dedicated to faith-based science than we are to truth.

Neither Ken nor I am certain that a new paradigm is necessary, or that it's right around the corner.  How could we know?  But there are enough 'strange facts' that don't fit the current paradigm, centered on genes as discrete, independent causal units, that we think it's worth asking whether a new synthesis that can incorporate those facts might be necessary.  It's possible, as we've often said, that we already know everything we need to know: that biology is complex, genetics is interactive, not iterative, every genome is unique and interacts with a unique individual history of exposures to environmental risk factors, evolution generates difference rather than replicability, and we will never be able to predict complex disease 'precisely'.

But it's also possible that there are new ways to think about what we know, beyond statistics and population-based observations, to better understand causation.  There are many facts that don't fit the current paradigm, and more smart scientists should be thinking about this as they carry on with their normal science.



---------------------------------
Do strange things about life require new concepts?
1.  The linear view of genetic causation (cis effects of gene function, for the cognoscenti) is clearly inaccurate.  Gene regulation and usage are largely, if not mainly, not just local to a given chromosomal region (they act in trans as well as cis);
2.  Chromosomal usage within the nucleus is 4-dimensional, not merely 3-dimensional, because arrangements change with circumstances, that is, with time;
3.  There is a large amount of inter-genic and inter-chromosomal communication leading to selective expression and non-expression at individual locations and across the genome (e.g., monoallelic expression).  Thousands of local areas of chromosomes wrap and unwrap dynamically depending on species, cell type,  environmental conditions, and the state of other parts of the genome at a given time; 
4.  There is all sorts of post-transcription modification (e.g., RNA editing, chaperoning) that is a further part of 4-D causation;
5.  There is environmental feedback on gene usage, some of which (epigenetic marking) can be inherited and borders on being 'Lamarckian';
6.  There are dynamic symbioses as a fundamental and pervasive rather than just incidental and occasional part of life (e.g., microbes in humans);
7.  There is no such thing as 'the' human genome from which deviations are measured.  Likewise, there is no evolution of 'the' human and chimpanzee genomes from 'the' genome of a common ancestor.  Instead, perhaps conceptually like event cones in physics, where the speed of light constrains what has happened or can happen, there are descent cones of genomic variation descending from individual sequences--time-dependent spreading of variation, with time-dependent limitations.  They intertwine among individuals, though each individual's cone is unique.  There is a past cone of ancestry leading to each current instance of a genome sequence, from an ever-widening set of ancestors (as one goes back in time), and a future cone of descendants and their variation, shaped by mutation.  There are descent cones among the genomes of organisms, among organisms in a species, and between species.  This is of course just a heuristic, not an attempt at a literal simile or to steal ideas from physics!
[Figure: light cone diagram; source: Wikipedia]

8.  Descent cones exist among the cells and tissues within each organism, because of somatic mutation, but here the metaphor breaks down: they have singular rather than complex ancestry, because within an individual they trace back to a single point, the fertilized egg, and across individuals to life's 'Big Bang';
9.  For the previous reasons, all genomes represent 'point' variations (instances) around a non-existent core that we conceptually refer to as a 'species', an 'organ', and so on ('the' human genome, 'the' giraffe, etc.);
10.  Enumerating causation by statistical sampling methods is often literally impossible, because rare variants don't have enough copies to reach 'significance', because significance criteria are themselves subjective, and/or because many variants have effects too small to reach significance;
11.  Natural selection, which along with chance (drift) generates current variation, is usually so weak that it cannot be demonstrated, often even in principle, for similar statistical reasons: if the cause of a trait is too weak to show, the cause of fitness is too weak to show; and there is not just one way to be 'adapted';
12.  Alleles and genotypes have effects that are inherently relativistic: they depend upon context, and each organism's context is different;
13.  Perhaps analogously with the ideal gas law and its like, phenotypes seem to have coherence.  We each have a height or a blood pressure, despite all the variation noted above.  In populations of people, or of organs, we find ordinary (e.g., 'bell-shaped') distributions that may be the result of a 'law' of large numbers: just as human genomes are variation around a 'platonic' core, so blood pressure is the net result of the individual actions of many cells (a minimal simulation sketch after this list illustrates the idea).  And biological traits are always changing;
14. 'Environment' (itself a vague catch-all term) has very unclear effects on traits.  Genomic-based risks are retrospectively assessed but future environments cannot, in principle, be known, so that genomic-based prediction is an illusion of unclear precision; 
15.  The typical picture is of many-to-many genomic (and other) causation for which many causes can lead to the same result (polygenic equivalence), and many results can be due to the same cause (pleiotropy);
16. Our reductionist models, even those that deal with networks, badly under-include interactions and complementarity.  We are prisoners of single-cause thinking, which is only reinforced by strongly adaptationist Darwinism that, to this day, makes us think deterministically and in terms of competition, even though life is manifestly a phenomenon of molecular cooperation (interaction).  We have no theory for the form of these interactions (simple multiplicative? geometric?).
17.  In a sense all molecular reactions are about entropy, energy, and interaction among different molecules or whatever.  But while ordinary nonliving molecular reactions converge on some result, life is generally about increasing difference, because life is an evolutionary phenomenon.
18. DNA is itself a quasi-random, inert sequence.  Its properties come entirely from spatial, temporal, combinatorial ('Boolean'-like) relationships.  This context works only because of what else is in (and immediately outside of) the cell at a given time, a regress back to the origin of life.
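
As a purely illustrative aside, here is a minimal simulation sketch of the idea behind points 10 and 13: many tiny additive allele effects sum to a roughly bell-shaped trait, even though no single contributing variant explains enough of the variance to be detectable on its own.  All frequencies and effect sizes are invented assumptions, not drawn from any real study.

```python
# Minimal illustrative sketch (all parameters are invented assumptions):
# many small additive allele effects yield a bell-shaped trait,
# yet the single largest-effect variant explains almost none of the variance.
import numpy as np

rng = np.random.default_rng(0)
n_people, n_variants = 10_000, 1_000

freqs = rng.uniform(0.05, 0.5, n_variants)                  # allele frequencies
effects = rng.normal(0, 0.1, n_variants)                    # many tiny effects
genotypes = rng.binomial(2, freqs, (n_people, n_variants))  # 0/1/2 allele counts

# Additive genetic contribution plus environmental 'noise'
phenotype = genotypes @ effects + rng.normal(0, 1, n_people)
print("trait mean %.2f, sd %.2f" % (phenotype.mean(), phenotype.std()))

# Variance explained by the single largest-effect variant: tiny
j = np.argmax(np.abs(effects))
r = np.corrcoef(genotypes[:, j], phenotype)[0, 1]
print("largest single-variant r^2: %.4f" % r**2)
```

The point is only the shape of the result: coherence at the level of the trait, near-invisibility at the level of any one contributing variant.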

Is genetics still metaphysical? Part III. Or could that be right after all?

In the two prior parts of this little series (I and II), we've discussed the way in which unknown, putatively causative entities were invoked to explain their purported consequences, even if the agent itself could not be seen or its essence characterized.  Atoms and an all-pervasive ether are examples.  In the last two centuries, many scientists followed some of the principles laid down in the prior Enlightenment period and were intensely empirical, to avoid untrammeled speculation.  Others followed long tradition and speculated about the underlying essentials of Nature that could account for the empiricists' observations.  Of course, in reality I think most scientists, and even strongly religious people, believed that Nature was law-like: there were universally true underlying causative principles.  The idea of empiricism was to escape the unconstrained speculation that was the inheritance even of classical times (and, of course, of dogmatic religious explanations of Nature).  Repeated observation was the key to finding Nature's patterns, which could only be understood indirectly.  I'm oversimplifying, but this was largely the situation in 19th and early 20th century physics, and it became true of historical sciences like geology, and of biology, during the same period.

At these stages in the sciences, free-wheeling speculation was denigrated as delving into metaphysics, because only systematic empiricism--actual data!--could reveal how Nature worked.  I've used the term 'metaphysics' because in the post-Enlightenment era it has taken on, and been used in, a pejorative sense.  On the other hand, if one cannot make generalizations, that is, infer Nature's 'laws', then one cannot really turn retrospective observation into prospective prediction.

By the turn of the century, we had Darwin's attempt at a Newtonian, law-like invocation of natural selection as a universal force for change in life, and we had Mendel's legacy, which said that causative elements, later dubbed 'genes', underlay the traits of Nature's creatures.  But a 'gene' had never actually been 'seen', nor directly identified, until well into the 20th century.  What, after all, was a 'gene'?  Some sort of thing?  A particle?  An action?  How could 'it' account for traits as well as their evolution?  To many, the gene was a convenient concept, perhaps casually and schematically useful, but not helpful in any direct way.  Much has changed, or at least seems to have changed, since then!

Genetics is today considered a mainline science, well beyond the descriptive beetle-collecting style of the 19th century.  We now routinely claim to identify life's causative elements as distinct, discrete segments of DNA sequence, and a gene is routinely treated as causing purportedly 'precisely' understandable effects.  If raw Big Data empiricism is the Justification du Jour for open-ended mega-funding, the implicit justifying idea is that genomics is predictive the way gravity and relativity and electromagnetism are--if only we had enough data!  Only with Big Data can we identify these distinct, discrete causal entities, characterize their individual effects and use that for prediction, based on some implicit theory or law of biological causation.  It's real science, not metaphysics!

But even with today's knowledge, how true is that?

The inherent importance of context-dependency and alternative paths
It seems obvious that biological causation is essentially relative in nature: it fundamentally involves context and relationships.  Treating genes as individual, discrete causal agents really is a form of metaphysical reification, not least because it clearly ignores what we know about genetics itself. As we saw earlier, today there is no such thing as 'the' gene, much less one we can define as the discrete unit of biological function.  Biological function seems inherently about interactions.  The gene remains in that sense, to this day, a metaphysical concept--perhaps even in the pejorative sense, because we know better!

We do know what some 'genes' are: sequences coding for protein or for mature RNA structure.  But we also know that much of DNA has functions unrelated to the stereotypical gene.  A gene has multiple exons, is often differentially spliced (among many other things, including antisense-RNA post-transcriptional regulation and RNA editing), and is combined with other 'genes' to contribute to some function.  A given DNA coding sequence is often used in different contexts, in which 'its' function depends on local, context-specific combinations with other 'genes'.  There are regulatory DNA sequences, sequences related to the packaging and processing of DNA, and much more.  And this is just the tip of the current knowledge iceberg; that is, we know there is a rest of the iceberg not yet known to us.

Indeed, regardless of what is said and caveats offered here and there as escape clauses, in practice it is routinely assumed that genes are independent, discrete agents with additive functional effects, even though this additivity is a crude result of applying generic statistical rather than causal models, mostly to whole organisms rather than individual cells or gene products themselves.  Our methods of statistical inference are not causal models as a rule but really only indicate whether, more probably than not, in a given kind of sample and context a gene actually 'does' anything to what we've chosen to measure. Yes, Virginia, the gene concept really is to a great extent still metaphysical.
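
To make that concrete, here is a hedged sketch, with made-up numbers, of the generic kind of per-variant additive test such inferences rest on.  The p-value it produces says only that the fitted slope is unlikely under 'no association' in this particular sample; it is not a causal model of what the gene 'does'.

```python
# Hypothetical per-variant test of the generic, additive kind described above.
# The trait, allele frequency, and effect size are all invented for illustration.
import numpy as np
from scipy.stats import linregress

rng = np.random.default_rng(1)
n = 5_000
allele_count = rng.binomial(2, 0.3, n)             # 0/1/2 copies of one variant
trait = 0.05 * allele_count + rng.normal(0, 1, n)  # weak additive effect + noise

fit = linregress(allele_count, trait)
print("slope %.3f, p = %.3g" % (fit.slope, fit.pvalue))
# A small p-value only flags a non-zero slope in this sample and context;
# it is a statistical comparison, not a model of gene function.
```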

But isn't genomic empiricism enough?  Why bother with metaphysics (or whatever less pejorative-sounding term you prefer)? Isn't it enough to identify 'genes', however we do it, and estimate their functions empirically, regardless of what genes actually 'are'?  No, not at all.  As we noted yesterday, without an underlying theory, we may sometimes be able to make generic statistical 'fits' to retrospective data, but it is obvious, even in some of the clearest supposedly single-gene cases, that we do not have strong bases for extrapolating such findings in direct causal or predictive terms.  We may speak as if we know what we're talking about, but those who promise otherwise are sailing as close to the wind as possible.

That genetics today is still rather metaphysical, and rests heavily on fancifully phrased but basically plain empiricism, does not gainsay the fact that we are doing much more than just empiricism in many areas, and we try to do that even in Big Promise biomedicine.  We do know a lot about the functions of DNA segments.  We are making clear progress in understanding and combatting diseases, and so on.  But we also know, as a general statement, that even in closely studied contexts most organisms have alternative pathways to similar outcomes, and that the same mutation introduced into different backgrounds (in humans, because the causal probabilities vary greatly and are generally low, and in different strains of laboratory animals) often has different effects.  We already know from even the strongest kinds of genetic effects (e.g., BRCA1 mutations and breast cancer) that extrapolation of future risk from retrospective data-fitting can be grossly inaccurate.  So our progress is typically a lot cruder than our claims about it.

An excuse that is implicit, and sometimes explicit, is that today's Big Data 'precision, personalized' medicine, and much of evolutionary inference, are good by the same age-old argument: they are based on facts, on pure empiricism, not resting on any fancy theorizing by effete intellectual snobs.  We know genes cause disease (and everything else), and we know natural selection causes our traits.  And those in Darwinian medicine know that everything can be explained by the 'force' of natural selection.  So just let us collect Big Data, invoke these 'theories' superficially as justification, and mint our predictions!

But--could it be that the empiricists are right, despite not realizing why?  Could it be that the idea that there is an underlying theory or law-like causal reality, of which Big Data empiricism provides only imperfect reflections, really is, in many ways, only a hope, but not a reality?

Or is life essentially empirical--without a continuous underlying causal fabric?
What if Einstein's dream of a True Nature, one that doesn't play dice with causation, was a nightmare?  In biology, in particular, could it be that there isn't a single underlying, much less smooth and deterministic, natural law?  Maybe there isn't any causal element of the sort being invoked by terms like 'gene'.  If an essential aspect of life is its lack of law-like replicability, the living world may be essentially metaphysical in the usual sense of there being no 'true' laws or causative particles as such.  Perhaps better stated, the natural law of life may be that life does not follow any particular law, but is determined by universally unique, local, ad hoc conditions.  Life is, after all, the product of evolution, and if our ideas about evolution are correct, it is a process of diversification rather than unity, of local ad hoc conditions rather than universal ones.

To the extent this is the reality, ideas like genes may be largely metaphysical in the common sense of the term.  Empiricism may in fact be the best way to see what's going on.  This isn't much solace, however, because if that's the case then promises of accurate predictability from existing data may be culpably misleading, even false in the sense that a proper understanding of life would be that such predictions won't work to a knowable extent.

I personally think that a major problem is our reliance on statistical analysis and its significance criteria, which we can easily apply but which have at best only a very indirect relationship to any underlying causal fabric--and 'indirect' here means largely unknowably indirect.  Statistics in this situation is essentially about probabilistic comparisons, and has little or often no basis in causal theory, that is, in the reason for observed differences.  Statistics works very well for inference when properly distributed factors, such as measurement errors, are laid upon some properly framed, theoretically expected result.  But when we have no theory and must rely on internal comparisons and data fitting, as between cases and controls, then we often have no way to know what part of our results has to do with sampling and the like, and where any underlying natural laws might be in the empirical mix--if such laws even exist.
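
Here is a minimal sketch, under invented assumptions, of what internal comparison without theory does: with no true effects at all, a case-control scan over many markers still produces 'significant' results at roughly the rate the arbitrary cutoff dictates.

```python
# Illustrative only: no marker has any true effect, yet some pass the cutoff.
# Sample sizes, marker count, and the 0.05 cutoff are arbitrary assumptions.
import numpy as np
from scipy.stats import ttest_ind

rng = np.random.default_rng(2)
n_cases = n_controls = 500
n_markers = 10_000

hits = 0
for _ in range(n_markers):
    cases = rng.normal(0, 1, n_cases)        # identical distributions:
    controls = rng.normal(0, 1, n_controls)  # nothing is going on
    if ttest_ind(cases, controls).pvalue < 0.05:
        hits += 1

print("markers 'significant' at p<0.05 with no true effects: %d of %d"
      % (hits, n_markers))
# Roughly 5% pass: the cutoff, not any causal theory, defines what counts
# as a 'finding' when we rely purely on internal comparisons.
```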

Given this situation, the promise of 'precision' can be seen starkly as a marketing ploy rather than knowledgeable science.  It's a distraction to the public, but also to the science itself, and that is the worst thing that can happen to legitimate science.  For example, if we can't really predict based on any serious level of theory, we can't tell how erroneous future predictions will be relative to existing retrospective data-fitting.  So we can't, largely even in principle, know how closely this Big Data romance will approximate any real risk truths, because true risks (of some disease or phenotype) may not exist as such, or may depend on things, like environmental exposures and behavior, that cannot be known empirically (and perhaps not even in theory)--again, even in principle.

Rethinking is necessary, but in our current System of careerism and funding, we're not really even trying to lay out a playing field that will stimulate the required innovation in thought.  Big Data advocates sometimes openly, without any sense of embarrassment, say that serendipity will lead those with Big Data actually to find something important.  But deep insight may not be stimulated as long as we aren't even aware that we're eschewing theory basically in favor of pure extrapolated empiricism--and that we have scant theory even to build on.

There are those of us who feel that a lot more attention, and new kinds of thinking, need to be paid to the deeper question of how living Nature 'is', rather than to very shaky empiricism that is easy, if costly, to implement but whose implications are hard to evaluate.  Again, based on current understanding, it is quite plausible that life, based on evolution, which is in turn based on difference rather than replicability, simply is not a phenomenon that obeys natural law in the way oxygen atoms, gravity, and even particle entanglement do.

To the extent that is the case, we are still in a metaphysical age, and there may be no way out of it.

Rare Disease Day and the promises of personalized medicine

Our daughter Ellen wrote the post that I republish below 3 years ago, and we've reposted it in commemoration of Rare Disease Day, Febru...