Thursday, April 04, 2013
On January 6, 2011, 24-year-old hacker and activist Aaron Swartz was arrested by police at the Massachusetts Institute of Technology for downloading several million articles from an online archive of research journals called JSTOR.
After Swartz committed suicide earlier this year in the face of legal troubles arising from this incident, questions were raised about why MIT, whose access to JSTOR he exploited, chose to pursue charges, and what motivated the US Department of Justice to demand jail time for his transgression.
But the question that should have been asked is why downloading scholarly research articles was a crime in the first place. Why, twenty years after the birth of the modern Internet, is it a felony to download works that academics chose to share with the world?
The Internet, after all, was invented so that scientists could communicate their research results with each other. But while you can now get immediate, free access to 675 million videos of cats (I checked this number today), the scholarly literature – one of the greatest public works projects of all time – remains locked behind expensive pay walls.
Every year universities, governments and other organizations spend in excess of $10 billion to buy back access to papers their researchers gave to journals for free, while most teachers, students, health care providers and members of the public are left out in the cold.
Even worse, the stranglehold existing journals have on academic publishing has stifled efforts to improve the ways scholars communicate with each other and the public. In an era when anyone can share anything with the entire world at the click of a button, the fact that it takes a typical paper nine months to be published should be a scandal. These delays matter – they slow down progress and in many cases literally cost lives.
Tonight, I will describe how we got to this ridiculous place. How twenty years of avarice from publishers, conservatism from researchers, fecklessness from universities and funders, and a basic lack of common sense from everyone has made the research community and public miss the manifest opportunities created by the Internet to transform how scholars communicate their ideas and discoveries.
I will also talk about what some of us have been doing to liberate the scholarly literature – where we have succeeded and where there is more work to be done. And finally, with these efforts gaining traction, I will describe where we are going next.
While I talk, I want you to keep in mind that this is about more than just academic publications. This is about the future of the Internet and what we are willing to do, as individuals and societies, to ensure that information that should be free IS free. If we can’t figure out how to make scientific and scholarly works – most of which were funded by taxpayers and published by authors with no expectation of being paid – freely available, we will struggle to do it in cases where the conditions for free access are less ripe.
One last bit of introduction. I am a scientist, and so, for the rest of this talk, I am going to focus on the scientific literature. But everything I will say holds equally true for other areas of scholarship.
Most people date the birth of the modern scientific journal to the middle of the 17th century, when the Royal Society in England took advantage of the growing printing industry to begin publishing proceedings of their meetings for the benefit of members unable to attend, as well as for posterity.
But scholarly journals as we know them were really a product of the 19th century, when growing activity and public interest in science led to the creation of most of the big titles we know about today: Science, Nature, The New England Journal of Medicine, The Journal of the American Medical Association and The Lancet published their first editions in the 1800’s.
They had noble missions. For example, the preface to the first edition of Science in July 1880 stated that its goal was to “afford scientific workers in the United States the opportunity of promptly recording the fruits of their researches, and facilities for communication between one another and the world”.
Like their predecessor, these journals were enabled by the technologies of the industrial revolution – steam powered rotary printing presses and efficient rail-based mail service. But they were also severely limited by them. Printing and shipping articles around the country and the world was expensive, and because of this, two key features of modern journals were established.
First, journals limited what they printed, choosing for publication only those works deemed to be of the greatest interest to their target audience. And second, they sold subscriptions – sending copies only to those who had paid. While intrinsically restricting, this business arrangement made sense. Every printed copy of a journal incurred a cost to the publisher, and charging readers meant revenues scaled with costs.
As science grew, so too did science publishing, with increasingly specific journals emerging to cater to new disciplines. By 1990 there were around 5,000 scientific journals in circulation, all of them printed and shipped to subscribers. And the costs were skyrocketing. If you were lucky enough to be at a major research university, you could find most of these journals in the library. But most scientists had to make do with a small subset – whatever their library could afford. And the public was all but completely shut out.
Then along came the Internet.
Scientific journals, serving a computer savvy audience with access to fast Internet connections through universities, were amongst the first commercial ventures to take advantage of this new technology. Within a few years – from 1995 to 1998 – virtually all major publishers put versions of their printed journals online.
But in doing so they made a crucial and fateful choice. Rather than adapting their business model to the new medium, they stuck with the same subscription-based system that they used for their print journals. And why not – so long as scientists were still giving them papers, and universities were buying them back, it was a great business. An even better one given that they no longer had to pay for printing and shipping.
But with this major shift in the means of dissemination, what was once a common sense way for publishers to provide a valuable service while dealing with the limitations of available technology became an irrational impediment to achieving this very goal.
To understand just how crazy this system is, you need to understand a bit more about how scientific journals work and what the life cycle of a scientific idea looks like.
Take your typical scientist at my home institution – the University of California Berkeley. She draws a salary from the state of California, and works in a building funded by the state. When she has a new idea, she goes out and raises money to buy equipment and supplies and to pay the salaries of the students and staff who will actually do the work. In all likelihood this money will come from the US government – through agencies like the NIH or NSF. And if not from them, from a public minded non-profit or foundation like the Howard Hughes Medical Institute that funds my lab. This scientist and her students then spend a great deal of time – usually years – pursuing the idea, until they finally have a result they want to share with their peers.
So they sit down and write a paper describing why they were interested in the question, what they did, how they did it, what they found, and what they think it means.
And then they hopefully submit it to one of the 10,000 journals currently in operation – choosing based on scope and importance. With few exceptions, these journals work the same way. The paper is assigned to an editor – sometimes a salaried professional, but usually a practicing scientist volunteering their time. They read the paper and decide who in the field is in the best position to evaluate the authors’ methods, data and conclusions. They send the paper to these scientists – who again are volunteering their time as a service to the community – who read it and render their opinion on the paper’s technical merits and suitability to the journal in question. The editor looks at all these reviews and decides whether to accept, modify or reject the work. If the paper is accepted, the journal takes the manuscript, converts it into a publishable form, and posts it on the web. If the paper is not accepted, the scientists either go back and do some more work and rewrite the paper, or they send it to another journal, triggering a complete reprise of the entire process.
I want you to note just how little the journal actually does here.
They didn’t come up with the idea. They didn’t provide the grant. They didn’t do the research. They didn’t write the paper. They didn’t review it. All they did was provide the infrastructure for peer review, oversee the process, and prepare the paper for publication. This is a tangible, albeit minor, contribution, that pales in comparison to the labors of the scientists involved and the support from the funders and sponsors of the research.
And yet, for this modest at best role in producing the finished work, publishers are rewarded with ownership of – in the form of copyright – and complete control over the finished, published work, which they turn around and lease back to the same institutions and agencies that sponsored the research in the first place. Thus not only has the scientific community provided all the meaningful intellectual effort and labor to the endeavor, they’re also fully funding the process.
Universities are, in essence, giving an incredibly valuable product – the end result of an investment of more than a hundred billion dollars of public funds every year – to publishers for free, and then they are paying them an additional ten billion dollars a year to lock these papers away where almost nobody can access them.
It would be funny if it weren’t so tragically insane.
To appreciate just how bizarre this arrangement is, I like the following metaphor. Imagine you are an obstetrician setting up a new practice. Your colleagues all make their money by charging parents a fee for each baby they deliver. It’s a good living. But you have a better idea. In exchange for YOUR services you will demand that parents give every baby you deliver over to you for adoption, in return for which you agree to lease these babies back to their parents provided they pay your annual subscription fee.
Of course no sane parent would agree to these terms. But the scientific community has.
And the consequences are severe.
Even though the entire scientific and medical literature is, in principle, available at the click of a mouse to anyone with an Internet connection – very few people have access to the entirety of this information.
This is most obviously a problem for people facing important medical decisions who have no access to the most up-to-date research on their conditions – research their tax dollars paid for. In a world where patients are increasingly involved in health care decisions, and where all sorts of sketchy medical information is available online, it is criminal that they do not have access to high quality research on whatever ails them and potential ways to treat it.
Astonishingly, many physicians and health care providers also lack access to basic medical research. Journal subscriptions in medicine are very expensive, and most doctors have access to only a handful of journals in their specialty.
But this lack of access is not just important in the doctor’s office. Scores of talented scientists across the world are blind to the latest advances that could affect their research. And in this country students and teachers at high schools and small colleges are denied access to the latest work in the fields they are studying – driving them to learn from textbooks or Wikipedia rather than the primary research literature. Technology startups often cannot afford access to the basic research they are trying to translate into useful products.
And interested members of the public – like many of you – find it difficult to engage with scientific research. Is it any wonder that such a large fraction of the population rejects basic scientific findings when the scientific community thumbs its collective nose at them by making it impossible for them to read about what we’re doing with all of their money? Many in the publishing industry dismiss the idea that the public even wants to read scientific papers, pointing to their often highly technical language. But a major reason these papers are so inscrutable is that their authors conceive of their audience very narrowly – basically scholars in their field. And if you have no expectation that the public will read your work, you do not write it to be accessible to the public.
But even if you have no interest in ever reading a scientific paper, you should care deeply about this issue. Because in addition to pay walls, the balkanization of the scientific literature into hundreds of publisher fiefdoms stops researchers from developing new ways to organize, extract information from and improve the navigability and utility of the scientific literature. It is astonishing, for example, that to this day there is no dedicated search engine that allows you to search the full-text of every published scientific paper. This makes researchers less effective and limits the value we all get from the billions of dollars we invest in science every year.
And the greatest tragedy of all is that this is completely unnecessary.
Back in the 1990’s several people began promoting a simple alternative model. The idea was to treat science publishing like a service, with publishers getting paid a fee for the value they provide, but once this fee is paid, the finished product would effectively enter the public domain rather than the publisher’s private one.
One of the people pushing this new model – now known as “open access” – was my postdoctoral advisor at Stanford, Pat Brown, who enlisted me in his crusade. After failing to convince existing publishers to adopt this model – they generally met this idea with laughter if not outright hostility – the two of us, along with former NIH Director Harold Varmus, launched a non-profit publisher – which we dubbed the Public Library of Science or PLOS – determined to prove that this model would work.
After all, universities were already forking over billions of dollars to support publishers. We were offering them a better deal – access for everyone at a lower price. But, while logic and value were on our side, and we got statements of support from within and outside the scientific community, when push came to shove, only a small group of pioneers joined us. And the reason was that publishers had one very powerful card up their sleeve.
Although scientists do not get paid when the papers they submit to research journals get published, they nonetheless receive something of very high value. Academia is an industry of prestige, and the currency in which prestige is traded is journal titles. In most scientists’ minds, a publication in an elite journal like Nature or Science is as good as gold – a ticket to a job, grants and tenure. And the allure of these publications is so high that most scientists continue to choose journals based entirely on their prestige, even while they acknowledge that their business practices are bad for science and the world.
Realizing that our biggest obstacle was overcoming the prestige of established subscription-based journals, PLOS launched with two journals that adopted the same elitist editorial policies as Science, Nature and their ilk – PLoS Biology for basic life sciences and PLoS Medicine for the clinical world. We hired professional editors from others in the industry, built fancy editorial boards and had a suite of Nobel Prize winners singing our praises.
But prestige is a difficult thing to engineer. Colleagues, friends and even family members would stipulate all the flaws in the current system and praise what we were doing, but, when they had a high profile paper, would turn around and send it to the same old subscription journals. It was a very frustrating experience.
I’d like to say that I understood why they made these decisions. But I didn’t. I thought – and still think – they were just being cowardly. And when I suggested they were being chickens by sending papers to Science or Nature they would complain that they couldn’t do otherwise because their jobs – or their trainees’ jobs – were at stake.
I didn’t think they were right. But the truth is that I didn’t have a lot of evidence to show them. At the same time we were starting PLOS, I was starting my own lab in Berkeley. Senior colleagues, knowing about my extracurricular activities, took me aside and warned that I would never get grants or tenure if I didn’t publish my work in the old guard high profile journals, and that I would ruin the careers of my trainees if I put my principles over practical realities.
I didn’t want to believe them. I wanted to believe if I did good work people would notice. I wanted to believe that success in science did not require capitulating to stupid, destructive traditions. I also knew I’d look like a total hypocrite if I failed to live up to my own exhortations.
So I made a commitment that every paper from my lab would go to journals that made them freely available from day one. And, over 13 years, I have stuck completely to my pledge. And you know what? The sky didn’t fall. I got grants. Then I got a tenure track job at Berkeley (I had started out at the National Lab up the hill). Then I got tenure. And then I was named an investigator with the Howard Hughes Medical Institute – a coveted award that now funds most of my research. And the people in my lab have not suffered either. My graduate students have received fellowships and gone on to land plum postdoctoral positions – except for the one who went to Facebook and is now a millionaire – and my postdoctoral fellows have all gotten faculty positions at good schools.
But despite this, most of my colleagues still stand by the “I need to publish in Journal Blah in order to get” whatever goal they were seeking at the time.
Fortunately, publishing decisions are not entirely in the hands of individual investigators. In 2008, under pressure from Congress to provide taxpayers access to work they fund, the National Institutes of Health – which funds about $30 billion of research every year – implemented a public access policy requiring that grantees make their work available through the National Library of Medicine.
This was an important landmark in the history of the access movement, as, for the first time, a major funding agency was making it a condition of receiving a grant that authors make their works available to the public. And the policy has been successful – 80% of NIH funded works published in 2011 are now freely available online – there’s nothing like the threat of losing funding to get people to do the right thing.
Unfortunately, under heavy lobbying pressure from publishers, the NIH policy allows for up to a year’s delay between publication and the provision of free access. While better than nothing, delayed access to the literature no more provides the public with access to the latest advances in biomedical research than handing out year-old copies of the New York Times keeps everyone up to date on the latest world events.
And, again under pressure from Congress, earlier this year the Obama administration weighed in on the matter, directing other federal agencies that fund large amounts of research to develop their own public access policies. The White House said all the right things about the importance of public access – and got a lot of positive press. But unfortunately, if predictably, their actions did not match their words. The new White House policy all but established the one year delay used by the NIH as the law of the land – explicitly citing the need to sustain subscription-based publishing businesses as their excuse. Another huge missed opportunity in an area that has had tons of them.
But at least the White House did something. The other major players in this arena – the universities who employ the vast majority of academic scientists, and whose policies shape the course of their careers – have been completely silent. As with funding agencies, universities could hasten the transition to full and immediate open access by making it a condition of employment. Few people would turn down a job because it came with such a requirement.
But, while their own libraries sound the alarm about rising subscription costs and diminishing access, university administrators across the country have done next to nothing to promote changes in scientific publishing that would not only save them money, but make the research done on their campuses more efficient and effective. This is an astonishing abdication of their public mission and responsibility as stewards of scholarship.
However, despite these failings from scientists, funders and universities, the facts on the ground are changing rapidly. In 2007, PLOS launched a new journal – PLOS ONE – that not only provided open access to all of its content, but also dispensed with the notion – central to journal publishing since the 17th century – that journals should select only papers of the highest level of interest to their readers.
Rejecting papers that are technically sound is a relic of the age of printed journals, whose costs scaled with the number of papers they published and whose table of contents served as the primary way people found articles of interest.
But we are no longer limited by the number of articles we can publish, and people primarily find papers of interest by searching, not browsing. So PLOS ONE asks its reviewers only to assess whether the paper is a legitimate work of science. If it is, it is published. The process is relatively simple – no need to ping pong from one journal to another in order to find the highest impact home.
This idea evidently appeals to the scientific community, because PLOS ONE has grown rapidly. It will publish in excess of 25,000 articles this year, and though only five years old, it is now the biggest biomedical research journal in the world. And it publishes great science – PLOS ONE articles are routinely talked about both by science journalists and the popular press.
And PLOS ONE has not just been a success as a journal, but also as a business, turning a profit that has not only put PLOS on solid financial footing, but attracted the eye of commercial and non-profit publishers worldwide. In the past year several PLOS ONE clones have been launched and there is broad consensus that this sector will grow and ultimately dominate scientific publishing.
But the battle is by no means won. Open access collectively represents only around 10% of biomedical publishing, has less penetration in other sciences, and is almost non-existent in the humanities. And most scientists still send their best papers to “high impact” subscription-based journals.
But as frustratingly slow as progress has been, I believe we are close to a tipping point with most members of the scientific community believing that open access is the future, and a growing and diverse set of publishers engaged in open access businesses.
But being able to access papers is just the beginning. We can now finally start to actually take advantage of computers and the Internet to not just make scientific publishing open, but to make it better.
If the 17th century founders of the Proceedings of the Royal Society went to read a contemporary scientific journal, they would find it disturbingly familiar. Even though we can read papers on a portable computer while flying 35,000 feet over the Pacific Ocean, the only thing that distinguishes a contemporary paper from a 17th century one is the occasional color photograph.
The multilayered, hyperlinked structure of the Web was made for scientific communication, and yet papers today are largely dispersed and read as static PDFs – another relic of the days of printed papers. We are working with the community to enable the “paper of the future”, that embeds not only things like movies, but access to raw data and the tools used to analyze them.
There is also no need for papers to be static works fixed in a single form at their time of publication. Good data and good ideas in science are constantly evolving, and scientific papers should evolve over time as new data, analyses, and ideas emerge – whether they support or refute the original assertions.
But the biggest target of our efforts is peer review. Peer review is the closest thing science has to a religious doctrine. Scientists believe that peer review is essential to maintaining the integrity of the scientific literature, that it is the only way to filter through millions of papers to identify those one should read, and that we need peer reviewed journals to evaluate the contribution of individual scientists for hiring, funding and promotion.
Attempts to upend, reform or even tinker with peer review are regarded as apostasies. But the truth is that peer review as practiced in the 21st century poisons science. It is conservative, cumbersome, capricious and intrusive. It encourages group think, slows down the communication of new ideas and discoveries, and has ceded undue power to a handful of journals who stand as gatekeepers to success in the field.
Each round of reviews takes a month or more, and it is rare for papers to be accepted without demanding additional experiments, analyses and rewrites, which take months or sometimes years to accomplish.
And this time matters. The scientific enterprise is all about building on the results of others – but this can’t be done if the results of others are languishing in peer review. There can be little doubt that this delay slows down scientific progress and often costs lives.
This might be worth it if these delays made the ultimate product better. But it is not the case. While I am sure that some egregious papers are prevented from being published by peer review, the reality is that with 10,000 or so journals out there, most papers ultimately get published, and the peer reviewed literature is filled with all manner of crappy papers. Even the supposedly more rigorous standards of the elite journals fail to prevent flawed papers from appearing in their pages.
So, while it is a nice idea to imagine peer review as defender of scientific integrity – it isn’t. Flaws in a paper are far more often uncovered after the paper is published than in peer review. And yet, because we have a system that places so much emphasis on where a paper is published, we have no effective way to annotate previously published papers that turn out to be wrong.
And as for classification, does anyone really think that assigning every paper to one of 10,000 journals, organized in a loose and chaotic hierarchy of topics and importance, is really the best way to help people browse the literature? This is a pure relic of a bygone era – an artifact of the historical accident that Gutenberg invented the printing press before Al Gore invented the Internet.
So what would be better? The outlines of an ideal system are simple to spell out. There should be no journal hierarchy, only broad journals like PLOS ONE. When papers are submitted to these journals, they should be immediately made available for free online – clearly marked to indicate that they have not yet been reviewed, but there to be used by people in the field capable of deciding on their own if the work is sound and important.
The journal would then organize a different type of peer review, in which experts in the field were asked if the paper is technically sound – as we currently do at PLOS ONE – but also what kinds of scientists would find this paper interesting, and how important should it be to them. This assessment would then be attached to the paper – there for everyone to see and use as they saw fit, whether it be to find papers, assess the contributions of the authors, or whatever.
This simple process would capture all of the value in the current peer review system while shedding most of its flaws. It would get papers out fast to people most able to build on them, but would provide everyone else with a way to know which papers are relevant to them and a guide to their quality and import.
By replacing the current journal hierarchy with a structured classification of research areas and levels of interest, this new system would undermine the generally poisonous “winner take all” attitude associated with publication in Science, Nature and their ilk. And by devaluing assessment made at the time of publication, this new system would facilitate the development of a robust system of post publication peer review in which individuals or groups could submit their own assessments of papers at any point after they were published. Papers could be updated to respond to comments or to new information, and we would finally make the published scientific literature as dynamic as science itself. And it would all be there for anyone, anywhere to not just access, but participate in.
There is nothing technically challenging about building such a system, and it makes so much sense that it can’t help but happen. But, of course, we’ve been there before. Science is oddly conservative, and there is enough money and power at stake to ensure that people will try to stop this from happening. So if you care about making the scientific literature open and accessible, I urge you to do whatever you can to make it happen. If you’re a scientist, get with the program – there are so many open access options around today, you no longer have any excuse. And try to stop looking at journal titles when you evaluate people and their work. It’s a poisonous process that has to stop.
If you’re not a scientist, but are interested in this cause, you can do all the normal things – write your members of Congress and the like. But I also encourage you to find scientists whose work you find interesting, but cannot access, and send them an email. Or better yet, give them a call. Let them know you want to – but cannot – read their work. And remind them that, in all likelihood, you paid for it.
If we all do this, then maybe the next time someone like Aaron Swartz comes along and tries to access every scientific paper ever written, instead of finding the FBI, they’ll find a giant green button that says “Download Now”.