data – The Boreal Beetle

“Back in my day…”

I sort of feel like I’m saying that more and more these days. It must be a symptom of advancing age. Today that geezer sentiment was stimulated by this tweet:

Received an email from a prominent gall biologist asking for a reprint of my paper…I…I’m a real scientist now…*wipes tear*

— Miles (@ymilesz) November 26, 2013

For those of you who haven’t been “in the business” long enough to remember the ritual, it went something like this. I would read a paper of interest and write out various references from it that I needed to get my hands on for deeper understanding of the topic. Then I’d head to the library and do the cart-photocopier shuffle. I’d generally find all of the articles that I was after, but often one or two key papers would be missing. So I’d head back to the department mailroom and would pick up a card that looked something like this. After filling out the card and mailing it, I’d wait a few weeks and would (usually) happily find a copy of the paper in my mailbox sent to me personally from the corresponding author. Sometimes the author would have even taken the time to write a short greeting on the reprint.

Most labs maintained a stock of reprints. When you published a paper, you’d have the option of buying paper reprints in various quantities from the publisher. There was often much discussion about how many you’d need to purchase. If you ran out, you’d photocopy the last one to replenish your pile. Some piles would dwindle quickly. Others would just collect sad no-citation dust.

However, I haven’t even thought about reprints for years now, other than occasionally stumbling across my remaining stocks of reprints occupying space in my file cabinet (which I also hardly ever venture into anymore). I haven’t been asked for a reprint in ages. I haven’t asked for a reprint in ages. In fact, I can’t even remember the last time either of those events occurred.

To some extent, this is a good thing. It means:

many people these days have good access to most journals, and open access is having a good effect.
most journals now maintain good archives of even their oldest material.
information is often available immediately and at our fingertips.
I no longer need to rely on hoping that my request gets to a corresponding author (who could have left that institution years ago), or that the author takes the time to send me the paper.
less paper use and happier forests.

On the other hand, there are still many places in the world, and many institutions, without adequate access to scientific literature. Even today not all journals maintain deep archives. And no library, even one that is otherwise well-stocked, subscribes to all archives of all journals. This latter point is becoming more and more the case as subscription costs rise and budgets dwindle. But we have email, and #IcanhazPDF, and open access venues, all of which should help with these issues.

I was reminded of these “on the other hand” points this week when I set out to get my hands on this paper. Surprisingly to me at least, our library only listed the paper version of this article in their stacks. So…

What? I need to physically go to the library to photocopy a journal article? 1998 time warp in progress.

— Dr. Dez (@docdez) November 25, 2013

Once at the library, I located the journal and found that the volume was missing from the shelf. Egads! Back down to the circulation desk, where I filled out a form that would send a student assistant scurrying around the library looking for the missing volume. At that point, I’d had about enough fun reliving the 90s, and even though there is a valid debate about the effects of #IcanhazPDF, I made my Twitter request. Thanks to Chris MacQuarrie and the magic of the internet, the article was on its way to me in a jiffy. Later on in the day the library notified me that they’d found the truant volume…

So obviously the demise of the old paper reprint/mail system is a good thing, right? Perhaps. For the most part I agree.

However, despite what may be thought of as its shortcomings (shortcomings now due merely to technological advances), a reprint request was much more than a request for a single article. It also served as one more thread in a network between real people. A request represented one more potential conduit to collaborative discussion. It wasn’t the paper in the mail that was important so much as it was the tangible connection to someone else with similar research interests. Thankfully things like Twitter, Google Scholar, and various other up-and-coming services help to reveal linkages and keep the conversation going for those who participate. Participation in the emerging system and getting others to do the same is what is vital. And participation is what we need to be encouraging.

The biggest tragedy of non-participation for all of us is a lack of key influences on the ongoing discussion of our craft. It’s easy to relegate nay-sayers to the dinosaur bin. But their diverse and experienced voices are vital to understand where we’ve been and where we’re going. The sunset of network building via rituals like reprint requests does not represent the end of an era as much as it reveals new and exciting possibilities for even more meaningful connections. The more ideas, data, opinions, and interpretations that we have on board, the better for all of us and the better for the progress of science.

I am fully aware that blog posts like this are the proverbial preaching to the choir. So, how do we convince our colleagues who are still not part of the emerging conversation to join with us? Reprint requests, and many of our previous network building methods, are fading away. We don’t want voices with important knowledge, wisdom, and experience to fade with them.

by Dezene Huber and Paul Fields, reblogged from the ESC-SEC Blog.

Have you ever read a paper and, after digesting it for a bit, thought: “I wish I could play with the data”?

Perhaps you thought that another statistical test was more appropriate for the data and would provide a different interpretation than the one given by the authors. Maybe you had completed a similar experiment and you wanted to conduct a deeper comparison of the results than would be possible by simply assessing a set of bar graphs or a table of statistical values. Maybe you were working on a meta-analysis and the entire data set would have been extremely useful in your work. Perhaps you thought that you had detected a flaw in the study, and you would have liked to test the data to see if your hunch was correct.

Whatever your reason for wishing to access the data, and this list probably just skims the surface of the sea of possibilities, you often only have one option for getting your hands on the spreadsheets or other data outputs from the study: contacting the corresponding author.

Sometimes that works. Often it does not.

The corresponding author may no longer be affiliated with the listed contact information. Tracking her down might not be easy, particularly if she has moved on from academic or government research.

The corresponding author may no longer be alive, the fate of us all.

You may be able to track down the author, but the data may no longer be available. Perhaps the student or postdoc that produced it is now out of contact with the PI. But even if efforts have been made to retain lab notebooks and similar items, is the data easily sharable?

And, even if it is potentially sharable (for instance, in an Excel file), are the PI’s records organized enough to find it?*

The author may be unwilling to share the data for one reason or another.

Molloy (2011) covers many of the above points and also goes into much greater depth on the topic of open data than we are able to do here.

In many fields of study, the issues that we mention above are the rule rather than the exception. Some readers may note that a few fields have had policies to avoid issues like this for some time. For instance, genomics researchers have long used repositories such as NCBI to deposit data at the time of a study being published. And taxonomists have deposited labeled voucher specimens in curated collections for longer than any of us have been alive. Even in those cases, however, there are usually data outputs from studies associated with the deposited material that never again see the light of day. So even these exceptions to the rule still suffer, in part, from the same lack of access to data.

But, what if things were different? What might a coherent open data policy look like? The Amsterdam Manifesto, which is still a work in progress, may be a good start. Its points are simple, but potentially paradigm-shifting. It states that:

Data should be considered citable products of research.
Such data should be held in persistent public repositories.
If a publication is based on data not included in the text, those data should be cited in the publication.
A data citation in a publication should resemble a bibliographic citation.
A data citation should include a unique persistent identifier (a DataCite DOI recommended, unless other persistent identifiers are in use within the community).
The identifier should resolve to provide either direct access to the data or information on accessibility.
If data citation supports versioning of the data set, it should provide a method to access all the versions.
Data citation should support attribution of credit to all contributors.
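To make a few of those points concrete, here is a minimal Python sketch of what a data citation that “resembles a bibliographic citation” might look like in practice. The `DataCitation` class, its field names, and the citation layout are our own illustrative assumptions, not anything prescribed by the manifesto, and the contributors, repository name, and DOI in the example are made-up placeholders. The only outside fact it leans on is that DOIs resolve through the doi.org proxy.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class DataCitation:
    """Hypothetical record for citing a deposited data set."""

    contributors: Sequence[str]    # credit goes to every contributor
    year: int
    title: str
    repository: str                # the persistent public repository
    doi: str                       # unique persistent identifier
    version: Optional[str] = None  # only set if the data set is versioned

    def resolver_url(self) -> str:
        # A DOI resolves to the data (or to access information) via doi.org.
        return f"https://doi.org/{self.doi}"

    def format(self) -> str:
        # Lay the citation out like an ordinary bibliographic reference.
        authors = ", ".join(self.contributors)
        version = f" Version {self.version}." if self.version else ""
        return (
            f"{authors} ({self.year}). {self.title}.{version} "
            f"{self.repository}. {self.resolver_url()}"
        )


# Placeholder values only; this is not a real data set or a real DOI.
example = DataCitation(
    contributors=["A. Researcher", "B. Student"],
    year=2013,
    title="Example beetle trap counts",
    repository="Example Repository",
    doi="10.1234/example.5678",
    version="2",
)
print(example.format())
```

The resulting string could sit in a reference list right beside the paper that used the data, which is more or less the point of the manifesto.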

This line of reasoning is no longer just left to back-of-napkin scrawls. Open access to long term, citable data is slowly becoming the norm rather than the exception. Several journals have begun to require, or at least strongly suggest, deposition of all data associated with a study at the time of submission. These include PeerJ and various PLoS journals. It is more than likely that other journals will do the same, now that this ball is rolling.

The benefits of open data are numerous (Molloy, 2011). They include the fact that full disclosure of data allows for verification of your results by others. Openness also allows others to use your data in ways that you may not have anticipated. It ensures that the data reside alongside the papers that stemmed from them. It reduces the likelihood that your data may be lost due to various common circumstances. Above all it takes the most common of scientific outputs – the peer reviewed paper – and adds lasting value for ongoing use by others. We believe that these benefits outweigh the two main costs: the time taken to organize the data and the effort involved in posting in an online data repository.

If this interests you, and we hope that it does, the next question on your mind is probably “where can I deposit the data for my next paper?” There are a number of options available that allow citable (DOI) archiving of all sorts of data types (text, spreadsheets, photographs, videos, even that poster or presentation file from your last conference presentation). These include figshare, Dryad, various institutional repositories, and others. You can search for specific repositories at OpenDOAR using a number of criteria. When choosing a data repository, it is important that you ensure that it is backed up by a system such as CLOCKSS.

Along with the ongoing expansion of open access publishing options, open data archiving is beginning to come into its own. Perhaps you can think of novel ways to prepare and share the data from your next manuscript, talk, or poster presentation for use by a wide and diverse audience.

—–

* To illustrate this point, one of us (DH) still has access to the data for the papers that stemmed from his Ph.D. thesis research. Or at least he thinks that he does. They currently reside on the hard drive of the Bondi blue iMac that he used to write his thesis, and that is now stored in a crawlspace under the stairs at his house. Maybe it still works and maybe the data could be retrieved. But it would entail a fair bit of work to do that (not to mention trying to remember the file structure more than a decade later). And digital media have a shelf life, so data retrieval may be out of the question at this point anyhow.

Category: data

Reprints back then… but what now?

Open data