Close your eyes and picture a scientist. What do you see?
The likelihood is that you will imagine the scientist as an individual of great intellect, grappling heroically with nature’s secrets and looking for the “Eureka!” moment that will transform our understanding of the universe.
This notion of the individual effort is implicit in the everyday language of scientists themselves. We talk of Newton’s Laws of Motion or Mendelian Inheritance. We have the annual pronouncements of the Nobel committee, which awards science prizes to at most three living individuals in each category.
Contemporary popular culture presents us with characters such as The Big Bang Theory’s Sheldon Cooper, single-mindedly and single-handedly in pursuit of a theory of everything.
But the practice of science over the last century has witnessed a significant shift from the individual to the group, as scientific research has become more specialised and research problems have become more complex, requiring increasingly sophisticated approaches.
The lone scientist appears to be almost a myth.
The rise of ‘Big Science’
Much of science, as it is conducted now, is Big Science, characterised by major international collaborations supported by multi-government, billion-dollar investments.
Examples include the effort to build the next atom smasher to hunt for the Higgs boson, a telescope to uncover the first generation of stars or galaxies, and the technology to unravel the complex secrets of the human genome.
One of the key driving forces behind this wonderful growth in science has been the similarly spectacular growth in computer power and storage. Big Science now equals Big Data – for example, when the Square Kilometre Array starts observing the sky in 2020, it will generate more data on its first day than will have existed on the internet at that time.
Powerful supercomputers are the tools researchers use to sift through the wealth of data produced by observations of the universe, large and small.
At the same time, they are harnessed to provide insights into complex phenomena in simulated universes – from the way atoms and molecules arrange themselves on the surfaces of novel materials, to the complexity of folding proteins, and the evolution of structure in a universe dominated by dark matter and dark energy.
Big Science has resulted in a spectacular growth in our understanding of the universe, but its reliance on cutting-edge computing has presented a number of new challenges, not only in the cost and running expenses of supercomputers and massive data stores, but also in how to take advantage of this new power.
The Big Science bottleneck
Unlike general computer users – who may simply want to check email and social media or browse photos – scientists often need to get computers to do things that haven’t been done before. It could be anything from predicting the intricate motions of dark matter and atoms in a forming galaxy to mining the wealth of genetic data in the field of bioinformatics.
And unlike general users, scientists seldom have off-the-shelf solutions and software packages to solve their research problems. They require new, home-grown programs that need to be written from scratch.
But the training of modern scientists prepares them poorly for such a high-tech future. Studying for a traditional science degree that focuses on theory and experiment, they get limited exposure to the computation- and data-intensive methods that underpin modern science.
This changes when they enter their postgraduate years – these scientists-in-training are now at the bleeding edge of research, but the bleeding-edge computational tools often do not exist and so they have to develop them.
The result is that many scientists-in-training are ill-equipped to write software (or code, in the everyday language of a researcher) that is fit for purpose. And, as with driving and child rearing, they are likely to get very cross if you criticise their efforts or suggest there is a better way of doing something.
This systemic failing is compounded by a view that the writing of good code is not so much a craft as a trivial exercise in the true effort of science (an attitude that drives us to despair).
For this reason, it is probably unsurprising that many fields are awash with poor, inefficient codes, and data-sets too extensive to be properly explored.
Coding the future
Of course, there are those to whom efficient and cutting-edge coding comes a lot more naturally. They can write the programs to simulate the Universe and take advantage of new GPU-based supercomputers, or efficiently interrogate the multi-dimensional genomic databases.
Writing such codes can be a major undertaking, consuming the entire three to four years of a PhD. Some are able to use their codes to obtain new scientific results.
But too often the all-consuming nature of code development means that an individual researcher may not uncover the major scientific results, missing out on the publications and citations that are the currency of modern science.
Those that can code are out of a job
Other researchers, those that just use rather than develop such codes, are able to reap the rewards, and this better paves their way into an academic career. The rewards go to those that seek to answer the questions, not those that make it happen.
With fewer publications under their belts, those that develop the tools needed by the scientific community find themselves pushed out of the market, and out of academia.
Some senior academics recognise this path to career suicide, and young researchers are steered into projects with a more stable future (as stable as academic careers can be).
But we are then faced with the growing challenge of who will develop the necessary tools for Big Science to continue to flourish.
How to grow an early scientist
So, what’s the answer? Clearly, science needs a cultural change in its understanding of what makes a good modern scientist.
As well as fertilising links with our computer scientist colleagues, we need to judge early scientists on more than their paper output and citation count. We need to examine their contribution in a much broader context.
And within this context, we need to develop a career structure that rewards those who make the tools that allow Big Science to happen. Without them, supercomputers will groan with inefficient code, and we are simply going to drown in the oncoming flood of data.