The new technologies needed for dealing with big data

MongoDB co-founder and chairman Dwight Merriman still writes code. TechCrunch/Flickr, CC BY-NC

While much of the discussion of the so-called “Big Data revolution” has focused on the data itself and the exciting new applications it is enabling, from Google’s self-driving cars through to the CSIRO and University of Tasmania’s better information systems for oyster farmers, less attention has been paid to the underpinning technologies and the talent driving them.

At the heart of the Big Data movement is a range of next-generation database technologies that enable data to be amassed and analysed at a scale and speed hitherto unseen.

Global online services such as Google, Amazon and Facebook, which serve billions of people around the world in real time, have been made possible by new technologies that divide tasks and files across banks of thousands of distributed computers.
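
As a rough illustration of what dividing files across a bank of computers can mean, here is a minimal Python sketch of hash-based partitioning (often called sharding). The node names, the number of nodes and the `node_for` and `distribute` helpers are all hypothetical. Real services use considerably more sophisticated schemes, and this is not a description of how Google, Amazon or Facebook actually do it.

```python
import hashlib

# Hypothetical pool of machines; real clusters have thousands.
NODES = ["node-0", "node-1", "node-2", "node-3"]


def node_for(key: str, nodes=NODES) -> str:
    """Pick the machine responsible for a key by hashing the key."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    return nodes[int(digest, 16) % len(nodes)]


def distribute(keys, nodes=NODES):
    """Group keys by the node that would store them."""
    placement = {node: [] for node in nodes}
    for key in keys:
        placement[node_for(key, nodes)].append(key)
    return placement


if __name__ == "__main__":
    # Made-up file names, spread across the hypothetical nodes.
    files = [f"photo-{i}.jpg" for i in range(10)]
    for node, stored in distribute(files).items():
        print(node, stored)
```

Because the same key always hashes to the same node, any machine can work out where a file lives without consulting a central directory. The trade-off in this simple version is that changing the number of nodes moves most keys.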

Storing the data

Traditional database technologies are built around tables of information, much like spreadsheets with rows and columns, together with a structured way of asking questions of those tables.

The structured language for asking questions of these data collections was originally named SEQUEL (Structured English Query Language), later shortened to SQL. It was developed at IBM in the 1970s, and Oracle, which brought the first commercial SQL database to market at the end of that decade, has been the king of database technology ever since.

If you are familiar with Excel, you will recognise the type of information this kind of technology is suited to representing. Company accounts and marketing and sales figures over time are perfect examples.
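
To make the idea of asking a structured question of a table concrete, here is a small sketch using SQLite, a SQL database that ships with Python. The `sales` table, its columns and the figures are invented purely for illustration.

```python
import sqlite3

# An in-memory database; the table and figures are made up for illustration.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sales (region TEXT, quarter TEXT, revenue REAL)")
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?)",
    [
        ("North", "Q1", 120.0),
        ("North", "Q2", 135.0),
        ("South", "Q1", 90.0),
        ("South", "Q2", 110.0),
    ],
)

# A structured question: total revenue per region, largest first.
totals = conn.execute(
    "SELECT region, SUM(revenue) FROM sales "
    "GROUP BY region ORDER BY SUM(revenue) DESC"
).fetchall()

for region, total in totals:
    print(region, total)
```

The query describes what is wanted (totals per region) rather than how to compute it, which is a large part of why SQL suits spreadsheet-like data so well.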

But there are other types of data that aren’t so easily stored in this way, such as the relationships in a social network (Facebook), an index of documents stored on the web (Google), or large collections of digital music and video (Netflix).

Fortunately, there are ways to store information other than in tables, such as in trees, graphs, or lists with an index. Some of these approaches are much better suited to humungous data sets and to data sets that don’t naturally fit into a series of tables.
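
As a sketch of why a graph can be a more natural fit than a table for social-network data, the example below stores friendships as a graph and asks for friends of friends. The names and the `friends_of_friends` helper are invented for the example, and it makes no claim about how Facebook actually stores its data.

```python
# Hypothetical friendships stored as a graph: each person maps to a set of friends.
friends = {
    "Ann": {"Bob", "Cat"},
    "Bob": {"Ann", "Dan"},
    "Cat": {"Ann", "Dan", "Eve"},
    "Dan": {"Bob", "Cat"},
    "Eve": {"Cat"},
}


def friends_of_friends(graph, person):
    """People two steps away who are not already direct friends."""
    direct = graph.get(person, set())
    two_steps = set()
    for friend in direct:
        two_steps |= graph.get(friend, set())
    return two_steps - direct - {person}


print(sorted(friends_of_friends(friends, "Ann")))
```

In a table-based design, the same question typically means joining a friendship table to itself, and each extra step away adds another join.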

The growing demand to store and analyse very large bodies of information, and information that is not readily suited to storing in tables (unstructured data), has led to a rapid growth in the popularity of these alternative types of database technologies.

Rising tide. Google Trends

Collectively they’ve become known as NoSQL technologies. Many of the leading technologies in this category are not developed by a single company, such as Oracle or Microsoft, but are instead open source: developed by an open network of companies, independent developers and contributors, akin to the way Wikipedia or Linux is developed.

Next-generation database technology

There are five key types of next-generation NoSQL data technologies. They are:

  1. Document Store: suitable for storing large collections of documents
  2. Wide Column Store: for very rapid access to structured or semi-structured data
  3. Search Engine: suitable for full-text indexing of documents
  4. Key-Value Store: suitable for rapid access to unstructured data
  5. Graph Database: suitable for storing graph-type data such as social networks
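
The differences between these five categories are easier to see with toy data. The sketch below models each one with plain Python structures. Every name and record is made up, and real products in each category add persistence, distribution and query languages that are not shown here.

```python
from collections import defaultdict

# 1. Document store: each record is a self-describing, nested document.
documents = {
    "doc1": {"title": "Oyster farming", "tags": ["aquaculture"],
             "author": {"name": "A. Writer"}},
    "doc2": {"title": "Self-driving cars", "tags": ["robotics", "maps"]},
}

# 2. Wide column store: each row key has its own, possibly sparse, set of columns.
wide_rows = {
    "user:1": {"name": "Ann", "city": "Hobart"},
    "user:2": {"name": "Bob", "last_login": "2014-05-01"},
}

# 3. Search engine: an inverted index from each word to the documents containing it.
inverted_index = defaultdict(set)
for doc_id, doc in documents.items():
    for word in doc["title"].lower().split():
        inverted_index[word].add(doc_id)

# 4. Key-value store: an opaque value looked up by its key, nothing more.
key_value = {"session:42": b"\x00\x01binary-blob"}

# 5. Graph database: nodes joined by labelled edges.
edges = [("Ann", "FOLLOWS", "Bob"), ("Bob", "FOLLOWS", "Cat")]


def followed_by(person):
    """Return everyone the given person follows."""
    return [dst for src, label, dst in edges
            if src == person and label == "FOLLOWS"]


print(documents["doc1"]["author"]["name"])   # nested field in a document
print(wide_rows["user:2"].get("city"))       # a column this row does not have
print(sorted(inverted_index["oyster"]))      # documents mentioning a word
print(key_value["session:42"])               # raw value fetched by key
print(followed_by("Ann"))                    # one hop along the graph
```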

New data intensive applications are being made possible with a new generation of database technologies. James Ruffer holding up a Neo4j sticker at TechCamp Memphis. Flickr/Jeremy Kendell

And the leading technologies in each of these categories respectively are:

Note that Apache Hadoop, which is also a leading technology, is not included in this list because it is a framework and file system rather than a database technology (though it can support many of these).

Leading NoSQL database technologies in each of the top five categories.

Where there’s talent there’s fire

By looking at the companies around the world that have the most employees with skills in each of these frontier technologies, we can get a unique insight into the organisations at the forefront of next-generation big data applications.

Based on a more extensive study, below is a map covering 40 leading global organisations that have the greatest number of specialists in each of the top five next-gen database technologies.

The Conversation

The more detailed country-by-country analysis has revealed that some organisations, such as Sky in London and Goldman Sachs in NYC, are leaders in the number of people they have with skills in these emerging areas.


4 Comments

  1. Walter Adamson

    Paul, does the table mean that Australia is in the Top 8 of each of those technology user counts, or did you always choose to put Australia in each segment? I mean, are they the Top 8 in the world, or a selected list of countries? I'm asking because I'm surprised that Australia would rank in each. Thanks.

    1. Paul X. McCarthy

      Adjunct Professor at UNSW Australia

      In reply to Walter Adamson


      Very good question. It's not a comprehensive global ranking but a sample compilation of leading companies by regional sampling. For each technology I looked at each of the "Big 5" for the US (SF and NYC separately), the UK, Canada, Australia and NZ. In the main these regions do correspond to the leaders when taking a global view, but there were some other surprising and notable strengths in other geographies, hence the inclusion of selected companies from Ukraine, Poland and Israel.

  2. Paul Hawking

    Associate Professor Information Systems

    Hi Paul

    Interesting article. From a corporate perspective, SAP has developed a new database (HANA) which utilizes a combination of in-memory, columnar storage, compression and partitioning technologies to quickly process large data sets. HANA underpins future releases of SAP's corporate solutions (ERP Systems, CRM, SCM, BI). Given that SAP is the market leader in enterprise systems, HANA will by default become a major technology for Big Data in companies that use SAP solutions. As HANA is a database technology, it is not dependent on SAP corporate solutions. SAP has established a "Startup Program" whereby it is making the database technology available to startups to underpin their technology needs.

    The popularity of HANA is evidenced by a recent five-week MOOC on HANA which attracted more than 40,000 registrations, with 12,000 sitting the final exam.


  3. Michael Shand

    Software Tester

    Really interesting, had not heard of this before, thanks for sharing