tagged by: big data


API design · academia · agile · agile adoption · analysis patterns · application architecture · application integration · bad things · big data · board games · build scripting · certification · clean code · collaboration · computer history · conference panels · conferences · continuous delivery · database · design · dictionary · distributed computing magazine · diversions · diversity · documentation · domain driven design · domain specific language · domestic · encapsulation · enterprise architecture · estimation · event architectures · evolutionary design · expositional architectures · extreme programming · gadgets · ieeeSoftware · infodecks · internet culture · interviews · language feature · language workbench · lean · legacy rehab · legal · metrics · microservices · microsoft · mobile · model-view-controller · noSQL · object collaboration design · parser generators · photography · podcast · popular · presentations · privacy · process theory · productivity · programming platforms · project planning · projects · recruiting · refactoring · refactoring boundary · requirements analysis · retrospective · ruby · scrum · security · software craftsmanship · talk videos · team environment · team organization · technical debt · technical leadership · test categories · testing · thoughtworks · tools · travel · uml · version control · web development · web services · website · writing

2018 · 2017 · 2016 · 2015 · 2014 · 2013 · 2012 · 2011 · 2010 · 2009 · 2008 · 2007 · 2006 · 2005 · 2004 · 2003 · 2002 · 2001 · 2000 · 1999 · 1998 · 1997 · 1996

All Content

The Evolving Panorama of Data

Rebecca Parsons and Martin Fowler

Our keynote at QCon London 2012 looks at the role data is playing in our lives (and that it's doing more than just getting bigger). We start by looking at how the world of data is changing: its growing, becoming more distributed and connected. We then move to the industry's response: the rise of NoSQL, the shift to service integration, the appearance of event sourcing, the impact of clouds and new analytics with a greater role for visualization. We take a quick look at how data is being used now, with a particular emphasis from Rebecca on data in the developing world. Finally we consider what does all this mean to our personal responsibilities as software professionals.

18 April 2012


A Proof-of-Concept of BigQuery

by Ian Cartwright

Can Google’s new BigQuery service give customers Big Data analytic power without the need for expensive software or new infrastructure? ThoughtWorks and AutoTrader conducted a weeklong proof of concept test, using a massive data set. The tests showed consistent query performance on a 750 million row data set in the 7-10 second range. We used the REST APIs with Java, JavaScript and Google Charts to create a web front-end with interactive visuals of query results. The entire exercise was conducted with three people in five days. The verdict: BigQuery performed well, and could benefit organisations with Big Data and smaller budgets—especially those without a data warehouse, or whose data warehouse has restricted use.

4 September 2012



With the growing interest in data analytics and visualizations, we're seeing more effort put into interesting visualizations that allow people to draw insight from data floating around in an organization. Most of these dashboards are aimed at individual usage, but there is a growing tendency to use them for a more communal purpose.

22 August 2012



Datensparsamkeit is a German word that's difficult to translate properly into English. It's an attitude to how we capture and store data, saying that we should only handle data that we really need.

12 December 2013



As I write this towards the conclusion of the US presidential election , there's a side debate that's appeared about the forecasts produced by Nate Silver. Many Republicans claim he's a shill for the democrats and his forecast of an 85% chance of an Obama win is bogus . Part of me wishes I knew more innumerate Republicans that I could make side-bets with. Perhaps a better wish would be that the polls were the other way around as I have more Democratic-leaning friends. In reality either way I wouldn't gain too much as most people I know are numerate. Sadly this isn't true in general - this side-show is an illustration of the deep illiteracy most people have for probability, which has some important ramifications for society in general and software development in particular.

5 November 2012


Thinking about Big Data

"Big Data" has leapt rapidly into one of the most hyped terms in our industry, yet the hype should not blind people to the fact that this is a genuinely important shift about the role of data in the world. The amount, speed, and value of data sources is rapidly increasing. Data management has to change in five broad areas: extraction of data from a wider range of sources, changes to the logistics of data management with new database and integration approaches, the use of agile principles in running analytics projects, an emphasis on techniques for data interpretation to separate signal from noise, and the importance of well-designed visualization to make that signal more comprehensible. Summing up this means we don't need big analytics projects, instead we want the new data thinking to permeate our regular work.

29 January 2013


Introduction to NoSQL

At goto Aarhus, we had a track on some practical experiences with NoSQL. I was asked to give an initial talk to explain the basic principles of NoSQL datastores. I talk about the origins of NoSQL, forms of NoSQL data models, the way many NoSQL databases consider the problem of consistency, and the importance of Polyglot Persistence.

3 October 2012



Data Lake is a term that's appeared in this decade to describe an important component of the data analytics pipeline in the world of Big Data. The idea is to have a single store for all of the raw data that anyone in an organization might need to analyze. Commonly people use Hadoop to work on the data in the lake, but the concept is broader than just Hadoop.

5 February 2015



I remember in my teens being told of the wonderful things Artificial Intelligence (AI) would do in the next few years. Now several decades later, some of these seem to be happening. The most recent triumph was of computers teaching each other to play Go by playing against each other, rapidly becoming more proficient than any human, with strategies human experts could barely comprehend. It's natural to wonder what will happen over the next few years, will computers soon have greater intelligence than humanity? (Given some recent election results, that may not be too hard a bar to cross.)

But as I hear of these, I recall Pablo Picasso's comment about computers many decades ago: "Computers are useless. They can only give you answers". The kind of reasoning that techniques such as Machine Learning can result in are truly impressive in their results, and will be useful to us as users and developers of software. But answers, while useful, aren't always the whole picture. I learned this in my early days of school - just providing the answer to a math problem would only get me a couple of marks, to get the full score I had to show how I got it. The reasoning that got to the answer was more valuable than the result itself. That's one of the limitations of the self-taught Go AIs. While they can win, they cannot explain their strategies.

14 November 2017