Big Data Journey: A few battle scars later

In this vodcast interview, three of the main people involved in our big data journey discuss battle scars and what we’ve learned. They cover everything from the four Vs, data capture, storage and security to Hadoop, Redshift and Elasticsearch. Interview text below.

It’s been two years and we’ve now delivered our third big data project. Two years ago we met a client looking to undertake some analysis. They are a major online business – data is their business – and they gave us an analytical challenge with all of it. The big data journey has taken us across the four Vs: volume, variability, variety and velocity. Some of that stuff comes at you pretty fast.

Q: What do you think has been the biggest challenge? Has it been the integration and unification of some of that data? And the different variety of data we’re dealing with? Is it being able to handle it fast? Is it being able to store it?

I think the first challenge was actually getting our heads around where the best place to start was. There are a lot of buzzwords and a lot of different ideas which all relate to big data, and each project is different. It took us a long time to understand all the components that make up that ecosystem and to decide which bits to use first. When you go into a certain type of project, should you start with Hadoop or with Elasticsearch? They do different things. It’s getting yourself up that learning curve of what each of those niches is and where you use them. That was a real learning curve for us. Now we feel we’re there: we’ve done a lot with different systems and feel much more comfortable choosing the right tools for the job.

Would you choose Hadoop? Would you choose Redshift? Or if you’re doing visualisation, maybe you base it on Elasticsearch because of its speed – the quick return of queries and things like that.

Ecosystem is a great description for working with big data because it’s not a linear process. It’s an ecosystem you have to grow, and there isn’t one tool that solves everything. As we’ve been evaluating different tools, they’ve been on a journey too. Elasticsearch, for example, has come a long way since we first started using it. They’ve learnt from use cases like ours, where they saw our speed issues, our storage issues and some of the ad hoc querying as well. But in terms of our own ecosystem, we have a lot of SQL analysis tools here too, and people who know SQL, so we have played quite a bit with Redshift as well.

There are different stages in projects. We often start by getting things going really quickly and trying to understand the problem, and certain tools are a lot quicker to get going with. We do some MapReduce analysis and batch queries with Hadoop and Redshift – Redshift is a really great new tool which is SQL based. We’ve learnt that within a project you start with an ad hoc understanding of what you need to do, and get going with processing and understanding that large amount of data. Then, to put it into a production system, you might use a different set of tools entirely; we often use something like Elasticsearch in the final step to actually serve the analysis to the users.

Q: So all of this is constantly changing – we’re constantly evaluating. But how has it been with clients? In my experience it’s been about getting clients to understand that their data policies need to change: changing their perception of storage and retention so they can defensibly delete aspects they don’t need, but also capture things they might not have captured in the past, so as to give the rich story they want to read from their analysis. How has your experience been from a technical perspective?

Even beyond the technical perspective, it’s about understanding what data you need to capture and how long you need to store it for. We found that a lot of companies need to go through the process of working out what they need to get the best value from their data: what to keep, how best to store it, the security to put in place, and deciding on the teams to work with it as well. On the technical side, things like security are not as hard as deciding the policies at the beginning and getting that structure in place. Going through that process of understanding what you need can be the most difficult thing.

Then the questions come. Why do we have to do certain things? Why do you want that much data? How are you actually going to transfer it? So all four Vs come into play – not just for us to understand, but for the client too.

Then there’s tackling how you use cloud computing when you have sensitive data. Do you mask the data? What actually is or isn’t sensitive? A company’s IT function will have its policies, but how do they evaluate whether something is secure or not? It’s an issue we’ve had to work through a number of times.

How you move these massive volumes of data between different cloud solutions, or even between regions within a cloud solution, can often be a problem when you have terabytes and terabytes of data. It can be pretty difficult to keep security in mind and ensure the process is secure. We came across products and tools which can move absolute bucketloads of data in seconds, with security in place, which we wouldn’t have found if we weren’t in this space. That sort of thing is only possible once you get in there and go on the journey.

Q: How has the leap from traditional SQL to dealing with big data been?

I think everyone here’s much happier since Redshift arrived. Redshift makes people happy.

We’ve been through a bit of a transition here, with other people trying to understand what big data is and what we’re doing, because it was mostly the same people on these projects. There has also been a lot of work around technologies which aren’t directly part of the big data space – JavaScript visualisation, for example, which definitely complements our work. There’s a lot of different work going on in different areas that all comes together in the same sort of field. We’ve progressed as a company in massive leaps and bounds.

Q: Are statistics capabilities easier with the big data ecosystem?

We have a number of stats guys, and they’re getting into the role of doing things not just at small scale but at large scale, using the whole data set available rather than sampling. There have definitely been advances in that area as well.

There’s the adage that big data is a bit like teenage sex – everyone’s talking about it, nobody really knows how to do it. I like to think we’ve graduated.

In the last two years we’ve gone from talking conceptually about big data and all these tools to ticking a lot of boxes in terms of use cases. The fact that we have different teams working on different projects means we get a large variety of use cases. In that sense we almost know what we’re talking about. I say almost because it’s an ongoing journey. There are certain things we haven’t touched yet, such as running Hadoop on a cluster of its own, because that will add a lot more to our stats capabilities. We have tried and tested 90% of the NoSQL technologies, but on some of the computation side we can still do more. We can always do more.

Of the three people we interviewed, Pamela Edmond – Associate Director – was on her last week with us at the time of filming. We wanted to capture her input into a discussion about a journey she was very much instrumental in realising.

Analytics in M&A and Market Intelligence

We catch up with David Crout, Director of the MAMI team at PMSI to find out what his team do and how analytics can play a role in M&A and Market Intelligence.

Goldilocks and the D3 Bears

Author: John Kiernander

As open-source software goes from strength to strength, one area particularly close to my heart is charting libraries and APIs. Like many areas of software, the open-source community has revolutionised the way data is presented on the web.
Arguably the most significant open-source visualisation project is D3 (Data-Driven Documents); however, many an inexperienced JavaScript developer’s hopes have been dashed against the rocks of D3. It is NOT a charting API. It’s a library which helps a developer map data to elements on a web page (think jQuery for data, if you are technically minded). That means you CAN create charts – in fact I’d say you can create any chart you could ever imagine, and much more easily than with raw JavaScript – however the spectrum of JavaScript programmers is broad, and I would argue that D3 still requires a fairly high level of skill if you want to do something from scratch. In the hands of an expert the results are magnificent, but sadly the majority of D3 implementations I have seen appear to have been built by taking an example and hacking at it until it fits the required data.
To address this, a number of open-source projects have emerged with the specific goal of drawing charts in D3. Their restricted scope leads to greater simplicity and opens the door to many more users. However, when we came to look for an API which meets the needs of our analysts – many of whom come from an Excel rather than a JavaScript background – we couldn’t find that crucial Goldilocks zone between complexity and limitation. This was what spurred us to create our own. The result is dimple, a JavaScript library which allows you to build charts using a handful of commands. The commands can be combined in myriad ways to create all sorts of charts, and the results can be manipulated with D3 if you need to do something really unusual. The main limitation is that it only supports charts with axes for now (pie charts are in the works), but it works in a way which ought to be easily understood by anybody with some basic programming knowledge.
Dimple Price Range Chart
The example above is from the advanced section of the site, but still takes fewer than 20 lines of JavaScript. To get started with a simpler example, why not copy and paste the code below into Notepad, save it as MyChart.html, open it in your favourite browser and then sit back and admire your first bar chart.
  <script type="text/javascript" src=""></script>
  <script type="text/javascript" src=""</script>
 <script type="text/javascript">
   var svg = dimple.newSvg("body", 800, 600);
   var data = [
     { "Word":"Hello", "Awesomeness":2000 },
     { "Word":"World", "Awesomeness":3000 }
   var chart = new dimple.chart(svg, data);
   chart.addCategoryAxis("x", "Word");
   chart.addMeasureAxis("y", "Awesomeness");

The brevity is good but it’s the flexibility and readability which we were really shooting for.  So try switching the letters “x” and “y” on the add axis lines and you get a horizontal bar chart, change “bar” in the “addSeries” line to “bubble”, “line” or “area” and you’ll get those respective chart types.  Or better still copy the “addSeries” line and change the plot to get a multiple series chart.  You can go on to add multiple axes, different axis types, storyboards (for animation), legends and more.  For ideas see the examples, or if you are feeling brave the advanced examples which I try to update regularly.
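One thing worth noticing is that the data handed to dimple is nothing exotic – just an array of flat JavaScript objects, one per bar. As a quick sketch (the toRows helper below is our own illustration, not part of the dimple API), here is how you might build that array from a plain map of values before passing it to the chart:

```javascript
// Illustration only: toRows is a hypothetical helper, not part of dimple.
// It turns a plain word -> score map into the array-of-objects row format
// that dimple (and most D3-based libraries) expect.
function toRows(scores) {
  return Object.keys(scores).map(function (word) {
    return { "Word": word, "Awesomeness": scores[word] };
  });
}

var data = toRows({ "Hello": 2000, "World": 3000 });
// data now matches the array used in the bar chart example
```

Feeding the result straight into new dimple.chart(svg, data) reproduces the bar chart above, and the same shape works for any of the other plot types.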

A Single Version of the truth – Is it just a myth?

Author: Nadya Chernusheva

Why do we hear companies talking about a “single version of the truth”? Because of the frustration they have experienced when multiple people argue about which numbers are correct rather than focusing on what the metrics mean. Finding out what the metrics really mean would allow them to improve operational performance and business results. They want data consistency so they can understand trends, variances, causes and effects. They want easy and quick access to information they can trust. They do not want to wait days for data from IT or an overworked analyst – data they may not even be able to rely on.
In many companies existing data warehouses and reporting systems are so fragmented and widely dispersed that it’s impossible to determine the most accurate version of information across the enterprise. A proper information strategy, with solid master data management (MDM) and infrastructure, often takes years to develop and requires substantial upfront investment. In the meantime, departments are left to develop their own short-term solutions, resulting in too many data sources reporting different information. This information is incomplete, lacks structure and is sometimes even misleading.

Imagine a mid-sized, fast-growing international business with a strong portfolio of brands. They know very well how to manufacture a good quality product and pitch it effectively to consumers. However, they struggle with large amounts of data sitting in multiple Excel spreadsheets and legacy systems, without access to analytics and consistent reporting. Central management often does not have much of an in-depth view of what is happening in local markets. It takes a long time, and some frustration, to get a simple market-share data point – not to mention to gain regular insight into competitor and product performance.
Would it not be great to have a central, single source of reliable and consistent information, enabling quick and easy access and reporting, reducing manual work and delivering performance results more quickly? It is possible. You don’t have to ‘boil the ocean’ and try to incorporate all existing data at once. Start with market or sales data to get a consistent and accurate view of the critical KPIs, to improve segmentation and to gain insight into key areas before adding on more.
With new technology, methodologies for data unification and the emergence of data visualisation tools that are revolutionising decision making, that panacea of the “single version” isn’t just a dream. Whether through your legacy vendors or open-source options, this new wave of technology enables delivery of the right information and analysis to the right person, at the right time and on a regular basis. It can help overcome the need for large infrastructure investment while you develop your metrics, stakeholder strategy and reporting requirements.
It’s not an impossible dream, and although the flood of options, methodologies and technologies might feel overwhelming, you can ride the swell towards an optimal solution.

Scoring an A in Analytics

Author: Nick Petrow

Remember when you were in school and your teacher assigned a big group project? What was the first thing you and your group did? I am willing to bet that, at least once, you found the easiest way was to split it up and assign one segment to each group member. You then went home, completed your “fair share” and brought it in the next day without collaboration (or interruption) from your teammates. I don’t blame you; we’ve all done it. Plus, working alone is much easier than having the headache of amalgamating the different standards and work styles of the group.


Although this “divide and conquer” technique may be easier, it is not necessarily the best strategy. Not only does this tactic disturb the continuity of the final product, but it also ignores the most valuable asset a team has to offer: Collaborative knowledge.

Knowledge capture is increasingly seen as a priority by executives, who use social media and cloud-sharing tools to inspire teamwork – and it generates better results. When members of a team work in silos, the ability to share knowledge is inhibited and that resource is wasted.

Trying to operate without all members of a team working together is like trying to pedal a bicycle with misaligned gears. However a misaligned business is much harder to diagnose and fix than a broken bike. Generally, executives do not realize the problem until it has adversely affected profitability. Personally, I did not realize the danger of working in silos until I underperformed on a group project. For a business, the moment of realization may occur when revenue is below the internal forecast for a quarter. This moment of realization will often function as the kick-off point for alignment within the silos of a business.

No one likes to underperform, but it may be just the wake-up call your business needs to trigger alignment.   Like the bicycle with the misaligned gears – only after a stressful day when the bike fails completely causing you to be late for work, is the initiative to fix it sparked.  Once the bike is fixed your commute to work will be much more efficient.  That one day you were late for work will be well worth the strain.

So, next time your team underperforms, do not view it as a hassle or a failure. Instead, view it as an opportunity to motivate your team to abolish silos and begin working together.  In the end, it could be the most valuable quarter all year!

Would you rather skip the underperforming step? Check out how our Business Analytics Strategy Experience Workshop can kick-start your team’s alignment in this cool video!


“What is?” Series – Data Warehousing

Associate Director Pamela Edmond chats with Richard Jolly our BI Architect and expert on Data Warehousing.

Making Your Business Analytics System A Blockbuster Hit


Author:  Vivek Kadiwar

When it comes to making a successful movie, there is no secret formula to assure you’ll produce the next blockbuster hit. That said, there is a range of actions that all producers take during the filmmaking process.

So what separates the good from the great? The winners from the losers?

The key to success lies in how well these series of steps are implemented by the producers. It’s about finding the right script to match your vision, hiring a director that understands your vision and finding the right set of actors to carry out your vision. When the whole team is on the same page, as a producer, it becomes much easier to execute your plan.

In the last decade, the benefits of analytics have become clear throughout most businesses, yet companies still face multiple hurdles in maximizing the value of the analytics systems they have in place. Most business analytics problems within an organization result from a lack of collaboration, alignment and communication. In a large organization it’s not always easy to get everybody on the same page. The solution to these problems is creating a clear “line of sight” from the top of the company to the bottom. Thus, the key to a powerful business analytics system is very similar to creating a successful movie: there truly must be an aligned goal for every member involved in the process. Just as the producer and director should be on the same page to increase the success of a film, the same goes for the CFO and CIO of a company trying to improve the value of analytics.

Here are the 7 steps to having a well-connected business analytics system. Read the article below to better understand how linking decisions to drivers to outcomes can connect the strategic, financial and operational goals within your organization.

A ‘How To’ Guide: Using a Performance Driver Approach PDF

Companies are set to benefit from the growth wave of energy efficiency works in housing

Author: Gary Engelbert

UK targets for carbon reduction in housing have led to a number of new government initiatives to support spend on energy efficiency works.

The UK Government has committed to an ambitious target of 30% carbon reduction by 2020. Housing plays a significant role in meeting this target as it contributes just over a quarter of overall UK carbon emissions. As part of this drive for energy efficiency a number of government initiatives are in place that help fund relevant works in domestic properties:


We estimate the incremental spend on these measures to be worth £10.8Bn to 2017/18, based on modelling the profile of each initiative.


The most significant initiative is the new Energy Company Obligation (ECO)…

ECO took effect in January 2013 and is an obligation placed on the big six energy companies to finance energy efficiency measures, with specific targets on energy savings. The scheme is paid for by the energy companies, who typically increase their customers’ energy bills to fund it. Spend is likely to be back-loaded in two-yearly cycles as the deadlines for meeting carbon savings approach. ECO will focus on providing energy efficiency measures to low-income and vulnerable consumers and those living in hard-to-treat properties. There will therefore be a sharp rise in the need for contractors and specialists in insulation measures (specifically solid wall insulation) in areas of social deprivation. The initiative is envisioned to last for at least 10 years at a broadly similar level of investment.

…whilst feed-in tariffs (FiT), the Green Deal and the Renewable Heat Incentive (RHI) will increasingly support spend in later years

Feed-in tariffs (FiT) are regular payments from energy companies to householders and communities who generate their own electricity. During 2011 uptake increased dramatically, culminating in early December when everyone rushed to install solar PV while the payment rates were high. Installation of solar PV will likely drop off in 2013/14 and 2014/15, driven by the reduction in generation rates, but then start to grow again (although never recovering to the ‘boom’ levels of 2011).

The Green Deal is a further, potentially significant source of funding. It enables private firms to offer consumers energy efficiency improvements to their homes, at no upfront cost, with payments recouped through a proportion of the reduction on their energy bills. Although slow to start, the Green Deal will undoubtedly be a driver of spend in future years. There are 45 measures approved to receive funding under the Green Deal, covering insulation, heating and hot water, glazing and micro-generation. Installers must be authorised and may specialise in one or multiple measures.

The Renewable Heat Incentive provides a continuous income stream over 20 years to anyone who installs an eligible renewable heating system. Unlike FiT, it will be paid for by the Government, not by energy users. RHI has been delayed for domestic customers; the Government now intends to announce the final details in summer 2013 and open the scheme for payments from spring 2014. We expect spend to grow over time as the scheme kicks off and as technologies such as solar water heating and air source heat pumps become cheaper and more effective.

This is a new and exciting market that PMSI have worked in consistently while it has developed over the last 5 years. We see huge opportunities for the companies able to exploit this “growth wave” and have built up a database of potential investment targets.

Keeping House with an Analytics Center of Excellence

Author:  Nick Petrow              

When I was younger my parents went to Paris for their 20th anniversary and left my two sisters and me at home alone for the first time. My Mother had two rules: we must keep the house clean, and we absolutely must not kill each other – both of which proved equally difficult. However, I distinctly recall thinking to myself: keeping the house clean is going to be so easy without Mom making me do unnecessary work. Spoiler alert: I was very wrong.


It should not have been that much work. Between the three of us we did 90% of the cleaning when my Mother was home anyway. But without her, everything seemed to take twice as long. Why was it such a difficult task? Without her, we were unable to see the big picture. We did not clean up messes as they occurred, so they became bigger messes. We did not communicate with each other and ended up duplicating tasks (we each vacuumed the floor once, not knowing it had already been done). Finally, we failed to set standards for what constituted “clean.”

My Mother effectively kept the team working together.  She put procedures in place, ensured communication, and set the standard and strategy for cleanliness. These seemingly simple structures ultimately enabled us to keep the house clean in the most efficient manner. The same thought applies to business.  If something as simple as keeping a house clean is so fragile and dependent on teamwork and communication, what are businesses to do with large teams and complex priorities such as adopting business analytics capabilities?  How do businesses guarantee that their strategic priorities are being driven forward?

Answer:  The Analytics Center of Excellence.

Too many companies read about the miraculous solutions of analytics and invest heavily in tools and technology. However, without the strategy and team in place, this often results in low value and an inability to yield true insight. When it comes to business analytics, organizations need a guiding force to set the priorities and get teams collaborating to achieve the results they are looking for. They need a way to implement all the necessary parts of an analytics strategy while keeping the big picture in view. Enter the Analytics Center of Excellence.

What is an Analytics Center of Excellence? It depends on the company, its goals and its culture. It may be a team of business and IT specialists working together, or a virtual network of domain experts from each line of business headed by a Chief Analytics Officer. Exactly how you set up your Analytics Center of Excellence depends on a number of factors, but there are some things all successful ones have in common: the right resources, role definitions, skills and expertise to establish processes, facilitate proper communication, and develop and maintain a set of standards. A successful Center of Excellence will ensure that your company’s analytics program aligns business goals with meaningful insight, so you get the information you need to make better business decisions.

A company without a Center of Excellence is no different from my family working in silos to keep the house clean without the driving force of our Mother keeping us communicating and on target. So, before you open the door to the exciting new world of analytics, take a step back and think about how you set up your strategy and people with an Analytics Center of Excellence. Otherwise you may find yourself spending countless unnecessary hours putting everything back where it belongs before Mom gets home.

Our team at AlignAlytics has helped to author books on putting together a successful business analytics program and setting up your strategy and Center of Excellence. You can download the executive overview of “5 Keys to Business Analytics Program Success” here, get started today on your analytics strategy with our Roadmap Services, or reach out directly to @tracyleeharris or @alignalytics to learn more.



Analytics, Decision Making & Wine

Author: Gabe Tribuiani 

As our society and economy have evolved, we’ve become accustomed to having an abundance of options in just about any decision we must make. However, it’s the excess of alternatives we are constantly confronted with that often complicates and delays decision making in our personal and professional lives. For example, I went out to dinner the other night and wanted a glass of wine with my meal. The waiter handed me a book an inch and a half thick containing their vast array of wine selections. Instead of wading through the pages, I quickly came up with a set of criteria to help me focus and determine my selection.


To start, white and rosé wines were immediately eliminated. I only drink white wine if I’m eating fish. Since I knew that I wasn’t going to order fish, it was simple for me to eliminate the whites (I ordered a pasta appetizer and beef entrée in case anyone is interested). Rosé isn’t really my thing unless I’m at an outdoor party in the summer and it’s mixed with fruit (à la homemade sangria).

I then narrowed my selection according to the type of taste and texture I wanted to experience. On this particular night I was in the mood for a smooth, evenly balanced, medium-bodied, but not too fruity taste. This criterion narrowed my quest to the great varietals of Pinot Noir and Chianti. Because I had ordered a pasta-based appetizer, my search led me to a glass of Chianti (which also went great with the wood-fired Tuscan-style bread and homemade olive oil).

Finally, I assessed value and cost (this is often where most people start every decision, particularly in business). I selected a $13 glass, about the middle of the road for the Chianti price range. Boom! I had solved my wine selection problem in less than a minute using simple qualitative analytics, and all I had to do was establish a core set of criteria that fit my personal needs.
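The selection process above is really just successive filtering on criteria, which can be sketched in a few lines of JavaScript. The wine list, names and prices here are invented purely for illustration:

```javascript
// Hypothetical wine list mirroring the story: the data and helper names
// are our own illustration, not a real recommendation engine.
var wines = [
  { name: "Pinot Grigio", colour: "white", body: "light",  price: 11 },
  { name: "Chianti",      colour: "red",   body: "medium", price: 13 },
  { name: "Pinot Noir",   colour: "red",   body: "medium", price: 16 },
  { name: "Cabernet",     colour: "red",   body: "full",   price: 18 }
];

// Apply the criteria in order: reds only, then medium-bodied.
function shortlist(list) {
  return list
    .filter(function (w) { return w.colour === "red"; })
    .filter(function (w) { return w.body === "medium"; });
}

// Final criterion: pick the middle of the shortlisted price range.
function pickMidPriced(list) {
  var sorted = list.slice().sort(function (a, b) { return a.price - b.price; });
  return sorted[Math.floor((sorted.length - 1) / 2)];
}

var choice = pickMidPriced(shortlist(wines));
```

Each filter maps to one of the criteria in the story, and the whole decision runs in well under the minute it took at the table.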

Businesses should approach decision making in a similar fashion. Establishing a list of factors that matter to your organization today, and that will also matter in the future, will allow you to differentiate yourself from competitors and drive continuous growth. Begin collecting data on these factors, constantly evaluate the outcomes of your decisions, and modify and tweak your approaches. Let’s put this into some context.

Say, for instance, you’re an executive at a multinational manufacturer and part of your strategy is to strive for continual operational efficiency. You may decide to invest in multiple Business Intelligence (BI) tools to meet this strategic initiative. The question then becomes: who, where and how should your dollars be invested to maximize return? (See our performance driver services.) Again, an abundance of alternatives exists.

To solve this problem, the organization may decide to create a BI roadmap (check out our roadmap services) and assess the factors that determine the analytical capabilities of the current operation and where it should go in the future. For instance, the manufacturer may want to assess the availability and timeliness of information: is it delivered to users when they need it to do an effective job? Drilling down further, you might assess the information’s relevancy – does what I receive even matter in the context of my operating unit? If not, why would I continue to receive it, and what solutions exist to resolve the issue? These are typical follow-up questions on further evaluation. Asking simple yes/no questions such as “does the current technology allow me to view information in real time?” can be just as insightful, particularly for a manufacturing production facility.
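Those yes/no assessments can be rolled up into a simple capability score per operating unit. As a minimal sketch (the factors, answers and scoring rule here are ours, invented for illustration, not a published methodology):

```javascript
// Hypothetical assessment for one operating unit: each factor gets a
// simple yes/no answer, as described in the text above.
var assessment = [
  { factor: "Information arrives when users need it", answer: true  },
  { factor: "Information is relevant to the unit",    answer: true  },
  { factor: "Technology supports real-time views",    answer: false }
];

// Score = percentage of criteria met, rounded to a whole number.
function capabilityScore(answers) {
  var yes = answers.filter(function (a) { return a.answer; }).length;
  return Math.round(100 * yes / answers.length);
}

var score = capabilityScore(assessment);
```

Comparing scores like this across units is one crude way to decide where roadmap investment would move the needle most; a real assessment would weight the factors rather than treating them equally.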

Decision-making doesn’t have to be challenging or scary. If you take the time to set up a repeatable model that fits your needs – subject to regular evaluation and refinement – you can begin to solve simple issues (what am I going to eat for dinner tonight?) or complex ones (what new markets should we compete in over the next one, three or five years?) with greater speed and accuracy.

So, now that you’ve decided that analytical decision making is vital to your personal and professional success, let’s toast over a glass of wine (red preferably)!
