This is the fourth article in our series on “How to become a more data-driven organisation”, and we are going to be focusing on Data Platforms.
It is at this point that most people start to dive deep into the technical aspects of Data Lakes vs Data Warehouses, but we want to step back up a level and ask: “What do we need our Platform to do for the business?”
The key is in the term ‘Platform’. It will be responsible for collecting, migrating, transforming and analysing data. It will be many things to many people across the business. The best way to think of the Platform is as a service, which makes the users of that service its customers. We need to listen to them and their needs before we start building anything. Users can include a range of people from across the business, from operations to analysts to data scientists, and they will all need to access the data in different ways.
What is it for?
The data collected and collated on the platform will fall into two major categories: it will be used either for Business Intelligence or for Product enrichment. Business Intelligence provides the business with insight through analysis and reporting, and can be thought of as the statistical output of the platform. Product enrichment could include personalisation of features, recommendations and bespoke enhancements to the product. This is the area where we would look to apply Data Science, AI and Machine Learning to enrich and evolve the user experience.
Make it extensible
Wherever you choose to start, make sure you leave room to expand, in both scale and capability. While there is value in not over-architecting your platform, it is worth taking the ambitions of the business into account: you may not need terabytes of storage right now, but you may need them in the future. The same goes for how and where the data is stored. Factors such as target daily users and geographical expansion of the business could influence these decisions.
Automate everything
Rather than “Automate the boring stuff”, we like to set the goal of “Automate everything”. Data work in particular involves a lot of repetition, often on a schedule, so build automation into the core of your data platform. This applies not only to reports and analysis but also to the flow of data itself. A recurring problem we encounter is the number of hours spent transferring and transforming data by hand. By automating these jobs throughout the system, you free up the team to do the valuable work they were hired for.
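To make the idea of automating the flow of data concrete, here is a minimal sketch in Python of a transfer-and-transform step expressed as a repeatable job rather than a manual task. Everything in it is a hypothetical illustration, not a recommendation of a particular tool: the `Job` class, the `RAW_ORDERS` sample data, the `pence_to_pounds` transform and the in-memory `WAREHOUSE` list are all invented for this example. In practice the job would read from and write to real systems, and something like cron or a workflow orchestrator would trigger it on a schedule.

```python
import csv
import io
from dataclasses import dataclass
from typing import Callable, Iterable

Row = dict[str, str]


@dataclass
class Job:
    """A repeatable extract -> transform -> load step (hypothetical helper)."""
    name: str
    extract: Callable[[], Iterable[Row]]
    transform: Callable[[Row], Row]
    load: Callable[[list[Row]], None]

    def run(self) -> int:
        # Apply the transform to every extracted row, then load the batch.
        rows = [self.transform(row) for row in self.extract()]
        self.load(rows)
        return len(rows)


# Assumed sample input; a real job would read from a source system instead.
RAW_ORDERS = "order_id,amount_pence\n1,1250\n2,399\n"

# Stand-in for the destination store; a real job would write to a database.
WAREHOUSE: list[Row] = []


def extract_orders() -> Iterable[Row]:
    return csv.DictReader(io.StringIO(RAW_ORDERS))


def pence_to_pounds(row: Row) -> Row:
    # The kind of conversion that is often done by hand in a spreadsheet.
    pounds = int(row["amount_pence"]) / 100
    return {"order_id": row["order_id"], "amount_gbp": f"{pounds:.2f}"}


def load_to_warehouse(rows: list[Row]) -> None:
    WAREHOUSE.extend(rows)


orders_job = Job("daily_orders", extract_orders, pence_to_pounds, load_to_warehouse)
```

Calling `orders_job.run()` performs the whole transfer in one step and returns the number of rows loaded. Once a job is defined this way, running it on a schedule is a matter of configuration, and nobody has to repeat the work by hand.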
Involve the users
Make sure that the users of the platform are involved in the whole process, from planning to execution, including those who will only be consuming reports. The default position seems to be “collect everything and work it out later”, but this generates a lot of waste and the users seldom get what they want. If you set out with a specific business goal or deliverable OKR in mind, the early iterations of the platform will be easier and your users will have tangible results.
But what if we already have something?
Most organisations will have data in some shape or form. Some will have got it right from the outset, but this is very rare. If data is not treated as a first-class citizen, it will soon become unmaintainable or unusable, and the users will become unhappy. Most companies will face this eventuality at some point. When you do, you can create something new, replace your existing platform, or improve it.
How do I choose what to do?
There are no silver bullets when it comes to designing a Data Platform, as a platform is often a combination of tools and capabilities. Each organisation’s needs will be different and will require differing levels of complexity. Organisations with simpler needs may choose an off-the-shelf product, whereas more complex data ecosystems, where data can be extracted at many levels, may need something more tailored. Depending on the skills and experience in the organisation, it is often advantageous to bring in experts to help define the platform so that it meets the varying needs of the users.