Data Architecture strategy for Analytical Environments
I have worked for around 10 years on projects and technologies related to data (Business Intelligence, Big Data, data integration, etc.), the last four of them in the Data Architect role. Many times I have seen Analytics / Business Intelligence / Big Data projects blow up and simply fail to reach their objectives, for many reasons.
The main one is the absence of a Data Architecture strategy for this kind of project.
So the topic here is how Data Architecture can help analytics teams build strong data models, help the business, and survive on the way to that objective.
There are many books on this topic, but I think they do not help when we try to create a Big Data or Business Intelligence strategy. This is because most of them explain Business Intelligence from the data modeling point of view, and not from the corporate strategy and the reasons why a company needs a Big Data or Business Intelligence system. Another problem is the missing link between two vital concepts: the strategy and the good practices for building an analytics environment.
Before writing this post, I read several books on this topic and did not find any that answered my question: "How should we think about an Analytics Strategy in relation to the Corporate Data Architecture Plan and the Corporate Strategy?"
These books are very good and cover the practices and technologies associated with analytical environments. For example:
- Think about your model in relation to your business processes.
- Keep the data history: capture your data changes and store them in the dimensions too (slowly changing dimensions).
- Use surrogate keys.
- Your strategy must not be a part of your infrastructure diagram. Your infrastructure diagram must be a part of your strategy.
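To make two of these practices concrete, here is a minimal sketch of a slowly changing dimension (type 2) that uses surrogate keys. The row layout, the `segment` attribute and the sample values are assumptions made for illustration, not something taken from a specific book or tool.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class CustomerDimRow:
    surrogate_key: int        # key generated in the warehouse, independent of the source
    customer_id: str          # natural (business) key from the operational system
    segment: str              # hypothetical attribute whose history we want to keep
    valid_from: date
    valid_to: Optional[date]  # None marks the current version


def apply_change(rows: list[CustomerDimRow], customer_id: str,
                 new_segment: str, change_date: date) -> list[CustomerDimRow]:
    """Close the current version of a customer and append a new one."""
    current = next((r for r in rows
                    if r.customer_id == customer_id and r.valid_to is None), None)
    if current is not None and current.segment == new_segment:
        return rows  # nothing changed, history stays as it is
    if current is not None:
        current.valid_to = change_date  # close the old version instead of overwriting it
    next_key = max((r.surrogate_key for r in rows), default=0) + 1
    rows.append(CustomerDimRow(next_key, customer_id, new_segment, change_date, None))
    return rows


dim: list[CustomerDimRow] = []
apply_change(dim, "C-001", "residential", date(2020, 1, 1))
apply_change(dim, "C-001", "business", date(2021, 6, 1))
for row in dim:
    print(row)
```

Both versions of the customer stay in the dimension, each with its own surrogate key, so facts loaded before the change still point to the attributes that were valid at that time.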
So, with this context in mind, and after this not-so-short introduction, I will take some examples using architecture frameworks and try to think through the Data Architecture strategy that supports a Big Data or Business Intelligence strategy. In my case, I'll use the TM Forum framework for this post.
Architecture frameworks are a real headache, but over the years I made friends with them. Once you understand how they work, you understand the value you can get from them.
I started working with frameworks in my job as a Data Architect, where I was responsible for everything related to the data architecture environments, Big Data and Analytics among them.
I started to learn and work with the TM Forum framework to map the data entities, their relation to the applications that manage them, and then the business processes that manage those entities.
The TmForum architecture framework consists of three great universes:
- The universe of applications (TAM): maps, for each domain that exists in the company, the applications related to that domain.
- The universe of business processes (eTOM): documents the processes that make the organization work.
- The universe of data (SID): documents the data entities that make up the organization, by domain.
These three points hold the key to a new way of approaching the Big Data and Business Intelligence strategy.
First, we have to answer the question: what is an Analytics Strategy?
The Analytics Strategy is how you manage your data to transform it into information. To transform isolated data into information, you have to think about your data in terms of how it reflects your business processes. After that you will be able to model your data in those terms, but first you have to answer:
- What processes does my company have?
- What does the business want to know?
- Which data represents those processes?
- In which applications do I have the data I need?
To make these points possible, you have to establish the relation between:
- The data entities.
- The applications that own this data.
- The processes responsible for managing this data.
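The relation between these three elements can be sketched as a small mapping table. This is only an illustration: the application names and the "Product Lifecycle" process are hypothetical, and a real mapping would come from your own TAM, eTOM and SID inventory.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class EntityMapping:
    entity: str     # data entity (SID universe)
    owner_app: str  # application that owns the data (TAM universe)
    process: str    # business process that manages the data (eTOM universe)


# Hypothetical mappings; the application names are made up for this example.
MAPPINGS = [
    EntityMapping("Customer", "CRM", "Order to Payment"),
    EntityMapping("CustomerOrder", "OrderManager", "Order to Payment"),
    EntityMapping("CustomerBill", "BillingSystem", "Order to Payment"),
    EntityMapping("Product", "ProductCatalog", "Product Lifecycle"),
]


def sources_for_process(process: str) -> dict[str, str]:
    """Return {entity: owning application} for every entity a process manages."""
    return {m.entity: m.owner_app for m in MAPPINGS if m.process == process}


print(sources_for_process("Order to Payment"))
```

With a table like this, asking "where do I get the data for this process?" becomes a lookup instead of a round of meetings.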
Let's take an example with the sales process related to product activation.
The eTOM framework defines many processes. One of them is the "Order to Payment" process, which defines all the steps in which the customer orders the product and activates it. Look:
[Figure: Order to Payment process, complete step by step. Source: TM Forum official documentation]
We can see how many things the process does between the step where the customer accepts the proposal and the point where the service is ready to use and the invoice is received.
When business people say "I want to know how much we sold", they are really saying that they want to see:
- The process by which the company sells its products,
- The performance of the sales process,
- The quantities of products from the commercial offer that the company sold,
- At what point after accepting the proposal the customer decides to abandon the process.
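As a rough sketch of how these questions map onto process steps, the snippet below counts how many orders reached each step and where the unfinished ones stopped. The four step names are a simplified, hypothetical version of the process, and the orders are made-up sample data.

```python
from collections import Counter

# Hypothetical, simplified steps of the process, in order.
STEPS = ["proposal_accepted", "order_created", "service_activated", "invoice_received"]

# Made-up sample data: the last step each order reached.
orders = {
    "O-1": "invoice_received",
    "O-2": "order_created",
    "O-3": "service_activated",
    "O-4": "invoice_received",
    "O-5": "proposal_accepted",
}


def funnel(last_step_by_order: dict[str, str]) -> dict[str, int]:
    """Count the orders that reached each step (reaching a step implies all earlier ones)."""
    reached: Counter = Counter()
    for last in last_step_by_order.values():
        for step in STEPS[: STEPS.index(last) + 1]:
            reached[step] += 1
    return {step: reached[step] for step in STEPS}


def abandoned_at(last_step_by_order: dict[str, str]) -> dict[str, int]:
    """Count the orders that stopped before the final step, by the step where they stopped."""
    final = STEPS[-1]
    return dict(Counter(s for s in last_step_by_order.values() if s != final))


print(funnel(orders))
print(abandoned_at(orders))
```

The point of the sketch is that "how much we sold" and "where customers abandon" are both readings of the same process, not two unrelated reports.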
In our language, the business staff want to know about the "Order to Payment" process (in this example).
Each step of this process is related to a specific data entity/domain (Customer, Product, Resource, Service, Market/Sales).
So, once we have identified our process, we have to understand and map the relation of each process step to the corresponding data entity.
Look at the following image to see how the domains contain data entities, grouped with the same methodology as eTOM:
Each domain has entities, and each entity is composed of data objects. For example, in the Customer domain you will find all the data entities (ABEs) related to the customer (customer, customer order, customer bill, etc.).
Each ABE (a group of data entities) defines its entities and the objects they are composed of. In this case, the entity "customer" is a PartyRole composed of parties, which can be individuals or organizations:
This definition of each data entity not only tells the organization how each entity is composed; it also gives you one absolutely essential concept: A COMMON LANGUAGE ABOUT YOUR DATA, because in this example you are defining what a customer is and how it is composed.
So, once you have mapped all your data entities and processes, the work is not finished. You need to merge the three universes (data, applications and processes) to establish the relations between them, and only then think about how you will model your analytics environment.
Let's get our hands on some examples to make it clearer:
In this example I'll work with the Customer entity as the data entity and the "Order to Payment" process, to show the relation between processes, applications and data entities.
Customer data entities:
Here we have what it means when we talk about "Customer". Customer is a PartyRole composed of parties, and these parties can be organizations or individuals. Customer is a relationship between data entities, and when we talk about individuals we are not talking only about customers: an individual could also be a partner, an employee or another PartyRole.
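The Party and PartyRole idea described above can be sketched with a few classes. This is a simplified illustration, not the SID model itself: the attribute names (`party_id`, `name`, `role_name`) and the sample parties are assumptions.

```python
from dataclasses import dataclass


@dataclass
class Party:
    party_id: str
    name: str


@dataclass
class Individual(Party):
    """A party that is a person."""


@dataclass
class Organization(Party):
    """A party that is a company or institution."""


@dataclass
class PartyRole:
    party: Party
    role_name: str


@dataclass
class Customer(PartyRole):
    role_name: str = "customer"  # a customer is a party playing the customer role


ana = Individual("P-1", "Ana")
acme = Organization("P-2", "Acme")

# The same individual can play several roles at once.
roles = [Customer(ana), PartyRole(ana, "employee"), Customer(acme)]


def roles_of(party_id: str, all_roles: list[PartyRole]) -> list[str]:
    """Return the sorted role names played by one party."""
    return sorted(r.role_name for r in all_roles if r.party.party_id == party_id)


print(roles_of("P-1", roles))
```

Keeping the party separate from the role is what lets the analytical model recognize that a customer and an employee are the same person.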
Order to payment process:
The "Order to Payment" process (in this example) describes, step by step, how a customer request becomes a "ready to use" product. The steps are grouped by domain.
So, to define the correct strategy before starting to design the analytical data models and the technical requirements, we have to think about "what the business wants to know" and identify which business process in the company can answer each question.
Next, we have to work out, inside the business process, which data entities are involved in each step and what each step means.
We have to relate these two universes (data entities and business processes), and we'll get all the data entities we need to extract from the operational data sources to build the analytical model. We will also obtain the data entities managed by each of these processes together with their relationships and operations, and of course the entities that hold the answers the business needs.
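The path from a business question to the entities to extract can be sketched as two lookups. The step names and the entities assigned to each step are hypothetical simplifications used only to show the mechanics.

```python
# Hypothetical mapping: process -> ordered steps -> data entities used in each step.
PROCESS_STEPS = {
    "Order to Payment": [
        ("accept proposal", ["Customer", "ProductOffering"]),
        ("create order", ["Customer", "CustomerOrder", "Product"]),
        ("activate service", ["Service", "Resource"]),
        ("issue invoice", ["Customer", "CustomerBill"]),
    ],
}

# Business question -> the process that can answer it.
QUESTIONS = {
    "How much did we sell?": "Order to Payment",
}


def entities_for_question(question: str) -> list[str]:
    """Return the distinct entities needed to answer a question, in order of first use."""
    process = QUESTIONS[question]
    needed: list[str] = []
    for _step, entities in PROCESS_STEPS[process]:
        for entity in entities:
            if entity not in needed:
                needed.append(entity)
    return needed


print(entities_for_question("How much did we sell?"))
```

The resulting list is the starting scope for the analytical model: every entity in it has to be sourced from some operational system.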
- In yellow: business process steps.
- In orange: data entities.
If you think about your analytical platform in this way, you are thinking in terms of a strategy, with your action points centered on the business and the requirements it needs to solve.
You are thinking about your data models in relation to the business processes and not in relation to the reports the users happen to be building at the moment (a common mistake).
If you use a methodology in which the data lakes and the analytical models represent the company's business processes and the canonical representation of the data entities, your analytical platform will always be aligned with the business needs.
You can start to define your architecture and the models without anyone from the business in the room, because the business has already defined the business processes. So you need to understand them and start to think about:
- What do my users want to know?
- Which business processes exist in my company, and how can I match them to the first point?
- Which processes should I prioritize?
- Defining the data structures so that they represent the data entities and their relations, creating these entities in the canonical form of their definitions.
- Linkedin: https://www.linkedin.com/in/martingatto/
- Twitter: https://twitter.com/gattom83