Data Integration Best Practices – How to set the framework for data integration projects

Jacob Horbulyk Integration best practices, know-how

So far in our blog series on data integration best practices, we have covered the high-level and low-level problems that occur in data integration projects. We have also addressed the different types of integration, the systems that move data, and even the pricing aspect of such projects. Ten articles later, we have arrived at best practices for moving forward.

In this last chapter, we share some tips on preparing for and running an integration project. In this first article of the chapter, we look at how to describe integrations in such a way that everybody is on the same page: the IT folks, the business people and any other stakeholders who might be involved.

Describe Integrations Better

Describe business logic first, and then mechanics

There is a tendency to look at the available mechanics first, and then define the business logic in terms of the mechanics. However, a far better approach is to clearly describe the business logic and then look at the available mechanics to implement that business logic. 

For example, suppose we have a SOAP API that offers a number of actions. The “mechanics first” approach would be to document that we are going to connect action A to action B, and then action D to action C. This is easy for a developer to understand, but much harder for non-techies.

The “business logic first” approach, on the other hand, starts with what you want to happen. For instance, you want to update a contact in application X and then see this update automatically appear in application Y. Once you have defined that, you can check whether a particular API supports this use case.

Use verb + noun descriptions

When describing an interaction with a system, business requirements tend to focus on the noun rather than the verb. For instance, a business user might submit the request “I need Salesforce contacts”. However, this does not tell us what they actually want to achieve with Salesforce and contacts.

There are other problems with this approach, too. For one, when integration needs are described in the context of an API design, adding or changing nouns costs less than adding or changing verbs. In concrete terms, if we already have ‘read contact’, it is easier to create ‘read order’ than ‘write contact’. This has to do with how most APIs are designed: actions that share a verb tend to be built alike. Hence, ‘reading a contact’ and ‘reading an order’ are pretty similar, whereas ‘reading a contact’ and ‘writing a contact’ are not.

Furthermore, verbs indicate the direction of the data transfer and can better describe dependencies. That is why the most important part of the description is the verb (e.g. read, write), and only then the noun (e.g. contact, order).
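
To make the verb + noun idea concrete, here is a minimal sketch in Python. Everything in it is an assumption made for illustration: the names Verb, Interaction and data_leaves_system are hypothetical, and the two-verb vocabulary is just the "read" and "write" pair used as examples above.

```python
from dataclasses import dataclass
from enum import Enum


class Verb(Enum):
    # Assumed vocabulary; a real project would likely need more verbs.
    READ = "read"
    WRITE = "write"


@dataclass(frozen=True)
class Interaction:
    """One interaction with a system, stated verb first and noun second."""
    verb: Verb
    noun: str
    system: str

    def describe(self) -> str:
        return f"{self.verb.value} {self.noun} ({self.system})"


def data_leaves_system(interaction: Interaction) -> bool:
    """The verb carries the direction: a read takes data out of the system,
    a write puts data into it."""
    return interaction.verb is Verb.READ


# "I need Salesforce contacts" names only the noun. Spelled out with verbs,
# the same request might turn out to mean the following two interactions.
request = [
    Interaction(Verb.READ, "contact", "Salesforce"),
    Interaction(Verb.WRITE, "contact", "application Y"),
]

for step in request:
    print(step.describe())
```

Written this way, the direction of each transfer can be read straight off the verb, and a request that names only a noun cannot be recorded without someone first deciding which verb is meant.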

Categorize integrations

For each integration in your ecosystem, you should categorize it based on the following axes:

  • Type of integration. Is it shared authentication or integration between parts of a system? Is it event propagation or data synchronization?
  • Type of communication. Does the integration require the request-reply or the async communication pattern?
  • System moving the data. Which system is responsible for moving the data?
  • Interactions. Describe each interaction with a system using the verb + (if necessary) adjective + noun pattern.
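
One way to keep these axes consistent across a whole ecosystem is to record every integration in the same structure. The sketch below is only an illustration under stated assumptions: IntegrationDescription, its field names and the "contact sync" example are hypothetical, and the enum values simply mirror the options named in the list above.

```python
from dataclasses import dataclass, field
from enum import Enum


class IntegrationType(Enum):
    SHARED_AUTHENTICATION = "shared authentication"
    PARTS_OF_A_SYSTEM = "integration between parts of a system"
    EVENT_PROPAGATION = "event propagation"
    DATA_SYNCHRONIZATION = "data synchronization"


class Communication(Enum):
    REQUEST_REPLY = "request-reply"
    ASYNC = "async"


@dataclass
class IntegrationDescription:
    """One record per integration, with one field per axis."""
    name: str
    integration_type: IntegrationType
    communication: Communication
    mover: str  # the system that moves the data
    interactions: list[str] = field(default_factory=list)  # verb + noun phrases

    def summary(self) -> str:
        header = (
            f"{self.name} [{self.integration_type.value}, "
            f"{self.communication.value}, moved by {self.mover}]"
        )
        return "\n".join([header] + [f"  - {step}" for step in self.interactions])


# Hypothetical example: keeping contacts in two applications in sync.
contact_sync = IntegrationDescription(
    name="contact sync",
    integration_type=IntegrationType.DATA_SYNCHRONIZATION,
    communication=Communication.ASYNC,
    mover="integration layer",
    interactions=[
        "read updated contact from application X",
        "write contact to application Y",
    ],
)

print(contact_sync.summary())
```

A plain spreadsheet with the same columns would serve the same purpose. What matters is that every integration is described along the same axes, so that IT and business stakeholders can compare them.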

Formalize ID linking strategies

Last but not least, make it clear how the IDs of related records are linked. This is something we’ve covered in more depth in our earlier article on data duplication and ID linking.
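
As a rough illustration of what a formalized linking strategy can look like, the sketch below keeps a cross-reference table of record IDs. It is an assumption made for this example, not necessarily the strategy from the earlier article: the IdLinkTable class, its methods and the sample IDs are all hypothetical.

```python
from typing import Optional


class IdLinkTable:
    """Cross-reference table: which ID does a record have in another system?"""

    def __init__(self) -> None:
        # (system, record id) -> {other system: record id in that system}
        self._links: dict[tuple[str, str], dict[str, str]] = {}

    def link(self, system_a: str, id_a: str, system_b: str, id_b: str) -> None:
        """Record that two IDs refer to the same real-world record."""
        self._links.setdefault((system_a, id_a), {})[system_b] = id_b
        self._links.setdefault((system_b, id_b), {})[system_a] = id_a

    def lookup(self, system: str, record_id: str, target: str) -> Optional[str]:
        """Return the linked ID in the target system, or None if unlinked."""
        return self._links.get((system, record_id), {}).get(target)


links = IdLinkTable()
links.link("application X", "x-1001", "application Y", "y-77")

# A known link means "update the existing record"; None means "create one".
print(links.lookup("application X", "x-1001", "application Y"))
print(links.lookup("application X", "x-2002", "application Y"))
```

Whichever strategy you pick, writing it down in this explicit form makes it clear which system owns the mapping and what should happen when no link exists yet.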

Consider using an iPaaS / Integration Layer

It might sound like a very predictable recommendation coming from an iPaaS vendor, but let’s look at this objectively. The more systems you add to your IT infrastructure, the larger the company grows and the more sophisticated the business needs become, the more “interactions” take place between various systems, applications, databases and whatnot across the whole organization, and maybe even beyond it.

First it was, say, simply automating the order fulfillment. Then came synchronizing customer and purchase data between your online and physical shop. Next thing you know, you need the notorious 360° view of the customer; you have some intelligent algorithms running to meet your customers’ subconscious desires; and your network of partners is rapidly growing. All that needs consistent data exchange and synchronization.

As the number of systems and the complexity of the interactions between them increase, trying to manage them the old-fashioned way will quickly lead to bottlenecks in IT projects, hard-to-trace bugs and errors, unclean or inconsistent data, and so on.

Eventually, you will need some kind of integration layer, be it an iPaaS or any other comparable system, to keep track of integrations and integrated systems; to find and resolve errors quickly; to add new integrations within a reasonable timeframe; and to enable teams across the organization and beyond to work on the same data integration projects.

So, here’s the deal: You can regard this recommendation as coming not from an iPaaS vendor, but from a company that has accumulated extensive insights into the complexity and challenges of integration projects. In our next article, we will continue looking at best practices for the overall implementation of such projects.

Specifically, we will go through three types of project environments and which part of the project each type covers. We will also review some best practices for log collection. Stay tuned by following us on Twitter and LinkedIn.


About the Author

Jacob Horbulyk

Pre-sales & Professional Services Engineer at elastic.io. Casual board gamer, language learner, loves a good weekend road trip.

