Data Integration Best Practices – Data synchronization vs. other types of integration

Jacob Horbulyk Integration best practices, know-how

In the current chapter of our blog series on data integration best practices, we have already talked about different communication types for data integration – synchronous vs. asynchronous. We have also talked about the main difference between the types of systems that move data – direct data synchronization vs. an integration layer. In this installment of the series, we would like to take a step back and review some differences in what exactly we are going to integrate.

As an exception, we are going to use the word “integration” here in a broader, more high-level sense. Some systems are designed to interact over a fixed protocol, in which case we are talking about integration of permissions. Others are subsystems of a larger system, which means that we are dealing with integration in the sense of configuration. And there are yet other systems that, by design, have no way of integrating with each other.

Integration with a Shared Authentication Mechanism

In this situation, you typically have a system that solves a specific business problem, in other words a common business application such as Microsoft Dynamics AX or Navision. To use this system, users need authentication and authorization. Usually, you can configure such a system to keep track of its users with its own username/password mechanism.

However, these systems are also designed to exist within a corporate ecosystem and as a result, to connect to one or more single sign-on systems. Common protocols for connecting between individual applications and a sign-on system include:

  • LDAP (Active Directory is a directory service that can be accessed over LDAP)
  • OpenID / OpenID Connect (OpenID Connect is built on top of OAuth 2.0)
  • SAML

In general, the application should support at least one of these protocols out of the box. In most cases, it is not possible for an outside provider to add this ability if it doesn’t exist. For instance, imagine that all your business applications use LDAP, but you want to buy software that doesn’t support this protocol and uses OpenID instead.

In such a scenario, it will be nearly impossible, and also impractical, to try to map between the new application and the existing ones. So, you should find out in advance what protocols your desired application supports.

Generally, if properly configured and implemented, these protocols take care of error handling by design: each of them defines its own error responses, so you don’t have to build that part yourself.

Shared authentication mechanisms do, however, introduce a data synchronization problem. Systems that work with such mechanisms fetch information about their users from a single sign-on service. This means that an application receives a user’s information through the protocol when that user signs in. However, if the user is later changed or deleted in the sign-on service, the application won’t learn of that change unless we have set up a data integration flow to detect and apply it.
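To make this more tangible, here is a minimal sketch in Python of what such a flow might compute. It assumes that we can obtain a snapshot of the users known to the sign-on service and a snapshot of the users stored in the application, both keyed by a user ID. The names `User` and `plan_user_sync` are hypothetical and do not belong to any particular product or protocol library.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class User:
    # Hypothetical, simplified user record for illustration only.
    user_id: str
    email: str


def plan_user_sync(directory, local):
    """Compare the sign-on service's users with the application's copy.

    Both arguments are dicts that map a user ID to a User.
    Returns three lists: users to create, users to update, and
    IDs of users to delete in the application.
    """
    to_create = [u for uid, u in directory.items() if uid not in local]
    to_update = [
        u for uid, u in directory.items()
        if uid in local and local[uid] != u
    ]
    to_delete = [uid for uid in local if uid not in directory]
    return to_create, to_update, to_delete


# Assumed sample data: Bob's email changed and Carl was removed
# in the sign-on service, while Ann is new to the application.
directory = {
    "1": User("1", "ann@example.com"),
    "2": User("2", "bob@new.example.com"),
}
local = {
    "2": User("2", "bob@example.com"),
    "3": User("3", "carl@example.com"),
}

to_create, to_update, to_delete = plan_user_sync(directory, local)
```

A real flow would then apply these three lists through whatever interface the application offers for managing users. Comparing full snapshots is only one option; it is simple, but it becomes expensive for very large user bases.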

Integration between different parts of a system

In this case, you have one application or system which consists of smaller sub-systems – like in the picture below:

[Figure: Integration between different parts of a system]

By default and by design, the application is optimized to work with its subsystems connected exactly the way it “needs” them to be. Consequently, the subsystems themselves are responsible for connecting to each other and for making sure that this connectivity stays intact.

Considering this, it would make no sense for an outside provider to interfere with this structure for whatever reason. You know what they say – “Never change a running system”.

Event Propagation Between Systems

This scenario assumes that you have two different systems that work independently, likely from different vendors to solve two different problems. As a result, there is no “native” way for these systems to talk to each other. And so if an event occurs in one system, the other system needs to become aware of this event.

This scenario is similar to the data synchronization case described below, except that it is simpler and lower-level. Once an event has happened, the information about it doesn’t change. For example, once you have received an email, its contents stay the same.
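Because an event never changes after it has happened, propagating it can be as simple as passing it on exactly once. The following Python sketch illustrates the idea under a few assumptions: every event carries a unique ID, and the receiving system is represented by a plain callable. `Event` and `EventForwarder` are hypothetical names chosen for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)  # frozen: an event cannot change once created
class Event:
    event_id: str
    kind: str
    payload: str


class EventForwarder:
    """Passes each event to the other system at most once."""

    def __init__(self, deliver):
        self._deliver = deliver  # callable standing in for the target system
        self._seen = set()       # IDs of events already forwarded

    def forward(self, event):
        # Skip events that were already delivered, e.g. after a retry.
        if event.event_id in self._seen:
            return False
        self._deliver(event)
        self._seen.add(event.event_id)
        return True


received = []  # stands in for the second system's inbox
forwarder = EventForwarder(received.append)

email = Event("evt-1", "email_received", "Hello")
first = forwarder.forward(email)   # delivered
second = forwarder.forward(email)  # ignored as a duplicate
```

Note that there is no update logic at all here. Remembering which event IDs have been seen is enough, which is what makes this scenario simpler than synchronizing data. In a real flow, the set of seen IDs would have to be stored somewhere persistent.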

Data Synchronization Between Systems

Just like in the previous scenario, here we have two independent systems, likely from different vendors, that solve two different problems. And just like before, these systems don’t speak to each other by default.

Therefore, if some data appears in one system, the other system needs to learn about it. The same applies to any later updates to this data. For instance, the data could be the list of all customers who have bought a particular product. While buying a product is an event (see the section above), the up-to-date information about all these customers is data. Coming back to the topic of data integration: you will often care more about synchronizing data than about propagating events.
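Unlike an event, a data record can change after it has first been copied, so a synchronization flow has to handle updates as well as new records. Here is a minimal one-way sketch in Python, assuming that both systems identify a customer by the same ID and that records can be compared field by field. The function `sync_customers` and the sample data are hypothetical.

```python
def sync_customers(source, target):
    """Copy new or changed customer records from source into target.

    Both arguments are dicts that map a customer ID to a record (a dict).
    Returns the IDs of the records that were created or updated.
    """
    changed = []
    for customer_id, record in source.items():
        if target.get(customer_id) != record:
            # Store a copy so later edits in source don't leak into target.
            target[customer_id] = dict(record)
            changed.append(customer_id)
    return changed


crm = {"c1": {"name": "Ann", "city": "Bonn"}}  # system where data appears
shop = {}                                      # system that must learn of it

first_run = sync_customers(crm, shop)   # the new record is created
crm["c1"]["city"] = "Cologne"           # the customer moves
second_run = sync_customers(crm, shop)  # the update is applied
third_run = sync_customers(crm, shop)   # nothing left to do
```

This sketch deliberately ignores deletions, conflicts and changes flowing in the opposite direction. Those are exactly the aspects that make data synchronization harder than event propagation.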

The next article, which will also close the chapter “Different Types of Integration” of our blog series on data integration best practices, will be shorter than usual but no less important. We will talk about calculating the costs of building, operating and maintaining integration projects, which is something to think about when deciding for or against a third-party integration solution. Stay tuned by following us on Twitter and LinkedIn!


About the Author

Jacob Horbulyk

Pre-sales & Professional Services Engineer at elastic.io. Casual board gamer, language learner, loves a good weekend road trip.

