Replies: 2 comments
-
I use a few connections whose initial syncs often run very long: Facebook Marketing and Amazon Seller Central. Because the source APIs are unreliable, a longer sync frequently fails partway through. The ability to flush data to the final tables and check the incremental sync progress (based on date) is paramount.
-
This feature is very important to us; we really need it for initial syncs. Our environment runs in Kubernetes, and we have to provide a very large amount of resources to all pods. Sometimes a sync takes 3-5 days, and all of the data is only saved at once at the end. We are using the Postgres destination.
-
When we were developing Destinations V2, we initially thought the community would want the ability to load data incrementally into the final tables, so that longer-running syncs would have some data available even before the sync had finished. After releasing Destinations V2, we realized that incrementally loading this data can actually slow down the sync and can increase your data warehouse costs if you're using a managed service. We made the decision to disable this option by default and are considering removing it completely, as it is not popular.
This also impacts how we're thinking about refreshing data. For various reasons, we occasionally need to "refresh" the data from a stream by overwriting what is in the final table with a fresh copy. There are various failure scenarios that can lead to an empty final table, which we don't think is the right experience; instead, we believe that stale data is better than no data. Incremental loading adds complexity to this situation, which is why (in addition to it being an unpopular config option) we're considering removing it completely.
Is loading data incrementally to your final table important to you?