Idempotent update of projection query

On startup our application checks the projections running in EventStore. If the ones we need are missing, they are added. If they already exist, they are updated. If there are projections we don’t use, they are deleted. The issue is that we don’t have a way to check the query of an existing projection, so we always have to update it, even though that is unnecessary 99% of the time. And if 10 application instances start up at roughly the same time, we are not sure what the implications are of having the projection updated that many times “in parallel”. We seem to be getting “Multiple projections emitting to the same stream detected” errors even though that is not the case: we only have one running projection.

Any thoughts on how to manage projections, and are there good patterns for projection management?

Given the asynchronous nature of projections (they tail events as they are appended and project them using the code provided, either built-in or custom), there is less control over when administrative operations take effect, although the assumption is still that it is almost immediate. You could move the responsibility to a single process, separate from the application instances, which you deploy along with them and make sure runs only once, to completion, after each deployment. That would do away with the concurrency. I tend to call this an automation host: one that takes care of all the administrative operations I’d like to perform on a running cluster (think projections, persistent subscriptions, etc.).

With your browser’s developer tools open on the network tab, if you navigate to the web interface of your cluster, switch to the projections tab, and click on one of your custom projections, you will notice the following projection endpoint being hit at regular intervals:

http://localhost:2113/projection/name-of-your-projection-here/query?config=yes

The JSON response of that request contains the query as part of its payload. If you set config=no, you’ll get just the query (the JavaScript) as the response body. Combined with the API to list projections, that should enable you to perform conditional updates.
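A minimal sketch of that conditional check, in Python for brevity. It assumes the endpoint above is reachable without authentication (a real cluster will usually need credentials) and that comparing the queries after trimming whitespace and normalizing line endings is good enough. The function names and the `opener` parameter are hypothetical, introduced only for illustration.

```python
import urllib.request
from urllib.parse import quote


def fetch_projection_query(base_url, name, opener=urllib.request.urlopen):
    """Return the current query text of a projection.

    Uses the /projection/{name}/query?config=no endpoint described above.
    `opener` is injectable so the function can be exercised without a cluster.
    """
    url = f"{base_url}/projection/{quote(name, safe='')}/query?config=no"
    with opener(url) as response:
        return response.read().decode("utf-8")


def normalize(query):
    """Ignore line-ending differences and surrounding whitespace (an assumption)."""
    return query.replace("\r\n", "\n").strip()


def needs_update(current_query, desired_query):
    """True only when the stored query really differs from the one we want."""
    return normalize(current_query) != normalize(desired_query)
```

On startup you would call `fetch_projection_query("http://localhost:2113", "name-of-your-projection-here")` and only issue the update when `needs_update(...)` returns True.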

Rickard, managing projections is something I’ve been thinking about for a while. How relevant do you think it is for this to be part of the application code? Wouldn’t it be better to manage custom projections using DevOps tooling, pretty much like database migrations?

@alexey.zimarev there are a few scenarios to cover, so let’s look at them.

First, I need to be able to run tests during development. I do this using JUnit and TestContainers, so ES is started as a Docker container, then my test code runs. If the projection configuration and setup is included in my code it just runs as part of the startup, and after that I know that the config is correct for the code I’m running. If there’s a separate tool this becomes much more complex.

Second, when I am running my application locally during development I often wipe the databases (including ES) and start up my application. I think of databases essentially as persistent storage, whether it’s a file directory, ES, or some other db. So by running “schema scripts” on startup the application can be sure that, after startup, the persistent storage is in the state it needs to be in given the assumptions of the application. Database migrations can also be done on startup, see LiquiBase for example. If there are major changes such that many application instances are accessing the same ES with very different assumptions, that is when you would do a blue/green migration instead, so that’s a separate situation.

For development, staging, and production environments, I again treat ES and other databases as persistent storage, and it is the application that decides what the schema/assumptions should be. Since the assumptions are encoded in the application, it makes sense that the application enforces them on startup. That makes it easy to test older versions, newer versions, patches, etc. without having to worry about syncing schema management with a separate tool. To me that separation just creates unnecessary accidental complexity.

All in all, personally I have always preferred to look at persistent storage as an internal implementation detail of an application, which it therefore manages as per above. It makes life easier.

In this case it is very close to being doable. What is missing is either idempotent updates, where updating a projection to the same query is a no-op, or a way for the client to see what the query currently is and decide based on that, since in some cases we have 6+ app instances connected to the same ES cluster. Whichever of these instances comes up first should do the setup, and for the others it should be a no-op or skipped.
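To make the intended startup behaviour concrete, here is a rough sketch of the decision logic only, in Python for brevity. It assumes the client can obtain a name-to-query mapping of the existing custom projections (system projections excluded) and compares it with the set the application wants. The function name and the plan layout are hypothetical, not part of any EventStore API.

```python
def plan_projection_changes(existing, desired):
    """Decide what to do for each projection.

    existing: dict of projection name -> query currently on the cluster
              (assumed to contain custom projections only)
    desired:  dict of projection name -> query the application needs
    """
    plan = {"create": [], "update": [], "skip": [], "delete": []}
    for name in sorted(desired):
        if name not in existing:
            plan["create"].append(name)   # missing, so add it
        elif existing[name].strip() != desired[name].strip():
            plan["update"].append(name)   # query changed, so update it
        else:
            plan["skip"].append(name)     # identical query, nothing to do
    for name in sorted(existing):
        if name not in desired:
            plan["delete"].append(name)   # no longer used, so delete it
    return plan
```

With this, the first instance to come up would create or update what is needed, and every later instance would end up with everything under "skip".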

Makes sense?