$all stream does not exist? FRAMING ERROR?

jayme.davis · May 21, 2025, 7:35pm

Hi team.

We are on a very old version of EventStore (4.1.1). We will be upgrading soon with your team. However, we have an outage that can’t really be explained:

We have two different instances of EventStore running. Instance 1 works fine. You can go to the UI and request the $all stream, and see all of the events.

However, instance 2 has not worked for the past few days. If you go to the UI and request the $all stream, it says “Could not open stream $all. This usually means the stream does not exist or you do not have permission to view it”.

The logs of instance 2 look like this (and have every day since it started):
[PID:04860:038 2025.05.17 18:38:58.031 ERROR LengthPrefixMessageF] FRAMING ERROR! Data:
[PID:04860:038 2025.05.17 18:38:58.031 ERROR LengthPrefixMessageF] 000000: 03 00 00 2F 2A E0 00 00 00 00 00 43 6F 6F 6B 69 | .../*......Cooki
000016: 65 3A 20 6D 73 74 73 68 61 73 68 3D 41 64 6D 69 | e: mstshash=Admi
000032: 6E 69 73 74 72 0D 0A 01 00 08 00 03 00 00 00 | nistr..........

There are all sorts of other log entries with weird messages, such as:
[PID:00108:032 2025.05.21 09:38:06.105 ERROR LengthPrefixMessageF] FRAMING ERROR! Data:
[PID:00108:032 2025.05.21 09:38:06.105 ERROR LengthPrefixMessageF] 000000: 66 6F 78 20 61 20 31 20 2D 31 20 66 6F 78 20 68 | fox a 1 -1 fox h
000016: 65 6C 6C 6F 0A 7B 0A 66 6F 78 2E 76 65 72 73 69 | ello.{.fox.versi
000032: 6F 6E 3D 73 3A 31 2E 30 0A 69 64 3D 69 3A 31 0A | on=s:1.0.id=i:1.
000048: 7D 3B 3B 0A | };;.

thoughts?
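
The ASCII column of the two dumps above can be reproduced from the hex bytes alone, which helps when the viewer mangles unprintable characters. The following is a minimal Python sketch; the helper names `parse_dump_line` and `render_ascii` are made up for illustration, and the dump lines are copied from the first log entry with the log prefix removed. One hedged observation: `Cookie: mstshash=` is the text an RDP client sends in its connection request, so these bytes may be unrelated traffic arriving on the TCP port rather than data read from disk. That is an inference from the dump, not something confirmed in this thread.

```python
def parse_dump_line(line: str) -> bytes:
    """Return the raw bytes from one 'OFFSET: HH HH ... | ascii' dump line."""
    hex_part = line.split("|", 1)[0]          # drop the ASCII column, if present
    hex_part = hex_part.rsplit(": ", 1)[-1]   # drop the offset (and any log prefix)
    return bytes.fromhex(hex_part)            # fromhex ignores the spaces


def render_ascii(data: bytes) -> str:
    """Render bytes the way a hex dump does: printable ASCII as-is, the rest as '.'."""
    return "".join(chr(b) if 32 <= b < 127 else "." for b in data)


# Hex lines copied from the first FRAMING ERROR entry in the post.
FIRST_DUMP = [
    "000000: 03 00 00 2F 2A E0 00 00 00 00 00 43 6F 6F 6B 69",
    "000016: 65 3A 20 6D 73 74 73 68 61 73 68 3D 41 64 6D 69",
    "000032: 6E 69 73 74 72 0D 0A 01 00 08 00 03 00 00 00",
]

payload = b"".join(parse_dump_line(line) for line in FIRST_DUMP)
print(len(payload), "bytes")
print(render_ascii(payload))
```

Running the same helpers over the second dump gives the `fox a 1 -1 fox hello` text shown in its ASCII column.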

yves.lorphelin · May 22, 2025, 10:45am

That looks like a corrupted file.

Are instance 1 and instance 2 from the same cluster?

jayme.davis · May 22, 2025, 4:01pm

No, they are completely separate clusters. Is there a mechanism to resolve the corrupted file, or to locate the corrupted chunk so I can try to replace it from a backup?

yves.lorphelin · May 23, 2025, 3:39pm

The easiest is to restore that specific node from a backup.

Let us know how that goes.

jayme.davis · May 23, 2025, 3:58pm

Unfortunately, our app uses a separate SQL database that tracks the event numbers. If we restored the entire thing, I worry there could be clashes, with things being overwritten.

Is there any way to find out which block is faulty? Perhaps do a partial backup restore? Is that not a thing?

Edit: I will at least look into this. Thank you so much!
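
One generic way to look into which files differ from a backup is to hash the files in both directories and compare. The sketch below is only that: plain file comparison in Python, not an EventStore tool. The function names and the `chunk-*` glob pattern are assumptions for illustration. A mismatch does not prove corruption, since the newest chunk is still being written to and a scavenge rewrites older chunks, and nothing in this thread confirms that swapping individual chunk files is supported.

```python
import hashlib
from pathlib import Path
from typing import List


def sha256_of(path: Path) -> str:
    """Hash a file in 1 MiB blocks so large chunk files are not loaded into memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for block in iter(lambda: handle.read(1 << 20), b""):
            digest.update(block)
    return digest.hexdigest()


def differing_chunks(live_dir: Path, backup_dir: Path,
                     pattern: str = "chunk-*") -> List[str]:
    """Names of files that exist in both directories but have different contents.

    The 'chunk-*' pattern is an assumption about how the data files are named.
    """
    differing = []
    for backup_file in sorted(backup_dir.glob(pattern)):
        live_file = live_dir / backup_file.name
        if live_file.is_file() and sha256_of(live_file) != sha256_of(backup_file):
            differing.append(backup_file.name)
    return differing
```

Calling `differing_chunks(Path(live), Path(backup))` with the two directory paths returns a sorted list of file names to inspect further. Run it against a copy of the data directory rather than the live one.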

yves.lorphelin · May 24, 2025, 12:13pm

If it’s a cluster, just restore that node; it will catch up from the other nodes.