It’s not. Imagine a web app that stores your user information in a session store, mapped by your cookie-provided session ID. Your web app searches redis 1 for the session id, but since that key is on redis 2, the lookup fails and the application thinks there is no such session, and rejects the request.
Now you could solve this specific case by sharding by prefix, or by querying all instances, but then you still do not have high availability: if the instance a specific session is on is down, these users cannot authenticate. At that point you’re better off with a single instance.
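To make that concrete, here is a minimal sketch in Python, with plain dicts standing in for the two Redis instances. The helper names (`shard_index`, `put`, `get`) are made up for illustration, and CRC32 sharding is just one assumed way to map keys to instances, not what any particular client does.

```python
import zlib


def shard_index(key, instance_count):
    """Deterministically map a key to one instance (simple hash sharding)."""
    return zlib.crc32(key.encode()) % instance_count


def put(instances, key, value):
    """Store the key on the single instance that owns it."""
    instances[shard_index(key, len(instances))][key] = value


def get(instances, key):
    """Look the key up on its owning instance; None means 'no such session'."""
    store = instances[shard_index(key, len(instances))]
    if store is None:
        # The owning instance is down and nobody else has a copy.
        return None
    return store.get(key)


# Naive setup: the session lives on redis 2, but the app only asks redis 1.
redis_1, redis_2 = {}, {}
redis_2["session:abc123"] = "user=alice"
naive_result = redis_1.get("session:abc123")  # None, so the request is rejected

# Sharded setup: lookups work until the owning instance goes away.
instances = [{}, {}]
put(instances, "session:abc123", "user=alice")
before_outage = get(instances, "session:abc123")
instances[shard_index("session:abc123", 2)] = None  # owning instance dies
after_outage = get(instances, "session:abc123")
```

Sharding fixes the "wrong instance" lookup, but `after_outage` still comes back empty: the key had exactly one home.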
But that is his point.
If you cannot find the session ID in Redis, you log in again.
If your Redis server crashes, you start a new one and everyone just logs in again. No data is lost.
No two processes can guarantee data consistency unless they use shared memory with some kind of locking on update. And given that two servers don't share memory, two processes running on these servers cannot guarantee consistency either.
To put it in simple terms...
App writes to node-A, node-A (/process on node-A) crashes before change is synced from node-A to node-B, data is lost.
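A toy simulation of that sequence, as a sketch only: the `Node` class and the `write_async`/`sync` helpers are invented for illustration and do not model how Redis replication is actually implemented.

```python
class Node:
    def __init__(self, name):
        self.name = name
        self.data = {}
        self.pending = []  # writes accepted locally but not yet sent to the replica


def write_async(primary, key, value):
    """Accept the write and acknowledge it before it has been replicated."""
    primary.data[key] = value
    primary.pending.append((key, value))
    return "OK"


def sync(primary, replica):
    """Ship all pending writes to the replica."""
    for key, value in primary.pending:
        replica.data[key] = value
    primary.pending.clear()


node_a = Node("node-A")
node_b = Node("node-B")

write_async(node_a, "k1", "v1")
sync(node_a, node_b)                   # k1 reaches node-B
ack = write_async(node_a, "k2", "v2")  # the app is told "OK"...
# ...and node-A crashes here, before sync() runs again.
surviving_data = node_b.data           # k2 was acknowledged but is gone
```

The app saw `OK` for both writes, yet only `k1` exists on the surviving node.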
This is true for Redis and true for PostgreSQL/MySQL or any similar database. The difference between Redis and a "database" is that a database protects against this problem by writing the change to durable storage before telling the app that the write was successful. Redis
First up, if I wanted to talk to a machine, I would've asked one myself.
Then, I don't understand your point really: Yes, the CAP theorem is a thing. There are compromise solutions available however to enable highly available data storage. Some of them for Redis too, but they are more complicated than those for other database engines. Which is the point of this discussion.
Point is... with AOF and RDB enabled, and the WAIT command used in a sane manner, one can get reasonable consistency, at the cost of a significant speed tradeoff and an increase in application complexity. So if a consistent cache is needed, one can have that with some compromises, but then one could probably use a database straight away.
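Roughly what that looks like from the application side, as a hedged sketch: it assumes a client shaped like redis-py's `redis.Redis`, i.e. with `set(key, value)` and `wait(num_replicas, timeout)` (timeout in milliseconds, returning the number of replicas that acknowledged). The `durable_set` helper and its defaults are hypothetical. On the server side this would go together with AOF settings such as `appendonly yes` and `appendfsync always` in redis.conf.

```python
def durable_set(client, key, value, min_replicas=1, timeout_ms=100):
    """Write a key, then block until enough replicas have acknowledged it.

    `client` is assumed to behave like redis-py's `redis.Redis`.
    Raises RuntimeError if fewer than `min_replicas` replicas confirm in time.
    """
    client.set(key, value)
    # WAIT blocks until the replicas have acknowledged previous writes
    # or the timeout expires, and reports how many replicas did so.
    acked = client.wait(min_replicas, timeout_ms)
    if acked < min_replicas:
        # The write exists on the primary but may be lost on failover;
        # the application has to decide what to do about that.
        raise RuntimeError(
            f"only {acked} of {min_replicas} replicas acknowledged {key!r}"
        )
    return acked
```

Every write now pays at least one extra round trip to a replica, and the error path is the application's problem, which is the speed and complexity tradeoff mentioned above.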
Again: Redis is a database, not just a cache - it doesn’t care if you store ephemeral cache artifacts or customer records within it. Redis doesn’t pose any constraints on the type of data. Inversely, you can use Postgres as a semi or fully consistent cache.
And yes, what you’re saying is technically correct, even a well-tuned single node doesn’t solve the availability problem: if it goes down, you have an outage. To avoid that, you need multiple instances to provide the same data, avoiding downtime if one of the nodes breaks eventually.
> doesn’t pose any constraints on the type of data.
By that logic, a raw disk is also a database. One just needs to add block-level replication to other nodes to build a replicated/HA database.
We may not agree, but anything that does not provide transactions across multiple logically related read/update operations is not a database.
> multiple instances to provide the same data
is easy and done by a bunch of software out there, but
> multiple instances to provide the same data on non-shared memory computers, with consistency
is a really hard problem, and no one has been able to solve it yet without introducing other problems for architects/developers to consider (giving up on fast performance being one of the most visible ones).
This discussion is a bit weird. We started off from the claim that Redis should have better availability guarantees, specifically to avoid the degradation of service you described.
But that requires running multiple instances, which in turn requires sharing the data across all replicas.
These two concerns are not mutually exclusive; the kind of database or the data stored within it doesn't give any availability guarantees on its own. Even a single Postgres instance, which I suppose fits your understanding of a real database, is a single point of failure and not a highly available setup: if your database server goes down, clients get errors and the database is thus unavailable.
> The app would look up in both databases. If it exists in any, there would be a session.
And if you find the session with differing values in both databases, how do you know which one is up-to-date?
You need an algorithm to pick which data is right, such as electing a master instance.
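A small sketch of why "look in both" is not enough. The function and store names here are hypothetical, the stores are plain dicts, and the election itself is not modelled: `master` simply stands for whatever instance an election (or an operator) has designated.

```python
def read_session(stores, session_id, master=None):
    """Read a session from several independent stores.

    `stores` maps an instance name to a dict. If the instances disagree,
    only a designated `master` can arbitrate; without one, the caller
    has no way to tell which value is current.
    """
    found = {
        name: store[session_id]
        for name, store in stores.items()
        if session_id in store
    }
    if not found:
        return None  # no such session anywhere
    if len(set(found.values())) == 1:
        return next(iter(found.values()))  # all copies agree
    if master is None or master not in found:
        raise LookupError("conflicting session values and no master to arbitrate")
    return found[master]


stores = {
    "redis-1": {"session:abc123": "cart=1"},
    "redis-2": {"session:abc123": "cart=2"},
}
resolved = read_session(stores, "session:abc123", master="redis-2")
```

Without the `master` argument the same call raises `LookupError`, which is exactly the situation described above.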
And that brings us back to the original discussion: to manage sessions (unlike caches) in a highly available way, you need to set up HA (or reimplement it, which obviously is a bad idea). You can't read round-robin from multiple non-HA instances.
For example if you use it for session storage, you can't have your application read from a random instance that may or may not contain the session.