Scalability is the property of a system to handle a growing amount of load by adding resources.
Article Link: https://blog.algomaster.io/p/scalability
How to Scale a System
Vertical Scaling (Scale Up)
Adding more power to existing machines by upgrading the server with more RAM, faster CPUs, or additional storage.
A good approach for simpler architectures, but hardware limits and cost cap how far you can go.
Horizontal Scaling (Scale Out)
Adding more machines to your system to spread the workload across multiple servers.
Generally considered the most effective way to scale large systems, since capacity grows with the number of machines.
Load Balancing
Load balancing is the process of distributing incoming traffic across multiple servers so that no single server becomes overwhelmed.
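A minimal sketch of the round-robin strategy, one of the simplest load-balancing algorithms; the server addresses below are hypothetical placeholders, not from the article.

```python
from itertools import cycle

# Hypothetical pool of backend servers (placeholder addresses).
SERVERS = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
_pool = cycle(SERVERS)

def next_server() -> str:
    """Round robin: hand requests to each server in turn."""
    return next(_pool)

for request_id in range(5):
    print(f"request {request_id} -> {next_server()}")
```

Real load balancers (NGINX, HAProxy, AWS ELB) also offer strategies such as least-connections and IP hashing.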
Caching
Caching is a technique to store frequently accessed data in memory (e.g., RAM) to reduce load on the server or database.
Implementing caching can dramatically improve response times.
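One common way to implement this is the cache-aside pattern: check the cache first and fall back to the database on a miss. A minimal sketch, using a plain dict as a stand-in for a real cache like Redis or Memcached; fetch_from_db is a hypothetical slow lookup.

```python
import time

cache: dict = {}      # in-memory store standing in for Redis/Memcached
TTL_SECONDS = 60      # assumed time-to-live before an entry goes stale

def fetch_from_db(key: str) -> str:
    """Hypothetical slow database lookup."""
    time.sleep(0.1)
    return f"value-for-{key}"

def get(key: str) -> str:
    """Cache-aside: serve from memory on a hit, load and cache on a miss."""
    entry = cache.get(key)
    if entry is not None and time.time() - entry[1] < TTL_SECONDS:
        return entry[0]                    # cache hit
    value = fetch_from_db(key)             # cache miss: go to the source
    cache[key] = (value, time.time())      # store for subsequent reads
    return value
```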
Content Delivery Networks (CDNs)
A CDN distributes static assets (images, videos, etc.) closer to users. This can reduce latency and result in faster load times.
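In application code, using a CDN often comes down to serving static assets from the CDN's domain instead of the origin server. A minimal sketch; cdn.example.com is a hypothetical host.

```python
CDN_BASE = "https://cdn.example.com"   # hypothetical CDN domain

def asset_url(path: str) -> str:
    """Rewrite a static asset path so it is fetched from the CDN edge."""
    return f"{CDN_BASE}/{path.lstrip('/')}"

print(asset_url("/images/logo.png"))   # https://cdn.example.com/images/logo.png
```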
Sharding/Partitioning
Partitioning means splitting data or functionality across multiple nodes/servers to distribute workload and avoid bottlenecks.
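A common approach is hash-based partitioning, where a key is hashed to pick a shard so the same key always lands on the same node. A minimal sketch with an assumed fixed shard count:

```python
import hashlib

NUM_SHARDS = 4   # assumed fixed number of shards for illustration

def shard_for(key: str) -> int:
    """Hash-based partitioning: map a key deterministically to a shard."""
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % NUM_SHARDS

for user_id in ["alice", "bob", "carol"]:
    print(f"{user_id} -> shard {shard_for(user_id)}")
```

Note that simple modulo hashing forces most keys to move when the shard count changes; consistent hashing is the usual remedy.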
Asynchronous Communication
Asynchronous communication means deferring long-running or non-critical tasks to background queues or message brokers.
This ensures the main application remains responsive to users.
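A minimal sketch of the pattern using Python's standard-library queue and a background thread; the queue stands in for a real message broker such as RabbitMQ or Kafka, and the email task is hypothetical.

```python
import queue
import threading
import time

tasks: queue.Queue = queue.Queue()   # stand-in for a message broker

def worker() -> None:
    """Background worker: drains the queue so the request path never waits."""
    while True:
        job = tasks.get()
        time.sleep(1)                # simulate a slow task (e.g., sending email)
        print(f"processed {job}")
        tasks.task_done()

threading.Thread(target=worker, daemon=True).start()

# The request handler enqueues the work and returns immediately.
tasks.put("send-welcome-email:user-42")
print("request handled, task deferred")
tasks.join()   # only so this demo waits for the worker before exiting
```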
Microservices Architecture
Microservices architecture breaks an application down into smaller, independent services, each of which can be deployed and scaled on its own.
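The defining trait is that services communicate over the network rather than sharing code or a database. A minimal sketch of one service calling another over HTTP; the service name and endpoint are hypothetical.

```python
import json
from urllib.request import urlopen

# Hypothetical internal endpoint owned by a separate "users" service.
USERS_SERVICE = "http://users-service.internal:8080"

def get_user(user_id: str) -> dict:
    """The orders service asks the users service over HTTP instead of
    importing its code or reading its database directly."""
    with urlopen(f"{USERS_SERVICE}/users/{user_id}") as resp:
        return json.load(resp)
```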
Auto-Scaling
Auto-scaling means automatically adjusting the number of active servers based on the current load.
This ensures the system can handle traffic spikes without manual intervention.
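Cloud auto-scalers typically follow a target-tracking rule: keep average utilization near a target by resizing the fleet proportionally. A minimal sketch of that policy with assumed thresholds:

```python
def desired_instances(current: int, cpu_utilization: float,
                      target: float = 0.6, min_n: int = 2, max_n: int = 20) -> int:
    """Target tracking: size the fleet so average CPU approaches `target`,
    clamped to assumed minimum and maximum fleet sizes."""
    desired = round(current * cpu_utilization / target)
    return max(min_n, min(max_n, desired))

print(desired_instances(current=4, cpu_utilization=0.9))   # -> 6, scale out
print(desired_instances(current=4, cpu_utilization=0.3))   # -> 2, scale in
```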
Multi-region Deployment
Deploying the application across multiple data centers or cloud regions reduces latency for far-away users and improves redundancy.
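Routing users to the closest region is usually handled by latency-based DNS (e.g., AWS Route 53), but the underlying decision is simple. A minimal sketch with made-up latency measurements:

```python
# Hypothetical measured round-trip times (ms) from one client to each region.
REGION_LATENCY_MS = {"us-east-1": 12, "eu-west-1": 85, "ap-south-1": 210}

def nearest_region(latencies: dict[str, int]) -> str:
    """Latency-based routing: send the user to the lowest-latency region."""
    return min(latencies, key=latencies.get)

print(nearest_region(REGION_LATENCY_MS))   # -> us-east-1
```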