Distributed processes in software architecture

Even if a process crashes abruptly, it should preserve all the data for which it has already acknowledged successful storage to the user. Depending on the access patterns, different storage engines use different storage structures, ranging from a simple hash map to sophisticated graph storage.

Because flushing data to disk is one of the most time-consuming operations, not every insert or update to the storage can be flushed to disk immediately. So most databases maintain in-memory storage structures that are only periodically flushed to disk. This poses a risk of losing the unflushed data if the process crashes abruptly.

A technique called Write-Ahead Log is used to tackle this situation. The server stores each state change as a command in an append-only file on disk. Appending to a file is generally a very fast operation, so it can be done without impacting performance. A single log, appended to sequentially, is used to store each update. At server startup, the log can be replayed to rebuild the in-memory state.
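
To make the idea concrete, here is a minimal sketch of a write-ahead log in Python. The class names, the JSON record format, and the key-value store built on top of it are assumptions made for this illustration, not the design of any particular database.

    import json
    import os

    class WriteAheadLog:
        """Append-only log of state-changing commands, one JSON record per line."""

        def __init__(self, path):
            self.path = path
            self.file = open(path, "a")

        def append(self, command):
            # Append the command and force it to disk before acknowledging the write.
            self.file.write(json.dumps(command) + "\n")
            self.file.flush()
            os.fsync(self.file.fileno())

        def replay(self):
            # At startup, read every command back to rebuild the in-memory state.
            with open(self.path) as f:
                for line in f:
                    yield json.loads(line)

    class KVStore:
        def __init__(self, wal_path="wal.log"):
            self.wal = WriteAheadLog(wal_path)   # opening in append mode creates the file if needed
            self.data = {}
            for cmd in self.wal.replay():        # rebuild the in-memory state from the log
                self.data[cmd["key"]] = cmd["value"]

        def put(self, key, value):
            self.wal.append({"key": key, "value": value})   # make the change durable first
            self.data[key] = value                          # then update the in-memory structure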

This gives a durability guarantee: the data will not be lost even if the server crashes abruptly and then restarts. But clients will not be able to get or store any data until the server is back up, so we lose availability in the case of server failure. One obvious solution is to store the data on multiple servers, by replicating the write-ahead log onto multiple servers.

When multiple servers are involved, there are many more failure scenarios to consider. Network delays can vary based on the load on the network. For example, a 1 Gbps network link can get flooded when a big data job is triggered, filling the network buffers and causing arbitrary delays for some messages to reach some servers.

In a typical data center, servers are packed together in racks, and multiple racks are connected by top-of-the-rack switches. There might be a tree of switches connecting one part of the data center to the other. It is possible, in some cases, that one set of servers can communicate with each other but is disconnected from another set of servers. This situation is called a network partition.

One of the fundamental issues with servers communicating over a network, then, is knowing when a particular server has failed. There are two problems: first, a server cannot wait indefinitely to find out whether another server has crashed; second, two sets of servers must not each consider the other to have failed and continue serving different sets of clients. To tackle the first problem, every server sends a HeartBeat message to the other servers at a regular interval.

If a heartbeat is missed, the server sending the heartbeat is considered crashed. The heartbeat interval is kept small enough to make sure that it does not take long to detect a server failure. As we will see below, in the worst case the server might actually be up and running, but the cluster as a group can move ahead considering it to have failed. This makes sure that the services provided to clients are not interrupted.
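
A failure detector driven by heartbeats can be sketched roughly as follows; the ten-second timeout and the class and method names are arbitrary choices for this example, and in practice the heartbeat interval is kept much smaller than the timeout so that a single delayed message does not mark a healthy server as failed.

    import time

    class FailureDetector:
        """Tracks the last heartbeat received from each server and reports
        servers whose heartbeats have been missing for longer than the timeout."""

        def __init__(self, timeout_seconds=10.0):
            self.timeout = timeout_seconds
            self.last_heartbeat = {}

        def heartbeat_received(self, server_id):
            # Called whenever a HeartBeat message arrives from server_id.
            self.last_heartbeat[server_id] = time.monotonic()

        def failed_servers(self):
            # Any server not heard from within the timeout is treated as failed.
            now = time.monotonic()
            return [server for server, last in self.last_heartbeat.items()
                    if now - last > self.timeout]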

The second problem is split brain: if two sets of servers accept updates independently, different clients can get and set different data, and once the split brain is resolved, it is impossible to resolve the conflicts automatically.

To take care of the split brain issue, we must ensure that two sets of servers which are disconnected from each other cannot make progress independently. To ensure this, every action the server takes is considered successful only if a majority of the servers can confirm it. If the servers cannot get a majority, they will not be able to provide the required services, and some groups of clients might not receive the service, but the servers in the cluster will always remain in a consistent state.

The number of servers making up this majority is called a Quorum. How do we decide on the quorum? That is decided based on the number of failures the cluster can tolerate: if we have a cluster of five nodes, we need a quorum of three. A quorum makes sure that we have enough copies of the data to survive some server failures. But it is not enough to give strong consistency guarantees to clients. Let's say a client initiates a write operation on the quorum, but the write operation succeeds on only one server.

The other servers in the quorum still have the old value. When a client reads the value from the quorum, it might get the latest value, if the server holding the latest value is available. But it can just as well get an old value if, at the moment the client starts reading, the server with the latest value is unavailable. To avoid such situations, someone needs to track whether the quorum agrees on a particular operation, and only send clients values that are guaranteed to be available on all the servers.
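
The majority rule described above is easy to state in code; the helper functions below are purely illustrative.

    def quorum_size(cluster_size):
        # A quorum is a simple majority: 2 of 3, 3 of 5, 4 of 7, and so on.
        return cluster_size // 2 + 1

    def tolerated_failures(cluster_size):
        # The cluster keeps working as long as a quorum of servers survives.
        return cluster_size - quorum_size(cluster_size)

    assert quorum_size(5) == 3
    assert tolerated_failures(5) == 2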

The Leader and Followers pattern is used in this situation. One of the servers is elected leader, and the other servers act as followers. The leader controls and coordinates the replication on the followers. The leader then needs to decide which changes should be made visible to clients. A High-Water Mark is used to track the entry in the write-ahead log that is known to have been successfully replicated to a quorum of followers. All the entries up to the high-water mark are made visible to clients.
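
The leader's bookkeeping for the high-water mark might look roughly like the sketch below; the class and method names are assumptions for this example, and real systems implement the same idea with considerably more machinery.

    class ReplicatedLog:
        """Leader-side tracking of the high-water mark (illustrative sketch)."""

        def __init__(self, follower_ids):
            self.entries = []                          # write-ahead log entries, indexed from 1
            self.acked = {f: 0 for f in follower_ids}  # highest index acknowledged by each follower
            self.high_water_mark = 0                   # highest index known to be on a quorum

        def append(self, entry):
            self.entries.append(entry)
            return len(self.entries)                   # index of the new entry

        def on_follower_ack(self, follower_id, index):
            # A follower confirms it has persisted entries up to this index.
            self.acked[follower_id] = max(self.acked[follower_id], index)
            self._advance_high_water_mark()

        def _advance_high_water_mark(self):
            cluster_size = len(self.acked) + 1         # the followers plus the leader itself
            quorum = cluster_size // 2 + 1
            for index in range(len(self.entries), self.high_water_mark, -1):
                # Count the leader's own copy plus every follower that has acked this index.
                replicas = 1 + sum(1 for acked in self.acked.values() if acked >= index)
                if replicas >= quorum:
                    self.high_water_mark = index
                    break

        def visible_entries(self):
            # Clients are only ever shown entries up to the high-water mark.
            return self.entries[:self.high_water_mark]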

The leader also propagates the high-water mark to the followers, so that if the leader fails and one of the followers becomes the new leader, there are no inconsistencies in what a client sees. Even with quorums and leaders and followers, there is a tricky problem that still needs to be solved.

Leader processes can pause arbitrarily. There are many reasons a process can pause; in languages which support garbage collection, for example, there can be a long garbage collection pause. A leader with a long garbage collection pause can be disconnected from its followers and will resume sending messages to them after the pause is over.

Meanwhile, because the followers did not receive a heartbeat from the leader, they might have elected a new leader and accepted updates from clients. If the requests from the old leader are processed as is, they might overwrite some of those updates. So we need a mechanism to detect requests from out-of-date leaders.

Here a Generation Clock is used to mark and detect requests from older leaders. The generation is a number which increases monotonically. The problem of telling older leader messages from newer ones is a problem of maintaining an ordering of messages. It might appear that we can use system timestamps to order a set of messages, but we cannot. The main reason is that system clocks across servers are not guaranteed to be synchronized.
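
Since wall-clock time cannot provide that ordering, the generation number does: it is incremented on every leader election and stamped on every request, and followers reject anything carrying a generation older than the one they have already seen. A rough sketch, with illustrative names:

    class Follower:
        """Rejects replication requests stamped with an older generation (illustrative sketch)."""

        def __init__(self):
            self.current_generation = 0
            self.log = []

        def on_new_leader(self, generation):
            # Every leader election establishes a new, higher generation number.
            self.current_generation = max(self.current_generation, generation)

        def on_replication_request(self, generation, entry):
            if generation < self.current_generation:
                # Request from an out-of-date leader (for example one that was
                # paused for garbage collection) - reject it.
                return False
            self.current_generation = generation
            self.log.append((generation, entry))
            return True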

A time-of-day clock in a computer is managed by a quartz crystal and measures time based on the oscillations of the crystal. This mechanism is error prone, as the crystals can oscillate faster or slower, and so different servers can have very different times.

UML serves as a standard for software requirement analysis and design documents, which are the basis for developing software. UML can be described as a general-purpose visual modeling language to visualize, specify, construct, and document a software system.

Although UML is generally used to model software systems, it is not limited to that. It is also used to model non-software systems, such as process flows in a manufacturing unit. UML elements are like components which can be associated in different ways to make a complete UML picture, known as a diagram.

So, it is very important to understand the different diagrams in order to apply them to real-life systems. There are two broad categories of diagrams, which are further divided into sub-categories: Structural Diagrams and Behavioral Diagrams.

Structural diagrams represent the static aspects of a system. These static aspects are the parts which form the main structure of the system and are therefore stable. These static parts are represented by classes, interfaces, objects, components and nodes. An object diagram, for example, represents a set of objects and their relationships at runtime, and also represents a static view of the system.

A package diagram describes the package structure and organization; it covers the classes in a package and the packages within another package. A deployment diagram shows a set of nodes and their relationships; these nodes are the physical entities on which the components are deployed.

Behavioral diagrams capture the dynamic aspects of a system. A use case diagram describes the relationships among the functionalities and their internal or external controllers; these controllers are known as actors. An activity diagram describes the flow of control in a system; it consists of activities and links, and the flow can be sequential, concurrent, or branched. A state machine diagram represents the event-driven state changes of a system.

Parallel Processing
There used to be a distinction between parallel computing and distributed systems.

Distributed artificial intelligence
Distributed Artificial Intelligence is a way to use large-scale computing power and parallel processing to learn and process very large data sets using multi-agents.

Distributed System Architecture
Distributed systems must have a network that connects all components (machines, hardware, or software) together so they can transfer messages to communicate with each other.

That network could be connected by IP addresses, by cables, or even on a circuit board. The messages passed between machines contain the data that the systems want to share, like databases, objects, and files. Distributed systems were created out of necessity, as services and applications needed to scale and new machines needed to be added and managed. In the design of distributed systems, the major trade-off to consider is complexity vs. performance.

Types of Distributed System Architectures
Distributed applications and processes typically use one of the four architecture types below.

Client-server: In the early days, distributed systems architecture consisted of a server as a shared resource like a printer, database, or web server. Today, distributed systems architecture has evolved with web applications into the following types.

Three-tier: In this architecture, the clients no longer need to be intelligent and can rely on a middle tier to do the processing and decision making.

Most of the first web applications fall under this category. The middle tier could be called an agent that receives requests from clients, may be stateless, processes the data, and then forwards it on to the servers.

Multi-tier: Enterprise web services first created n-tier or multi-tier systems architectures. This popularized application servers that contain the business logic and interact with both the data tier and the presentation tier.

Peer-to-peer: There is no centralized or special machine that does the heavy lifting and intelligent work in this architecture.

All the decision making and responsibilities are split up amongst the machines involved, and each could take on client or server roles. Blockchain is a good example of this.

Pros and Cons of Distributed Systems
Advantages of Distributed Systems: The ultimate goal of a distributed system is to enable the scalability, performance and high availability of applications.

Major benefits include:
Unlimited Horizontal Scaling - machines can be added whenever required.
Low Latency - having machines geographically located closer to users reduces the time it takes to serve them.
Fault Tolerance - if one server or data centre goes down, the others can still serve the users of the service.

Disadvantages of Distributed Systems: Every engineering decision has trade-offs.


