[[TOC()]]
= Tango Clusters =

This embedded QOS is geared towards high efficiency, and makes a point of avoiding any heap activity in all but the one case where it becomes a necessity (cache hosting). It leverages both TCP/IP and multicast for transmission purposes, and disk files for queue storage. The embedded QOS is affectionately known as ''Tina'' and, to use it, an application would import ''tango.net.cluster.tina.Cluster''.
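
As a minimal sketch of that first step, the import and the subsequent join might look as follows; the ''join'' call shown here is an assumption drawn from the client examples later in this chapter rather than a confirmed signature:

{{{
import tango.net.cluster.tina.Cluster;

void main ()
{
        // join the cluster via the Tina QOS; the resulting handle is what
        // the queue, cache, task and bulletin helpers are constructed with
        // (assumed API)
        auto cluster = (new Cluster).join;
}
}}}
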
== Key Notions ==

As with any toolkit, there are a few key ideas to become familiar with in order to get the best out of it. There are only a few to deal with and we’ll address these in this section, beginning with the ''channel'':

=== Channel ===

Ever used a publish/subscribe system? If so, the notion of a channel will likely be familiar. It is a named entity through which messages are transported and delivered. Clustered applications subscribe (or listen) to a channel, and publish (or write) on a channel. Without a channel, there is no way in which to communicate with the cluster. You obtain one by asking the QOS to create one on your behalf, and then utilize it from that point forward. In reality, various utility classes exposed by the model will perform channel creation in the background for you. However, you are free to utilize a channel directly if the need arises.
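
For instance, asking the QOS for a channel directly might look like the sketch below; the ''createChannel'' call is our assumption about the creation method, and in practice the utility classes shown later handle this step for you:

{{{
import tango.net.cluster.tina.Cluster;

void main ()
{
        auto cluster = (new Cluster).join;

        // ask the QOS to create a named channel on our behalf (assumed call);
        // messages written here are seen only by listeners on the same name
        auto channel = cluster.createChannel ("employee.contact.information");
}
}}}
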
When a message is sent on a channel of a given name, only those listening on the same channel are candidates to receive that message. The channel name must be identical in both places for communication to occur: this provides the basis for segregating different types of messages for different purposes. In practice, we’ve found it highly convenient to use dot-notation for channel names – ''employee.contact.information'' for example – and to use channels to differentiate between different data types, or aggregates thereof. In fact, channels are a good way of representing a class in the D programming language – one channel for each distributed class – which nicely segregates differing content and notably simplifies the transmission of aggregate data across the cluster itself. This is a good point at which to segue into ''message'':

=== Message ===

Messages are the basis of all cluster content. When you send something to a queue it is in the form of a message. When retrieving content from a cache, you will receive a message. When executing a task within the cluster, it is represented by a message. When asynchronous multicast bulletins are distributed across the cluster, they are message instances. Everything in the cluster is a message.

Other than registration requirements, each message is a standard D class and operates in the normal fashion. It just has the additional abilities to appear and optionally ''behave'' upon other machines in the cluster.
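
A sketch of such a class is shown below; the ''NetworkMessage'' base class and module path are assumptions about where the cluster behaviour comes from, and the registration and serialization details are omitted:

{{{
import tango.net.cluster.NetworkMessage;   // module and class names assumed

// an ordinary D class which can also appear on other machines in the
// cluster; its fields travel with it when the message is transmitted
class ContactInfo : NetworkMessage
{
        char[] name;
        char[] email;
}
}}}
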
=== Queue ===

A queue is a stash of messages. Each queue is identified by its channel name, thus each channel being queued will have a distinct queue instance. Messages are placed into the queue(s) via channel activity and retrieved in a similar manner. These latter two operations represent synchronous activity. Alternatively, message consumers (channel subscribers) can listen asynchronously for message activity, and have messages fed to them when queue activity occurs. Both approaches have their utility and the choice is yours to make.

Queues are persistent; they survive power failures.

=== Cache ===

Cache hosts store messages in much the same way as an associative array, or hash table, does. Messages are isolated by channel name, and are addressable by a key value. In Tango, this key is an array of the char type (char[]).

Cache instances are intended to be transient only, thus the ''Tina'' implementation does not persist them.

=== Task ===

A task is an executable message, and executes outside of the invoking process. In general, it will appear on one of the available task servers (in the cluster) and execute there before returning to the caller with results. This is a synchronous execution model. For a decoupled execution model, the task can be sent to a queue and hosted there until a subscriber retrieves and executes it. Replies from the decoupled model would generally be sent back via another queue, in the same manner as generic queue messages are replied to (see Queue above).

Task messages are also distinct in that they should be ''registered'' with the cluster. This means that the task message is an integral part of each task server, such that it can be executed there. In practice, there are two principal options available: statically link the task messages into each task server, or dynamically distribute and link them into each task server. Please note that dynamic linking is not currently available to D on all platforms, so the default ''Tina'' implementation takes the former route for now – registering with a task server is a matter of an import and a method call.

=== Bulletin ===

A bulletin is a notification-style message sent to all cluster participants, leveraging the most efficient underlying mechanisms available. These messages are limited in size (generally less than 1KB maximum), and are intended to be simple and lightweight in nature. The Tina QOS uses bulletins for cluster discovery, queue activity notification, and cache coherence, and uses multicast as the distribution mechanism. When a bulletin is sent, ''all'' listeners on the same channel will receive it.

When a notification occurs, the arrival context and incoming message are made available via a parameter passed to the listener. In most notification cases, the arriving message is a single entity representing the notification itself. However, a queue notification will result in one or more queued messages being delivered.

== Client Usage ==

In this section we’ll take a look at how to use the cluster features through code examples. The first step is to import an appropriate cluster. For these examples we’ll be using the Tina QOS provided, but for other implementations one would import the relevant package instead. Note that we’ll focus on the client side here, and the server side in a following section.

=== Cache Client ===

In this example we show how to use the cluster as a distributed cache. There are a number of operations available, though the general idea is illustrated here. Note that we pass the command-line arguments to the join() methods: this configures the cache with the full set of valid cache instances available. Unlike other facilities, cache instances are not self-discovering.
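
The sketch below assumes a ''NetworkCache'' helper with put/get operations keyed by char[]; the class name, the ''EmptyMessage'' placeholder, and the exact ''join'' signature are assumptions rather than confirmed API:

{{{
import tango.net.cluster.NetworkCache;    // class and module names assumed
import tango.net.cluster.tina.Cluster;

void main (char[][] args)
{
        // args lists each valid cache instance as server:port, since
        // cache hosts are not self-discovering
        auto cluster = (new Cluster).join (args[1 .. $]);

        // a cache client bound to a single channel
        auto cache = new NetworkCache (cluster, "my.cache.channel");

        // add an entry under a char[] key, then retrieve it again
        cache.put ("some key", cache.EmptyMessage);
        auto msg = cache.get ("some key");
}
}}}
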
=== Bulletin Client ===

How to send and receive notifications across the cluster. These are sent to every listener on the specified broadcast channel. Take note that we create a callback function and pass it to the cluster as our bulletin consumer.
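
The general shape is sketched below, assuming a ''NetworkAlert'' helper; the class name, the ''createConsumer'' and ''broadcast'' calls, and the ''IEvent'' parameter type are all assumptions:

{{{
import tango.net.cluster.NetworkAlert;     // class and module names assumed
import tango.net.cluster.tina.Cluster;
import tango.net.cluster.model.ICluster;   // for IEvent (module path assumed)

void main ()
{
        auto cluster = (new Cluster).join;
        auto alert = new NetworkAlert (cluster, "my.alert.channel");

        // our callback, invoked for each bulletin arriving on the channel
        void notify (IEvent event)
        {
                auto msg = event.get;
        }

        // register the callback as our bulletin consumer, then broadcast
        // a placeholder bulletin to every listener (assumed helpers)
        alert.createConsumer (&notify);
        alert.broadcast (alert.EmptyMessage);
}
}}}
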
=== Queue Pull Client ===

How to set up and use a queue in synchronous mode. We just place something into our queue and retrieve it.
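
The sketch below assumes a ''NetworkQueue'' helper with ''put'', ''get'' and an ''EmptyMessage'' placeholder; these names are assumptions about the API rather than confirmed signatures:

{{{
import tango.net.cluster.NetworkQueue;     // class and module names assumed
import tango.net.cluster.tina.Cluster;

void main ()
{
        auto cluster = (new Cluster).join;
        auto queue = new NetworkQueue (cluster, "my.queue.channel");

        // synchronous usage: place a message in the queue ...
        queue.put (queue.EmptyMessage);

        // ... and pull it straight back out again
        auto msg = queue.get;
}
}}}
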
=== Queue Push Client ===

Illustrates how to set up and use a queue in asynchronous mode. We provide a listener delegate to the cluster, invoked when subscribed content arrives in a queue (from anywhere on the cluster).
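
Roughly as sketched below; the ''createConsumer'' registration and the ''IEvent'' callback parameter, like the other names here, are assumptions:

{{{
import tango.net.cluster.NetworkQueue;     // class and module names assumed
import tango.net.cluster.tina.Cluster;
import tango.net.cluster.model.ICluster;   // for IEvent (module path assumed)

void main ()
{
        auto cluster = (new Cluster).join;
        auto queue = new NetworkQueue (cluster, "my.queue.channel");

        // listener delegate, invoked on a separate thread whenever
        // content arrives in this queue anywhere on the cluster
        void listen (IEvent event)
        {
                auto msg = event.get;
        }

        queue.createConsumer (&listen);

        // push something in; the listener above will be handed it
        queue.put (queue.EmptyMessage);
}
}}}
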
=== Queue Reply Client ===

In this variation we queue a message in the cluster, receive it via a listener, reply to that message on a different channel and, finally, receive the reply. There are two listeners in this example.
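
A sketch with the two listeners follows; again, the ''NetworkQueue'' members and the ''IEvent'' callbacks are assumed names:

{{{
import tango.net.cluster.NetworkQueue;     // class and module names assumed
import tango.net.cluster.tina.Cluster;
import tango.net.cluster.model.ICluster;   // for IEvent (module path assumed)

void main ()
{
        auto cluster = (new Cluster).join;

        // one channel carries the original message, another the reply
        auto request = new NetworkQueue (cluster, "my.request.channel");
        auto reply   = new NetworkQueue (cluster, "my.reply.channel");

        // first listener: receives the queued message and replies to it
        void onRequest (IEvent event)
        {
                auto msg = event.get;
                reply.put (reply.EmptyMessage);
        }

        // second listener: receives the reply
        void onReply (IEvent event)
        {
                auto msg = event.get;
        }

        request.createConsumer (&onRequest);
        reply.createConsumer (&onReply);

        // kick things off by queuing the original message
        request.put (request.EmptyMessage);
}
}}}
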
=== Task Client ===

Cluster task execution generally comprises three participants. First we create the task itself, generally in a distinct module. In this case we're demonstrating the use of an ''expression'' task.
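
As a sketch only: the task base class, its ''execute'' hook, and the dispatch call below are assumptions standing in for the expression task referred to above, not the actual example:

{{{
import tango.net.cluster.NetworkTask;      // class and module names assumed
import tango.net.cluster.tina.Cluster;

// the task itself, typically kept in its own module so the task
// servers can register (link) it as well
class Multiply : NetworkTask
{
        int lhs, rhs, result;

        // runs on whichever task server hosts the message (assumed hook)
        void execute ()
        {
                result = lhs * rhs;
        }
}

void main ()
{
        auto cluster = (new Cluster).join;

        auto task = new Multiply;
        task.lhs = 6;
        task.rhs = 7;

        // synchronous model: ship the task to the cluster, wait for it to
        // execute remotely, and get the populated result back (assumed call)
        cluster.execute (task);
}
}}}
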
== Tina ==

Tina is the default QOS implementation, providing three distinct servers for handling each of queue, cache, and task requests. Source code is provided in the form of a toolkit, and one is expected to configure each server to specific needs. However, there are also example programs supplied, through which a working server can be constructed via a simple compilation. These examples are trivial front-ends to the server functionality, so there should be little difficulty in getting going; ''qserver.d'', for example, is little more than a thin wrapper that constructs and starts a queue server.
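
A rough sketch of the shape of such a front-end is shown below; the ''QueueServer'' class name and its construction are assumptions, so consult the supplied example for the real thing:

{{{
import tango.net.cluster.tina.QueueServer;   // class and module names assumed

void main (char[][] args)
{
        // parse the logging and port options described below (omitted here),
        // then construct the queue server and let it run
        auto server = new QueueServer;
        server.start;
}
}}}
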
Each of these servers has a set of command-line options for configuring the amount of log data emitted and the server port number. If neither is specified, an appropriate default will be set. All cluster examples reside in the ''tango/example/cluster'' folder, and the modules therein are referred to by name in the following discussion.

=== Queue Server ===

The queue is straightforward to configure: compile the example module ''qserver.d'' and start it up. Each queue is written to a (distinct) file in the directory where the server is started from. This means that two queue-server instances cannot be started from the same directory, since the queue files are not shared. To instantiate multiple queue-servers on a single machine, start them from different directories.

=== Cache Server ===

The cache is also straightforward to configure: compile the example module ''cserver.d'' and start it up. When using Tina, cache clients require a set of ''server:port'' combinations in order to identify the set of valid cache servers. This is needed due to the nature of the distribution algorithm in use, which requires knowledge of all servers. If, for example, not all cache instances were running when a client started, that client could see the cluster-wide cache differently than another client would. Thus, when each cache-server is started, make a note of the port selected or configure it on a specific port. This list should be provided to the cache-clients when they are started.

In general, it would be considered good practice to isolate each task, or group of tasks, into distinct modules – if for no other reason than maintenance and ease of isolation.

=== Logging ===

The servers in Tina all use the Tango logging subsystem to report activity. By default the content is logged to the console only, but by adjusting the server configuration one can direct the log to various other targets, including files and so on. Each server is provided with a logger instance by the hosting application, and this is where such configuration should take place (adding an ''appender'', etc). Please see the documentation on logging for further details.

== Tech Notes ==

These are programming concerns which may help you get the most out of the cluster toolkit.

=== Threads ===

Cluster listeners are asynchronous by nature, being processed on a separate thread from the main program. When a bulletin notification arrives (''push''), a delegate provided by the client is invoked with sufficient information to retrieve the incoming message(s).

It is up to the client to take appropriate measures so that correct action ensues when a notification arrives, given that the application is inherently multi-threaded at that point. We will likely add a module to convert these asynchronous notifications into events, once the event subsystem is put in place. In that case, all asynchronous notifications would effectively be converted into synchronous notifications instead.

=== Message Slicing ===

IO within Tina is multi-threaded. Rather than share a single set of IO buffers, each channel instance has its own set. This sidesteps any issues regarding thread contention and synchronization, and enables Tina to avoid heap allocation entirely for all network activity. This significantly reduces the memory footprint of your applications, avoids a common point of thread contention, removes clustering as a potential instigator of garbage collection, and generally limits the load placed upon the host computer.

This may become an issue where a client intends to store the message locally for a period of time, rather than process it immediately. The design trades off a large saving in GC pressure for the potential of some message ''cloning'' as and when necessary – the act of copying an incoming message such that it is no longer considered transient. The message class has a clone() method specifically for this purpose, and it should be used accordingly.
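
For example, a listener that intends to keep messages around might do something like the sketch below; the ''IEvent'' and ''IMessage'' names and their module path are assumed, while clone() is the method referred to above:

{{{
import tango.net.cluster.model.ICluster;   // for IEvent / IMessage (path assumed)

IMessage[] kept;    // messages the application intends to hold on to

void listen (IEvent event)
{
        // the incoming message is transient: it lives in IO buffers owned
        // by the channel and will be overwritten by later activity
        auto transient = event.get;

        // clone() copies the content out of those buffers, making the
        // message safe to keep beyond the scope of this notification
        kept ~= transient.clone;
}
}}}
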
=== Message Constraints ===

In order to successfully send a message it should generally be self-contained. That is, wherever a message is re-instantiated, its representation should not require the influence of any third party: it should support what's known as a default constructor.

Shipping and executing unregistered tasks on the cluster will result in a remote exception, returned to the caller. However, we expect to add a facility to install and register tasks dynamically, subject to potential security concerns.

=== Registration and Hosting ===

Upon receipt of each incoming message, a cluster client requires a class instance to ''host'' the content. In most cases, the host is selected from the message registry where all your application message types were previously enrolled. This is not required for task messages, since the outgoing message instance is used to host the result also. For other message types though, the host is required. Instead of depending upon the registry, an application may manually supply an appropriate host as part of a cluster request. This can be convenient in some advanced uses, especially where the channel name maps directly to a specific message type (a one-to-one mapping between the channel and a message class).

== Translations ==

 * [http://joyfire.spaces.live.com/blog/cns!502060A314B1A145!1601.entry Chinese]

== User Comments ==

[[EmbedReplies(DocComments,ChapterClustering)]]