[[TOC()]]
= Tango Clusters =

This embedded QOS is geared towards high efficiency, and makes a point of avoiding any heap activity in all but the one case where it becomes a necessity (cache hosting). It leverages both TCP/IP and multicast for transmission purposes, and disk files for queue storage. The embedded QOS is affectionately known as ''Tina'' and, to use it, an application would import ''tango.net.cluster.tina.Cluster''.
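
As a minimal sketch of that first step, the import and the subsequent join might look as follows; the ''join'' call shown here is an assumption drawn from the client examples later in this chapter rather than a confirmed signature:

{{{
import tango.net.cluster.tina.Cluster;

void main ()
{
        // join the cluster via the Tina QOS; the resulting handle is what
        // the queue, cache, task and bulletin helpers are constructed with
        // (assumed API)
        auto cluster = (new Cluster).join;
}
}}}
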
== Key Notions ==

As with any toolkit, there are a few key ideas to become familiar with in order to get the best out of it. There are only a few to deal with and we’ll address these in this section, beginning with the ''channel'':

=== Channel ===

Ever used a publish/subscribe system? If so, the notion of a channel will likely be familiar. It is a named entity through which messages are transported and delivered. Clustered applications subscribe (or listen) to a channel, and publish (or write) on a channel. Without a channel, there is no way in which to communicate with the cluster. You obtain one by asking the QOS to create one on your behalf, and then utilize it from that point forward. In reality, various utility classes exposed by the model will perform channel creation in the background for you. However, you are free to utilize a channel directly if the need arises.
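
For instance, asking the QOS for a channel directly might look like the sketch below; the ''createChannel'' call is our assumption about the creation method, and in practice the utility classes shown later handle this step for you:

{{{
import tango.net.cluster.tina.Cluster;

void main ()
{
        auto cluster = (new Cluster).join;

        // ask the QOS to create a named channel on our behalf (assumed call);
        // messages written here are seen only by listeners on the same name
        auto channel = cluster.createChannel ("employee.contact.information");
}
}}}
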
When a message is sent on a channel of a given name, only those listening on the same channel are candidates to receive that message. The channel name must be identical in both places for communication to occur: this provides the basis for segregating different types of messages for different purposes. In practice, we’ve found it highly convenient to use dot-notation for channel names – ''employee.contact.information'' for example – and to use channels to differentiate between different data types, or aggregates thereof. In fact, channels are a good way of representing a class in the D programming language – one channel for each distributed class – which nicely segregates differing content and notably simplifies the transmission of aggregate data across the cluster itself. This is a good point at which to segue into ''message'':

=== Message ===

Messages are the basis of all cluster content. When you send something to a queue it is in the form of a message. When retrieving content from a cache, you will receive a message. When executing a task within the cluster, it is represented by a message. When asynchronous multicast bulletins are distributed across the cluster, they are message instances. Everything in the cluster is a message.

Other than registration requirements, each message is a standard D class and operates in the normal fashion. It just has the additional abilities to appear and optionally ''behave'' upon other machines in the cluster.
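
A sketch of such a class is shown below; the ''NetworkMessage'' base class and module path are assumptions about where the cluster behaviour comes from, and the registration and serialization details are omitted:

{{{
import tango.net.cluster.NetworkMessage;   // module and class names assumed

// an ordinary D class which can also appear on other machines in the
// cluster; its fields travel with it when the message is transmitted
class ContactInfo : NetworkMessage
{
        char[] name;
        char[] email;
}
}}}
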
=== Queue ===

A queue is a stash of messages. Each queue is identified by its channel name, thus each channel being queued will have a distinct queue instance. Messages are placed into the queue(s) via channel activity and retrieved in a similar manner. These latter two operations represent synchronous activity. Alternatively, message consumers (channel subscribers) can listen asynchronously for message activity, and have messages fed to them when queue activity occurs. Both approaches have their utility and the choice is yours to make.

Queues are persistent; they survive power failures.

=== Cache ===

Cache hosts store messages in much the same way as an associative array, or hash table, does. Messages are isolated by channel name, and are addressable by a key value. In Tango, this key is an array of the char type (char[]).

Cache instances are intended to be transient only, thus the ''Tina'' implementation does not persist them.

=== Task ===

A task is an executable message, and executes outside of the invoking process. In general, it will appear on one of the available task servers (in the cluster) and execute there before returning to the caller with results. This is a synchronous execution model. For a decoupled execution model, the task can be sent to a queue and hosted there until a subscriber retrieves and executes it. Replies from the decoupled model would generally be sent back via another queue, in the same manner as generic queue messages are replied to (see Queue above).

Task messages are also distinct in that they should be ''registered'' with the cluster. This means that the task message is an integral part of each task server, such that it can be executed there. In practice, there are two principal options available: statically link the task messages into each task server, or dynamically distribute and link them into each task server. Please note that dynamic linking is not currently available to D on all platforms, so the default ''Tina'' implementation takes the former route for now – registering with a task server is a matter of an import and a method call.

=== Bulletin ===

A bulletin is a notification-style message sent to all cluster participants, leveraging the most efficient underlying mechanisms available. These messages are limited in size (generally less than 1KB maximum), and are intended to be simple and lightweight in nature. The Tina QOS uses bulletins for cluster discovery, queue activity notification, and cache coherence, and uses multicast as the distribution mechanism. When a bulletin is sent, ''all'' listeners on the same channel will receive it.

When a notification occurs, the arrival context and incoming message are made available via a parameter passed to the listener. In most notification cases, the arriving message is a single entity representing the notification itself. However, a queue notification will result in one or more queued messages being delivered.

== Client Usage ==

In this section we’ll take a look at how to use the cluster features through code examples. The first step is to import an appropriate cluster. For these examples we’ll be using the Tina QOS provided, but for other implementations one would import the relevant package instead. Note that we’ll focus on the client side here, and the server side in a following section.

=== Cache Client ===

In this example we show how to use the cluster as a distributed cache. There are a number of operations available, though the general idea is illustrated here. Note that we pass the command-line arguments to the join() methods: this configures the cache with the full set of valid cache instances available. Unlike other facilities, cache instances are not self-discovering.
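
The sketch below assumes a ''NetworkCache'' helper with put/get operations keyed by char[]; the class name, the ''EmptyMessage'' placeholder, and the exact ''join'' signature are assumptions rather than confirmed API:

{{{
import tango.net.cluster.NetworkCache;    // class and module names assumed
import tango.net.cluster.tina.Cluster;

void main (char[][] args)
{
        // args lists each valid cache instance as server:port, since
        // cache hosts are not self-discovering
        auto cluster = (new Cluster).join (args[1 .. $]);

        // a cache client bound to a single channel
        auto cache = new NetworkCache (cluster, "my.cache.channel");

        // add an entry under a char[] key, then retrieve it again
        cache.put ("some key", cache.EmptyMessage);
        auto msg = cache.get ("some key");
}
}}}
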
=== Bulletin Client ===

How to send and receive notifications across the cluster. These are sent to every listener on the specified broadcast channel. Take note that we create a callback function and pass it to the cluster as our bulletin consumer.
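
The general shape is sketched below, assuming a ''NetworkAlert'' helper; the class name, the ''createConsumer'' and ''broadcast'' calls, and the ''IEvent'' parameter type are all assumptions:

{{{
import tango.net.cluster.NetworkAlert;     // class and module names assumed
import tango.net.cluster.tina.Cluster;
import tango.net.cluster.model.ICluster;   // for IEvent (module path assumed)

void main ()
{
        auto cluster = (new Cluster).join;
        auto alert = new NetworkAlert (cluster, "my.alert.channel");

        // our callback, invoked for each bulletin arriving on the channel
        void notify (IEvent event)
        {
                auto msg = event.get;
        }

        // register the callback as our bulletin consumer, then broadcast
        // a placeholder bulletin to every listener (assumed helpers)
        alert.createConsumer (&notify);
        alert.broadcast (alert.EmptyMessage);
}
}}}
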
=== Queue Pull Client ===

How to set up and use a queue in synchronous mode. We just place something into our queue and retrieve it.
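
The sketch below assumes a ''NetworkQueue'' helper with ''put'', ''get'' and an ''EmptyMessage'' placeholder; these names are assumptions about the API rather than confirmed signatures:

{{{
import tango.net.cluster.NetworkQueue;     // class and module names assumed
import tango.net.cluster.tina.Cluster;

void main ()
{
        auto cluster = (new Cluster).join;
        auto queue = new NetworkQueue (cluster, "my.queue.channel");

        // synchronous usage: place a message in the queue ...
        queue.put (queue.EmptyMessage);

        // ... and pull it straight back out again
        auto msg = queue.get;
}
}}}
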
=== Queue Push Client ===

Illustrates how to set up and use a queue in asynchronous mode. We provide a listener delegate to the cluster, invoked when subscribed content arrives in a queue (from anywhere on the cluster).
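
Roughly as sketched below; the ''createConsumer'' registration and the ''IEvent'' callback parameter, like the other names here, are assumptions:

{{{
import tango.net.cluster.NetworkQueue;     // class and module names assumed
import tango.net.cluster.tina.Cluster;
import tango.net.cluster.model.ICluster;   // for IEvent (module path assumed)

void main ()
{
        auto cluster = (new Cluster).join;
        auto queue = new NetworkQueue (cluster, "my.queue.channel");

        // listener delegate, invoked on a separate thread whenever
        // content arrives in this queue anywhere on the cluster
        void listen (IEvent event)
        {
                auto msg = event.get;
        }

        queue.createConsumer (&listen);

        // push something in; the listener above will be handed it
        queue.put (queue.EmptyMessage);
}
}}}
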
=== Queue Reply Client ===

In this variation we queue a message in the cluster, receive it via a listener, reply to that message on a different channel and, finally, receive the reply. There are two listeners in this example.
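
A sketch with the two listeners follows; again, the ''NetworkQueue'' members and the ''IEvent'' callbacks are assumed names:

{{{
import tango.net.cluster.NetworkQueue;     // class and module names assumed
import tango.net.cluster.tina.Cluster;
import tango.net.cluster.model.ICluster;   // for IEvent (module path assumed)

void main ()
{
        auto cluster = (new Cluster).join;

        // one channel carries the original message, another the reply
        auto request = new NetworkQueue (cluster, "my.request.channel");
        auto reply   = new NetworkQueue (cluster, "my.reply.channel");

        // first listener: receives the queued message and replies to it
        void onRequest (IEvent event)
        {
                auto msg = event.get;
                reply.put (reply.EmptyMessage);
        }

        // second listener: receives the reply
        void onReply (IEvent event)
        {
                auto msg = event.get;
        }

        request.createConsumer (&onRequest);
        reply.createConsumer (&onReply);

        // kick things off by queuing the original message
        request.put (request.EmptyMessage);
}
}}}
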
=== Task Client ===

Cluster task execution generally comprises three participants. First we create the task itself, generally in a distinct module. In this case we're demonstrating the use of an ''expression'' task.
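
As a sketch only: the task base class, its ''execute'' hook, and the dispatch call below are assumptions standing in for the expression task referred to above, not the actual example:

{{{
import tango.net.cluster.NetworkTask;      // class and module names assumed
import tango.net.cluster.tina.Cluster;

// the task itself, typically kept in its own module so the task
// servers can register (link) it as well
class Multiply : NetworkTask
{
        int lhs, rhs, result;

        // runs on whichever task server hosts the message (assumed hook)
        void execute ()
        {
                result = lhs * rhs;
        }
}

void main ()
{
        auto cluster = (new Cluster).join;

        auto task = new Multiply;
        task.lhs = 6;
        task.rhs = 7;

        // synchronous model: ship the task to the cluster, wait for it to
        // execute remotely, and get the populated result back (assumed call)
        cluster.execute (task);
}
}}}
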
== Tina ==

Tina is the default QOS implementation, providing three distinct servers for handling each of queue, cache, and task requests. Source code is provided in the form of a toolkit, and one is expected to configure each server to specific needs. However, there are also example programs supplied, through which a working server can be constructed via a simple compilation. These examples are trivial front-ends to the server functionality, so there should be little difficulty in getting going; ''qserver.d'', for example, is little more than a thin wrapper that constructs and starts a queue server.
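
A rough sketch of the shape of such a front-end is shown below; the ''QueueServer'' class name and its construction are assumptions, so consult the supplied example for the real thing:

{{{
import tango.net.cluster.tina.QueueServer;   // class and module names assumed

void main (char[][] args)
{
        // parse the logging and port options described below (omitted here),
        // then construct the queue server and let it run
        auto server = new QueueServer;
        server.start;
}
}}}
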
Each of these servers has a set of command-line options for configuring the amount of log data emitted and the server port number. If neither is specified, an appropriate default will be set. All cluster examples reside in the ''tango/example/cluster'' folder, and the modules therein are referred to by name in the following discussion.

=== Queue Server ===

The queue is straightforward to configure: compile the example module ''qserver.d'' and start it up. Each queue is written to a (distinct) file in the directory where the server is started from. This means that two queue-server instances cannot be started from the same directory, since the queue files are not shared. To instantiate multiple queue-servers on a single machine, start them from different directories.

=== Cache Server ===

The cache is also straightforward to configure: compile the example module ''cserver.d'' and start it up. When using Tina, cache clients require a set of ''server:port'' combinations in order to identify the set of valid cache servers. This is needed due to the nature of the distribution algorithm in use, which requires knowledge of all servers. If, for example, not all cache instances were running when a client started, that client could see the cluster-wide cache differently than another client would. Thus, when each cache-server is started, make a note of the port selected or configure it on a specific port. This list should be provided to the cache-clients when they are started.

In general, it would be considered good practice to isolate each task, or group of tasks, into distinct modules – if for no other reason than maintenance and ease of isolation.

=== Logging ===

The servers in Tina all use the Tango logging subsystem to report activity. By default the content is logged to the console only, but by adjusting the server configuration one can direct the log to various other targets, including files and so on. Each server is provided with a logger instance by the hosting application, and this is where such configuration should take place (adding an ''appender'', etc). Please see the documentation on logging for further details.

== Tech Notes ==

These are programming concerns which may help you get the most out of the cluster toolkit.

=== Threads ===

Cluster listeners are asynchronous by nature, being processed on a separate thread from the main program. When a bulletin notification arrives (''push''), a delegate provided by the client is invoked with sufficient information to retrieve the incoming message(s).

It is up to the client to take appropriate measures so that correct action ensues when a notification arrives, given that the application is inherently multi-threaded at that point. We will likely add a module to convert these asynchronous notifications into events, once the event subsystem is put in place. In that case, all asynchronous notifications would effectively be converted into synchronous notifications instead.

=== Message Slicing ===

IO within Tina is multi-threaded. Rather than share a single set of IO buffers, each channel instance has its own set. This sidesteps any issues regarding thread contention and synchronization, and enables Tina to avoid heap allocation entirely for all network activity. This significantly reduces the memory footprint of your applications, avoids a common point of thread contention, removes clustering as a potential instigator of garbage collection, and generally limits the load placed upon the host computer.

This may become an issue where a client intends to store the message locally for a period of time, rather than process it immediately. The design trades off a large saving in GC pressure for the potential of some message ''cloning'' as and when necessary – the act of copying an incoming message such that it is no longer considered transient. The message class has a clone() method specifically for this purpose, and it should be used accordingly.
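
For example, a listener that intends to keep messages around might do something like the sketch below; the ''IEvent'' and ''IMessage'' names and their module path are assumed, while clone() is the method referred to above:

{{{
import tango.net.cluster.model.ICluster;   // for IEvent / IMessage (path assumed)

IMessage[] kept;    // messages the application intends to hold on to

void listen (IEvent event)
{
        // the incoming message is transient: it lives in IO buffers owned
        // by the channel and will be overwritten by later activity
        auto transient = event.get;

        // clone() copies the content out of those buffers, making the
        // message safe to keep beyond the scope of this notification
        kept ~= transient.clone;
}
}}}
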
=== Message Constraints ===

In order to successfully send a message it should generally be self-contained. That is, wherever a message is re-instantiated, its representation should not require the influence of any third party: it should support what's known as a default constructor.

Shipping and executing unregistered tasks on the cluster will result in a remote exception, returned to the caller. However, we expect to add a facility to install and register tasks dynamically, subject to potential security concerns.

=== Registration and Hosting ===

Upon receipt of each incoming message, a cluster client requires a class instance to ''host'' the content. In most cases, the host is selected from the message registry where all your application message types were previously enrolled. This is not required for task messages, since the outgoing message instance is used to host the result also. For other message types though, the host is required. Instead of depending upon the registry, an application may manually supply an appropriate host as part of a cluster request. This can be convenient in some advanced uses, especially where the channel name maps directly to a specific message type (a one-to-one mapping between the channel and a message class).

== Translations ==

 * [http://joyfire.spaces.live.com/blog/cns!502060A314B1A145!1601.entry Chinese]

== User Comments ==

[[EmbedReplies(DocComments,ChapterClustering)]]