Updated Design: msg.parts description of usage (markdown)

2017-09-13 15:00:41 +09:00 · 2017-09-13 15:00:41 +09:00 · 3378ec20c8
parent 7c68a9bf4b
commit 3378ec20c8
1 changed files with 77 additions and 19 deletions
--- a/Design:-msg.parts---description-of-usage.md
+++ b/Design:-msg.parts---description-of-usage.md
@@ -5,23 +5,23 @@ It has been pointed out that there are many clever things that could be done in
 ### Existing `msg.parts` object
 The following sub-properties exist

-name | type | mandatory | description
------------ | ------------- | ------------ | ------------
-type | string | yes | Either  `string`, `buffer`, `array`, or  `object`
-id | string | yes | unique id for this set of parts (for example based on original _msgid), to allow for interleaved messages to be rejoined.
-index | number | yes | an index of the number of parts starting from 0
-count | number | no | total number of parts in the set. Typically known for simple splits, unknown for most streams, though a count may be added to files once reading is complete.
-ch | string/array | no | the data used to split/join the message. Either a String, for example `"\n"`, or an Array of bytes: `[1,2,3]`, `["0x0a","0x0d"]`. (Only present if the message was split using this value; not present for fixed-length splits).
-key | string | no | the key used as the property name within the joined object. (Only present if object was split)
-len | number | no | split length of array, string or buffer. (Only present if array, string or buffer was split into fixed lengths) 
-parts | object | no | push down stack of previous parts objects to allow nested splits to be handled.
+| name  | type         | mandatory | description                              |
+| ----- | ------------ | --------- | ---------------------------------------- |
+| type  | string       | yes       | Either  `string`, `buffer`, `array`, or  `object` |
+| id    | string       | yes       | unique id for this set of parts (for example based on original _msgid), to allow for interleaved messages to be rejoined. |
+| index | number       | yes       | an index of the number of parts starting from 0 |
+| count | number       | no        | total number of parts in the set. Typically known for simple splits, unknown for most streams, though a count may be added to files once reading is complete. |
+| ch    | string/array | no        | the data used to split/join the message. Either a String, for example `"\n"`, or an Array of bytes: `[1,2,3]`, `["0x0a","0x0d"]`. (Only present if the message was split using this value; not present for fixed-length splits). |
+| key   | string       | no        | the key used as the property name within the joined object. (Only present if object was split) |
+| len   | number       | no        | split length of array, string or buffer. (Only present if array, string or buffer was split into fixed lengths) |
+| parts | object       | no        | push down stack of previous parts objects to allow nested splits to be handled. |


 #### Current Join behaviour
 The msg is currently considered complete if

 -  `msg.parts.index` >= `msg.parts.count` 
-  - however if msg.parts.count == 0 or is undefined then no output is sent.
+  -  however if msg.parts.count == 0 or is undefined then no output is sent.
 -  `msg.complete` exists (is set to any value)
 -  after an optional timeout from the first part to arrive 
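
The completion rules listed above can be sketched as a single predicate. This is only a literal transcription of the list, not the join node's actual code: `isComplete` and the `timedOut` flag are hypothetical names, and the `index >= count` comparison is copied as written (with zero-based indices the real node may compare differently).

```javascript
// Sketch only: mirrors the completion rules as listed in the document.
// `isComplete` and `timedOut` are hypothetical, not taken from the join node.
function isComplete(msg, timedOut) {
    // Rule: msg.complete exists (set to any value).
    if (Object.prototype.hasOwnProperty.call(msg, "complete")) {
        return true;
    }
    // Rule: the optional timeout since the first part has expired.
    if (timedOut) {
        return true;
    }
    const parts = msg.parts || {};
    // Sub-rule: a count of 0 or undefined never completes via the index test.
    if (!parts.count) {
        return false;
    }
    // Rule: index >= count, exactly as written in the list above.
    return parts.index >= parts.count;
}
```

Because every rule is evaluated per message, setting `msg.complete` or changing `msg.parts.count` on any intermediate part changes the outcome, which is the manipulation described in the following paragraphs.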

@@ -33,16 +33,16 @@ Having popped any pushed msg.parts.parts back up to msg.parts the completed `msg
 What the current behaviour allows is that the join can be manipulated by changing msg.parts.count (or msg.complete) on any of the intermediate parts. For example the join can be ended early by saying it is now complete - or extra items inserted by just adding to the count so the existing join doesn't complete.

 However, while this makes sense for strings and objects, the consequences for arrays need some thought, as currently we try to preserve order in the array. So what does removing or inserting mean in this context - or rather, what should the behaviour actually be?
- 
+
 ### Proposal: optional list-like representation 
 The `msg.parts` property is an adequate tool for list-like operations within a Node-RED flow.  For example, filtering messages out of a message stream or inserting messages into one is useful when handling related messages.  However, changing the number of messages in a stream requires storing all the messages in the stream and then updating `msg.parts.count` and `msg.parts.index`.  For short streams this should not be a problem.  For long streams, however, keeping and changing all the messages may cause problems such as increased memory usage or message delay.  

 So, as an optional representation of `msg.parts`,
 we propose the following linked-list representation:

-name | type | mandatory | description
------------ | ------------- | ------------ | ------------
-next | string | yes | `msg.id` of next message
+| name | type   | mandatory | description              |
+| ---- | ------ | --------- | ------------------------ |
+| next | string | yes       | `msg.id` of next message |

 Either `parts.count` and `parts.index`, or `parts.next` can be used at the same time.
 With the list-like representation, insertion or deletion of messages can be implemented while keeping a smaller number of messages.
@@ -51,8 +51,8 @@ Selection of old array-like representation or list-like representation should be

 **DCJ comment** - looking at the list of proposed nodes - none currently would need this idea in order to be implemented successfully. Typically nodes will have to work in one of two ways.

- 1. message sequence - one at a time. As each message arrives it is processed - the parts can then be manipulated - eg insert extra by incrementing index and count, etc. or
- 2. message group - chunk at a time. The whole group needs to have arrived before being sorted / sliced etc - at which point the parts can be re-generated before sending on the burst.
+    1. message sequence - one at a time. As each message arrives it is processed - the parts can then be manipulated - eg insert extra by incrementing index and count, etc. or
+    2. message group - chunk at a time. The whole group needs to have arrived before being sorted / sliced etc - at which point the parts can be re-generated before sending on the burst.

 ---
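
The "one at a time" mode in the DCJ comment can be illustrated with a rough sketch that inserts an extra message by shifting `index` and `count` as each message passes through. Everything here is hypothetical: `createInserter`, `shouldInsertAfter` and `makeExtraPayload` are made-up names, and the sketch assumes that whatever consumes the sequence takes `count` from the later messages, since parts already sent cannot be corrected.

```javascript
// Hypothetical sketch: process a sequence one message at a time and insert an
// extra message after every message for which shouldInsertAfter(msg) is true.
// Assumption: a downstream node uses the count carried by later messages.
function createInserter(shouldInsertAfter, makeExtraPayload) {
    const offsets = new Map(); // parts.id -> number of extras inserted so far

    return function onMessage(msg) {
        const { id, index, count } = msg.parts;
        const before = offsets.get(id) || 0;
        const insert = shouldInsertAfter(msg);
        const after = before + (insert ? 1 : 0);
        const hasCount = typeof count === "number";

        // Copy a message and give it shifted parts, leaving the input untouched.
        const withParts = (base, newIndex) => {
            const parts = Object.assign({}, msg.parts, { index: newIndex });
            if (hasCount) {
                parts.count = count + after;
            }
            return Object.assign({}, base, { parts });
        };

        const out = [withParts(msg, index + before)];
        if (insert) {
            out.push(withParts({ payload: makeExtraPayload(msg) }, index + before + 1));
        }

        // Forget the sequence once its original last message has passed.
        if (hasCount && index === count - 1) {
            offsets.delete(id);
        } else {
            offsets.set(id, after);
        }
        return out;
    };
}
```

Only one counter per `parts.id` is kept, so memory use does not grow with the length of the sequence; the trade-off is that messages sent before an insertion still carry the old `count`.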

@@ -71,8 +71,11 @@ Messages grouping mode can be selected from:

 What happens when no messages arrive in "interval" mode ? Options:

- 1. send a blank msg, with the `parts` specified correctly as count of zero.
- 2. don't send anything.
+    1. send a blank msg, with the `parts` specified correctly as count of zero.
+    2. don't send anything.
+
+   - (HN)  (1) seems better, because it is convenient to have a triggering message when processing messages over a certain period.  As you pointed out offline, the functionality of the chunk node is similar to that of the split node.  Integrating the chunk functionality into the split node seems preferable.
+

 ### Filter node

@@ -89,12 +92,19 @@ If the condition evaluates to true for an input message, the message is sent to
 **Behaviour/Implementation thoughts**

 - Should it have two outputs ? one for things that match and one for things that don't ? Would save having two nodes if both types of message are required. (DCJ - I think yes)
+   - (HN) Having multiple outputs is a good idea.  It may be better to be able to specify multiple filters and corresponding outputs, similar to the switch node.
 - Is javascript the right expression language to use (given we already have jsonata etc - is another one a good idea ?)
+    - (HN) Considering that JSONata is already used in core nodes, I think using JSONata as the expression language is better.
 - Tail mode - is end only triggered by comparison to `count` ? (DCJ - I think yes.)
+    - (HN) In the current prototype implementation, yes.
 - Head mode - if n messages don't arrive for that group - drop any currently accumulated ? send part msg ? eg if we ask for 10 - and a group of 5 arrive... does it get sent ? or dropped ? (DCJ - I think send)
+    - (HN) The current implementation waits forever for messages to arrive.  But sending the message is better, as you pointed out.  Whether the number of messages sent should be 5 or 10 (5 with empty payload?) seems open to discussion.  10 seems better to me, because we can then expect a fixed sequence size from the output of the filter node.
 - Apart from head and tail modes - this is similar to the switch node - should that be enhanced to handle parts instead ? (DCJ - I think no - but ?)
+    - (HN) If we allow multiple outputs, as pointed out in the 1st bullet, this node becomes very similar to the switch node.  So I think enhancing the switch node to handle the `parts` property is better.
 - In non head and tail mode this node could work on messages without `parts` also. Should that be supported ? (DCJ - I think yes)
+    - (HN) I also think yes.  
 - If any filter removes the "last" message in an incoming group - then no final msg with a "correct" `parts` will be left to be sent (ie `count` won't be set) - so we will need to send a blank msg with `parts.count` set correctly in this case. (as well as maybe sending to the second output if that port exists) 
+    - (HN) A similar situation also arises on addition/removal of *any* message in the incoming group.  But the filter node cannot distinguish between message delay and deletion without correct `parts`.  (Am I misunderstanding the comment?)

 ### Sort node

@@ -110,8 +120,11 @@ Sort key can be `payload` or any JavaScript expression for sorting messages, ele
 **Behaviour/Implementation**

 - If "last" message doesn't arrive - just drop the group (when next group start to arrive) ?
+    - (HN) Depending on the flow, the 1st message of the next group may arrive before the last message of the previous group.  So we have to wait for the "last" message.
 - In order to sort correctly the whole sequence of messages needs to have arrived - so could (in theory) be a very large memory overhead, plus then once sorted all messages will be resent in correct order as a large burst. Is this acceptable ? The new parts properties can be regenerated just prior to sending.
+    - (HN) Yes.  Memory overhead and burst message sending may be problems for the sort node (and also for other nodes that need to fix up the `parts` property).  I have no good idea for this other than mentioning it in the info tab of the nodes.
 - should the expression be javascript ? or jsonata ? or ...
+    - (HN) As with the filter node, I think using JSONata is better.

 ### Merge node

@@ -128,8 +141,11 @@ This is merging messages without using `parts` - and creates a msg without `part
 **Implementation**

 - What happens if msg arrive out of order ? 
+    - (HN) The current prototype merges messages in order of arrival.  Taking the `parts` property into account and creating the output message from the corresponding messages seems better.
 - or too many of one topic arrive before the correct number of another ? drop first ? drop last ?
+     - (HN) It waits for the required messages to arrive.  Other messages are kept internally until ready.
 - Should this be a mode of operation of the current join node ?
+     - (HN) I agree with this idea.

 ### Map/Reduce node

@@ -144,4 +160,46 @@ After applying reduce expression to the message sequence, expression specified i
 **Behaviour/Implementation**

 - should the expressions be javascript or jsonata ?
+    - (HN) As with the filter node and sort node, I think using JSONata is better.  As in the Map/Reduce node, we need to add special variables to access `msg`, `msg.payload`, the number of msgs in a sequence, the index of a msg in a sequence, and the accumulated value (represented by `_`, `$`, `$N`, `$I`, and `$$`, respectively, in the prototype implementation).
 - is the Map only mode - similar to change node ?
+    - (HN) Yes.  Its functionality is the same.  So I think integrating the reduce mode into the change node is a better idea.
+
+
+
+## Summary of Updated Proposal (HN)
+
+### Switch node
+
+ - enhance to take the `parts` property into account and to produce a message sequence, as the filter node does.
+ - allow head and tail as routing rules for message sequences.
+ - make head applicable to the *Streaming mode* of the split node (i.e. without `parts.count`).
+ - make consideration of the `parts` property selectable via an option of the switch node.  (add third rule *checking all rules with message sequence*)
+
+### Split node
+
+  - add a new mode for creating sequences from input messages, as the chunk node does.
+  - as an extension to the current chunk node implementation, add functionality to create sequences from String, Buffer, Array or Object, as already supported by the split node.
+
+### Join node
+
+  - add a new mode for merging messages by topic, as the merge node does.
+  - add new functionality to concatenate input sequences using the topic value.
+    ```
+    example: Topic1:[1, 2] + Topic2:[A, B, C] => Topic1+Topic2:[1, 2, A, B, C]
+    ```
+  - messages are processed in order if they have a `parts` property.
+
+### Change node
+
+  - enhance to support the reduce functionality of the Map/Reduce node.
+  - use JSONata as the expression language for reduce and fixup.
+  - the reduce and fixup expressions are only applied to message sequences (msgs having a `parts` property). Other messages are processed the same way as by the current change node.
+  - for JSONata expressions in reduce mode, add the special variables `$I` (index of message), `$N` (number of messages in a sequence), and `$A` (accumulated value).
+
+### Sort node
+
+ - use JSONata for key expression.
+
+### csv node and html node
+
+  - enhance to support `parts` property.
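
To make the reduce proposal for the change node more concrete, the following is a minimal sketch of reducing one complete message sequence with an index, a sequence length and an accumulated value, followed by a fixup step. It is not the prototype implementation: `reduceSequence`, `reduceFn` and `fixupFn` are hypothetical names, plain JavaScript functions stand in for the JSONata expressions, and the parameters `I`, `N` and `A` only mirror the roles of `$I`, `$N` and `$A`.

```javascript
// Hypothetical sketch of reduce + fixup over one complete message sequence.
// Assumption: all messages of the sequence have already arrived.
function reduceSequence(msgs, reduceFn, initial, fixupFn) {
    // Process the messages in the order given by parts.index.
    const ordered = msgs.slice().sort((a, b) => a.parts.index - b.parts.index);
    const N = ordered.length; // number of messages in the sequence
    let A = initial;          // accumulated value
    ordered.forEach((msg, I) => {
        A = reduceFn(A, msg, I, N);
    });
    // The fixup step is applied once, after the whole sequence is reduced.
    const payload = fixupFn ? fixupFn(A, N) : A;
    return { payload };
}

// Usage: average of the payloads, computed as a sum plus a fixup division.
const sequence = [3, 1, 4, 2].map((value, i) => ({
    payload: value,
    parts: { id: "seq-1", index: i, count: 4 }
}));
const average = reduceSequence(
    sequence,
    (A, msg) => A + msg.payload, // reduce: running sum
    0,
    (A, N) => A / N              // fixup: divide by the sequence length
);
```

The output here is a single message without `parts`; whether the real node should keep or regenerate `parts` on its output is not settled by the text above.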