Job Distribution

Render jobs will be distributed in rounds. Each round is called a hive cycle and lasts from one to a few minutes (the exact duration is not defined yet). This gives all nodes time to form a consensus about who will render which part of which render job in a given cycle. There will be three HCS topics to coordinate the job distribution and to make sure all nodes have a common understanding of it:

  • the Hive Cycle Synchronization Topic

  • the Hive Cycle Application Topic

  • the Hive Cycle Validation Topic

The Hive Cycle Synchronization Topic will be used to synchronize all nodes in the render hive. The Hive Cycle Application Topic will gather the information about all render nodes that apply for rendering in the current hive cycle. The Hive Cycle Validation Topic will be used to vote for a specific job distribution result. All nodes are subscribed to all three topics.

Node Synchronization

The major challenge of a round-based distribution is to keep all nodes in the render hive synchronized. In principle, each node can locally keep track of when a hive cycle starts if it knows the duration of a cycle and a given starting point. However, a significant issue would arise if a node's local clock deviated by more than a few seconds from the clocks of all other nodes. Therefore, the render hive needs a common reference time.

Luckily, the Hedera Hashgraph algorithm provides fair timestamping for all transactions submitted to the Hedera network. The Renderhive project will use the consensus time of the Hedera network as its reference time. That means each node will periodically send a transaction to the network and compare its local clock time with the consensus timestamp of that transaction.
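As a rough sketch of this comparison (all names and values are illustrative; in practice, the consensus timestamp would be obtained from the transaction record, and network latency adds to the apparent offset):

// Sketch: estimate the local clock offset from a transaction's consensus
// timestamp. Illustrative only, not the actual Renderhive implementation.
package main

import (
	"fmt"
	"time"
)

func main() {
	localSendTime := time.Now() // local clock when the transaction was submitted
	// consensusTime would be read from the transaction record on Hedera;
	// here it is a placeholder value.
	consensusTime := localSendTime.Add(150 * time.Millisecond)

	// a positive offset suggests the local clock runs behind consensus time
	offset := consensusTime.Sub(localSendTime)
	fmt.Printf("estimated clock offset: %v\n", offset)
}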

Since the render hive needs a reference point from which the start of the next cycle can be calculated, a transaction will be submitted to the Hive Cycle Synchronization Topic as soon as the render hive is deployed on the Hedera mainnet and ready to start its work. The memo field of this message will contain the configuration of the hive cycles (e.g., the duration in seconds). Each node will use the consensus timestamp of this transaction to calculate the start of the next hive cycle from the synchronized time and the defined hive cycle duration. Whenever needed, the hive cycle configuration can be changed by sending a new message with a different configuration to the Hive Cycle Synchronization Topic; Renderhive nodes will only consider the most recent message in the topic.
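A minimal sketch of this calculation, assuming the consensus timestamp of the synchronization message and the configured cycle duration are already known (all names are illustrative):

// Sketch: derive the start of the next hive cycle from the consensus
// timestamp of the synchronization message and the cycle duration
// announced in its memo field.
package main

import (
	"fmt"
	"time"
)

// nextCycleStart returns the start time of the next hive cycle.
func nextCycleStart(syncTimestamp, now time.Time, cycleDuration time.Duration) time.Time {
	// number of full cycles that have passed since the sync message
	cycles := now.Sub(syncTimestamp) / cycleDuration
	return syncTimestamp.Add((cycles + 1) * cycleDuration)
}

func main() {
	sync := time.Date(2023, 1, 1, 0, 0, 0, 0, time.UTC)
	fmt.Println(nextCycleStart(sync, time.Now().UTC(), 5*time.Minute))
}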

The Hive Cycle Synchronization Topic will have a SubmitKey so that the configuration cannot be changed by everyone. As soon as a decentralized governance model is established, this SubmitKey will be adjusted accordingly or a new topic will be created, so that only a joint group of people can change the duration.

Hive Cycles

Each hive cycle consists of several phases, which each node needs to go through in order to arrive at the correct render job distribution.

Application Phase

At the beginning of each hive cycle, render nodes that want to participate in this specific cycle submit the CID of their current render offer document to the Hive Cycle Application Topic. That way, the network knows they are online and want to render under the conditions defined in their specific render offer document. The duration of this phase is defined in the Hive Cycle Synchronization Topic, and all CIDs that arrive after the phase has ended are ignored.
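A minimal sketch of this filtering step, with illustrative types standing in for the actual HCS message handling:

// Sketch: keep only applications whose consensus timestamp falls inside
// the application phase window; later messages are ignored.
package main

import (
	"fmt"
	"time"
)

type application struct {
	NodeAccount string    // Hedera account of the applying render node
	OfferCID    string    // CID of the node's render offer document
	Consensus   time.Time // consensus timestamp of the HCS message
}

func validApplications(msgs []application, phaseStart, phaseEnd time.Time) []application {
	var valid []application
	for _, m := range msgs {
		if !m.Consensus.Before(phaseStart) && m.Consensus.Before(phaseEnd) {
			valid = append(valid, m)
		}
	}
	return valid
}

func main() {
	start := time.Now()
	end := start.Add(30 * time.Second)
	msgs := []application{
		{"0.0.1234", "bafy...offer1", start.Add(10 * time.Second)},
		{"0.0.5678", "bafy...offer2", end.Add(5 * time.Second)}, // too late, ignored
	}
	fmt.Println(validApplications(msgs, start, end))
}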

Update Phase

Each render node checks whether it has the render offer document of each node that applied for rendering in this round stored locally. If this is not the case, the node downloads the render offer document from IPFS. This minimizes the data exchange between nodes. Furthermore, nodes that just came online are automatically detected, and nodes that went offline are automatically ignored in this round, since they did not submit an application.
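A minimal sketch of this cache-first lookup; ipfsGet is a hypothetical stand-in for the actual IPFS client call:

// Sketch: download a render offer document only if it is not already
// stored locally, so data exchange between nodes stays minimal.
package main

import "fmt"

var localCache = map[string][]byte{} // CID -> document

// ipfsGet is a placeholder for fetching a document from IPFS by its CID.
func ipfsGet(cid string) ([]byte, error) {
	return []byte("render offer document for " + cid), nil
}

func ensureDocument(cid string) ([]byte, error) {
	if doc, ok := localCache[cid]; ok {
		return doc, nil // already stored locally, no download needed
	}
	doc, err := ipfsGet(cid)
	if err != nil {
		return nil, err
	}
	localCache[cid] = doc
	return doc, nil
}

func main() {
	doc, _ := ensureDocument("bafy...offer1")
	fmt.Println(string(doc))
}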

Subsequently, the render node scans the Job Queue Topic for pending render jobs, downloads the render request document of each job, and updates the local state of the job queue.

Distribution Phase

After the update phase, all render nodes should have the same information about all render offers and render requests. Therefore, each node can run the same job distribution algorithm locally, which will consecutively distribute the available render power across the pending render jobs.

Each node loops through the job queue starting at the earliest submitted job and, for each job, creates a list of render nodes whose render offers fit the render request document. This list is then sorted according to the distribution strategy defined in the client node's render request document (e.g., by price, by render time, etc.). Starting from the top, each render node in the list is assigned to the job until the conditions defined in the render request document are satisfied. If there are not enough nodes to fulfill the render request at this point, the job is deferred to the next hive cycle, where it will take part in the job distribution again.

For the distribution of the next render job, the render node removes the already assigned render nodes from the list of available render nodes and uses only the remaining ones for further distribution. This process is repeated until no further render job can be assigned to a render node.
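A minimal sketch of this distribution loop, with illustrative types and a cheapest-first sort standing in for the configurable distribution strategy:

// Sketch of the local job distribution loop. Types, field names, and the
// fitness check are illustrative placeholders.
package main

import (
	"fmt"
	"sort"
)

type offer struct {
	NodeID int
	Price  float64 // illustrative price per unit of render work
}

type request struct {
	JobCID      string
	MaxPrice    float64
	NodesNeeded int
}

// distribute assigns offers to requests in queue order; assigned nodes
// are removed from the pool, unsatisfiable requests are deferred.
func distribute(queue []request, pool []offer) (map[string][]int, []request) {
	assigned := map[string][]int{}
	var deferred []request
	for _, req := range queue {
		// collect the offers that fit the request
		var fitting []offer
		for _, o := range pool {
			if o.Price <= req.MaxPrice {
				fitting = append(fitting, o)
			}
		}
		// sort by the request's distribution strategy (here: cheapest first)
		sort.Slice(fitting, func(i, j int) bool { return fitting[i].Price < fitting[j].Price })
		if len(fitting) < req.NodesNeeded {
			deferred = append(deferred, req) // retry in the next hive cycle
			continue
		}
		for _, o := range fitting[:req.NodesNeeded] {
			assigned[req.JobCID] = append(assigned[req.JobCID], o.NodeID)
			for i, p := range pool { // remove the assigned node from the pool
				if p.NodeID == o.NodeID {
					pool = append(pool[:i], pool[i+1:]...)
					break
				}
			}
		}
	}
	return assigned, deferred
}

func main() {
	pool := []offer{{1, 0.5}, {2, 0.8}, {3, 0.4}}
	queue := []request{{"bafy...job1", 0.6, 2}, {"bafy...job2", 1.0, 2}}
	fmt.Println(distribute(queue, pool))
}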

Render Contract Phase

Each render node will create a new JSON file for each render request that was successfully distributed in this round. These documents – the render contracts – will follow a strictly defined schema and contain all information required for processing the render request (e.g., the list of assigned render nodes, their cumulative render power, the CID of the render request document, the CIDs of the render offers, etc.).

// JSON schema for the render contracts
render_contract = {
    ID: int,             // a unique ID of the render contract
    render_nodes: [      // list of assigned render nodes
        {
            nodeID: int
        },
        {
            nodeID: int
        },
        ...
    ],
    ...
}

Each node will then check if the document's CID already exists on IPFS, which would only be the case if another render node arrived at the same conclusion about the job distribution. If it does not exist yet, the node will pin the render contract on IPFS / Filecoin.

Furthermore, each render node will compile a list of the CIDs of all render contracts it created, write it into a file, and pin this file on IPFS / Filecoin as well. The CID of this file is a single hash that represents all render contracts of this hive cycle. All render nodes that followed the rules and had the complete information about render offers as well as render requests will automatically arrive at the exact same file. This file is called the meta contract, and it can be used to check for consensus. Each node will check if the file already exists on IPFS and will pin it if this is not the case. The node will then send the CID of its meta contract to the Hive Cycle Validation Topic.
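For all honest nodes to arrive at the same meta contract, the file must be built deterministically, e.g., by writing the render contract CIDs in a canonical order. The following sketch uses a sorted list and SHA-256 as an illustrative stand-in for the actual IPFS CID computation:

// Sketch: build the meta contract deterministically so that identical
// render contract sets yield an identical file (and thus the same CID).
package main

import (
	"crypto/sha256"
	"fmt"
	"sort"
	"strings"
)

// metaContract returns the meta contract file content and a digest of it.
// Sorting removes any dependence on the order in which the render
// contracts were created locally.
func metaContract(contractCIDs []string) (string, string) {
	sorted := append([]string(nil), contractCIDs...)
	sort.Strings(sorted)
	content := strings.Join(sorted, "\n")
	digest := sha256.Sum256([]byte(content))
	return content, fmt.Sprintf("%x", digest)
}

func main() {
	_, digest := metaContract([]string{"bafy...contract2", "bafy...contract1"})
	fmt.Println(digest)
}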

Validation Phase

After the successful distribution of all render jobs, all nodes technically know which render job they should execute. They also know which other nodes should work on the same job. To validate that they arrived at the correct job distribution, all nodes will listen for incoming messages on the Hive Cycle Validation Topic and will check whether each message came from a node that applied for this hive cycle and is a registered node. The latter is checked by looking in a local database and by querying the smart contract if the node is not (yet) in the local database. If the message came from a valid node, the CID contained in this message is added to the local dictionary of node account address / CID pairs. The node will repeat this process until all nodes of this hive cycle have sent their vote or until a specific timeout is reached.
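A minimal sketch of this vote collection loop, with an illustrative channel standing in for the topic subscription (the registration check against the local database and smart contract is omitted):

// Sketch: collect one validation vote per applied node until every node
// has voted or a timeout expires.
package main

import (
	"fmt"
	"time"
)

type vote struct {
	Account string // node account address
	MetaCID string // CID of the node's meta contract
}

func collectVotes(msgs <-chan vote, applied map[string]bool, timeout time.Duration) map[string]string {
	votes := map[string]string{}
	deadline := time.After(timeout)
	for len(votes) < len(applied) {
		select {
		case v := <-msgs:
			if applied[v.Account] { // ignore nodes that did not apply
				if _, voted := votes[v.Account]; !voted {
					votes[v.Account] = v.MetaCID
				}
			}
		case <-deadline:
			return votes // timeout: proceed with the votes received so far
		}
	}
	return votes
}

func main() {
	msgs := make(chan vote, 2)
	msgs <- vote{"0.0.1234", "bafy...meta"}
	msgs <- vote{"0.0.9999", "bafy...meta"} // did not apply, ignored
	applied := map[string]bool{"0.0.1234": true, "0.0.5678": true}
	fmt.Println(collectVotes(msgs, applied, 100*time.Millisecond))
}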

Each node will then take the median of the ordered list of CIDs it collected to obtain the CID of the meta contract for this hive cycle. Since the list is ordered, the median is guaranteed to be the majority value whenever more than half of the nodes submitted the same CID. All nodes that created a different file, which does not have the consensus of the hive, will discard their own meta contract and render contract documents and download the consensus file and the render contract document they were assigned to by the hive.
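A minimal sketch of the median pick:

// Sketch: pick the consensus meta contract CID as the median of the
// sorted vote list; with an honest majority this is the majority CID.
package main

import (
	"fmt"
	"sort"
)

func consensusCID(votes []string) string {
	sorted := append([]string(nil), votes...)
	sort.Strings(sorted)
	return sorted[len(sorted)/2] // middle element of the ordered list
}

func main() {
	votes := []string{"bafy...A", "bafy...B", "bafy...A", "bafy...A"}
	fmt.Println(consensusCID(votes)) // prints "bafy...A"
}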

A node may fail to come to the correct conclusion about the render job distribution locally for a multitude of reasons, all of which would cause the render jobs not to be executed in the requested way, because at least one node failed in the job distribution or stopped collaborating. Possible reasons include:

  • A render node was unable to download all render offer or render request documents

  • A render node went offline or has a bad connection to the network

  • A render node ignored the queue order or other distribution rules

  • The client node did not deposit enough HBAR to pay for the render work and power

Claiming Phase

Finally, the smart contract needs to be notified about which node was assigned to which job. Therefore, each render node will call the smart contract to claim the render job it was assigned to according to the consensus. The smart contract will always check whether a node is already assigned to another job; if that is the case, it will reject the node's attempt to claim a further job. Furthermore, during payment, the client node of this job and all other render nodes assigned to the render job will have to verify that a specific render node had the right to claim this render job according to the render hive consensus. If a render node claimed the wrong job, it will not receive the reward for this specific job. The client node will still receive the render result or get its money back if no correct render result was delivered. In cases where a render node claimed a wrong job but delivered a render result, the payment for this job will be automatically transferred to the render hive funding account or back to the user (not decided yet).
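A minimal sketch of the verification that the client node and the other assigned render nodes could perform, assuming the assignments are read from the render contracts referenced by the consensus meta contract (all types are illustrative):

// Sketch: verify that a node's claim matches the hive consensus.
package main

import "fmt"

// claimIsValid reports whether nodeID appears among the nodes that the
// consensus assigned to the render contract with the given CID.
func claimIsValid(assignments map[string][]int, contractCID string, nodeID int) bool {
	for _, id := range assignments[contractCID] {
		if id == nodeID {
			return true
		}
	}
	return false // the node claimed a job it was not assigned to
}

func main() {
	assignments := map[string][]int{"bafy...contract1": {1, 3}}
	fmt.Println(claimIsValid(assignments, "bafy...contract1", 3)) // true
	fmt.Println(claimIsValid(assignments, "bafy...contract1", 2)) // false
}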

This mechanism is required because the smart contract cannot access the meta contract file directly and needs to rely on the information provided by the nodes. Technically, nodes could lie about the jobs they were assigned to. Consequently, the outlined mechanism strongly disincentivizes nodes from deviating from the render hive consensus about the job distribution and sanctions nodes that engage in such misbehavior.

At a later point, the Renderhive project might introduce a rating mechanism in which nodes that came to the wrong conclusion get downvoted. After a specific number of downvotes, node owners would be notified about the issue in order to solve it. A consequence of not solving the issue could ultimately be exclusion from the render hive, since in that case the node can be expected to run a manipulated version of the Renderhive Service App or to suffer from severe technical issues.
