This section is dedicated to the task of deploying an IPFS Cluster and running it in a stable fashion.
Make sure you are familiar with the Configuration section first.
This subsection provides different strategies to deploy an IPFS Cluster.
If you have some hosts and would like to run a stable deployment of IPFS Cluster on them, you can use these Ansible roles. They provide:
go-ipfs and IPFS Cluster binary distributions.
If you want to run the ipfs/ipfs-cluster Docker container, you will need to pay attention to several considerations:
The container does not run go-ipfs. You should run the IPFS daemon separately, for example, in its own container.
Mount the /data/ipfs-cluster folder to provide a custom, working configuration, as well as persistence for the cluster data. This is usually achieved by passing -v <host-folder>:/data/ipfs-cluster to docker run.
When starting the ipfs-cluster Docker container, if no /data/ipfs-cluster/service.json file can be found, the default entrypoint script will generate a default configuration, in which:
api/restapi/http_listen_multiaddress will be set to use 0.0.0.0, so the REST API is reachable from outside the container.
ipfs_connector/ipfshttp/proxy_listen_multiaddress will be set to use 0.0.0.0 as well.
ipfs_connector/ipfshttp/node_multiaddress will be set to the value of the IPFS_API environment variable, when provided.
Unless you run the container with --net=host, you will need to set $IPFS_API or make sure the configuration has the correct node_multiaddress, so that the cluster peer can reach your IPFS daemon.
Make sure you read the Configuration documentation for more information on how to configure IPFS Cluster.
For a quick test you can simply do docker run ipfs/ipfs-cluster. By default it runs the ipfs-cluster-service daemon.
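For a more realistic setup, an invocation might look like the sketch below. The host folder, the IPFS API multiaddress and the published ports are placeholders for your own values (9094, 9095 and 9096 are the default REST API, IPFS proxy and cluster swarm ports).

$ docker run -d --name ipfs-cluster \
    -v /srv/ipfs-cluster:/data/ipfs-cluster \
    -e IPFS_API=/ip4/10.0.0.5/tcp/5001 \
    -p 9094:9094 -p 9095:9095 -p 9096:9096 \
    ipfs/ipfs-cluster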
If you have used other methods to deploy IPFS Cluster (Docker, Kubernetes, Puppet, etc.), we would be very grateful if you shared your know-how. Let us know in the website repository.
This subsection provides useful information for running IPFS Cluster in a stable production environment.
The configuration file contains a few options which should be tweaked according to your environment, capacity and requirements (a sketch showing how to adjust some of them follows this list):
monitor.*.check_interval. This dictates how long the cluster takes to realize a peer is not responding (and to trigger re-pins). Re-pinning might be a very expensive operation in your cluster, so you may want to set this a bit high (several minutes). You can use the same value for both.
Set cluster.disable_repinning to true if you don't wish re-pins to be triggered at all on peer downtime.
Set raft.wait_for_leader_timeout to something that gives ample time for all your peers to be restarted and come online.
Adjust raft.commit_retries and raft.commit_retry_delay. Note: more retries and higher delays imply slower failures.
Adjust the remaining Raft timeouts, such as leader_lease_timeout, although the defaults are quite large already. For low-latency clusters, these can all be decreased (at least by half).
Adjust raft.snapshot_interval: if your cluster performs many operations, increase it.
Adjust the api.restapi network timeouts depending on your API usage. Note that there are usually client-side timeouts too.
Adjust the ipfs_connector.ipfshttp network timeouts if you are using the ipfs proxy.
You can set the pinning method to refs, but make sure auto-GC is not enabled in go-ipfs (this is the default).
Adjust pin_tracker.maptracker.max_pin_queue_size. This is the number of items that can be queued for pinning at a given moment.
Adjust pin_tracker.maptracker.concurrent_pins. The value depends on how many things you would like ipfs to download at the same time; 15 should be ok.
Adjust informer.disk.metric_ttl depending on the size of your ipfs datastore. It is good to set it to 5m or more for large repos. If using -1 for the replication factor, set it to a very high number, since the informers are not used in that case.
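As a rough illustration of the options above, the sketch below adjusts a few of them in ~/.ipfs-cluster/service.json using jq. The key paths assume the standard service.json layout (check yours against the Configuration section), and every value is only an example, not a recommendation.

$ cd ~/.ipfs-cluster
$ jq '.cluster.disable_repinning = false
    | .consensus.raft.wait_for_leader_timeout = "2m"
    | .pin_tracker.maptracker.concurrent_pins = 10
    | .informer.disk.metric_ttl = "5m"' service.json > service.json.new
$ mv service.json.new service.json

You can of course edit the file directly with a text editor instead.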
The go-ipfs daemons used by the cluster should also be tuned:
Initialize go-ipfs with the server profile: ipfs init --profile=server, or ipfs config profile apply server if the configuration already exists.
To use the badger datastore (converting an existing repository):
[BACKUP ~/.ipfs]
$ ipfs config profile apply badgerds # or ipfs init --profile=server,badgerds
$ ipfs-ds-convert convert # Make sure you have enough disk space for the conversion.
$ ipfs-ds-convert cleanup # removes the backup data
Increase Swarm.ConnMgr.HighWater (the maximum number of connections).
Set Datastore.BloomFilterSize according to your repo size (in bytes): 1048576 (1MB) is a good value (more info here).
Set Datastore.StorageMax to a value according to the disk you want to dedicate to the ipfs repo.
The IPFS_FD_MAX environment variable controls the file descriptor limit that go-ipfs sets for itself. Depending on your HighWater value, you may want to increase it (see the example below).
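As an illustration, the tweaks above might translate into something like the following. All figures are placeholders to adapt to your repo size and available resources.

$ ipfs config --json Swarm.ConnMgr.HighWater 1000
$ ipfs config --json Datastore.BloomFilterSize 1048576
$ ipfs config Datastore.StorageMax 100GB
$ export IPFS_FD_MAX=8192 # raise the file descriptor limit before starting the daemon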
For easy and quick upgrades, make sure your system starts and restarts the IPFS Cluster and go-ipfs peers as follows:
ipfs-cluster-service daemon --upgrade
ipfs daemon --migrate
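A minimal sketch of a start script wrapping those commands is shown below; your init system of choice (systemd, runit, etc.) would express the same thing in its own unit or service files.

#!/bin/sh
# Hypothetical start script: both daemons run with their upgrade/migration
# flags so that version bumps are applied automatically on restart.
ipfs daemon --migrate &
ipfs-cluster-service daemon --upgrade &
wait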
This subsection explains how to modify the cluster's peerset. The peerset is maintained by the consensus implementation, so instructions are specific to the implementation used. Right now, only the raft implementation is available.
Raft is our default consensus implementation. It provides high availability, protection against network splits and fast state convergence. It is appropriate for small-sized clusters (what small means is to be determined, but probably < 20 peers) running in trusted environments.
The downside is that Raft requires strict procedures when updating the cluster peerset in order to ensure consistency and correct operation of the consensus. In fact, updating the peerset is a commit operation in Raft, meaning that it always needs a functioning leader (and thus, the majority of peers in the peerset need to be online for it to take effect).
Adding peers should always be performed by bootstrapping as explained here.
Removing a peer is a final operation for that peer. That means the peer cannot (should not) be started again unless its Raft state is cleaned up (ipfs-cluster-service state clean).
Removing peers can be done using ipfs-cluster-ctl, which calls the DELETE /peers/<id> API endpoint:
ipfs-cluster-ctl peers rm <peerID>
A peer ID looks like QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim. Removing a peer has the following effects:
The removed peer's peers configuration value will be cleared, so that accidental restarts do not mess with the existing cluster.
The removed peer is also dropped from the peers configuration value of the rest of the peers.
peers rm also works with offline peers. Offline peers should not be restarted after being removed (see the example below).
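For example, using the peer ID shown above (any real peer ID works the same way):

$ ipfs-cluster-ctl peers rm QmQHKLBXfS7hf8o2acj7FGADoJDLat3UazucbHrgxqisim
# On the removed peer's host, only if it should ever join a cluster again:
$ ipfs-cluster-service state clean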
IPFS Cluster includes a monitoring component which gathers metrics and triggers alerts when a metric is no longer renewed. There are currently two types of metrics:
The informer metrics are used to decide on allocations when a pin request arrives. Different “informers” can be configured. The default is the disk informer, which extracts repo stat information from IPFS and sends a freespace metric.
The ping metric is used to regularly signal that a peer is alive.
Every metric carries an associated Time-To-Live. This TTL can be configured in the informer configuration section. The ping metric TTL is determined by cluster.monitoring_ping_interval and is equal to 2x its value.
Every IPFS Cluster peer broadcasts metrics regularly to all other peers. This happens at TTL/2 intervals for the informer metrics and at cluster.monitoring_ping_interval intervals for the ping metric.
When a metric for an existing cluster peer stops arriving and previous metrics have outlived their Time-To-Live, the monitoring component triggers an alert for that metric.
monbasic.check_interval determines how often the monitoring component checks for expired TTLs and sends these alerts. If you wish to detect expired metrics more quickly, decrease this interval. Otherwise, increase it.
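For example, to have expired metrics detected more quickly, the interval could be lowered in service.json (the value and the jq invocation below are only an illustration):

$ jq '.monitor.monbasic.check_interval = "15s"' ~/.ipfs-cluster/service.json > /tmp/service.json
$ mv /tmp/service.json ~/.ipfs-cluster/service.json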
The IPFS Cluster peer will react to ping metric alerts by searching for pins allocated to the alerting peer and triggering re-pinning requests for them, unless the cluster.disable_repinning option is true. These re-pinning requests may result in re-allocations if the CID's allocation factor crosses the replication_factor_min boundary. Otherwise, the current allocations are maintained.
The monitoring and failover system in cluster is very basic and requires improvements. Failover is likely to not work properly when several nodes go offline at once (especially if the current Leader is affected). Manual re-pinning can be triggered with ipfs-cluster-ctl pin add <cid>.
ipfs-cluster-ctl pin ls <CID> can be used to inspect the current list of peers allocated to a CID.
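For example (with <cid> standing in for a real CID, as in the rest of this document):

$ ipfs-cluster-ctl pin ls <cid> # shows the current allocations for the CID
$ ipfs-cluster-ctl pin add <cid> # triggers a (re-)pin, which may re-allocate it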
Backups are never a bad thing. This subsection explains what IPFS Cluster does to make sure your pinset is not lost in a disaster event, and what further measures you can take.
When we speak of backups, we are normally referring to the ~/.ipfs-cluster/raft folder (the state folder), which effectively contains the cluster's pinset and other consensus-specific information.
When a peer is removed from the cluster, or when the user runs ipfs-cluster-service state clean, the state folder is not removed. Instead, it is renamed to raft.old.X, with the newest copy being raft.old.0. The number of copies kept around is configurable.
On the other hand, raft additionally takes regular snapshots of the pinset (which means it is fully persisted to disk). This is also performed on a clean shutdown of the peers.
When the peer is not running, the last persisted state can be manually exported with:
ipfs-cluster-service state export
This will output the pinset, which can in turn be re-imported to a peer with:
ipfs-cluster-service state import
import can be used to salvage a state in the case of a disaster event, when peers in the cluster are offline or not enough peers can be started to reach a quorum (when using raft). In this case, we recommend importing the state on a new, clean, single-peer cluster, and bootstrapping the rest of the cluster to it manually.
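A rough sketch of that recovery procedure is shown below. The file name is arbitrary, and depending on your version the state export/import commands may also accept an explicit file argument instead of stdin/stdout redirection.

# On a stopped peer that still holds the latest state:
$ ipfs-cluster-service state export > pinset-backup.json
# On a new, clean machine that will become the single-peer cluster:
$ ipfs-cluster-service init
$ ipfs-cluster-service state import < pinset-backup.json
$ ipfs-cluster-service daemon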