
General Management

Introduction

Validator performance is pivotal in maintaining the security and stability of the Polkadot network. As a validator, optimizing your setup ensures efficient transaction processing, minimizes latency, and maintains system reliability during high-demand periods. Proper configuration and proactive monitoring also help mitigate risks like slashing and service interruptions.

This guide covers essential practices for managing a validator, including performance tuning techniques, security hardening, and tools for real-time monitoring. Whether you're fine-tuning CPU settings, configuring NUMA balancing, or setting up a robust alert system, these steps will help you build a resilient and efficient validator operation.

Configuration Optimization

For those seeking to optimize their validator's performance, the following configurations can improve responsiveness, reduce latency, and ensure consistent performance during high-demand periods.

Deactivate Simultaneous Multithreading

Polkadot validators execute their critical paths largely single-threaded, so optimizing for single-core CPU performance can reduce latency and improve stability. Deactivating simultaneous multithreading (SMT) prevents virtual cores from competing with physical cores for resources. SMT is marketed as Hyper-Threading on Intel CPUs and as 2-way SMT on AMD Zen. The following loop takes each core's sibling (virtual) thread offline:

for cpunum in $(cat /sys/devices/system/cpu/cpu*/topology/thread_siblings_list | cut -s -d, -f2- | tr ',' '\n' | sort -un)
do
  echo 0 > /sys/devices/system/cpu/cpu$cpunum/online
done

To make the change permanent, add nosmt=force as a kernel parameter. Edit /etc/default/grub and add nosmt=force to the GRUB_CMDLINE_LINUX_DEFAULT variable as follows:

sudo nano /etc/default/grub
# Add to GRUB_CMDLINE_LINUX_DEFAULT
/etc/default/grub
GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="nosmt=force"
GRUB_CMDLINE_LINUX=""

After updating the variable, be sure to update GRUB to apply changes:

sudo update-grub

After the reboot, you should see that half of the cores are offline. To confirm, run:

lscpu --extended
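If you need to undo the change without rebooting, for example to compare performance, the sibling cores can be brought back online through the same sysfs interface. A minimal sketch, assuming you have not yet rebooted with nosmt=force:

# Bring every present CPU back online (cpu0 is always online and has no 'online' file)
for cpu in /sys/devices/system/cpu/cpu[0-9]*/online; do
  echo 1 | sudo tee "$cpu" > /dev/null
done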

Deactivate Automatic NUMA Balancing

Deactivating automatic NUMA (Non-Uniform Memory Access) balancing on multi-CPU setups helps keep processes on the same NUMA node, minimizing cross-node memory latency. Run the following command to deactivate NUMA balancing at runtime:

sysctl kernel.numa_balancing=0

To deactivate NUMA balancing permanently, add numa_balancing=disable to GRUB settings:

sudo nano /etc/default/grub
# Add to GRUB_CMDLINE_LINUX_DEFAULT
/etc/default/grub
GRUB_DEFAULT=0
GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="numa_balancing=disable"
GRUB_CMDLINE_LINUX=""

After updating the variable, be sure to update GRUB to apply changes:

sudo update-grub

Confirm the deactivation by running the following command:

sysctl -a | grep 'kernel.numa_balancing'

If NUMA balancing was deactivated successfully, the preceding command returns kernel.numa_balancing = 0.
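If you prefer not to edit GRUB, the runtime setting can also be persisted with a sysctl drop-in. A minimal sketch of that alternative (the drop-in file name is arbitrary; the kernel parameter above still applies earlier in the boot process):

# Persist the sysctl setting across reboots via /etc/sysctl.d
echo 'kernel.numa_balancing=0' | sudo tee /etc/sysctl.d/99-numa-balancing.conf
sudo sysctl --system   # reload all sysctl configuration files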

Spectre and Meltdown Mitigations

Spectre and Meltdown are well-known vulnerabilities in modern CPUs that exploit speculative execution to access sensitive data. These vulnerabilities have been patched in recent Linux kernels, but the mitigations can slightly impact performance, especially in high-throughput or containerized environments.

If your security needs allow it, you may selectively deactivate specific mitigations for performance gains. The Spectre V2 and Speculative Store Bypass Disable (SSBD) for Spectre V4 apply to speculative execution and are particularly impactful in containerized environments. Deactivating them can help regain performance if your environment doesn't require these security layers.

To selectively deactivate the Spectre mitigations, update the GRUB_CMDLINE_LINUX_DEFAULT variable in your /etc/default/grub configuration:

sudo nano /etc/default/grub
# Add to GRUB_CMDLINE_LINUX_DEFAULT
/etc/default/grub
GRUB_DEFAULT=0
GRUB_HIDDEN_TIMEOUT=0
GRUB_HIDDEN_TIMEOUT_QUIET=true
GRUB_TIMEOUT=10
GRUB_DISTRIBUTOR=`lsb_release -i -s 2> /dev/null || echo Debian`
GRUB_CMDLINE_LINUX_DEFAULT="spec_store_bypass_disable=prctl spectre_v2_user=prctl"

After updating the variable, be sure to update GRUB to apply changes and then reboot:

sudo update-grub
sudo reboot

This approach selectively deactivates the Spectre V2 and Spectre V4 mitigations, leaving other protections intact. For full security, keep mitigations activated unless there's a significant performance need, as disabling them could expose the system to potential attacks on affected CPUs.
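Before and after changing these settings, you can review which mitigations the kernel is currently applying. On recent kernels, this information is exposed through sysfs:

grep . /sys/devices/system/cpu/vulnerabilities/*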

Monitor Your Node

Monitoring your node's performance is critical to maintaining network reliability and security. Tools like Prometheus and Grafana provide insights into block height, peer connections, CPU and memory usage, and more. This section walks through setting up these tools and configuring alerts to notify you of potential issues.

Prepare Environment

Before installing Prometheus, prepare the environment so that Prometheus runs with restricted user privileges:

  1. Create a Prometheus user - ensure Prometheus runs with minimal permissions
    sudo useradd --no-create-home --shell /usr/sbin/nologin prometheus
    
  2. Set up directories - create directories for configuration and data storage
    sudo mkdir /etc/prometheus
    sudo mkdir /var/lib/prometheus
    
  3. Change directory ownership - ensure Prometheus has access
    sudo chown -R prometheus:prometheus /etc/prometheus
    sudo chown -R prometheus:prometheus /var/lib/prometheus
    

Install and Configure Prometheus

After preparing the environment, install and configure the latest version of Prometheus as follows:

  1. Download Prometheus - obtain the release binary for your system architecture from the Prometheus releases page. Replace the placeholder text with the download link for that binary, e.g. https://github.com/prometheus/prometheus/releases/download/v3.0.0/prometheus-3.0.0.linux-amd64.tar.gz
    sudo apt-get update && sudo apt-get upgrade
    wget INSERT_RELEASE_DOWNLOAD_LINK
    tar xfz prometheus-*.tar.gz
    cd prometheus-3.0.0.linux-amd64
    
  2. Set up Prometheus - copy the binaries and console directories, assign ownership of these files to the prometheus user, and clean up the download directory as follows:

    sudo cp ./prometheus /usr/local/bin/
    sudo cp ./promtool /usr/local/bin/
    sudo chown prometheus:prometheus /usr/local/bin/prometheus
    sudo chown prometheus:prometheus /usr/local/bin/promtool
    
    sudo cp -r ./consoles /etc/prometheus
    sudo cp -r ./console_libraries /etc/prometheus
    sudo chown -R prometheus:prometheus /etc/prometheus/consoles
    sudo chown -R prometheus:prometheus /etc/prometheus/console_libraries
    
    cd .. && rm -r prometheus*
    
  3. Create prometheus.yml for configuration - run this command to define global settings, rule files, and scrape targets:

    sudo nano /etc/prometheus/prometheus.yml
    
    In this example configuration, Prometheus scrapes its own metrics every 5 seconds for detailed internal metrics and scrapes your node's metrics from port 9615, the default metrics port of a Substrate-based node. Both intervals are customizable.
    prometheus.yml
    global:
      scrape_interval: 15s
      evaluation_interval: 15s
    
    rule_files:
      # - "first.rules"
      # - "second.rules"
    
    scrape_configs:
      - job_name: 'prometheus'
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:9090']
      - job_name: 'substrate_node'
        scrape_interval: 5s
        static_configs:
          - targets: ['localhost:9615']
    

  4. Validate configuration with promtool - use promtool, the rule and configuration checker that ships with Prometheus, to verify the file is syntactically valid

    promtool check config /etc/prometheus/prometheus.yml
    

  5. Assign ownership - save the configuration file and change its ownership to the prometheus user
    sudo chown prometheus:prometheus /etc/prometheus/prometheus.yml
    

Start Prometheus

  1. Launch Prometheus - use the following command to launch Prometheus with a given configuration, set the storage location for metric data, and enable web console templates and libraries:

    sudo -u prometheus /usr/local/bin/prometheus --config.file /etc/prometheus/prometheus.yml --storage.tsdb.path /var/lib/prometheus/ --web.console.templates=/etc/prometheus/consoles --web.console.libraries=/etc/prometheus/console_libraries
    

    If the server is set up properly, the terminal output indicates that Prometheus has started and is listening on port 9090.

  2. Verify access - verify you can access the Prometheus interface by visiting the following address:

    http://SERVER_IP_ADDRESS:9090/graph
    

    If the interface appears to work as expected, exit the process using Control + C.

  3. Create new systemd service file - this will automatically start the server during the boot process

    sudo nano /etc/systemd/system/prometheus.service
    
    Add the following code to the service file:

    prometheus.service
    [Unit]
    Description=Prometheus Monitoring
    Wants=network-online.target
    After=network-online.target
    
    [Service]
    User=prometheus
    Group=prometheus
    Type=simple
    ExecStart=/usr/local/bin/prometheus \
     --config.file /etc/prometheus/prometheus.yml \
     --storage.tsdb.path /var/lib/prometheus/ \
     --web.console.templates=/etc/prometheus/consoles \
     --web.console.libraries=/etc/prometheus/console_libraries
    ExecReload=/bin/kill -HUP $MAINPID
    
    [Install]
    WantedBy=multi-user.target
    
    Once you save the file, execute the following command to reload systemd and enable the service so that it will load automatically during the operating system's startup:

    sudo systemctl daemon-reload && sudo systemctl enable prometheus && sudo systemctl start prometheus
    
  4. Verify service - return to the Prometheus interface at the following address to verify the service is running. You can also check the service from the command line, as shown below:

    http://SERVER_IP_ADDRESS:9090/

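From the shell, the following standard systemd commands confirm the service state and let you follow its logs (they are not specific to Prometheus):

sudo systemctl status prometheus
sudo journalctl -u prometheus -f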

Install and Configure Grafana

Grafana provides a powerful, customizable interface to visualize metrics collected by Prometheus. This guide follows Grafana's canonical installation instructions. To install and configure Grafana, follow these steps:

  1. Install Grafana prerequisites - run the following commands to install the required packages:

    sudo apt-get install -y apt-transport-https software-properties-common wget    
    

  2. Import the GPG key:

    sudo mkdir -p /etc/apt/keyrings/
    wget -q -O - https://apt.grafana.com/gpg.key | gpg --dearmor | sudo tee /etc/apt/keyrings/grafana.gpg > /dev/null
    

  3. Configure the stable release repo and update packages:

    echo "deb [signed-by=/etc/apt/keyrings/grafana.gpg] https://apt.grafana.com stable main" | sudo tee -a /etc/apt/sources.list.d/grafana.list
    sudo apt-get update
    

  4. Install the latest stable version of Grafana:

    sudo apt-get install grafana
    

After installing Grafana, you can move on to the configuration steps:

  1. Set Grafana to auto-start - configure Grafana to start automatically on system boot and start the service

    sudo systemctl daemon-reload
    sudo systemctl enable grafana-server.service
    sudo systemctl start grafana-server
    

  2. Verify the Grafana service is running with the following command:

    sudo systemctl status grafana-server
    
    If necessary, you can stop or restart the service with the following commands:

    sudo systemctl stop grafana-server
    sudo systemctl restart grafana-server
    
  3. Access Grafana - open your browser, navigate to the following address, and use the default username and password admin to log in:

    http://SERVER_IP_ADDRESS:3000/login
    

Change default port

If you want to run Grafana on another port, edit the file /usr/share/grafana/conf/defaults.ini with a command like:

sudo vim /usr/share/grafana/conf/defaults.ini

Change the http_port value as desired, then restart Grafana with:

sudo systemctl restart grafana-server

Grafana login screen

Follow these steps to visualize node metrics:

  1. Select the gear icon to open Settings and configure the Data Sources
  2. Select Add data source to define the data source
  3. Select Prometheus
  4. Enter http://localhost:9090 in the URL field, then select Save & Test. If you see the message "Data source is working", your connection is configured correctly
  5. Next, select Import from the menu bar on the left, select Prometheus in the dropdown list, and select Import
  6. Finally, start your Polkadot node by running ./polkadot. You should now be able to monitor your node's performance, such as the current block height, network traffic, and running tasks, on the Grafana dashboard

Import via grafana.com

The Grafana dashboards page features user-created dashboards made available for public use. Visit "Substrate Node Metrics" for an example of available dashboards.

Install and Configure Alertmanager

The optional Alertmanager complements Prometheus by handling alerts and notifying users of potential issues. Follow these steps to install and configure Alertmanager:

  1. Download and extract Alertmanager - download the latest version from the Prometheus Alertmanager releases page. Replace the placeholder text with the download link for the release binary, e.g. https://github.com/prometheus/alertmanager/releases/download/v0.28.0-rc.0/alertmanager-0.28.0-rc.0.linux-amd64.tar.gz
    wget INSERT_RELEASE_DOWNLOAD_LINK
    tar -xvzf alertmanager*
    
  2. Move binaries and set permissions - copy the binaries to a system directory and set appropriate permissions
    cd alertmanager-0.28.0-rc.0.linux-amd64
    sudo cp ./alertmanager /usr/local/bin/
    sudo cp ./amtool /usr/local/bin/
    sudo chown prometheus:prometheus /usr/local/bin/alertmanager
    sudo chown prometheus:prometheus /usr/local/bin/amtool
    
  3. Create configuration file - create a new alertmanager.yml file under /etc/alertmanager

    sudo mkdir /etc/alertmanager
    sudo nano /etc/alertmanager/alertmanager.yml
    
    Add the following code to the configuration file to define email notifications:
    alertmanager.yml
    global:
      resolve_timeout: 1m
    
    route:
      receiver: 'gmail-notifications'
    
    receivers:
      - name: 'gmail-notifications'
        email_configs:
          - to: INSERT_YOUR_EMAIL
            from: INSERT_YOUR_EMAIL
            smarthost: smtp.gmail.com:587
            auth_username: INSERT_YOUR_EMAIL
            auth_identity: INSERT_YOUR_EMAIL
            auth_password: INSERT_YOUR_APP_PASSWORD
            send_resolved: true
    

    App password

    You must generate an app password in your Gmail account to allow Alertmanager to send you alert notification emails.

    Ensure the configuration file has the correct permissions:

    sudo chown -R prometheus:prometheus /etc/alertmanager
    
  4. Configure as a service - set up Alertmanager to run as a service by creating a systemd service file
    sudo nano /etc/systemd/system/alertmanager.service
    
    Add the following code to the service file:
    alertmanager.service
    [Unit]
    Description=AlertManager Server Service
    Wants=network-online.target
    After=network-online.target
    
    [Service]
    User=root
    Group=root
    Type=simple
    ExecStart=/usr/local/bin/alertmanager --config.file /etc/alertmanager/alertmanager.yml --web.external-url=http://SERVER_IP:9093 --cluster.advertise-address='0.0.0.0:9093'
    
    [Install]
    WantedBy=multi-user.target
    
    Reload and enable the service
    sudo systemctl daemon-reload
    sudo systemctl enable alertmanager
    sudo systemctl start alertmanager
    
    Verify the service status using the following command:
    sudo systemctl status alertmanager
    
    If you have configured Alertmanager properly, the Active field should display active (running), similar to the following. You can also validate the configuration file itself with amtool, as shown below:

    alertmanager.service - AlertManager Server Service
      Loaded: loaded (/etc/systemd/system/alertmanager.service; enabled; vendor preset: enabled)
      Active: active (running) since Thu 2020-08-20 22:01:21 CEST; 3 days ago
      Main PID: 20592 (alertmanager)
      Tasks: 70 (limit: 9830)
      CGroup: /system.slice/alertmanager.service
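As an additional, optional check, amtool (copied alongside the alertmanager binary earlier) can validate the configuration file before you rely on it for alerting:

amtool check-config /etc/alertmanager/alertmanager.yml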

Grafana Plugin

There is an Alertmanager plugin in Grafana that can help you monitor alert information. Follow these steps to use the plugin:

  1. Install the plugin - use the following command:
    sudo grafana-cli plugins install camptocamp-prometheus-alertmanager-datasource
    
  2. Restart Grafana
    sudo systemctl restart grafana-server
    
  3. Configure datasource - go to your Grafana dashboard SERVER_IP:3000 and configure the Alertmanager datasource as follows:
    • Go to Configuration -> Data Sources, and search for Prometheus Alertmanager
    • Fill in the URL to your server location followed by the port number used in the Alertmanager. Select Save & Test to test the connection
  4. To monitor the alerts, import the 8010 dashboard, which is used for Alertmanager. Make sure to select the Prometheus Alertmanager in the last column then select Import

Integrate Alertmanager

A few more steps are required to allow the Prometheus server to talk to the Alertmanager and to configure rules for detection and alerts. Complete the integration as follows:

  1. Update configuration - update the /etc/prometheus/prometheus.yml configuration file to add the following code:
    prometheus.yml
    rule_files:
      - 'rules.yml'
    
    alerting:
      alertmanagers:
        - static_configs:
            - targets:
                - localhost:9093
    
  2. Create rules file - this is where you define the rules for detection and alerting. Run the following command to create the rules file:
    sudo nano /etc/prometheus/rules.yml
    
    If any of the conditions defined in the rules file are met, an alert will be triggered. The following sample rule checks for the node being down and triggers an email notification if an outage of more than five minutes is detected:
    rules.yml
    groups:
      - name: alert_rules
        rules:
          - alert: InstanceDown
            expr: up == 0
            for: 5m
            labels:
              severity: critical
            annotations:
              summary: 'Instance [{{ $labels.instance }}] down'
              description: '[{{ $labels.instance }}] of job [{{ $labels.job }}] has been down for more than 5 minutes.'
    
    See Alerting Rules and additional alerts in the Prometheus documentation to learn more about defining and using alerting rules.
  3. Update ownership of rules file - ensure user prometheus has access by running:
    sudo chown prometheus:prometheus /etc/prometheus/rules.yml
    
  4. Check rules - ensure the rules defined in rules.yml are syntactically correct by running the following command:
    sudo -u prometheus promtool check rules /etc/prometheus/rules.yml
    
  5. Restart Prometheus and Alertmanager
    sudo systemctl restart prometheus && sudo systemctl restart alertmanager
    

Now you will receive an email alert if one of your rule triggering conditions is met.
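The same rules file can hold additional rules. As an illustration only, the following sketch appends a hypothetical low-peer-count rule; the metric name substrate_sub_libp2p_peers_count and the threshold are assumptions, so verify them against your node's /metrics output before relying on the rule:

sudo tee -a /etc/prometheus/rules.yml > /dev/null <<'EOF'
      # Hypothetical rule: warn when the node has had fewer than 3 peers for 10 minutes.
      # Verify the metric name against your node's /metrics endpoint before use.
      - alert: LowPeerCount
        expr: substrate_sub_libp2p_peers_count < 3
        for: 10m
        labels:
          severity: warning
        annotations:
          summary: 'Instance [{{ $labels.instance }}] has fewer than 3 peers'
EOF
sudo -u prometheus promtool check rules /etc/prometheus/rules.yml
sudo systemctl restart prometheus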

Updated prometheus.yml
global:
  scrape_interval: 15s
  evaluation_interval: 15s

rule_files:
  - 'rules.yml'

alerting:
  alertmanagers:
    - static_configs:
        - targets:
            - localhost:9093

scrape_configs:
  - job_name: 'prometheus'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9090']
  - job_name: 'substrate_node'
    scrape_interval: 5s
    static_configs:
      - targets: ['localhost:9615']

Secure Your Validator

Validators in Polkadot's Proof of Stake network play a critical role in maintaining network integrity and security by keeping the network in consensus and verifying state transitions. To ensure optimal performance and minimize risks, validators must adhere to strict guidelines around security and reliable operations.

Key Management

Although session keys don't control funds, they are essential for validators because they sign messages related to consensus and parachains. Securing them is crucial: if session keys are compromised or used across multiple nodes, the validator can be slashed and lose staked funds.

Given the current limitations in high-availability setups and the risks associated with double-signing, it’s recommended to run only a single validator instance. Keys should be securely managed, and processes automated to minimize human error.

There are two approaches for generating session keys:

  1. Generate and store in node - using the author.rotateKeys RPC call. For most users, generating keys directly within the client is recommended. You must submit a session certificate from your staking proxy to register the new keys. See the How to Validate guide for instructions on setting keys; an example RPC call follows this list

  2. Generate outside node and insert - using the author.setKeys RPC call. This flexibility accommodates advanced security setups and should only be used by experienced validator operators
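A minimal sketch of the author.rotateKeys call against the node's local RPC endpoint; it assumes the default RPC port 9944 (adjust to your node's configuration). The call returns the new public session keys as a hex string, which you then register on chain as described in the How to Validate guide:

# Ask the running node to generate and store a new set of session keys.
# Assumes the node's RPC endpoint is reachable on localhost:9944 (the default).
curl -H "Content-Type: application/json" \
  -d '{"id":1, "jsonrpc":"2.0", "method":"author_rotateKeys", "params":[]}' \
  http://localhost:9944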

Signing Outside the Client

Polkadot plans to support external signing, allowing session keys to reside in secure environments like Hardware Security Modules (HSMs). However, these modules can sign any payload they receive, potentially enabling an attacker to perform slashable actions.

Secure-Validator Mode

Polkadot's Secure-Validator mode offers an extra layer of protection through strict filesystem, networking, and process sandboxing. This secure mode is activated by default if the machine meets the following requirements:

  1. Linux (x86-64 architecture) - usually Intel or AMD
  2. Enabled seccomp - this kernel feature facilitates a more secure approach for process management on Linux. Verify by running:
    cat /boot/config-`uname -r` | grep CONFIG_SECCOMP=
    
    If seccomp is enabled, you should see output similar to the following:
    CONFIG_SECCOMP=y
    

Note

Optionally, Linux 5.13 may also be used, as it provides access to even more strict filesystem protections.

Linux Best Practices

Follow these best practices to keep your validator secure:

  • Use a non-root user for all operations
  • Regularly apply OS security patches
  • Enable and configure a firewall (see the sketch after this list)
  • Use key-based SSH authentication; deactivate password-based login
  • Regularly back up data and harden your SSH configuration. Visit this SSH guide for more details
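As an illustration, a minimal ufw setup that permits SSH and the node's p2p traffic while denying all other inbound connections. It assumes the default Polkadot p2p port 30333 and SSH on port 22; adjust both to your configuration:

sudo ufw default deny incoming
sudo ufw default allow outgoing
sudo ufw allow 22/tcp     # SSH (adjust if you use a non-standard port)
sudo ufw allow 30333/tcp  # Polkadot p2p port (default; adjust if changed)
sudo ufw enable
sudo ufw status verbose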

Validator Best Practices

The following best practices add an extra layer of security and operational reliability:

  • Only run the Polkadot binary, and only listen on the configured p2p port
  • Run on bare-metal machines, as opposed to virtual machines
  • Provisioning of the validator machine should be automated and defined in code which is kept in private version control, reviewed, audited, and tested
  • Generate and provide session keys in a secure way
  • Start Polkadot at boot and restart it if it stops for any reason (a minimal systemd sketch follows this list)
  • Run Polkadot as a non-root user
  • Establish and maintain an on-call rotation for managing alerts
  • Establish and maintain a clear protocol with actions to perform for each level of each alert with an escalation policy
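A minimal sketch of a systemd unit covering several of these points (start at boot, restart on failure, run as a non-root user). The user name polkadot, the binary path /usr/local/bin/polkadot, and the flags are assumptions to adapt to your own setup:

sudo tee /etc/systemd/system/polkadot.service > /dev/null <<'EOF'
[Unit]
Description=Polkadot Validator
Wants=network-online.target
After=network-online.target

[Service]
# Assumed non-root user and binary path; adjust to your environment.
User=polkadot
Group=polkadot
ExecStart=/usr/local/bin/polkadot --validator --name "INSERT_YOUR_VALIDATOR_NAME"
Restart=on-failure
RestartSec=10

[Install]
WantedBy=multi-user.target
EOF

sudo systemctl daemon-reload && sudo systemctl enable --now polkadot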

Additional Resources

For additional guidance, connect with other validators and the Polkadot engineering team in the Polkadot Validator Lounge on Element.
