Clustering with Linux


Viewing 1 post (of 1 total)
  • Author

  • DevynCJohnson
    • Topics - 437
    • @devyncjohnson

    Sometimes, businesses and individuals need extra computing power. Thankfully, computers can combine their resources and act as one system. This single entity is called a cluster, and the act of building one is called clustering. In a cluster, the computers are connected together on a Local Area Network (LAN). Each computer is called a node, and each node acts as a server. Every node must have an operating system running on it. Linux is one of many operating systems that support clustering.

    The nodes are not required to have identical hardware. In other words, some of the nodes can have a 2GHz processor and 3GB of RAM while other nodes have a 3.5GHz processor and 4GB of RAM. The nodes can even be different brands of computers (Dell, Toshiba, ThinkPenguin, etc.).

    The simplest type of cluster is a Beowulf cluster. Such a cluster has nodes connected together on a LAN. All of the nodes work together as a single machine, as opposed to a "Cluster of Workstations" (COW cluster), where the nodes work together but not as a single machine. However, in a Beowulf cluster, each node has its own operating system, so it may appear to users that the machines are not working together.

    To set up a Linux MPICH1 Beowulf cluster, obtain some computers and connect them together on the same network. Decide which computer will be which node, and keep a record of the node number assigned to each computer. The computer with the best hardware should be node0. Next, select a Linux distro and install it on each computer. After the install, follow the steps below on each computer.

    Create a user on each node. Be sure to use the same username so that the user is the same across all of the nodes.

    Add the IP address of each node to /etc/hosts. Remember to assign the same IP to the same node on each computer's /etc/hosts file. For instance, in all of the /etc/hosts files, node0 would have the IP address of
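    As a sketch of what matching entries could look like, the snippet below prints one /etc/hosts line per node. The 192.168.1.x addresses, the node count, and the helper name print_cluster_hosts are assumptions made for illustration, not values from this guide; substitute the addresses actually used on the LAN.

```shell
#!/bin/sh
# Hypothetical helper: print one /etc/hosts line per node.
# Usage: print_cluster_hosts PREFIX FIRST_HOST COUNT
#   PREFIX     first three octets of the LAN address (assumed, e.g. 192.168.1)
#   FIRST_HOST last octet given to node0 (assumed, e.g. 10)
#   COUNT      number of nodes in the cluster
print_cluster_hosts() {
    prefix=$1
    first=$2
    count=$3
    i=0
    while [ "$i" -lt "$count" ]; do
        # IP address, a tab, then the node name (node0, node1, ...)
        printf '%s.%s\tnode%s\n' "$prefix" "$((first + i))" "$i"
        i=$((i + 1))
    done
}

# Example with assumed addresses: review the output first, then append it
# to /etc/hosts on every node with root privileges, for instance:
#   print_cluster_hosts 192.168.1 10 4 | sudo tee -a /etc/hosts
```

    Generating the lines from one command and reusing its output on every node is one way to keep the node-to-address mapping identical across all of the files.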

    Now, log in as the newly created user on every node. Once logged in, open a terminal and type ssh-keygen -t dsa (OpenSSH 7.0 and later disable DSA keys by default, so -t rsa or -t ed25519 is the safer choice on current systems). If the cluster is not connected to the Internet and security is not a concern, the generated SSH key can be left without a passphrase. When done, open ~/.ssh/ and copy the contents of the public key file (the one ending in .pub). Next, in the same directory, create a file called "authorized_keys" and paste the public key inside.
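    The copy-and-paste step can also be done from the terminal. The following is a minimal sketch, assuming a POSIX shell; add_authorized_key is a hypothetical helper name, and the key file name in the usage comment is an assumption that depends on the key type chosen with ssh-keygen.

```shell
#!/bin/sh
# Sketch: append a public key to an authorized_keys file only if it is
# not already there, then restrict the file to its owner (sshd can reject
# key files with loose permissions).
# add_authorized_key is a hypothetical helper name.
add_authorized_key() {
    pubkey_file=$1
    auth_file=$2
    mkdir -p "$(dirname "$auth_file")"
    touch "$auth_file"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    if ! grep -qxF -e "$(cat "$pubkey_file")" "$auth_file"; then
        cat "$pubkey_file" >> "$auth_file"
    fi
    chmod 600 "$auth_file"
}

# Typical use on a node after ssh-keygen has created the key pair
# (assumed file name; it varies with the key type):
#   add_authorized_key ~/.ssh/id_rsa.pub ~/.ssh/authorized_keys
```

    Running the helper twice leaves a single copy of the key in the file, so it is safe to repeat on a node that was already set up.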

    Afterwards, download MPICH1 and uncompress it. Open a terminal in the MPICH1 source code folder (the folder that was inside the compressed file). Once inside, run the following commands, which compile and install MPICH1. GCC may need to be installed before running the commands.

    mkdir ~/mpich1
    ./configure --prefix="$HOME/mpich1"
    make
    make install

    NOTE: Remember to follow these steps on all of the nodes.

    After the installation, open ~/.bashrc and place the below code inside. If the file does not exist, then create it and insert the below code.

    export PATH="$HOME/mpich1/bin:$PATH"
    export LD_LIBRARY_PATH="$HOME/mpich1/lib:$LD_LIBRARY_PATH"

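    Next, with root privileges, add the MPICH1 bin directory to /etc/environment. Note that sudo echo ~/mpich1/bin >> /etc/environment fails as written, because the redirection is performed by the unprivileged shell rather than by sudo; echo ~/mpich1/bin | sudo tee -a /etc/environment performs the same append with root privileges. When finished, log out and then log back in so that the .bashrc and environment settings take effect.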

    MPICH1 now needs to be configured. To do so, find the file called "machines.LINUX". It will be in ~/mpich1/share/ or ~/mpich1/util/machines/. In the file, list the hostnames of the nodes, one per line. Do not add the hostname of the node that owns the file; for instance, on node3, the file should not contain node3's hostname. After each hostname, type a space, a colon, and another space, followed by the number of cores in that node. For illustration, if node1 is a quad-core, then its line would look like "node1 : 4".
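    The rules above can be sketched as a small shell function that builds the file contents for a given node. The helper name make_machines_list, the node names, and the core counts in the usage comment are assumptions for illustration only.

```shell
#!/bin/sh
# Sketch: build the machines.LINUX contents for one node.
# Reads "hostname cores" pairs on standard input and prints
# "hostname : cores" for every node except the local one.
# make_machines_list is a hypothetical helper name.
make_machines_list() {
    self=$1
    while read -r host cores; do
        # Skip blank lines and the node that owns the file.
        [ -z "$host" ] && continue
        [ "$host" = "$self" ] && continue
        printf '%s : %s\n' "$host" "$cores"
    done
}

# Example with assumed node names and core counts, run on node3:
#   printf 'node0 8\nnode1 4\nnode2 2\nnode3 4\n' | make_machines_list node3
```

    Keeping one master list of "hostname cores" pairs and filtering it on each node avoids editing every copy of the file by hand.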

    The cluster is now complete. To take advantage of the clustering, open a terminal and type mpirun -np # PROGRAM, where "#" is the number of processes to create and "PROGRAM" is the program or script to run on the cluster.
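    One way to pick a value for "#" is to add up the core counts listed in machines.LINUX. The snippet below is a sketch under that assumption; count_listed_cores is a hypothetical helper name, and the file path and program name in the usage comment are placeholders.

```shell
#!/bin/sh
# Sketch: sum the core counts in a machines.LINUX-style file
# ("hostname : cores" on each line) to suggest a value for mpirun -np.
# count_listed_cores is a hypothetical helper name.
count_listed_cores() {
    # Split each line on the colon; the second field is the core count.
    awk -F':' 'NF >= 2 { total += $2 } END { print total + 0 }' "$1"
}

# Example (assumed path and program name):
#   np=$(count_listed_cores ~/mpich1/share/machines.LINUX)
#   mpirun -np "$np" ./my_mpi_program
```

    Because the file leaves out the node that owns it, the local node's cores are not part of this total and would have to be added separately if they should be counted.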
