Platform Cluster Manager version 2.0.1 Dell™
Edition (PCM 2.0.1) User Guide


Version 2.0.1 Dell™ Edition

July 2010

Platform Computing

Contents



About Platform Cluster Manager

Building a Linux® cluster is a challenging and time-consuming task. There are many tools in the community and on the Internet for building, configuring, and managing Linux clusters. However, these tools typically assume a familiarity with Linux cluster concepts.  

Platform Cluster Manager (previously known as OCS, Open Cluster Stack) is a hardware vendor-certified software stack that enables the consistent deployment and management of scale-out Linux clusters. Backed by global 24x7 enterprise support, Platform Cluster Manager is a modular and hybrid stack that transparently integrates open source and commercial software into a single, consistent cluster operating environment.

Platform Cluster Manager version 2.0.1 Dell™ Edition ("PCM 2.0.1") is fully supported by Platform Computing Corporation and requires a Red Hat®, CentOS®, or SuSE Linux Enterprise Server® based operating system.


Important: PCM 2.0.1 provides full cross-distribution cluster building support (with the HPC and OFED kits) for the RHEL 5.4 and SLES 10 SP3 OS platforms.

Installation

See the "PCM Installation Guide" document for details on how to install PCM 2.0.1.

Upgradeability

See the "Upgrading PCM 1.2 to PCM 2.0.1" document for details on how to upgrade from PCM 1.2 Standard Edition to PCM 2.0.1.

Registration and Licensing

PCM 2.0.1 is based on Project Kusu, which is available under the GPL v2 license. PCM 2.0.1 does not require a software license to function.

Users who have purchased a support entitlement for Platform Cluster Manager are entitled to download PCM 2.0.1 and find associated documentation at http://my.platform.com.

This includes software, patch, and documentation downloads, as well as access to the support knowledgebase and eSupport (electronic support ticket submission and status).

The following Platform Computing software products packaged as PCM 2.0.1 kits require a software license:

  • Platform LSF 7 UP 6 (Kit)
  • Platform RTM v2.0.1 (Kit) - license is required to activate the LSF-pollers for monitoring of Platform LSF
  • Platform HPC Portal (integrated in the Platform Console v2.0 Kit) - license is required to activate Platform HPC Portal capabilities
  • Platform MPI v7.1 (Kit) - a software license is required

Where to get Platform Cluster Manager

You can download PCM 2.0.1 and find associated documentation at

http://my.platform.com/products/platform-cm.

If you intend to install other third-party kits, obtain the CD or DVD containing those kits.

To purchase support for PCM 2.0.1, contact Platform Computing.


Basic administration

The following topics describe basic tasks and provide useful information for administering your PCM cluster:

Access online documentation

Online documentation is provided by the installer node. To access the documentation, open a browser on the installer node. The browser defaults to the Cluster page, which contains links to the available guides.

Kit-specific documentation is available from the Installed Kits link. Follow the corresponding documentation links beside the kit.

View Nagios information

Nagios is an NMS (Network Management System) that monitors the hosts and services you specify. It alerts you when problems occur, based on customizable thresholds that you specify.

Nagios runs on the installer node and provides a web GUI to display the information collected from the nodes. Each compute node runs a daemon called NRPE that the primary installer node calls to collect data. This data is displayed in the Nagios web GUI as detailed statistics and graphs. When compute nodes are added to a PCM cluster, Nagios monitoring is installed and configured on the nodes, and the PCM 2.0.1 installer node is reconfigured to detect the new nodes.

To view Nagios information, go to the main cluster web page and click the "Nagios(R) Network Management System Monitoring" link, or point your browser to https://localhost/nagios (on the installer node). Nagios can also be accessed from the Platform Management Console (PMC). Refer to the Nagios kit documentation for details on viewing Nagios pages on the PMC.

Disable SSH forwarding

By default, the OpenSSH daemon is configured to enable X11 forwarding. This can sometimes slow down node connections. You can skip X11 forwarding for a single connection by using the -x option when connecting to a node.

X11 forwarding can also be disabled permanently by editing the /etc/ssh/ssh_config file and changing the line ForwardX11 yes to ForwardX11 no.
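
For example, to skip X11 forwarding for a single connection, or to make the permanent change with a one-line edit (a sketch; the node name is illustrative, and the sed command assumes the ForwardX11 directive appears uncommented in the file):

# ssh -x compute-00-00
# sed -i 's/^[[:space:]]*ForwardX11[[:space:]]*yes/ForwardX11 no/' /etc/ssh/ssh_config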

An SSH connection from one node to another may be slow in setting up. This is usually because of a name resolution failure, and subsequent timeout. This can occur if the installer node was installed with an invalid DNS server.


Note: This will also slow MPI jobs.

Use PDSH to run parallel commands across nodes in cluster

PCM 2.0.1 comes with the Parallel Distributed Shell (pdsh) installed and configured for operation in the cluster. As new compute nodes are added to or removed from the cluster, PCM automatically generates the required configuration files for pdsh to ensure that parallel commands can be issued across all nodes in the cluster. By default, pdsh is configured to use ssh as the underlying shell mechanism for connecting to nodes. The PCM genconfig tool dynamically creates the list of nodes in the cluster and saves the list in /etc/hosts.pdsh. This file can be used to run parallel commands across all nodes in the cluster. For example:

Running a command on a node in the cluster

# /usr/bin/pdsh -w compute-00-00 uname -a

Running a command on all nodes in the cluster (in this example there is only one node in the cluster)

# /usr/bin/pdsh -a uname -a

Optionally, you can specify a static host list instead of using the dynamic host list. This means pdsh commands affect selected nodes instead of all the nodes in the cluster. To specify the nodes, make a copy of hosts.pdsh, modify it, and save it under any name. To make the change take effect and use your modified file in place of hosts.pdsh, export the path to the modified file. Run:

# export WCOLL=path_to_modified_hosts_file
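
For example, to limit pdsh to a hand-picked subset of nodes (the file name is only an illustration):

# cp /etc/hosts.pdsh /root/rack1_hosts
# vi /root/rack1_hosts      (keep only the nodes you want)
# export WCOLL=/root/rack1_hosts
# pdsh uptime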

Note: Running pdsh sometimes produces an ssh exit message (for example, "ssh exited with exit code xx"). This simply tells you that the command did not finish normally. When a command finishes on UNIX systems, it returns an exit status; normally this status is 0, which means that the command finished and exited normally. When a command does not finish normally, it exits with a non-zero value, which pdsh shows you.

Add or remove kits

PCM 2.0.1 kits provide a mechanism for packaging applications and installation scripts together for easy installation onto a PCM 2.0.1 cluster.

A kit is a defined collection of components and packages/package groups that is manifested as an ISO image or a CD/DVD media. A PCM 2.0.1 kit contains the following:

  • A kit info file defining kit information (such as the kit version, release number, and architecture, for example noarch) and its components.
  • A meta-rpm containing the documentation for the kit and default node group associations.
  • Component rpms containing a list of rpm dependencies and installation/removal scripts for each component.
  • rpms containing the actual software.

Kits can contain multiple components and components have a list of dependencies on other components or rpms. The kit mechanism provides PCM 2.0.1 with the ability to install and automatically configure applications or tools onto the entire cluster.

  • Kits collect components into a simple package, and components add an extra layer of dependency abstraction to the standard rpm mechanisms.
  • Kits use the native package management system to resolve dependencies. For Red Hat based operating systems, PCM 2.0.1 relies on yum to resolve all dependencies in a kit. For SLES operating systems, PCM 2.0.1 relies on autoyast to resolve all dependencies in a kit.
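
For example, to list the kits currently registered on the installer node, you can use the list option shown in the kusu-kitops usage output below:

# kusu-kitops -l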

Add kits using CLI or One Step Install

Install a kit by using kusu-kitops or kusu-kit-install on the command-line interface (CLI), or by using the One Step Install feature of the Platform Management Console (PMC). kusu-kit-install is a kit installation and deployment tool that assumes reasonable defaults and seamlessly integrates the kusu-kitops, kusu-repoman, kusu-cfmsync, and kusu-ngedit functions for easy deployment. The steps for using kusu-kit-install on the CLI and for using One Step Install on the PMC are detailed in the "Install a kit" section of each kit's documentation.


Important: In this release, the cluster management tools have been renamed (for example, kitops is now kusu-kitops). The old command names still exist but will be deprecated in future releases.



Add kits using the "kusu-kitops" command

The PCM 2.0.1 kusu-kitops command adds kits to a PCM 2.0.1 system. Everything in PCM 2.0.1 is a kit, including the Red Hat Enterprise Linux (RHEL) and SuSE Linux Enterprise Server (SLES) operating systems. The base kit and operating system kit are added during installation; these kits were used to create the repository and build the installer node. To add more kits for operating systems or applications, use the kusu-kitops command.

[root@master ~]# kusu-kitops -h
 
usage: kusu-kitops - kit operations tool

options:
  -h, --help            show this help message and exit
  -a, --add             add the kit specified with -m
  -l, --list            list all the kits available on the system
  -e, --remove          remove the specified kit
  -m MEDIA              specify the media iso or mountpoint for the kit
  -k KIT, --kit=KIT     kit
  -i KID, --kid=KID     kid
  -o KITVERSION, --kitversion=KITVERSION
                        kitversion
  -c KITARCH, --kitarch=KITARCH
                        kitarch
  --dbdriver=DBDRIVER   Database driver (sqlite, mysql, postgres)
  --dbdatabase=DBDATABASE
                        Database
  --dbuser=DBUSER       Database username
  --dbpassword=DBPASSWORD
                        Database password
  -y, --yes             Assume the answer to any question is Yes
  -v, --version         Display version

Add an operating system kit

Adding a new operating system to PCM 2.0.1 is very simple. All you need is an ISO or disk containing the operating system you want to add. When PCM 2.0.1 detects a new operating system media, it automatically creates an OS kit from the media and installs it on the cluster. For example:

Add an OS kit to a PCM 2.0.1 cluster

Insert the OS media into the CD/DVD drive, then run the following command:

#  kusu-kitops -a 
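
If you have an ISO image instead of physical media, you can point kusu-kitops at it directly (the file name below is only an illustration):

#  kusu-kitops -a -m /root/isos/rhel-5.4-x86_64-dvd.iso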

Add a tool or application kit

The kusu-kitops command can also add pre-packaged tools and applications to the cluster. PCM 2.0.1 comes with applications which have been packaged as kits. See the latest Release Notes document for details.

Adding an application kit to the cluster is very similar to adding an operating system kit. For example:

#  kusu-kitops -a -m kit-ofed-1.4.1-14x86_64.iso
Added Kit ofed-1.4.1-noarch

Add a single kit or many kits in a meta-kit

kusu-kitops -a
[0]: base-2.0-noarch
[1]: hpc-2.0-x86_64
[2]: ofed-1.4.1-noarch
[3]: cuda-2.3-noarch
Provide a comma separated list of kits to install,
'all' to install all kits or ENTER to quit::

Add kit to a repository

To add a kit to a repository on the installer node, use the kusu-repoman command.

1. Determine which repository on the installer node you want to add the kit to. To do this, list the repositories currently in the system:

#  kusu-repoman -l 

2. Add the kit to the desired repository:

#  kusu-repoman -r <repository name> -a -i <kit_id>

3. Refresh the repository:

#  kusu-repoman -r <repository name> -u
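
For example, a complete sequence might look like the following (the repository name and kit ID are illustrative; use the values reported by kusu-repoman -l and kusu-kitops -l):

#  kusu-repoman -l
#  kusu-repoman -r "rhel-5.4-x86_64" -a -i 4
#  kusu-repoman -r "rhel-5.4-x86_64" -u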

Remove a kit

Kits can be removed from PCM 2.0.1; however, if a kit is in use by a repository or node group, PCM 2.0.1 will not remove it from the cluster.

To remove a kit from PCM 2.0.1, run the following command:

#  kusu-kitops -e -i <kit_id>
+--------+---------+--------------+
| Kit    | Version | Architecture |
+--------+---------+--------------+
| ofed   | 1.4.1   | noarch       |
+--------+---------+--------------+
The above kits will be removed.

Confirm [y/N]:y 

Understand kits, repositories and node groups

After adding kits to a PCM 2.0.1 installer node, you need to run the kusu-repoman command to make the kit available in a repository. If a new repository is created and new node groups are associated with the repository, you must add all the kits to the new repository and then associate the components in the kits with the appropriate node groups. Refer to "Create a new repository" and "Add kit components to node groups" for more information.

Disable a kit and components

To disable a kit or to prevent components in a kit from being installed on a compute node, follow these steps:

  1. Run kusu-ngedit as root.

  2. Select the node group you wish to edit and select Next.

  3. Select Next until you arrive at the Components screen.

  4. Expand the component you wish to remove using the spacebar.

  5. Clear the component selection field with the space bar.

  6. Proceed through the rest of the kusu-ngedit screens by selecting Next.

    Your changes are confirmed and applied. The kit is now disabled on the nodes in the node group.

Add or remove users

PCM 2.0.1 is configured by default to share all of the user names and passwords defined on the installer node across all nodes and node groups in the cluster. Use the standard Linux command line or GUI tools to add a user to PCM 2.0.1. Once the user is added, the PCM 2.0.1 cfm tool (Configuration File Manager) must be called to synchronize the user names and passwords across the cluster.

Add a user from the command line


Important: To add a user on RHEL, use the adduser or useradd command. On SLES, you can only use the useradd command.
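
For example, a minimal sketch of adding a user and propagating the change across the cluster (the user name is illustrative):

# useradd jdoe
# passwd jdoe
# kusu-cfmsync -f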


Remove a user from the command line

To remove a user, run the following command:

# userdel <user_name>

# kusu-cfmsync -f

Understand firewalls/iptables

The installer node is configured with some basic forwarding rules. From a network security standpoint, the install and compute nodes are not secure. Evaluate the security risks at your site and create appropriate firewall rules to secure the cluster.


Warning: The installer node should never be connected to the Internet without first restricting the type of packets allowed by customizing the iptables rules.

The installer node is configured with network address translation (NAT), allowing compute nodes access to the public network. By default, nodes on the public network do not have a route to the provision network.

HTTP and HTTPS are enabled by default. To change this and keep these services visible only to the private network, edit /etc/sysconfig/iptables.

On SLES, configure the firewall by using yast2 interactively or by running:

# yast2 firewall services add service=https zone=EXT
  
# yast2 firewall services add service=http zone=EXT

Disable HTTP and HTTPS over the public network

  1. In the /etc/sysconfig/iptables file, find and comment out the following lines:

    -A INPUT -i eth1 -m state --state NEW -p tcp --dport 80 -j ACCEPT

    -A INPUT -i eth1 -m state --state NEW -p tcp --dport 443 -j ACCEPT

  2. Restart iptables:

    # service iptables restart

On SLES, run the following commands:

# yast2 firewall services remove service=http zone=EXT
  
# yast2 firewall services remove service=https zone=EXT

You may need to restart the firewall using the SuSEfirewall2 command.

For details on customizing your firewall, see http://www.netfilter.org.

Understand PCM 2.0.1 services and utilities

cfm

PCM 2.0.1 provides a service called cfm. It is very similar to NIS and is used to synchronize files across a cluster. This is done by multicasting a notification of change from the installer node; the nodes then download the changed files over an encrypted channel. Users and groups are one example of information passed over cfm. cfm can also synchronize yum repositories in the cluster. Each node in the cluster has a yum repository, and when notified via kusu-cfmsync the nodes automatically update from the installer node repository using the httpd server. Whenever you run useradd or userdel, kusu-cfmsync must be run to update the user information on all nodes in the cluster.

By default the following files are propagated throughout all node groups in the cluster by cfm:

DHCP and TFTP

PCM 2.0.1 uses the DHCP and TFTP services to handle installation and reinstallation of nodes. The services are automatically configured when you run the kusu-addhost tool. Use kusu-addhost to configure the DHCP settings for each node.

pdsh

This shell provides the ability to execute commands on a cluster wide basis. For example:

#  WCOLL=/etc/hosts.pdsh; export WCOLL
# pdsh uname
compute-00-00-eth0: Linux
compute-00-01-eth0: Linux
compute-00-02-eth0: Linux

Find log files

PCM 2.0.1 generates the following logs:



Advanced administration

The following topics describe advanced tasks for administering your PCM 2.0.1 cluster:

Manage node groups

PCM 2.0.1 is built around the concept of node groups. Node groups are powerful template mechanisms that allow the cluster administrator to define common shared characteristics among a group of nodes. PCM 2.0.1 ships with a default set of node groups for installer nodes and package-installed compute nodes. The default node groups can be modified, or new node groups can be created from the default node groups. All of the nodes in a node group share the following:

A typical HPC cluster is created from a single installer node and many compute nodes. Compute nodes are normally identical to each other, with just a few exceptions such as the node name or other host-specific configuration files. A node group for compute nodes makes it easy to configure and manage 1 or 100 nodes, all from the same node group. The kusu-ngedit command is a graphical TUI (Text User Interface) run by the root user to create, delete, and modify node groups. The kusu-ngedit tool modifies cluster information in the Postgres database and also automatically calls other tools and plugins to perform actions or update configuration files. For example, modifying the set of packages associated with a node group in kusu-ngedit automatically calls cfm (the Configuration File Manager) to synchronize all of the nodes in the cluster, using yum to add and remove packages. Modifying the partitioning on the node group, by contrast, notifies the administrator that a re-install must be performed on all nodes in the node group in order to change their partitioning. The Postgres database keeps track of the node group state, so several changes can be made to a node group at once, and the physical nodes in the group can be updated immediately or at a future time and date using the kusu-cfmsync command.

Add custom script

  1. Log in to the installer node as root.
  2. Prepare a shell script.
    Example:
      [root@master ~]# cat /root/custom_test.sh
      #!/bin/sh
      echo "/root/custom_test.sh" >> /tmp/aaa.txt
  3. Open a Terminal and run the Node Group Editor, then select the node group you want to change.
    #  kusu-ngedit

    Figure: Node Group Edit screen

  4. Navigate to the Custom Scripts panel and enter the script path in the New Script text input field.

    Figure: Custom Scripts path

  5. Press the Add button.

    The newly added script is listed in the panel.

  6. Press the Next button.

    A notification appears on the Summary of Changes panel.

    Figure: Summary of Changes

  7. Follow the wizard to complete modifying the node group.
  8. Run 'boothost -r' to reinstall the nodes in the node group.

  The custom script is added to /etc/rc.kusu.d on each node. It is called by kusurc each time the node reboots.

Reinitialize a partition table

A known issue in Red Hat is that Anaconda raises an error when a system with one or more uninitialized disks is provisioned using a kickstart file. The kickstart installation halts and shows an interactive message when provisioning a RHEL compute node onto a new disk or a zeroed-out partition. If you have a corrupted partition table, a workaround is to reinitialize the partition table using the RHEL install disk in rescue mode (boot: linux rescue), so that the compute node installation with PCM 2.0.1 can proceed. For example:

    1. To reinitialize a partition table, run:

      # fdisk /dev/sda

    2. Type o to create a new empty DOS partition table.

    3. Type w to write the table and exit.

    The partition table is reinitialized. Reboot and proceed with the compute node installation.
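
    An illustrative interactive session (the device name assumes the first disk; adjust it for your hardware):

      # fdisk /dev/sda
      Command (m for help): o
      Command (m for help): w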

    Create a new node group

    1. Open a Terminal and run the Node Group Editor as root.
      #  kusu-ngedit
    2. From the list, select an existing node group that is similar to what you want to create (the new node group is only initially based on the selected node group; you will later edit it, as required).


    3. Scroll to the bottom of the Node Group Editor page and then select Copy.


    4. Edit the newly created node group as desired. For example, you may want to rename the node group or change the format that the machine uses when naming new nodes based on this group.

      Tips:
      N = node number (automatic)
      NN = double node-numbering (for example, 01, 02, 03, etc.)
      R = rack number
      RR = double rack-numbering

    5. Make any other required changes, or leave the default settings as-is.

    Add RPM packages in RHEL to node groups

    1. Open a Terminal and run the Node Group Editor as root:
      #  kusu-ngedit
    2. Select a node group and move through the Text User Interface screens by pressing F8 or by selecting Next on the screen. Stop at the Optional Packages screen.

      Figure: Optional Packages screen

    3. Add additional rpm packages by choosing the package in the tree list.

    4. Press the space bar to expand or contract the list to display all of the available packages. By default, packages are sorted alphabetically. To re-sort the list of packages by groups, select Toggle View. Choose additional packages using the spacebar. When a package is selected, an asterisk displays beside the package name.

      Package dependencies are automatically handled by yum. If a selected package requires other packages, they are automatically included when the package is installed on the cluster nodes. kusu-ngedit automatically calls cfm to synchronize the nodes and install the new package. Note that this synchronization does not automatically remove packages from nodes in the cluster (this is by design). If required, pdsh and rpm can be used to completely remove packages from the rpm database on each node in the cluster.
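
      If you do need to remove a package from every node in the cluster, a sketch using pdsh and rpm (the package name is illustrative):

      # pdsh -a rpm -e foo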

    Add RPM packages not in OS to node groups

    Linux OS vendors maintain a repository containing all of the RPM packages that ship with the OS distribution (RHEL or SLES). For most customers, this repository is sufficient.

    RPM packages that are not part of the OS distribution can also be added to a PCM repository by placing them in the appropriate /depot/contrib/<repository_ID> directory.

      1. Get the name and ID of the repository you want to add the RPM packages to:

      # kusu-repoman -l

      2. Copy the RPM packages into the /depot/contrib/<repository_ID> directory.

      Example:

      # cp foo.rpm /depot/contrib/1000

      3. Rebuild the repository:

      # kusu-repoman -ur rhel-5.5-x86_64 

      4. Run kusu-ngedit and navigate to the Optional Packages screen.

      5. Select the new package by navigating within the package tree and using the spacebar to select.

      6. Continue through the kusu-ngedit screens and either allow kusu-ngedit to synchronize the nodes immediately or perform the node synchronization manually with kusu-cfmsync at a later time.

    Figure: Example--Selecting an rpm that is not included in Red Hat Enterprise Linux

    Add SLES repository to the installer node

    Adding another operating system such as SLES requires a few steps. In order to add SLES to the installer node, you need a copy of the SLES media or a SLES ISO.

    1. Add SLES using the kusu-kitops command (with the media mounted on /media/CDROM):

      # kusu-kitops -a -m /media/CDROM/ --kit=sles

      Adding a kit makes the software available for use in a repository.

    2. Create a SLES repository:

      # kusu-repoman -n -r sles10.3

    3. Add the required operating system kit to the repository:

      # kusu-repoman -a -r sles10.3 --kit=sles

    4. Add the base kit to the repository. The base kit contains all of the tools required for managing the cluster:

      # kusu-repoman -a -r sles10.3 --kit=base

      The operating system and base kits are always required in a repository. At this point the repository can be used to install nodes, or you can add more kits to the repository.

    5. Rebuild the repository with the new operating system and base kit:

      # kusu-repoman -u -r sles10.3

    Congratulations, you have added a new repository to your cluster. View the available repositories with the following command:

    # kusu-repoman -l

    Associate a repository with node groups

    A single installer node can contain more than one operating system repository. Adding a new operating system such as SLES involves several steps:

    1. Add SLES operating system CDs/DVD/iso as a kit with this command: kusu-kitops.

    2. Create a new repository for SLES with this command: kusu-repoman -n.

    3. Add the SLES kit to the new repository with this command: kusu-repoman -a .

    4. Add the base kit to the repository with this command: kusu-repoman -a .

    5. Update the repository with this command: kusu-repoman -u.

      This assembles all of the kits into a complete repository.

    6. Once steps 1-5 are completed, the new repository can be added to node groups with the kusu-ngedit tool. Run kusu-ngedit from a terminal and create a copy of an existing node group. In our example we will copy the rhel-compute node group.

      Figure: Node Group Editor screen

    7. Edit the newly created node group. Then, on the Repository screen, change the repository to SLES (or your snapshot repository).

      Figure: Repository screen

    By changing to your new repository, you have effectively added this new node group to your new repository. Continue moving through the rest of the kusu-ngedit screens, selecting or modifying settings as needed. Upon exit, kusu-ngedit automatically updates the database.

    Add kit components to node groups

    Adding kit components to nodes in a node group is very similar to adding additional rpm packages.

    1. Open a terminal and start the kusu-ngedit tool.

    2. Select the compute-rhel node group, press F8 or select Next and then proceed to the Components screen.
    3. Choose the components you want installed on the nodes. Each kit installs an application or a set of applications. The kit also contains components, which are meta-rpm packages designed for installing and configuring applications onto a cluster. By choosing the appropriate components it is easy to configure all nodes in a node group.

      Add hosts to a node group

      Once an install node is configured and all of the necessary kits are installed on it, you can then add nodes to the cluster. The install node runs dhcpd and is configured to respond to PXE requests on one or more provision networks. The quickest way to add hosts is to physically connect them to the same provision network, run the kusu-addhost tool, and then PXE-boot the nodes. The following steps detail this procedure:

      1. Physically connect the new nodes to the same provision network as the install node.

      2. Log on to the install node as root and run the following command:
        # kusu-addhost 
      3. Choose a PCM 2.0.1 node group.

        A node group is a template that defines how a group of nodes will be configured and what software will be installed on the nodes. When an installer node is created, default node groups are built from the operating system supplied during the installer node installation. The default node groups include the following:
        • Installer node
        • Compute node
        • Compute node, imaged
        • Compute node, diskless
        • Unmanaged

        If this is your first PCM 2.0.1 installation, choose the Compute node node group. Choosing this node group performs a standard package-based installation onto a new host. Although this is the most reliable method of installation, it is also the slowest. Once a node group is chosen, select Next to proceed.

      4. Choose which network to listen on for new hosts. In most cases there is only one network, but for complex clusters there may be more than one network interface. Choose a network and proceed to the next screen.

      5. Once kusu-addhost is listening on a network, boot the new node. Ensure the new node properly PXE boots.

      6. Go to the new node and start the boot process.

        If your node is not configured to PXE boot, change the boot order in the BIOS or press F12 while the node is booting to enter PXE boot. If everything is connected properly (that is, if the new node is physically on the same network as the install node and kusu-addhost is listening on that network), you should see the new node download an initrd (initial ram disk) and start a full operating system install. kusu-addhost reports that it has detected the new node and that the install is proceeding.

      The install over a 100 Mbps network should take no more than 5 minutes. Once the node installation is complete, the machine reboots and joins the cluster.

      Manage repositories

      PCM 2.0.1 can support multiple repositories on a single installer node. In a simple PCM 2.0.1 cluster there is usually only one repository; however, as the cluster environment becomes more complex, there is a need for different operating system repositories or even copies of existing repositories. The PCM 2.0.1 kusu-repoman tool creates, deletes, and snapshots PCM 2.0.1 repositories and manages the kits in them. Repositories are then attached to node groups.

      Repository commands and examples are given below.


      Note: If there is a space in the repository name, enclose the name in quotation marks. For example:
      kusu-repoman -n -r "repository name"

      Create a repository

      # kusu-repoman -n -r testrepo
       
      Repo: testrepo created. You can now add kits, including OS kits, to the new repository.

      Delete a repository

      # kusu-repoman -e -r testrepo

      Add kits to a repository

      # kusu-repoman -a -r testrepo -k base
      Kit: base, version 5.1, architecture noarch, has been added to the repo: testrepo. Remember to refresh with -u

      Remove kits from a repository

      # kusu-repoman -e -r testrepo -k base
      
      Kit: base-5.1-noarch removed from repo: testrepo. Remember to refresh with -u

      Update a repository

      # kusu-repoman -u -r testrepo
      
      Refreshing repo: testrepo. This may take a while...

      Create a repository snapshot

      # kusu-repoman -s -r rhel5_x86_64 

      Add nodes to the cluster as diskless and imaged

      Imaged node provisioning uses a repository on the install node and creates a disk image (essentially a pre-installed version of the OS) that is saved in the PCM 2.0.1 /depot/images directory.  When a node is added to the compute-imaged node group, a special initial ram disk and kernel are sent to the node. The disk image is then sent across the network and written to the disk. Once the disk image is installed, the node reboots and starts from the disk instead of the network. An imaged installation is much faster than a standard package-based installation, and is theoretically more customizable than the package-based installation. Images are useful in situations where an application requires a specific version of an operating system with specific packages installed.  

      Diskless installations are very similar to imaged installations, with one big difference: the image created for a diskless install must fit entirely in the RAM on the compute node. Diskless installs are much smaller than package-based or imaged installs. PCM 2.0.1 comes configured with a compute-diskless node group. The compute-diskless node group is configured to create a small image that can fit in the RAM of a node. The image can be further customized to reduce the size of the image if needed. Diskless installations are very quick, usually taking less than 30 seconds to install a single node with an operating system. Diskless installations do not require nodes with physical disks attached to them.

      Adding nodes to the cluster as diskless or imaged is very simple. The steps are exactly the same as installing a package-based node; the only difference is that the compute-diskless or compute-imaged node group is chosen for the nodes. To add nodes to the cluster as either diskless or imaged, complete the following:

      1. Log on as root, and then run the kusu-addhost tool:

        # kusu-addhost

      2. Choose the compute-imaged or compute-diskless node group, and then select Next.

      3. Choose the network interface to listen on for new nodes, and then select Next.

        kusu-addhost waits for the new nodes to boot.

      4. PXE boot the new nodes, either manually or by logging into the BMC (Baseboard Management Controller) on the compute node.
        • If the node is connected to the proper network, it PXE boots and installs either a diskless or a disk image. When the installation is complete, the node reboots.
        • If kusu-addhost properly detected the new nodes, their MAC addresses appear in the kusu-addhost TUI. Exit the TUI when finished.


      Note: PCM 2.0.1 does not support diskless and image-based node groups on the SLES 10 SP3 platform.

      Use kusu-ngedit to provision diskless and imaged nodes


      Currently, the default initrd.img for diskless and imaged node groups contains only a limited set of drivers. Some errors are encountered when imaged and diskless nodes are provisioned, due to the lack of drivers.

      Use kusu-ngedit to resolve dependencies among kernel modules.

      For example, provisioning diskless nodes on Intel hardware fails because the Ethernet card driver module "igb" is not included in the default initrd.img for imaged and diskless node groups.

      As a workaround, do the following steps:

      1. Get the list of drivers for the hardware you want to provision as imaged or diskless nodes. You can get the list from the corresponding hardware vendor.

        Example: module "igb" is needed for Intel diskless hardware.

      2. Check whether there are any dependencies among the kernel modules.

        Example: module "igb" depends on modules "dca" and "8021q".

        In RHEL 5.4, neither "dca" nor "8021q" depends on any other modules, so the final kernel module list is: "igb", "dca" and "8021q".

         # modinfo igb
         filename: /lib/modules/2.6.18-164.el5/kernel/drivers/net/igb/igb.ko
         version:  2.1.9
         ...
         depends:  dca,8021q
         ...

      3. In kusu-ngedit, check that all kernel modules in the list are associated with the imaged and diskless node groups on the Modules screen. If some kernel modules are not associated, select them and apply the changes.


      Use the exported '/shared' directory to install software that can be used from all compute nodes

      PCM 2.0.1 automatically creates a /depot/shared NFS share on the installer node, which is mounted by all compute nodes. This NFS shared directory is accessible as /shared from the installer and compute nodes, making it easy to deploy ISV (Independent Software Vendor) software or share programs with the compute nodes without installing them on each node. This is a quick and easy solution for mid-sized clusters; on very large clusters, however, the NFS server can become overloaded.
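
      For example, to make an application available to every compute node without installing it on each one (the directory names are illustrative):

      # mkdir -p /shared/apps
      # cp -r /tmp/my-isv-tool /shared/apps/

      The tool is then visible on all compute nodes under /shared/apps/my-isv-tool.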

      Connect to NFS servers

      Home directories in PCM 2.0.1

      PCM 2.0.1 automatically configures /home on the installer node as an NFS export. The compute nodes automatically NFS mount /home. This default configuration makes it easy to add new users and their home directories to the cluster: add them on the installer node and then synchronize the files in the cluster using the command kusu-cfmsync -f.

      NFS mounted directories in PCM 2.0.1

      Adding other NFS file servers to a PCM 2.0.1 cluster involves adding the mount point for the NFS server to the /etc/fstab file on all nodes, or to the automounter on all nodes (or just the nodes in a node group). cfm manages a directory tree of files for all nodes in a node group. Modifying the /etc/cfm/<node group>/etc/fstab.append file for a particular node group adds the new filesystem to all machines in that node group. Adding filesystem mounts to fstab.append mounts the new filesystems on boot; if the automounter is used, the filesystem is mounted automatically by the automount daemon.
      Example: Adding a filesystem to fstab

      In this example, you specify where the extra filesystems are mounted from. A node group must already exist.

      1. From a command prompt, navigate to the location of the fstab.append file:

        /etc/cfm/<node group>/etc/fstab.append

      2. Edit fstab.append.

        In the file, indicate the location of the directory shared by the NFS server and the location of the directory where you want it mounted on the node (see the example entry after these steps). Run man fstab for more information on how to edit this file. An abbreviated template is provided below:

        <nfs_server_name>:/<directory_shared_by_nfs_server>  /<directory_to_mount_on_node>   <other_fstab_default_info>

        Note: The directory that you want to mount must already exist on the compute node.

      3. Run kusu-cfmsync -f to make your changes take effect.
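
      For example, a hypothetical fstab.append entry that mounts an NFS export on every node in the node group (the server name and paths are illustrative):

      filer01:/export/data   /data   nfs   defaults   0 0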

      Append external nodes

      Running genconfig hosts in its initial state only generates information on existing nodes (including the PCM installer node, compute nodes, and unmanaged nodes). Initially, the optional /etc/hosts.append file does not exist on your local machine, so you need to create this file if you want to append external node information to the genconfig hosts output.

      Sample output on initial state of genconfig hosts:

      
        .. ...
        192.168.0.100    master-node
        192.168.0.101    compute-00-00
        192.168.0.102    compute-01-00
      
        # Unmanaged nodes
        192.168.1.123    unmanaged-00-00
        

      Note: Moving managed nodes into the unmanaged node group: PCM will not change anything on these nodes, and these nodes will still be able to receive CFM broadcasts in the cluster. Unmanaged nodes are designed for network devices (such as routers and switches) only. It is not recommended to move a managed host into the unmanaged node group.

      To append external nodes information to genconfig hosts output:

      1. Create and edit /etc/hosts.append file on the PCM installer node.

      
          111.111.111.111  external-unmanaged-00-00
          222.222.222.222  external-unmanaged-00-01
      Refer to the validation notes specified below when providing line entries in the /etc/hosts.append file.

      2. Run kusu-genconfig hosts > /etc/hosts.

      3. Run kusu-cfmsync -f.

      The generated output shows both the details for existing nodes and external nodes, including validation comments on line entries that were ignored during the verification process of importing external nodes information.

      Note that if the same hostname is assigned to different IP addresses, the line of information using the same hostname but referencing a different IP address will be ignored.

      Also note that the whole line of external nodes information will be ignored if any of the following entries are found in certain lines of the /etc/hosts.append file during validation:

      • IP is not a valid IPv4 address. 
      • IP is reserved by managed or unmanaged nodes (including PCM master node and other compute nodes) in PCM cluster. 
      • Host name is not a fully qualified host name (containing invalid characters, of invalid format).  
      • Host name is reserved by managed or unmanaged nodes (including PCM master node and other compute nodes) in PCM cluster.  
      • DNS zone is not a fully qualified domain name (containing invalid characters, of invalid format).  
      • DNS zone is same as the provisioning DNS zone of the current PCM cluster.  

    Manage network interfaces

    Unlike other HPC cluster toolkits, PCM 2.0.1 allows for a lot of flexibility in how networks are defined and used in a cluster. In a PCM 2.0.1 cluster, all possible network configurations are defined using the kusu-netedit tool. The networks are then associated with the appropriate node groups.

    Figure: Sample network configuration

    The key to configuring networks with kusu-netedit is that each entry in the kusu-netedit table is a template for a network adaptor attached to a particular network. For example, if the network adaptor is eth0 attached to a 10.10.0.0 network, then the entry in the kusu-netedit table represents all nodes that have an eth0 adaptor attached to a 10.10.0.0 network. There are some exceptions to this rule:

    • On the installer nodes, the networks may be provision networks. In these cases, the network definition in the networks table must be unique, since installer node provision networks have a DHCP server bound to them. In the table below there is an entry for eth0 on the installer node and another for eth0 on the compute nodes connected to the same network (10.10.0.0): even though the network adaptors are the same (eth0) and the network is the same (10.10.0.0), the network definitions in kusu-netedit are different, simply because the network on the installer node is a provision network and the network on the compute nodes (1 and 2) is public.
    • Another exception is when the network adaptors and networks are the same, but the starting IP address for numbering the nodes or the gateway IP address for the nodes is different. In either case, defining an entry in kusu-netedit for each adaptor attached to a particular network is always the best way to start configuring networks in PCM 2.0.1.
    • During installation, some networks are automatically configured by PCM 2.0.1. In particular, the networks attached to the installer node must always be configured during installation, since it is difficult to add new networks to the installer node after installation.

    The following table lists the networks that must be configured using kusu-netedit:

    Device  Network    Subnet     Description                      Type
    eth0    10.10.0.0  255.0.0.0  Eth0 network on installer node   provision
    eth0    20.20.0.0  255.0.0.0  Eth0 network on node 5           public
    eth0    10.10.0.0  255.0.0.0  Eth0 network on nodes 1, 2       public
    eth1    20.20.0.0  255.0.0.0  Eth1 on installer node           provision
    eth1    20.20.0.0  255.0.0.0  Eth1 on node 6                   public
    eth1    10.10.0.0  255.0.0.0  Eth1 on nodes 3, 4               public
    ib0     30.30.0.0  255.0.0.0  Infiniband on nodes 3, 4, 5, 6   provision

    Configure new network templates

    1. Log on as root.

    2. From a terminal prompt, run the kusu-netedit tool:

      # kusu-netedit

      Figure: Network Editor screen


    3. Create a new network definition by selecting New.

    4. Fill in the fields to create a new network.

      Note that some networks do not need a Gateway IP address. For example, in the network diagram above, the Infiniband network does not require a route to the public internet network, thus no Gateway IP address is required for the Infiniband network configuration.

      Figure: Edit existing network screen


    5. Once networks are defined in the database with the kusu-netedit tool, they can be attached to node groups using the kusu-ngedit tool. For example, once the 10.10.0.0 network is defined, log on as root and run the following tool:
      # kusu-ngedit
    6. Edit any node group of your choice and select Next until the Networks screen is displayed.

      All of the available networks defined in kusu-netedit should display on the Networks screen.

      Figure: Networks screen


    7. Assign a network to a node group by selecting the network using the space bar.

      This puts an asterisk beside the network, and assigns all nodes in the chosen node group with this network.

    By combining kusu-netedit with the node group editor kusu-ngedit, complex network configurations such as the one described above can be created and managed with PCM 2.0.1.


    Important: If you create multiple network interfaces on a node, PCM 2.0.1 automatically creates short host names for the following interfaces of the node:

    * First provision interface (for master node)
    * First bootable provision interface (for compute node)


    Synchronize files in the cluster

    HPC clusters are built from many individual compute nodes. All of these nodes must have copies of common files such as /etc/passwd, /etc/shadow, /etc/group, and others. PCM 2.0.1 contains a file synchronization service called cfm (Configuration File Manager). The cfm service runs on each compute node in the cluster. When new files are available on the installer node, a message is sent to all of the nodes notifying them that files are available. Each compute node connects to the installer node and copies the new files using the httpd daemon on the installer node. All of the files to be synchronized by cfm are located in the directory tree /etc/cfm/<node group>. The cfm service organizes file synchronization trees by node group. A directory exists for each node group under /etc/cfm. Below the node group name is a tree that replicates the file structure of the machines in the node group.

    Figure: File structure example

    In the figure above, the /etc/cfm directory contains several node group directories such as compute-diskless and compute-rhel. In each of those directories is a directory tree where the /etc/cfm/<node group> directory represents the root of the tree. The /etc/cfm/compute-rhel/etc directory contains several files or symbolic links to system files. These system files synchronize across all of the nodes in the node group automatically by cfm. Creating symbolic links for the files in cfm allows the compute nodes to automatically synchronize with system files on the installer node.

    To add files to cfm, create the new file in the appropriate directory. Make sure to create all of the directories and subdirectories for the file before placing the file in the correct location. Existing files can also have a <filename>.append file. The contents of a <filename>.append file are automatically appended to the existing <filename> file on all nodes in the node group.
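
    For example, to distribute a custom /etc/ntp.conf to all nodes in the compute-rhel node group (the file is only an illustration), copy it into the tree and then synchronize as shown below:

    # mkdir -p /etc/cfm/compute-rhel/etc
    # cp /etc/ntp.conf /etc/cfm/compute-rhel/etc/ntp.conf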

    To synchronize the files for the nodes in a single node group, run this command:

    # kusu-cfmsync -f -n compute-rhel

    This synchronizes all files in the compute-rhel node group.

    To synchronize all files in all node groups, run this command:

    # kusu-cfmsync -f

    For more information on kusu-cfmsync, view the man pages.


    Note: Place a file in /etc/cfm/<node group>/<path to file> to replace the existing file on all nodes in the node group, or to create the file if it does not exist. Create a file named /etc/cfm/<node group>/<path to file>.append to append its contents to the existing file on all nodes in the node group.


    Synchronize time in installer and compute nodes

    Clock synchronization is very important in a cluster, which is why ntpd runs on the nodes. Follow these steps to synchronize the time on the installer and compute nodes:

    1. Use an external NTP server to sync the time in both installer and compute nodes.
      # service ntpd stop
      # ntpdate pool.ntp.org
      # ssh compute-01-00
      # service ntpd stop
      # date
    2. Modify the timestamps of the /etc/{passwd,group,shadow} files to the current time before logging out of the compute node.

      Repeat the step for the installer node.
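
      One way to update the timestamps is with touch (a sketch; run it on the compute node and then on the installer node):

      # touch /etc/passwd /etc/group /etc/shadow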


    3. Update files on the compute node:

      # kusu-cfmsync -f

      Move hosts between node groups

      During installation, PCM 2.0.1 assigns all nodes in the cluster to a node group; however, HPC clusters rarely stay the same: over time new nodes are added, old nodes are removed, and the applications, packages, and even the operating system on the nodes change. PCM 2.0.1 is designed with this flexibility in mind. It is very easy to move single nodes or a group of nodes from one node group to another. When a node is moved, the "personality" of the node changes and the node is manually or automatically provisioned according to the configuration defined by the new node group.

      1. Run the kusu-nghosts command to move nodes from one node group to another.
        # kusu-nghosts

        Figure: Node Membership Editor screen

      2. Choose either Move selected nodes to a node group or Move all nodes from a node group to a new node group .

      3. Choose the nodes you wish to move and then choose a destination node group.

      4. Select Move to transfer the nodes to the new node group.

        Figure: Node Selection screen


      5. Once the nodes have been moved, select Quit to exit.

        The nodes are now assigned to a new node group. The nodes, however, are not re-provisioned until the node (or nodes) is rebooted.

      6. Shutdown the affected nodes and then restart them, ensuring that they PXE boot (by default PCM 2.0.1 expects all nodes to always PXE boot).

      Nodes can be moved back to the original node group using kusu-nghosts, in which case they are provisioned back to their original state. Note: Each time a node is provisioned, it returns to its original installed state; any configuration files or applications on the node are removed and re-installed. If you need to retain the state of the nodes, consider saving all shared configuration files on separate network-attached storage or a file server.

      Add multiple unmanaged hosts file

      The kusu-addhost tool has been enhanced to let you add multiple unmanaged hosts with static IPs and hostnames in a single process. The unmanaged hosts file does not exist unless you enable the enhanced kusu-addhost feature to import multiple unmanaged hosts' information.
      Specify the static IP and static hostname of each unmanaged host to be added in a file, with this format: <static host name>:<static IP>

      Example:

      
      	  hostA:1.2.3.4
      	  hostB:11.22.33.44
           

      To import the unmanaged hosts file using kusu-addhost -f (a complete example follows the notes below):

      # kusu-addhost [-f file -j ifname -n {node group} [-r rack#]]

      Note the following:

      • -n must be specified when kusu-addhost -f is used.
      • Use -j if -n is specified with managed node groups (installer, compute, compute-imaged, and compute-diskless).
      • Do not use -j if -n is specified with unmanaged node groups.

      • For unmanaged hosts to be added to the PCM cluster, each static host name:static IP pair in the unmanaged hosts file must pass hostname and IP validation against these criteria:

        • Static host name only contains characters '.', '-', 'a'-'z', 'A'-'Z' and '0'-'9', and does not start with characters '.' and '-'. 
        • Static host name is not in use. 
        • Static IP address is a valid IPv4 address.  
        • Static IP address is not in use. 
        • Static IP address is within one of the provisioning networks of PCM cluster.  
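
      Putting this together, a minimal invocation might look like the following (the file path is illustrative, and the node group name must match an unmanaged node group on your cluster):

      # kusu-addhost -f /root/unmanaged_hosts.txt -n unmanaged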

    Create a new repository

    PCM 2.0.1 comes pre-configured with at least one repository. This repository is used to create the installer node and to create compute nodes or any other type of node needed for the cluster. There are two obvious cases when more than one repository is needed in the cluster:

    • Updating a repository from a snapshot (see Update a repository)
    • Mixing two different operating systems in one cluster

    Clusters tend to grow over time. As the needs of users increase, an administrator adds more nodes to the cluster. It is not uncommon for clusters to start on one version of an operating system and over time need new nodes with a new version of the operating system. PCM 2.0.1 can provision different operating system types and versions from a single installer node. This provides the cluster administrator with flexibility when designing a cluster, and makes it easy to migrate the cluster to a new operating system when it arrives.

    To add a new operating system to a PCM 2.0.1 installer node, perform the following:

    1. Create a new repository using the kusu-repoman command:
      # kusu-repoman -n -r "Repo for sles10.3-x86_64"
    2. Add the appropriate kits to the repository using the kusu-kitops command.

      The base kit is always needed for proper operation of PCM 2.0.1. The operating system is also a kit and must be added to the installer node before it can be added to the repository.

      1. Add the OS kit to PCM 2.0.1:
        # kusu-kitops -a -m "/media/sles10.3-x86_64" --kit=sles 
      2. Add the base kit and the OS kit to the repository:
        # kusu-repoman -a --kit=sles -r "Repo for sles10.3-x86_64"
        # kusu-repoman -a --kit=base -r "Repo for sles10.3-x86_64"
        # kusu-repoman -u -r "Repo for sles10.3-x86_64"

        Note: If you have more than one kit called "sles" or "base", you will need to specify the kit version or kit architecture.

    3. Update the repository (see Update a repository) .

    4. Create new node groups (see Create a new node group) .

    5. Associate the new repository with one or more node groups (see Associate a repository with node groups).

    Now the repository contains the necessary components the kusu-ngedit tool needs to create node groups and assign them to the new repository.

    Update a repository

    PCM 2.0.1 repository management allows administrators to update operating system repositories using the native package management tools. On Red Hat based systems, the tool is "yum". PCM 2.0.1 can connect to Red Hat Network and download updates to the repository (as long as you have a valid entitlement) or connect to any yum repository and download patches and updates.

    Maintaining and updating repositories is critical for system administrators, particularly within HPC clusters. When security patches are issued, the natural response is to install the patches as quickly as possible on the cluster. In many cases, however, there are unintended side-effects caused by updating a cluster. Typically, when an update tool like yum is used, the administrator downloads all updates and applies them to the operating system. Some of these updates may cause problems on a cluster, while others fix security issues. What normally happens is that the administrator decides which updates are needed and then installs them manually (or writes a script to do this) on the cluster. A better solution is to employ an "update and test" mechanism: download the updates, test them, and only if everything is all right, update the entire cluster. PCM 2.0.1 provides this mechanism.

    If you are updating an existing software kit, you must first update the configuration file that is read when you initiate the patch procedure. Edit the file /opt/kusu/etc/updates.conf and specify your Red Hat Network user name and password.

    To safely update a repository, perform the following:

    1. Create a snapshot of the existing repository that will be updated using the kusu-repoman command.
      # kusu-repoman -s -r "repository name"

      The kusu-repoman command can perform the following actions on a repository:

      • Create a repository
      • Add kits to a repository
      • Refresh a repository
      • Snapshot a repository
      • Delete a repository

      You can list repositories, including the new snapshot repository, using the kusu-repoman -l command. The new repository is named similarly to the original repository on which it was based, with the addition of a snapshot date and time stamp. For example, if the original repository is named "myrepo," the new repository is named "myrepo(snapshot Tue Mar 18 11:43:06 2008)."

    2. Update the snapshot repository with the kusu-repopatch command:

      # kusu-repopatch -r "new repository name"

      The command is interactive: you must answer "Yes" or "No" to specify whether or not to update the kernel and initrd.

      If you do not want to see this prompt, use the following command:

      # kusu-repopatch -y -r "new repository name"
    3. Test the updated repository snapshot on one or two nodes.
      1. Launch kusu-ngedit.

      2. From the original repository (on which you based the new repository), select a node group and make a copy of it.

      3. Edit the copied node group, making changes on the Repository page to include the new repository name to which you want to associate it.

      4. Review each configuration page and make any other changes to the new node group, as required. Select Accept when your edits are completed.

        The new node group is now mapped to the new repository.

      5. Exit the Node Group Editor.

      6. Launch kusu-nghosts.

      7. Find one or two nodes to use for testing purposes, and then select those nodes to move to the newly created node group. These nodes are reinstalled automatically.


      8. Log on to the testing nodes to check if the kernel and packages are updated.

    4. Merge the updates with one of these two methods:

      • Move all of the machines into the new node group.

        OR

      • Run kusu-repoman -a -k new_kit_name -r repository_name to associate the updates with the original repository.

        You must then run kusu-repoman -u -r repository_name to update the repository to which you have added the new kit.

    5. Update the entire cluster after testing the updates in the snapshot repository.

      You can update the cluster by reinstalling nodes (cleanest method), or by updating existing node packages in the node group (a faster method, but there is a risk of failed package upgrades). Both options are described below:

      • Reinstall nodes: Run kusu-boothost -n "node group name" -r

        This performs a full reinstallation on all nodes in the node group.

      • Update node packages: Run kusu-cfmsync -u -n "node group name"

        All nodes in the node group will check their repositories and make any required updates to packages.

    Patch OS of a repository

    kusu-repopatch is a repository patching tool. Multiple repositories supported on the installer node can be patched with the latest OS vendor packages using the kusu-repopatch command. The tool handles authentication to the distribution vendor's site, downloads the updated packages, and then patches the existing repositories. kusu-repopatch supports the use of proxies. If the PCM installer can only access the Internet via a proxy, set the proxy settings in /opt/kusu/etc/updates.conf or use environment variables.

    To set proxy settings in the configuration file /opt/kusu/etc/updates.conf:

    1. Specify the proxy server url.
      [proxy]
      http_proxy = http://username:password@proxy_host:port
      https_proxy = http://username:password@proxy_host:port2

      Note: If the proxy servers for http and https are the same, specify only one. If the proxy servers are different, the system chooses https as its priority.

      Ensure that the default gateway IP is the proxy server IP. Modify this in the /etc/sysconfig/network file, as in the example below.
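      A minimal sketch of the gateway setting (RHEL-style /etc/sysconfig/network), assuming a hypothetical proxy server at 172.20.0.254:

      GATEWAY=172.20.0.254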


    2. Run kusu-repopatch.
    3. To set proxy settings using environment variables:

      1. Export the proxy server.
        export http_proxy=http://username:password@proxy_host:port1
      2. Run kusu-repopatch -r "repository name".
        Note: If you export the proxy server URL and also specify it in the conf file, the value in the conf file is ignored.


      3. If the PCM installer can access the Internet directly, configure /opt/kusu/etc/updates.conf, then run kusu-repopatch -r "repository name".

        For RHEL OS, you need to enter the username, password, and server ID.

        Example:
        
        	   [rhel]
        	   username=
        	   password=
        	   url=https://xmlrpc.rhn.redhat.com/XMLRPC
        	   yumrhn=https://rhn.redhat.com/rpc/api
        
        	   # Server ID can be found under RHN under Systems -> Details -> System Info -> RHN System ID
        	   # Do ensure that the correct server ID is given.
        
        	   [rhel-5-x86_64]
        	   serverid=
        
        	   [rhel-5-i386]
        	   serverid=
              

        For SLES OS, you need to enter the username and password.

        Example:
        
        	   #mirror credentials for your Novell subscription.
        	   #To do this, visit http://www.novell.com/center, select your subscription, and press Mirror Credentials.
        	   [sles]
        	   username=
        	   password=
         	   

        Important: There are two cases for cross-distribution clusters.

        Case 1: The installer is RHEL and the node is SLES. To patch the SLES OS, enter the username and password in the [sles] section of /opt/kusu/etc/updates.conf.

        Case 2: The installer is SLES and the node is RHEL. To patch a RHEL repository, a server ID (the RHN System ID that is maintained in RHN) is required for the RHEL OS of the repository you want to patch. The server ID can be obtained from RHN via rhn_register.
        To obtain the server ID, follow these steps:

        1. Find an identical RHEL OS version machine with the OS of the RHEL repository then launch rhn_register.

        2. Enter valid username and password when prompted then click Next.

        3. In "Register a System Profile - Hardware" screen,enter the SLES installer hostname and then click Next.

        4. Go to RHN to look for the SLES installer hostname and get the server ID.

        5. Modify the file /opt/kusu/etc/updates.conf then enter valid username, password, and server ID (obtained from RHN).

        6. Run the kusu-repopatch -y -r "RHEL repository name" command.

        7. Once the patch is finished, provision a RHEL node to check that the kernel and packages are updated.

        Provision compute node after kusu-repopatch

        After patching the PCM repository with kusu-repopatch, the initrd RAM disks for compute node provisioning are updated. The new initrd RAM disks include the drivers from the new OS kernel, but the DELL proprietary drivers in them are outdated because they were built against the old OS kernel.

        When the compute node is provisioned after kusu-repopatch, the new initrd RAM disks load, the DELL proprietary drivers fail, and the initrd RAM disks cannot detect the DELL network devices. Thus, PCM cannot provision the compute node(s).

        To resolve the problem when provisioning a compute node after kusu-repopatch:

        For packaged compute node(s):

        1. Run kusu-ngedit to edit the 'compute-rhel-5.5-x86_64' node group.
        2. On the 'Boot Time Parameters' screen, change 'Initrd:' to 'initrd-rhel-5.5-x86_64.img' and change 'Kernel:' to 'kernel-rhel-5.5-x86_64'.
        3. Follow the wizard through to completion.
        4. Run kusu-addhost to provision the packaged compute node.

        For imaged or diskless compute node(s):

        Obtain the DELL proprietary drivers that can work with the new OS kernel.

        DELL proprietary drivers

        • igb.ko
        • ixgbe.ko
        • bnx2.ko
        • bnx2x.ko
        • e1000e.ko
        • cxgb3.ko
        • mpt2sas.ko
        • megaraid_sas.ko

        After obtaining the DELL proprietary drivers, package them into the initrd RAM disks manually (a generic sketch follows the list below):

        1. initrd.disked.3.img for imaged compute node group
        2. initrd.diskless.4.img for diskless compute node group
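        A generic, hedged sketch of repacking such an image, assuming it is a standard gzip-compressed cpio archive; the image location and the module destination directory inside the image are placeholders, so verify them on your installer node (or contact support) before applying this:

        # mkdir /tmp/initrd-work && cd /tmp/initrd-work
        # gunzip -c /path/to/initrd.disked.3.img | cpio -idm
        # cp /path/to/new-drivers/*.ko lib/modules/<kernel-version>/
        # find . | cpio -o -H newc | gzip -9 > /path/to/initrd.disked.3.img.new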

        Contact Platform Computing for technical support.

      Refresh Platform OFED kit packages for new kernel

      It is a common task for system administrators to patch the operating system with the latest OS patches to keep the system up to date for security, among other reasons. Platform Cluster Manager has the kusu-repopatch tool that can help the system administrators to do this job easily. Note that you may have to do some manual steps depending on the case.

      The Platform OFED kit has a dependency on the Linux kernel that is present in the operating system. Problems or difficulties may arise after kernel patching, so it is highly recommended to rebuild the Platform OFED kit package(s). Some packages in the Platform OFED kit are related to the OS kernel, such as the kernel-ib and kernel-ib-devel rpms. If the OS kernel is updated, the kernel-ib and kernel-ib-devel packages should also be updated against the new OS kernel.

      The solution involves updating a compute node with the new OS kernel and then rebuilding the kernel-ib and kernel-ib-devel packages from the kit. It is safer to do this on a compute node than on the master node. Once the kernel-ib and kernel-ib-devel packages are built, copy them to the master node, add them to the PCM local repository, and run the kusu-cfmsync -u command to install the new packages on all nodes before rebooting all the compute nodes and the master.

      Prerequisites:

      * PCM 2.0.1 DELL master node such as 'dellmaster'

      * One PCM 2.0.1 DELL packaged compute node such as 'compute-00-00'

      * Base OS is RHEL 5.5 and the subscription channel is locked on RHEL 5.5 on RHN

      * Internet connection on master node

      Follow these steps to update the Platform OFED kit package:

      1. Run kusu-ngedit to associate all the components of the Platform OFED kit with the 'installer-rhel-5.5-x86_64' and 'compute-rhel-5.5-x86_64' node groups, then select "Yes" to accept the changes and sync them to the installer and compute nodes.
      2. On the dellmaster, run kusu-repopatch against the current repository.

        Example:

        kusu-repopatch -r rhel-5.5-x86_64 -y
      3. On the compute-00-00 node, detect and delete conflicting packages.
        • Run yum -c /var/cache/yum/yum.conf update.
        • Note down the packages that are reporting dependency errors.
        • Remove the actual RPMs from /depot/updates/rhel/5/x86_64/ and their corresponding symbolic links from /depot/kits/<rhel-updates-kit-id>/.

        Tip: If dependency issues persist, rerun the yum -c /var/cache/yum/yum.conf update command to resolve them. In addition, the packages in the Platform OFED kit conflict with the OS OFED packages, so you need to remove the related RPMs (such as dapl and libmlx4) from the /depot/kits/<os-kit-id> directory, as in the example below. To get the os-kit-id, run kusu-kitops -l -k rhel | grep -i kid.
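        Expanding on the tip above, a hedged example; the kit ID is determined first and the package names shown are illustrative:

        # kusu-kitops -l -k rhel | grep -i kid
        # rm -f /depot/kits/<os-kit-id>/dapl-*.rpm /depot/kits/<os-kit-id>/libmlx4-*.rpm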

      4. On the dellmaster, run kusu-repoman -u to refresh the PCM local repository.
      5. On the dellmaster, run kusu-cfmsync -u to install the updates.
      6. Reboot the compute-00-00 node to rebuild Platform OFED Kit packages.
      7. Rebuild kernel-ib and kernel-ib-devel rpms on the compute-00-00 node.

        1. On the dellmaster, find the OFED source rpm, OFED-1.5.1-1.noarch.rpm, in /depot/kits/<platform-ofed-kit-id>/ and copy it to the compute-00-00 node.

          Tip: To get the rhel-updates-kit-id, run kusu-kitops -l -k rhel-updates | grep -i kid

        2. Install the OFED source rpm on the compute-00-00 node: rpm -ivh OFED-1.5.1-1.noarch.rpm

        3. Install kernel-devel, rpm-build, bison, tk, flex, and any other required packages on the compute-00-00 node. During the build process, you are prompted about any missing package that needs to be installed.

        4. Create the required directories on the compute-00-00 node to build the kernel-ib and kernel-ib-devel packages:

          # mkdir -p /tmp/ofed-new
          # mkdir -p /tmp/ofed-new/RPMS
          # mkdir -p /tmp/ofed-new/BUILD

        5. Rebuild the kernel-ib and kernel-ib-devel rpms with the following command on the compute-00-00 node.

          Tip: Replace <kernel-version> with your running kernel version (the output of uname -r) in the given example.

          Example:

                    rpmbuild --rebuild  --define '_topdir /tmp/ofed-new' --define 'configure_options
          	  --with-core-mod --with-user_mad-mod --with-user_access-mod --with-addr_trans-mod
          	  --with-mthca-mod --with-mlx4-mod --with-mlx4_en-mod --with-nes-mod --with-ipoib-mod
          	  --with-sdp-mod --with-srp-mod --with-rds-mod' --define 'build_kernel_ib 1'
          	  --define 'build_kernel_ib_devel 1' --define 'KVERSION <kernel-version>'
          	  --define 'K_SRC /lib/modules/<kernel-version>/build' --define 'network_dir /etc/sysconfig/network'
          	  --define '_prefix /usr' --define '__arch_install_post %{nil}' /opt/ofed/src/OFED-1.5.1-1/OFED-1.5.1/SRPMS/ofa_kernel-1.5.1-OFED.1.5.1..src.rpm

        6. Wait for the rebuild to complete. This may take a few minutes or longer.

          Once finished, all RPMs are available in the /tmp/ofed-new/RPMS/x86_64 directory on the compute-00-00 node.

        7. Copy the RPMs into the /depot/kits/<platform-ofed-kit-id>/ directory on the dellmaster, as in the example below.
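          For example, a hedged copy from the compute node to the dellmaster using scp, assuming root ssh access and that <platform-ofed-kit-id> has already been determined with kusu-kitops:

          # scp /tmp/ofed-new/RPMS/x86_64/kernel-ib*.rpm dellmaster:/depot/kits/<platform-ofed-kit-id>/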


      8. On the dellmaster, run kusu-repoman -u to refresh the repository.
      9. On the dellmaster, run kusu-cfmsync -u to install the updated Platform OFED Kit RPM packages.
      10. Reboot all the compute nodes and master to switch to the most current Linux kernel and load the new OFED drivers.

      Associate kit components to node groups

      When you add a new kit to a node group repository, set up a new node group, or offload some functions from one machine to another, you will need to associate kit components to the node group.

      For information about individual components, see the corresponding kit documentation.

      1. Run kusu-ngedit.

      2. (Optional) If desired, create a node group to reserve certain machines for a special purpose (for example, to run all web services).
        • Copy an existing node group upon which to base the new one.

        • Configure the new node group to make it distinctive and suitable for your purposes. (For example, rename it, provide a description, indicate the NN format and name, etc.)
      3. Choose the node group to which you want to associate kit components.

      4. Navigate to the Components screen.

        Here you can control the association of components with the parent kit. What you see on the Components page depends on the repository to which the node group is associated.

      5. Choose those components you want installed on the machines in the node group.

      6. Complete any other configuration changes within kusu-ngedit.

      7. If desired, use kusu-nghosts to move a node to a node group, or kusu-addhost to add a new host to a node group.

      Set compute node in BIOS

      In BIOS, set the boot sequence of the compute node to always boot from the NIC that connects to the provision network.

      Update the installer node and compute node

      To install a security update, or other required updates across your PCM 2.0.1 cluster, you will need to update both the installer node and the compute node.

      1. Use native tools, such as yum, to update the installer node. For example:

        yum update

        This command uses an existing repository to update files on the installer node. Ensure that you have updated this repository with required changes prior to running this command. See Update a repository for details.


        Note: Depending on your repository configuration, the install script might visit certain OS and 3rd party application sites to download kernel or driver updates. Updates might take some time as a result. You will want to monitor the update process and consent to or deny various update requests.

        Warning: If you update a kernel, some drivers that were previously provided by 3rd parties may no longer work. Take precautions in deciding to update a kernel.
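        If you prefer to hold back kernel packages while applying the remaining updates, one hedged option (standard yum behavior, not specific to PCM) is:

        yum --exclude='kernel*' update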


      2. Update all compute nodes:

        1. Update the repository (see Update a repository).

        2. Perform a full reinstallation on all nodes in the node group:

          kusu-boothost -n "node group name" -r

          PCM 2.0.1 checks the database state, and then reboots all nodes and reinstalls required updates and drivers.

      PCM 2.0.1 NIS/LDAP authentication

      1. Configure the installer node to authenticate using the corporate NIS or LDAP server.

        To do this, run the Red Hat tool system-config-authentication to configure the authentication services that the installer node uses.

      2. Run kusu-cfmsync -f.

        This command signals the nodes in the cluster to look for any configured NIS or LDAP servers you might have, and to update their configuration files and/or packages/components accordingly.

      3. Navigate to /etc/cfm, and look for these auto-generated files: shadow.getent, passwd.getent, and group.getent.

        These updated files now contain password and authentication information needed for the rest of the cluster.

      4. Tell the node group which authentication files to use, along with password information:
        1. Navigate to /etc/cfm/<node group>/etc.

          Here you will find existing symbolic links that must be updated.

        2. Change the symbolic links to point to the new files containing the updated authentication information (the *.getent files), as in the sketch below.
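      A minimal sketch, assuming a hypothetical node group named compute-rhel-5.5-x86_64 and that the links are named passwd, group, and shadow; check the existing link names in the directory before changing them:

      # cd /etc/cfm/compute-rhel-5.5-x86_64/etc
      # ln -sf /etc/cfm/passwd.getent passwd
      # ln -sf /etc/cfm/group.getent group
      # ln -sf /etc/cfm/shadow.getent shadow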

      Passwordless ssh between managed compute nodes

      PCM 2.0.1 allows passwordless ssh between managed compute nodes for the root account. Passwordless ssh between managed compute nodes is configured by default for a PCM cluster, with the following behavior:

      • Root account users can perform passwordless ssh between the provisioned compute nodes if PCM is installed on the installer node.
      • Root account users cannot perform passwordless ssh to the installer node if at least two compute nodes of managed compute node groups are provisioned.
      • Root account users cannot perform passwordless ssh between unmanaged nodes, nor can passwordless ssh be used from unmanaged nodes to other nodes.
      • Root account users can perform passwordless ssh to other managed compute nodes from a newly provisioned node when a new node group is created by copying a managed compute node group (for example, a compute node group) and a node of the newly created node group is provisioned.
      • Root account users can perform passwordless ssh to other managed compute nodes from a newly provisioned node when at least two compute nodes of managed compute node groups are provisioned and the root password is changed.
      • Root account users cannot perform passwordless ssh to other existing managed compute nodes from a removed node, if one of the provisioned managed compute nodes is removed.

      When a PCM installer node is installed, in addition to the public/private key pair of the installer, a pair of public/private keys is generated for compute-only.

      The generated public/private key pair is located in a PCM-specific location on the installer node, as below:

      • /opt/kusu/etc/.ssh/id_rsa - the private key for compute-only.
      • /opt/kusu/etc/.ssh/id_rsa.pub - the public key for compute-only.

      The /opt/kusu/etc/.ssh directory and the public/private key pair under it can be accessed only by the root account, with read-only permission.

      When a PCM installer node is installed, in addition to the authorized_keys file of the installer located under /root/.ssh, an authorized_keys file is generated to be distributed to all the managed compute nodes only.

      The generated authorized_keys file for compute-only is located at:

      • /opt/kusu/etc/.ssh/authorized_keys - the authorized_keys file for compute-only.

      The /opt/kusu/etc/.ssh directory and the authorized_keys file under it can be accessed only by the root account, with read-only permission.

      Symbolic links are created for each managed node group on the installer node during installation of the installer node. The /etc/cfm/<node group>/root/.ssh directory and the symbolic links under it, pointing to the private key and the authorized_keys generated under /opt/kusu/etc/.ssh, can be accessed only by the root account, with read permission.

      Symbolic links to the private key and the authorized_keys are created during the installation of the installer node. Managed compute nodes have the private key and the authorized_keys available at /root/.ssh initially after being provisioned. Once a managed compute node is provisioned and up and running, you can access other managed compute nodes by default.

      The encrypted private key and authorized_keys can be accessed from the Web (for example, http://<installer node>/cfm/2/root/.ssh/id_rsa). The encryption key is not downloadable, so you cannot retrieve the unencrypted files from the Web.


      Known Issues: If you remove a managed compute node from the PCM cluster, configuration files that used to be synchronized by CFM are not touched. Thus, the previously distributed public/private key pair for compute-only and the encryption key .cfmsecret file on this node are not removed. The removed node may still be able to ssh to other existing managed compute nodes in the cluster without a password.

      Similarly, regenerating the public/private key pair for compute-only on the installer node does not resolve this. Although the newly generated key pair distributed to the existing managed compute nodes in the cluster differs from the key pair remaining on the removed node, the removed node can still download the encrypted key pair from the Web, and an attacker may still be able to decrypt the real key pair with the .cfmsecret file remaining on the node. This security issue is not handled in this release; a manual cleanup sketch is given below.
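      A hedged manual cleanup sketch, run on a node after it is removed from the cluster; the exact location of the .cfmsecret file is not documented here, so it is located with find rather than assumed:

      # rm -f /root/.ssh/id_rsa /root/.ssh/authorized_keys
      # find /etc /opt/kusu -name '.cfmsecret' -exec rm -f {} \;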


      Provision add-on NIC

      Provision add-on NIC based on the applicable case as shown below:

      Case 1: During Head Node installation

      1. Select eth0 then deselect Configure this device option before clicking OK.

      2. Select new network device eth4 then click Configure.
        • Select Configure this device.

        • Specify IP address and NetMask (for example, 172.20.0.1 and 255.255.0.0).
        • Update network name as Cluster.
        • Select Network type as Provision.
        • Select Activate on boot.
      3. Click OK to continue with the installation.
      4. Verify kusu-netedit output after the installation.
      5. Check kusu-ngedit (Networks field) for the installer and compute node groups.
        • In the example, eth4 is selected by default for both node groups.
        • Change this selection to eth0 for the compute node group, as NIC1 will be used for compute node connectivity to the network.
      6. Install compute nodes using kusu-addhost.

      Procedures for after Head Node installation are presented in the second case below. Assume that the Head Node installation is already completed with eth0 (provisioning) and eth1 (public) networks configured by default.


      Important: If there are compute nodes that are already installed, remove these nodes by running kusu-addhost -e --purge before configuring. Once the configuration is complete, all the nodes can be added back to the cluster using kusu-addhost, selecting eth3 (NIC4) as the provisioning network. See the examples below.
      Case 2: After Head Node installation

      1. Create another provision network, such as eth2 (NIC3), using kusu-netedit.

      2. Deselect eth0 from the installer and compute node groups using kusu-ngedit.

      3. Delete eth0 using kusu-netedit.

      4. Use kusu-net-tool to add a new provisioning network.

      5. The following command updates the configuration and refreshes the repository:
         kusu-net-tool addinstnic eth3 --netmask=255.255.0.0 --ipaddress=172.20.0.1 --start-ip=172.20.0.1 --provision --gateway=172.20.0.1 --desc="network2" --macaddr="00:0C:29:1A:33:FF"
        
      6. Verify kusu-netedit output.

      7. Check the ifcfg file on both SLES and RHEL.

      8. For SLES, check the /etc/sysconfig/network/ifcfg-eth-id-00:0c:29:1a:33:ff file to ensure that the following information (as in the example below) is included:
         sles103master:/etc/sysconfig/network # cat ifcfg-eth-id-00:0c:29:1a:33:ff
        	  
         BROADCAST='192.168.10.255'
        
         IPADDR='192.168.10.10'
        
         NETMASK='255.255.255.0'
        
         NETWORK='192.168.10.0'
        
         STARTMODE='onboot'
        
         sles103master:/etc/sysconfig/network #
        
        For RHEL, check that the following information (as in the example below) is included:
         Set "ONBOOT=no" in /etc/sysconfig/network-scripts/ifcfg-eth0
        		  	  
         Set "ONBOOT=yes" in /etc/sysconfig/network-scripts/ifcfg-eth3
        		  
      9. Reboot the Head Node.

      10. Check kusu-ngedit (Networks field) for the installer and compute node groups.
        • In the example, eth3 is selected by default for the installer node group provisioning network.
        • For the compute node group, none of the networks is selected by default.
        • Select eth0 for the compute node group, as NIC1 will be used for compute node connectivity to the network.
      11. Install compute nodes using kusu-addhost.

      Backup and restore PostgreSQL database in newer clusters

      PCM 2.0.1 uses a PostgreSQL (postgres) database to store cluster configuration information. This section provides an overview of the database tables, reviews the backup procedure, and describes how to restore the database.

      The database has the following tables:

      • appglobals
      • components
      • driverpacks
      • kits
      • modules
      • networks
      • ng_has_comp
      • ng_has_net
      • nics
      • node groups
      • nodes
      • os
      • packages
      • partitions
      • repos
      • repos_have_kits
      • scripts

      Prior to performing server maintenance or upgrading your application, you will want to safely back up the database, and then later restore it.

      Back up the database

      Use the following commands to back up the database.

      # export PGPASSWORD=`cat /opt/kusu/etc/db.passwd`
      # pg_dump -U apache kusudb > db.backup 

      Restore the database

      Use the following commands to restore the database.

      1. Authenticate access permission of kusudb for user postgres:

      a) Edit the file:

      /var/lib/pgsql/data/pg_hba.conf

      and add the following line:

      local   all   postgres  trust 

      b) Restart postgres:

      service postgresql restart 

      2. Drop database kusudb

      dropdb -U postgres kusudb 

      3. Create database kusudb:

      createdb -U postgres kusudb 

      4. Load backup file:

      psql -U postgres kusudb < db.backup 

      5. Remove user postgres access permission to kusudb:

      a) Edit the file:

      /var/lib/pgsql/data/pg_hba.conf

      and remove the line:

      local   all   postgres   trust

      b) Restart postgres:

      service postgresql restart
      Verify the database restoration

      To verify that the database has been successfully restored, run the following command:

      • kusu-genconfig nodes: If the database restore was successful, this command returns a list of machines.

      Backup and restore MySQL database in older clusters

      Older versions of PCM use a MySQL database to store cluster configuration information. This section provides an overview of the database tables, reviews the backup procedure, and describes how to restore the database.

      The database has the following tables:

      • app_globals
      • components
      • driverpacks
      • kits
      • modules
      • networks
      • ng_has_comp
      • ng_has_net
      • nics
      • node groups
      • nodes
      • packages
      • partitions
      • repos
      • repos_have_kits
      • scripts

      Prior to performing server maintenance or upgrading your application, you will want to safely back up the database, and then later restore it.

      Back up the database

      1. From a command prompt, run kusu-genconfig debug.

        Once run, you will see the populated database file.

      2. From the directory where you ran kusu-genconfig debug, specify a name for the backup file. For example:
        # kusu-genconfig debug > db.backup

      Restore the database

      1. To delete the existing database, run this command:

        # mysqladmin drop kusudb


        Caution! This command deletes the entire database. Ensure you have created a backup file first.
      2. Run this command to create a new database:
        # mysqladmin create kusudb

        The newly created database is empty at this point.

      3. Navigate to the location of your backup file (in this example, named db.backup), and then run this command:
        # mysql kusudb < db.backup
      4. Navigate to /opt/kusu/sql, and look for a file called kusu_dbperms.sql.

        This file sets the default password for Apache users.

      5. Edit kusu_dbperms.sql and change the default password "Letmein" to your configured password.

      6. Copy this file to the restored database with this command:
        # mysql kusudb < ./kusu_dbperms.sql
      7. Change directories to /opt/kusu/etc, and then look for a file called db.passwd.

      8. Change the password string in db.passwd to match the one previously entered in kusu_dbperms.sql.
      Verify the database restoration

      To verify that the database has been successfully restored, run the following command:

      • kusu-genconfig nodes: If the database restore was successful, this command returns a list of machines.

      Configure a dedicated logging server in PCM cluster

      The PCM installer node, by default, is set as the logging server. To find the default PCM logging server's address, run this command:

      # sqlrunner -q kusu-appglobals-tool show SyslogServer

      You can set a dedicated logging server in the PCM cluster that supports forwarding of all logging messages from the PCM installer node and compute nodes, but not to other dedicated nodes. Having a dedicated logging server that collects all log files relieves some pressure on the PCM installer node. It also allows you to identify which messages come from which nodes and to analyze problems more easily.

      To set a dedicated logging server in PCM cluster:

      1. Specify a dedicated logging server address:

        # sqlrunner -q kusu-appglobals-tool set SyslogServer 'IP address'

        For example:

        # sqlrunner -q kusu-appglobals-tool set SyslogServer '172.20.7.35'


        Tip: You can specify a logging server using an IP address or hostname.
      2. Effect the configuration changes by running these commands:

        # kusurc /etc/rc.kusu.d/S04KusuRsyslogMaster.rc.py

        # kusu-cfmsync -f

        or

        # kusurc /etc/rc.kusu.d/S04KusuRsyslogMaster.rc.py

        # kusu-addhost -u
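      For reference, the forwarding that these tools configure on each node is typically expressed in syslog/rsyslog configuration as a single line like the following (shown for illustration only, using the example server address above; PCM generates the actual configuration for you):

      *.* @172.20.7.35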

    [ Top ]


    Configuring Platform ISF AC

    Important: Read this section only if you want to use the Platform ISF AC kit. This section includes the following topics:

    About Platform ISF AC Kit

    The Platform Infrastructure Sharing Facility, Adaptive Cluster (Platform ISF AC) kit version 1.0 installs the Platform ISF AC multiboot tool, which dynamically changes resources in an HPC cloud computing environment. This allows you to switch between operating systems installed on a compute node. These compute nodes are added to the newly created multiboot node groups.

    Platform ISF AC supports PCM 2.0.1 RHEL, CentOS, and SLES distribution versions. Compute nodes in the dualboot node groups with the correct partitioning schema will be able to boot either Linux (when moved into Dualboot_Linux) or Windows (when moved into Dualboot_Windows).


    Important: Note the following information when using the Platform ISF AC:

    • All compute nodes added to a multiboot node group must have identical partition layouts. The compute nodes must be supported by kusu-power.
    • By default, ISF AC disk provisioning rules preserve everything unless told otherwise. You should remove the following: all ext2 partitions, all ext3 partitions, and all swap partitions.
    • Platform ISF AC in PCM 2.0.1 provides only the dual-boot mechanism on compute nodes.
    • Platform ISF AC is not an independent kit. This kit is based on Platform LSF. If Platform LSF is not installed, Platform ISF AC will not work.
    • If you are installing the Platform ISF AC kit separately, remember to install it after the Platform LSF kit v7.0.6 and Platform Console kit v2.0 (GUI).
    • After the Platform ISF AC kit is installed, the Dualboot_Windows and Dualboot_Linux node groups are created by default. These two node groups are known as ‘multiboot’.
    • For the Dualboot_Windows and Dualboot_Linux node groups, only the following can be modified: the node group name, node group description, node name format, boot partition, and networks of the ‘multiboot’ node group.
    • It is recommended to keep the node name format the same as the format of the packaged compute node group.
    • The compute component of the Platform LSF kit is associated with the multiboot node groups. These are set to join the LSF cluster that the packaged compute node group belongs to.
    • Only moving nodes from a packaged node group to a multiboot node group is supported.
    • When updating dualboot node group nodes, you must manually create the links for configuration files (such as /etc/passwd and fstab.append) under /etc/cfm/<dualboot node group name>/.

      A better and quicker method to create all the links is to copy all the files under /etc/cfm/xxx/ (where xxx is the packaged node group) to /etc/cfm/<dualboot node group name>/.

      Example: #cp -rf /etc/cfm/compute-rhel-5.4-x86_64/* /etc/cfm/Dualboot_Linux/


    Install this kit using kusu-kitops or kusu-kit-install on the CLI or using the "One Step Install" function on the Platform Management Console. When installing, see the "Install a kit" section in any of the bundled kits' (such as Base, Nagios, etc.) documentation for details.
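    For example, a hedged CLI sketch of adding the kit from its ISO with kusu-kitops; the ISO path and file name are illustrative, so follow the bundled "Install a kit" documentation for the exact procedure:

    # kusu-kitops -a -m /root/platform-isf-ac-1.0.iso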

    Similarly, see the "Remove a kit" section from any of the standard kits for details on uninstalling the Platform ISF AC kit.


    Attention: Hostnames and IPs for compute nodes will no longer be preserved if Platform ISF AC is uninstalled. To completely remove the Platform ISF AC kit, ensure that there are no nodes in the multiboot node group before disassociating the kit from the installer node group. The node group cannot be removed if there are nodes in the multiboot node group.


    This kit is made up of the following:

    • kit-platform-isf-ac - kit rpm
    • component-platform-isf-ac - node component that is associated to the installer node by default and associates the Platform ISF AC software to relevant node groups
    • platform-isf-ac rpm - contains the kitinfo file and is associated to installer node group by default

    When the workload-driven policy is enabled, Platform ISF AC is automatic. This enables dynamic allocation of physical nodes within LSF clusters to match the OS configuration of nodes with workload demand.

    To avoid low cluster utilization due to dedicated cluster templates (which include OS, middleware, and configuration data), node groups can be rebooted to different templates based on workload demand and manual administrator operation.

    The Node Personality Manager (NPM) is the key service within Platform ISF AC that interfaces with LSF and triggers switching of hosts according to manual or workload triggers. The NPM determines when the OS switching is necessary and will let LSF select which machine to switch.

    The Platform ISF AC GUI is more interactive and enables the administrator to view the templates of hosts, configure and monitor workload-driven policies for switching, and view reports about the activities that have taken place.

    Use Platform ISF AC

    To use Platform ISF AC, you must modify LSF files. Platform Cluster Manager dynamically changes LSF configuration files when changes are made to the cluster, so the administrator must enable LSF features by modifying the template files.

    1. Log on to the installer node as root. Edit /etc/cfm/templates/lsf/default.lsf.conf and add:

      LSF_USER_DOMAIN=.


      Note: The dot (.) in steps 1 and 2 specifies the Windows local host, not a domain. If you want to use a Windows domain, refer to the "Using Platform LSF on Windows" documentation from the LSF documentation package.


    2. Edit /etc/cfm/templates/lsf/default.lsf.cluster and change:

      Administrators = lsfadmin .\lsfadmin

    3. Run kusu-addhost -u to apply the changes.

    4. Restart the LSF daemons:

      lsf_daemons stop

      lsf_daemons start

    Configure compute nodes

    Configure and prepare compute hosts for Platform ISF AC.

    Note: You can find the dynamic_provision queue in the lsb.queues file, and the Platform ISF AC configuration file at /opt/lsf/conf/npm/npm.conf. You can use these files later if required.


    Set BMC network

    Follow these steps to set up the BMC network to support out-of-band management and configure IPMI:

    1. Create a new network, using the kusu-netedit tool, having the following:

    a) "bmc" as device

    b) the same network as the provision interface, but a different starting IP or subnet to divide the single BMC network into two different networks (as in the case of multiple node groups associated with a single BMC network, to avoid disorder of BMC IPs and hostnames)

    c) "provision" as the interface type

    2. Associate this new bmc network with the appropriate node group.


    Note: Node groups having several rack numbers may cause disorder in BMC IPs and hostnames. In such cases, disassociate the BMC network first and then associate it again to put them in order.

    3. (Re)provision the node group.

    The BMC will then be automatically configured on the nodes (IP, username, and password).

    It is not recommended to add nodes into the packaged node group before adding the BMC network to it. In such a case, use: kusu-boothost -n "node group name" -r. Note that you may need to manually reboot some nodes.

    4. Once the node is up, determine the status of the bmc device by using ipmitool from the installer node (as the root user):

    ipmitool -I lan -H <BMC IP address> -U kusu-ipmi -P `cat /opt/kusu/etc/.ipmi.passwd` chassis power status


    Note: Once the chassis power status is determined, the GUI can also successfully obtain the heat map information.

Add a host to your cluster

You must have root access to the installer node. The host must be properly connected to the private network and configured to PXE boot from the network before local media. You could also add hosts to a node group using an imaged installation or by adding diskless compute nodes.

Ensure that no other elements on the private network are providing DHCP services (like a router).


Add one or more new hosts to existing node groups, which are logical groupings of hosts that share properties. Platform Cluster Manager includes default node group definitions for different types of compute hosts. For example, the node group compute-rhel5.3-5-x86_64 is a default node group. Hosts added to this node group have the RHEL 5.3 operating system installed.

Log in to the Platform Management Console as LSF Administrator to add hosts and manage workloads (submit, monitor, and manage jobs).

Important: Only a packaged node group preserves Windows partitions. If you want to use the multiboot host feature, add the host into a packaged node group (for example, compute-rhel5.3-5-x86_64).

Adding a host into an imaged node group will not preserve your Windows partition.

Multiboot hosts

Add one or more hosts that have multiple OS to the appropriate node groups.

When you want to enable multiboot on a node, first move it from the packaged compute node group to a multiboot node group, and then move it to other node groups. For example, if a host has Windows installed and Linux was installed by PCM, move this host to Dualboot_Windows first before moving it to Dualboot_Linux.

Log in to the Platform Management Console (PMC) as the admin user and switch to the Multiboot tab to check whether the node has been set to the Windows and Linux personalities.

Naming conventions for hosts

The default naming convention for hosts is compute-#RR-#NN, where #RR is the two-digit rack number given when adding the host and #NN is a unique two-digit node number in the rack automatically assigned.

Platform ISF AC supports hostname preservation in node groups and preserves the hostname for a compute node that has the same MAC address as a previous compute node. For packaged node groups, moving a node between these node groups does not change the node name if the two node groups have the same name format. If the name formats are different, the hostname is changed.

For example, a multiboot node group or packaged node group is copied and its name format is changed to comp#NN:

NG1 (Compute-rhel5.3-x86-64) - compute-#RR-#NN

NG2 (Compute_copied) - comp#NN

NG3 (Dualboot_Windows) - compute-#RR-#NN

If you move a host named compute-01-00 to Compute_copied, the hostname may change to comp00. If the host is moved back to Compute-rhel5.3-x86-64 or Dualboot_Windows, the hostname is still compute-01-00 since those node groups have the same name format.

Set up a mixed dynamic provisioning cluster for LSF


IMPORTANT: Removing Platform LSF from the installer node impacts Platform ISF AC. The Platform ISF AC kit creates two multiboot node groups (Dualboot_Windows and Dualboot_Linux).

The default boot partition of Dualboot_Linux is hd0 2 (that is, the second partition). If Windows 2008 or Windows 7 is initially installed, the Windows installation creates a 100 MB partition for system preservation and installs Windows on the second partition. Windows boots from the first partition.

When you provision Linux OS on this machine, the boot will use the 3rd partition. Thus, you have to change the node group template Dualboot_Linux to set the boot partition as hd0 3. However, there is no need to change Dualboot_Windows.


Follow these steps to set up a mixed (Windows and Linux) dynamic provisioning cluster for Platform LSF:

1. Install the PCM 2.0.1 installer.

The PCM installer will be the LSF master node. Refer to the PCM Installation Guide for installation details.

2. Prepare a host with the Windows OS on the first disk. Partition the disk and allocate enough disk space (>40 GB) for Linux. Configure the node to boot from the NIC that is attached to the provision network.

3. Add the node with the Windows OS into the packaged compute node group of the PCM cluster (compute-rhel5.3-5-x86_64):

a) Run kusu-addhost.

b) Reboot or start the node. PCM automatically installs Linux. Note that after rebooting, the node boots into Windows.

4. Move the node from packaged compute node group (compute-rhel5.3-5-x86_64) into Dualboot_Linux node group.

5. Move the node from the Dualboot_Linux node group to the Dualboot_Windows node group, so that Platform ISF AC knows that the node has the Windows personality.

The node does not reboot and is still running Windows.

6. Apply the Platform LSF license on the PCM installer node.

For this release, it is recommended that you use the kusu-license-tool. Refer to the "Apply license using kusu-license-tool" section in the Installation Guide for details.

7. Install Platform LSF on Windows and then change the Windows hostname, using the steps for mixed LSF cluster installation:

a) Create local user "lsfadmin" and set its password.

b) Add .\lsfadmin into Administrators group.

c) Double-click the installation package to launch the install wizard.

d) Join an LSF cluster as a server host; the master name should be the name of the installer node.

e) Customize the installation and do not enable EGO. The service starter and execute user should be '.\lsfadmin'.

f) Choose to start LSF services after installation.

8. Configure LSF mixed cluster on PCM installer:

a) Add 'LSF_USER_DOMAIN=.' in /etc/cfm/templates/lsf/default.lsf.conf file.

b) Change 'Administrators=lsfadmin' to 'Administrators=lsfadmin .\lsfadmin' in the /etc/cfm/templates/lsf/default.lsf.cluster file.

c) Run kusu-addhost -u to regenerate the configuration files.


Note: You need to manually change the Windows hostname to the name that is assigned by PCM and then reboot the Windows OS.

After step 8 is done, reboot the host. The LSF services should start and the host should join the cluster. Check using any of these commands: lsid, lsload, and bhosts. Then, in a cmd shell, run lspasswd -u .\lsfadmin -p


9. Log in to the PCM GUI as root and click 'Host Repurposing' under Cluster -> Cluster Inventory to check whether the node is available in Platform ISF AC.

[ Top ]


Get technical support

Contact Platform Computing for technical support in one of these ways:

Web Portal eSupport

You can take advantage of our web-based self-support available 24 hours per day, 7 days a week ("24x7") by visiting http://www.platform.com .

The Platform eSupport and Support Knowledgebase site enables you to search for solutions, submit your support request, update your request, enquire about your request, as well as download product manuals, binaries and patches.

Email Support

support@platform.com

Telephone Support

Contact information available at http://www.platform.com/services/support


When contacting Platform, please include the full name of your company.

See the Platform web site at http://www.platform.com/Company/Contact.Us.htm for other contact information.

Get patch updates and hotfixes

Obtain the latest patches and hotfixes for PCM 2.0.1 from the following page:

http://my.platform.com/products/platform-cm

To obtain a user name and password, contact Platform Computing technical support at support@platform.com

[ Top ]


Copyright and trademarks

© 1994-2010 Platform Computing Corporation. All Rights Reserved.

Although the information in this document has been reviewed, Platform Computing Corporation ("Platform") does not warrant it to be free of errors or omissions. Platform reserves the right to make corrections, updates, revisions or changes to the information in this document.

UNLESS OTHERWISE EXPRESSLY STATED BY PLATFORM, THE PROGRAM DESCRIBED IN THIS DOCUMENT IS PROVIDED "AS IS" AND WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. IN NO EVENT WILL PLATFORM COMPUTING BE LIABLE TO ANYONE FOR SPECIAL, COLLATERAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING WITHOUT LIMITATION ANY LOST PROFITS, DATA, OR SAVINGS, ARISING OUT OF THE USE OF OR INABILITY TO USE THIS PROGRAM.

Trademarks

LSF®, MPI®, and LSF HPC® are trademarks or registered trademarks of Platform Computing Corporation in the United States and in other jurisdictions.

Platform Cluster ManagerTM, ACCELERATING INTELLIGENCETM, PLATFORM COMPUTINGTM, and the PLATFORMTM and PCM logos are trademarks of Platform Computing Corporation in the United States and in other jurisdictions.

Linux® is the registered trademark of Linus Torvalds in the U.S. and other countries.

CentOS® is the registered trademark of Linus Torvalds in the U.S. and other countries.

SLES® is a registered trademark of Novell Inc.

Red Hat® is a registered trademark of Red Hat, Inc.

Intel® is a registered trademark of Intel Corporation or its subsidiaries in the United States and other countries.

Cisco® Systems is a registered trademark of Cisco Systems, Inc. and/or its affiliates in the U.S. and in other countries

Nagios® and the Nagios Logo are servicemarks, trademarks, registered trademarks owned by or licensed to Ethan Galstad.

Java JRE® is a registered trademark of Sun Microsystems.

NVIDIA® CUDATM is a registered trademark of the NVIDIA Corporation.

Other products or services mentioned in this document are identified by the trademarks or service marks of their respective owners.

[ Top ]


 

Date Modified: July 2010
Platform Computing: www.platform.com

Platform Support: support@platform.com
Platform Information Development: doc@platform.com

1994-2010 Platform Computing Corporation. All rights reserved.