Node Installation Documentation
This document is my log of installing the Microsoft Linux Cluster...
HeadNode
- Install Windows 2012 R2, and HPC Pack 2012 R2 U3 Head Node onto a domain server - I called it fi--didelxhn.
- Create a folder C:\HPCLinux, and create a network share called hpclinux that allows everyone access to it.
- Copy the Linux node agent files into that folder (setup.py and hpcnodeagent.tar.gz arrive):
copy "%CCP_DATA%InstallShare\LinuxNodeAgent\*.*" C:\HPCLinux
- Run PowerShell as admin:
Export-HpcLinuxCertificate -FilePath C:\HPCLinux\cert.pfx
and give it a magic password.
- To make a certificate manually, a script something like the below might do it, but I couldn't make it work:
New-SelfsignedCertificateEx -Subject "CN=Microsoft HPC Linux Communication" -EKU "Server Authentication","Client Authentication" -KeySpec "Signature" -KeyUsage "DigitalSignature,DataEncipherment,KeyEncipherment,NonRepudiation,KeyCertSign" -SAN "fi--didemrchnb","fi--didemrchnb.dide.local","fi--didemrchnb.dide.ic.ac.uk" -NotAfter 2039/01/01 -StoreLocation "LocalMachine" -exportable
Nodes
Install Linux and enable SSH
- I used the normal Ubuntu 14.04 desktop USB, as the others didn't work.
- It all worked pretty smoothly really.
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install openssh-server
Sort out infiniband support
- The cards I used were the old Voltaire ones, so a bit of hacking was needed:-
sudo nano /etc/modules
- and add the following, one module per line:
ib_mthca
rdma_ucm
ib_umad
ib_uverbs
ib_ipoib
ib_srp
ib_sdp
- then load the IPoIB module straight away:
sudo modprobe ib_ipoib
sudo nano /etc/network/interfaces
and add the below, where x is the node number+1. (eg, fi--didelx15 should be 12.0.0.16). Don't add anything about eth0 or eth1 or it will break.
auto ib0
iface ib0 inet static
    address 12.0.0.x
    netmask 255.255.255.0
    broadcast 12.0.0.255
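The "node number + 1" rule is easy to get wrong by hand, so here is a minimal sketch of it as a shell helper. The function names ib_address and ib_stanza are made up for illustration; the 12.0.0.x network and the stanza layout come from the notes above. It assumes the node name ends in its number, as fi--didelx15 does.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any package.

# Print the ib0 address for a node name ending in a number,
# using the "node number + 1" rule (fi--didelx15 gives 12.0.0.16).
ib_address() {
    # Keep only the trailing run of digits in the name.
    num=$(printf '%s' "$1" | sed 's/^.*[^0-9]//')
    # No trailing digits means we cannot apply the rule.
    [ -n "$num" ] || return 1
    # Drop leading zeros so a name like ...08 is not read as octal.
    num=$(printf '%s' "$num" | sed 's/^0*//')
    echo "12.0.0.$(( ${num:-0} + 1 ))"
}

# Print the /etc/network/interfaces stanza for that node.
ib_stanza() {
    addr=$(ib_address "$1") || return 1
    printf 'auto ib0\n'
    printf 'iface ib0 inet static\n'
    printf '    address %s\n' "$addr"
    printf '    netmask 255.255.255.0\n'
    printf '    broadcast 12.0.0.255\n'
}

ib_stanza fi--didelx15
```

The output still has to be pasted into /etc/network/interfaces by hand; the sketch deliberately does not touch the file.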
- We may need to disable IPv6.
sudo nano /etc/sysctl.conf
and add the following somewhere:
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1
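When doing this on many nodes, editing by hand gets tedious. The following is a rough sketch of adding the three settings without duplicating them on a re-run. The function names append_once and add_ipv6_disable are hypothetical, and the sketch takes the target file as an argument so it can be tried on a scratch copy first.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any package.

# Append a line to a file unless an identical line is already there.
append_once() {
    grep -qxF -- "$2" "$1" 2>/dev/null || printf '%s\n' "$2" >> "$1"
}

# Add the three IPv6-disabling settings from the notes to the given file.
add_ipv6_disable() {
    for key in all default lo; do
        append_once "$1" "net.ipv6.conf.$key.disable_ipv6=1"
    done
}

# Try it on a scratch file; running it twice still leaves three lines.
scratch=$(mktemp)
add_ipv6_disable "$scratch"
add_ipv6_disable "$scratch"
cat "$scratch"
rm -f "$scratch"
```

For real use it would be run as root against /etc/sysctl.conf, followed by sudo sysctl -p to load the settings without a reboot.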
Add the HPC mount for some useful bits
sudo mkdir -p /hpclinux
sudo apt-get install cifs-utils
sudo mount -t cifs //fi--didelxhn/HPCLinux /hpclinux -o user=adminuser,dom=dide.local
Adding to the domain
Install NTP support
sudo apt-get install ntp
sudo cp /hpclinux/linux_inst/ntp.conf /etc/ntp.conf
- (That sets the only server to be time.imperial.ac.uk)
sudo /etc/init.d/ntp stop
sudo ntpdate time.imperial.ac.uk
sudo /etc/init.d/ntp start
Domain things
sudo apt-get install winbind libpam-winbind libnss-winbind krb5-user krb5-config libpam-krb5
- The domain, when asked, is DIDE.local - case sensitive.
sudo cp /hpclinux/linux_inst/nsswitch.conf /etc/nsswitch.conf
- adds winbind to passwd and group, and removes [NOTFOUND=return] from hosts.
sudo cp /hpclinux/linux_inst/smb.conf /etc/samba/smb.conf
- lots of config for DIDE.
sudo cp /hpclinux/linux_inst/krb5.conf /etc/krb5.conf
- lots more config for DIDE.
ifconfig -a
and make note of the IP address if you haven't already.
sudo nano /etc/hosts
and replace with:-
127.0.0.1 localhost
129.31.x.y fi--didelx99.dide.local fi--didelx99.dide.ic.ac.uk fi--didelx99
129.31.26.21 fi--didedc1.dide.local fi--didedc1.dide.ic.ac.uk fi--didedc1
129.31.26.171 fi--didedc6.dide.local fi--didedc6.dide.ic.ac.uk fi--didedc6
129.31.26.172 fi--didedc7.dide.local fi--didedc7.dide.ic.ac.uk fi--didedc7
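Every line in that hosts file follows the same pattern (IP, then the .dide.local name, the .dide.ic.ac.uk name, and the short name), so it can be generated per node. A small sketch, with hosts_entry and make_hosts as made-up names; the domain controller addresses are the ones listed above, and the node's own IP and name are passed in.

```shell
#!/bin/sh
# Hypothetical helpers, not part of any package.

# Print one hosts line: IP, both fully qualified names, then the short name.
hosts_entry() {
    printf '%s %s.dide.local %s.dide.ic.ac.uk %s\n' "$1" "$2" "$2" "$2"
}

# Print the whole replacement hosts file for a node.
# $1 is the node's IP address, $2 is its short name.
make_hosts() {
    echo '127.0.0.1 localhost'
    hosts_entry "$1" "$2"
    hosts_entry 129.31.26.21 fi--didedc1
    hosts_entry 129.31.26.171 fi--didedc6
    hosts_entry 129.31.26.172 fi--didedc7
}

# 129.31.x.y is a placeholder: substitute the address noted from ifconfig -a.
make_hosts 129.31.x.y fi--didelx99
```

Redirect the output to a scratch file and check it before copying it over /etc/hosts.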
sudo net cache flush
sudo service smbd restart
sudo service nmbd restart
sudo service winbind restart
sudo kinit adminuser@DIDE.LOCAL
sudo net ads join -U adminuser
Preparing drive mounting
sudo apt-get install libpam-mount
sudo cp /hpclinux/linux_inst/pam_mount.conf.xml /etc/security/pam_mount.conf.xml
- this enables looking for .pam_mount.conf.xml in the home folder, and automatically sets up a mount point (on fi--san02) to that folder beforehand.
sudo cp /hpclinux/linux_inst/.pam_mount.conf.xml /etc/skel
- for convenience really. Suggest that users copy all the "." files from /etc/skel to their home folder, to get a nice experience when ssh-ing.
Installing HPC
cd /hpclinux
sudo python setup.py -install -clusname:fi--didelxhn -certfile:cert.pfx
(you'll need the magic password).
- If you need to reinstall/readd, then
sudo python setup.py -uninstall
and redo the line above.
Securing SSH
sudo usermod -aG sudo user
if you need to add any sudo-ers. (Maybe do this after the domain stuff below...)
sudo nano /etc/ssh/sshd_config
if you need to set ssh users.
- Add a line
AllowGroups ssh
- Also, be good and add
DenyUsers root
and
DenyGroups root
when you've set up sudo-ers.
sudo usermod -aG ssh user
to add each user to ssh.
sudo service ssh restart
to apply changes. Don't lock yourself out, muppet-brain.
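To make locking yourself out a bit harder, here is a rough sketch of a pre-restart check: it reads the AllowGroups lines from an sshd_config-style file and tests whether a list of groups contains one of them. The function names allowed_groups and group_allowed are hypothetical. It only matches exact group names (sshd itself also accepts patterns), and it assumes group names contain no spaces, which may not hold for domain groups.

```shell
#!/bin/sh
# Hypothetical helpers, not part of OpenSSH.

# Print each group named on an AllowGroups line of the given file.
# Commented-out lines are skipped because their first field starts with "#".
allowed_groups() {
    awk 'tolower($1) == "allowgroups" { for (i = 2; i <= NF; i++) print $i }' "$1"
}

# Succeed if any group in the space-separated list $2 is allowed by file $1.
# With no AllowGroups line there is no group allow-list, so succeed.
# Exact-name matching only: sshd patterns such as "ssh*" are not handled.
group_allowed() {
    allowed=$(allowed_groups "$1")
    if [ -z "$allowed" ]; then
        return 0
    fi
    for g in $2; do
        if printf '%s\n' "$allowed" | grep -qxF -- "$g"; then
            return 0
        fi
    done
    return 1
}
```

A possible use before restarting: group_allowed /etc/ssh/sshd_config "$(id -Gn)" || echo "you are not in an allowed group". Group changes from usermod only show up in id after logging in again. Running sudo sshd -t beforehand also checks the config file for syntax errors, and keeping an existing ssh session open while testing a new login is a sensible extra precaution.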