Node Installation Documentation
This document is my log of installing the Microsoft Linux Cluster...
HeadNode
- Install Windows 2012 R2, and HPC Pack 2012 R2 U3 Head Node onto a domain server - I called it fi--didelxhn.
- Create a folder C:\HPCLinux, and create a network share called hpclinux that allows everyone access to it.
- Copy "%CCP_DATA%InstallShare\LinuxNodeAgent\*.*" into that folder. (setup.py and hpcnodeagent.tar.gz arrive)
- Run powershell as admin.
Export-HpcLinuxCertificate -FilePath C:\HPCLinux\cert.pfx
- and give it a magic password.
- To make a certificate manually, a script something like the below might do it, but I couldn't make it work:
New-SelfsignedCertificateEx -Subject "CN=Microsoft HPC Linux Communication" -EKU "Server Authentication","Client Authentication" -KeySpec "Signature" -KeyUsage "DigitalSignature,DataEncipherment,KeyEncipherment,NonRepudiation,KeyCertSign" -SAN "fi--didemrchnb","fi--didemrchnb.dide.local","fi--didemrchnb.dide.ic.ac.uk" -NotAfter 2039/01/01 -StoreLocation "LocalMachine" -exportable
Nodes
Install Linux and enable SSH
- I used the normal Ubuntu 14.04 desktop USB, as the others didn't work.
- It all worked pretty smoothly really.
sudo apt-get update
sudo apt-get upgrade
sudo apt-get install openssh-server
Sort out InfiniBand support
- The cards I used were the old Voltaire ones, so a bit of hacking was needed:-
sudo nano /etc/modules
- and add these modules, one per line: ib_mthca rdma_ucm ib_umad ib_uverbs ib_ipoib ib_srp ib_sdp
sudo modprobe ib_ipoib
sudo nano /etc/network/interfaces
- and add the below, where x is the node number + 1 (e.g. fi--didelx15 should be 12.0.0.16). Don't add anything about eth0 or eth1 or it will break.
auto ib0
iface ib0 inet static
address 12.0.0.x
netmask 255.255.255.0
broadcast 12.0.0.255
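The address rule above (last octet is the node number + 1) could be scripted rather than worked out by hand. This is only a sketch, assuming every node name ends in its number as fi--didelx15 does; the function names ib_address and ib_stanza are made up for illustration.

```shell
#!/usr/bin/env bash
# Sketch: derive the ib0 address from a node name, assuming the name ends
# in the node number (e.g. fi--didelx15). Function names are hypothetical.

ib_address() {
    local host="$1"
    # Keep only the trailing digits of the host name.
    local num="${host##*[!0-9]}"
    if [ -z "$num" ]; then
        echo "no node number in '$host'" >&2
        return 1
    fi
    # Strip leading zeros so a name like lx08 is not read as octal.
    while [ "${num#0}" != "$num" ] && [ "${#num}" -gt 1 ]; do
        num="${num#0}"
    done
    echo "12.0.0.$((num + 1))"
}

ib_stanza() {
    local addr
    addr="$(ib_address "$1")" || return 1
    printf 'auto ib0\niface ib0 inet static\naddress %s\nnetmask 255.255.255.0\nbroadcast 12.0.0.255\n' "$addr"
}

# Print the stanza for the example node from the notes.
ib_stanza "fi--didelx15"
```

The output is meant to be read and then pasted into /etc/network/interfaces by hand, which keeps the eth0/eth1 warning above in play.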
- We may need to disable IPv6.
sudo nano /etc/sysctl.conf
- and add the following somewhere:
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1
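If the same settings get applied to many nodes, appending them only when they are missing avoids duplicate lines on a re-run. The sketch below shows that idea on a scratch file; ensure_line is a hypothetical helper name, and on a real node the target would be /etc/sysctl.conf, edited as root.

```shell
#!/usr/bin/env bash
# Sketch: append each IPv6 setting to a sysctl-style file only if the exact
# line is not already there. ensure_line is a hypothetical helper name.

ensure_line() {
    local file="$1" line="$2"
    # -q quiet, -x whole-line match, -F fixed string (no regex)
    grep -qxF -- "$line" "$file" 2>/dev/null || printf '%s\n' "$line" >> "$file"
}

# Demonstrated on a scratch file, not on /etc/sysctl.conf.
conf="$(mktemp)"
for setting in \
    "net.ipv6.conf.all.disable_ipv6=1" \
    "net.ipv6.conf.default.disable_ipv6=1" \
    "net.ipv6.conf.lo.disable_ipv6=1"
do
    ensure_line "$conf" "$setting"
    ensure_line "$conf" "$setting"   # second call changes nothing
done
cat "$conf"
```

After editing the real file, sudo sysctl -p reloads /etc/sysctl.conf without a reboot.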
Add the HPC mount for some useful bits
sudo mkdir -p /hpclinux
sudo apt-get install cifs-utils
sudo mount -t cifs //fi--didelxhn/HPCLinux /hpclinux -o user=adminuser,dom=dide.local
Adding to the domain
Install NTP support
sudo apt-get install ntp
sudo cp /hpclinux/linux_inst/ntp.conf /etc/ntp.conf
- (That sets the only server to be time.imperial.ac.uk)
sudo /etc/init.d/ntp stop
sudo ntpdate time.imperial.ac.uk
sudo /etc/init.d/ntp start
Domain things
sudo apt-get install winbind libpam-winbind libnss-winbind krb5-user krb5-config libpam-krb5
- The domain, when asked, is DIDE.local - case sensitive.
sudo cp /hpclinux/linux_inst/nsswitch.conf /etc/nsswitch.conf
- adds winbind to passwd and group, and removes [NOTFOUND=return] from hosts.
sudo cp /hpclinux/linux_inst/smb.conf /etc/samba/smb.conf
- lots of config for DIDE.
sudo cp /hpclinux/linux_inst/krb5.conf /etc/krb5.conf
- lots more config for DIDE.
ifconfig -a
- and make note of the IP address if you haven't already.
sudo nano /etc/hosts
- and replace with:-
127.0.0.1 localhost
129.31.x.y fi--didelx99.dide.local fi--didelx99.dide.ic.ac.uk fi--didelx99
129.31.26.21 fi--didedc1.dide.local fi--didedc1.dide.ic.ac.uk fi--didedc1
129.31.26.171 fi--didedc6.dide.local fi--didedc6.dide.ic.ac.uk fi--didedc6
129.31.26.172 fi--didedc7.dide.local fi--didedc7.dide.ic.ac.uk fi--didedc7
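Since only the node's own line changes from machine to machine, the hosts file above could be generated from the node's IP address and short name. A minimal sketch, with make_hosts as a hypothetical helper name and the domain controller lines copied from the listing above:

```shell
#!/usr/bin/env bash
# Sketch: print the /etc/hosts contents for one node. make_hosts is a
# hypothetical helper; the domain controller entries are the ones in the notes.

make_hosts() {
    local ip="$1" name="$2"
    printf '127.0.0.1 localhost\n'
    printf '%s %s.dide.local %s.dide.ic.ac.uk %s\n' "$ip" "$name" "$name" "$name"
    printf '129.31.26.21 fi--didedc1.dide.local fi--didedc1.dide.ic.ac.uk fi--didedc1\n'
    printf '129.31.26.171 fi--didedc6.dide.local fi--didedc6.dide.ic.ac.uk fi--didedc6\n'
    printf '129.31.26.172 fi--didedc7.dide.local fi--didedc7.dide.ic.ac.uk fi--didedc7\n'
}

# x.y is left as in the notes; substitute the address seen in ifconfig -a.
make_hosts "129.31.x.y" "fi--didelx99"
```

Once the output looks right, it can be written in one step with make_hosts ... | sudo tee /etc/hosts.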
sudo net cache flush
sudo service smbd restart
sudo service nmbd restart
sudo service winbind restart
sudo kinit adminuser@DIDE.LOCAL
sudo net ads join -U adminuser
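It may be worth confirming that the join worked before moving on. The sketch below strings together three checks from the samba and winbind tools; check_domain_join is a hypothetical name, and the form of the user name to pass (plain or DOMAIN\user) depends on what smb.conf sets, which these notes do not show.

```shell
#!/usr/bin/env bash
# Sketch: checks to run on a node after the join. check_domain_join is a
# hypothetical name; it is only defined here, not called, because it needs
# a node that has actually joined the domain.

check_domain_join() {
    local user="$1"
    sudo net ads testjoin || return 1    # verifies the machine account in AD
    sudo wbinfo -t || return 1           # checks the trust secret via winbind
    getent passwd "$user" || return 1    # confirms the user resolves through nsswitch
}
```

On a joined node it would be called as check_domain_join adminuser; a non-zero exit status points at the first check that failed.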
Preparing drive mounting
sudo apt-get install libpam-mount
sudo cp /hpclinux/linux_inst/pam_mount.conf.xml /etc/security/pam_mount.conf.xml
- this enables looking for .pam_mount.conf.xml in the home folder, and automatically sets up a mount point (on fi--san02) to that folder beforehand.
sudo cp /hpclinux/linux_inst/.pam_mount.conf.xml /etc/skel
- for convenience really. Suggest that users copy all the "." files from /etc/skel to their home folder, to get a nice experience when ssh-ing.
Installing HPC
cd /hpclinux
sudo python setup.py -install -clusname:fi--didelxhn -certfile:cert.pfx
- (you'll need the magic password).
- If you need to reinstall/readd, then
sudo python setup.py -uninstall
- and redo the line above.
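Before running the installer it can help to check that the share really holds the three files mentioned in these notes (setup.py, hpcnodeagent.tar.gz and cert.pfx). A small sketch, demonstrated on a scratch directory; check_install_files is a hypothetical helper name.

```shell
#!/usr/bin/env bash
# Sketch: report which installer files are missing from a directory.
# check_install_files is a hypothetical helper name.

check_install_files() {
    local dir="$1" missing=0 f
    for f in setup.py hpcnodeagent.tar.gz cert.pfx; do
        if [ ! -f "$dir/$f" ]; then
            echo "missing: $dir/$f" >&2
            missing=1
        fi
    done
    return "$missing"
}

# Demonstrated on a scratch directory with cert.pfx deliberately absent;
# on a node the argument would be /hpclinux.
demo="$(mktemp -d)"
touch "$demo/setup.py" "$demo/hpcnodeagent.tar.gz"
check_install_files "$demo" || echo "not ready to install"
```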
Securing SSH
sudo usermod -aG sudo user
- if you need to add any sudo-ers. (Maybe do this after the domain stuff above...)
sudo nano /etc/ssh/sshd_config
- if you need to set ssh users.
- Add a line
AllowGroups ssh
- Also, be good and add
DenyUsers root
- and
DenyGroups root
- when you've set up sudo-ers.
sudo usermod -aG ssh user
- to add each user to ssh.
sudo service ssh restart
- to apply changes. Don't lock yourself out muppet-brain.
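One way to avoid the lock-out is to confirm group membership before restarting ssh. The sketch below checks whether a user is in a given group; user_in_group is a hypothetical helper name, and it assumes the target group name is a single word (as ssh and sudo are).

```shell
#!/usr/bin/env bash
# Sketch: confirm a user is in a group before AllowGroups takes effect.
# user_in_group is a hypothetical helper name.

user_in_group() {
    local user="$1" group="$2" g
    # id -nG prints the user's group names separated by spaces. Group names
    # that themselves contain spaces get split into words here, which is
    # harmless when the target is a single word such as "ssh".
    for g in $(id -nG "$user" 2>/dev/null); do
        [ "$g" = "$group" ] && return 0
    done
    return 1
}

me="$(id -un)"
if user_in_group "$me" ssh; then
    echo "$me is in group ssh"
else
    echo "$me is NOT in group ssh; add them before restarting ssh"
fi
```

Running sudo sshd -t before the restart checks sshd_config for syntax errors, and keeping the current session open while testing a fresh login gives a way back in if something is wrong.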