Node Installation Documentation
Revision as of 11:04, 23 August 2017
This document is my log of installing the Microsoft Linux Cluster...
HeadNode
- Install Windows 2012 R2, and HPC Pack 2012 R2 U3 Head Node onto a domain server - I called it fi--didelxhn.
- Create a folder C:\HPCLinux, and create a network share called hpclinux that allows everyone access to it.
- copy "%CCP_DATA%InstallShare\LinuxNodeAgent\*.*" in that folder. (setup.py and hpcnodeagent.tar.gz arrive.)
- Run PowerShell as admin.
- Export-HpcLinuxCertificate -FilePath C:\HPCLinux\cert.pfx and give it a magic password.
- (To make a certificate manually, a script something like the below might do it, but I couldn't make it work...)
New-SelfsignedCertificateEx -Subject "CN=Microsoft HPC Linux Communication" -EKU "Server Authentication","Client Authentication" -KeySpec "Signature" -KeyUsage "DigitalSignature,DataEncipherment,KeyEncipherment,NonRepudiation,KeyCertSign" -SAN "fi--didemrchnb","fi--didemrchnb.dide.local","fi--didemrchnb.dide.ic.ac.uk" -NotAfter 2039/01/01 -StoreLocation "LocalMachine" -exportable
Nodes
Install linux and enable SSH
- I used the normal Ubuntu 14.04 desktop USB, as the others didn't work.
- It all worked pretty smoothly really.
- sudo apt-get update
- sudo apt-get upgrade
- sudo apt-get install openssh-server
Sort out infiniband support
- The cards I used were the old Voltaire ones, so a bit of hacking was needed:-
- sudo nano /etc/modules - and add these modules, one per line: ib_mthca rdma_ucm ib_umad ib_uverbs ib_ipoib ib_srp ib_sdp
- sudo modprobe ib_ipoib
- sudo nano /etc/network/interfaces and add the below, where x is the node number+1. (eg, fi--didelx15 should be 12.0.0.16).
auto eth0
iface eth0 inet dhcp
metric 100
auto eth1
iface eth1 inet dhcp
metric 101
auto ib0
iface ib0 inet static
address 12.0.0.x
netmask 255.255.255.0
broadcast 12.0.0.255
metric 102
- This assumes that eth0 is the enterprise network (129.31.26.x) and eth1 is the private network (11.0.0.x).
- We may need to disable IPv6.
- sudo nano /etc/sysctl.conf, and add the following somewhere:
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1
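The ib0 address rule from earlier in this section (last octet is the node number plus one) is easy to get wrong by hand, so here is a minimal sketch that works it out from the host name. It assumes node names end in the node number, as fi--didelx15 does; ib_address is a made-up helper name, not part of any package.

```shell
# Hypothetical helper: print the ib0 address for a node name.
# Assumes the name ends in the node number (eg, fi--didelx15).
ib_address() (
    num=${1##*[!0-9]}              # keep only the trailing digits
    [ -n "$num" ] || exit 1        # no trailing number: give up
    last=$(expr "$num" + 1)        # node number + 1 (expr reads 08 as decimal)
    [ "$last" -le 254 ] || exit 1  # must still fit in the 12.0.0.x /24
    echo "12.0.0.$last"
)

# Example: ib_address fi--didelx15
```

The result is what goes on the address line of the ib0 stanza in /etc/network/interfaces.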
Add the HPC mount for some useful bits
- sudo mkdir -p /hpclinux
- sudo apt-get install cifs-utils
- sudo mount -t cifs //fi--didelxhn/HPCLinux /hpclinux -o user=adminuser,dom=dide.local
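The later steps all copy files out of /hpclinux, so it can save some confusion to check the share really is mounted first. A minimal sketch, assuming the standard Linux mounts table in /proc/mounts; is_mounted is a made-up helper name, and the second argument exists only so the check can be tried against a sample file.

```shell
# Hypothetical helper: succeed if the given mount point is in the mounts table.
# $1 = mount point, $2 = mounts table to read (defaults to /proc/mounts)
is_mounted() {
    awk -v mp="$1" '$2 == mp { found = 1 } END { exit !found }' "${2:-/proc/mounts}"
}

# Example (mount only when the share is not already there):
# is_mounted /hpclinux || sudo mount -t cifs //fi--didelxhn/HPCLinux /hpclinux -o user=adminuser,dom=dide.local
```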
Adding to the domain
Install NTP support
- sudo apt-get install ntp
- sudo cp /hpclinux/linux_inst/ntp.conf /etc/ntp.conf - (That sets the only server to be time.imperial.ac.uk)
- sudo /etc/init.d/ntp stop
- sudo apt install ntpdate
- sudo ntpdate time.imperial.ac.uk
- sudo /etc/init.d/ntp start
Domain things
- sudo apt-get install winbind libpam-winbind libnss-winbind krb5-user krb5-config libpam-krb5 - The domain, when asked, is DIDE.local - case sensitive.
- sudo cp /hpclinux/linux_inst/nsswitch.conf /etc/nsswitch.conf - adds winbind to passwd and group, and removes [NOTFOUND=return] from hosts.
- sudo cp /hpclinux/linux_inst/smb.conf /etc/samba/smb.conf - lots of config for DIDE.
- sudo cp /hpclinux/linux_inst/krb5.conf /etc/krb5.conf - lots more config for DIDE.
- ifconfig -a and make note of the IP address if you haven't already.
- sudo nano /etc/hosts and replace with:-
127.0.0.1 localhost
129.31.x.y fi--didelx99.dide.local fi--didelx99.dide.ic.ac.uk fi--didelx99
129.31.26.137 fi--didelxhn.dide.local fi--didelxhn.dide.ic.ac.uk fi--didelxhn
129.31.26.21 fi--didedc1.dide.local fi--didedc1.dide.ic.ac.uk fi--didedc1
129.31.26.171 fi--didedc6.dide.local fi--didedc6.dide.ic.ac.uk fi--didedc6
129.31.26.172 fi--didedc7.dide.local fi--didedc7.dide.ic.ac.uk fi--didedc7
I'm not sure exactly why some of these (eg, fi--didelxhn) need adding, since Ubuntu can already ping fi--didelxhn in all of those forms. However, without that entry, an HttpException occurs when adding the node to the cluster, so this is a not-entirely-understood workaround.
- sudo net cache flush
- sudo service smbd restart
- sudo service nmbd restart
- sudo service winbind restart
- sudo kinit adminuser@DIDE.LOCAL
- sudo net ads join -U adminuser
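Every non-localhost line in the /etc/hosts listing above follows the same pattern: the IP, then the name under dide.local, then under dide.ic.ac.uk, then the short name. A minimal sketch that prints one such line, to avoid typos when doing many nodes; hosts_line is a made-up helper name.

```shell
# Hypothetical helper: print one /etc/hosts line in the pattern used above.
# $1 = IP address, $2 = short host name
hosts_line() {
    printf '%s %s.dide.local %s.dide.ic.ac.uk %s\n' "$1" "$2" "$2" "$2"
}

# Example: hosts_line 129.31.26.137 fi--didelxhn
```

It only prints the line; pasting it into /etc/hosts is still done by hand with nano as above.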
Preparing drive mounting
- sudo apt-get install libpam-mount
- sudo cp /hpclinux/linux_inst/pam_mount.conf.xml /etc/security/pam_mount.conf.xml - this enables looking for .pam_mount.conf.xml in the home folder, and automatically sets up a mount point (on fi--san02) to that folder beforehand.
- sudo cp /hpclinux/linux_inst/.pam_mount.conf.xml /etc/skel - for convenience really. Suggest that users copy all the "." files from /etc/skel to their home folder, to get a nice experience when ssh-ing.
Installing HPC
- cd /hpclinux
- sudo ./install_filters.sh
- sudo python setup.py -install -clusname:fi--didelxhn -certfile:cert.pfx (you'll need the magic password).
- If you need to reinstall/readd, then sudo python setup.py -uninstall and redo the line above.
Securing SSH
- sudo usermod -aG sudo user if you need to add any sudo-ers. (Maybe do this after the domain stuff above...)
- sudo nano /etc/ssh/sshd_config if you need to set ssh users.
- Add a line AllowGroups ssh
- Also, be good and add DenyUsers root and DenyGroups root when you've set up sudo-ers.
- sudo usermod -aG ssh user to add each user to ssh.
- sudo service ssh restart to apply changes. Don't lock yourself out, muppet-brain.
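Since a bad sshd_config is the easy way to get locked out, a quick check before restarting ssh may help. This is a minimal sketch, not a full parser: has_directive is a made-up helper name, and it only looks for an uncommented line starting with the given keyword.

```shell
# Hypothetical helper: succeed if an uncommented directive is present.
# $1 = directive name, $2 = config file (defaults to /etc/ssh/sshd_config)
has_directive() {
    grep -Eiq "^[[:space:]]*$1[[:space:]]" "${2:-/etc/ssh/sshd_config}"
}

# Example, before restarting ssh:
# has_directive AllowGroups || echo "AllowGroups line is missing"
# sudo sshd -t    # sshd's own test mode: checks the config file for errors
```

Keeping the current ssh session open and logging in from a second terminal after the restart is a further guard against lockout.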