Node Installation Documentation: Difference between revisions

From MRC Centre for Outbreak Analysis and Modelling
Lines added between the revisions, after the openssh-server install step:

  • sudo usermod -aG sudo user, if you need to add any sudo-ers.
  • sudo nano /etc/ssh/sshd_config, if you need to set ssh users.
      • Add a line AllowGroups ssh
      • Also, be good and add DenyUsers root and DenyGroups root once you've set up sudo-ers.
      • sudo usermod -aG ssh user to add each user to the ssh group.
      • sudo service ssh restart to apply the changes. Don't lock yourself out, muppet-brain.

Revision as of 12:41, 28 June 2016

This document is my log of installing the Microsoft Linux Cluster...

HeadNode

  • Install Windows Server 2012 R2 and HPC Pack 2012 R2 U3 Head Node onto a domain server - I called it fi--didelxhn.
  • Create a folder C:\HPCLinux, and create a network share called hpclinux that allows everyone access to it.
  • Copy "%CCP_DATA%InstallShare\LinuxNodeAgent\*.*" into that folder (setup.py and hpcnodeagent.tar.gz arrive).
  • Run powershell as admin.
  • Export-HpcLinuxCertificate -FilePath C:\HPCLinux\cert.pfx and give it a magic password.
  • (To make a certificate manually, a script something like the below might do it, but I couldn't make it work.)
New-SelfsignedCertificateEx -Subject "CN=Microsoft HPC Linux Communication" -EKU "Server Authentication","Client Authentication" -KeySpec "Signature" -KeyUsage "DigitalSignature,DataEncipherment,KeyEncipherment,NonRepudiation,KeyCertSign" -SAN "fi--didemrchnb","fi--didemrchnb.dide.local","fi--didemrchnb.dide.ic.ac.uk" -NotAfter 2039/01/01 -StoreLocation "LocalMachine" -exportable

Nodes

Install Linux and enable SSH

  • I used the normal Ubuntu 14.04 desktop USB, as the others didn't work.
  • It all worked pretty smoothly really.
  • sudo apt-get update
  • sudo apt-get upgrade
  • sudo apt-get install openssh-server

Sort out InfiniBand support

  • The cards I used were the old Voltaire ones, so a bit of hacking was needed:-
  • sudo nano /etc/modules - and add these modules, one per line: ib_mthca, rdma_ucm, ib_umad, ib_uverbs, ib_ipoib, ib_srp, ib_sdp
  • sudo modprobe ib_ipoib
  • sudo nano /etc/network/interfaces and add the below, where x is the node number + 1 (e.g. fi--didelx15 should be 12.0.0.16). Don't add anything about eth0 or eth1 or it will break.
auto ib0
iface ib0 inet static
    address 12.0.0.x
    netmask 255.255.255.0
    broadcast 12.0.0.255
  • We may need to disable IPv6.
  • sudo nano /etc/sysctl.conf, and add the following somewhere:
net.ipv6.conf.all.disable_ipv6=1
net.ipv6.conf.default.disable_ipv6=1
net.ipv6.conf.lo.disable_ipv6=1
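
The addressing rule above (last octet is the node number plus one) can be sketched as a small shell helper. This is only an illustration: the function names ib_address and ib_stanza are made up here, and it assumes the node number is the run of digits at the end of the host name, as in fi--didelx15.

```shell
#!/bin/sh
# Hypothetical helpers, not part of the original install log.

# Print the ib0 address for a node name, assuming the node number is the
# trailing run of digits (fi--didelx15 -> 15) and the address is 12.0.0.(n+1).
ib_address() {
    node_number=${1##*[!0-9]}
    if [ -z "$node_number" ]; then
        echo "no trailing node number in '$1'" >&2
        return 1
    fi
    # expr reads its operands as decimal, so a name ending in 08 is not
    # mistaken for an octal literal.
    last_octet=$(expr "$node_number" + 1)
    if [ "$last_octet" -gt 254 ]; then
        echo "node number $node_number does not fit in 12.0.0.0/24" >&2
        return 1
    fi
    echo "12.0.0.$last_octet"
}

# Print the /etc/network/interfaces stanza for that node.
ib_stanza() {
    address=$(ib_address "$1") || return 1
    cat <<EOF
auto ib0
iface ib0 inet static
    address $address
    netmask 255.255.255.0
    broadcast 12.0.0.255
EOF
}
```

For example, ib_stanza fi--didelx15 prints a stanza with address 12.0.0.16. It is probably safest to paste the output into /etc/network/interfaces by hand, given the warning about eth0 and eth1.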

Add the HPC mount for some useful bits

  • sudo mkdir -p /hpclinux
  • sudo apt-get install cifs-utils
  • sudo mount -t cifs //fi--didelxhn/HPCLinux /hpclinux -o user=adminuser,dom=dide.local

Adding to the domain

Install NTP support

  • sudo apt-get install ntp
  • sudo cp /hpclinux/linux_inst/ntp.conf /etc/ntp.conf
  • (That sets the only server to be time.imperial.ac.uk)
  • sudo /etc/init.d/ntp stop
  • sudo ntpdate time.imperial.ac.uk
  • sudo /etc/init.d/ntp start
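
To confirm that the copied ntp.conf really does list a single time source, something like the following check might help. It is a sketch only: ntp_servers is a made-up name, and it assumes the file uses the usual "server host" or "pool host" line format.

```shell
#!/bin/sh
# Hypothetical check, not part of the original install log.
# Print the host of every "server" or "pool" line in an ntp.conf file
# (default: /etc/ntp.conf), skipping comments and other directives.
ntp_servers() {
    awk '$1 == "server" || $1 == "pool" { print $2 }' "${1:-/etc/ntp.conf}"
}
```

Running ntp_servers with no argument after the copy should then print just time.imperial.ac.uk, if the file is as described.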

Domain things

  • sudo apt-get install winbind libpam-winbind libnss-winbind krb5-user krb5-config libpam-krb5
  • The domain, when asked, is DIDE.local - case sensitive.
  • sudo cp /hpclinux/linux_inst/nsswitch.conf /etc/nsswitch.conf - adds winbind to the passwd and group lines, and removes [NOTFOUND=return] from the hosts line.
  • sudo cp /hpclinux/linux_inst/smb.conf /etc/samba/smb.conf - lots of config for DIDE.
  • sudo cp /hpclinux/linux_inst/krb5.conf /etc/krb5.conf - lots more config for DIDE.
  • Run ifconfig -a and make a note of the IP address if you haven't already.
  • sudo nano /etc/hosts and replace the contents with the following, using this node's own IP address and name:-
127.0.0.1    localhost
129.31.x.y   fi--didelx99.dide.local fi--didelx99.dide.ic.ac.uk fi--didelx99
  • sudo net cache flush
  • sudo service smbd restart
  • sudo service nmbd restart
  • sudo service winbind restart
  • sudo kinit adminuser@DIDE.LOCAL
  • sudo net ads join -U adminuser
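
The two-line /etc/hosts layout used in the steps above can be generated rather than typed, which avoids mismatched names across the three aliases. A minimal sketch, assuming the node's IP address and short name are passed in; hosts_entries is a made-up name.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original install log.
# Print the /etc/hosts contents for a node, given its IP address and short
# name, with the dide.local and dide.ic.ac.uk names ahead of the short name.
hosts_entries() {
    ip_address=$1
    short_name=$2
    if [ -z "$ip_address" ] || [ -z "$short_name" ]; then
        echo "usage: hosts_entries IP_ADDRESS SHORT_NAME" >&2
        return 1
    fi
    printf '127.0.0.1    localhost\n'
    printf '%s   %s.dide.local %s.dide.ic.ac.uk %s\n' \
        "$ip_address" "$short_name" "$short_name" "$short_name"
}
```

Review the output before writing it to /etc/hosts, since it replaces the whole file.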

Preparing drive mounting

  • sudo apt-get install libpam-mount
  • sudo cp /hpclinux/linux_inst/pam_mount.conf.xml /etc/security/pam_mount.conf.xml - this enables looking for .pam_mount.conf.xml in the home folder, and automatically sets up a mount point (on fi--san02) to that folder beforehand.
  • sudo cp /hpclinux/linux_inst/.pam_mount.conf.xml /etc/skel - for convenience really. Suggest that users copy all the "." files from /etc/skel to their home folder, to get a nice experience when ssh-ing.
  • The home folder is set to /media/home, and automatically mounts \\fi--san02\homes\username. Users should edit .pam_mount.conf.xml in their home folder, and add the volumes they want mounted. For example:-
<?xml version="1.0" encoding="utf-8" ?>
<pam_mount>
  <volume options="nodev,nosuid" user="*" mountpoint="/media/f2gsim" path="GlobalSim" server="fi--didef2.dide.ic.ac.uk" fstype="cifs" />
  <volume options="vers=2.1,nodev,nosuid" user="*" mountpoint="/media/nas1gsim" path="Test" server="fi--didenas1.dide.ic.ac.uk" fstype="cifs" />
</pam_mount>

For some reason, NAS1 needs vers=2.1, whereas the other drives don't. No-one knows why.
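
Users adding several volumes may find it easier to generate the volume lines than to hand-edit the XML. The following is a sketch only: pam_volume is a made-up name, it copies the attribute layout of the example above, and it assumes none of the values contain characters that need XML escaping.

```shell
#!/bin/sh
# Hypothetical helper, not part of the original install log.
# Print one <volume> element for .pam_mount.conf.xml.
# Arguments: server, share path, mount point, and optional extra mount
# options (for example vers=2.1), which are placed before nodev,nosuid.
pam_volume() {
    server=$1
    share=$2
    mount_point=$3
    extra_options=${4:-}
    options="nodev,nosuid"
    if [ -n "$extra_options" ]; then
        options="$extra_options,$options"
    fi
    printf '  <volume options="%s" user="*" mountpoint="%s" path="%s" server="%s" fstype="cifs" />\n' \
        "$options" "$mount_point" "$share" "$server"
}
```

For instance, pam_volume fi--didenas1.dide.ic.ac.uk Test /media/nas1gsim vers=2.1 reproduces the NAS1 line from the example; the output still has to be placed between the pam_mount tags.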

Installing HPC

  • cd /hpclinux
  • sudo python setup.py -install -clusname:fi--didelxhn -certfile:cert.pfx (you'll need the magic password).
  • If you need to reinstall or re-add the node, run sudo python setup.py -uninstall and then redo the line above.