H: drive on cluster
Latest revision as of 08:31, 3 May 2018

Introduction

The H: drive actually refers to the cfs.st-andrews.ac.uk machine.

NB: A major issue with any networking facility in the University of St Andrews network is that only minimal services are unblocked by default, per security policy. Facilities and ports are only unblocked if specific requests are made to I.T. Services.

Some older machines may have quite an open panorama of unblocked ports, as requests have been made over the years to unblock them. However, any new service has a high likelihood of not working due to blocked ports: not because of bad configuration or malfunction, but because of policy.

Accepting the importance of this point will help curtail the exhaustive testing that sometimes occurs when troubleshooting these issues.

Due to the MS Windows nature of much of the St. Andrews network, and because H: is the Windows device name it usually falls under, the H: drive is the name for the following St. Andrews network drive:

//cfs.st-andrews.ac.uk/shared/Med_Research/res

One can of course simply copy files over to the cluster from the H: drive, but for large datasets this is costly in terms of disk space. A viable alternative is to "mount" this network drive on marvin, which avoids the duplication. When set up, this is better than copying because the directory appears to be available locally.

However, mounting H: depends on individual authentication, and so is not easy to mount system-wide. Every user who wants it must set it up manually. This also means that it cannot be tested without the cooperation of the user, who must enter their ID and password.

So, as it's not an entirely easy thing to do, several methods are presented.

Several methods

MiSeq Data Area Backup

Probably the only detail a user now (Nov 2017) needs to know is that a mirror of the MiSeq Data Area is held and updated nightly on marvin, and is visible to all nodes (and can therefore be used from all queues) at:

/shelf/MiSeq_Data_Area_Backup

The actual procedure is very clunky, but the automation via cron jobs now makes the clunkiness transparent. Continue reading for all the gory details.

  • A backup of the MiSeq Data Area exists on the 138.251.175.12 machine and is synchronised with the H: drive every day at midnight (by the ~/nutria/mylocal/bin/crot_.sh script).
  • This backup can then be mounted (only in user kap6's area) via
sshfs kap6@138.251.175.12:/mnt/vst2/MiSeq_Data_Area ~/miseqdabkmpt
  • Every night at 02:20, a kap6 cron job mounts this backup on marvin and performs an rsync to /shelf/MiSeq_Data_Area_Backup (using the script /storage/home/users/kap6/bin/ensuremsdabk.sh).
  • The log of this rsync is held in kap6's home directory, in the folder mountpointlogs, i.e.
/storage/home/users/kap6/mountpointlogs/mt_to_shelf.txt
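As a sketch, the kap6 crontab entry driving the 02:20 run presumably looks something like the following. The schedule and script path are taken from the text above; whether the log redirection lives in the crontab line or inside the script itself is an assumption.

```shell
# Hypothetical crontab entry for user kap6 (schedule from the text above):
# every night at 02:20, mount the 138.251.175.12 backup and rsync it to /shelf,
# appending output to the log in ~/mountpointlogs
20 2 * * * /storage/home/users/kap6/bin/ensuremsdabk.sh >> /storage/home/users/kap6/mountpointlogs/mt_to_shelf.txt 2>&1
```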

This is visible to members of the miseq0 group.

kap6 has an alias, "miseqCopy", that will run this script if a copy is needed more urgently.
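Presumably the alias simply points at the mirroring script described above; the exact definition below is an assumption.

```shell
# Hypothetical alias in kap6's shell profile: run the mount-and-rsync
# script on demand instead of waiting for the nightly cron job
alias miseqCopy='/storage/home/users/kap6/bin/ensuremsdabk.sh'
```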

Why only the kap6 user?

A USERID is required on the 138.251.175.12 machine for this to work, and the user's public key on marvin also needs to be recorded in the user's home directory on that machine.

However, /shelf/MiSeq_Data_Area_Backup is visible to all members of the miseq0 group, and user kap6 merely carries out the mirroring, every night, so this is a sufficient solution.

Unmounting

To unmount, the fusermount command may be used with the -u option, like so:

fusermount -u ~/mnt/MiSeq_Data_Backup

NOTE: This mounts the directory inside a user's home directory, but the user's home directory is in turn exported via NFS to the nodes. These are two entirely different technologies, and besides not being available on the nodes, the SSHFS mount is likely causing some trouble inside the NFS protocol, which may lead to some instability. However, the trouble appears to be minor, so this solution, while ugly, is viable.

The GVFS method

The key to this is the Gnome Virtual File System, gvfs.

It is possible to get the H: drive mounted on the marvin frontend, mainly because it is running Gnome.

However, the nodes are not, so currently they cannot mount the H: drive.

This means when working with the raw data, only the marvin.q can be used.
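Assuming the cluster uses Grid Engine (as the queue name marvin.q suggests), restricting such a job to the frontend queue would look something like this; the job script name is a placeholder.

```shell
# Submit a job that reads the raw data, restricted to the frontend queue
# (queue name from the text above; process_raw_data.sh is hypothetical)
qsub -q marvin.q process_raw_data.sh
```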

Administration Aspects

Tests

Note that smbclient (SAMBA's FTP-style client) works well and navigates the folders fine.
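For reference, a quick smbclient check against the share might look like the following; <username> is a placeholder and the exact authentication details depend on the St Andrews setup.

```shell
# Connect to the share interactively (a password prompt will follow),
# then navigate to the Med_Research/res folder and list it
smbclient //cfs.st-andrews.ac.uk/shared -U <username>
# at the smb: \> prompt:
#   cd Med_Research\res
#   ls
```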

The debugging level can be increased according to the following link: https://access.redhat.com/solutions/354703

GVFS Environment

GVFS is part of the Gnome mega project.

Being the display manager, gdm is a rather important component and cannot be dealt with abruptly. It's not clear how to restart it remotely; the usual Ctrl+Alt+Backspace only applies at a local console.

To restart gdm, the following rather rough method is actually the recommended one, as can be seen here: https://access.redhat.com/solutions/36382

(This only applies to RHEL6; RHEL7 uses systemctl and the new Gnome 3, which are coordinated and have a systemctl method for restarting.)

The command is as follows:

pkill -f gdm-binary

This definitely appears to have the desired effect, as can be seen from this interaction:

[root@marvin etc]# psg gdm
root     35930  0.0  0.0 134028  2040 ?        Ssl  Aug04   0:00 /usr/sbin/gdm-binary -nodaemon
root     35980  0.0  0.0 176912  3068 ?        Sl   Aug04   0:00 /usr/libexec/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1
root     35983  0.0  0.0 357396 29472 tty1     Ssl+ Aug04   5:23 /usr/bin/Xorg :0 -br -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-yv0pnl/database -nolisten tcp vt1
gdm      36020  0.0  0.0  20048   448 ?        S    Aug04   0:00 /usr/bin/dbus-launch --exit-with-session
gdm      36021  0.0  0.0  44060   848 ?        Ssl  Aug04   0:00 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
gdm      36023  0.0  0.0 269204  7100 ?        Ssl  Aug04   0:00 /usr/bin/gnome-session --autostart=/usr/share/gdm/autostart/LoginWindow/
gdm      36026  0.0  0.0 133292  2412 ?        S    Aug04   0:06 /usr/libexec/gconfd-2
gdm      36027  0.0  0.0 120724  4856 ?        S    Aug04   0:05 /usr/libexec/at-spi-registryd
gdm      36031  0.1  0.0 435356 39176 ?        Ssl  Aug04  26:22 /usr/libexec/gnome-settings-daemon --gconf-prefix=/apps/gdm/simple-greeter/settings-manager-plugins
gdm      36033  0.0  0.0 358560  2848 ?        Ssl  Aug04   0:00 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=12
gdm      36040  0.0  0.0 135288  2164 ?        S    Aug04   0:00 /usr/libexec/gvfsd
gdm      36041  0.0  0.0 346416  8808 ?        S    Aug04   0:05 metacity
gdm      36042  0.0  0.0 442112 13656 ?        S    Aug04   0:36 /usr/libexec/gdm-simple-greeter
gdm      36044  0.0  0.0 248320  6344 ?        S    Aug04   0:00 /usr/libexec/polkit-gnome-authentication-agent-1
gdm      36045  0.0  0.0 273864  7624 ?        S    Aug04   0:18 gnome-power-manager
root     36054  0.0  0.0 141792  1968 ?        S    Aug04   0:00 pam: gdm-password
root     47707  0.0  0.0 122752  1580 pts/11   S+   15:05   0:00 grep gdm
[root@marvin etc]# pkill -f gdm-binary
[root@marvin etc]# psg gdm
root     47876  0.1  0.0 134028  2176 ?        Ssl  15:09   0:00 /usr/sbin/gdm-binary -nodaemon
root     47926  0.2  0.0 176912  3536 ?        Sl   15:09   0:00 /usr/libexec/gdm-simple-slave --display-id /org/gnome/DisplayManager/Display1
root     47929  5.2  0.0 354528 34536 tty1     Ssl+ 15:09   0:02 /usr/bin/Xorg :0 -br -verbose -audit 4 -auth /var/run/gdm/auth-for-gdm-QdHU0O/database -nolisten tcp vt1
gdm      47967  0.0  0.0  20048   696 ?        S    15:09   0:00 /usr/bin/dbus-launch --exit-with-session
gdm      47968  0.0  0.0  44060  1236 ?        Ssl  15:09   0:00 /bin/dbus-daemon --fork --print-pid 5 --print-address 7 --session
gdm      47970  0.2  0.0 269204  8644 ?        Ssl  15:09   0:00 /usr/bin/gnome-session --autostart=/usr/share/gdm/autostart/LoginWindow/
gdm      47973  0.2  0.0 133292  5276 ?        S    15:09   0:00 /usr/libexec/gconfd-2
gdm      47974  0.0  0.0 120724  5736 ?        S    15:09   0:00 /usr/libexec/at-spi-registryd
gdm      47978  1.0  0.0 408268 13224 ?        Ssl  15:09   0:00 /usr/libexec/gnome-settings-daemon --gconf-prefix=/apps/gdm/simple-greeter/settings-manager-plugins
gdm      47980  0.0  0.0 358560  3596 ?        Ssl  15:09   0:00 /usr/libexec/bonobo-activation-server --ac-activate --ior-output-fd=12
gdm      47987  0.0  0.0 135288  2168 ?        S    15:09   0:00 /usr/libexec/gvfsd
gdm      47988  0.1  0.0 346416 10704 ?        S    15:09   0:00 metacity
gdm      47989  0.4  0.0 452400 16716 ?        S    15:09   0:00 /usr/libexec/gdm-simple-greeter
gdm      47991  0.0  0.0 248320  8140 ?        S    15:09   0:00 /usr/libexec/polkit-gnome-authentication-agent-1
gdm      47992  0.1  0.0 273864  9476 ?        S    15:09   0:00 gnome-power-manager
root     48002  0.0  0.0 141792  2344 ?        S    15:09   0:00 pam: gdm-password
root     48020  0.0  0.0 122748  1572 pts/11   S+   15:10   0:00 grep gdm

Methods

GVFS will allow the user to mount the filesystem, though it also requires a "running user d-bus session, typically started with desktop session on login".


Two tools are used for this: gvfs and fuse.

  • a user must be a member of the group "fuse"
  • a gvfs daemon must be running under user gdm: the system administrator should ensure this.
  • The script to use is
#!/bin/bash
# start a user D-Bus session and export its variables into this shell
export $(dbus-launch)
# mount the H: drive share via GVFS
gvfs-mount smb://cfs.st-andrews.ac.uk/shared/med_research/res
# expose the GVFS mounts as a FUSE filesystem under ~/.gvfs
/usr/libexec/gvfs-fuse-daemon ~/.gvfs

which can be launched as a normal user.

Notes

  • gvfs-mount -l seems useless; it reports nothing.

Relevant help pages

/usr/libexec/gvfs-fuse-daemon

usage: /usr/libexec/gvfs-fuse-daemon mountpoint [options]

general options:
    -o opt,[opt...]        mount options
    -h   --help            print help
    -V   --version         print version

FUSE options:
    -d   -o debug          enable debug output (implies -f)
    -f                     foreground operation
    -s                     disable multi-threaded operation

    -o allow_other         allow access to other users
    -o allow_root          allow access to root
    -o nonempty            allow mounts over non-empty file/dir
    -o default_permissions enable permission checking by kernel
    -o fsname=NAME         set filesystem name
    -o subtype=NAME        set filesystem type
    -o large_read          issue large read requests (2.4 only)
    -o max_read=N          set maximum size of read requests

    -o hard_remove         immediate removal (don't hide files)
    -o use_ino             let filesystem set inode numbers
    -o readdir_ino         try to fill in d_ino in readdir
    -o direct_io           use direct I/O
    -o kernel_cache        cache files in kernel
    -o [no]auto_cache      enable caching based on modification times (off)
    -o umask=M             set file permissions (octal)
    -o uid=N               set file owner
    -o gid=N               set file group
    -o entry_timeout=T     cache timeout for names (1.0s)
    -o negative_timeout=T  cache timeout for deleted names (0.0s)
    -o attr_timeout=T      cache timeout for attributes (1.0s)
    -o ac_attr_timeout=T   auto cache timeout for attributes (attr_timeout)
    -o intr                allow requests to be interrupted
    -o intr_signal=NUM     signal to send on interrupt (10)
    -o modules=M1[:M2...]  names of modules to push onto filesystem stack

    -o max_write=N         set maximum size of write requests
    -o max_readahead=N     set maximum readahead
    -o async_read          perform reads asynchronously (default)
    -o sync_read           perform reads synchronously
    -o atomic_o_trunc      enable atomic open+truncate support
    -o big_writes          enable larger than 4kB writes
    -o no_remote_lock      disable remote file locking

Module options:

[subdir]
    -o subdir=DIR           prepend this directory to all paths (mandatory)
    -o [no]rellinks         transform absolute symlinks to relative

[iconv]
    -o from_code=CHARSET   original encoding of file names (default: UTF-8)
    -o to_code=CHARSET      new encoding of the file names (default: UTF-8)

Debian Jessie mounts it, Redhat doesn't

(Red Hat appears to mount the H-drive, but cannot get into any of the subdirectories except hallsport. The command getcifsacl fails on Med_Research; it seems to be a wrapper for getxattr (get extended attribute). But all this may be saying the same thing really: that the subdirectories are just not visible.)

Debian Linux has no problem mounting the H-drive. mount reports its options as the following:

noauto,users,rw,credentials=/storage/home/users/ramon/.smbcredentials,nosuid,nodev,noexec,relatime,vers=1.0,sec=ntlm,cache=strict,uid=0,noforceuid,gid=0,noforcegid,file_mode=0755,dir_mode=0755,nounix,serverino,mapposix,rsize=61440,wsize=65536,echo_interval=60,actimeo=1

Maybe these options are necessary? An attempt was made to replicate them on Red Hat, but it failed.
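For reference, a mount invocation carrying the key user-settable options from the working Debian line above might look like this; the mount point is hypothetical, and the credentials file is the one reported by mount.

```shell
# Hypothetical mount point; protocol options taken from the working Debian
# mount above (vers=1.0 and sec=ntlm look like the significant ones)
sudo mount -t cifs //cfs.st-andrews.ac.uk/shared/Med_Research/res /mnt/hdrive \
    -o credentials=/storage/home/users/ramon/.smbcredentials,vers=1.0,sec=ntlm,cache=strict
```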

As well as the version differences, there is the issue of the CIFS kernel module. There is a vague recollection of this having worked before, and it's possible that there is a bug in the latest kernel, which would be nasty.

The latest action on this was to post a description of the problem on the Red Hat Customer Portal. As may be expected, Red Hat has a less recent version of cifs-utils and, indeed, of mount (which belongs to util-linux).