ZFS focus on Ubuntu 20.04 LTS: ZSys for system administrators
Now that we have covered the basic ZSys functionalities, you have probably realized that the system is generally fully transparent to users. However, most system administrators will want to dive deeper into how to tweak those behaviors and observe more information about the current status. We are getting more technical from now on, and for the rest of the series covering ZSys & ZFS on Ubuntu 20.04 LTS!
Client/Service architecture
As explained in the corresponding post, ZSys has a client/service architecture (mediated by polkit). The unix socket, which activates zsysd on demand,
uses SO_PEERCRED to pass credentials (who made the request, and when), and with some (black magic) wizardry and multiple attempts, we were able to make it work over gRPC, the RPC communication framework we are using. As this whole combination was not really documented anywhere on the Internet, it may be worth covering in a more technical blog post in the future. So if you are interested, let us know in the discourse link!
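As a side note, you can peek at this socket from the outside; here is a quick check (not from the original post) showing where it lives and who is currently listening on it, ready to spawn the daemon:
## Locate the activation socket and the process listening on it
## (systemd itself, until zsysd is spawned)
$ ss -xlp | grep zsysd.sock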
This is why we have a zsysctl service command with a bunch of subcommands:
$ zsysctl help service
Service management
Usage:
zsysctl service COMMAND [flags]
zsysctl service [command]
Available Commands:
dump Dumps the current state of zsys.
gc Run daemon state saves garbage collection.
loglevel Sets the logging level of the daemon.
refresh Refreshes machines states.
reload Reloads daemon configuration.
status Shows the status of the daemon.
stop Stops zsys daemon.
trace Start profiling until you exit this command yourself or when duration is done. Default is CPU profiling with a 30s timeout.
Flags:
-h, --help help for service
Global Flags:
-v, --verbose count issue INFO (-v) and DEBUG (-vv) output
Use "zsysctl service [command] --help" for more information about a command.
Daemon logs and request identifiers
All logs end up in the systemd journal. By default, the daemon will only print warnings and more important messages. You can change the daemon loglevel for subsequent requests with the zsysctl service loglevel subcommand. Note that for now, this setting isn’t persistent and is reset once the daemon stops.
The default will only print warnings or errors:
## Making a client request
$ zsysctl service refresh
## Looking at the logs, nothing printed is good
$ journalctl -b -u zsysd
mai 12 10:32:21 casanier systemd[1]: Starting ZSYS daemon service...
mai 12 10:32:21 casanier systemd[1]: Started ZSYS daemon service.
mai 12 10:33:21 casanier systemd[1]: zsysd.service: Succeeded.
We can tweak the log level by requesting INFO messages and above with “1”, or a ton of logs with DEBUG and above with “2”:
## Change the log level to INFO
$ zsysctl service loglevel 1
## Making a client request
$ zsysctl service refresh
## Looking at the logs, note the INFO messages
$ journalctl -b -u zsysd
mai 12 10:32:21 casanier systemd[1]: Starting ZSYS daemon service...
mai 12 10:32:21 casanier systemd[1]: Started ZSYS daemon service.
mai 12 10:32:44 casanier zsysd[81729]: level=info msg="new incoming request Refresh() for \"ae564ded:eaa4f052\""
mai 12 10:32:44 casanier zsysd[81729]: level=info msg="[[ae564ded:eaa4f052]] Requesting a refresh"
mai 12 10:33:44 casanier systemd[1]: zsysd.service: Succeeded.
INFO level will log any incoming request with its request identifier, in addition to the more important messages. Note that each request has an identifier. It is split into two parts, both generated by the service: “[[ae564ded:eaa4f052]]”. The first part, before “:”, is the client ID. Multiple requests made by the same process (for instance, when we ask for your feedback, like a confirmation for a state deletion that has dependencies) or by a future GUI will share a unique client ID so that you can track those requests. The second part identifies the request itself and will be different on each gRPC call between client and daemon.
## Change the log level to INFO and make a call with multiple requests from the same instance
$ zsysctl service loglevel 1
## Client will call the service twice
$ zsysctl state remove 5uoaut ## <- first gRPC request
rpool/USERDATA/didrocks_wbdgr3@autozsys_5uoaut has a dependency on some datasets:
- rpool/myclone
Would you like to proceed [y/N]? y ## <- second gRPC request
## Let's check this
$ journalctl -b -u zsysd
mai 12 10:48:19 casanier zsysd[94405]: level=info msg="new incoming request RemoveUserState() for \"08a2f44c:1871e35b\""
mai 12 10:48:19 casanier zsysd[94405]: level=info msg="[[08a2f44c:1871e35b]] Requesting to remove user state \"5uoaut\" for user didrocks"
mai 12 10:48:22 casanier zsysd[94405]: level=info msg="new incoming request RemoveUserState() for \"08a2f44c:1cfb125a\""
mai 12 10:48:22 casanier zsysd[94405]: level=info msg="[[08a2f44c:1cfb125a]] Requesting to remove user state \"5uoaut\" for user didrocks"
You will note here that the 2 requests were made consecutively by the same client (the same process), identified by “08a2f44c”, but each call has a different trailing identifier.
Note that if multiple requests are running in parallel (all read-only calls can run in parallel, for instance), the logs will be interleaved. However, each log line on the daemon side is prefixed with its identifier, like [[08a2f44c:1cfb125a]], so that you can follow clients and requests in flight and their impacts.
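For instance, a simple way (not from the original post) to pull out everything related to one client or one request is to filter the journal on those identifiers:
## Follow everything a given client did (all its requests share "08a2f44c")
$ journalctl -b -u zsysd | grep '08a2f44c'
## Or narrow down to a single request
$ journalctl -b -u zsysd | grep '08a2f44c:1cfb125a'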
Log streaming
Checking logs on the service side is great, but I’m sure you will be more interested in seeing the logs attached to your specific request directly from the client!
Independently of the service log level itself, you can request streaming of the related logs to your client. They are filtered so that only the logs for your current request are streamed to that particular client, even when multiple requests are in flight in parallel from the same or different clients! This is controlled via the -v (INFO) or -vv (DEBUG) flags on any client request.
Here is an example with the service being at the default log level (WARNING and more), while the client is requesting debug logs:
$ zsysctl save -vv
level=debug msg="/zsys.Zsys/SaveUserState() call logged as [0cdb03d3:a318e109]"
DEBUG Check if grpc request peer is authorized
DEBUG Polkit call result, authorized: true
INFO Requesting to save state for user "didrocks"
DEBUG ZFS: trying to snapshot "rpool/USERDATA/didrocks_wbdgr3", recursive: false
DEBUG Trying to snapshot "rpool/USERDATA/didrocks_wbdgr3"
DEBUG property "com.ubuntu.zsys:mountpoint" on snapshot "rpool/USERDATA/didrocks_wbdgr3@autozsys_r8vem8": "/home/didrocks:local"
DEBUG property "com.ubuntu.zsys:canmount" on snapshot "rpool/USERDATA/didrocks_wbdgr3@autozsys_r8vem8": "on:local"
DEBUG ZFS: ending transaction
DEBUG ZFS: transaction done
DEBUG ignoring "rpool": either an orphan clone or not a boot, user or system datasets and canmount isn't on
DEBUG ignoring "bpool/BOOT": either an orphan clone or not a boot, user or system datasets and canmount isn't on
DEBUG ignoring "rpool/ROOT": either an orphan clone or not a boot, user or system datasets and canmount isn't on
DEBUG ignoring "rpool/USERDATA": either an orphan clone or not a boot, user or system datasets and canmount isn't on
DEBUG current machines scanning layout:
{
"All": {
"rpool/ROOT/ubuntu_e2wti1": {
"IsZsys": true,
"ID": "rpool/ROOT/ubuntu_e2wti1",
"LastUsed": "2020-05-12T08:39:34+02:00",
"Datasets": {
"bpool/BOOT/ubuntu_e2wti1": [
[…]
}
DEBUG ZFS: ending transaction
DEBUG ZFS: ending transaction
DEBUG ZFS: transaction done
DEBUG ZFS: transaction done
Successfully saved as "autozsys_r8vem8"
In the previous output, note that the service streams some DEBUG and INFO logs to the client, after signalling the associated requester and request ID. This also eases tracing those requests in the journal if you want to look at all the logs directly on the daemon side.
You can note in the above logs that we first validate authorization for a given request.
Polkit authorizations
Each request is checked given the current context (which user is requesting what), and this is evaluated by our authorizer framework against the polkit rules.
User, request and authorization
A successful authorization will log the following, after the polkit prompt:
DEBUG Check if grpc request peer is authorized
DEBUG Polkit call result, authorized: true
A denied authorization (wrong password, or a user not authorized by polkit to perform this action) will return an error to the user:
$ zsysctl service refresh
ERROR Permission denied: Polkit denied access
If we print debug logs, we can see the polkit request and denial:
$ zsysctl service refresh -vv
level=debug msg="/zsys.Zsys/Refresh() call logged as [330d1dc6:92f4ff3a]"
DEBUG Check if grpc request peer is authorized
DEBUG Polkit call result, authorized: false
level=error msg="Permission denied: Polkit denied access"
The root user will always be authorized and polkit won’t even be called:
$ sudo zsysctl service refresh -vv
level=debug msg="/zsys.Zsys/Refresh() call logged as [f581c74d:5dab3600]"
DEBUG Check if grpc request peer is authorized
DEBUG Authorized as being administrator
Some actions, like saving your own user state, are permitted by our default polkit policy, and no prompt for privilege escalation will be displayed:
$ zsysctl save -vv
level=debug msg="/zsys.Zsys/SaveUserState() call logged as [dbd64755:b9583f7c]"
DEBUG Check if grpc request peer is authorized
DEBUG Polkit call result, authorized: true
INFO Requesting to save state for user "didrocks"
[…]
Default policies and relaxing rules
The default policy is installed on the system at /usr/share/polkit-1/actions/com.ubuntu.zsys.policy. This can be relaxed by system administrators through polkit rules.
Let’s have a look at our default rule set. By default, in summary, people can:
- Always list machines and the states associated to machines (rule com.ubuntu.zsys.system-list)
- Always save and remove their own states (rule com.ubuntu.zsys.user-write-self)
- Need administrator permissions to save or remove other users’ states (rule com.ubuntu.zsys.user-write-others)
- Need administrator permissions to save or remove system states (rule com.ubuntu.zsys.system-write)
- Need administrator permissions for any service management (rule com.ubuntu.zsys.manage-service)
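You can inspect how each of these actions is configured on your system with polkit’s own tooling, for instance (not from the original post):
## Show the implicit authorizations currently registered for one zsys action
$ pkaction --action-id com.ubuntu.zsys.system-write --verbose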
For a long-running process, like a future GUI, we specified auth_admin_keep as the default value to avoid asking for permissions every single time. This is why you only get one polkit permission request when destroying states that have dependencies, even though this is made of 2 consecutive requests. However, as every command line execution is a new process, the permission will be requested on each invocation.
When you are root (via sudo), polkit is bypassed entirely, so none of this applies.
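As an illustration of relaxing a rule, here is a hypothetical LocalAuthority override (polkit 0.105, the version shipped on Ubuntu 20.04; the file name and the sudo group are examples, not something ZSys ships) that lets members of the sudo group save or remove system states without being prompted again:
## Hypothetical override, not shipped by ZSys: adapt the identity and action to your needs
$ cat /etc/polkit-1/localauthority/50-local.d/zsys-relax.pkla
[Allow sudo group to manage system states]
Identity=unix-group:sudo
Action=com.ubuntu.zsys.system-write
ResultActive=yes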
Remaining service management commands
Those commands are more for debugging purposes and shouldn’t be needed directly by users. They are exercised in particular by internal processing or by our apport hook, which collects a lot of information on your system to help debug it.
Just a note: if you report bugs, please use ubuntu-bug zsys to report anything, or follow the instructions in the upstream github repo template, to help us collect all the necessary information. :)
zsysctl service dump
This dumps a bunch of info on the current state of the world that ZSys sees. You will see all machines, current machines, current unmanaged ZFS datasets and more.
zsysctl service gc
This is what is called by systemd timers to call for state garbage collection, as explained in our previous post.
zsysctl service refresh
This forces ZSys to rescan all ZFS datasets on the system and take them into account. Note that this isn’t needed in general, as the refresh is done for you by ZSys, and we will have hooks for when users make changes outside of ZSys.
zsysctl service reload
This forces reloading any zsys.conf
configuration, as explained in a previous blog post.
zsysctl service status
This is used internally to check whether zsysd is healthy.
zsysctl service stop
It forces the daemon to shut down immediately rather than after its idle timeout.
zsysctl service trace
This command starts profiling until you exit it yourself or until the duration is over. The default is CPU profiling with a 30s timeout. You can also trigger a pprof memory profile or a systrace.
Systemd services, timers and more system administrator tools
Another way for system administrators to control the service is to know what triggers zsysd periodically and how. You can manage most of these with systemd tooling. Here is a summary:
Periodic user state saves
Connected users have their user state saved periodically by a systemd user timer associated with its user service:
$ cat /usr/lib/systemd/user/zsys-user-savestate.timer
[Unit]
Description=Save current user state periodically
[Timer]
OnStartupSec=1min
OnUnitActiveSec=1h
[Install]
WantedBy=timers.target
$ cat /usr/lib/systemd/user/zsys-user-savestate.service
[Unit]
Description=Save current user state periodically
# We can't run it in a container
ConditionVirtualization=!container
[Service]
Type=oneshot
ExecStart=/sbin/zsysctl state save
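If the hourly cadence doesn’t fit your needs, you can adjust it with a standard systemd drop-in instead of editing the packaged unit. A hypothetical example (the 4h value is only an illustration):
## Hypothetical drop-in created with "systemctl --user edit zsys-user-savestate.timer"
$ cat ~/.config/systemd/user/zsys-user-savestate.timer.d/override.conf
[Timer]
OnUnitActiveSec=
OnUnitActiveSec=4h
## Reload and check when the timer will fire next
$ systemctl --user daemon-reload
$ systemctl --user list-timers zsys-user-savestate.timer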
System state saves on apt changes
System state saves are made in two parts. Before the first apt operation (in case multiple of them are processed by do-release-upgrade, see the [corresponding blog post](https://didrocks.fr/2020/05/26/zfs-focus-on-ubuntu-20.04-lts-zsys-general-presentation/) for more details), we snapshot the current system state (and all user states) in an autogenerated state save. Then, after the first apt operation, we rebuild the boot menu to take the newly created state save into account. This will allow reverting when needed. We delay the menu rebuild until after the first apt operation to limit the time you wait for the new software to be installed, upgraded or removed: building the bootloader menu can indeed take some time.
$ apt <something>
INFO Requesting to save current system state
Successfully saved as "autozsys_ip60to"
[installing/removing/upgrading package(s)]
INFO Updating GRUB menu
Both hooks are listed in /etc/apt/apt.conf.d/90_zsys_system_autosnapshot and call an intermediate shell script to handle do-release-upgrade’s successive apt calls.
State garbage collector triggering
Similarly to user state saving, we have a system timer and unit running once a day:
$ cat /lib/systemd/system/zsys-gc.timer
[Unit]
Description=Clean up old snapshots to free space
[Timer]
OnStartupSec=10min
OnUnitActiveSec=24h
[Install]
WantedBy=timers.target
$ cat /lib/systemd/system/zsys-gc.service
[Unit]
Description=Clean up old snapshots to free space
# We can't run it in a container
ConditionVirtualization=!container
[Service]
Type=oneshot
ExecStart=/sbin/zsysctl service gc
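If you want to know when the next collection will happen, or trigger one right away, the usual systemd commands apply (not from the original post):
## When will the garbage collection run next?
$ systemctl list-timers zsys-gc.timer
## Trigger a collection immediately (equivalent to: sudo zsysctl service gc)
$ sudo systemctl start zsys-gc.service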
Zsysd activation
The ZSys daemon, zsysd, is socket-activated (the unix socket is /run/zsysd.sock), with a socket unit associated to its service:
$ cat /lib/systemd/system/zsysd.socket
[Unit]
Description=Socket activation for zsys daemon
PartOf=zsysd.service
[Socket]
ListenStream=/run/zsysd.sock
[Install]
WantedBy=sockets.target
$ cat /lib/systemd/system/zsysd.service
[Unit]
Description=ZSYS daemon service
After=zsysd.socket
Requires=zsysd.socket
# We can't run it in a container as there is no ZFS namespaces
ConditionVirtualization=!container
[Service]
Type=notify
ExecStart=/sbin/zsysd
# Some daemon restrictions
NoNewPrivileges=true
PrivateNetwork=true
The service notifies systemd once it is ready and picks up the incoming request pending on the socket once activated. After a minute of idling without any new request, it stops.
If no socket was created and you execute zsysd manually, the daemon will attempt to create one itself, and will refuse to start if it can’t.
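You can observe this on-demand lifecycle with a few standard commands (not from the original post):
## The socket unit is always listening
$ systemctl status zsysd.socket
## Any client call wakes the daemon up through the socket…
$ zsysctl service status
## …and the service stays active for about a minute of idle time before stopping
$ systemctl status zsysd.service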
Early bootup
We are using the new ZFS mount generator to speed up the boot process: it generates .mount files and lets systemd mount the paths when required:
$ ls /run/systemd/generator/*.mount
/run/systemd/generator/boot-efi.mount
/run/systemd/generator/boot-grub.mount
/run/systemd/generator/boot.mount
/run/systemd/generator/home-didrocks.mount
/run/systemd/generator/root.mount
/run/systemd/generator/srv.mount
/run/systemd/generator/usr-local.mount
/run/systemd/generator/var-games.mount
/run/systemd/generator/var-lib-AccountsService.mount
/run/systemd/generator/var-lib-apt.mount
/run/systemd/generator/var-lib-dpkg.mount
/run/systemd/generator/var-lib.mount
/run/systemd/generator/var-lib-NetworkManager.mount
/run/systemd/generator/var-log.mount
/run/systemd/generator/var-mail.mount
/run/systemd/generator/var-snap.mount
/run/systemd/generator/var-spool.mount
/run/systemd/generator/var-www.mount
All of them but boot-efi.mount and boot-grub.mount, which come from /etc/fstab (through the systemd fstab generator), are created by the ZFS mount generator.
If nothing has changed since the last boot, this is generally a no-op and the .mount files are generated as upstream intended. ZSys stays out of the way in that case, which is the most common one.
However, if you are reverting to a previous system state, or if some cache is not available (the /etc/zfs/zpool.cache or /etc/zfs/zfs-list.cache/* files being empty), we go a little bit further and try to reconcile with the desired state. Under those conditions, the hidden synchronous command zsysd boot-prepare is executed, which clones any needed datasets and associates user states to them: the current user states if no user revert was selected, or previous user states if the user revert option was selected. Then we regenerate the cache list and let the generator do its magic with the .mount file generation.
If anything goes wrong, you will be dropped into the emergency target so that you can debug and fix it. At worst, you should be able to reboot on the current state without any glitches!
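If you want to poke at what the generator worked from and what it produced, here are a couple of checks (not from the original post; this assumes your pool is named rpool):
## The dataset cache the ZFS mount generator reads (one line per dataset)
$ head -n 3 /etc/zfs/zfs-list.cache/rpool
## One of the generated mount units, as seen by systemd
$ systemctl status var-log.mount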
Committing current boot
Preparing the system is great, but ensuring that we reboot successfully is interesting as well. The last ZSys service starts after we hit default.target (which is a symlink to graphical.target) to commit the boot as successful. This updates the current boot time (last-used) and records which kernel we successfully booted with (to ensure future saved states know which kernel to rely on).
If we were reverting to a previous state (the current state being different from the last successful boot state), we recursively promote (in terms of ZFS promotion) all current system and user datasets from the current states, so that the other datasets become clones of them (all snapshots taken before the revert point will migrate to them). Then, if necessary, we update the bootloader menu to store all that and put the current state on top of the menu.
The service in charge of that calls the hidden zsysctl boot commit command.
$ cat /lib/systemd/system/zsys-commit.service
[Unit]
Description=Mark current ZSYS boot as successful
After=default.target
Requires=default.target
# We can't run it in a container
ConditionVirtualization=!container
[Service]
Type=oneshot
RemainAfterExit=yes
ExecStart=/sbin/zsysctl boot commit
# Some daemon restrictions
NoNewPrivileges=true
PrivateNetwork=true
[Install]
WantedBy=default.target
Note that both zsysd boot-prepare and zsysctl boot commit are idempotent.
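To confirm that the commit happened on the current boot, you can check the service directly (not from the original post):
## Did the current boot get committed as successful?
$ systemctl status zsys-commit.service
$ journalctl -b -u zsys-commit.service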
Final note
We hope that gives you, as a sysadmin, some information and tools to understand a little bit more how ZSys integrates with the system. We tried to shed some light on how you can tweak it with traditional systemd administration tooling, by annotating and changing parameters in various service files. Logs can help you monitor and understand what exactly is happening on your system.
By the way, an awesome and easy way to contribute and help without much programming knowledge is to contribute on logs (do we have too many of them, are they using the right logging level, are any missing or unclear and in need of rewording?). If you are interested in such contributions, do not hesitate to head over to the ZSys repository and open issues or PRs…
We are now going to start looking in detail at how all of this interacts with the underlying lower-level material, like partitions, ZFS datasets and various inherent properties. See you there :)
Meanwhile, join the discussion via the dedicated Ubuntu discourse thread.