AuriStor File System News

The AuriStorFS v2021.05-38 release is an important update for all systems.

New v2021.05-38 (29 February 2024)

As with other AuriStorFS releases since the beginning of 2024, this release includes additional improvements to the Rx RPC implementation related to the possibility of silent data corruption when Rx jumbograms are in use. Prior releases disabled the negotiation of Rx jumbograms so that a v2021.05-37 Rx peer refuses to send Rx jumbograms and requests that the remote peer not send them. However, a bad actor could choose to send Rx jumbograms even after being asked not to. v2021.05-38 introduces additional protections to ensure that a corrupt Rx jumbogram is dropped instead of being accepted.
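
These notes do not describe how v2021.05-38 detects a corrupt jumbogram. The C sketch below shows one plausible receive-side guard, not the AuriStorFS implementation. It assumes the OpenAFS-style constants RX_PACKET_TYPE_DATA (1) and RX_JUMBO_PACKET (32), and it uses a hypothetical pre-parsed header struct. When jumbograms were not negotiated, the guard drops a DATA packet in two cases: if the packet still carries the jumbo flag, or if its payload is larger than a single Rx packet can be. The second case is what a jumbogram with corrupted flags would look like.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Values as defined in OpenAFS-derived Rx headers; treat as assumptions. */
#define RX_PACKET_TYPE_DATA 1
#define RX_JUMBO_PACKET     32

/* Hypothetical summary of an already-parsed Rx packet header. */
struct rx_pkt_summary {
    uint8_t type;
    uint8_t flags;
    size_t  payload_len;   /* bytes following the Rx header */
};

/* Decide whether a packet may be accepted from a peer that was asked not to
 * send jumbograms. max_payload is the largest payload of a single packet. */
static bool accept_without_jumbograms(const struct rx_pkt_summary *p,
                                      size_t max_payload)
{
    if (p->type != RX_PACKET_TYPE_DATA)
        return true;                /* only DATA packets are checked here */
    if (p->flags & RX_JUMBO_PACKET)
        return false;               /* peer ignored the "no jumbograms" request */
    if (p->payload_len > max_payload)
        return false;               /* too big for one packet: possibly a
                                       jumbogram whose flags were corrupted */
    return true;
}
```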

The v2021.05-38 Rx RPC implementation also includes two optimizations. First, when an Rx initiator completes a call, it no longer sends an extra ACK packet to the Rx acceptor of the completed call. Sending this unnecessary ACK creates additional work for the server, which can increase latency for other calls being processed by the server.

Second, all AuriStor Rx services require a reach check for incoming calls from Rx peers. The check helps protect against Distributed Reflection Denial of Service (DRDoS) attacks and avoids executing RPCs whose responses cannot be delivered to the caller. A new reach check is required for each new call that arrives more than 60 seconds after the prior reach check completed. v2021.05-38 Rx treats the successful acknowledgment of a response DATA packet as a reach check validation. With this change, reach checks are no longer periodically required for a peer that completes at least one call every 60 seconds. Each avoided reach check saves a delay of one round-trip time (RTT). In addition, each reach check requires the service to process an additional ACK packet, so eliminating a large number of reach checks can improve overall service performance.
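
The reach check rule can be modelled with a single per-peer timestamp. This is a sketch, not AuriStor's code, and the struct and function names are hypothetical. Two events refresh the timestamp: a completed reach check and, from v2021.05-38 onwards, an acknowledged response DATA packet. As a result, a peer that completes at least one call per 60 seconds never needs another check.

```c
#include <stdbool.h>
#include <stdint.h>

#define REACH_CHECK_INTERVAL_SECS 60  /* interval stated in the release notes */

/* Hypothetical per-peer reachability state. */
struct peer_reach {
    bool    confirmed;       /* has the peer ever been confirmed reachable? */
    int64_t last_confirmed;  /* time of the last confirmation, in seconds */
};

/* A new incoming call needs a reach check if the peer was never confirmed
 * or the last confirmation is more than 60 seconds old. */
static bool reach_check_required(const struct peer_reach *p, int64_t now)
{
    return !p->confirmed ||
           now - p->last_confirmed > REACH_CHECK_INTERVAL_SECS;
}

/* Called when a reach check completes and, in the v2021.05-38 model, also
 * when the peer acknowledges a response DATA packet. */
static void peer_reach_confirmed(struct peer_reach *p, int64_t now)
{
    p->confirmed = true;
    p->last_confirmed = now;
}
```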

The final Rx RPC change in this release is specific to kernel implementations. Prior releases restricted time-scheduled Rx events to a granularity no smaller than 500ms. As a result, an RTO timer event for a lost packet could not fire sooner than 500ms, even if the measured RTT for the connection was significantly smaller. The minimum RTO for a connection in AuriStor Rx is 200ms, so the inability to schedule shorter timeouts delayed recovery from packet loss.
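
A simplified model shows why a 500ms scheduling granularity defeats the 200ms RTO floor. The sketch assumes that a coarse scheduler rounds every timeout up to the next multiple of its granularity. The function names are hypothetical, and the actual kernel event code is not described in these notes.

```c
#include <stdint.h>

#define RX_MIN_RTO_MS 200u  /* minimum RTO stated in the release notes */

/* Clamp a computed retransmission timeout to the 200ms floor. */
static uint32_t clamp_rto_ms(uint32_t computed_ms)
{
    return computed_ms < RX_MIN_RTO_MS ? RX_MIN_RTO_MS : computed_ms;
}

/* Simplified model: a scheduler that can only fire on multiples of
 * granularity_ms rounds each requested timeout up to the next tick. */
static uint32_t effective_timeout_ms(uint32_t timeout_ms, uint32_t granularity_ms)
{
    if (granularity_ms <= 1)
        return timeout_ms;  /* fine-grained scheduler: no rounding */
    return ((timeout_ms + granularity_ms - 1) / granularity_ms) * granularity_ms;
}
```

In this model, a 200ms RTO scheduled with 500ms granularity fires after 500ms. With a fine-grained scheduler it fires after the intended 200ms.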

For client systems, the v2021.05-38 release contains fixes for two bugs that have caused system crashes on Linux when resource limits were exceeded, either by the system as a whole or by the process accessing /afs.

CrayOS SLES 5.14.21 is now a supported client platform.

The AuriStorFS v2021.05-37 release is an important update for all systems.

  • New Platforms

    • Linux 6.8 kernels
  • Rx improvements

    • The v2021.05-36 release permanently disabled all use of Rx jumbograms due to a risk of silent data corruption. However, a missing htonl() affected the number of acceptable datagrams advertised in the ACK trailer. On little-endian systems, the value was sent as 16777216 instead of 1.

    • When sending a PING ACK as a reachability test, ensure that the "previousPacket" field is properly assigned to the largest accepted DATA packet sequence number instead of zero.

    • Replace the initialization state flag with two flags: one indicating that Rx initialization began and another indicating that it succeeded. The first prevents multiple attempts at initialization after a failure. The second prevents shutdown from accessing uninitialized structures if initialization failed.

  • Cache Manager Improvements:

    • No longer refuse to start if both the 'cachedir' and 'memcache' options are present in the configuration file.

  • Location Service:

    • If the VLDB contains a corrupted multi-homed server entry, skip it and continue processing subsequent multi-homed server entries.

    • SVL_ChangeAddr RPC called by "vos changeaddr" is now capable of

      • replacing IPv4 addresses in a multi-homed server entry without corrupting it;
      • refusing to replace a server address with a loopback address; and
      • logging more details about the changes it makes to the VLDB.

  • vos command line tool:

    • vos examine can display volume information from busy volumes.

    • vos move now reports the correct destination vice partition if a ROVOL instance exists on a partition other than the requested partition.

    • The cell name is now added to "unknown to location service" error messages.

    • Use of the -srcport option introduced in v2021.05-35 could result in an assertion failure if the requested port number was already in use. This release configures the Rx stack to fall back to a random port number if the requested port is already in use.

  • Protection Service:

    • Prevent a potential NULL pointer dereference when converting single-component Kerberos v5 names whose component name is present in the Kerberos v4 service conversion table, for example "rcmd@REALM" or "http@REALM".
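
The missing htonl() in the v2021.05-36 ACK trailer is easy to reproduce in isolation. Rx encodes 32-bit header and trailer fields in network (big-endian) byte order. Writing the host value 1 without conversion on a little-endian machine therefore puts the bytes 01 00 00 00 on the wire. The peer decodes those bytes as 0x01000000, which is 16777216. The helper names below are illustrative and are not taken from the AuriStorFS sources.

```c
#include <arpa/inet.h>  /* htonl */
#include <stdint.h>
#include <string.h>

/* Buggy encoder: copies the value in host byte order. */
static void put_u32_host(unsigned char *buf, uint32_t v)
{
    memcpy(buf, &v, sizeof v);
}

/* Correct encoder: converts to network byte order first. */
static void put_u32_net(unsigned char *buf, uint32_t v)
{
    uint32_t n = htonl(v);
    memcpy(buf, &n, sizeof n);
}

/* What the remote peer does: read four bytes as a big-endian integer. */
static uint32_t get_u32_net(const unsigned char *buf)
{
    return ((uint32_t)buf[0] << 24) | ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |  (uint32_t)buf[3];
}
```

On a big-endian host, both encoders produce identical bytes, which is why the bug only appeared on little-endian systems.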

The AuriStorFS v2021.05-36 release is an important update for all systems.

  • Rx improvements

    • Permanently disable all use of Rx jumbograms due to a risk of silent data corruption.

      IBM-derived Rx RPC implementations since OpenAFS 1.0, and possibly earlier, include a race condition when transmitting Rx packets. The race is possible because nothing protects the data being copied into the kernel by sendmsg() after the Rx call lock is dropped at the start of packet transmission. It is critical that this packet data not be modified by another thread. However, races exist between the application, listener, and event threads that can lead to retransmissions starting whilst an original transmission is still in progress. This can cause the packet headers to be overwritten. The original transmission, the retransmission, or both can then send corrupt data to the peer.

      This corruption can affect the packet serial number or packet flags. Corrupted packet flags are particularly harmful. Multiple Rx packets that were intended to be sent as an Rx jumbogram can then be delivered and misinterpreted as a single large packet. The consequences depend on the Rx security class in use. The corruption can cause integrity errors (rxgk:crypt and rxgk:auth) or corruption of the data stream (rxnull, rxgk:clear or rxkad:auth).

      All AuriStorFS servers, OpenAFS 1.6 or later servers, and the Windows cache manager have been shipped with Rx jumbograms disabled by default. The UNIX cache managers, however, are shipped with jumbograms enabled. Many AFS cells around the world continue to deploy OpenAFS 1.4 or earlier fileservers, which still negotiate the use of Rx jumbograms.

      It is worth noting that all AuriStorFS v0.198 and later fileservers and cache managers implement explicit checks that will recognize the corrupted application data stream and prevent corrupted file content from being stored either into an AFS vnode's backing store in the volume's object store or the cache manager's AFS cache. OpenAFS cache managers and fileservers do not have these protections.

      With Rx jumbograms disabled, three limits change. The maximum number of Rx packets in a datagram is reduced from 6 to 1. The maximum number of send and receive datagram fragments is reduced from 4 to 1. The maximum advertised MTU is restricted to 1444, the maximum Rx packet size prior to the introduction of jumbograms in IBM AFS 3.5.

    • An RTO resend event may occur while packets are being written to the network. If it transitions the Rx call flow state from either the RECOVERY or RESCUE state to the LOSS state, cease transmission of any new DATA packets while there are packets in the resend queue.

    • When the call flow state is LOSS and all packets in the resend queue have been retransmitted and yet the recovery point has not been reached, then permit new DATA packets to be sent in order to maintain a full congestion window.

    • Add a safety check to prevent the estimated RTT from underflowing when the actual roundtrip time is smaller than 125us.

    • Fix the computation of the padding required for rxgk encrypted packets. This bug caused each packet to carry 8 bytes fewer than the network permits. It also accidentally prevented the construction of Rx jumbograms when a call is protected by rxgk:crypt.

    • Replace the random number generator with a more secure source of random bytes.

  • Cache Manager Improvements:

    • On Linux kernels with folio mapping functionality, prior releases of the AuriStorFS cache manager could trigger an infinite loop when getting a page. This release uses the new folio mapping functionality instead of page mapping when it is available.

    • afsd will now log the set of network interfaces in use whether or not rxbind is configured.

    • afsd will no longer drop user-defined mount options if SELinux is disabled.

    • Prevent possible memory corruption when listing tokens.

  • Volume Management Service:

    • Improved compliance with the 2009 Dump stream standard adopted by OpenAFS. The maximum length of a TLV tag and the indefinite length TLV tag are properly enforced.

    • When restoring a volume dump, the contents of the restore cookie now override the matching fields in all D_VOLUMEHEADER regions, not only the first.

  • vos command line tool:

    • vos rename could log the wrong volume location information if neither VLF_RWEXISTS nor VLF_BACKEXISTS is set in the volume location entry received from the location service.

    • vos examine will now display volume information for a busy volume if the source volserver is AuriStorFS v0.198 or later. The source volume of a vos release is reported in the VBUSY state even though it is online.
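
The two LOSS-state rules in the v2021.05-36 Rx improvements reduce to a single condition. While a call is in LOSS, new DATA packets wait until every packet in the resend queue has been retransmitted. The sketch below models only that condition. The enum values, struct, and function names are hypothetical, and congestion window accounting is deliberately left out.

```c
#include <stdbool.h>

/* Hypothetical labels for the Rx call flow states named in the notes. */
enum flow_state { FLOW_NORMAL, FLOW_RECOVERY, FLOW_RESCUE, FLOW_LOSS };

struct call_flow {
    enum flow_state state;
    unsigned        resend_pending;  /* queued for resend, not yet retransmitted */
};

/* May the sender transmit a DATA packet it has never sent before? */
static bool may_send_new_data(const struct call_flow *f)
{
    if (f->state != FLOW_LOSS)
        return true;  /* other states: limited by the congestion window (not modelled) */
    /* LOSS: retransmissions go first. Once the resend queue has been fully
     * retransmitted, new DATA may be sent to keep the congestion window full
     * even though the recovery point has not been reached. */
    return f->resend_pending == 0;
}

/* Record that one queued packet has been retransmitted. */
static void packet_retransmitted(struct call_flow *f)
{
    if (f->resend_pending > 0)
        f->resend_pending--;
}
```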

The AuriStorFS v2021.05-34 release is an important update for Linux client systems.

  • Linux cache manager improvements

    • v2021.05-33 introduced a critical bug for Linux cache managers. Creating a hard link produces an undercount of the linked inode's i_count. This undercount can result in a kernel module assertion failure if the inode is garbage collected due to memory pressure. The following message will be logged to dmesg:

           "yfs: inode freed while on LRU"

      followed by a kernel BUG report. This bug is fixed in v2021.05-34.

    • If the oom-killer terminates a process while it is executing within the AuriStorFS kernel module, memory allocations can fail. This can lead to failures reading from the auristorfs cache. This release includes additional logic to fail the cache request without triggering a NULL pointer dereference.

    • If the auristorfs disk cache filesystem is remounted read-only then the disk cache will become unusable. Instead of triggering a system panic when attempts to read or write fail, log a warning and fail the request.

The AuriStorFS v2021.05-33 release is a recommended update for all systems.

  • New Platforms

    • Ubuntu 23.10 Mantic Minotaur
    • Linux 6.7 kernels
  • New Features

    • -srcport option for vos, pts, bos

      Each execution of bos, pts and vos creates a new UDP socket bound to a new port number. Rapid execution of these commands results in short bursts of UDP traffic from multiple ports. These bursts might fill a router's port mapping table or be interpreted as a denial of service attack by security gateways. Communication failures have been observed when traversing AWS networks.

      The -srcport option permits bos, pts and vos to explicitly bind to a specific port. If the requested port is in use, then a random port will be used instead.

    • Apache 2 mod_auth_waklog module

      A major rewrite of the Apache2 mod_auth_waklog module for use with AuriStorFS and Linux kafs clients. The new mod_auth_waklog is built using libyfs_acquire for token acquisition and obtains both yfs-rxgk and rxkad tokens.

  • Linux cache manager improvements

    • Improved compatibility when AuriStorFS is used in conjunction with overlayfs.

  • Rx RPC network transport improvements

    • Improved accuracy of initial Rx call RTO value.
    • Reduced risk of an Rx BUSY response after the Rx initiator times out the prior call.
  • File server updates

    • The file service RemoveFile and RemoveDirectory RPCs are updated to return success whenever the requested directory entry can be removed even if the referenced object no longer exists.
    • The file service Lock RPCs are updated to populate the VolSync output metadata when read locks are granted on readonly and backup volumes. Note: most clients do not ask the file service to grant locks on readonly volumes.
    • The volume service will no longer process a DeleteVolume RPC when requested via an ITBusy transaction instead of an ITOffline transaction. Versions of vos movesite prior to v2021.05-33 removed the source RO volume using an ITBusy volserver transaction instead of an ITOffline transaction. This is problematic because the file service continues to serve data from a RO volume that is in use by an ITBusy transaction.
  • Cell service database updates

    • cellservdb.conf has been synchronized with the 31 Oct 2023 update to the CellServDB file.
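
The -srcport fallback behavior can be expressed with plain POSIX sockets: use the requested port, or a random one if it is already in use. This sketch assumes the tools use a single IPv4 UDP socket. bind_udp_with_fallback() and bound_port() are hypothetical helpers, not AuriStorFS functions. Binding to port 0 asks the kernel to choose an unused ephemeral port.

```c
#include <arpa/inet.h>   /* htonl, htons, ntohs */
#include <errno.h>
#include <netinet/in.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>
#include <unistd.h>

/* Bind a UDP socket to addr_host:port (addr_host in host byte order).
 * If the port is already in use, retry with port 0 so the kernel picks a
 * random free port. Returns the socket, or -1 on failure. */
static int bind_udp_with_fallback(uint32_t addr_host, uint16_t port)
{
    int fd = socket(AF_INET, SOCK_DGRAM, 0);
    if (fd < 0)
        return -1;

    struct sockaddr_in sin;
    memset(&sin, 0, sizeof sin);
    sin.sin_family = AF_INET;
    sin.sin_addr.s_addr = htonl(addr_host);
    sin.sin_port = htons(port);

    if (bind(fd, (struct sockaddr *)&sin, sizeof sin) == 0)
        return fd;
    if (errno == EADDRINUSE && port != 0) {
        sin.sin_port = 0;  /* fall back to a kernel-chosen port */
        if (bind(fd, (struct sockaddr *)&sin, sizeof sin) == 0)
            return fd;
    }
    close(fd);
    return -1;
}

/* Report the local port a socket is bound to (0 on error). */
static uint16_t bound_port(int fd)
{
    struct sockaddr_in sin;
    socklen_t len = sizeof sin;
    if (getsockname(fd, (struct sockaddr *)&sin, &len) != 0)
        return 0;
    return ntohs(sin.sin_port);
}
```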