udebug - Reports Ubik process status for a database service instance
udebug -server <server machine> [-port <IP port>] [-long [<print all info: yes|no>]] [-coordinator [<print coordinator info: quorum|unanimity>]] [-waitfor [<recovery state: quorum|founddb|havedb|sentdb|modifieddb>]] [-duration <seconds (0 = forever)>] [-help]
The udebug command displays the status of the Ubik database service identified by the -server and -port arguments. The output identifies the addresses where Ubik peers are running, which of them is the Ubik coordinator (synchronization site), and the status of the connections between them.
Combined with the -port argument, names the database service instance for which to display status information. Provide a fully qualified hostname or IP address.
Identifies the database server process for which to display status information, either by its process name, IANA service name or port number. Provide one of the following values:
ptserver, afs3-prserver or 7002 for the Protection Service.
vlserver, afs3-vlserver or 7003 for the Location Service (the default).
buserver, afs3-budbserver or 7021 for the Backup Service.
Reports additional information about each peer of the Ubik service named by the -server argument. The information appears by default if that instance is the Ubik coordinator.
Display the status of the active coordinator, if any. The default is quorum. If unanimity is specified, then output is generated only if the coordinator was elected by unanimous consent.
When combined with -coordinator, specifies a recovery state to wait for. By default, the command will wait forever. Use -duration to specify a finite number of seconds to wait. The output will be generated either after the recovery state has been reached or the wait duration has passed.
When combined with -waitfor, specifies the number of seconds to wait for the recovery state. The default value of 0 seconds means "wait forever".
Prints the online help for this command. All other valid options are ignored.
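As an illustration of how the options above combine, the following Python sketch assembles an argument vector from the documented flags. The function name build_udebug_command and its parameter names are hypothetical; only the flag names and their permitted values come from the synopsis. The checks encode the statements that -waitfor is combined with -coordinator and that -duration is combined with -waitfor; whether udebug itself rejects other combinations is not stated here.

```python
from typing import List, Optional

# Values taken from the synopsis of this page.
COORDINATOR_MODES = ("quorum", "unanimity")
RECOVERY_STATES = ("quorum", "founddb", "havedb", "sentdb", "modifieddb")


def build_udebug_command(server: str,
                         port: Optional[str] = None,
                         long_output: bool = False,
                         coordinator: Optional[str] = None,
                         waitfor: Optional[str] = None,
                         duration: Optional[int] = None) -> List[str]:
    """Assemble an argv list for udebug (hypothetical helper)."""
    argv = ["udebug", "-server", server]
    if port is not None:
        argv += ["-port", port]
    if long_output:
        argv.append("-long")
    if coordinator is not None:
        if coordinator not in COORDINATOR_MODES:
            raise ValueError("unknown coordinator mode: %s" % coordinator)
        argv += ["-coordinator", coordinator]
    if waitfor is not None:
        # Documented as "when combined with -coordinator".
        if coordinator is None:
            raise ValueError("-waitfor is documented for use with -coordinator")
        if waitfor not in RECOVERY_STATES:
            raise ValueError("unknown recovery state: %s" % waitfor)
        argv += ["-waitfor", waitfor]
    if duration is not None:
        # Documented as "when combined with -waitfor"; 0 means wait forever.
        if waitfor is None:
            raise ValueError("-duration is documented for use with -waitfor")
        if duration < 0:
            raise ValueError("duration must be 0 or a positive number of seconds")
        argv += ["-duration", str(duration)]
    return argv
```

The resulting list can be handed to subprocess.run() on a machine where udebug is installed.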
Several of the messages in the output provide basic status information about the Ubik service specified by the -server and -port arguments, and the remaining messages are useful mostly for debugging purposes.
To check basic Ubik status, issue the command for each Ubik service instance in turn. In the output for each, one of the following messages appears in the top third of the output.
I am coordinator . . . (<#_sites> servers)
I am not coordinator
For the Ubik coordinator, either of the following messages indicates that all sites have the same version of the database, which implies that Ubik is functioning correctly.
Recovery state 17 (have best db version; sync complete)
Recovery state 1f (have best db version; sync complete; db modified)
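The basic check described above can be scripted. The following Python sketch scans captured udebug output for the role message and the recovery state. The function name basic_status is hypothetical, and the sketch assumes each message starts at the beginning of its own line, as in the examples on this page.

```python
import re
from typing import Optional, Tuple

# Recovery states that this page describes as fully synchronized.
SYNCED_STATES = {0x17, 0x1F}
STATE_RE = re.compile(r"^Recovery state ([0-9a-fA-F]+)\b", re.MULTILINE)


def basic_status(output: str) -> Tuple[str, Optional[bool]]:
    """Return (role, synchronized) for one udebug report.

    role is "coordinator", "secondary" or "unknown"; synchronized is None
    when the report contains no "Recovery state" line.
    """
    role = "unknown"
    for line in output.splitlines():
        text = line.strip()
        if text.startswith("I am not coordinator"):
            role = "secondary"
        elif text.startswith("I am coordinator"):
            role = "coordinator"
    match = STATE_RE.search(output)
    synced = None if match is None else int(match.group(1), 16) in SYNCED_STATES
    return role, synced
```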
For correct Ubik operation, time must be synchronized across all machines hosting Ubik service instances. The following messages, which are the third and fourth lines in the output, report the current date and time according to the host's clock and the clock on the system where the udebug command is issued.
Host's <IP_addr> time is <dbserver_date/time>
Local time is <local_date/time> (time differential <skew> secs)
The <skew> is the difference between the Ubik service clock and the local clock. Its absolute value is not vital for Ubik functioning, but a difference of more than two seconds between the skew values for the Ubik service instances indicates that their clocks are not synchronized and Ubik performance can be hampered.
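The comparison of skew values across service instances can be sketched in Python as follows. The names parse_skew and clocks_look_synchronized are hypothetical, and the sketch assumes the skew is printed as a whole number of seconds, as in the examples on this page.

```python
import re
from typing import Dict

SKEW_RE = re.compile(r"\(time differential (-?\d+) secs\)")


def parse_skew(output: str) -> int:
    """Extract <skew> from the 'Local time is' message of one report."""
    match = SKEW_RE.search(output)
    if match is None:
        raise ValueError("no time differential found in output")
    return int(match.group(1))


def clocks_look_synchronized(skews: Dict[str, int], tolerance: int = 2) -> bool:
    """Compare the skews reported for several instances.

    Each skew is measured against the same local clock, so the spread
    between the largest and smallest value estimates how far the server
    clocks are from each other. A spread of more than two seconds is the
    threshold given in the text.
    """
    values = list(skews.values())
    return max(values) - min(values) <= tolerance
```

For example, skews of -2 and 0 seconds are within tolerance, whereas -2 and 1 are not.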
Following is a description of all messages in the output. As noted, it is useful mostly for debugging and most meaningful to someone who understands Ubik's implementation.
The output begins with the following messages. The first message reports the IPv4 or IPv6 address the response was received from. The second message reports the IPv4 addresses that are configured with the operating system on the machine specified by the -server argument. As previously noted, the third and fourth messages report the current date and time according to the clocks on the database server machine and the machine where the udebug command is issued, respectively. All subsequent timestamps in the output are expressed in terms of the local clock rather than the database server machine clock.
First response received from <IP_addr>
Host's addresses are: <list_of_IPv4_addrs>
Host's time is <dbserver_date/time>
Local time is <local_date/time> (time differential <skew> secs)
If the udebug command is issued during the coordinator election process and voting has not yet begun, the following message appears next.
Last yes vote not cast yet
Otherwise, the output continues with the following messages.
Last yes vote for <IPv4_addr> was <last_vote> secs ago (coordinator);
Last vote started <vote_start> secs ago (at <date/time>)
Local db version is <db_version>
The first message indicates which peer this Ubik process last voted for as coordinator (it can vote for itself) and how long ago it sent the vote. The second message indicates how long ago the Ubik coordinator requested confirming votes from the secondary sites. Usually, the <last_vote> and <vote_start> values are the same; a difference between them can indicate clock skew or a slow network connection between the two database server machines. A small difference is not harmful. The third message reports the current version number <db_version> of the database maintained by this Ubik process. It has two fields separated by a period. The field before the period is the ubik epoch when the database was modified, and the field after the period indicates the ubik transaction id that made the change. A new ubik epoch begins whenever a coordinator is newly elected or when the transaction id exceeds 1073741823 (0x3fffffff).
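The two-field layout of <db_version> can be illustrated with a short Python sketch. The names DbVersion and parse_db_version are hypothetical. Ordering versions by epoch first and transaction id second is an assumption made for illustration; this page does not spell out how Ubik compares versions.

```python
from typing import NamedTuple

# Largest transaction id before a new ubik epoch begins (0x3fffffff).
MAX_TRANSACTION_ID = 0x3FFFFFFF


class DbVersion(NamedTuple):
    epoch: int        # field before the period
    transaction: int  # field after the period


def parse_db_version(text: str) -> DbVersion:
    """Split a version such as '1634562705.302' into its two fields."""
    epoch_text, transaction_text = text.strip().split(".", 1)
    return DbVersion(int(epoch_text), int(transaction_text))
```

Because DbVersion is a tuple, two parsed versions can be compared directly with == or <.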
The output continues with messages that differ depending on whether the Ubik process is the coordinator or not.
If there is only one Ubik site eligible to run for election (that is, only one non-clone site), it is always the coordinator, as indicated by the following message.
I am coordinator forever (1 server)
If there are multiple Ubik sites, and the response is from the coordinator, the output continues with the following two messages.
I am coordinator until <expiration> secs from now (at <date/time>) (<#_sites> servers)
Recovery state <flags> (<explanation of flags>)
The first message (which is reported on one line) reports how much longer the site remains coordinator even if the next election fails to complete, and how many sites are participating in the quorum. The flags field in the second message is a hexadecimal number that indicates the current state of the quorum. A value of 17 or 1f indicates complete database synchronization, whereas a value of 7 means that the coordinator has the correct database but cannot contact all secondary sites to determine if they also have it. Lesser values are acceptable if the udebug command is issued during coordinator election, but they denote a problem if they persist. The individual flags have the following meanings:
0x1: This machine is the coordinator.
0x2: The coordinator has determined which site has the database with the highest version number.
0x4: The coordinator has a copy of the database with the highest version number.
0x8: The database has been updated since the start of the current ubik epoch.
0x10: All sites have the database with the highest version number.
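Decoding the hexadecimal flags field can be sketched in Python as follows. The sketch assumes the five flags occupy bits 0x1, 0x2, 0x4, 0x8 and 0x10 in the order listed above, which is consistent with the descriptions of the values 7, 17 and 1f. The short labels and function names are hypothetical.

```python
from typing import List

# (bit, label) pairs; the bit assignment is an assumption, see the lead-in.
RECOVERY_FLAGS = [
    (0x01, "is coordinator"),
    (0x02, "found best db version"),
    (0x04, "has best db version"),
    (0x08, "db modified in current epoch"),
    (0x10, "all sites have best db version"),
]


def decode_recovery_state(flags_hex: str) -> List[str]:
    """Return the labels of the flags set in a value such as '1f'."""
    value = int(flags_hex, 16)
    return [label for bit, label in RECOVERY_FLAGS if value & bit]


def is_fully_synchronized(flags_hex: str) -> bool:
    """True for 17 and 1f: every flag except 'db modified' must be set."""
    return int(flags_hex, 16) & 0x17 == 0x17
```

Under these assumptions, decode_recovery_state("7") yields the first three labels, matching the case where the coordinator has the correct database but has not confirmed that all secondary sites do.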
If the udebug command is issued while the coordinator is writing a change into the database, the following additional message appears.
I am currently managing write transaction <transaction identifier>
Otherwise, the coordinator reports the following message.
The last trans I handled was <transaction identifier>
If the response is from a site that is a secondary site (not the Ubik coordinator), the output continues with the following messages.
I am not coordinator
Lowest host <lowest_IPv4_addr> was set <low_time> secs ago
Coordinator host <sync_IPv4_addr> was set <sync_time> secs ago
The <lowest_IPv4_addr> is the lowest IPv4 address of any peer from which the Ubik process has received a message recently, whereas the <sync_IPv4_addr> is the primary IPv4 address of the current coordinator. If they differ, the machine with the lowest IPv4 address is not currently the coordinator. The Ubik process continues voting for the current coordinator as long as they remain in contact, which provides for maximum stability. However, in the event of communication loss with the coordinator, this Ubik site votes for the <lowest_IPv4_addr> site instead (assuming they are in contact), because it has a bias to vote in elections for the site with the lowest IPv4 address.
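The voting preference described above can be modeled with a few lines of Python. This is a deliberately simplified sketch, not Ubik's actual election code: it ignores timing, clones and vote expiry, and the function name choose_vote is hypothetical.

```python
import ipaddress
from typing import Iterable, Optional


def choose_vote(current_coordinator: Optional[str],
                reachable_peers: Iterable[str]) -> Optional[str]:
    """Pick the peer a site would favor under the stated bias.

    Keep voting for the current coordinator while it is reachable;
    otherwise fall back to the reachable peer with the lowest IPv4 address.
    """
    peers = set(reachable_peers)
    if current_coordinator in peers:
        return current_coordinator
    if not peers:
        return None
    # Compare numerically, not as strings, so .9 sorts below .24.
    return min(peers, key=ipaddress.IPv4Address)
```

Comparing parsed addresses rather than strings matters: as text, 192.168.154.24 sorts before 192.168.154.9, but numerically it is the higher address.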
For both the Ubik coordinator and secondary sites, the output continues with the following messages. The first message reports the version number of the database at the coordinator, which needs to match the <db_version> reported by the preceding Local db version message. The second message indicates how many database pages are currently locked for any operation or for writing in particular. The values are nonzero if the udebug command is issued while a read or write transaction is in progress.
Coordinator's db version is <db_version>
<locked> locked pages, <writes> of them for write
The following message appears next only if one or more read transactions are actively reading from the database.
There are read locks held
The following message appears next only if there is an active write transaction committing changes to the database.
There are write locks held
Similarly, one or more of the following messages appear next only if there are any read or write transactions in progress when the udebug command is issued:
There is an active write transaction
There is at least one active read transaction
Transaction tid is <transaction identifier>
If the response was received from the coordinator, the next message reports when the current coordinator last updated the database.
Last time a new db version was labelled was: <last_epoch_time> secs ago (at <date/time>)
If the machine named by the -server argument is the coordinator, the output concludes with an entry for each secondary site that is participating in the quorum, in the following format.
Server (<IPv4_address>): (db <db_version>)
    last vote rcvd <last_vote> secs ago (at <date/time>),
    last beacon sent <last_beacon> secs ago (at <date/time>), last vote was { yes | no }
    dbcurrent={ 0 | 1 }, up={ 0 | 1 } beaconSince={ 0 | 1 }
The first line reports the site's primary IPv4 address and the version number of the database it possesses. The <last_vote> field reports how long ago the coordinator received a vote message from the Ubik process at the site, and the <last_beacon> field reports how long ago the coordinator last requested a vote message. The coordinator sends beacon messages only to sites that are "up". The following messages appear if the secondary site has yet to receive a beacon message.
Last vote never rcvd
Last beacon never sent
On the final line of each entry, the fields have the following meaning:
dbcurrent is 1 if the site has the database with the highest version number, 0 if it does not.
up is 1 if the Ubik process at the site is reachable from the coordinator, 0 if it is not.
beaconSince is 1 if the site has cast a vote since the last time the coordinator lost connectivity with the site, 0 if it has not.
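The final line of a peer entry is easy to parse mechanically. The Python sketch below extracts the three fields as booleans; the function name parse_peer_status is hypothetical, and the pattern assumes the punctuation shown in the format above (a comma after dbcurrent, whitespace before beaconSince).

```python
import re
from typing import Dict

STATUS_RE = re.compile(r"dbcurrent=([01]),\s*up=([01])\s+beaconSince=([01])")


def parse_peer_status(line: str) -> Dict[str, bool]:
    """Turn 'dbcurrent=1, up=1 beaconSince=1' into a dict of booleans."""
    match = STATUS_RE.search(line)
    if match is None:
        raise ValueError("not a peer status line: %r" % line)
    dbcurrent, up, beacon_since = (field == "1" for field in match.groups())
    return {"dbcurrent": dbcurrent, "up": up, "beaconSince": beacon_since}
```

A peer for which every value is True holds the current database, is reachable, and has voted since connectivity was last lost.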
Including the -long flag produces peer entries even when the response is received from a secondary site, but in that case only the IPv4_address field is guaranteed to be accurate. For example, the value in the <db_version> field is usually 0.0, because secondary sites do not poll their peers for this information. The values in the last_vote and last_beacon fields indicate when this site last received or requested a vote as coordinator; they generally indicate the time of the last coordinator election.
This example checks the status of the Ubik process for the Location Service on the machine db1.your-cell-name.com, which is the Ubik coordinator.
% udebug db1.your-cell-name.com vlserver
First response received from [2001:db8::d0d7:d546:dd6:60ef]:7003
Host's addresses are: 192.168.154.37
Host's time is Sat Oct 23 16:55:26 2021
Local time is Sat Oct 23 16:55:26 2021
Last yes vote for 192.168.154.37 was 3 secs ago (coordinator);
Last vote started 3 secs ago (at Sat Oct 23 16:55:22 2021)
Local db version is 1634562705.302
I am coordinator until 18 secs from now (at Sat Oct 23 16:55:43 2021) (4 servers)
Recovery state 1f (have best db version; sync complete; db modified)
The last trans I handled was 1634562705.17630
Coordinator's db version is 1634562705.302
0 locked pages, 0 of them for write
Last time a new db version was labelled was: 459821 secs ago (at Mon Oct 18 09:11:44 2021)
Server (192.168.154.24): (db 1634562705.302)
    last vote rcvd 3 secs ago (at Sat Oct 23 16:55:22 2021),
    last beacon sent 3 secs ago (at Sat Oct 23 16:55:22 2021), last vote was yes
    dbcurrent=1, up=1 beaconSince=1
Server (192.168.0.252 172.16.16.66): (db 1634562705.302)
    last vote rcvd 3 secs ago (at Sat Oct 23 16:55:22 2021),
    last beacon sent 3 secs ago (at Sat Oct 23 16:55:22 2021), last vote was yes
    dbcurrent=1, up=1 beaconSince=1
Server (192.168.154.47): (db 1634562705.302)
    last vote rcvd 3 secs ago (at Sat Oct 23 16:55:22 2021),
    last beacon sent 3 secs ago (at Sat Oct 23 16:55:22 2021), last vote was yes
    dbcurrent=1, up=1 beaconSince=1
This example checks the status of the Protection Service on the machine with IP address 192.168.154.47, which is a secondary site.
% udebug 192.168.154.47 7002
First response received from [192.168.154.47]:7002
Host's addresses are: 192.168.154.47
Host's time is Sat Oct 23 17:00:37 2021
Local time is Sat Oct 23 17:00:35 2021 (time differential -2 secs)
Last yes vote for 192.168.154.37 was 2 secs ago (coordinator);
Last vote started 2 secs ago (at Sat Oct 23 17:00:33 2021)
Local db version is 1628522707.48
I am not coordinator
Lowest host 192.168.154.37 was set 2 secs ago
Coordinator host 192.168.154.37 was set 2 secs ago
The last trans I handled was 0.3638
Coordinator's db version is 1628522707.48
0 locked pages, 0 of them for write
None
buserver(8), ptserver(8), vlserver(8)
IBM Corporation 2000. http://www.ibm.com/ All Rights Reserved.
This documentation is covered by the IBM Public License Version 1.0. It was converted from HTML to POD by software written by Chas Williams and Russ Allbery, based on work by Alf Wachsmann and Elizabeth Cassell.
"AFS" is a registered mark of International Business Machines Corporation, used under license. (USPTO Registration 1598389)
"OpenAFS" is a registered mark of International Business Machines Corporation. (USPTO Registration 4577045)
The "AuriStor" name, logo 'S' brand mark, and icon are registered marks of AuriStor, Inc. (USPTO Registrations 4849419, 4849421, and 4928460) (EUIPO Registration 015539653).
"Your File System" is a registered mark of AuriStor, Inc. (USPTO Registrations 4801402 and 4849418).
"YFS" and "AuriStor File System" are trademarks of AuriStor, Inc.