A while ago I did some database upgrades to version 18.104.22.168. One of those upgrades made me a bit nervous. I used “dbca” to perform the upgrade and it was kind of stuck. So I went to the command prompt and started to investigate this. What I found was this:
A session was waiting for “ADR block file read” again and again. The SQL that was causing this waits can be seen in the screenshot too. It was gathering statistics for X$DBKFDG, part of the “gather fixed object stats” step.
A quick research in My Oracle Support did not bring up any helpful information. My first idea was to check the ADR but I found it to be nearly empty, at least in the suspicous directories like “trace” etc. So I posted this issue on Twitter and asked for help. @realadrienne pointed to a similar issue related to Recovery Advisor and the list of critical issues. So I checked that using
SQL> select * from v$ir_failure;
which returned loads of records that had a priority of “critical” and a status of “closed”. So I tried to get rid of these records using “adrci”.
adrci> purge -age 1 -type hm
This took quite a while, about 10 minutes. to complete. But in the end the v$ir_failure view had no records anymore and the fixed object stats where gathered quickly.
So now there is an additional pre-upgrade check on my personal list, that says “clean up v$ir_failure”. You should add this to your list too to prevent unnecessary delays during database upgrades.
Thanks to @realadrienne, @FranckPachot and @OracleSK for the quick and very helpful assistance.
Thanks also to @MikeDietrichDE for the feedback on this issue.
Yesterday one of my customers wanted to patch two 2-node clusters with the current PSU October 2016 (161018). Both are running 22.214.171.124 Grid Infrastructure and 126.96.36.199 Database. The servers run SPARC Solaris 10. When applying the patch on the first cluster using “opatchauto” everything went fine until the “trying to apply SQL Patch” part on the 2nd node. So I went to the log directory and found the following:
$ cd $ORACLE_BASE/cfgtoollogs/sqlpatch/sqlpatch_27075_2016_11_30_17_12_08
$ tail sqlpatch_catcon_0.log
SQL> GRANT SELECT ON sys.gv_$instance TO dv_secanalyst
At that line it was stuck. Searching My Oracle Support brought up nothing helpful. So I had a look at the database sessions:
SQL> select sid, username, event, state, seconds_in_wait
2 from v$session where username='SYS';
SID USERNAME EVENT STATE SECONDS_IN_WAIT
---------- ------------------------------ ---------------------------------------------------------------- ----------- -------- ---------------
13 SYS SQL*Net message from client WAITING 226
30 SYS SQL*Net message from client WAITING 473
32 SYS SQL*Net message to client WAITED SHOR T TIME 0
411 SYS SQL*Net message from client WAITING 473
783 SYS library cache lock WAITING 211
786 SYS SQL*Net message from client WAITING 4
1155 SYS SQL*Net message from client WAITING 467
The session is waiting for something dictionary related. Since the waiting statement was related to RAC, I stopped the other instance which made sqlplatch continue immediately. So the workaround looked like this:
$ srvctl stop instance -db <dbname> -node <node1>
$ srvctl start instance -db <dbname> -node <node1>
This happened on both clusters. So be aware of that in case you are applying that PSU patch to RAC databases.
In case you missed to stop the 1st instance in time, the GRANT statement will run into a timeout (ORA-4021) and the SQL-Patch will be marked with “ERROR” in DBA_REGISTRY_SQLPATCH. In such case, just re-run “datapatch” again and monitor the logfile.
I was not able to reproduce this issue on a Linux x86-64 system. So there is a chance that the issue is OS related.
Finally I reproduced this issue on my Linux x86-64 test system. Now I opened a SR for that.
Thanks to a quick and efficient Oracle Support guy (yes, there are such people!) we found the root cause of that issue. There is a bug in the Enterprise Manager Agent (DB Express maybe too) that it holds a shared lock on some GV$ views during the whole lifetime of a session. That’s why datapatch got stuck. If you just stop the Agent, datapatch will continue immediatly. There is no need to stop the whole instance. We just need to get rid of the Agent’s sessions.
Thanks a lot to Prakash from Oracle Support for his engagement in investigating this issue.