In the following example, an RS6000 has 3 disks, 2 of which hold the mirrored
AIX filesystems. The bootlist contains both hdisk0 and hdisk1.
rootvg contains no logical volumes other than the AIX system
logical volumes. hdisk0 has failed and needs replacing. Both hdisk0 and hdisk1
are in "Hot Swap" carriers, so the machine does not need to be shut
down.
lspv
hdisk0 00522d5f22e3b29d rootvg
hdisk1 00522d5f90e66fd2 rootvg
hdisk2 00522df586d454c3 datavg
lsvg -l rootvg
rootvg:
LV NAME TYPE LPs PPs PVs LV STATE MOUNT POINT
hd6 paging 4 8 2 open/syncd N/A
hd5 boot 1 2 2 closed/syncd N/A
hd8 jfslog 1 2 2 open/syncd N/A
hd4 jfs 1 2 2 open/syncd /
hd2 jfs 12 24 2 open/syncd /usr
hd9var jfs 1 2 2 open/syncd /var
hd3 jfs 2 4 2 open/syncd /tmp
hd1 jfs 1 2 2 open/syncd /home
1, Reduce the logical volume copies from both disks to hdisk1 only :-
rmlvcopy hd6 1 hdisk0
rmlvcopy hd5 1 hdisk0
rmlvcopy hd8 1 hdisk0
rmlvcopy hd4 1 hdisk0
rmlvcopy hd2 1 hdisk0
rmlvcopy hd9var 1 hdisk0
rmlvcopy hd3 1 hdisk0
rmlvcopy hd1 1 hdisk0
2, Check that no logical volumes are left on hdisk0 :-
lspv -p hdisk0
hdisk0:
PP RANGE STATE REGION LV ID TYPE MOUNT POINT
1-101 free outer edge
102-201 free outer middle
202-301 free center
302-401 free inner middle
402-501 free inner edge
3, Remove hdisk0 from the root volume group :-
reducevg -df rootvg hdisk0
4, Recreate the boot image on hdisk1, and reset the bootlist :-
bosboot -a -d /dev/hdisk1
bootlist -m normal rmt0 cd0 hdisk1
5, Check that everything has been removed from hdisk0 :-
lspv
hdisk0 00522d5f22e3b29d None
hdisk1 00522d5f90e66fd2 rootvg
hdisk2 00522df586d454c3 datavg
6, Delete hdisk0 :-
rmdev -l hdisk0 -d
7, Remove the failed hard drive and replace with a new hard drive.
8, Configure the new disk drive :-
cfgmgr
9, Check new hard drive is present :-
lspv
10, Include the new hdisk in root volume group :-
extendvg rootvg hdisk? (where hdisk? is the new hard disk)
11, Re-create the mirror :-
mirrorvg rootvg hdisk? (where hdisk? is the new hard disk)
12, Synchronise the mirror :-
syncvg -v rootvg
13, Reset the bootlist :-
bootlist -m normal rmt0 cd0 hdisk0 hdisk1
14, Turn off Quorum checking on rootvg :-
chvg -Q n rootvg
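The rmlvcopy commands in step 1 all follow one pattern, so they could be generated from the list of logical volumes. The following is a minimal sketch, assuming a POSIX-style shell such as ksh; the function name print_rmlvcopy_cmds is hypothetical, and it only prints the commands so that they can be reviewed before being run:

```shell
# Hypothetical helper: print (do not run) one rmlvcopy command per logical
# volume, leaving a single copy and removing the copy held on the given disk.
print_rmlvcopy_cmds() {
    disk=$1
    shift
    for lv in "$@"; do
        echo "rmlvcopy $lv 1 $disk"
    done
}

# LV names taken from the lsvg -l rootvg listing in this example.
print_rmlvcopy_cmds hdisk0 hd6 hd5 hd8 hd4 hd2 hd9var hd3 hd1
```

Printing rather than executing keeps the sketch safe to try; the output can be piped to sh once it has been checked against the real LV list.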
Tuesday, December 16, 2008
Monday, December 8, 2008
Fix a Full / Filesystem
Use the following steps:
1. Use the who command to read the contents of the /etc/security/failedlogin file:
# who /etc/security/failedlogin
The condition of TTYs respawning too rapidly can create failed login entries.
To clear the file after reading or saving the output, execute the following
command:
# cp /dev/null /etc/security/failedlogin
2. Check for very large files that might be removed using the find command. For
example, to find all files in the root (/) directory larger than 1 MB, use the
following command:
# find / -xdev -size +2048 -ls |sort -r +6
Before removing any files, use the command fuser to ensure a file is not
currently in use by a user process:
fuser filename
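The -size +2048 argument in step 2 counts 512-byte blocks, which is why 2048 corresponds to 1 MB. A small sketch of that arithmetic follows; mb_to_blocks is a hypothetical helper name, not an AIX command:

```shell
# Hypothetical helper: convert a size in MB to the number of 512-byte
# blocks that find's -size option expects (1 MB = 1048576 bytes).
mb_to_blocks() {
    echo $(( $1 * 1024 * 1024 / 512 ))
}

mb_to_blocks 1     # prints 2048
mb_to_blocks 10    # prints 20480
```

To search for files larger than 10 MB instead, the result could be substituted into the same command, for example find / -xdev -size +$(mb_to_blocks 10) -ls.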
Fix a Full /var Filesystem
Use the following steps:
1. You can use the find command to look for large files in the /var directory. For
example:
# find /var -xdev -size +2048 -ls| sort -r +6
2. Check for obsolete or leftover files in /var/tmp.
3. Check the size of the /var/adm/wtmp file, which logs all logins, rlogins and
telnet sessions. The log will grow indefinitely unless system accounting is
running. System accounting clears it out nightly. The /var/adm/wtmp file can
be cleared out or edited to remove old and unwanted information. To clear it,
use the following command:
# cp /dev/null /var/adm/wtmp
4. Clear the error log in the /var/adm/ras directory using the following procedure.
The error log is never cleared unless it is manually cleared.
a. Stop the error daemon using the following command:
# /usr/lib/errstop
b. Remove or move to a different file system the error log file by using one of
the following commands:
# rm /var/adm/ras/errlog
or
# mv /var/adm/ras/errlog filename
c. Restart the error daemon using the following command:
# /usr/lib/errdemon
5. Check whether the trcfile file in the /var/adm/ras directory is large. If it
is large and trace is not currently being run, you can remove the file using
the following command:
# rm /var/adm/ras/trcfile
6. If your dump device is set to hd6 (which is the default), there might be a
number of vmcore* files in the /var/adm/ras directory. If their file dates are old
or you do not want to retain them, you can remove them with the rm
command.
7. Check the /var/spool directory, which contains the queuing subsystem files.
Clear the queuing subsystem using the following commands:
# stopsrc -s qdaemon
0513-044 The qdaemon Subsystem was requested to stop.
# rm /var/spool/lpd/qdir/*
# rm /var/spool/lpd/stat/*
# rm /var/spool/qdaemon/*
# startsrc -s qdaemon
0513-059 The qdaemon Subsystem has been started. Subsystem PID is 291042.
#
8. Check the /var/adm/acct directory, which contains accounting records. If
accounting is running, this directory may contain several large files.
9. Check, and trim if necessary, the /var/tmp/snmpd.log file, which records
events from the snmpd daemon. If the file is removed, it will be recreated by
the snmpd daemon.
10. Check, and trim if necessary, the /var/adm/sulog file, which records each
attempted use of the su command and whether it was successful. This is a flat
file and can be viewed and modified with any text editor. If it is removed, it
will be recreated by the next attempted su command.
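Several of the steps above come down to checking whether one particular log file has grown large. The following sketch reports the sizes of those files in one pass. It assumes a POSIX-style shell; report_sizes is a hypothetical helper name, and the file list is taken from the steps above:

```shell
# Hypothetical helper: print the size in bytes and the name of every listed
# file that exists, largest first. Files that do not exist are skipped.
report_sizes() {
    for f in "$@"; do
        # The unquoted $(...) drops any padding that wc puts around the count.
        [ -f "$f" ] && echo $(wc -c < "$f") "$f"
    done | sort -rn
}

report_sizes /var/adm/wtmp /var/adm/ras/errlog /var/adm/ras/trcfile \
    /var/tmp/snmpd.log /var/adm/sulog
```

The sketch only reads the files; clearing or removing them is still done with the commands given in each step.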
Sunday, December 7, 2008
AIX Tunable Parameter Settings
For each check, record the command output against the expected result.

Parameters               Command                                 Expected result
-----------------------  --------------------------------------  ---------------------------------------------
Operating system         oslevel -r                              5.3.0.0 - 02, or the latest level with the latest ML
                         lppchk -c                               No errors
OS settings              date                                    Date and time
                         date -u                                 Differs by 5:30 hours
                         cat /etc/environment | grep TZ          TZ=IST-5:30,, or TZ=IST-5:30
                         lsattr -El mem0                         As per the configuration
                         lsps -a                                 2 x memory size if the memory size is less than 8 GB;
                                                                 1.5 x if it is more than 8 GB; the same as physical
                                                                 memory if it is more than 16 GB. If the paging space
                                                                 is in rootvg only, create one paging space only.
                         lsattr -El sys0 | grep maxuproc         More than 2048 (minimum, or as per the customer requirement)
                         lsattr -El sys0 | grep maxpout          32 (only if there is a cluster)
                         lsattr -El sys0 | grep minpout          24 (only if there is a cluster)
                         ulimit -a                               Unlimited for all except core size
RootVG settings          df -k /                                 More than 512 MB
                         df -k /var                              More than 1 GB
                         df -k /tmp                              More than 1 GB
                         df -k /usr                              More than 2.5 GB
                         df -k /opt                              More than 512 MB
                         df -k /home                             More than 512 MB
                         lsvg -l rootvg                          All LVs mirrored and syncd except lg_dumplv
                         sysdumpdev -l                           Always allow dump true, dump compression on
                         sysdumpdev -e                           As per the memory size
                         lslv lg_dumplv                          Above value x 1.2
Virtual memory settings  vmo -L | grep maxfree                   128 x number of CPUs
                         vmo -L | grep minfree                   120 x number of CPUs
                         vmo -L | grep maxperm                   20% of pages
                         vmo -L | grep minperm                   10% of pages
Network parameters       no -L | grep sb_max                     1310720
                         no -L | grep rfc1323                    1
                         no -L | grep tcp_sendspace              221184
                         no -L | grep tcp_recvspace              221184
                         no -L | grep udp_sendspace              65536
                         no -L | grep udp_recvspace              655360
                         no -L | grep ipqmaxlen                  512 (needs reboot)
I/O parameters           ioo -L | grep sync_release_ilock        1
Error report             errpt                                   No permanent hardware errors
                         diag                                    No problem found in system verification mode
                                                                 after selecting all the resources
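The sizing rules in the checklist (paging space from lsps -a, and minfree/maxfree from vmo) are simple arithmetic, sketched below. The function names are hypothetical. The checklist does not say what to do at exactly 8 GB or 16 GB, so the boundary handling here is an assumption: exactly 8 GB uses the 2x rule and exactly 16 GB the 1.5x rule.

```shell
# Hypothetical helper: recommended paging space in MB for a given amount of
# physical memory in MB, following the lsps -a rule in the checklist.
# Assumption: the 8 GB and 16 GB boundaries fall into the lower tier.
paging_space_mb() {
    mem_mb=$1
    if [ "$mem_mb" -le 8192 ]; then
        echo $(( mem_mb * 2 ))          # up to 8 GB: twice the memory
    elif [ "$mem_mb" -le 16384 ]; then
        echo $(( mem_mb * 3 / 2 ))      # above 8 GB: 1.5 times the memory
    else
        echo "$mem_mb"                  # above 16 GB: same as the memory
    fi
}

# Hypothetical helper: minfree and maxfree targets for a given CPU count,
# using the 120 and 128 per-CPU multipliers from the checklist.
vmo_free_targets() {
    cpus=$1
    echo "minfree=$(( 120 * cpus )) maxfree=$(( 128 * cpus ))"
}

paging_space_mb 4096     # prints 8192
paging_space_mb 12288    # prints 18432
paging_space_mb 32768    # prints 32768
vmo_free_targets 4       # prints minfree=480 maxfree=512
```

These values are only the checklist's targets; they would be compared by hand against what lsps -a and vmo -L actually report.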
Problem Determination: Server Hangs at LED 546 Due to bosboot Failure
The server did not boot in normal mode. It hung at LED 546 during boot, which means that the bosboot command had failed to re-create the boot image. Boot the server in maintenance mode from CD and access a shell without mounting the filesystems, then work through the following steps.
1. Check file system consistency
fsck /dev/hd4
fsck /dev/hd2
fsck /dev/hd3
fsck /dev/hd9var
fsck /dev/hd1
2. Format JFS log
/usr/sbin/logform /dev/hd8
3. Check the disks in rootvg
lsvg -p rootvg --> hdisk0, hdisk3
4. Check boot partitions
lslv -m hd5 --> hdisk0
5. Check bootlist in normal mode.
bootlist -m normal -o --> hdisk0
cd /dev; ls -l | grep ipl --> nothing comes back, which is the problem
If everything is OK, the output should look like this:
# cd /dev
# ls -l | grep ipl
crw-rw---- 2 root system 10, 1 Aug 04 18:21 ipl_blv
crw------- 2 root system 17, 0 Aug 04 18:19 ipldevice
#
6. Create the links as shown below
ln rhd5 ipl_blv -->ok
ln hdisk0 ipldevice -->ok
7. Set bootlist
bootlist -m normal hdisk0 -->ok
bootlist -m normal -o -->
8. Re-create boot image
bosboot -ad /dev/ipldevice --> 0301-162 save base failed /dev/hdisk0
0301-165 bosboot failed do not attempt to boot device
savebase -->ok
bosboot -ad /dev/ipldevice --> 0301-162 save base failed /dev/hdisk0
0301-165 bosboot failed do not attempt to boot device
9. List hd5 in rootvg.
lsvg -l rootvg --> hd5 has 1 pp and not mirrored
10. Remove hd5
rmlv hd5
11. Clear boot image
chpv -c hdisk0
chpv -c hdisk3
12. Create hd5
mklv -t boot -y hd5 -ae -c 1 rootvg 1 hdisk0 -->savebase failed
savebase -d /dev/hdisk0 -->ok
mklv -t boot -y hd5 -ae -c 1 rootvg 1 hdisk0 -->savebase failed
synclvodm -Pv rootvg
savebase -d /dev/hdisk0
mklv -t boot -y hd5 -ae -c 1 rootvg 1 hdisk0 -->savebase failed
lsvg -l rootvg --> hd5 was there
savebase -d /dev/hdisk0
ln /dev/rhd5 /dev/ipl_blv
ls -al | grep ipl --> make sure this lists ipldevice and ipl_blv
bosboot -ad /dev/ipldevice
ipl_varyon -i --> yes in front of hdisk0
bootlist -m normal -o --> hdisk0 blv=hd5
sync;sync;sync;reboot --> worked we are booted up
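The check for the ipl_blv and ipldevice entries in /dev, which is done by eye in steps 5 and 12, could be scripted. The following is a minimal sketch, assuming a POSIX-style shell with awk available; has_ipl_links is a hypothetical helper name, and the sample listing is the expected output shown in step 5:

```shell
# Hypothetical helper: read an "ls -l" style listing on stdin and succeed
# only if both ipl_blv and ipldevice appear as entry names (last field).
has_ipl_links() {
    awk '$NF == "ipl_blv"   { blv = 1 }
         $NF == "ipldevice" { dev = 1 }
         END { exit !(blv && dev) }'
}

# Demonstration using the expected listing from step 5.
printf '%s\n' \
    'crw-rw---- 2 root system 10, 1 Aug 04 18:21 ipl_blv' \
    'crw------- 2 root system 17, 0 Aug 04 18:19 ipldevice' |
    has_ipl_links && echo "ipl links present"
```

On the affected server the same function could be fed from the live listing, for example ls -l /dev | has_ipl_links || echo "ipl links missing".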