Information About

  • Subscribe to our RSS feed.
  • Twitter
  • StumbleUpon
  • Reddit
  • Facebook
  • Digg

Monday, 24 September 2007

Histogram Statistics [DB2 9 for z/OS]

Posted on 09:31 by Unknown

Another utility upgrade that has found its way into DB2 9 for z/OS is the ability to gather histogram statistics. This feature is already available in DB2 for Linux, Unix, and Windows… and after you migrate to DB2 V9 it will be available to you on z/OS.

What is it? Well, let’s first define what a histogram is for those of you who are not statistics experts. A histogram is a way of summarizing data that is measured on an interval scale. A histogram is particularly helpful to quickly highlight how data is distributed; to determine if data is symmetrical or skewed; and to indicate whether or not outliers exists.

The histogram is only appropriate for variables whose values are numerical and measured on an interval scale. It is generally used when dealing with large data sets. Histogram statistics can be quite useful to the optimizer for certain types of queries.

Instead of the frequency statistics, which are collected for only a subset of the data, sometimes DB2 can improve access path selection by estimating predicate selectivity from histogram statistics, which are collected over all values in a table space.

Consider collecting histogram statistics to improve access paths for troublesome queries with RANGE, LIKE, and BETWEEN predicates. They can also help in some cases for =, IS NULL, IN LIST and COL op COL predicates.

How to Collect Histogram Statistics

IBM RUNSTATS in DB2 V9 can collect statistics by quantiles. DB2 allows up to 100 quantiles. The user can specify how many quantiles DB2 is to use from 1 to 100. Of course, avoid 1 because it will not help.

You can tell RUNSTATS to collect histogram statistics by coding the HISTOGRAM keyword in conjunction with the COLGROUP option. In this way you can collect histogram statistics for a group of columns. You must also tell DB2 the number of quantiles to collect by specifying the NUMQUANTILES parameter. NUMQUANTILES can also be specified with the INDEX parameter, in which can it indicates that histogram statistics are to be collected for the columns of the index.

A single value can never broken into more than one interval. This means that the maximum number of intervals is equal to the number of distinct column values. Therefore, be sure that you do not specify a value for NUMQUANTILES that is greater than the total number of distinct values for the column (or column group) specified. Also, keep in mind that any NULLs will occupy a single interval.

So then, how do you decide on the number of quantiles to collect? If you do not specify NUMQUANTILES, the default value of 100 will be used, and then based on the number of records in the table, the number will be readjusted to an optimal number. Therefore, unless you have a good understanding of the application or a viable reason to deviate, a good rule of thumb is to simply let the NUMQUANTILES default and let DB2 work it out.

RUNSTATS will produce an equal-depth histogram. This means that each interval will have about the same number of rows. Please note that this does not mean the same number of values – it is the same number of rows. This means that in some cases a highly frequent single value could potentially occupy an interval all by itself.

The histogram statistics are collected in three new columns: QUANTILENO, LOWVALUE, and HIGHVALUE. These columns can be found in the following six DB2 Catalog tables:

  • SYSIBM.SYSCOLDIST
  • SYSIBM.SYSKEYTGTDIST
  • SYSIBM.SYSCOLDIST_HIST
  • SYSIBM.SYSCOLDISTSTATS
  • SYSIBM.SYSKEYTGTDIST_HIST
  • SYSIBM.SYSKEYTGTDISTSTATS.

Here is an example of a RUNSTATS to gather histogram statistics for the key columns of the indexes.:

RUNSTATS TABLESPACE DB07.CSMTS02
INDEX ALL
HISTOGRAM NUMCOLS 2 NUMQUANTILES 10
SHRLEVEL(CHANGE)
UPDATE ALL
REPORT YES

Summary

Histogram statistics is a very powerful new capability of the RUNSTATS utility that can be used to gather distribution statistics across all data values. These statistics can be helpful when you need additional distribution data to enable the optimizer to arrive at a better access path for certain queries/predicates.

Email ThisBlogThis!Share to XShare to Facebook
Posted in | No comments
Newer Post Older Post Home

0 comments:

Post a Comment

Subscribe to: Post Comments (Atom)

Popular Posts

  • DB2 Locking, Part 8: LOBs and Locking
    When a row is read or modified in a table containing LOB columns, the application will obtain a normal transaction lock on the base table. T...
  • DB2 for z/OS Version 9 Beta Announcement
    On May 2, 2006 IBM announced the beta for the next version of mainframe DB2: namely, DB2 V9.1 for z/OS. You can view the announcement here ....
  • Index on Expressions [DB2 9 for z/OS]
    DB2 9 for z/OS offers, for the first time, the ability to create an index on data that is not technically in the table. At this point, you m...
  • When Not to Index
    Answering a question I got via e-mail on indexing... Every now and then I take the opportunity to blog about a question I get through e-...
  • CLONE Tables [DB2 9 for z/OS]
    This new feature in DB2 V9 might sound like an old monster movie ( Invasion of the Clone Tables!!! ), but it is actually a nifty new capabil...
  • DB2 Locking, Part 2: Table Space and Table Locks
    Today's post is the second in our DB2 locking series and it covers the topic of table space and table locks. Table Space Locks A table s...
  • IBM Manages the Data Lifecycle
    Data lifecycle is a somewhat new-ish term, at least in terms of what I plan to talk about in this blog posting. The data lifecycle – and dat...
  • IBM Announces DB2 10 for z/OS Beta Program
    IBM announced the beta program for the next version of DB2 today, now "officially" known as DB2 10 (no more DB2 X). It is a closed...
  • DB2 Locking, Part 5: Lock Suspensions, Timeouts, and Deadlocks
    The longer a lock is held, the greater the potential impact to other applications. When an application requests a lock that is already held ...
  • DB2 Locking, Part 15: Tackling Timeout Troubles
    Many shops battle with locking issues and frequently, the cause of performance issues can be traced back to locking issues, more specificall...

Categories

  • .NET
  • ACID
  • ALTER
  • analytics
  • articles
  • automation
  • award
  • backup
  • best practices
  • BETWEEN
  • BI
  • Big Data
  • BIND
  • blogging
  • book review
  • bufferpool
  • buffers
  • CASE
  • change management
  • claim
  • Cognos
  • COMMIT
  • compliance
  • compression
  • conference
  • constraints
  • COPY
  • data
  • data breaches
  • data quality
  • data security
  • Data Sharing
  • data types
  • data warehouse
  • database archiving
  • database auditing
  • database design
  • date
  • DB2
  • DB2 10
  • DB2 11
  • DB2 9
  • DB2 Analystics Accelerator
  • DB2 Catalog
  • DB2 conversion
  • DB2 Developer's Guide
  • DB2 X
  • DB2-L
  • DBA
  • DDL
  • developerWorks
  • dirty read
  • DISPLAY
  • DL/1
  • drain
  • DSNZPARM
  • Dynamic SQL
  • eBook
  • education
  • enclave SRB
  • encryption
  • ERP
  • FETCH FIRST
  • Freakonomics
  • functions
  • generosity factor
  • Happy Holidays
  • Happy New Year
  • Hibernate
  • HIPAA
  • history
  • IBM
  • ICF
  • IDUG
  • IFL
  • IMS
  • index
  • Information Agenda
  • Informix
  • InfoSphere
  • infrastructure
  • integrity
  • IOD
  • IOD11
  • IOD2009
  • IOD2011
  • IODGC
  • IRLM
  • ISOLATION
  • Java
  • JDBC
  • load balancing
  • LOBs
  • locking
  • LUW
  • mainframe
  • Malcolm Gladwell
  • manuals
  • memory
  • middleware
  • migration
  • misc
  • monitoring
  • natural key
  • Netezza
  • new blog location
  • NoSQL
  • nulls
  • OLAP
  • optimization
  • Oracle versus DB2
  • packages
  • PCI-DSS
  • performance
  • PIECESIZE
  • poll
  • primary key
  • production data
  • programming
  • Q+A
  • QMF
  • REBIND
  • recovery
  • RedBook
  • regulatory compliance
  • reliability
  • REORG
  • research
  • RI
  • RTO
  • salaries
  • SAP
  • scalability
  • security
  • smarter planet
  • SoftwareOnZ
  • sort
  • SOX
  • specialty processors
  • SPUFI
  • SQL
  • Stage 1
  • Stage 2
  • standards
  • Steelers
  • storage
  • stored procedures
  • stream computing
  • surrogate key
  • SYSADM
  • Sysadmin
  • table expressions
  • table space
  • TechDoc
  • tips and tricks
  • Top Ten
  • trace
  • training
  • triggers
  • Twitter
  • UDFs
  • UNION
  • unstructured data
  • user groups
  • utilities
  • V1
  • V10
  • V2
  • V3
  • V4
  • V5
  • V6
  • V7
  • V8
  • V9
  • variables
  • views
  • VOLATILE
  • Web 2.0
  • webinar
  • Wordle
  • XML
  • z/OS
  • zAAP
  • zIIP

Blog Archive

  • ►  2014 (2)
    • ►  January (2)
  • ►  2013 (50)
    • ►  December (6)
    • ►  November (6)
    • ►  October (5)
    • ►  September (5)
    • ►  August (3)
    • ►  July (7)
    • ►  June (4)
    • ►  May (4)
    • ►  April (5)
    • ►  March (1)
    • ►  February (2)
    • ►  January (2)
  • ►  2012 (17)
    • ►  December (1)
    • ►  November (2)
    • ►  October (3)
    • ►  August (2)
    • ►  July (1)
    • ►  May (1)
    • ►  April (1)
    • ►  March (2)
    • ►  February (2)
    • ►  January (2)
  • ►  2011 (27)
    • ►  December (1)
    • ►  November (1)
    • ►  October (6)
    • ►  September (2)
    • ►  August (3)
    • ►  July (2)
    • ►  June (3)
    • ►  May (2)
    • ►  April (3)
    • ►  March (1)
    • ►  February (3)
  • ►  2010 (29)
    • ►  December (1)
    • ►  October (6)
    • ►  September (1)
    • ►  August (2)
    • ►  July (2)
    • ►  June (1)
    • ►  May (3)
    • ►  April (3)
    • ►  March (3)
    • ►  February (4)
    • ►  January (3)
  • ►  2009 (43)
    • ►  December (5)
    • ►  November (4)
    • ►  October (6)
    • ►  September (2)
    • ►  August (1)
    • ►  July (3)
    • ►  June (2)
    • ►  May (3)
    • ►  April (2)
    • ►  March (4)
    • ►  February (5)
    • ►  January (6)
  • ►  2008 (44)
    • ►  December (1)
    • ►  November (4)
    • ►  October (4)
    • ►  September (6)
    • ►  August (1)
    • ►  July (4)
    • ►  June (3)
    • ►  May (5)
    • ►  April (4)
    • ►  March (4)
    • ►  February (2)
    • ►  January (6)
  • ▼  2007 (51)
    • ►  December (2)
    • ►  November (3)
    • ►  October (5)
    • ▼  September (3)
      • Histogram Statistics [DB2 9 for z/OS]
      • COPY Improvements [DB2 9 for z/OS]
      • MODIFY RECOVERY [DB2 9 for z/OS]
    • ►  August (6)
    • ►  July (4)
    • ►  June (4)
    • ►  May (5)
    • ►  April (8)
    • ►  March (5)
    • ►  February (4)
    • ►  January (2)
  • ►  2006 (60)
    • ►  November (4)
    • ►  October (8)
    • ►  September (4)
    • ►  August (11)
    • ►  July (7)
    • ►  June (2)
    • ►  May (7)
    • ►  April (3)
    • ►  March (6)
    • ►  February (4)
    • ►  January (4)
  • ►  2005 (11)
    • ►  December (3)
    • ►  November (6)
    • ►  October (2)
Powered by Blogger.

About Me

Unknown
View my complete profile