Month: December 2009

user centered design

It’s the day before the new year and I have two topics related to development that I am hoping to complete today to wrap up 2009. Then I plan to refocus this blog on its originally intended topic – measurement. I can’t promise not to wander off topic again but I will do my best. A few weeks ago, Cary Millsap sent me a link to The Fable of the User-Centered Designer. It’s short and an enjoyable…


2009’s top blog posts

Below are my top six blog postings of 2009, in order of the number of page views as calculated by Google Analytics:

  1. Announcing release of HadoopDB (longer version), and (shorter version). Combined visits: 31,228 (26,650 and 4,638 for the longer and shorter versions respectively).

    The post gave an overview of the HadoopDB project that was released from my research group at Yale over the summer. The basic idea is to combine the scalability and ease-of-use of Hadoop with the performance on structured data of relational database systems.

  2. A tour through hybrid column/row-oriented DBMS schemes. 2,922 visits.

    This might be the post I’m most proud of, from the ones on this list. It went through different ways one can combine row-store and column-store technology in a single DBMS. I believe that hybrid database systems along the lines described in the post are going to take off in 2010, and the predictions made in the post will soon come to fruition.

  3. ParAccel and their puzzling TPC-H results. 2,499 visits.

    Of the posts on this list, this is the one I like the least, and the one that most needs to be rewritten. I don’t recommend reading it, but if you do, make sure you read the corrections in the comment thread in addition to the main article. The post is also quite stale at this point, since it discusses some TPC-H results from ParAccel that were challenged by a competitor and that ParAccel withdrew in September. ParAccel has not yet rerun these 30TB TPC-H benchmark results.

  4. Watch out for VectorWise. 1,595 visits.

    This post discussed the technology behind Ingres’ new column-store storage engine designed for analytical workloads, based on some research out of CWI. I am very high on this technology and my research group has had a chance to play around with it a little this winter.

  5. Analysis of the “MapReduce Online” paper. 1,108 visits.

    I’m surprised this post got so many visits, since it was really geared only for readers in the research community. This post reviewed some research performed at the University of California, Berkeley, which explored how to pipeline results between the phases of MapReduce jobs to improve performance and enable early estimates of results. Rumor has it that this paper was accepted to NSDI 2010. I think that releasing papers as technical reports and having them independently reviewed and recommended in public venues like blogs is an interesting model to consider for the next decade, as an alternative to the private ‘accept’ or ‘reject’ reviewing process used today.

  6. Kickfire’s approach to parallelism. 1,042 visits.

    This post takes a deeper dive into Kickfire’s technology than you’ll find in most other places. I find the way that they use FPGA technology to maximize the parallelism that can be achieved on analytical queries in a single-box machine to be quite impressive.

The post from 2009 that I feel is the most underrated (meaning that the number of visits did not match up with what I felt was the quality of the post) was:

  • What is the right way to measure scale?

    I really feel that people thought about scale in the wrong way in 2009. People assume that if a database can fit a lot of data inside of it (i.e. many petabytes), it must be really scalable. But if the data is not very accessible (i.e. it is stored on slow media, or it takes forever to scan through it all because there are not enough disk spindles or CPUs to process it), then the system is not nearly as scalable as it would seem.

Bottom line, if you only have time to read three of my postings from 2009, I would like them to be:

  1. The longer HadoopDB post (this is the only post about my own research)
  2. The hybrid column/row-store post
  3. The post on measuring scale

Overall, I’m pleased with the impact my blog seems to have had, and I intend to continue writing posts for it in 2010.


Hinting Sub-Queries on Oracle

This posting is a purely Oracle RDBMS discussion about how to correctly apply hints to sub-queries. However, it is particularly relevant to PeopleSoft, which makes extensive use of correlated sub-queries. The point of this story is not the particular hints that I applied, but where I placed the hints, and how I scoped them to the sub-queries.

The following SQL extract is from the delivered Global Payroll GPGB_EDI process (although I have already made some minor changes, and PS_GP_RSLT_PIN is partitioned). Notice that each sub-query joins two tables together. PS_GP_RSLT_PIN is the second largest of the Global Payroll result tables. PS_GP_PIN is a look-up table, and the criterion on PIN_CODE will return only a single row.

UPDATE %Table(GPGB_EDIE_TMP) X
SET X.GPGB_WK53_IND = (
SELECT %TrimSubstr(%Sql(FUNCLIB_HR_CHAR,A.CALC_RSLT_VAL),1,2)
FROM PS_GP_RSLT_PIN A
,PS_GP_PIN B
WHERE A.EMPLID = %Table(GPGB_EDIE_TMP).EMPLID
...
AND A.PIN_NUM = B.PIN_NUM 
AND B.PIN_CODE = 'TAX VR PERIOD GBR'
AND A.SLICE_BGN_DT = (
SELECT MAX(D.SLICE_BGN_DT)
FROM PS_GP_RSLT_PIN D
WHERE D.EMPLID = %Table(GPGB_EDIE_TMP).EMPLID
...
AND D.INSTANCE = A.INSTANCE
AND D.PIN_NUM = B.PIN_NUM)
...
)
WHERE EXISTS (
...
)
AND PROCESS_INSTANCE = %Bind(PROCESS_INSTANCE)

This is part of the initial SQL execution plan. The problem is that the sub-query starts by scanning through the PS_GP_RSLT_PIN table (four times in all: once in the main sub-query, plus once for each of the three correlated sub-queries), and only at the very end does it look up the PIN_CODE by the PIN_NUM.

---------------------------------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart|
---------------------------------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | 65 | 113M (93)| 39:44:51 | |
| 1 | UPDATE | PS_GPGB_EDIE_TMP4 | | | | | |
|* 2 | FILTER | | | | | | |
|* 3 | TABLE ACCESS FULL | PS_GPGB_EDIE_TMP4 | 673K| 41M| 9967 (6)| 00:00:13 |
| 4 | NESTED LOOPS | | 1 | 108 | 4 (0)| 00:00:01 | |
| 5 | PARTITION RANGE SINGLE | | 1 | 83 | 3 (0)| 00:00:01 | KEY |
| 6 | PARTITION LIST SINGLE | | 1 | 83 | 3 (0)| 00:00:01 | KEY |
|* 7 | INDEX RANGE SCAN | PS_GP_RSLT_PIN | 1 | 83 | 3 (0)| 00:00:01 | KEY |
| 8 | SORT AGGREGATE | | 1 | 72 | | | |
| 9 | PARTITION RANGE SINGLE | | 1 | 72 | 3 (0)| 00:00:01 | KEY |
| 10 | PARTITION LIST SINGLE | | 1 | 72 | 3 (0)| 00:00:01 | KEY |
|* 11 | INDEX RANGE SCAN | PS_GP_RSLT_PIN | 1 | 72 | 3 (0)| 00:00:01 | KEY |
| 12 | SORT AGGREGATE | | 1 | 83 | | | |
| 13 | PARTITION RANGE SINGLE | | 1 | 83 | 3 (0)| 00:00:01 | KEY |
| 14 | PARTITION LIST SINGLE | | 1 | 83 | 3 (0)| 00:00:01 | KEY |
| 15 | FIRST ROW | | 1 | 83 | 3 (0)| 00:00:01 | |
|* 16 | INDEX RANGE SCAN (MIN/MAX) | PS_GP_RSLT_PIN | 1 | 83 | 3 (0)| 00:00:01 |
| 17 | SORT AGGREGATE | | 1 | 83 | | | |
| 18 | PARTITION RANGE SINGLE | | 1 | 83 | 158 (99)| 00:00:01 | KEY |
| 19 | PARTITION LIST SINGLE | | 1 | 83 | 158 (99)| 00:00:01 | KEY |
| 20 | FIRST ROW | | 1 | 83 | 158 (99)| 00:00:01 | |
|* 21 | INDEX RANGE SCAN (MIN/MAX)| PS_GP_RSLT_PIN | 1 | 83 | 158 (99)| 00:00:01 | KEY |
|* 22 | INDEX RANGE SCAN | PSAGP_PIN | 1 | 25 | 1 (0)| 00:00:01 | |
...
---------------------------------------------------------------------------------------------------------------------------

It would be much better if we started with the PS_GP_PIN table, looked up the PIN_NUM from the PIN_CODE (there is a suitable index on this column), and used the PIN_NUM value as part of the lookup on PS_GP_RSLT_PIN (PIN_NUM is also part of the unique index on that table).

It is tempting to add a LEADING hint directly to each sub-query, but that does not work: the hint is not scoped to the sub-query, and it is treated as invalid because the query as a whole cannot start at that point.

The only supported place in the query to put a LEADING hint is at the start of the statement, in this case immediately after the UPDATE keyword.

In this case, I have named the query blocks in the sub-queries with the QB_NAME hint. It is valid to put this hint into the sub-query. Then I added LEADING hints for each sub-query after the UPDATE keyword, but I specified their scope using the name of the sub-query specified in the QB_NAME hint. Each sub-query must now start with the PS_GP_PIN table.
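Stripped of the PeopleSoft %Table macros, the pattern looks like this in miniature (all table, column, and query-block names here are invented for illustration):

```sql
-- Name the sub-query's query block with QB_NAME, then reference that
-- name from a LEADING hint placed at the top of the statement.
-- Object names below are illustrative, not from the real system.
UPDATE /*+ LEADING(@SUB1 l@SUB1 r@SUB1) */ target_tab t
SET    t.result_col = (
         SELECT /*+ QB_NAME(SUB1) */ r.calc_val
         FROM   result_tab r
         ,      lookup_tab l
         WHERE  r.emplid   = t.emplid
         AND    r.pin_num  = l.pin_num
         AND    l.pin_code = 'SOME CODE')
```

The @SUB1 prefix ties the hint and each table alias back to the named query block, so the join order applies only within that sub-query rather than to the statement as a whole.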

UPDATE /*+LEADING(@SUB1 B@SUB1) LEADING(@SUB2 B@SUB2)*/ %Table(GPGB_EDIE_TMP) X
SET X.GPGB_WK53_IND = (
SELECT /*+QB_NAME(SUB1)*/ %TrimSubstr(%Sql(FUNCLIB_HR_CHAR,A.CALC_RSLT_VAL),1,2)
FROM PS_GP_RSLT_PIN A
,PS_GP_PIN B
WHERE A.EMPLID = %Table(GPGB_EDIE_TMP).EMPLID
...
AND A.PIN_NUM = B.PIN_NUM
AND B.PIN_CODE = 'TAX VR PERIOD GBR'
AND A.SLICE_BGN_DT = (
SELECT MAX(D.SLICE_BGN_DT)
FROM PS_GP_RSLT_PIN D
WHERE D.EMPLID = %Table(GPGB_EDIE_TMP).EMPLID
...
AND D.INSTANCE = A.INSTANCE
AND D.PIN_NUM = B.PIN_NUM)
...
)
WHERE EXISTS (
SELECT /*+QB_NAME(SUB2)*/ 'X'
FROM PS_GP_RSLT_PIN A1
,PS_GP_PIN B1
WHERE A1.EMPLID = %Table(GPGB_EDIE_TMP).EMPLID
...
AND A1.PIN_NUM = B1.PIN_NUM
AND B1.PIN_CODE = 'TAX VR PERIOD GBR'
...
)
AND PROCESS_INSTANCE = %Bind(PROCESS_INSTANCE)

This is the new execution plan. The sub-query starts with an access of index PSAGP_PIN (which leads on PIN_CODE) on PS_GP_PIN, and then does the lookups on PS_GP_RSLT_PIN. The cost is the same, but the execution time was considerably reduced.

----------------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time | Pstart|
----------------------------------------------------------------------------------------------------
| 0 | UPDATE STATEMENT | | 1 | 65 | 113M (93)| 39:44:51 | |
| 1 | UPDATE | PS_GPGB_EDIE_TMP4 | | | | | |
|* 2 | FILTER | | | | | | |
|* 3 | TABLE ACCESS FULL | PS_GPGB_EDIE_TMP4 | 673K| 41M| 9967 (6)| 00:00:13 | |
| 4 | NESTED LOOPS | | 1 | 108 | 4 (0)| 00:00:01 | |
|* 5 | INDEX RANGE SCAN | PSAGP_PIN | 1 | 25 | 2 (0)| 00:00:01 | |
| 6 | PARTITION RANGE SINGLE | | 1 | 83 | 2 (0)| 00:00:01 | KEY |
| 7 | PARTITION LIST SINGLE | | 1 | 83 | 2 (0)| 00:00:01 | KEY |
|* 8 | INDEX RANGE SCAN | PS_GP_RSLT_PIN | 1 | 83 | 2 (0)| 00:00:01 | KEY |
| 9 | SORT AGGREGATE | | 1 | 72 | | | |
| 10 | PARTITION RANGE SINGLE | | 1 | 72 | 3 (0)| 00:00:01 | KEY |
| 11 | PARTITION LIST SINGLE | | 1 | 72 | 3 (0)| 00:00:01 | KEY |
|* 12 | INDEX RANGE SCAN | PS_GP_RSLT_PIN | 1 | 72 | 3 (0)| 00:00:01 | KEY |
| 13 | SORT AGGREGATE | | 1 | 83 | | | |
| 14 | PARTITION RANGE SINGLE | | 1 | 83 | 3 (0)| 00:00:01 | KEY |
| 15 | PARTITION LIST SINGLE | | 1 | 83 | 3 (0)| 00:00:01 | KEY |
| 16 | FIRST ROW | | 1 | 83 | 3 (0)| 00:00:01 | |
|* 17 | INDEX RANGE SCAN (MIN/MAX) | PS_GP_RSLT_PIN | 1 | 83 | 3 (0)| 00:00:01 | |
| 18 | SORT AGGREGATE | | 1 | 83 | | | |
| 19 | PARTITION RANGE SINGLE | | 1 | 83 | 158 (99)| 00:00:01 | KEY |
| 20 | PARTITION LIST SINGLE | | 1 | 83 | 158 (99)| 00:00:01 | KEY |
| 21 | FIRST ROW | | 1 | 83 | 158 (99)| 00:00:01 | |
|* 22 | INDEX RANGE SCAN (MIN/MAX)| PS_GP_RSLT_PIN | 1 | 83 | 158 (99)| 00:00:01 | KEY |
| 23 | TABLE ACCESS BY LOCAL INDEX ROWID | PS_GP_RSLT_PIN | 1 | 86 | 3 (0)| 00:00:01 | |
...
---------------------------------------------------------------------------------------------------------------------------



Correlated Delete & Update

Sometimes you want to delete or update rows in one table based on the existence of matching rows in another table. Examples include building up a list of records to be deleted in another table, or new data values being first loaded into a staging table before updating the main data set. I call this a Correlated Delete or Update, because the rows affected in the main table must correlate with matching rows in the other table. There are different ways we can write such correlated actions in Oracle. Unfortunately one way results in a very poor execution plan and incredibly bad performance – several hours estimated in one example – while the other way might take only a second or two. So it can be very useful to know which is which, and why.

This scenario is easiest to see with a Delete, i.e. delete from tableA where matching rows exist in tableB. We might naturally think of this as a kind of join between tableA and tableB, assuming tableB has a foreign key to the primary key of tableA. It turns out that Sybase has implemented an extension to its DELETE and UPDATE statements that lets us use join syntax to specify this kind of correlated action, via an additional ‘from’ clause. In Sybase our delete would be:

delete tableA from tableA, tableB where tableA.pkey = tableB.fkey

Unfortunately this is an extension to the ANSI SQL syntax, and Oracle does not have an equivalent. So in Oracle we can name only one table in the main statement, and must use sub-queries to refer to the other tables. One way I came across the other day to do this is:

delete from tableA
where exists (select 1 from tableB where tableA.pkey = tableB.fkey)

On the face of it this is correct – we only delete rows in tableA that have a match in tableB. Unfortunately it suffers from terrible performance. In the case I came across, Oracle estimated it would take over three and a half hours, dominated by a full scan of tableA (table names changed from the originals):

-------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
-------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 33 | 1075K (2)| 03:35:07 |
| 1 | DELETE | A | | | | |
|* 2 | FILTER | | | | | |
| 3 | TABLE ACCESS FULL | A | 320M| 9G| 1072K (2)| 03:34:30 |
|* 4 | TABLE ACCESS BY INDEX ROWID| B | 1 | 82 | 0 (0)| 00:00:01 |
|* 5 | INDEX RANGE SCAN | X1_B | 1 | | 0 (0)| 00:00:01 |
-------------------------------------------------------------------------------------------

This is because the sub-query is correlated – it refers to the outer table (tableA) and so must be executed for each row of tableA from the outer, main query. This results in a full table scan of tableA, and then a join to tableB for the correlated sub-query. If tableA is large and tableB is small with a list of rows to delete, then the performance is very bad indeed.

The solution is to rewrite the sub-query so that it is not correlated, and we can do that using IN i.e. where a row in tableA has a matching row IN tableB. The delete then becomes:

delete from tableA where (tableA.pkey) IN (select tableB.fkey from tableB)

The meaning of this is exactly the same as the other delete, but now the sub-query is not correlated. Oracle can now choose an execution plan that scans tableB to produce a set of rows, and join to tableA using an index. This executes much faster, as you would expect.
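As a tiny worked example of the rewrite (invented data; tableB holds the keys to purge):

```sql
-- Illustrative tables and data, invented for this sketch.
CREATE TABLE tableA (pkey INTEGER PRIMARY KEY, payload VARCHAR2(10));
CREATE TABLE tableB (fkey INTEGER);

INSERT INTO tableA VALUES (1, 'keep');
INSERT INTO tableA VALUES (2, 'purge');
INSERT INTO tableA VALUES (3, 'keep');
INSERT INTO tableB VALUES (2);

-- Uncorrelated form: Oracle can scan tableB's single row first,
-- then probe tableA by its primary key index. Row 2 is deleted;
-- rows 1 and 3 remain.
DELETE FROM tableA
WHERE  pkey IN (SELECT fkey FROM tableB);
```

The EXISTS form would delete exactly the same row, but the IN form is what leaves the optimizer free to drive from the small table.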


------------------------------------------------------------------------------------------
| Id | Operation | Name | Rows | Bytes | Cost (%CPU)| Time |
------------------------------------------------------------------------------------------
| 0 | DELETE STATEMENT | | 1 | 113 | 6 (17)| 00:00:01 |
| 1 | DELETE | A | | | | |
|* 2 | FILTER | | | | | |
| 3 | NESTED LOOPS | | 1 | 113 | 6 (17)| 00:00:01 |
| 4 | SORT UNIQUE | | 1 | 80 | 2 (0)| 00:00:01 |
|* 5 | TABLE ACCESS FULL | B | 1 | 80 | 2 (0)| 00:00:01 |
|* 6 | INDEX RANGE SCAN | X1_A | 1 | 33 | 3 (0)| 00:00:01 |
------------------------------------------------------------------------------------------

As you can see the estimated cost has reduced from over 1 million to just 6, and the estimated elapsed time from 3 and a half hours to just one second! A major improvement. Obviously this improvement is down to the fact that tableB is at least 1,000 times smaller than tableA.

Either way, I believe that using IN is a better way of phrasing the sub-query, because it makes clearer to Oracle which way round the relationship is (A depends on B, not vice versa), and gives the optimizer more flexibility in how to execute the SQL statement. If tableB were big enough relative to tableA, there is no reason why the optimizer could not go back to the original execution plan – scanning A and joining to B.

Where it really makes a difference is where you have other conditions on tableB within the sub-query – not just the foreign key join to tableA. When using EXISTS, Oracle will ignore extra indexes on tableB and treat the correlated sub-query in the same way – check tableB for each row in tableA. When using IN, Oracle can take advantage of extra indexes on tableB if they help with other conditions on columns of tableB in its sub-query. Thus it can use an index for the initial access to data in tableB, and then use the primary key index into tableA. This results in efficient execution plans for the DELETE.

The problem also occurs with Updates, but is compounded by needing to refer to the same table twice within the update statement. Again, the Sybase syntax is much simpler and more straightforward:


update tableA
set tableA.column1 = tableB.column1, tableA.column2 = tableB.column2
from tableA, tableB
where tableA.pkey = tableB.fkey

In Oracle we need to use a sub-query for tableB in the WHERE clause in the same way as for the Delete statement, and we also need a sub-query for tableB in the SET clause so that it is updated to the values in the matching row in tableB. This is important – without both sub-queries we would either update the wrong rows (WHERE clause problem) or update them to the wrong values (SET clause problem). The Oracle equivalent, using the same rewrite as before is:


update tableA
set (column1, column2) = (select tableB.column1, tableB.column2
from tableB where tableA.pkey = tableB.fkey)
where pkey in (select fkey from tableB)

As already mentioned this particular update will perform well when executed. Typically Oracle will scan tableB if it is much smaller for the sub-query, then join to tableA on its key columns assuming there is an index on them, and then join back to tableB to get the values to update the columns to.
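As an aside, Oracle does offer a join-style alternative much closer to the Sybase syntax: the MERGE statement, available since Oracle 9i. A sketch using the same illustrative table and column names as above:

```sql
-- MERGE expresses the correlated update with explicit join syntax,
-- avoiding the need to repeat the tableB sub-query in both the SET
-- and WHERE clauses. (Like the sub-query form, it requires that each
-- tableA row matches at most one tableB row.)
MERGE INTO tableA a
USING tableB b
ON    (a.pkey = b.fkey)
WHEN MATCHED THEN UPDATE
  SET a.column1 = b.column1,
      a.column2 = b.column2;
```

As with the IN rewrite, the optimizer is free to drive the join from whichever table is smaller.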
