Sunday, 3 March 2013

plpgsql: Creating Functions and saving intermetiate query results


Sometimes SQL queries are difficult or impossible to optimize if you are using pure standard SQL.

I started this because I just wanted to find a way to save the result of a sub-query to use later. In my example I wanted to save the average + the standard deviation of a large set of values for some research. I found that Posgresql supports standard SQL only so you can not save query results. Instead you can use plpgsql with is a procedural langauge that can be used with Postgresql (psql).

These are the two tables being considered for these examples.


CREATE TABLE IF NOT EXISTS test 

(

  test_id integer NOT NULL primary key,

  algorithm_data_id integer NOT NULL references 

  test_timestamp timestamp NOT NULL DEFAULT CURRENT_TIMESTAMP,

  test_status integer NOT NULL,

  num_agents integer NOT NULL,

  num_obstacles integer NOT NULL

) ;



CREATE TABLE IF NOT EXISTS composite1

    benchmark_id integer NOT NULL references test(test_id) ON DELETE CASCADE,

    collisions integer NOT NULL,    

    time double precision NOT NULL,    

    success integer NOT NULL,    

    num_unique_collisions integer NOT NULL,       

    total_distance_travelled double precision NOT NULL,       

    agent_complete integer NOT NULL,    

    agent_success integer NOT NULL,    

    optimal_path_length double precision NOT NULL,    

);


Basic syntax for a function definition


CREATE OR REPLACE FUNCTION multiply(x real, y real) RETURNS real AS 
CREATE OR REPLACE The replace part is useful for development if testing a function and updating it but also useful if it so happens that there already exist a function with the same name you want to override it to ensure proper functionality.

After the AS comes the function definition. The most non obvious part that I found was the use of '$$'. Think of the $$ as {} from many programming languages that are used to symbolize a body of code, in this case the function body.


CREATE OR REPLACE FUNCTION multiply(x real, y real) RETURNS real AS
$$
BEGIN
    RETURN x * y;
END;
$$ LANGUAGE plpgsql;
 
This is a simple example of a simple definition that can be used to multiply two numbers.
 
Note the uses of ';' they are only at the end of return statements and queries. This will be seen more later.

A more advanced example:


CREATE OR REPLACE FUNCTION getAVG_WRTTotalTicksforScenario(scenarioID INT) RETURNS double precision AS

$BODY$

DECLARE

average double precision;

BEGIN

    

SELECT AVG(total_total_ticks_accumulated/total_number_of_times_executed) INTO STRICT average

    FROM ppr_ai_data pd2, test test, algorithm_data

    WHERE test.algorithm_data_id = algorithm_data.algorithm_data_id and

           -- test.scenario_group = scenario_group and

           algorithm_data.algorithm_data_id = pd2.ppr_ai_id and

            test.scenario_group = scenarioID;

    RETURN average;

END

$BODY$ LANGUAGE plpgsql; 


Any variables that are going to be used should be defined in the declare section.

The return types of functions can be of the basic types for can be of a time defined for a table. For example



CREATE OR REPLACE FUNCTION getAllSignificantTestsForScenario(scenarioID INT, algorithmName TEXT) RETURNS SETOF Test AS

$BODY$

DECLARE

average double precision; 

std_dev double precision;

r test%rowtype;





BEGIN



average := getAVG_WRTTotalTicksforScenario(scenarioID)

        

std_dev :=    getSTDDEV_WRTTotalTicksforScenario(scenarioID)

                

RAISE NOTICE 'Average is %', average;

RAISE NOTICE 'Standard deviation is %', std_dev;



RETURN QUERY     SELECT * from test where test_id IN (

SELECT t.test_id

    FROM test t, ppr_ai_data pd, algorithm al, algorithm_data Adata

    WHERE t.algorithm_data_id = Adata.algorithm_data_id and

            pd.ppr_ai_id = Adata.algorithm_data_id and

            Adata.algorithm_type = al.algorithm_id and

            al.name = algorithmName and

            t.scenario_group = scenarioID and

        pd.total_total_ticks_accumulated / pd.total_number_of_times_executed > (average + std_dev));

                

RETURN;



END


Return can be used a in few ways. In the above case it is used to return the result of an entire query. In the above example the return is typed to that of a row of the table test. A small trick was used to get around having to return a full test row. The use of Raise in this case is just used to inform the user of the values being used in the sub query. Usually RAISE is used to inform the user issues in the function and can be used to throw exception like many other programming languages [4].

References
  1. http://www.postgresql.org/docs/9.1/static/plpgsql-statements.html
  2. http://www.postgresql.org/docs/8.4/static/plpgsql-declarations.html
  3. http://www.postgresql.org/docs/9.1/static/plpgsql-control-structures.html#PLPGSQL-STATEMENTS-RETURNING
  4. http://www.postgresql.org/docs/9.1/static/plpgsql-errors-and-messages.html

No comments:

Post a Comment