Performance Testing
Use Pherf to generate data and run Phoenix performance tests.

Overview
Pherf is a standalone tool for performance and functional testing through Phoenix. Pherf can be used both to generate highly customized datasets and to measure SQL performance against that data.
Build all of Phoenix (includes Pherf default profile)
mvn clean package -DskipTests
Running
- Edit config/env.sh to include the required property values.
- Show help: bin/pherf-standalone.py -h
- Example:
bin/pherf-standalone.py -drop all -l -q -z [zookeeper] -schemaFile .*user_defined_schema.sql -scenarioFile .*user_defined_scenario.xml
Example run commands
List all scenario files available to run.
./pherf-standalone.py -listFiles
Drop all existing tables, load and query data specified in all scenario files.
./pherf-standalone.py -drop all -l -q -z localhost
Pherf arguments:
- -h: Help
- -l: Apply schema and load data
- -q: Executes multi-threaded query sets and writes results
- -z [quorum]: ZooKeeper quorum
- -m: Enable monitor for statistics
- -monitorFrequency [frequency in ms]: Frequency at which the monitor will snapshot stats to log file
- -drop [pattern]: Regex drop all tables with schema name as PHERF. Example drop Event tables: -drop .*(EVENT).*. Drop all: -drop .* or -drop all
- -scenarioFile: Regex or file name of a specific scenario file to run
- -schemaFile: Regex or file name of a specific schema file to run
- -export: Exports query results to CSV files in CSV_EXPORT directory
- -diff: Compares results with previously exported results
- -hint: Executes all queries with specified hint. Example: SMALL
- -rowCountOverride [number of rows]: Specify number of rows to be upserted rather than using row count specified in schema
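When Pherf runs are scripted, the argument list can be assembled programmatically. The following is a minimal Python sketch, not part of Pherf: the helper name build_pherf_command and its parameters are hypothetical, and it uses only flags listed above.

```python
from typing import List, Optional


def build_pherf_command(
    zookeeper: str = "localhost",
    drop: Optional[str] = None,
    load: bool = False,
    query: bool = False,
    schema_file: Optional[str] = None,
    scenario_file: Optional[str] = None,
    script: str = "bin/pherf-standalone.py",
) -> List[str]:
    """Assemble a Pherf argument list (hypothetical helper, not a Pherf API)."""
    cmd = [script]
    if drop is not None:
        cmd += ["-drop", drop]          # regex pattern or "all"
    if load:
        cmd.append("-l")                # apply schema and load data
    if query:
        cmd.append("-q")                # run the query sets
    cmd += ["-z", zookeeper]            # ZooKeeper quorum
    if schema_file is not None:
        cmd += ["-schemaFile", schema_file]
    if scenario_file is not None:
        cmd += ["-scenarioFile", scenario_file]
    return cmd


if __name__ == "__main__":
    cmd = build_pherf_command(drop="all", load=True, query=True)
    print(" ".join(cmd))
    # To launch the run: import subprocess; subprocess.run(cmd, check=True)
```

Passing the arguments as a list to subprocess.run, rather than as one shell string, keeps regex patterns such as .*user_defined_schema.sql from being expanded by the shell.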
Adding Rules for Data Creation
Review test_scenario.xml for syntax examples.
- Rules are defined as <columns /> and are applied in the order they appear in the file.
- Rules of the same type override the values of a prior rule of the same type. If <userDefined>true</userDefined> is set, the rule only applies the override when both type and name match the column name in Phoenix.
- The <prefix> tag is set at the column level. It defines a constant string prepended to CHAR and VARCHAR data type values.
- Required field: supported Phoenix types are VARCHAR, CHAR, DATE, DECIMAL, INTEGER.
  - Denoted by the <type> tag.
- User defined true changes rule matching to use both the name and type fields to determine equivalence.
  - Default is false if not specified, and equivalence is then determined by type only. An important note here is that you can still override rules without the user defined flag, but they will change the rule globally and not just for a specified column.
- Required field: supported data sequences.
  - RANDOM: Random value, which can be bounded by other fields such as length.
  - SEQUENTIAL: Monotonically increasing long prepended to random strings.
    - Only supported on VARCHAR and CHAR types.
  - LIST: Picks values from a predefined list of values.
- Required field: length defines the boundary for random values for CHAR and VARCHAR types.
  - Denoted by the <length> tag.
- Column-level min/max values define boundaries for numerical values. For DATEs, these values supply a range between which values are generated. At the column level the granularity is a year. At a specific data value level, the granularity is down to the millisecond.
  - Denoted by the <minValue> tag.
  - Denoted by the <maxValue> tag.
- Null chance denotes the probability of generating a null value, from 0 to 100. The higher the number, the more likely the value will be null.
  - Denoted by the <nullChance> tag.
- Name can either be any text or the actual column name in the Phoenix table.
  - Denoted by the <name> tag.
- Value list is used in conjunction with LIST data sequences. Each entry is a DataValue with a specified value to be used when generating data.
  - Denoted by the <valueList><datavalue><value/></datavalue></valueList> tags.
  - If the distribution attribute on the datavalue is set, values will be created according to that probability.
  - When distribution is used, values must add up to 100%.
  - If distribution is not used, values will be randomly picked from the list with equal distribution.
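The null chance and value list rules above can be illustrated with a small Python sketch. It is not Pherf's implementation: the function pick_value, its (value, distribution) tuple format, and the choice to roll for null before picking from the list are assumptions made for illustration.

```python
import random
from typing import List, Optional, Tuple


def pick_value(
    values: List[Tuple[str, Optional[int]]],
    null_chance: int = 0,
    rng: Optional[random.Random] = None,
) -> Optional[str]:
    """Pick one value from (value, distribution) pairs.

    null_chance and each distribution are percentages; a distribution of
    None means the attribute was not set for that value.
    """
    rng = rng or random.Random()
    if not 0 <= null_chance <= 100:
        raise ValueError("nullChance must be in [0, 100]")
    # Assumption: the null roll happens first. 100 always yields None, 0 never.
    if rng.randrange(100) < null_chance:
        return None
    weights = [w for _, w in values]
    if all(w is None for w in weights):
        # No distribution set: every value is equally likely.
        return rng.choice([v for v, _ in values])
    if any(w is None for w in weights) or sum(weights) != 100:
        raise ValueError("distribution values must add up to 100")
    return rng.choices([v for v, _ in values], weights=weights, k=1)[0]


if __name__ == "__main__":
    rng = random.Random(42)  # fixed seed so repeated runs give the same draws
    draws = [pick_value([("A", 60), ("B", 30), ("C", 10)], rng=rng) for _ in range(1000)]
    print({v: draws.count(v) for v in ("A", "B", "C")})
```

With distributions of 60, 30, and 10, the printed counts should land near those proportions. A list whose distributions do not total 100 is rejected rather than silently renormalized, which mirrors the rule that values must add up to 100%.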
Defining Scenario
A scenario can have multiple querySets. In the following example, a concurrency of 1-4 means that each query is executed starting at a concurrency level of 1 and ramping up to a maximum concurrency of 4. Per thread, each query is executed until it has run 10 times or for 10 seconds, whichever comes first. A querySet is executed serially by default, but you can set executionType to PARALLEL so that its queries are executed concurrently. Each query may have an optional timeoutDuration field that defines the amount of time (in milliseconds) before execution of that query is cancelled. Scenarios are defined in XML files stored in the resource directory.
<scenarios>
    <querySet concurrency="1-4" executionType="PARALLEL" executionDurationInMs="10000" numberOfExecutions="10">
        <query id="q1" verifyRowCount="false" statement="select count(*) from PHERF.TEST_TABLE"/>
        <query id="q2" tenantId="1234567890" timeoutDuration="10000" ddl="create view if not exists myview(mypk varchar not null primary key, mycol varchar)" statement="upsert select ..."/>
    </querySet>
    <querySet concurrency="3" executionType="SERIAL" executionDurationInMs="20000" numberOfExecutions="100">
        <query id="q3" verifyRowCount="false" statement="select count(*) from PHERF.TEST_TABLE"/>
        <query id="q4" statement="select count(*) from PHERF.TEST_TABLE WHERE TENANT_ID='00D000000000062'"/>
    </querySet>
</scenarios>
Results
Results are written in real time to the results directory. Open the result that is saved in .jpg format for real-time visualization. Results are written using DataModelResult objects, which are modified over the course of each Pherf run.
XML results
Pherf XML results have a similar format to the corresponding scenario.xml file used for the Pherf run, but also include additional information, such as the execution time of queries, whether queries timed out, and result row count.
<queryResults expectedAggregateRowCount="100000" id="q1" statement="SELECT COUNT(*) FROM PHERF.USER_DEFINED_TEST" timeoutDuration="0">
    <threadTimes threadName="1,1">
        <runTimesInMs elapsedDurationInMs="1873" resultRowCount="100000" startTime="2020-04-09T11:28:12.623-07:00" timedOut="true"/>
        <runTimesInMs elapsedDurationInMs="1793" resultRowCount="100000" startTime="2020-04-09T11:28:14.511-07:00" timedOut="true"/>
        <runTimesInMs elapsedDurationInMs="1764" resultRowCount="100000" startTime="2020-04-09T11:28:16.319-07:00" timedOut="true"/>
    </threadTimes>
</queryResults>
CSV results
Each row in a CSV result file represents a single execution of a query and provides details about a query execution's
runtime, timeout status, result row count, and more. The header file format can be found in Header.java.
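For the XML results described earlier, per-query timings can be summarized with a short script. This is a sketch under assumptions: it relies only on the element and attribute names visible in the sample fragment (queryResults, runTimesInMs, elapsedDurationInMs, timedOut), and the function name summarize is hypothetical. The layout of a complete result file may differ.

```python
import xml.etree.ElementTree as ET

# Sample fragment copied from the XML results section of this page.
SAMPLE = """
<queryResults expectedAggregateRowCount="100000" id="q1" statement="SELECT COUNT(*) FROM PHERF.USER_DEFINED_TEST" timeoutDuration="0">
    <threadTimes threadName="1,1">
        <runTimesInMs elapsedDurationInMs="1873" resultRowCount="100000" startTime="2020-04-09T11:28:12.623-07:00" timedOut="true"/>
        <runTimesInMs elapsedDurationInMs="1793" resultRowCount="100000" startTime="2020-04-09T11:28:14.511-07:00" timedOut="true"/>
        <runTimesInMs elapsedDurationInMs="1764" resultRowCount="100000" startTime="2020-04-09T11:28:16.319-07:00" timedOut="true"/>
    </threadTimes>
</queryResults>
"""


def summarize(xml_text: str) -> dict:
    """Return run count, mean/max elapsed ms, and timeout count per query id."""
    root = ET.fromstring(xml_text.strip())
    summaries = {}
    # iter() also visits the root, so a bare <queryResults> fragment works too.
    for query in root.iter("queryResults"):
        runs = list(query.iter("runTimesInMs"))
        times = [int(run.get("elapsedDurationInMs")) for run in runs]
        summaries[query.get("id")] = {
            "runs": len(times),
            "mean_ms": sum(times) / len(times) if times else None,
            "max_ms": max(times) if times else None,
            "timed_out": sum(run.get("timedOut") == "true" for run in runs),
        }
    return summaries


if __name__ == "__main__":
    for query_id, stats in summarize(SAMPLE).items():
        print(query_id, stats)
```

To process a result file instead of the inline sample, read its contents and pass the text to summarize.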
Testing
The default quorum is localhost. To override it, set the ZK_QUORUM system property.
Run unit tests: mvn test -DZK_QUORUM=localhost
Run a specific method: mvn -Dtest=ClassName#methodName test
More to come...