Package org.apache.calcite.profile
Class SimpleProfiler
java.lang.Object
org.apache.calcite.profile.SimpleProfiler
- All Implemented Interfaces:
Profiler
Basic implementation of Profiler.
Nested Class Summary
Nested classes/interfaces inherited from interface org.apache.calcite.profile.Profiler
Profiler.Column, Profiler.Distribution, Profiler.FunctionalDependency, Profiler.Profile, Profiler.RowCount, Profiler.Statistic, Profiler.Unique
Constructor Summary
Constructors
SimpleProfiler()
Method Summary
Modifier and Type | Method | Description
Profiler.Profile | profile(Iterable<List<Comparable>> rows, List<Profiler.Column> columns, Collection<ImmutableBitSet> initialGroups) | Creates a profile of a data set.
static double | surprise(double expected, double actual) | Returns a measure of how much an actual value differs from expected.
Constructor Details
SimpleProfiler
public SimpleProfiler()
Method Details
profile
public Profiler.Profile profile(Iterable<List<Comparable>> rows, List<Profiler.Column> columns, Collection<ImmutableBitSet> initialGroups)
Description copied from interface: Profiler
Creates a profile of a data set.
- Specified by:
profile in interface Profiler
- Parameters:
rows - List of rows. Can be iterated over more than once (maybe not cheaply)
columns - Column definitions
initialGroups - List of combinations of columns that should be profiled early, because they may be interesting
- Returns:
A profile describing relationships within the data set
surprise
public static double surprise(double expected, double actual)
Returns a measure of how much an actual value differs from expected. The formula is abs(expected - actual) / (expected + actual).
Examples:
- surprise(e, a) is always between 0 and 1;
- surprise(e, a) is 0 if e = a;
- surprise(e, 0) is 1 if e > 0;
- surprise(0, a) is 1 if a > 0;
- surprise(5, 0) is 100%;
- surprise(5, 3) is 25%;
- surprise(5, 4) is 11%;
- surprise(5, 5) is 0%;
- surprise(5, 6) is 9%;
- surprise(5, 16) is 52%;
- surprise(5, 100) is 90%;
- Parameters:
expected - Expected value
actual - Actual value
- Returns:
Measure of how much expected deviates from actual
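The documented formula and examples can be checked with a small standalone sketch. The class name SurpriseSketch is hypothetical, and this is not Calcite's source. It assumes that equal inputs (including 0 and 0) return 0, as the examples state, because the raw formula would divide zero by zero in that case. It also assumes non-negative inputs, as in every documented example.

```java
/** Hypothetical standalone sketch of the documented formula; not the Calcite implementation. */
final class SurpriseSketch {
  private SurpriseSketch() {}

  /** Mirrors the documented formula abs(expected - actual) / (expected + actual). */
  static double surprise(double expected, double actual) {
    // Assumption: equal inputs return 0 up front, which also avoids 0 / 0.
    if (expected == actual) {
      return 0d;
    }
    // Assumes non-negative inputs, so the denominator is positive here.
    return Math.abs(expected - actual) / (expected + actual);
  }

  public static void main(String[] args) {
    double[] actuals = {0, 3, 4, 5, 6, 16, 100};
    for (double actual : actuals) {
      // Prints each surprise(5, actual) as a whole-number percentage.
      System.out.printf("surprise(5, %.0f) = %.0f%%%n", actual, 100 * surprise(5, actual));
    }
  }
}
```

Running main prints the surprise(5, a) values from the list above as rounded percentages, which makes it easy to compare them against the documented figures.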