Class ProfilerImpl

  • All Implemented Interfaces:
    Profiler

    public class ProfilerImpl
    extends java.lang.Object
    implements Profiler
    Implementation of Profiler that only investigates "interesting" combinations of columns.
    • Field Detail

      • combinationsPerPass

        private final int combinationsPerPass
        The number of combinations to consider per pass. The limit is set by available memory; a value of 1,000 is typical. Each combination needs one sketch, and each sketch needs 2 KB of memory, so a pass over 1,000 combinations uses about 2 MB.
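        The memory arithmetic above can be sketched as follows. This is a minimal illustration, not part of ProfilerImpl: the class SketchMemoryEstimate and its methods are hypothetical names, and it assumes that "2KB" means 2,048 bytes and that sketches are the only significant per-pass cost.

```java
/** Hypothetical helper, not part of ProfilerImpl: estimates sketch memory for one pass. */
final class SketchMemoryEstimate {
  /** Assumed size of one sketch: 2 KB, taken here as 2,048 bytes. */
  static final long BYTES_PER_SKETCH = 2L * 1024L;

  private SketchMemoryEstimate() {}

  /** Returns the bytes needed to hold one sketch per combination in a single pass. */
  static long bytesPerPass(int combinationsPerPass) {
    if (combinationsPerPass < 0) {
      throw new IllegalArgumentException("combinationsPerPass must be >= 0");
    }
    return combinationsPerPass * BYTES_PER_SKETCH;
  }

  /** Returns the largest combinationsPerPass whose sketches fit in the given budget. */
  static int maxCombinations(long memoryBudgetBytes) {
    long fit = Math.max(0L, memoryBudgetBytes / BYTES_PER_SKETCH);
    return (int) Math.min(Integer.MAX_VALUE, fit);
  }

  public static void main(String[] args) {
    // The typical value of 1,000 combinations per pass.
    System.out.println(bytesPerPass(1000) + " bytes per pass");
    // How many combinations would fit in a 64 MB budget.
    System.out.println(maxCombinations(64L * 1024 * 1024) + " combinations");
  }
}
```

        Working backwards from a memory budget, as maxCombinations does, is one plausible way to choose a value other than the typical 1,000.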
      • interestingCount

        private final int interestingCount
        The minimum number of combinations considered "interesting". Once that many combinations have been accepted, a further combination is considered "interesting" only if its surprise is greater than the median surprise.
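        The threshold rule described for interestingCount might look like the sketch below. It is an illustration under stated assumptions, not ProfilerImpl's actual code: the class InterestingFilter and its method offer are hypothetical, and the median is assumed to be taken over every surprise value offered so far, which the description above does not specify.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

/** Hypothetical illustration of the "interesting" rule; not ProfilerImpl's actual code. */
final class InterestingFilter {
  private final int interestingCount;
  /** All surprise values offered so far, kept sorted in ascending order. */
  private final List<Double> surprises = new ArrayList<>();

  InterestingFilter(int interestingCount) {
    this.interestingCount = interestingCount;
  }

  /** Records the surprise and returns whether the combination counts as interesting. */
  boolean offer(double surprise) {
    // The first interestingCount combinations are accepted unconditionally;
    // later ones must beat the median of everything seen so far (an assumption).
    boolean interesting = surprises.size() < interestingCount || surprise > median();
    int index = Collections.binarySearch(surprises, surprise);
    if (index < 0) {
      index = -index - 1; // convert "not found" result to an insertion point
    }
    surprises.add(index, surprise);
    return interesting;
  }

  /** Returns the median of the recorded surprises, or negative infinity if none exist. */
  private double median() {
    int n = surprises.size();
    if (n == 0) {
      return Double.NEGATIVE_INFINITY;
    }
    if (n % 2 == 1) {
      return surprises.get(n / 2);
    }
    return (surprises.get(n / 2 - 1) + surprises.get(n / 2)) / 2.0;
  }
}
```

        With interestingCount set to 3, the first three offers are always accepted; a fourth offer is accepted only if its surprise exceeds the median of the first three.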
    • Constructor Detail

      • ProfilerImpl

        ProfilerImpl(int combinationsPerPass,
                     int interestingCount,
                     java.util.function.Predicate<Pair<ProfilerImpl.Space, Profiler.Column>> predicate)
        Creates a ProfilerImpl.
        Parameters:
        combinationsPerPass - Maximum number of columns (or combinations of columns) to compute each pass
        interestingCount - Minimum number of combinations considered interesting
        predicate - Whether a successor is considered interesting enough to analyze
    • Method Detail

      • profile

        public Profiler.Profile profile(java.lang.Iterable<java.util.List<java.lang.Comparable>> rows,
                                        java.util.List<Profiler.Column> columns,
                                        java.util.Collection<ImmutableBitSet> initialGroups)
        Description copied from interface: Profiler
        Creates a profile of a data set.
        Specified by:
        profile in interface Profiler
        Parameters:
        rows - List of rows. Can be iterated over more than once (though possibly not cheaply)
        columns - Column definitions
        initialGroups - List of combinations of columns that should be profiled early, because they may be interesting
        Returns:
        A profile describing relationships within the data set
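        To make the shape of the rows and initialGroups arguments concrete, the sketch below runs one pass per column combination over an in-memory row list. It is a simplified stand-in, not the profiler itself: the class DistinctCounter is hypothetical, java.util.BitSet stands in for ImmutableBitSet, an exact HashSet stands in for the fixed-size sketches, and it assumes the statistic of interest is the number of distinct value tuples in a combination.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.BitSet;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

/** Hypothetical stand-in for one profiling pass; uses exact sets instead of sketches. */
final class DistinctCounter {
  private DistinctCounter() {}

  /** Counts distinct value tuples over the columns whose ordinals are set in group. */
  @SuppressWarnings("rawtypes")
  static int distinctCount(Iterable<List<Comparable>> rows, BitSet group) {
    Set<List<Comparable>> seen = new HashSet<>();
    for (List<Comparable> row : rows) {
      // Project the row onto the columns in this combination.
      List<Comparable> key = new ArrayList<>();
      for (int i = group.nextSetBit(0); i >= 0; i = group.nextSetBit(i + 1)) {
        key.add(row.get(i));
      }
      seen.add(key);
    }
    return seen.size();
  }

  /** Returns four sample rows with columns (city, department, level). */
  @SuppressWarnings("rawtypes")
  static List<List<Comparable>> sampleRows() {
    return Arrays.asList(
        Arrays.<Comparable>asList("NY", "Sales", 10),
        Arrays.<Comparable>asList("NY", "Sales", 20),
        Arrays.<Comparable>asList("SF", "Sales", 10),
        Arrays.<Comparable>asList("SF", "Eng", 10));
  }

  public static void main(String[] args) {
    List<List<Comparable>> rows = sampleRows();
    BitSet cityAndDept = new BitSet();
    cityAndDept.set(0);
    cityAndDept.set(1);
    // The same Iterable is traversed once per combination, which is why
    // rows must support being iterated over more than once.
    System.out.println(distinctCount(rows, cityAndDept));
  }
}
```

        Because each call traverses rows from the start, the example relies on the same property the rows parameter documents: the Iterable can be iterated over more than once.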