improver.utilities.statistical_operations module

Module to contain statistical operations.

class improver.utilities.statistical_operations.ProbabilitiesFromPercentiles2D(percentiles_cube, output_name)[source]

Bases: improver.BasePlugin

Generate a 2-dimensional field of probabilities by interpolating a percentiled cube of data to required points.

Examples

Given a reference field of values against a percentile coordinate, an interpolation is performed using another field of values of the same type (e.g. height). This returns the percentile with which these heights would be associated in the reference field. This effectively uses the field of values as a 2-dimensional set of thresholds, and the percentiles looked up correspond to the probabilities of these thresholds being reached.

Snow-fall level:

Reference field: Percentiled snow fall level (m ASL)
Other field: Orography (m ASL)

300m ----------------- 30th Percentile snow fall level
200m ----_------------ 20th Percentile snow fall level
100m ---/-\----------- 10th Percentile snow fall level
000m --/---\----------  0th Percentile snow fall level
______/     \_________ Orography

The orography heights are compared against the heights that correspond with percentile values to find the band in which they fall; this diagram hides the 2-dimensional variability of the snow fall level. The percentile values are then interpolated to the height of the point being considered. This constructs a 2-dimensional field of probabilities that snow will be falling at each point in the orography field.
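As a rough illustration of the lookup at a single grid point, the sketch below uses NumPy's np.interp with the heights from the diagram. The array values and the probability_at_point helper are assumptions made for this example and are not part of the plugin. Note also that np.interp clamps values outside the percentile range, which differs from the plugin's treatment of points above the top band.

```python
import numpy as np

# Hypothetical snow fall level heights (m ASL) at one grid point, one value
# per percentile, mirroring the diagram above.
percentiles = np.array([0.0, 10.0, 20.0, 30.0])
snow_level_heights = np.array([0.0, 100.0, 200.0, 300.0])


def probability_at_point(orography_height):
    """Interpolate the percentile at a height and return it as a fraction."""
    percentile = np.interp(orography_height, snow_level_heights, percentiles)
    return percentile / 100.0


# An orography height of 150 m lies halfway between the 10th and 20th
# percentile heights, so it maps to the 15th percentile.
prob = probability_at_point(150.0)
```

Repeating this lookup independently at every grid point, each with its own percentile distribution, gives the 2-dimensional probability field.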

__init__(percentiles_cube, output_name)[source]

Initialise class. Sets an inverse_ordering (bool) switch to True for cases where the percentiled data increases in the opposite sense to the percentile coordinate, e.g.:

0th Percentile - Value = 10
10th Percentile - Value = 5
20th Percentile - Value = 0
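One possible way to detect this ordering is sketched below. How the plugin actually sets inverse_ordering is not described here, so the is_inverse_ordering helper and its first-versus-last comparison are assumptions made purely for illustration.

```python
import numpy as np


def is_inverse_ordering(percentile_values):
    """Return True if the values decrease as the percentile increases.

    Hypothetical helper: compares the value at the lowest percentile with
    the value at the highest percentile.
    """
    values = np.asarray(percentile_values, dtype=float)
    return bool(values[-1] < values[0])


# Values taken from the example above: 10, 5, 0 for the 0th, 10th, 20th
# percentiles.
inverse = is_inverse_ordering([10.0, 5.0, 0.0])
normal = is_inverse_ordering([0.0, 5.0, 10.0])
```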

Parameters
  • percentiles_cube (iris.cube.Cube) – The percentiled field from which probabilities will be obtained using the input cube. This cube should contain a percentiles dimension, with fields of values that correspond to these percentiles. The cube passed to the process method will contain values of the same diagnostic (e.g. height) as this reference cube.

  • output_name (str) – The name of the cube being created, e.g. 'probability_of_snow_falling_level_below_ground_level'

create_probability_cube(cube, threshold_cube)[source]

Create a 2-dimensional probability cube in which to store the calculated probabilities.

Parameters
  • cube (iris.cube.Cube) – Template for the output probability cube. This is a slice created in process, containing a percentile coordinate as well as x and y coordinates. We keep all the metadata from this cube but dispose of the percentile coordinate as we will be filling the cube with probabilities.

  • threshold_cube (iris.cube.Cube) – A 2-dimensional cube of “threshold” values containing metadata required to construct a probability cube.

Returns

A new 2-dimensional probability cube with suitable metadata.

Return type

iris.cube.Cube

percentile_interpolation(threshold_cube, percentiles_cube)[source]

Using a percentiles_cube containing a distinct percentile distribution for each point on a 2-dimensional grid, we can interpolate through each distribution to obtain a probability. The point to which we interpolate is defined by the threshold_cube. Note that for a degenerate percentile distribution (one in which the same value is repeated at consecutive percentiles), the current implementation chooses the rightmost bin in which the threshold value is found.

e.g.

Percentile:  0  10  20  30  40  50  ...
Height (m):  0   0   0  15  30  40  ...

A height of 0m will be associated with a probability of 20%. This is not correct, but neither is taking 0%. The percentile approach is not suitable for these degenerate distributions, so be wary of the returned probabilities.
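The rightmost-bin behaviour can be reproduced at a single point with np.searchsorted. This is only a sketch of the behaviour described above, not the plugin's own code; the rightmost_bin_percentile helper is hypothetical and assumes the threshold lies at or above the lowest height and strictly below the highest.

```python
import numpy as np

# The degenerate distribution from the example above.
percentiles = np.array([0.0, 10.0, 20.0, 30.0, 40.0, 50.0])
heights = np.array([0.0, 0.0, 0.0, 15.0, 30.0, 40.0])


def rightmost_bin_percentile(threshold):
    """Interpolate a percentile using the rightmost bin containing threshold."""
    # side="right" gives the index just past the last height <= threshold,
    # so subtracting 1 selects the rightmost lower bound.
    lower = np.searchsorted(heights, threshold, side="right") - 1
    upper = lower + 1
    fraction = (threshold - heights[lower]) / (heights[upper] - heights[lower])
    return percentiles[lower] + fraction * (percentiles[upper] - percentiles[lower])


# A height of 0 m falls in the 20th to 30th percentile bin, giving 20%.
degenerate_result = rightmost_bin_percentile(0.0)
```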

Examples

This simple linear interpolator works in the following way.

percentiles_cube:

[ [[2.0, 2.0, 2.0],
   [2.0, 2.0, 2.0],
   [2.0, 2.0, 2.0]],

  [[4.0, 4.0, 4.0],
   [4.0, 4.0, 4.0],
   [4.0, 4.0, 4.0]] ]

threshold_cube:

[ [1.0, 1.0, 1.0],
  [3.0, 3.0, 3.0],
  [5.0, 5.0, 5.0] ]

value_bounds:

[ [[np.nan, np.nan, np.nan],
   [np.nan, np.nan, np.nan],
   [np.nan, np.nan, np.nan]],

  [[np.nan, np.nan, np.nan],
   [np.nan, np.nan, np.nan],
   [np.nan, np.nan, np.nan]] ]

percentile_bounds:

[ [[-1, -1, -1],
   [-1, -1, -1],
   [-1, -1, -1]],

  [[-1, -1, -1],
   [-1, -1, -1],
   [-1, -1, -1]] ]
  1. Create slices over each percentile and, using the correct inequality (as determined by inverse_ordering), compare the threshold values to the percentiles slice; here we assume inverse_ordering is False, so we use >=. We then populate the value_bounds and percentile_bounds arrays.

    Slice 0 - 0th Percentile:

    [[1.0 >= 2.0, 1.0 >= 2.0, 1.0 >= 2.0],
     [3.0 >= 2.0, 3.0 >= 2.0, 3.0 >= 2.0],
     [5.0 >= 2.0, 5.0 >= 2.0, 5.0 >= 2.0]]
    
    [[False, False, False],
     [True, True, True],
     [True, True, True]]
    

    The value_bounds array has a leading dimension with 2 indices, associated with the lower [0] and upper [1] bounds about the threshold being considered. The [0] index is populated with the values in the slice of percentiles_cube at every True index. The [1] index is populated with the values in the next slice of percentiles_cube.

    [ [[np.nan, np.nan, np.nan],
       [2.0, 2.0, 2.0],
       [2.0, 2.0, 2.0]],
    
      [[np.nan, np.nan, np.nan],
       [4.0, 4.0, 4.0],
       [4.0, 4.0, 4.0]] ]
    

    The percentile_bounds array also contains a leading dimension associated with the lower and upper bounds about the thresholds. The lower bound array is populated at every True index with the current percentile value (0 in this first slice), whilst the upper bound array takes the percentile value from the next slice.

    [ [[-1, -1, -1],
       [0, 0, 0],
       [0, 0, 0]],
    
      [[-1, -1, -1],
       [50, 50, 50],
       [50, 50, 50]] ]
    

    After the same process is applied to the next slice, the 50th percentile, we end up with value_bounds:

    [ [[np.nan, np.nan, np.nan],
       [2.0, 2.0, 2.0],
       [4.0, 4.0, 4.0]],
    
      [[np.nan, np.nan, np.nan],
       [4.0, 4.0, 4.0],
       [4.0, 4.0, 4.0]] ]
    

    And percentile_bounds:

    [ [[-1, -1, -1],
       [0, 0, 0],
       [50, 50, 50]],
    
      [[-1, -1, -1],
       [50, 50, 50],
       [50, 50, 50]] ]
    

    Note that where there is no available +1 index in the percentiles_cube, the upper bound is set to be the same as the lower bound.

  2. When all slices have been iterated over, the interpolants are calculated using the threshold values and the value_bounds.

    (threshold_cube.data - lower_bound) /
    (upper_bound - lower_bound)
    

    If the upper_bound and lower_bound are the same this leads to a divide by 0 calculation, resulting in np.inf as the output.

  3. The interpolants are used to calculate the percentile value at each point in the array using the percentile_bounds.

    lower_percentile_bound + interpolants *
    (upper_percentile_bounds - lower_percentile_bounds)
    

    The percentiles are divided by 100 to give a fractional probability.

  4. Any probabilities that are calculated to be np.inf indicate that the associated point has a threshold value that is above the top percentile band. These points are given a probability value of 1.

  5. Any points for which the calculated probability is np.nan had threshold values that were never found to fall within a percentile band, and so must be below the lowest band. These points are given a probability value of 0.
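The steps above can be sketched with plain NumPy arrays in place of iris cubes. This is a minimal illustration under stated assumptions, not the plugin's implementation: the function name and signature are hypothetical, inverse_ordering is taken to be False, and the masks for points outside the percentile bands are derived from the bounds arrays rather than from the final probabilities, because np.inf multiplied by a zero percentile difference evaluates to np.nan in NumPy.

```python
import numpy as np


def probabilities_from_percentiles(threshold, percentile_values, percentiles):
    """Hypothetical array-based version of the interpolation steps.

    threshold: (y, x) array of threshold values.
    percentile_values: (p, y, x) array of values at each percentile.
    percentiles: length-p sequence of percentile values (0 to 100).
    """
    threshold = np.asarray(threshold, dtype=float)
    percentile_values = np.asarray(percentile_values, dtype=float)
    shape = (2,) + threshold.shape
    value_bounds = np.full(shape, np.nan)
    percentile_bounds = np.full(shape, -1.0)

    # Step 1: populate lower [0] and upper [1] bounds slice by slice.
    last = len(percentiles) - 1
    for index, values in enumerate(percentile_values):
        found = threshold >= values  # assumes inverse_ordering is False
        upper = min(index + 1, last)  # no +1 slice: reuse the current one
        value_bounds[0][found] = values[found]
        value_bounds[1][found] = percentile_values[upper][found]
        percentile_bounds[0][found] = percentiles[index]
        percentile_bounds[1][found] = percentiles[upper]

    lower_values, upper_values = value_bounds
    lower_pc, upper_pc = percentile_bounds
    below_bottom = np.isnan(lower_values)  # never fell within any band
    above_top = upper_values == lower_values  # matched the top slice

    # Steps 2 and 3: interpolants, then fractional probabilities.
    with np.errstate(divide="ignore", invalid="ignore"):
        interpolants = (threshold - lower_values) / (upper_values - lower_values)
        probabilities = (lower_pc + interpolants * (upper_pc - lower_pc)) / 100.0

    # Steps 4 and 5: fix up points outside the percentile bands.
    probabilities[above_top] = 1.0
    probabilities[below_bottom] = 0.0
    return probabilities


# The worked example above: 0th and 50th percentile slices of 2.0 and 4.0.
percentile_values = np.stack([np.full((3, 3), 2.0), np.full((3, 3), 4.0)])
threshold = np.array([[1.0] * 3, [3.0] * 3, [5.0] * 3])
probabilities = probabilities_from_percentiles(
    threshold, percentile_values, [0, 50]
)
# Rows: below the lowest band (0.0), halfway between 0% and 50% (0.25),
# above the top band (1.0).
```

In this sketch a threshold exactly equal to the top percentile value is also assigned a probability of 1; that is a simplification chosen for the example.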

Parameters
  • threshold_cube (iris.cube.Cube) – A 2-dimensional cube of “threshold” values for which it is desired to obtain probability values from the percentiled reference cube. This cube should have the same x and y dimensions as percentiles_cube.

  • percentiles_cube (iris.cube.Cube) – A 3-dimensional cube, with 1 dimension describing the percentile distributions and 2 dimensions shared with the threshold_cube, typically x and y.

Returns

A 2-dimensional cube of probabilities obtained by interpolating between percentile values.

Return type

iris.cube.Cube

process(threshold_cube)[source]

Slice the percentiles cube over any non-spatial coordinates (realization, time, etc) if present, and call the percentile interpolation method for each resulting cube.
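The slicing pattern can be sketched with NumPy arrays standing in for cubes. The process_all_slices helper is hypothetical and assumes the percentile dimension sits immediately before the two spatial dimensions; fraction_at_or_below is a deliberately simple stand-in for the interpolation method, used only to keep the example self-contained.

```python
import numpy as np


def process_all_slices(threshold, percentile_values, interpolate):
    """Apply interpolate to every (percentile, y, x) block of the input.

    percentile_values has shape (..., p, y, x); any leading dimensions
    (e.g. realization, time) are looped over and restored afterwards.
    """
    values = np.asarray(percentile_values, dtype=float)
    leading_shape = values.shape[:-3]
    blocks = values.reshape((-1,) + values.shape[-3:])
    results = [interpolate(threshold, block) for block in blocks]
    return np.stack(results).reshape(leading_shape + np.shape(threshold))


def fraction_at_or_below(threshold, block):
    """Stand-in: fraction of percentile slices at or below the threshold."""
    return (threshold >= block).mean(axis=0)


threshold = np.array([[1.0, 3.0], [5.0, 3.0]])
realization_0 = np.stack([np.full((2, 2), 2.0), np.full((2, 2), 4.0)])
realization_1 = realization_0 + 1.0
percentile_values = np.stack([realization_0, realization_1])  # (2, 2, 2, 2)
probabilities = process_all_slices(
    threshold, percentile_values, fraction_at_or_below
)
```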

Parameters

threshold_cube (iris.cube.Cube) – A cube of values that effectively behave as thresholds, for which it is desired to obtain probability values from a percentiled reference cube.

Returns

A cube of probabilities obtained by interpolating between percentile values at the “threshold” level.

Return type

iris.cube.Cube