Chris@87: """ Chris@87: Set operations for 1D numeric arrays based on sorting. Chris@87: Chris@87: :Contains: Chris@87: ediff1d, Chris@87: unique, Chris@87: intersect1d, Chris@87: setxor1d, Chris@87: in1d, Chris@87: union1d, Chris@87: setdiff1d Chris@87: Chris@87: :Notes: Chris@87: Chris@87: For floating point arrays, inaccurate results may appear due to usual round-off Chris@87: and floating point comparison issues. Chris@87: Chris@87: Speed could be gained in some operations by an implementation of Chris@87: sort(), that can provide directly the permutation vectors, avoiding Chris@87: thus calls to argsort(). Chris@87: Chris@87: To do: Optionally return indices analogously to unique for all functions. Chris@87: Chris@87: :Author: Robert Cimrman Chris@87: Chris@87: """ Chris@87: from __future__ import division, absolute_import, print_function Chris@87: Chris@87: import numpy as np Chris@87: Chris@87: Chris@87: __all__ = [ Chris@87: 'ediff1d', 'intersect1d', 'setxor1d', 'union1d', 'setdiff1d', 'unique', Chris@87: 'in1d' Chris@87: ] Chris@87: Chris@87: Chris@87: def ediff1d(ary, to_end=None, to_begin=None): Chris@87: """ Chris@87: The differences between consecutive elements of an array. Chris@87: Chris@87: Parameters Chris@87: ---------- Chris@87: ary : array_like Chris@87: If necessary, will be flattened before the differences are taken. Chris@87: to_end : array_like, optional Chris@87: Number(s) to append at the end of the returned differences. Chris@87: to_begin : array_like, optional Chris@87: Number(s) to prepend at the beginning of the returned differences. Chris@87: Chris@87: Returns Chris@87: ------- Chris@87: ediff1d : ndarray Chris@87: The differences. Loosely, this is ``ary.flat[1:] - ary.flat[:-1]``. Chris@87: Chris@87: See Also Chris@87: -------- Chris@87: diff, gradient Chris@87: Chris@87: Notes Chris@87: ----- Chris@87: When applied to masked arrays, this function drops the mask information Chris@87: if the `to_begin` and/or `to_end` parameters are used. Chris@87: Chris@87: Examples Chris@87: -------- Chris@87: >>> x = np.array([1, 2, 4, 7, 0]) Chris@87: >>> np.ediff1d(x) Chris@87: array([ 1, 2, 3, -7]) Chris@87: Chris@87: >>> np.ediff1d(x, to_begin=-99, to_end=np.array([88, 99])) Chris@87: array([-99, 1, 2, 3, -7, 88, 99]) Chris@87: Chris@87: The returned array is always 1D. Chris@87: Chris@87: >>> y = [[1, 2, 4], [1, 6, 24]] Chris@87: >>> np.ediff1d(y) Chris@87: array([ 1, 2, -3, 5, 18]) Chris@87: Chris@87: """ Chris@87: ary = np.asanyarray(ary).flat Chris@87: ed = ary[1:] - ary[:-1] Chris@87: arrays = [ed] Chris@87: if to_begin is not None: Chris@87: arrays.insert(0, to_begin) Chris@87: if to_end is not None: Chris@87: arrays.append(to_end) Chris@87: Chris@87: if len(arrays) != 1: Chris@87: # We'll save ourselves a copy of a potentially large array in Chris@87: # the common case where neither to_begin or to_end was given. Chris@87: ed = np.hstack(arrays) Chris@87: Chris@87: return ed Chris@87: Chris@87: def unique(ar, return_index=False, return_inverse=False, return_counts=False): Chris@87: """ Chris@87: Find the unique elements of an array. Chris@87: Chris@87: Returns the sorted unique elements of an array. There are two optional Chris@87: outputs in addition to the unique elements: the indices of the input array Chris@87: that give the unique values, and the indices of the unique array that Chris@87: reconstruct the input array. Chris@87: Chris@87: Parameters Chris@87: ---------- Chris@87: ar : array_like Chris@87: Input array. This will be flattened if it is not already 1-D. Chris@87: return_index : bool, optional Chris@87: If True, also return the indices of `ar` that result in the unique Chris@87: array. Chris@87: return_inverse : bool, optional Chris@87: If True, also return the indices of the unique array that can be used Chris@87: to reconstruct `ar`. Chris@87: return_counts : bool, optional Chris@87: .. versionadded:: 1.9.0 Chris@87: If True, also return the number of times each unique value comes up Chris@87: in `ar`. Chris@87: Chris@87: Returns Chris@87: ------- Chris@87: unique : ndarray Chris@87: The sorted unique values. Chris@87: unique_indices : ndarray, optional Chris@87: The indices of the first occurrences of the unique values in the Chris@87: (flattened) original array. Only provided if `return_index` is True. Chris@87: unique_inverse : ndarray, optional Chris@87: The indices to reconstruct the (flattened) original array from the Chris@87: unique array. Only provided if `return_inverse` is True. Chris@87: unique_counts : ndarray, optional Chris@87: .. versionadded:: 1.9.0 Chris@87: The number of times each of the unique values comes up in the Chris@87: original array. Only provided if `return_counts` is True. Chris@87: Chris@87: See Also Chris@87: -------- Chris@87: numpy.lib.arraysetops : Module with a number of other functions for Chris@87: performing set operations on arrays. Chris@87: Chris@87: Examples Chris@87: -------- Chris@87: >>> np.unique([1, 1, 2, 2, 3, 3]) Chris@87: array([1, 2, 3]) Chris@87: >>> a = np.array([[1, 1], [2, 3]]) Chris@87: >>> np.unique(a) Chris@87: array([1, 2, 3]) Chris@87: Chris@87: Return the indices of the original array that give the unique values: Chris@87: Chris@87: >>> a = np.array(['a', 'b', 'b', 'c', 'a']) Chris@87: >>> u, indices = np.unique(a, return_index=True) Chris@87: >>> u Chris@87: array(['a', 'b', 'c'], Chris@87: dtype='|S1') Chris@87: >>> indices Chris@87: array([0, 1, 3]) Chris@87: >>> a[indices] Chris@87: array(['a', 'b', 'c'], Chris@87: dtype='|S1') Chris@87: Chris@87: Reconstruct the input array from the unique values: Chris@87: Chris@87: >>> a = np.array([1, 2, 6, 4, 2, 3, 2]) Chris@87: >>> u, indices = np.unique(a, return_inverse=True) Chris@87: >>> u Chris@87: array([1, 2, 3, 4, 6]) Chris@87: >>> indices Chris@87: array([0, 1, 4, 3, 1, 2, 1]) Chris@87: >>> u[indices] Chris@87: array([1, 2, 6, 4, 2, 3, 2]) Chris@87: Chris@87: """ Chris@87: ar = np.asanyarray(ar).flatten() Chris@87: Chris@87: optional_indices = return_index or return_inverse Chris@87: optional_returns = optional_indices or return_counts Chris@87: Chris@87: if ar.size == 0: Chris@87: if not optional_returns: Chris@87: ret = ar Chris@87: else: Chris@87: ret = (ar,) Chris@87: if return_index: Chris@87: ret += (np.empty(0, np.bool),) Chris@87: if return_inverse: Chris@87: ret += (np.empty(0, np.bool),) Chris@87: if return_counts: Chris@87: ret += (np.empty(0, np.intp),) Chris@87: return ret Chris@87: Chris@87: if optional_indices: Chris@87: perm = ar.argsort(kind='mergesort' if return_index else 'quicksort') Chris@87: aux = ar[perm] Chris@87: else: Chris@87: ar.sort() Chris@87: aux = ar Chris@87: flag = np.concatenate(([True], aux[1:] != aux[:-1])) Chris@87: Chris@87: if not optional_returns: Chris@87: ret = aux[flag] Chris@87: else: Chris@87: ret = (aux[flag],) Chris@87: if return_index: Chris@87: ret += (perm[flag],) Chris@87: if return_inverse: Chris@87: iflag = np.cumsum(flag) - 1 Chris@87: iperm = perm.argsort() Chris@87: ret += (np.take(iflag, iperm),) Chris@87: if return_counts: Chris@87: idx = np.concatenate(np.nonzero(flag) + ([ar.size],)) Chris@87: ret += (np.diff(idx),) Chris@87: return ret Chris@87: Chris@87: def intersect1d(ar1, ar2, assume_unique=False): Chris@87: """ Chris@87: Find the intersection of two arrays. Chris@87: Chris@87: Return the sorted, unique values that are in both of the input arrays. Chris@87: Chris@87: Parameters Chris@87: ---------- Chris@87: ar1, ar2 : array_like Chris@87: Input arrays. Chris@87: assume_unique : bool Chris@87: If True, the input arrays are both assumed to be unique, which Chris@87: can speed up the calculation. Default is False. Chris@87: Chris@87: Returns Chris@87: ------- Chris@87: intersect1d : ndarray Chris@87: Sorted 1D array of common and unique elements. Chris@87: Chris@87: See Also Chris@87: -------- Chris@87: numpy.lib.arraysetops : Module with a number of other functions for Chris@87: performing set operations on arrays. Chris@87: Chris@87: Examples Chris@87: -------- Chris@87: >>> np.intersect1d([1, 3, 4, 3], [3, 1, 2, 1]) Chris@87: array([1, 3]) Chris@87: Chris@87: """ Chris@87: if not assume_unique: Chris@87: # Might be faster than unique( intersect1d( ar1, ar2 ) )? Chris@87: ar1 = unique(ar1) Chris@87: ar2 = unique(ar2) Chris@87: aux = np.concatenate((ar1, ar2)) Chris@87: aux.sort() Chris@87: return aux[:-1][aux[1:] == aux[:-1]] Chris@87: Chris@87: def setxor1d(ar1, ar2, assume_unique=False): Chris@87: """ Chris@87: Find the set exclusive-or of two arrays. Chris@87: Chris@87: Return the sorted, unique values that are in only one (not both) of the Chris@87: input arrays. Chris@87: Chris@87: Parameters Chris@87: ---------- Chris@87: ar1, ar2 : array_like Chris@87: Input arrays. Chris@87: assume_unique : bool Chris@87: If True, the input arrays are both assumed to be unique, which Chris@87: can speed up the calculation. Default is False. Chris@87: Chris@87: Returns Chris@87: ------- Chris@87: setxor1d : ndarray Chris@87: Sorted 1D array of unique values that are in only one of the input Chris@87: arrays. Chris@87: Chris@87: Examples Chris@87: -------- Chris@87: >>> a = np.array([1, 2, 3, 2, 4]) Chris@87: >>> b = np.array([2, 3, 5, 7, 5]) Chris@87: >>> np.setxor1d(a,b) Chris@87: array([1, 4, 5, 7]) Chris@87: Chris@87: """ Chris@87: if not assume_unique: Chris@87: ar1 = unique(ar1) Chris@87: ar2 = unique(ar2) Chris@87: Chris@87: aux = np.concatenate((ar1, ar2)) Chris@87: if aux.size == 0: Chris@87: return aux Chris@87: Chris@87: aux.sort() Chris@87: # flag = ediff1d( aux, to_end = 1, to_begin = 1 ) == 0 Chris@87: flag = np.concatenate(([True], aux[1:] != aux[:-1], [True])) Chris@87: # flag2 = ediff1d( flag ) == 0 Chris@87: flag2 = flag[1:] == flag[:-1] Chris@87: return aux[flag2] Chris@87: Chris@87: def in1d(ar1, ar2, assume_unique=False, invert=False): Chris@87: """ Chris@87: Test whether each element of a 1-D array is also present in a second array. Chris@87: Chris@87: Returns a boolean array the same length as `ar1` that is True Chris@87: where an element of `ar1` is in `ar2` and False otherwise. Chris@87: Chris@87: Parameters Chris@87: ---------- Chris@87: ar1 : (M,) array_like Chris@87: Input array. Chris@87: ar2 : array_like Chris@87: The values against which to test each value of `ar1`. Chris@87: assume_unique : bool, optional Chris@87: If True, the input arrays are both assumed to be unique, which Chris@87: can speed up the calculation. Default is False. Chris@87: invert : bool, optional Chris@87: If True, the values in the returned array are inverted (that is, Chris@87: False where an element of `ar1` is in `ar2` and True otherwise). Chris@87: Default is False. ``np.in1d(a, b, invert=True)`` is equivalent Chris@87: to (but is faster than) ``np.invert(in1d(a, b))``. Chris@87: Chris@87: .. versionadded:: 1.8.0 Chris@87: Chris@87: Returns Chris@87: ------- Chris@87: in1d : (M,) ndarray, bool Chris@87: The values `ar1[in1d]` are in `ar2`. Chris@87: Chris@87: See Also Chris@87: -------- Chris@87: numpy.lib.arraysetops : Module with a number of other functions for Chris@87: performing set operations on arrays. Chris@87: Chris@87: Notes Chris@87: ----- Chris@87: `in1d` can be considered as an element-wise function version of the Chris@87: python keyword `in`, for 1-D sequences. ``in1d(a, b)`` is roughly Chris@87: equivalent to ``np.array([item in b for item in a])``. Chris@87: Chris@87: .. versionadded:: 1.4.0 Chris@87: Chris@87: Examples Chris@87: -------- Chris@87: >>> test = np.array([0, 1, 2, 5, 0]) Chris@87: >>> states = [0, 2] Chris@87: >>> mask = np.in1d(test, states) Chris@87: >>> mask Chris@87: array([ True, False, True, False, True], dtype=bool) Chris@87: >>> test[mask] Chris@87: array([0, 2, 0]) Chris@87: >>> mask = np.in1d(test, states, invert=True) Chris@87: >>> mask Chris@87: array([False, True, False, True, False], dtype=bool) Chris@87: >>> test[mask] Chris@87: array([1, 5]) Chris@87: """ Chris@87: # Ravel both arrays, behavior for the first array could be different Chris@87: ar1 = np.asarray(ar1).ravel() Chris@87: ar2 = np.asarray(ar2).ravel() Chris@87: Chris@87: # This code is significantly faster when the condition is satisfied. Chris@87: if len(ar2) < 10 * len(ar1) ** 0.145: Chris@87: if invert: Chris@87: mask = np.ones(len(ar1), dtype=np.bool) Chris@87: for a in ar2: Chris@87: mask &= (ar1 != a) Chris@87: else: Chris@87: mask = np.zeros(len(ar1), dtype=np.bool) Chris@87: for a in ar2: Chris@87: mask |= (ar1 == a) Chris@87: return mask Chris@87: Chris@87: # Otherwise use sorting Chris@87: if not assume_unique: Chris@87: ar1, rev_idx = np.unique(ar1, return_inverse=True) Chris@87: ar2 = np.unique(ar2) Chris@87: Chris@87: ar = np.concatenate((ar1, ar2)) Chris@87: # We need this to be a stable sort, so always use 'mergesort' Chris@87: # here. The values from the first array should always come before Chris@87: # the values from the second array. Chris@87: order = ar.argsort(kind='mergesort') Chris@87: sar = ar[order] Chris@87: if invert: Chris@87: bool_ar = (sar[1:] != sar[:-1]) Chris@87: else: Chris@87: bool_ar = (sar[1:] == sar[:-1]) Chris@87: flag = np.concatenate((bool_ar, [invert])) Chris@87: indx = order.argsort(kind='mergesort')[:len(ar1)] Chris@87: Chris@87: if assume_unique: Chris@87: return flag[indx] Chris@87: else: Chris@87: return flag[indx][rev_idx] Chris@87: Chris@87: def union1d(ar1, ar2): Chris@87: """ Chris@87: Find the union of two arrays. Chris@87: Chris@87: Return the unique, sorted array of values that are in either of the two Chris@87: input arrays. Chris@87: Chris@87: Parameters Chris@87: ---------- Chris@87: ar1, ar2 : array_like Chris@87: Input arrays. They are flattened if they are not already 1D. Chris@87: Chris@87: Returns Chris@87: ------- Chris@87: union1d : ndarray Chris@87: Unique, sorted union of the input arrays. Chris@87: Chris@87: See Also Chris@87: -------- Chris@87: numpy.lib.arraysetops : Module with a number of other functions for Chris@87: performing set operations on arrays. Chris@87: Chris@87: Examples Chris@87: -------- Chris@87: >>> np.union1d([-1, 0, 1], [-2, 0, 2]) Chris@87: array([-2, -1, 0, 1, 2]) Chris@87: Chris@87: """ Chris@87: return unique(np.concatenate((ar1, ar2))) Chris@87: Chris@87: def setdiff1d(ar1, ar2, assume_unique=False): Chris@87: """ Chris@87: Find the set difference of two arrays. Chris@87: Chris@87: Return the sorted, unique values in `ar1` that are not in `ar2`. Chris@87: Chris@87: Parameters Chris@87: ---------- Chris@87: ar1 : array_like Chris@87: Input array. Chris@87: ar2 : array_like Chris@87: Input comparison array. Chris@87: assume_unique : bool Chris@87: If True, the input arrays are both assumed to be unique, which Chris@87: can speed up the calculation. Default is False. Chris@87: Chris@87: Returns Chris@87: ------- Chris@87: setdiff1d : ndarray Chris@87: Sorted 1D array of values in `ar1` that are not in `ar2`. Chris@87: Chris@87: See Also Chris@87: -------- Chris@87: numpy.lib.arraysetops : Module with a number of other functions for Chris@87: performing set operations on arrays. Chris@87: Chris@87: Examples Chris@87: -------- Chris@87: >>> a = np.array([1, 2, 3, 2, 4, 1]) Chris@87: >>> b = np.array([3, 4, 5, 6]) Chris@87: >>> np.setdiff1d(a, b) Chris@87: array([1, 2]) Chris@87: Chris@87: """ Chris@87: if not assume_unique: Chris@87: ar1 = unique(ar1) Chris@87: ar2 = unique(ar2) Chris@87: aux = in1d(ar1, ar2, assume_unique=True) Chris@87: if aux.size == 0: Chris@87: return aux Chris@87: else: Chris@87: return np.asarray(ar1)[aux == 0]