pandahandler.indexes
Tools for working with pandas indexes.
Classes
Index | A functional wrapper around pandas indexes.
Functions
is_unnamed_range_index(index) | Assert that the index is trivial, i.e. equal to the default RangeIndex.
unset(df, require_names=True) | Safely unset the index.
Module Contents
- pandahandler.indexes.is_unnamed_range_index(index: pandas.Index) -> bool
Assert that the index is trivial, i.e. equal to the default RangeIndex.
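As a rough illustration of what "trivial" means here, the check can be approximated in plain pandas. This is a minimal sketch and not the library's implementation: the helper name `looks_like_default_range_index` is hypothetical, and it assumes "default" means an unnamed RangeIndex that starts at 0 with a step of 1.

```python
import pandas as pd


def looks_like_default_range_index(index: pd.Index) -> bool:
    """Hypothetical helper: approximate the 'trivial index' check."""
    return (
        isinstance(index, pd.RangeIndex)  # not a materialized integer index
        and index.name is None            # unnamed
        and index.start == 0              # assumption: default start
        and index.step == 1               # assumption: default step
    )


default_index = pd.DataFrame({"a": [1, 2]}).index
named_index = pd.RangeIndex(3, name="x")
shifted_index = pd.RangeIndex(1, 4)

print(looks_like_default_range_index(default_index))
print(looks_like_default_range_index(named_index))
print(looks_like_default_range_index(shifted_index))
```

A freshly constructed data frame passes the check, while a named or shifted RangeIndex does not.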
- pandahandler.indexes.unset(df: pandas.DataFrame, require_names: bool = True) -> pandas.DataFrame
Safely unset the index.
Details:
This is more an “unset” than a “reset” in the sense that it makes the index as trivial as possible. If we could totally remove the index of the data frame, that’s what this function would do, but the unnamed range index is the next closest thing.
This is “safe” in the sense that it will not drop any existing data encoded in the index. (We assume that an unnamed range index does not count as “data” in this context.) Existing index columns get converted to regular columns in the data frame.
- Parameters:
df – The data frame to unset the index on.
require_names – Whether to raise an error if the existing index is unnamed.
  - This setting is ignored whenever the index is an unnamed RangeIndex.
  - With require_names=False, an unnamed index is typically converted to a new column called “index”.
  - Using require_names=True (default) forces users to declare how to handle an unnamed index. Either:
    - Call reset_index(drop=True) directly to drop the index instead of calling this function.
    - Set the name(s) of the index prior to calling this function.
- Raises:
ValueError – If the data frame column names overlap with the index names.
ValueError – If all of the following apply:
  - require_names is True (the default)
  - the index is unnamed
  - the index is not a trivial RangeIndex
- Returns:
A copy of the input data frame with the index columns reset as regular columns. The new index is a simple RangeIndex.
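The behaviour described above can be approximated with plain pandas. The following is a minimal sketch under stated assumptions, not the library's code: `unset_sketch` is a hypothetical name, the triviality test assumes a RangeIndex starting at 0 with step 1, and the error messages are invented for illustration.

```python
import pandas as pd


def unset_sketch(df: pd.DataFrame, require_names: bool = True) -> pd.DataFrame:
    """Hypothetical approximation of the documented `unset` behaviour."""
    index = df.index
    is_trivial = (
        isinstance(index, pd.RangeIndex)
        and index.name is None
        and index.start == 0
        and index.step == 1
    )
    if is_trivial:
        # Nothing to move into columns; require_names is ignored here.
        return df.copy()

    names = list(index.names)
    if require_names and any(name is None for name in names):
        raise ValueError("index is unnamed; name it or call reset_index(drop=True)")

    overlap = {name for name in names if name is not None} & set(df.columns)
    if overlap:
        raise ValueError(f"column names overlap with index names: {sorted(overlap)}")

    # reset_index moves every index level into a regular column and
    # leaves a fresh RangeIndex behind.
    return df.reset_index()


named = pd.DataFrame({"b": [4, 5, 6]}, index=pd.Index([1, 3, 2], name="a"))
unnamed = pd.DataFrame({"b": [4, 5]}, index=[10, 20])

flat = unset_sketch(named)
flat_unnamed = unset_sketch(unnamed, require_names=False)
```

In this sketch, `flat` carries the former index as column `a`, and `flat_unnamed` gains the column that pandas names `index` by default. Calling `unset_sketch(unnamed)` with the default `require_names=True` raises `ValueError`.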
- class pandahandler.indexes.Index
A functional wrapper around pandas indexes.
An instance of this class can be used to simplify the following types of operations:
Coercing an input data frame to have a particular index.
Enforcing or ensuring various index properties such as monotonicity or uniqueness.
Unsetting and resetting indexes cleanly before and after operations that require columnar access to index columns.
This class applies to both pandas.Index and pandas.MultiIndex objects, which reduces the amount of special-casing needed based on the number of columns in the index.
Example
    import pandas as pd
    from pandahandler.indexes import Index

    # Set an index with sorting
    df = pd.DataFrame({"a": [1, 3, 2], "b": [4, 5, 6]})
    index = Index(names=["a"], sort=True)
    df = index(df)
    assert df.index.tolist() == [1, 2, 3]

    # Switch over to another column as the index
- allow_null: bool
Whether to allow null values in the index. Applicable only for single-column indexes, since pandas does not support null values in a MultiIndex.
- require_unique: bool
Whether to require the index to be unique. E.g. if True, raise an error if the index is not unique.
- validate(index: pandas.Index) -> None
Assert that the provided index complies with this index specification.
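To make the kind of checks involved concrete, here is a minimal pandas-only sketch. It is an assumption about what such validation might look like, not the library's implementation: `validate_sketch` and its parameters are hypothetical, and it raises plain `ValueError` rather than the library's own exception types.

```python
import pandas as pd


def validate_sketch(
    index: pd.Index,
    names: list,
    allow_null: bool = False,
    require_unique: bool = True,
) -> None:
    """Hypothetical check of names, nulls, and uniqueness on an index."""
    if list(index.names) != list(names):
        raise ValueError(f"expected index names {names}, got {list(index.names)}")

    # to_frame works for both Index and MultiIndex, which avoids special-casing.
    has_nulls = bool(index.to_frame(index=False).isna().to_numpy().any())
    if has_nulls and not allow_null:
        raise ValueError("index contains null values")

    if require_unique and not index.is_unique:
        raise ValueError("index contains duplicate values")


# Passes silently: names match, no nulls, no duplicates.
validate_sketch(pd.Index([1, 2, 3], name="a"), names=["a"])

# Also works for a two-level index.
multi = pd.MultiIndex.from_tuples([(1, "x"), (1, "y")], names=["a", "b"])
validate_sketch(multi, names=["a", "b"])
```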
- __call__(df: pandas.DataFrame, coerce_dtypes: bool = False, filter_nulls: bool = False) -> pandas.DataFrame
Set the index on the data frame.
Any named columns of the current df index that are not part of the new index will be converted to new data frame columns.
- Parameters:
df – The data frame to set the index on.
coerce_dtypes – Whether to coerce the types of the index columns to the specified dtypes.
filter_nulls – Whether to filter out rows with null values in the index columns before setting the index. This is useful when allow_null is False and you want to keep the rows with non-null values.
- Raises:
ValueError – If coerce_dtypes is True but dtypes is not specified.
DuplicateValueError – If the index has duplicate values and require_unique is True.
NullValueError – If the index has null values and allow_null is False.
DTypeError – If the index dtypes do not match the specified dtypes and coerce_dtypes is False.
ValueError – If the index names do not match the specified names.
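The sequence described for `__call__` (move the old named index levels into columns, optionally filter nulls, set the new index, then check it) can be sketched with plain pandas. This is an illustrative approximation under stated assumptions: `set_index_sketch` and its parameters are hypothetical, dtype coercion is omitted, an unnamed incoming index is assumed to carry no data, and plain `ValueError` stands in for the library's exception types.

```python
import pandas as pd


def set_index_sketch(
    df: pd.DataFrame,
    names: list,
    *,
    sort: bool = False,
    filter_nulls: bool = False,
) -> pd.DataFrame:
    """Hypothetical approximation of setting a validated index."""
    # Assumption: only named index levels hold data worth keeping as columns.
    has_named_levels = any(name is not None for name in df.index.names)
    out = df.reset_index() if has_named_levels else df.copy()

    if filter_nulls:
        # Keep only rows whose future index columns are all non-null.
        out = out.dropna(subset=names)
    if out[names].isna().to_numpy().any():
        raise ValueError("index columns contain null values")

    out = out.set_index(names)
    if not out.index.is_unique:
        raise ValueError("index contains duplicate values")
    return out.sort_index() if sort else out


df = pd.DataFrame({"a": [1, 3, 2], "b": [4, 5, 6]})
by_a = set_index_sketch(df, ["a"], sort=True)

# Re-indexing on "b" moves the old index "a" back into a regular column.
by_b = set_index_sketch(by_a, ["b"])

with_nulls = pd.DataFrame({"a": [1.0, None], "b": [1, 2]})
filtered = set_index_sketch(with_nulls, ["a"], filter_nulls=True)
```

In this sketch, re-indexing never discards a named index level, which mirrors the documented guarantee that such levels are converted to data frame columns.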