pandahandler.schema

Tools to learn or coerce the schema of an open-ended input data frame.

Classes

Schema

Using and applying a data frame's schema information.

Functions

categorize_non_numerics(→ pandas.DataFrame)

Categorize columns that are neither categorical nor numeric.

Module Contents

pandahandler.schema.categorize_non_numerics(df: pandas.DataFrame) → pandas.DataFrame

Categorize columns that are neither categorical nor numeric.
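The page does not say exactly how "neither categorical nor numeric" is decided. The following is a minimal sketch under an assumed rule: a column is converted to the pandas category dtype when `pandas.api.types.is_numeric_dtype` is false and its dtype is not already a `CategoricalDtype`. The function name `categorize_non_numerics_sketch` is hypothetical and is not the library's implementation.

```python
import pandas as pd


def categorize_non_numerics_sketch(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical re-implementation, not the library's code.

    Assumed rule: any column whose dtype is neither categorical nor numeric
    is converted to the pandas ``category`` dtype; other columns are untouched.
    """
    out = df.copy()  # leave the caller's frame unmodified
    for col in out.columns:
        dtype = out[col].dtype
        already_categorical = isinstance(dtype, pd.CategoricalDtype)
        if not already_categorical and not pd.api.types.is_numeric_dtype(dtype):
            out[col] = out[col].astype("category")
    return out


df = pd.DataFrame({"x": [1.0, 2.0, 3.0], "city": ["a", "b", "a"]})
result = categorize_non_numerics_sketch(df)
print(result.dtypes)  # x stays float64, city becomes category
```

Whether the real function treats datetimes, booleans, or mixed-type columns the same way is not stated here.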

class pandahandler.schema.Schema

Using and applying a data frame’s schema information.

The primary intended use case is in open-world data exploration, where the schema of the input data is not known in advance. If you know the schema in advance, consider using a more declarative approach such as pandera.

Note that the categorical encodings are not traditionally considered part of a “schema”. They matter in machine learning applications, however, because new data must be encoded consistently with the training data before it can be scored.
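To see why the encodings matter, here is a small pandas-only illustration (it does not use pandahandler). If categories are re-inferred from new data, the same label can receive a different integer code than it had at training time. Reusing the training categories keeps the codes aligned, and labels never seen in training map to code -1 (NaN).

```python
import pandas as pd

# Training data: categories are inferred (and sorted) from the training values.
train = pd.Series(["red", "green", "blue"], dtype="category")
training_categories = train.cat.categories  # blue, green, red

# New data at scoring time: a subset of the labels plus one unseen label.
new_values = ["red", "red", "purple"]

# Naive encoding: categories re-inferred from the new data, so codes shift.
naive = pd.Series(new_values, dtype="category")

# Consistent encoding: reuse the training categories.
consistent = pd.Series(pd.Categorical(new_values, categories=training_categories))

print(naive.cat.codes.tolist())       # [1, 1, 0]
print(consistent.cat.codes.tolist())  # [2, 2, -1]
```

Here "red" is code 2 during training but code 1 under the naive encoding, which would silently corrupt any model input built from the codes.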

types_: pandas.Series

The data types of the columns.

categorical_encodings: dict[Hashable, pandas.Index]

The categories of the categorical columns. The keys are the names of the categorical columns, and the values are Index objects expressing the integer-to-category mapping that defines each column’s categorical encoding.
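Assuming each stored Index has the same meaning as a pandas `Series.cat.categories` Index (the page states only the type), the position of a label in the Index is its integer code. The pandas-only sketch below shows both directions of that mapping.

```python
import pandas as pd

color = pd.Series(["red", "blue", "red", "green"], dtype="category")

# The categories Index: position i holds the label whose integer code is i.
categories = color.cat.categories  # blue, green, red
codes = color.cat.codes            # 2, 0, 2, 1

# Code -> label: take labels from the Index by position.
decoded = categories.take(codes.to_numpy())

# Label -> code: look up a label's position in the Index.
red_code = categories.get_loc("red")

print(list(decoded), red_code)  # ['red', 'blue', 'red', 'green'] 2
```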

__post_init__() → None

Run consistency checks.
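The page does not list which consistency checks run. As an illustration of the dataclass `__post_init__` pattern only, here is a sketch with one assumed check: every column that has a categorical encoding must appear in `types_` with a categorical dtype. `SchemaSketch` is a hypothetical stand-in, not the library's class.

```python
from collections.abc import Hashable
from dataclasses import dataclass

import pandas as pd


@dataclass
class SchemaSketch:
    """Hypothetical stand-in for Schema; not the library's class."""

    types_: pd.Series
    categorical_encodings: dict[Hashable, pd.Index]

    def __post_init__(self) -> None:
        # Assumed check: each encoded column must be a known, categorical column.
        for col in self.categorical_encodings:
            if col not in self.types_.index:
                raise ValueError(f"{col!r} has an encoding but no type")
            if not isinstance(self.types_[col], pd.CategoricalDtype):
                raise ValueError(f"{col!r} has an encoding but is not categorical")


df = pd.DataFrame({"n": [1, 2], "c": pd.Categorical(["x", "y"])})

# Passes the checks: "c" is categorical.
ok = SchemaSketch(df.dtypes, {"c": df["c"].cat.categories})

# Fails the checks: "n" is numeric, so it should not have an encoding.
message = None
try:
    SchemaSketch(df.dtypes, {"n": pd.Index([1, 2])})
except ValueError as err:
    message = str(err)
print(message)  # 'n' has an encoding but is not categorical
```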

classmethod from_df(df: pandas.DataFrame) → typing_extensions.Self

Create a Schema object from a data frame.
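One plausible way such a constructor could work is sketched below, assuming `types_` is taken from `DataFrame.dtypes` and the encodings from each categorical column's `cat.categories`. `FromDfSketch` is hypothetical. It annotates the return type with its own class name, using `from __future__ import annotations`, instead of `typing_extensions.Self` so that it needs only the standard library and pandas.

```python
from __future__ import annotations

from collections.abc import Hashable
from dataclasses import dataclass

import pandas as pd


@dataclass
class FromDfSketch:
    """Hypothetical stand-in for Schema; not the library's class."""

    types_: pd.Series
    categorical_encodings: dict[Hashable, pd.Index]

    @classmethod
    def from_df(cls, df: pd.DataFrame) -> FromDfSketch:
        # Record each column's dtype, in column order.
        types_ = df.dtypes
        # Record the category Index of every categorical column.
        encodings = {
            col: df[col].cat.categories
            for col, dtype in types_.items()
            if isinstance(dtype, pd.CategoricalDtype)
        }
        return cls(types_=types_, categorical_encodings=encodings)


df = pd.DataFrame({"age": [34, 51], "color": pd.Categorical(["red", "blue"])})
schema = FromDfSketch.from_df(df)
print(list(schema.types_.index), list(schema.categorical_encodings))
# ['age', 'color'] ['color']
```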

property categoricals: list[Hashable]

Return the names of categorical columns.

property numerics: list[Hashable]

Return the names of numeric columns.

property others: list[Hashable]

Return the names of columns that are neither categorical nor numeric.
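The three properties together partition the columns. Below is a hedged sketch of that partition computed from a dtypes Series. The helper `partition_columns` is hypothetical, and the classification rule (a `CategoricalDtype` check first, then `pandas.api.types.is_numeric_dtype`) is an assumption rather than the library's documented behavior.

```python
from collections.abc import Hashable

import pandas as pd


def partition_columns(types_: pd.Series) -> dict[str, list[Hashable]]:
    """Hypothetical helper mirroring the categoricals/numerics/others split."""
    groups: dict[str, list[Hashable]] = {
        "categoricals": [],
        "numerics": [],
        "others": [],
    }
    for col, dtype in types_.items():
        # Check categorical first so it takes precedence over numeric.
        if isinstance(dtype, pd.CategoricalDtype):
            groups["categoricals"].append(col)
        elif pd.api.types.is_numeric_dtype(dtype):
            groups["numerics"].append(col)
        else:
            groups["others"].append(col)
    return groups


df = pd.DataFrame(
    {
        "age": [34, 51],
        "color": pd.Categorical(["red", "blue"]),
        "when": pd.to_datetime(["2024-01-01", "2024-06-30"]),
    }
)
groups = partition_columns(df.dtypes)
print(groups)
# {'categoricals': ['color'], 'numerics': ['age'], 'others': ['when']}
```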

__call__(df: pandas.DataFrame) → pandas.DataFrame

Coerce the data frame to the schema.
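The page does not describe how `__call__` handles missing or extra columns or labels unseen during training. The sketch below makes explicit assumptions instead. It keeps only the schema's columns, in schema order. It casts the remaining columns to their recorded dtypes. For categorical columns it applies the stored categories, so unseen labels become NaN. `coerce_sketch` is a hypothetical function, not the library's method.

```python
from collections.abc import Hashable

import pandas as pd


def coerce_sketch(
    df: pd.DataFrame,
    types_: pd.Series,
    categorical_encodings: dict[Hashable, pd.Index],
) -> pd.DataFrame:
    """Hypothetical coercion step; not the library's __call__."""
    # Assumption: keep only the schema's columns, in the schema's order.
    out = df[list(types_.index)].copy()
    for col, dtype in types_.items():
        if col in categorical_encodings:
            # Reuse the stored categories; labels not in them become NaN.
            cat_dtype = pd.CategoricalDtype(categories=categorical_encodings[col])
            out[col] = out[col].astype(cat_dtype)
        else:
            out[col] = out[col].astype(dtype)
    return out


train = pd.DataFrame({"age": [34, 51], "color": pd.Categorical(["red", "blue"])})
types_ = train.dtypes
encodings = {"color": train["color"].cat.categories}

new = pd.DataFrame({"color": ["red", "purple"], "age": [40.0, 29.0]})
coerced = coerce_sketch(new, types_, encodings)
print(coerced.dtypes)
print(coerced["color"].cat.codes.tolist())  # [1, -1]
```

Applying the stored categories, rather than re-inferring them, keeps "red" at the same code it had in the training frame.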