categoricals
Sklearn-compatible transformer to encode non-numeric columns as pandas categorical features.
Classes
SeriesEncoder | Categorical encoding of the values for a pandas Series.
Encoder | Sklearn-compatible transformer to encode non-numeric columns as pandas categorical features.
Functions
infer_categoricals | Identify columns that should be coded as categorical.
Module Contents
- class categoricals.SeriesEncoder
Categorical encoding of the values for a pandas Series.
- classmethod fit(series: pandas.Series) → SeriesEncoder
Learn categorical codes of a data series.
- Parameters:
series – The pandas Series to fit.
- Returns:
A SeriesEncoder object with the unique categories and null presence.
- __call__(series: pandas.Series) → pandas.Series
Encode a series as categorical.
This encoder maintains a distinction between null values and “never-before-seen” values:
- Pandas maps null values to -1.
- Pandas also maps any never-before-seen values to -1, which loses information. Instead, we identify such values and append them to the list of categories, so that they are encoded as new categories.
This distinction can matter for certain ML algorithms, such as XGBoost. See also https://github.com/microsoft/LightGBM/issues/6908.
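The collapse described above can be reproduced with plain pandas. The following is a minimal sketch, not this module's implementation: the helper name `encode_keeping_unseen` and the sample values are hypothetical. It shows a fixed-category `pd.Categorical` assigning -1 to both nulls and unseen values, and one possible way to append unseen values to the category list so that only nulls keep -1.

```python
import pandas as pd


def encode_keeping_unseen(series: pd.Series, known: list) -> pd.Series:
    """Hypothetical helper: encode with `known` categories, appending unseen values."""
    observed = series.dropna().unique()
    # Values present in the data but absent from the learned categories.
    unseen = [v for v in observed if v not in known]
    categories = list(known) + unseen
    return series.astype(pd.CategoricalDtype(categories=categories))


known = ["a", "b"]                     # categories learned at fit time
scoring = pd.Series(["a", None, "c"])  # "c" was never seen during fitting

# Plain pandas: the null and the unseen value both receive code -1.
plain_codes = pd.Categorical(scoring, categories=known).codes  # 0, -1, -1

# With unseen values appended, only the null keeps code -1.
kept_codes = encode_keeping_unseen(scoring, known).cat.codes   # 0, -1, 2
```

In this sketch the unseen value "c" receives the next free code (2), so a downstream model can in principle tell "missing" apart from "new category".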
- categoricals.infer_categoricals(df: pandas.DataFrame) → list[str]
Identify columns that should be coded as categorical.
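The selection rule is not spelled out here. As a rough sketch, assuming that "should be coded as categorical" means "non-numeric" (the automatic-detection behavior described for Encoder in this module), a stand-in could look like the following. The name `infer_non_numeric_columns` is hypothetical, and the actual function may apply different or additional criteria.

```python
import pandas as pd


def infer_non_numeric_columns(df: pd.DataFrame) -> list[str]:
    """Hypothetical stand-in: return names of columns whose dtype is not numeric."""
    return [
        col for col in df.columns
        if not pd.api.types.is_numeric_dtype(df[col])
    ]


df = pd.DataFrame({
    "age": [31, 45],
    "city": ["Oslo", "Lima"],
    "score": [0.5, 0.9],
})
columns = infer_non_numeric_columns(df)  # ["city"]
```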
- class categoricals.Encoder(specified_columns: list[str] | None = None)
Bases: sklearn.base.BaseEstimator, sklearn.base.TransformerMixin
Sklearn-compatible transformer to encode non-numeric columns as pandas categorical features.
This stores category mappings during fit and applies them during transform, ensuring consistency in the value-to-code mapping between training and scoring.
Initialize the Encoder.
- Parameters:
specified_columns – Optional list of column names to be encoded as pandas categorical. If not specified, the fit method will automatically detect non-numeric columns and treat them all as categorical.
- encoders: dict[str, SeriesEncoder]
- fit(X: pandas.DataFrame, y: Any = None) → Encoder
Fit the encoder to the data.
Learns the unique categories for each specified categorical column and saves the category codes.
- Parameters:
X – The data to fit the encoder on.
y – Ignored, present for API consistency.
- Raises:
ValueError – If the encoder has already been fitted.
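To illustrate the fit-once contract and why stored mappings keep codes consistent between training and scoring, here is a simplified sketch. `SketchEncoder` is a hypothetical stand-in and not the module's Encoder: its non-numeric fallback, the `categories_` attribute, and the error message are all assumptions made for illustration.

```python
import pandas as pd
from sklearn.base import BaseEstimator, TransformerMixin


class SketchEncoder(BaseEstimator, TransformerMixin):
    """Hypothetical illustration; not the module's Encoder."""

    def __init__(self, specified_columns=None):
        self.specified_columns = specified_columns

    def fit(self, X: pd.DataFrame, y=None):
        # Assumed guard: refuse a second fit so learned codes cannot change.
        if hasattr(self, "categories_"):
            raise ValueError("Encoder has already been fitted.")
        columns = self.specified_columns
        if columns is None:
            # Assumed fallback: treat every non-numeric column as categorical.
            columns = [c for c in X.columns
                       if not pd.api.types.is_numeric_dtype(X[c])]
        # Store the learned categories so later calls reuse the same codes.
        self.categories_ = {c: list(X[c].dropna().unique()) for c in columns}
        return self

    def transform(self, X: pd.DataFrame) -> pd.DataFrame:
        X = X.copy()
        for col, cats in self.categories_.items():
            X[col] = X[col].astype(pd.CategoricalDtype(categories=cats))
        return X


train = pd.DataFrame({"city": ["Oslo", "Lima"], "age": [31, 45]})
score = pd.DataFrame({"city": ["Lima", "Oslo"], "age": [52, 28]})

enc = SketchEncoder().fit(train)
# Codes follow the training order (Oslo=0, Lima=1), not the scoring order.
codes = enc.transform(score)["city"].cat.codes

try:
    enc.fit(train)  # a second fit is rejected in this sketch
    refit_error = None
except ValueError as exc:
    refit_error = str(exc)
```

Because the categories are captured once in `fit`, the same value maps to the same code no matter how rows are ordered at scoring time.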