categoricals
============

.. py:module:: categoricals

.. autoapi-nested-parse::

   Sklearn-compatible transformer to encode non-numeric columns as pandas categorical features.


Classes
-------

.. autoapisummary::

   categoricals.SeriesEncoder
   categoricals.Encoder


Functions
---------

.. autoapisummary::

   categoricals.infer_categoricals


Module Contents
---------------

.. py:class:: SeriesEncoder

   Categorical encoding of the values for a pandas Series.

   .. attribute:: categories

      The list of distinct non-null categories.


   .. py:attribute:: categories
      :type: list[Any]


   .. py:method:: fit(series: pandas.Series) -> SeriesEncoder
      :classmethod:

      Learn categorical codes of a data series.

      :param series: The pandas Series to fit.

      :returns: A SeriesEncoder object with the unique categories and null presence.


   .. py:method:: __call__(series: pandas.Series) -> pandas.Series

      Encode a series as categorical.

      This encoder maintains a distinction between null values and "never-before-seen" values:

      - Pandas maps null values to -1.
      - Pandas also maps any never-before-seen values to -1, which loses information.
        Instead, we identify such values and append them to the list of categories,
        so that they are encoded as new categories.

      This distinction can matter for certain ML algorithms, such as XGBoost.
      See also https://github.com/microsoft/LightGBM/issues/6908.


.. py:function:: infer_categoricals(df: pandas.DataFrame) -> list[str]

   Identify columns that should be coded as categorical.


.. py:class:: Encoder(specified_columns: list[str] | None = None)

   Bases: :py:obj:`sklearn.base.BaseEstimator`, :py:obj:`sklearn.base.TransformerMixin`

   Sklearn-compatible transformer to encode non-numeric columns as pandas categorical features.

   This stores category mappings during fit and applies them during transform, ensuring
   consistency in the value-to-code mapping between training and scoring.

   Initialize the Encoder.

   :param specified_columns: Optional list of column names to be encoded as pandas categorical.
       If not specified, the fit method will automatically detect non-numeric columns
       and treat them all as categorical.


   .. py:attribute:: specified_columns
      :value: None


   .. py:attribute:: encoders
      :type: dict[str, SeriesEncoder]


   .. py:method:: fit(X: pandas.DataFrame, y: Any = None) -> Encoder

      Fit the encoder to the data.

      Learns the unique categories for each specified categorical column and saves the
      category codes.

      :param X: The data to fit the encoder on.
      :param y: Ignored, present for API consistency.

      :raises ValueError: If the encoder has already been fitted.


   .. py:method:: transform(X: pandas.DataFrame) -> pandas.DataFrame

      Apply the category encodings to new data.

      :param X: The data to transform.

      :returns: The transformed data with consistent categorical encodings.
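
The null versus never-before-seen distinction described for ``SeriesEncoder.__call__`` can be illustrated with plain pandas. This is a minimal sketch and not the module's actual implementation: the names ``known``, ``series``, ``unseen``, and ``extended`` and the example data are assumptions made for illustration. It shows that ``pandas.Categorical`` assigns code -1 to both kinds of value, and that appending the unseen values to the category list gives them their own codes.

```python
import pandas as pd

# Hypothetical categories learned at fit time.
known = ["red", "green"]

# New data containing a null and a never-before-seen value ("blue").
series = pd.Series(["red", None, "blue", "green"])

# Plain pandas: the null and the unseen value both receive code -1.
plain_codes = pd.Categorical(series, categories=known).codes

# Sketch of the documented idea: collect non-null values that are missing
# from the known categories and append them before encoding.
unseen = [v for v in series.dropna().unique() if v not in known]
extended = pd.Categorical(series, categories=known + unseen)
extended_codes = extended.codes

print(list(plain_codes))     # null and "blue" are indistinguishable
print(list(extended_codes))  # only the null keeps code -1
```

With the extended category list, only the null keeps code -1, so a downstream model can treat missing data and new categories differently.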