skmiscpy.plotting

Functions

`plot_mirror_histogram`(→ None)	Plots a mirror histogram of a variable, split by a binary grouping variable.
`plot_smd`(→ None)	Plots the standardized mean difference (SMD) for variables as a point plot (also known as a love plot).

Module Contents

skmiscpy.plotting.plot_mirror_histogram(data: pandas.DataFrame, var: str, group: str, bins: int = 50, weights: str | None = None, xlabel: str | None = None, ylabel: str | None = None, title: str | None = None) → None

Plots a mirror histogram of a variable, split by a binary grouping variable.

Parameters:

data : pd.DataFrame: A pandas DataFrame containing the var and group columns.
var : str: Name of the column for which the histogram is drawn.
group : str: Name of the binary column that splits the histogram into its two mirrored halves.
bins : int, optional: Number of bins for the histograms. Default is 50.
weights : str, optional: Name of the column holding the weights applied to the histogram. Default is None.
xlabel : str, optional: Label for the x-axis. If not provided, defaults to the name of the var column.
ylabel : str, optional: Label for the y-axis. If not provided, defaults to “Frequency”.
title : str, optional: Title of the plot. If not provided, defaults to “Mirror Histogram of var by group”.

Raises:

TypeError – If data is not a pandas DataFrame; if var, group, weights, xlabel, ylabel, or title is not of type str; or if the var or weights column is not numerical.
ValueError – If bins is not a positive integer; if the data DataFrame is empty; or if the group column does not contain exactly two unique, non-NaN values.
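The group requirement above can be checked before calling the function. The following is a minimal sketch in plain Python; `has_two_groups` is a hypothetical helper, not part of skmiscpy, and it assumes one reading of the rule: missing values are ignored and the remaining labels are counted. The library's own check may be stricter.

```python
import math


def has_two_groups(values):
    """Return True if values hold exactly two distinct non-missing labels.

    Hypothetical helper for illustration; None and float NaN are treated
    as missing and skipped (an assumption, not confirmed library behavior).
    """
    distinct = {
        v for v in values
        if not (v is None or (isinstance(v, float) and math.isnan(v)))
    }
    return len(distinct) == 2
```

With a DataFrame, something like `has_two_groups(data['group'].tolist())` would run this check on the grouping column.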

Examples

Example 1: Basic usage with numerical data.

>>> import pandas as pd
>>> from skmiscpy import plot_mirror_histogram

>>> data = pd.DataFrame({
...     'group': [1, 1, 0, 0, 1, 0],
...     'var': [2.0, 3.5, 3.0, 2.2, 2.2, 3.3]
... })
>>> plot_mirror_histogram(data=data, var='var', group='group')

Example 2: With weights and custom labels.

>>> data = pd.DataFrame({
...     'group': [1, 1, 0, 0, 1, 0],
...     'var': [2.0, 3.5, 3.0, 2.2, 2.2, 3.3],
...     'weights': [1.0, 1.5, 2.0, 1.2, 1.1, 0.8]
... })
>>> plot_mirror_histogram(
...     data=data, var='var', group='group', weights='weights',
...     xlabel='Variable', ylabel='Count', title='Weighted Mirror Histogram'
... )
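To make the idea of a mirror histogram concrete, here is a rough sketch of the counting step in plain Python. It is not the library's implementation: `mirror_counts` is a hypothetical name, the equal-width binning is an assumption, and which group is drawn above the axis is chosen arbitrarily (the larger label goes on top).

```python
def mirror_counts(values, groups, bins=5, weights=None):
    """Return (edges, upper, lower) bar heights for a two-group mirror histogram.

    Illustrative helper, not part of skmiscpy.
    """
    labels = sorted(set(groups))
    if len(labels) != 2:
        raise ValueError("groups must contain exactly two unique values")
    if weights is None:
        weights = [1.0] * len(values)

    lo, hi = min(values), max(values)
    width = (hi - lo) / bins or 1.0  # fall back to 1.0 if all values are equal
    edges = [lo + i * width for i in range(bins + 1)]  # shared by both groups

    counts = {label: [0.0] * bins for label in labels}
    for value, label, weight in zip(values, groups, weights):
        # Clamp so the maximum value falls in the last bin.
        index = min(int((value - lo) / width), bins - 1)
        counts[label][index] += weight

    upper = counts[labels[1]]                # drawn above the axis
    lower = [-c for c in counts[labels[0]]]  # negated, so drawn below the axis
    return edges, upper, lower
```

Both groups share the same bin edges, which is what makes the two halves comparable; the weighted case simply adds each row's weight instead of 1.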

skmiscpy.plotting.plot_smd(data: pandas.DataFrame, add_ref_line: bool = False, ref_line_value: int | float = 0.1, *args, **kwargs) → None

Plots the standardized mean difference (SMD) for variables as a point plot (also known as a love plot), displaying unadjusted (and adjusted, if provided) SMDs. Optionally includes a vertical reference line.

Parameters:

data : pd.DataFrame: A pandas DataFrame with at least two columns, variables and unadjusted_smd, holding the variable names and their unadjusted SMD values. To plot adjusted SMDs as well, include a third column, adjusted_smd. The columns must use exactly these names.
add_ref_line : bool, optional: Whether to add a vertical reference line. Defaults to False.
ref_line_value : int or float, optional: The value at which to draw the vertical reference line. Defaults to 0.1. Must be between 0 and 1.
*args: Additional positional arguments passed to Seaborn’s pointplot.
**kwargs: Additional keyword arguments passed to Seaborn’s pointplot.

Raises:

ValueError – If ref_line_value is not between 0 and 1, or if the input DataFrame is empty.
TypeError – If data is not a pandas DataFrame, if add_ref_line is not a boolean, or if ref_line_value is not an integer or float.

Examples

Basic usage with only unadjusted SMD:

>>> import pandas as pd
>>> from skmiscpy import plot_smd

>>> data = pd.DataFrame({
...     'variables': ['var1', 'var2', 'var3'],
...     'unadjusted_smd': [0.2, 0.5, 0.3]
... })

>>> # Plot the unadjusted SMD values with default settings.
>>> plot_smd(data)

Including adjusted SMD with a reference line:

>>> data = pd.DataFrame({
...     'variables': ['var1', 'var2', 'var3'],
...     'unadjusted_smd': [0.2, 0.5, 0.3],
...     'adjusted_smd': [0.1, 0.4, 0.2]
... })

>>> # Plot both unadjusted and adjusted SMD values, with a vertical reference line at 0.3.
>>> plot_smd(data, add_ref_line=True, ref_line_value=0.3)

Customizing the plot appearance:

>>> data = pd.DataFrame({
...     'variables': ['var1', 'var2', 'var3'],
...     'unadjusted_smd': [0.2, 0.5, 0.3],
...     'adjusted_smd': [0.1, 0.4, 0.2]
... })

>>> # Plot the SMD values with a custom color palette, markers, and linestyle.
>>> plot_smd(
...     data,
...     add_ref_line=True,
...     ref_line_value=0.2,
...     palette='husl',
...     markers=['o', 'D'],
...     linestyle='--'
... )
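The examples above start from SMD values that are already computed. As a rough sketch of where such values might come from, the code below uses the common definition for a continuous variable: the absolute difference in group means divided by the pooled standard deviation, sqrt((s1² + s0²) / 2). The helper names `standardized_mean_difference` and `smd_table` are hypothetical and not part of skmiscpy, and taking the absolute value is an assumption suggested by the 0 to 1 range of ref_line_value.

```python
from math import sqrt
from statistics import mean, variance


def standardized_mean_difference(treated, control):
    """Absolute SMD for a continuous variable, using the pooled standard deviation."""
    pooled_sd = sqrt((variance(treated) + variance(control)) / 2)
    return abs(mean(treated) - mean(control)) / pooled_sd


def smd_table(columns, group):
    """Build the column layout that plot_smd documents, as a dict of lists.

    columns: mapping of variable name -> list of numeric values
    group:   list of 0/1 labels aligned with those values
    (Illustrative helper, not part of skmiscpy.)
    """
    table = {"variables": [], "unadjusted_smd": []}
    for name, values in columns.items():
        treated = [v for v, g in zip(values, group) if g == 1]
        control = [v for v, g in zip(values, group) if g == 0]
        table["variables"].append(name)
        table["unadjusted_smd"].append(
            standardized_mean_difference(treated, control)
        )
    return table
```

Wrapping the result as `pd.DataFrame(smd_table(columns, group))` should give a frame with the variables and unadjusted_smd columns described in the parameters. An adjusted_smd column would need the same calculation on weighted or matched data, which this sketch does not cover.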