skmiscpy.plotting

Functions

plot_mirror_histogram(→ None)

Plots a mirror histogram of a variable by another grouping binary variable.

plot_smd(→ None)

Plots the standardized mean difference (SMD) for variables as a point plot (also known as a love plot).

Module Contents

skmiscpy.plotting.plot_mirror_histogram(data: pandas.DataFrame, var: str, group: str, bins: int = 50, weights: str | None = None, xlabel: str | None = None, ylabel: str | None = None, title: str | None = None) → None

Plots a mirror histogram of a variable by another grouping binary variable.

Parameters:
data : pd.DataFrame

A pandas DataFrame containing the var and group column.

var : str

Name of the column for which the histogram needs to be drawn.

group : str

Name of the binary column based on which the histogram will be mirrored.

bins : int, optional

Number of bins for the histograms. Default is 50.

weights : str, optional

Name of the column based on which the histogram will be weighted. Default is None.

xlabel : str, optional

Label for the x-axis. If not provided, defaults to the name of the var column.

ylabel : str, optional

Label for the y-axis. If not provided, defaults to “Frequency”.

title : str, optional

Title of the plot. If not provided, defaults to “Mirror Histogram of var by group”.

Raises:
  • TypeError – If data is not a pandas DataFrame. If var or group is not of type str. If weights, xlabel, ylabel, or title is provided and is not of type str. If the var column or the weights column is not numerical.

  • ValueError – If the bins parameter is not a positive integer. If the data DataFrame is empty. If the group column does not contain exactly two unique, non-NaN values.
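The ValueError conditions above can be checked before plotting. The following is a minimal sketch of such a pre-check, not the library's own validation code: the helper name check_mirror_histogram_inputs is hypothetical, and reading "exactly two unique, non-NaN values" as "ignore NaN, then count distinct values" is an assumption.

```python
import pandas as pd


def check_mirror_histogram_inputs(data, group, bins=50):
    """Hypothetical pre-check mirroring the documented ValueError conditions."""
    # bool is a subclass of int, so reject it explicitly.
    if isinstance(bins, bool) or not isinstance(bins, int) or bins <= 0:
        raise ValueError("bins must be a positive integer")
    if data.empty:
        raise ValueError("data must not be empty")
    # Assumption: NaN entries are ignored when counting group levels.
    n_levels = data[group].dropna().nunique()
    if n_levels != 2:
        raise ValueError(
            f"{group!r} must have exactly two unique non-NaN values, found {n_levels}"
        )


good = pd.DataFrame({"group": [1, 0, 1, 0], "var": [2.0, 3.0, 2.5, 4.0]})
bad = pd.DataFrame({"group": ["a", "b", "c"], "var": [1.0, 2.0, 3.0]})

check_mirror_histogram_inputs(good, "group")  # two levels: no exception

try:
    check_mirror_histogram_inputs(bad, "group")  # three levels: rejected
    error_message = None
except ValueError as exc:
    error_message = str(exc)
```

A check of this kind is mainly useful when the group column comes from user input or a merge, where a third level or an all-NaN column can appear unnoticed.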

Examples

Example 1: Basic usage with numerical data.

>>> import pandas as pd
>>> from skmiscpy import plot_mirror_histogram
>>> data = pd.DataFrame({
...     'group': [1, 1, 0, 0, 1, 0],
...     'var': [2.0, 3.5, 3.0, 2.2, 2.2, 3.3]
... })
>>> plot_mirror_histogram(data=data, var='var', group='group')

Example 2: With weights and custom labels.

>>> data = pd.DataFrame({
...     'group': [1, 1, 0, 0, 1, 0],
...     'var': [2.0, 3.5, 3.0, 2.2, 2.2, 3.3],
...     'weights': [1.0, 1.5, 2.0, 1.2, 1.1, 0.8]
... })
>>> plot_mirror_histogram(
...     data=data, var='var', group='group', weights='weights',
...     xlabel='Variable', ylabel='Count', title='Weighted Mirror Histogram'
... )
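To make the idea of a mirror histogram concrete, the sketch below computes the numbers such a plot is built from, using NumPy only. It illustrates the concept and is not the implementation used by plot_mirror_histogram: the choice of three bins, and the convention that group 1 points up while group 0 points down, are assumptions made for the example.

```python
import numpy as np
import pandas as pd

data = pd.DataFrame({
    "group": [1, 1, 0, 0, 1, 0],
    "var": [2.0, 3.5, 3.0, 2.2, 2.2, 3.3],
    "weights": [1.0, 1.5, 2.0, 1.2, 1.1, 0.8],
})

# Shared bin edges so both halves line up on the x-axis.
edges = np.histogram_bin_edges(data["var"].to_numpy(), bins=3)

upper = data[data["group"] == 1]
lower = data[data["group"] == 0]

# Weighted counts per bin for each group.
top, _ = np.histogram(
    upper["var"].to_numpy(), bins=edges, weights=upper["weights"].to_numpy()
)
bottom, _ = np.histogram(
    lower["var"].to_numpy(), bins=edges, weights=lower["weights"].to_numpy()
)

# Negating one group's counts flips its bars below the x-axis.
mirrored_bottom = -bottom
```

Drawing top and mirrored_bottom as bars over the same edges gives the mirrored layout. Dropping the weights arguments gives plain counts instead of weighted ones.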
skmiscpy.plotting.plot_smd(data: pandas.DataFrame, add_ref_line: bool = False, ref_line_value: int | float = 0.1, *args, **kwargs) → None

Plots the standardized mean difference (SMD) for variables as a point plot (also known as a love plot), displaying unadjusted (and adjusted, if provided) SMDs. Optionally includes a vertical reference line.

Parameters:
data : pd.DataFrame

A pandas DataFrame with at least two columns, variables and unadjusted_smd, containing the variable names and their unadjusted SMD values. To include adjusted SMDs in the plot, the DataFrame must also contain a column adjusted_smd. The column names must be exactly variables, unadjusted_smd, and adjusted_smd.

add_ref_line : bool, optional

Whether to add a vertical reference line. Defaults to False.

ref_line_value : int or float, optional

The value at which to draw the vertical reference line. Defaults to 0.1. Must be between 0 and 1.

*args

Additional positional arguments passed to Seaborn’s pointplot.

**kwargs

Additional keyword arguments passed to Seaborn’s pointplot.

Raises:
  • ValueError – If ref_line_value is not between 0 and 1, or if the input DataFrame is empty.

  • TypeError – If data is not a pandas DataFrame, if add_ref_line is not a boolean, or if ref_line_value is not an integer or float.
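plot_smd expects SMD values that are already computed. As a rough sketch of how the data argument might be prepared from raw covariates, the code below uses one common definition of the SMD: the difference in group means divided by the square root of the average of the two group variances. The helper name smd_table, the 0/1 coding of the treat column, and the use of absolute values are assumptions for illustration and are not part of skmiscpy.

```python
import numpy as np
import pandas as pd


def smd_table(df, covariates, group):
    """Hypothetical helper: unadjusted SMD per covariate, in plot_smd's layout."""
    treated = df[df[group] == 1]
    control = df[df[group] == 0]
    rows = []
    for col in covariates:
        diff = treated[col].mean() - control[col].mean()
        # Pooled SD from the two sample variances (pandas uses ddof=1).
        pooled_sd = np.sqrt((treated[col].var() + control[col].var()) / 2)
        rows.append({"variables": col, "unadjusted_smd": abs(diff) / pooled_sd})
    return pd.DataFrame(rows)


raw = pd.DataFrame({
    "treat": [1, 1, 1, 0, 0, 0],
    "age": [30.0, 40.0, 50.0, 20.0, 30.0, 40.0],
    "bmi": [22.0, 24.0, 26.0, 22.0, 24.0, 26.0],
})

smd_data = smd_table(raw, ["age", "bmi"], "treat")
```

The resulting smd_data has the variables and unadjusted_smd columns described above and could then be passed as plot_smd(smd_data). An adjusted_smd column would come from repeating the calculation on the weighted or matched sample.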

Examples

  1. Basic usage with only unadjusted SMD:

>>> import pandas as pd
>>> from skmiscpy import plot_smd
>>> data = pd.DataFrame({
...     'variables': ['var1', 'var2', 'var3'],
...     'unadjusted_smd': [0.2, 0.5, 0.3]
... })
>>> plot_smd(data)
# This will plot the unadjusted SMD values with default settings.
  2. Including adjusted SMD with a reference line:

>>> data = pd.DataFrame({
...     'variables': ['var1', 'var2', 'var3'],
...     'unadjusted_smd': [0.2, 0.5, 0.3],
...     'adjusted_smd': [0.1, 0.4, 0.2]
... })
>>> plot_smd(data, add_ref_line=True, ref_line_value=0.3)
# This will plot both unadjusted and adjusted SMD values, with a vertical reference line at 0.3.
  3. Customizing the plot appearance:

>>> data = pd.DataFrame({
...     'variables': ['var1', 'var2', 'var3'],
...     'unadjusted_smd': [0.2, 0.5, 0.3],
...     'adjusted_smd': [0.1, 0.4, 0.2]
... })
>>> plot_smd(
...     data,
...     add_ref_line=True,
...     ref_line_value=0.2,
...     palette='husl',
...     markers=['o', 'D'],
...     linestyle='--'
... )
# This will plot the SMD values with custom color palette, markers, and linestyle for the plot.