skmiscpy.plotting
Functions
- plot_mirror_histogram: Plots a mirror histogram of a variable, split by a binary grouping variable.
- plot_smd: Plots the standardized mean difference (SMD) for variables as a point plot (also known as a love plot).
Module Contents
skmiscpy.plotting.plot_mirror_histogram(data: pandas.DataFrame, var: str, group: str, bins: int = 50, weights: str | None = None, xlabel: str | None = None, ylabel: str | None = None, title: str | None = None) -> None
Plots a mirror histogram of a variable, split by a binary grouping variable.
- Parameters:
- data : pd.DataFrame
A pandas DataFrame containing the var and group columns.
- var : str
Name of the numeric column for which the histogram is drawn.
- group : str
Name of the binary column used to split the histogram into its two mirrored halves.
- bins : int, optional
Number of bins for the histograms. Default is 50.
- weights : str, optional
Name of the numeric column whose values weight the histogram. Default is None.
- xlabel : str, optional
Label for the x-axis. If not provided, defaults to the name of the var column.
- ylabel : str, optional
Label for the y-axis. If not provided, defaults to “Frequency”.
- title : str, optional
Title of the plot. If not provided, defaults to “Mirror Histogram of var by group”.
- Raises:
TypeError – If data is not a pandas DataFrame. If var or group is not a str, or if weights, xlabel, ylabel, or title is provided and is not a str. If the var column or the weights column is not numeric.
ValueError – If bins is not a positive integer. If the data DataFrame is empty. If the group column does not contain exactly two unique, non-NaN values.
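The requirement on the group column can be checked before plotting. The following is a minimal sketch, not skmiscpy's own validation code: the helper name check_group_is_binary is hypothetical, and it only mirrors the documented "exactly two unique, non-NaN values" rule using plain pandas.

```python
import pandas as pd


def check_group_is_binary(data: pd.DataFrame, group: str) -> list:
    """Hypothetical helper: return the two group levels or raise ValueError."""
    if data.empty:
        raise ValueError("data is empty")
    # NaN entries are ignored when counting the distinct levels.
    levels = data[group].dropna().unique().tolist()
    if len(levels) != 2:
        raise ValueError(
            f"{group!r} must have exactly two unique non-NaN values, got {levels}"
        )
    return levels


# A missing value in the group column does not count as a third level.
ok = pd.DataFrame({"group": [1, 1, 0, None], "var": [2.0, 3.5, 3.0, 2.2]})
levels = check_group_is_binary(ok, "group")

# Three distinct levels fail the check.
bad = pd.DataFrame({"group": ["a", "b", "c"], "var": [1.0, 2.0, 3.0]})
try:
    check_group_is_binary(bad, "group")
    bad_message = None
except ValueError as exc:
    bad_message = str(exc)
```

Running a check like this first gives a clear message about which levels were found. How plot_mirror_histogram itself treats rows with a missing group value is not stated here.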
Examples
Example 1: Basic usage with numerical data.
>>> import pandas as pd
>>> import seaborn as sns
>>> import numpy as np
>>> from skmiscpy import plot_mirror_histogram
>>> data = pd.DataFrame({
...     'group': [1, 1, 0, 0, 1, 0],
...     'var': [2.0, 3.5, 3.0, 2.2, 2.2, 3.3]
... })
>>> plot_mirror_histogram(data=data, var='var', group='group')
Example 2: With weights and custom labels.
>>> data = pd.DataFrame({
...     'group': [1, 1, 0, 0, 1, 0],
...     'var': [2.0, 3.5, 3.0, 2.2, 2.2, 3.3],
...     'weights': [1.0, 1.5, 2.0, 1.2, 1.1, 0.8]
... })
>>> plot_mirror_histogram(
...     data=data, var='var', group='group', weights='weights',
...     xlabel='Variable', ylabel='Count', title='Weighted Mirror Histogram'
... )
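To make the roles of bins, group, and weights concrete, the sketch below computes the bar heights of a weighted mirror histogram with NumPy alone, using the same values as the weighted example above. It illustrates the general technique (shared bin edges, one group negated) and is an assumption about how such a plot is typically built, not a description of skmiscpy's internals. The choice of 3 bins and of which group is drawn downward is arbitrary.

```python
import numpy as np

# Same values as the weighted example above.
group = np.array([1, 1, 0, 0, 1, 0])
var = np.array([2.0, 3.5, 3.0, 2.2, 2.2, 3.3])
weights = np.array([1.0, 1.5, 2.0, 1.2, 1.1, 0.8])

# Shared bin edges so both groups are binned on the same grid.
edges = np.histogram_bin_edges(var, bins=3)

# Weighted counts per group: each observation contributes its weight.
top, _ = np.histogram(var[group == 1], bins=edges, weights=weights[group == 1])
bottom, _ = np.histogram(var[group == 0], bins=edges, weights=weights[group == 0])

# Mirroring: one group's bars point downward, so its heights are negated.
mirrored_bottom = -bottom
```

Passing top and mirrored_bottom to a bar-plotting call over the same edges would give the two facing halves of the figure.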
skmiscpy.plotting.plot_smd(data: pandas.DataFrame, add_ref_line: bool = False, ref_line_value: int | float = 0.1, *args, **kwargs) -> None
Plots the standardized mean difference (SMD) for variables as a point plot (also known as a love plot), displaying unadjusted (and adjusted, if provided) SMDs. Optionally includes a vertical reference line.
- Parameters:
- data : pd.DataFrame
A pandas DataFrame with at least two columns, variables and unadjusted_smd, holding the variable names and their unadjusted SMD values. To include adjusted SMDs in the plot, the DataFrame must also contain a column adjusted_smd. The column names must be exactly variables, unadjusted_smd, and adjusted_smd.
- add_ref_line : bool, optional
Whether to add a vertical reference line. Defaults to False.
- ref_line_value : int or float, optional
The value at which to draw the vertical reference line. Defaults to 0.1. Must be between 0 and 1.
- *args
Additional positional arguments passed to Seaborn’s pointplot.
- **kwargs
Additional keyword arguments passed to Seaborn’s pointplot.
- Raises:
ValueError – If ref_line_value is not between 0 and 1, or if the input DataFrame is empty.
TypeError – If data is not a pandas DataFrame, if add_ref_line is not a boolean, or if ref_line_value is not an integer or float.
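plot_smd expects SMD values that have already been computed. As a rough illustration of how a table in the required shape might be produced, the sketch below uses one common definition of the SMD for a continuous covariate (difference in group means divided by a pooled standard deviation). The helper smd, the raw data, and the choice to take absolute values are all assumptions of this sketch. Whether skmiscpy offers its own SMD routine or uses a different definition is not covered here.

```python
import numpy as np
import pandas as pd


def smd(values: pd.Series, treated: pd.Series) -> float:
    """Hypothetical helper: SMD using a pooled standard deviation."""
    x1 = values[treated == 1]
    x0 = values[treated == 0]
    # Pooled SD: square root of the average of the two sample variances.
    pooled_sd = np.sqrt((x1.var(ddof=1) + x0.var(ddof=1)) / 2)
    return float((x1.mean() - x0.mean()) / pooled_sd)


# Made-up raw data: a binary treatment indicator and two covariates.
raw = pd.DataFrame({
    "treated": [1, 1, 1, 0, 0, 0],
    "age": [50.0, 60.0, 70.0, 40.0, 50.0, 60.0],
    "bmi": [22.0, 24.0, 26.0, 22.0, 24.0, 26.0],
})

covariates = ["age", "bmi"]

# Column names match what plot_smd documents: variables, unadjusted_smd.
smd_table = pd.DataFrame({
    "variables": covariates,
    "unadjusted_smd": [abs(smd(raw[c], raw["treated"])) for c in covariates],
})
```

A table built this way has the two required columns. An adjusted_smd column could be added in the same manner from weighted or matched data.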
Examples
Basic usage with only unadjusted SMD:
>>> import pandas as pd
>>> from skmiscpy import plot_smd
>>> data = pd.DataFrame({
...     'variables': ['var1', 'var2', 'var3'],
...     'unadjusted_smd': [0.2, 0.5, 0.3]
... })
>>> # This will plot the unadjusted SMD values with default settings.
>>> plot_smd(data)
Including adjusted SMD with a reference line:
>>> data = pd.DataFrame({
...     'variables': ['var1', 'var2', 'var3'],
...     'unadjusted_smd': [0.2, 0.5, 0.3],
...     'adjusted_smd': [0.1, 0.4, 0.2]
... })
>>> # This will plot both unadjusted and adjusted SMD values, with a vertical reference line at 0.3.
>>> plot_smd(data, add_ref_line=True, ref_line_value=0.3)
Customizing the plot appearance:
>>> data = pd.DataFrame({
...     'variables': ['var1', 'var2', 'var3'],
...     'unadjusted_smd': [0.2, 0.5, 0.3],
...     'adjusted_smd': [0.1, 0.4, 0.2]
... })
>>> # This will plot the SMD values with custom color palette, markers, and linestyle for the plot.
>>> plot_smd(
...     data,
...     add_ref_line=True,
...     ref_line_value=0.2,
...     palette='husl',
...     markers=['o', 'D'],
...     linestyle='--'
... )
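The reference line is usually read as a balance threshold: points to its right are variables whose SMD exceeds the chosen value. The same reading can be done numerically with pandas, as in the sketch below, which reuses the data from the last example. The long-format reshape is only a convenient way to inspect the values and is not a claim about how plot_smd arranges the data internally.

```python
import pandas as pd

data = pd.DataFrame({
    "variables": ["var1", "var2", "var3"],
    "unadjusted_smd": [0.2, 0.5, 0.3],
    "adjusted_smd": [0.1, 0.4, 0.2],
})

ref_line_value = 0.2

# Long format: one row per (variable, SMD type) pair.
long = data.melt(id_vars="variables", var_name="smd_type", value_name="smd")

# Variables whose adjusted SMD is strictly greater than the reference value.
above = data.loc[data["adjusted_smd"].abs() > ref_line_value, "variables"].tolist()
```

With a threshold of 0.2, only var2 remains above the line after adjustment, since var3 sits exactly on it.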