Python Brasil 2025
25/10/2025
Stage 2
pt-br
Data Morph: A Cautionary Tale of Summary Statistics
Beginner
Come learn about #DataMorph, a new open source Python package and teaching tool that can be used to morph an input dataset of 2D points into select shapes, while preserving the summary statistics.

Details

Statistics do not come intuitively to humans, who tend to look for simple ways to describe complex things. Given a complex dataset, it is tempting to describe it with simple summary statistics like the mean, median, or standard deviation. However, these numbers are not a replacement for visualizing the distribution.

To illustrate this fact, researchers have generated many datasets that look very different but share the same summary statistics. In this talk, I will discuss [Data Morph](https://github.com/stefmolin/data-morph), an open source package that builds on previous research using simulated annealing to perturb an arbitrary input dataset into a variety of shapes, while preserving the mean, standard deviation, and correlation to multiple decimal places. I will showcase how it works, discuss the challenges faced during development, and explore the limitations of this approach.
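
To make the idea concrete, here is a minimal sketch of the general technique, using only the Python standard library. It is not Data Morph's implementation or API: the function names (`summary`, `within_tolerance`, `distance_to_circle`, `morph_toward_circle`), the circle target, the linear cooling schedule, and every parameter value are assumptions chosen for illustration. Each step nudges one random point, keeps the move if it brings the point closer to the target shape (or occasionally when it does not, while the temperature is still high), and undoes it if any summary statistic drifts beyond the tolerance.

```python
import math
import random


def summary(points):
    """Return (mean_x, mean_y, sd_x, sd_y, pearson_r) for a list of (x, y) pairs."""
    n = len(points)
    mx = sum(x for x, _ in points) / n
    my = sum(y for _, y in points) / n
    sxx = sum((x - mx) ** 2 for x, _ in points)
    syy = sum((y - my) ** 2 for _, y in points)
    sxy = sum((x - mx) * (y - my) for x, y in points)
    sd_x = math.sqrt(sxx / (n - 1))
    sd_y = math.sqrt(syy / (n - 1))
    return (mx, my, sd_x, sd_y, sxy / math.sqrt(sxx * syy))


def within_tolerance(stats, target, tol):
    """True if every statistic is within `tol` of its target value."""
    return all(abs(s - t) < tol for s, t in zip(stats, target))


def distance_to_circle(point, radius):
    """Distance from a point to a circle of the given radius centered at the origin."""
    return abs(math.hypot(point[0], point[1]) - radius)


def morph_toward_circle(points, radius, iterations=5000, step=0.1, tol=0.01, seed=0):
    """Hypothetical helper: pull points toward a circle while holding the statistics fixed."""
    rng = random.Random(seed)
    target = summary(points)  # statistics of the ORIGINAL data, never updated
    current = list(points)
    for i in range(iterations):
        # Assumed linear cooling schedule: starts at 0.4 and decays toward 0.
        temperature = 0.4 * (1 - i / iterations)
        idx = rng.randrange(len(current))
        old = current[idx]
        new = (old[0] + rng.gauss(0, step), old[1] + rng.gauss(0, step))
        closer = distance_to_circle(new, radius) < distance_to_circle(old, radius)
        # Annealing step: a move that does not improve the fit is still
        # accepted with probability equal to the current temperature.
        if not closer and rng.random() >= temperature:
            continue
        current[idx] = new
        # Undo the move if it changes any summary statistic too much.
        if not within_tolerance(summary(current), target, tol):
            current[idx] = old
    return current


if __name__ == "__main__":
    rng = random.Random(1)
    start = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(60)]
    morphed = morph_toward_circle(start, radius=1.5)
    print("before:", [round(s, 2) for s in summary(start)])
    print("after: ", [round(s, 2) for s in summary(morphed)])
```

Because every accepted move is checked against the statistics of the original data rather than the previous step, small changes cannot accumulate into a large drift. Plotting `start` and `morphed` side by side is what reveals the difference that the summary statistics hide.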