pandas.DataFrame.describe — pandas 0.20.2 documentation (2024)

Generates descriptive statistics that summarize the central tendency, dispersion and shape of a dataset’s distribution, excluding `NaN` values.
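The following is a minimal sketch of how missing values are handled. It assumes pandas and NumPy are installed, and the Series values are made up for illustration. The `NaN` entry is left out of `count` and of every statistic computed from the data.

```python
import numpy as np
import pandas as pd

# Illustrative float Series with one missing value.
s = pd.Series([1.0, np.nan, 3.0])

summary = s.describe()

# NaN is skipped: only the two observed values are counted and averaged.
print(summary["count"])  # 2.0
print(summary["mean"])   # 2.0
```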

Analyzes both numeric and object series, as well as DataFrame column sets of mixed data types. The output will vary depending on what is provided. Refer to the notes below for more detail.
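To illustrate how the output varies with the input, this sketch describes one numeric and one object Series (the data are hypothetical). The object Series is built with `dtype=object` explicitly, so that it stays an object Series on newer pandas versions that infer a dedicated string dtype by default.

```python
import pandas as pd

numeric = pd.Series([1, 2, 3])
objects = pd.Series(["a", "b", "a"], dtype=object)

# Numeric input: count, moments, and percentiles.
print(list(numeric.describe().index))
# ['count', 'mean', 'std', 'min', '25%', '50%', '75%', 'max']

# Object input: count, number of distinct values, most frequent value and its frequency.
print(list(objects.describe().index))
# ['count', 'unique', 'top', 'freq']
```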

Parameters:


percentiles : list-like of numbers, optional

The percentiles to include in the output. All should fall between 0 and 1. The default is `[.25, .5, .75]`, which returns the 25th, 50th, and 75th percentiles.
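A short sketch of requesting non-default percentiles, assuming a single hypothetical numeric column. The 10th and 90th percentiles of `[1, 2, 3]` under linear interpolation are 1.2 and 2.8, and the default quartile rows such as `25%` are no longer produced.

```python
import pandas as pd

df = pd.DataFrame({"numeric": [1, 2, 3]})

# Ask for the 10th and 90th percentiles instead of the default quartiles.
summary = df.describe(percentiles=[0.1, 0.9])

print(summary.loc["10%", "numeric"])
print(summary.loc["90%", "numeric"])
```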

include : ‘all’, list-like of dtypes or None (default), optional

A white list of data types to include in the result. Ignored for `Series`. Here are the options:
‘all’ : All columns of the input will be included in the output.
A list-like of dtypes : Limits the results to the provided data types. To limit the result to numeric types, submit `numpy.number`. To limit it instead to object columns (such as strings), submit the `numpy.object` data type, which is the built-in `object`. Strings can also be used in the style of `select_dtypes` (e.g. `df.describe(include=['O'])`).
None (default) : The result will include all numeric columns.
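Since the dtype selection follows `select_dtypes`, string aliases and dtype objects should select the same columns. This sketch checks that on a hypothetical two-column frame. The `object` column is created with `dtype=object` explicitly so that `'O'` still matches it on pandas versions that infer a string dtype by default.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({
    "numeric": [1, 2, 3],
    "object": pd.Series(["a", "b", "c"], dtype=object),
})

# The string alias 'number' and the type np.number pick the same columns.
by_alias = df.describe(include=["number"])
by_type = df.describe(include=[np.number])
print(by_alias.equals(by_type))  # True

# 'O' is the select_dtypes-style alias for object columns.
objects_only = df.describe(include=["O"])
print(list(objects_only.columns))  # ['object']
```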

exclude : list-like of dtypes or None (default), optional

A black list of data types to omit from the result. Ignored for `Series`. Here are the options:
A list-like of dtypes : Excludes the provided data types from the result. To exclude numeric types, submit `numpy.number`. To exclude object columns, submit the data type `numpy.object` (the built-in `object`). Strings can also be used in the style of `select_dtypes` (e.g. `df.describe(exclude=['O'])`).
None (default) : The result will exclude nothing.
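A minimal sketch of `exclude` on a hypothetical mixed frame. Excluding object columns leaves only the numeric summary. Passing `exclude` together with `include='all'` raises a `ValueError` in the pandas versions this sketch was written against, since `'all'` already means every column.

```python
import pandas as pd

df = pd.DataFrame({
    "numeric": [1, 2, 3],
    "object": pd.Series(["a", "b", "c"], dtype=object),
})

# Dropping object columns leaves only the numeric summary.
no_objects = df.describe(exclude=["O"])
print(list(no_objects.columns))  # ['numeric']

# include='all' keeps every column, so combining it with exclude is rejected.
rejected = False
try:
    df.describe(include="all", exclude=["O"])
except ValueError as err:
    rejected = True
    print(err)
```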

Returns:

summary: Series/DataFrame of summary statistics
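The return type mirrors the input: a DataFrame yields a DataFrame and a Series yields a Series, with the statistic names as the index. This sketch (hypothetical data) shows that the results can be indexed like any other pandas object.

```python
import pandas as pd

df = pd.DataFrame({"numeric": [1, 2, 3]})

frame_summary = df.describe()              # DataFrame in, DataFrame out
series_summary = df["numeric"].describe()  # Series in, Series out

# Statistic names are row labels, so .loc and [] lookups work.
print(frame_summary.loc["mean", "numeric"])  # 2.0
print(series_summary["std"])                 # 1.0
```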

Describing a DataFrame. By default only numeric fields are returned.

>>> df = pd.DataFrame([[1, 'a'], [2, 'b'], [3, 'c']],
...                   columns=['numeric', 'object'])
>>> df.describe()
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Describing all columns of a DataFrame regardless of data type.

>>> df.describe(include='all')
       numeric object
count      3.0      3
unique     NaN      3
top        NaN      b
freq       NaN      1
mean       2.0    NaN
std        1.0    NaN
min        1.0    NaN
25%        1.5    NaN
50%        2.0    NaN
75%        2.5    NaN
max        3.0    NaN

Describing a column from a DataFrame by accessing it as an attribute.

>>> df.numeric.describe()
count    3.0
mean     2.0
std      1.0
min      1.0
25%      1.5
50%      2.0
75%      2.5
max      3.0
Name: numeric, dtype: float64
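Attribute access only works when the column label is a valid Python identifier that does not clash with an existing DataFrame attribute. Bracket indexing works for any label. This sketch uses a hypothetical `total sales` column to show the bracket form.

```python
import pandas as pd

df = pd.DataFrame({"numeric": [1, 2, 3], "total sales": [10, 20, 30]})

# For a simple label, attribute and bracket access describe the same column.
same = df.numeric.describe().equals(df["numeric"].describe())
print(same)  # True

# A label containing a space is only reachable with brackets.
sales_summary = df["total sales"].describe()
print(sales_summary["mean"])  # 20.0
```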

Including only numeric columns in a DataFrame description.

>>> df.describe(include=[np.number])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0

Including only string columns in a DataFrame description.

>>> df.describe(include=[object])
       object
count       3
unique      3
top         b
freq        1

Excluding numeric columns from a DataFrame description.

>>> df.describe(exclude=[np.number])
       object
count       3
unique      3
top         b
freq        1

Excluding object columns from a DataFrame description.

>>> df.describe(exclude=[object])
       numeric
count      3.0
mean       2.0
std        1.0
min        1.0
25%        1.5
50%        2.0
75%        2.5
max        3.0