Python Pandas - Descriptive Statistics (2024)


Many DataFrame methods compute descriptive statistics and related operations. Most of them are aggregations, such as sum() and mean(), but some, like cumsum(), produce an object of the same size. Generally speaking, these methods take an axis argument, just like ndarray.{sum, std, ...}, and the axis can be specified by name or by integer:

DataFrame − “index” (axis=0, default), “columns” (axis=1)
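As a minimal sketch of the two equivalent spellings, the snippet below sums a small frame once by integer axis and once by axis name. The frame `scores` and its values are made up for illustration and are not part of the chapter's data.

```python
import pandas as pd

# Hypothetical numeric frame, used only to illustrate the axis argument
scores = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# Down each column: axis=0 and axis="index" are equivalent
by_index_int = scores.sum(axis=0)
by_index_name = scores.sum(axis="index")

# Across each row: axis=1 and axis="columns" are equivalent
by_columns_int = scores.sum(axis=1)
by_columns_name = scores.sum(axis="columns")

print(by_index_name)
print(by_columns_name)
```

Using the name instead of the integer tends to make the direction of the aggregation easier to read.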

Let us create a DataFrame and use this object throughout this chapter for all the operations.

Example

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
                            'Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
                              3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)
    print(df)

This creates a 12-row DataFrame with the columns Name, Age and Rating.

sum()

Returns the sum of the values for the requested axis. By default, axis is index (axis=0).

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
                            'Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
                              3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # Sum down each column (axis=0 is the default)
    print(df.sum())

Its output is as follows −

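    Age                                                     382
    Name      TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
    Rating                                                44.92
    dtype: object

Each column is summed separately, and string values are concatenated. The columns are listed alphabetically here; recent pandas versions keep the dictionary's insertion order, so Name appears first there.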

axis=1

Passing axis=1 sums across each row instead of down each column.

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
                            'Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
                              3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # numeric_only=True skips the string column Name, which cannot be
    # added to numbers in recent pandas versions
    print(df.sum(axis=1, numeric_only=True))

Its output is as follows −

    0     29.23
    1     29.24
    2     28.98
    3     25.56
    4     33.20
    5     33.60
    6     26.80
    7     37.78
    8     42.98
    9     34.80
    10    55.10
    11    49.65
    dtype: float64

mean()

Returns the average value of each numeric column.

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
                            'Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
                              3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # numeric_only=True leaves out the string column Name, which has no mean
    print(df.mean(numeric_only=True))

Its output is as follows −

    Age       31.833333
    Rating     3.743333
    dtype: float64

std()

Returns the Bessel-corrected (sample) standard deviation of the numerical columns.
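The page gives no listing for std(), so the following is only a sketch of how it might be called on the chapter's DataFrame. The `numeric_only=True` argument is an addition here, used to skip the string column; `ddof=0` is shown for comparison and is not discussed in the original text.

```python
import pandas as pd

# Same data as the DataFrame used throughout this chapter
df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky", "Vin", "Steve", "Smith",
             "Jack", "Lee", "David", "Gasper", "Betina", "Andres"],
    "Age": [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    "Rating": [4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
               3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# Sample standard deviation: divides by n - 1 (ddof=1 is the default)
sample_std = df.std(numeric_only=True)

# Population standard deviation: divides by n
population_std = df.std(ddof=0, numeric_only=True)

print(sample_std)
print(population_std)
```

The sample values should agree with the std row that describe() reports for the same data.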

Functions & Description

Let us now understand the functions under Descriptive Statistics in Python Pandas. The following table lists the important functions −

Sr.No.	Function	Description
1	count()	Number of non-null observations
2	sum()	Sum of values
3	mean()	Mean of Values
4	median()	Median of Values
5	mode()	Mode of values
6	std()	Standard Deviation of the Values
7	min()	Minimum Value
8	max()	Maximum Value
9	abs()	Absolute Value
10	prod()	Product of Values
11	cumsum()	Cumulative Sum
12	cumprod()	Cumulative Product
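A few of the functions from the table can be tried together. This sketch uses only the Age and Rating values from the chapter's DataFrame; the variable names `numeric` and `summary` are illustrative.

```python
import pandas as pd

# Age and Rating values from the chapter's DataFrame
numeric = pd.DataFrame({
    "Age": [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    "Rating": [4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
               3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# Each method returns one value per column; collect them side by side
summary = pd.DataFrame({
    "count": numeric.count(),
    "median": numeric.median(),
    "min": numeric.min(),
    "max": numeric.max(),
})

print(summary)
```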

Note − Because a DataFrame is a heterogeneous data structure, generic operations do not work with all functions.

Functions like sum() and cumsum() work with both numeric and character (or string) data elements without any error. Although character aggregations are rarely used in practice, these functions do not throw any exception.
Functions like abs() and cumprod() throw an exception when the DataFrame contains character or string data, because such operations cannot be performed.
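One possible way to avoid such exceptions is to select the numeric columns first with select_dtypes() and apply the function to that subset. The small frame below, including the `Change` column, is hypothetical and only serves as an illustration.

```python
import pandas as pd

# Hypothetical mixed frame: one string column, one numeric column
mixed = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky"],
    "Change": [-1.5, 2.0, -0.5],
})

# Keep only the numeric columns before applying numeric-only functions
numeric = mixed.select_dtypes(include="number")

absolute = numeric.abs()          # element-wise absolute value
running_product = numeric.cumprod()  # cumulative product down each column

print(absolute)
print(running_product)
```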

Summarizing Data

The describe() function computes a summary of statistics pertaining to the DataFrame columns.

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
                            'Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
                              3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # Summarize the numeric columns
    print(df.describe())

Its output is as follows −

                 Age     Rating
    count  12.000000  12.000000
    mean   31.833333   3.743333
    std     9.232682   0.661628
    min    23.000000   2.560000
    25%    25.000000   3.230000
    50%    29.500000   3.790000
    75%    35.500000   4.132500
    max    51.000000   4.800000

This function reports the count, mean, standard deviation, minimum, maximum and the quartiles (25%, 50% and 75%), from which the IQR can be derived. By default, it excludes the character columns and summarizes only the numeric ones. The 'include' argument controls which columns are considered for summarizing. It takes a list of values; by default, only numeric columns are included.

object − Summarizes String columns
number − Summarizes Numeric columns
all − Summarizes all columns together (pass it as the string 'all', not as a list value)

Now, use the following statement in the program and check the output −

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
                            'Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
                              3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # Summarize only the string (object) columns
    print(df.describe(include=['object']))

Its output is as follows −

             Name
    count      12
    unique     12
    top     Ricky
    freq        1

Now, use the following statement and check the output −

    import pandas as pd

    # Create a dictionary of Series
    d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
                            'Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
         'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
         'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
                              3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}

    # Create a DataFrame
    df = pd.DataFrame(d)

    # Summarize every column, numeric and string alike
    print(df.describe(include='all'))

Its output is as follows −

                  Age   Name     Rating
    count   12.000000     12  12.000000
    unique        NaN     12        NaN
    top           NaN  Ricky        NaN
    freq          NaN      1        NaN
    mean    31.833333    NaN   3.743333
    std      9.232682    NaN   0.661628
    min     23.000000    NaN   2.560000
    25%     25.000000    NaN   3.230000
    50%     29.500000    NaN   3.790000
    75%     35.500000    NaN   4.132500
    max     51.000000    NaN   4.800000


FAQs

How to do descriptive statistics in Pandas?

Get the Descriptive Statistics in Pandas DataFrame

Step 1: Collect the Data. To start, collect the data for your DataFrame. ...
Step 2: Create the DataFrame. Next, create the DataFrame based on the data collected: ...
Step 3: Get the Descriptive Statistics.

How to generate descriptive statistics in Python?

The describe() function computes a summary of statistics pertaining to the DataFrame columns. It reports the count, mean, standard deviation, minimum, maximum and quartiles. By default, it excludes character columns and summarizes only the numeric ones.

What is df.describe() in Python?

Pandas DataFrame describe() Method

The describe() method returns a description of the data in the DataFrame. If the DataFrame contains numerical data, the description contains this information for each column: count - The number of non-empty values. mean - The average (mean) value.

What does 25-50-75 in Pandas describe?

For numeric data, the result's index will include count, mean, std, min and max as well as the lower, 50 and upper percentiles. By default the lower percentile is 25 and the upper percentile is 75. The 50 percentile is the same as the median.
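The claim that the 50 percentile equals the median can be checked with a short sketch. It uses the Age values from the chapter's DataFrame; the variable names are illustrative.

```python
import pandas as pd

# Age values from the chapter's DataFrame
ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46], name="Age")

# describe() on a numeric Series returns a Series indexed by statistic name
summary = ages.describe()

# Pick out the three default percentile rows
percentiles = summary[["25%", "50%", "75%"]]
print(percentiles)

# The 50% row and median() report the same value
print(summary["50%"] == ages.median())
```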

How do you generate descriptive statistics?

To calculate descriptive statistics:

Mean: Add up all the scores and divide by the number of scores. ...
Median: Arrange the scores in ascending order and find the middle value. ...
Mode: Identify the score(s) that appear(s) most frequently. ...
Range: Calculate the difference between the highest and lowest scores.
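The four measures listed above can be sketched in pandas as follows, using the Age values from the chapter's DataFrame as the scores. pandas has no single range method for this, so the range is computed here as max() minus min().

```python
import pandas as pd

# Age values from the chapter's DataFrame, treated as the "scores"
ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

mean_age = ages.mean()                 # sum of scores / number of scores
median_age = ages.median()             # middle value of the sorted scores
modes = ages.mode().tolist()           # every value tied for most frequent
value_range = ages.max() - ages.min()  # highest minus lowest score

print(mean_age, median_age, modes, value_range)
```

Note that mode() returns all tied values, which is why the result is a list rather than a single number.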

How would you generate descriptive statistics for all the columns for a DataFrame df?

If we apply .describe() to an entire DataFrame, it returns a brand new DataFrame with rows that correspond to all essential descriptive statistics. By default, it will only include the columns with integer and float dtypes.

Can you use Python for statistical analysis?

Python statistics libraries are comprehensive, popular, and widely used tools that will assist you in working with data, including the numerical quantities you can use to describe and summarize your datasets.

What is descriptive statistics for beginners?

Descriptive statistics summarizes or describes the characteristics of a data set. Descriptive statistics consists of three basic categories of measures: measures of central tendency, measures of variability (or spread), and frequency distribution.

What is the formula for descriptive statistics?

The most common one is the mean, or average. It is calculated by summing all of the data values and dividing by the total number of data items you have. If you have data consisting of n observations $(x_1, \ldots, x_n)$, then the mean $\bar{x}$ is given by the formula: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

How to show summary statistics in Pandas?

Yes, you can get summary statistics of a Pandas DataFrame by using the describe() method. This method returns a new DataFrame containing statistics such as count, mean, standard deviation, minimum, and maximum values for each column.

Why use df in Pandas?

The pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels. DataFrames are widely used in data science, machine learning, scientific computing, and many other data-intensive fields. DataFrames are similar to SQL tables or the spreadsheets that you work with in Excel or Calc.

How does Pandas describe round values?

Pandas round() function rounds a DataFrame value to a number with given decimal places. This function provides the flexibility to round different columns by different decimal places.
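As a sketch of how round() might be combined with describe(), the snippet below rounds the summary once to a fixed number of places and once with a per-column mapping. The data are the Age and Rating values from the chapter's DataFrame; the variable names are illustrative.

```python
import pandas as pd

# Age and Rating values from the chapter's DataFrame
df = pd.DataFrame({
    "Age": [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    "Rating": [4.23, 3.24, 3.98, 2.56, 3.20, 4.6,
               3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# Same precision for every column
two_places = df.describe().round(2)

# Different precision per column, given as a column -> decimals mapping
per_column = df.describe().round({"Age": 0, "Rating": 2})

print(two_places)
print(per_column)
```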

What is the 80 20 rule in pandas?

Putting Pareto's Principle to work on the Pandas library

For those who don't know, Pareto's Principle (also known as the 80–20 rule) says that roughly 20% of your inputs contribute towards generating about 80% of your outputs.

What is the difference between describe and info in pandas?

The .info() method shows the shape of the data and the dtype of each column. The .describe() method gives us summary statistics for the numerical columns in our DataFrame.
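The difference can be seen by calling both methods on the same frame. This is a sketch on the first three rows of the chapter's data; writing the info() report into a StringIO buffer is a choice made here so that the text can be inspected, not something the original text prescribes.

```python
import io

import pandas as pd

# First three rows of the chapter's DataFrame
df = pd.DataFrame({
    "Name": ["Tom", "James", "Ricky"],
    "Age": [25, 26, 25],
    "Rating": [4.23, 3.24, 3.98],
})

# info() writes a structural report (row count, columns, dtypes, memory)
buffer = io.StringIO()
df.info(buf=buffer)
report = buffer.getvalue()
print(report)

# describe() returns a new DataFrame of statistics for the numeric columns
stats = df.describe()
print(stats)
```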

How do you find the 50th percentile in pandas?

Use the quantile() Function

The quantile() function is used to find the percentile statistics of a given column in a Pandas DataFrame. We can use this function to find any percentile, such as the median (50th percentile), first quartile (25th percentile), third quartile (75th percentile), etc.
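A minimal sketch of quantile() on the Age values from the chapter's DataFrame follows. A single number returns one value, while a list of fractions returns a Series indexed by those fractions.

```python
import pandas as pd

# Age values from the chapter's DataFrame
ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46], name="Age")

# 50th percentile (the median)
median_age = ages.quantile(0.5)

# First quartile, median and third quartile in one call
quartiles = ages.quantile([0.25, 0.5, 0.75])

print(median_age)
print(quartiles)
```

These values should match the 25%, 50% and 75% rows that describe() reports for the Age column.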

How do you present data in descriptive statistics?

There are several ways of presenting descriptive statistics in your paper. These include graphs, central tendency, dispersion and measures of association tables. Graphs: Quantitative data can be graphically represented in histograms, pie charts, scatter plots, line graphs, sociograms and geographic information systems.

How do you write descriptive statistics in results?

Presenting Descriptive Statistics in Writing

They can be presented either in the narrative description of the results or parenthetically—much like reference citations. Here are some examples: The mean age of the participants was 22.43 years with a standard deviation of 2.34.
