Python Pandas - Descriptive Statistics (2024)

A large number of methods collectively compute descriptive statistics and other related operations on a DataFrame. Most of these are aggregations like sum() and mean(), but some of them, like cumsum(), produce an object of the same size. Generally speaking, these methods take an axis argument, just like ndarray.{sum, std, ...}, but the axis can be specified by name or by integer:

  • DataFrame − “index” (axis=0, default), “columns” (axis=1)
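Here is a minimal sketch of the axis argument. It uses only the numeric Age and Rating columns of this chapter's data, so that every reduction is well-defined. It compares the name and integer forms directly. It also calls cumsum() to show a method that returns an object of the same size instead of an aggregate.

```python
import pandas as pd

# Numeric columns of this chapter's data; Name is left out so every call is valid
df = pd.DataFrame({
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# axis='index' (axis=0) reduces down the rows: one value per column
col_totals = df.sum(axis='index')
# axis='columns' (axis=1) reduces across the columns: one value per row
row_totals = df.sum(axis='columns')

# The name and integer forms give identical results
print(col_totals.equals(df.sum(axis=0)))   # True
print(row_totals.equals(df.sum(axis=1)))   # True

# sum() is an aggregation, while cumsum() keeps the original shape
print(col_totals.shape, df.cumsum().shape)  # (2,) (12, 2)
```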

Let us create a DataFrame and use this object throughout this chapter for all the operations.

Example

import pandas as pd
# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
print(df)

Its output is as follows −

      Name  Age  Rating
0      Tom   25    4.23
1    James   26    3.24
2    Ricky   25    3.98
3      Vin   23    2.56
4    Steve   30    3.20
5    Smith   29    4.60
6     Jack   23    3.80
7      Lee   34    3.78
8    David   40    2.98
9   Gasper   30    4.80
10  Betina   51    4.10
11  Andres   46    3.65

sum()

Returns the sum of the values for the requested axis. By default, axis is index (axis=0).

import pandas as pd
# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
# Sum each column (axis=0 is the default)
print(df.sum())

Its output is as follows −

Name      TomJamesRickyVinSteveSmithJackLeeDavidGasperBe...
Age                                                     382
Rating                                                44.92
dtype: object

Each column is summed independently. Numeric columns are added, and string columns are concatenated.
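If you do not want the concatenated Name value, pass numeric_only=True to DataFrame.sum(). This restricts the sum to numeric columns. Here is a minimal sketch on the same data:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
             'Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

# numeric_only=True skips the string column instead of concatenating it
totals = df.sum(numeric_only=True)
print(totals.index.tolist())   # ['Age', 'Rating']
print(totals['Age'] == 382)    # True
```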

axis=1

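Passing axis=1 (or axis='columns') sums across each row instead. Because the Name column holds strings, numeric_only=True is passed so that only Age and Rating are added together.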

import pandas as pd
# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
# Sum across each row, skipping the string column
print(df.sum(axis=1, numeric_only=True))

Its output is as follows −

0     29.23
1     29.24
2     28.98
3     25.56
4     33.20
5     33.60
6     26.80
7     37.78
8     42.98
9     34.80
10    55.10
11    49.65
dtype: float64

mean()

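Returns the mean (average) of the values for the requested axis. In pandas 2.0 and later, numeric_only=True is required here, because the mean of the string column Name is undefined.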

import pandas as pd
# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
# Mean of each numeric column
print(df.mean(numeric_only=True))

Its output is as follows −

Age       31.833333
Rating     3.743333
dtype: float64
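The mean is sensitive to large values, such as the ages 46 and 51 in this data. The sketch below compares it with median() on the numeric columns only. It shows how far the two measures of central tendency differ here:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
             'Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

means = df.mean(numeric_only=True)
medians = df.median(numeric_only=True)

# For Age, the two oldest people pull the mean above the median
print(round(means['Age'], 6))   # 31.833333
print(medians['Age'])           # 29.5
```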

std()

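Returns the sample standard deviation of the numeric columns, using Bessel's correction (division by N - 1, i.e. ddof=1 by default). As with mean(), numeric_only=True skips the Name column.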

import pandas as pd
# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
# Sample standard deviation of each numeric column
print(df.std(numeric_only=True))

Its output is as follows −

Age       9.232682
Rating    0.661628
dtype: float64
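To see what Bessel's correction means in practice, the sketch below uses the Age column only, as an illustration. It recomputes the default std() by hand and compares it with the population version obtained with ddof=0:

```python
import math
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])
n = len(ages)

sample_std = ages.std()             # default ddof=1: divide by n - 1
population_std = ages.std(ddof=0)   # divide by n instead

# Manual Bessel-corrected standard deviation
mean = ages.sum() / n
manual_std = math.sqrt(((ages - mean) ** 2).sum() / (n - 1))

print(round(sample_std, 6))                  # 9.232682
print(math.isclose(manual_std, sample_std))  # True
# The two versions differ by a factor of sqrt((n - 1) / n)
print(math.isclose(population_std, sample_std * math.sqrt((n - 1) / n)))  # True
```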

Functions & Description

Let us now look at the main descriptive statistics functions in Python Pandas. The following table lists the most important ones −

Sr.No.  Function    Description
1       count()     Number of non-null observations
2       sum()       Sum of values
3       mean()      Mean of values
4       median()    Median of values
5       mode()      Mode of values
6       std()       Standard deviation of the values
7       min()       Minimum value
8       max()       Maximum value
9       abs()       Absolute value
10      prod()      Product of values
11      cumsum()    Cumulative sum
12      cumprod()   Cumulative product
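The short sketch below applies several of the table's functions to the numeric Age column, chosen so that every call is valid. prod() and cumprod() use only the first three values to keep the numbers small:

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

print(ages.count())               # 12 non-null observations
print(ages.min(), ages.max())     # 23 51
print(ages.mode().tolist())       # [23, 25, 30]: every tied most-frequent value
first3 = ages.head(3)             # 25, 26, 25
print(first3.prod())              # 16250
print(first3.cumprod().tolist())  # [25, 650, 16250]
print(ages.cumsum().iloc[-1])     # 382: the last cumulative sum is the total
```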

Note − Since a DataFrame is a heterogeneous data structure, not every function works on every column.

  • Functions like sum() and cumsum() work with both numeric and character (string) data without raising an error. In practice, character aggregations are rarely useful (sum() simply concatenates the strings), but these functions do not throw an exception.

  • Functions like abs() and cumprod() throw an exception when the DataFrame contains character or string data, because these operations are not defined for strings.
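The sketch below illustrates the second point with a small two-row frame, assumed for brevity. abs() fails on the Name column, and selecting the numeric columns with select_dtypes() first avoids the error. Name is created with an explicit object dtype so the behavior does not depend on which string dtype your pandas version infers:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': pd.Series(['Tom', 'James'], dtype=object),  # plain Python strings
    'Age': [25, -26],
})

try:
    df.abs()
    raised = False
except TypeError as exc:   # abs() is not defined for str values
    raised = True
    print('abs() failed:', exc)

# Restrict the operation to numeric columns first
numeric_abs = df.select_dtypes(include='number').abs()
print(numeric_abs['Age'].tolist())   # [25, 26]
```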

Summarizing Data

The describe() function computes a summary of statistics pertaining to the DataFrame columns.

import pandas as pd
# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
# Summary statistics of the numeric columns
print(df.describe())

Its output is as follows −

             Age     Rating
count  12.000000  12.000000
mean   31.833333   3.743333
std     9.232682   0.661628
min    23.000000   2.560000
25%    25.000000   3.230000
50%    29.500000   3.790000
75%    35.500000   4.132500
max    51.000000   4.800000

Along with count, min and max, this function gives the mean, the standard deviation, and the 25th, 50th and 75th percentiles (the quartiles, from which the IQR can be computed). By default, it excludes the character columns and summarizes only the numeric columns. The include argument specifies which columns to consider for summarizing. It takes a list of dtypes; by default, only numeric columns are summarized. Common values are:

  • object − Summarizes string columns
  • number − Summarizes numeric columns
  • all − Summarizes all columns together (pass it as the string 'all', not inside a list)
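Besides include, describe() also accepts an exclude argument. The sketch below uses the two together to split the summary into a numeric part and a non-numeric part:

```python
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
             'Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

numeric_summary = df.describe(include=['number'])
other_summary = df.describe(exclude=['number'])   # everything that is not numeric

print(numeric_summary.columns.tolist())     # ['Age', 'Rating']
print(other_summary.columns.tolist())       # ['Name']
print(other_summary.loc['unique', 'Name'])  # 12
```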

Now, use the following statement in the program and check the output −

import pandas as pd
# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
# Summarize only the string (object) columns
print(df.describe(include=['object']))

Its output is as follows −

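         Name
count      12
unique     12
top     Ricky
freq        1

When every value occurs only once, as here, top is chosen arbitrarily among the tied values, so your output may show a different name.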

Now, use the following statement and check the output −

import pandas as pd
# Create a dictionary of Series
d = {'Name': pd.Series(['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith', 'Jack',
                        'Lee', 'David', 'Gasper', 'Betina', 'Andres']),
     'Age': pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46]),
     'Rating': pd.Series([4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65])}
# Create a DataFrame
df = pd.DataFrame(d)
# Summarize all columns together
print(df.describe(include='all'))

Its output is as follows −

          Name        Age     Rating
count       12  12.000000  12.000000
unique      12        NaN        NaN
top      Ricky        NaN        NaN
freq         1        NaN        NaN
mean       NaN  31.833333   3.743333
std        NaN   9.232682   0.661628
min        NaN  23.000000   2.560000
25%        NaN  25.000000   3.230000
50%        NaN  29.500000   3.790000
75%        NaN  35.500000   4.132500
max        NaN  51.000000   4.800000

FAQs

How to do descriptive statistics in Pandas?

Get the Descriptive Statistics in Pandas DataFrame
  1. Step 1: Collect the Data. To start, collect the data for your DataFrame. ...
  2. Step 2: Create the DataFrame. Next, create the DataFrame based on the data collected: ...
  3. Step 3: Get the Descriptive Statistics.

How to generate descriptive statistics in Python?

The describe() function computes a summary of statistics pertaining to the DataFrame columns. It gives the count, mean, standard deviation, minimum, quartiles and maximum. By default, it excludes the character columns and gives a summary of the numeric columns.

What is df.describe() in Python?

Pandas DataFrame describe() Method

The describe() method returns a description of the data in the DataFrame. If the DataFrame contains numerical data, the description contains this information for each column: count - the number of non-null values; mean - the average (mean) value; std - the standard deviation; min - the minimum value; 25%, 50%, 75% - the percentiles; max - the maximum value.

What do 25%, 50% and 75% mean in pandas describe()?

For numeric data, the result's index includes count, mean, std, min and max, as well as the lower, 50th and upper percentiles. By default, the lower percentile is the 25th and the upper percentile is the 75th. The 50th percentile is the same as the median.
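The percentiles parameter of describe() replaces the default 25th/50th/75th percentiles. The minimal sketch below requests the 10th, 50th and 90th percentiles of Age. pandas computes these with linear interpolation by default:

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

summary = ages.describe(percentiles=[0.1, 0.5, 0.9])
print(summary.index.tolist())
# ['count', 'mean', 'std', 'min', '10%', '50%', '90%', 'max']
print(round(summary['10%'], 2))   # 23.2
```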

How do you generate descriptive statistics?

To calculate descriptive statistics:
  1. Mean: Add up all the scores and divide by the number of scores. ...
  2. Median: Arrange the scores in ascending order and find the middle value. ...
  3. Mode: Identify the score(s) that appear(s) most frequently. ...
  4. Range: Calculate the difference between the highest and lowest scores.
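The four steps above can be sketched with pandas on the Age column from this chapter. The mean is computed by hand to mirror step 1, while the other steps use built-in methods:

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

mean = ages.sum() / ages.count()        # 1. add up the scores, divide by how many
median = ages.median()                  # 2. middle value (average of the two middle values here)
modes = ages.mode().tolist()            # 3. most frequent value(s)
value_range = ages.max() - ages.min()   # 4. highest minus lowest

print(round(mean, 4), median, modes, value_range)
# 31.8333 29.5 [23, 25, 30] 28
```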

How would you generate descriptive statistics for all the columns of a DataFrame df?

If we apply .describe() to an entire DataFrame, it returns a new DataFrame whose rows correspond to the essential descriptive statistics. By default, it includes only the numeric columns (integer and float dtypes).

Can you use Python for statistical analysis?

Yes. Python has comprehensive, popular and widely used statistics libraries, such as the built-in statistics module, NumPy, SciPy and pandas. They provide the numerical quantities you can use to describe and summarize your datasets.

What is descriptive statistics for beginners?

Descriptive statistics summarizes or describes the characteristics of a data set. Descriptive statistics consists of three basic categories of measures: measures of central tendency, measures of variability (or spread), and frequency distribution.

What is the formula for descriptive statistics?

The most common descriptive statistic is calculated by summing all of the data values and dividing by the total number of data items. It is normally called the mean or the average. If you have data consisting of n observations $x_1, \ldots, x_n$, then the mean $\bar{x}$ is given by the formula $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$.

How to show summary statistics in Pandas?

You can get summary statistics of a pandas DataFrame by using the describe() method. This method returns a new DataFrame containing statistics such as the count, mean, standard deviation, minimum and maximum values for each numeric column.

Why use df in Pandas?

The pandas DataFrame is a structure that contains two-dimensional data and its corresponding labels. DataFrames are widely used in data science, machine learning, scientific computing, and many other data-intensive fields. DataFrames are similar to SQL tables or the spreadsheets that you work with in Excel or Calc.

How do you round the values returned by pandas describe()?

The pandas round() method rounds the values of a DataFrame to a given number of decimal places, so it can be chained after describe(). It can also round different columns to different numbers of decimal places.
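The sketch below chains round() after describe(). It first rounds every column to the same precision, then passes a dict that gives each column its own number of decimal places:

```python
import pandas as pd

df = pd.DataFrame({
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

summary = df.describe()
print(summary.round(2))                               # every column to 2 decimals
per_column = summary.round({'Age': 1, 'Rating': 3})   # different precision per column
print(per_column.loc['mean'].tolist())                # [31.8, 3.743]
```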

What is the 80 20 rule in pandas?

Putting Pareto's Principle to work on the Pandas library

For those who don't know, Pareto's Principle (also known as the 80–20 rule) says that roughly 20% of your inputs contribute about 80% of your outputs.

What is the difference between describe and info in pandas?

The .info() method prints a concise summary of a DataFrame: its index range, column names, non-null counts, dtypes and memory usage. The .describe() method gives us summary statistics for the numeric columns in our DataFrame.
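The sketch below makes the contrast concrete. info() writes its report to a stream, so it is captured here through its buf parameter. describe() returns a DataFrame of statistics that can be used further:

```python
import io
import pandas as pd

df = pd.DataFrame({
    'Name': ['Tom', 'James', 'Ricky', 'Vin', 'Steve', 'Smith',
             'Jack', 'Lee', 'David', 'Gasper', 'Betina', 'Andres'],
    'Age': [25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46],
    'Rating': [4.23, 3.24, 3.98, 2.56, 3.20, 4.6, 3.8, 3.78, 2.98, 4.80, 4.10, 3.65],
})

buffer = io.StringIO()
df.info(buf=buffer)             # info() prints; buf= captures the text instead
info_text = buffer.getvalue()
print(info_text)                # structure: entries, columns, non-null counts, dtypes

stats = df.describe()           # returns a DataFrame of statistics
print(stats.columns.tolist())   # ['Age', 'Rating']
```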

How do you find the 50th percentile in pandas?

Use the quantile() Function

The quantile() function is used to find the percentile statistics of a given column in a Pandas DataFrame. We can use this function to find any percentile, such as the median (50th percentile), first quartile (25th percentile), third quartile (75th percentile), etc.
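As a minimal sketch, quantile() can be called with a list of probabilities on the Age column. The quartiles match the 25%, 50% and 75% rows of the describe() output earlier in this chapter, and their difference gives the interquartile range (IQR):

```python
import pandas as pd

ages = pd.Series([25, 26, 25, 23, 30, 29, 23, 34, 40, 30, 51, 46])

quartiles = ages.quantile([0.25, 0.5, 0.75])
print(quartiles.loc[0.5])       # 29.5, same as ages.median()

# Interquartile range: 75th percentile minus 25th percentile
iqr = quartiles.loc[0.75] - quartiles.loc[0.25]
print(iqr)                      # 10.5
```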

How do you present data in descriptive statistics?

There are several ways of presenting descriptive statistics in your paper. These include graphs, central tendency, dispersion and measures of association tables. Graphs: Quantitative data can be graphically represented in histograms, pie charts, scatter plots, line graphs, sociograms and geographic information systems.

How do you write descriptive statistics in results?

Presenting Descriptive Statistics in Writing

They can be presented either in the narrative description of the results or parenthetically—much like reference citations. Here are some examples: The mean age of the participants was 22.43 years with a standard deviation of 2.34.
