NumPy Statistical Functions

When working with numerical data in Python, NumPy’s statistical functions are your best friends for mathematical analysis. They provide efficient tools for calculating measures like the mean, median, and standard deviation. Whether you’re analyzing sensor data, processing financial records, or preparing machine learning datasets, understanding these functions is crucial. In this guide, we’ll explore the statistical functions that help you extract meaningful insights from your data arrays.

Understanding NumPy Statistical Functions

NumPy statistical functions are built-in methods that allow you to perform statistical calculations on arrays efficiently. These functions work seamlessly with NumPy arrays and provide faster computation compared to standard Python loops. The NumPy library offers a wide range of statistical functions that can handle one-dimensional and multi-dimensional arrays with ease.
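
To illustrate the difference, here’s a minimal sketch (array size chosen arbitrarily) comparing a plain-Python average with the vectorized NumPy call:

import numpy as np

# Plain Python: compute the average with built-in functions over a list
values = list(range(1_000_000))
py_mean = sum(values) / len(values)

# NumPy: one vectorized call on an array
arr = np.arange(1_000_000)
np_mean = np.mean(arr)

print(py_mean, np_mean)  # Both print 499999.5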

Basic Statistical Measures

numpy.mean() - Calculate Average Values

The numpy.mean() function calculates the arithmetic mean (average) of array elements. It is one of the most commonly used functions for summarizing the central tendency of your data. You can calculate the mean across the entire array or along specific axes.

Syntax:

numpy.mean(array, axis=None, dtype=None)

Parameters:

  • array: Input array or array-like object
  • axis: Axis along which the mean is calculated (optional)
  • dtype: Data type for computation (optional)

Let me show you how this works:

import numpy as np

# Single array mean
scores = np.array([85, 92, 78, 90, 88])
average_score = np.mean(scores)
print(f"Average score: {average_score}")  # Output: 86.6

When working with multi-dimensional arrays, you can specify the axis parameter to calculate means along rows or columns:

# 2D array mean along different axes
exam_scores = np.array([[85, 92, 78], 
                        [90, 88, 95], 
                        [78, 85, 82]])

# Mean of all elements
overall_mean = np.mean(exam_scores)
print(f"Overall mean: {overall_mean}")

# Mean along axis 0 (column-wise)
column_means = np.mean(exam_scores, axis=0)
print(f"Column means: {column_means}")

# Mean along axis 1 (row-wise)
row_means = np.mean(exam_scores, axis=1)
print(f"Row means: {row_means}")

numpy.median() - Find Middle Values

The numpy.median() function returns the median of array elements, the middle value when the data is sorted. Unlike the mean, the median is less affected by extreme values (outliers), making it a robust choice for skewed distributions.

Syntax:

numpy.median(array, axis=None)

Here’s how you calculate median values:

# Median with odd number of elements
prices = np.array([120, 150, 180, 200, 350])
median_price = np.median(prices)
print(f"Median price: {median_price}")  # Output: 180.0

# Median with even number of elements
salaries = np.array([45000, 52000, 58000, 65000])
median_salary = np.median(salaries)
print(f"Median salary: {median_salary}")  # Output: 55000.0

numpy.std() - Calculate Standard Deviation

The numpy.std() function computes the standard deviation, which measures the amount of variation in your dataset. It tells you how spread out your data points are from the mean; a higher standard deviation indicates greater variability.

Syntax:

numpy.std(array, axis=None, ddof=0)

Parameters:

  • ddof: Delta Degrees of Freedom (0 for population, 1 for sample)
Here’s an example:

# Calculate standard deviation
test_scores = np.array([72, 85, 78, 90, 88, 76, 82])
std_deviation = np.std(test_scores)
print(f"Standard deviation: {std_deviation:.2f}")

# Sample standard deviation
sample_std = np.std(test_scores, ddof=1)
print(f"Sample standard deviation: {sample_std:.2f}")

numpy.var() - Calculate Variance

The numpy.var() function calculates the variance of array elements. Variance is the square of the standard deviation and represents how far data points spread from the mean. It is particularly useful in statistical analysis and probability calculations.

Syntax:

numpy.var(array, axis=None, ddof=0)
Here’s how you calculate variance:

# Calculate variance
daily_sales = np.array([120, 145, 132, 158, 142, 138, 150])
variance = np.var(daily_sales)
print(f"Variance: {variance:.2f}")

# Relationship between variance and standard deviation
std = np.sqrt(variance)
print(f"Standard deviation from variance: {std:.2f}")

Minimum and Maximum Functions

numpy.min() and numpy.max() - Find Extreme Values

The numpy.min() and numpy.max() functions find the minimum and maximum values in an array. They are essential for identifying the range of your data and detecting extreme values.

Syntax:

numpy.min(array, axis=None)
numpy.max(array, axis=None)
Here’s how you find the extremes:

# Find minimum and maximum
temperatures = np.array([23, 28, 19, 32, 26, 21, 30])
min_temp = np.min(temperatures)
max_temp = np.max(temperatures)
print(f"Temperature range: {min_temp}°C to {max_temp}°C")

# Find min/max along specific axis
monthly_sales = np.array([[150, 180, 165], 
                          [190, 175, 200], 
                          [160, 195, 185]])

min_per_month = np.min(monthly_sales, axis=1)
max_per_product = np.max(monthly_sales, axis=0)
print(f"Minimum per month: {min_per_month}")
print(f"Maximum per product: {max_per_product}")

numpy.ptp() - Calculate Peak to Peak Range

The numpy.ptp() function calculates the range (peak-to-peak) of values in an array. It returns the difference between the maximum and minimum values, giving you a quick measure of data spread.

Syntax:

numpy.ptp(array, axis=None)
Here’s an example:

# Calculate peak-to-peak range
stock_prices = np.array([145.50, 152.30, 148.90, 156.80, 150.20])
price_range = np.ptp(stock_prices)
print(f"Stock price range: ${price_range:.2f}")

Percentile and Quantile Functions

numpy.percentile() - Calculate Percentiles

The numpy.percentile() function calculates the nth percentile of array elements. Percentiles help you understand the distribution of your data by showing the value below which a given percentage of observations falls.

Syntax:

numpy.percentile(array, q, axis=None)

Parameters:

  • q: Percentile value(s) to compute (0-100)
Here’s how you calculate percentiles:

# Calculate various percentiles
student_ages = np.array([18, 19, 20, 21, 22, 23, 24, 25, 26, 28, 30])

percentile_25 = np.percentile(student_ages, 25)
percentile_50 = np.percentile(student_ages, 50)  # Same as median
percentile_75 = np.percentile(student_ages, 75)

print(f"25th percentile: {percentile_25}")
print(f"50th percentile (median): {percentile_50}")
print(f"75th percentile: {percentile_75}")

# Multiple percentiles at once
percentiles = np.percentile(student_ages, [10, 25, 50, 75, 90])
print(f"Multiple percentiles: {percentiles}")

numpy.quantile() - Calculate Quantiles

The numpy.quantile() function is similar to numpy.percentile() but takes values between 0 and 1 instead of 0 and 100. It is commonly used in statistical analysis and data science applications.

Syntax:

numpy.quantile(array, q, axis=None)
Here’s how quantiles work:

# Calculate quantiles
response_times = np.array([120, 145, 132, 158, 142, 138, 150, 165, 155, 148])

q1 = np.quantile(response_times, 0.25)
q2 = np.quantile(response_times, 0.50)
q3 = np.quantile(response_times, 0.75)

print(f"Q1 (25%): {q1}")
print(f"Q2 (50%): {q2}")
print(f"Q3 (75%): {q3}")

# Interquartile range
iqr = q3 - q1
print(f"Interquartile range: {iqr}")

Correlation and Covariance

numpy.corrcoef() - Calculate Correlation Coefficient

The numpy.corrcoef() function computes the Pearson correlation coefficient between arrays. It measures the linear relationship between two variables, with values ranging from -1 (perfect negative correlation) to 1 (perfect positive correlation).

Syntax:

numpy.corrcoef(x, y=None)
Here’s an example:

# Calculate correlation coefficient
study_hours = np.array([2, 3, 4, 5, 6, 7, 8])
exam_scores = np.array([65, 70, 75, 80, 85, 88, 92])

correlation_matrix = np.corrcoef(study_hours, exam_scores)
correlation_value = correlation_matrix[0, 1]
print(f"Correlation coefficient: {correlation_value:.4f}")

numpy.cov() - Calculate Covariance

The numpy.cov() function calculates the covariance matrix. Covariance measures how two variables change together and is fundamental in multivariate statistical analysis.

Syntax:

numpy.cov(x, y=None)
Here’s an example:

# Calculate covariance
advertising_spend = np.array([1000, 1500, 2000, 2500, 3000])
revenue = np.array([15000, 22000, 28000, 35000, 42000])

covariance_matrix = np.cov(advertising_spend, revenue)
covariance_value = covariance_matrix[0, 1]
print(f"Covariance: {covariance_value:.2f}")

Histogram Function

numpy.histogram() - Create Data Distributions

The numpy.histogram() function computes the histogram of a dataset. It divides the data into bins and counts the number of values in each bin, helping you understand the distribution of your data.

Syntax:

numpy.histogram(array, bins=10, range=None)

Parameters:

  • bins: Number of bins or bin edges
  • range: Lower and upper range of bins
Here’s how you build a histogram:

# Create histogram
data = np.array([12, 15, 18, 22, 25, 28, 32, 35, 38, 42, 45, 48, 52, 55])

hist, bin_edges = np.histogram(data, bins=5)
print(f"Histogram counts: {hist}")
print(f"Bin edges: {bin_edges}")

# Histogram with specific range
hist_range, edges_range = np.histogram(data, bins=3, range=(10, 50))
print(f"Histogram with range: {hist_range}")

Advanced Statistical Functions

numpy.average() - Weighted Average

The numpy.average() function calculates the weighted average of array elements. Unlike numpy.mean(), it lets you assign a different weight to each element, making some values count more than others.

Syntax:

numpy.average(array, weights=None, axis=None)
Here’s an example:

# Calculate weighted average
grades = np.array([85, 92, 78, 88])
weights = np.array([0.2, 0.3, 0.25, 0.25])  # Different weight for each grade

weighted_avg = np.average(grades, weights=weights)
print(f"Weighted average: {weighted_avg:.2f}")

# Compare with regular mean
regular_mean = np.mean(grades)
print(f"Regular mean: {regular_mean:.2f}")

numpy.nanmean(), numpy.nanmedian() - Handle Missing Data

NumPy provides special statistical functions that ignore NaN (Not a Number) values. Functions like numpy.nanmean() and numpy.nanmedian() are designed for datasets containing missing or invalid values.

Syntax:

numpy.nanmean(array, axis=None)
numpy.nanmedian(array, axis=None)
Here’s how these functions handle NaN values:

# Handle NaN values
sensor_data = np.array([25.5, np.nan, 27.3, 26.8, np.nan, 28.1, 26.5])

# Regular mean would return nan
regular_mean = np.mean(sensor_data)
print(f"Regular mean: {regular_mean}")

# nanmean ignores NaN values
clean_mean = np.nanmean(sensor_data)
print(f"Mean ignoring NaN: {clean_mean:.2f}")

# Similar for median
clean_median = np.nanmedian(sensor_data)
print(f"Median ignoring NaN: {clean_median:.2f}")

Complete Working Examples

Example 1: Analyzing Student Performance Dataset

import numpy as np

# Create student performance dataset
student_scores = np.array([
    [85, 92, 78, 88, 90],  # Math scores
    [90, 88, 85, 92, 87],  # Science scores
    [78, 82, 88, 85, 90],  # English scores
    [92, 95, 89, 91, 94]   # History scores
])

print("=== Student Performance Analysis ===\n")

# Overall statistics
print("Overall Statistics:")
print(f"Mean score across all subjects: {np.mean(student_scores):.2f}")
print(f"Median score: {np.median(student_scores):.2f}")
print(f"Standard deviation: {np.std(student_scores):.2f}")
print(f"Variance: {np.var(student_scores):.2f}")
print(f"Minimum score: {np.min(student_scores)}")
print(f"Maximum score: {np.max(student_scores)}")
print(f"Score range: {np.ptp(student_scores)}\n")

# Subject-wise analysis (along axis 1)
subjects = ['Math', 'Science', 'English', 'History']
print("Subject-wise Average Scores:")
subject_means = np.mean(student_scores, axis=1)
for subject, mean in zip(subjects, subject_means):
    print(f"{subject}: {mean:.2f}")

print("\n Student-wise Performance (across all subjects):")
student_means = np.mean(student_scores, axis=0)
for i, mean in enumerate(student_means, 1):
    print(f"Student {i}: {mean:.2f}")

# Percentile analysis
print("\n Percentile Analysis (All Scores):")
print(f"25th percentile: {np.percentile(student_scores, 25):.2f}")
print(f"50th percentile: {np.percentile(student_scores, 50):.2f}")
print(f"75th percentile: {np.percentile(student_scores, 75):.2f}")

# Identify top performers
print(f"\n Top 10% threshold: {np.percentile(student_scores, 90):.2f}")

# Correlation between subjects
print("\n Correlation Matrix (Math vs Science):")
correlation = np.corrcoef(student_scores[0], student_scores[1])
print(f"Correlation coefficient: {correlation[0, 1]:.4f}")

Output:

=== Student Performance Analysis ===

Overall Statistics:
Mean score across all subjects: 87.95
Median score: 88.50
Standard deviation: 4.58
Variance: 20.95
Minimum score: 78
Maximum score: 95
Score range: 17

Subject-wise Average Scores:
Math: 86.60
Science: 88.40
English: 84.60
History: 92.20

Student-wise Performance (across all subjects):
Student 1: 86.25
Student 2: 89.25
Student 3: 85.00
Student 4: 89.00
Student 5: 90.25

Percentile Analysis (All Scores):
25th percentile: 85.00
50th percentile: 88.50
75th percentile: 91.25

Top 10% threshold: 92.20

Correlation Matrix (Math vs Science):
Correlation coefficient: 0.4204

Example 2: Financial Data Analysis with Missing Values

import numpy as np

# Stock prices with some missing data (NaN)
stock_data = np.array([
    [150.5, 152.3, np.nan, 155.8, 153.2],  # Stock A
    [88.2, np.nan, 90.5, 92.3, 91.0],      # Stock B
    [np.nan, 45.8, 47.2, 46.5, 48.1],      # Stock C
    [210.5, 215.0, 218.3, np.nan, 220.5]   # Stock D
])

print("=== Stock Market Analysis ===\n")

# Handling missing data with nan functions
print("Stock Price Statistics (Handling Missing Data):")
stock_names = ['Stock A', 'Stock B', 'Stock C', 'Stock D']

for i, name in enumerate(stock_names):
    prices = stock_data[i]
    print(f"\n{name}:")
    print(f"  Mean price: ${np.nanmean(prices):.2f}")
    print(f"  Median price: ${np.nanmedian(prices):.2f}")
    print(f"  Std deviation: ${np.nanstd(prices):.2f}")
    print(f"  Min price: ${np.nanmin(prices):.2f}")
    print(f"  Max price: ${np.nanmax(prices):.2f}")
    print(f"  Price range: ${np.ptp(prices[~np.isnan(prices)]):.2f}")

# Daily statistics across all stocks
print("\n Daily Market Statistics:")
days = ['Day 1', 'Day 2', 'Day 3', 'Day 4', 'Day 5']

for i, day in enumerate(days):
    day_prices = stock_data[:, i]
    print(f"\n{day}:")
    print(f"  Average: ${np.nanmean(day_prices):.2f}")
    print(f"  Median: ${np.nanmedian(day_prices):.2f}")
    print(f"  Std dev: ${np.nanstd(day_prices):.2f}")

# Calculate returns (percentage change)
print("\n Stock A Detailed Analysis:")
stock_a_prices = stock_data[0]
valid_prices = stock_a_prices[~np.isnan(stock_a_prices)]

# Calculate returns
returns = np.diff(valid_prices) / valid_prices[:-1] * 100
print(f"Daily returns: {returns}")
print(f"Average return: {np.mean(returns):.2f}%")
print(f"Return volatility: {np.std(returns):.2f}%")

# Quantile analysis for risk assessment
print("\n Risk Assessment (All Valid Prices):")
all_valid_prices = stock_data[~np.isnan(stock_data)]
q25, q50, q75 = np.quantile(all_valid_prices, [0.25, 0.5, 0.75])
print(f"Q1 (25%): ${q25:.2f}")
print(f"Q2 (50%): ${q50:.2f}")
print(f"Q3 (75%): ${q75:.2f}")
print(f"IQR: ${(q75 - q25):.2f}")

# Histogram of all prices
print("\n Price Distribution:")
hist, bin_edges = np.histogram(all_valid_prices, bins=5)
print("Price ranges and frequencies:")
for i in range(len(hist)):
    print(f"  ${bin_edges[i]:.2f} - ${bin_edges[i+1]:.2f}: {hist[i]} occurrences")

Output:

=== Stock Market Analysis ===

Stock Price Statistics (Handling Missing Data):

Stock A:
  Mean price: $152.95
  Median price: $152.75
  Std deviation: $1.91
  Min price: $150.50
  Max price: $155.80
  Price range: $5.30

Stock B:
  Mean price: $90.50
  Median price: $90.75
  Std deviation: $1.48
  Min price: $88.20
  Max price: $92.30
  Price range: $4.10

Stock C:
  Mean price: $46.90
  Median price: $46.85
  Std deviation: $0.85
  Min price: $45.80
  Max price: $48.10
  Price range: $2.30

Stock D:
  Mean price: $216.08
  Median price: $216.65
  Std deviation: $3.77
  Min price: $210.50
  Max price: $220.50
  Price range: $10.00

Daily Market Statistics:

Day 1:
  Average: $149.73
  Median: $150.50
  Std dev: $49.93

Day 2:
  Average: $137.70
  Median: $152.30
  Std dev: $69.84

Day 3:
  Average: $118.67
  Median: $90.50
  Std dev: $72.64

Day 4:
  Average: $98.20
  Median: $92.30
  Std dev: $44.82

Day 5:
  Average: $128.20
  Median: $122.10
  Std dev: $65.08

Stock A Detailed Analysis:
Daily returns: [ 1.19601329  2.29809586 -1.66880616]
Average return: 0.61%
Return volatility: 1.67%

Risk Assessment (All Valid Prices):
Q1 (25%): $78.18
Q2 (50%): $121.40
Q3 (75%): $169.48
IQR: $91.30

Price Distribution:
Price ranges and frequencies:
  $45.80 - $80.74: 4 occurrences
  $80.74 - $115.68: 4 occurrences
  $115.68 - $150.62: 1 occurrences
  $150.62 - $185.56: 3 occurrences
  $185.56 - $220.50: 4 occurrences

Example 3: Sensor Data Quality Analysis

import numpy as np

# Simulating sensor readings from multiple sensors
temperature_sensor = np.array([22.5, 23.1, 22.8, 24.2, 23.5, 22.9, 23.8, 24.0, 23.3, 22.7])
humidity_sensor = np.array([45, 47, 46, 52, 48, 45, 50, 51, 49, 46])
pressure_sensor = np.array([1013, 1012, 1014, 1015, 1013, 1012, 1014, 1016, 1015, 1013])

print("=== Environmental Sensor Analysis ===\n")

# Temperature analysis
print("Temperature Sensor Statistics:")
print(f"Mean: {np.mean(temperature_sensor):.2f}°C")
print(f"Median: {np.median(temperature_sensor):.2f}°C")
print(f"Standard deviation: {np.std(temperature_sensor):.4f}°C")
print(f"Variance: {np.var(temperature_sensor):.4f}")
print(f"Range: {np.ptp(temperature_sensor):.2f}°C")
print(f"Min: {np.min(temperature_sensor):.2f}°C")
print(f"Max: {np.max(temperature_sensor):.2f}°C")

# Humidity analysis
print("\nHumidity Sensor Statistics:")
print(f"Mean: {np.mean(humidity_sensor):.2f}%")
print(f"Median: {np.median(humidity_sensor):.2f}%")
print(f"Standard deviation: {np.std(humidity_sensor):.4f}%")
print(f"Range: {np.ptp(humidity_sensor):.2f}%")

# Pressure analysis
print("\nPressure Sensor Statistics:")
print(f"Mean: {np.mean(pressure_sensor):.2f} hPa")
print(f"Median: {np.median(pressure_sensor):.2f} hPa")
print(f"Standard deviation: {np.std(pressure_sensor):.4f} hPa")
print(f"Range: {np.ptp(pressure_sensor):.2f} hPa")

# Correlation analysis between sensors
print("\nCorrelation Analysis:")
temp_humidity_corr = np.corrcoef(temperature_sensor, humidity_sensor)[0, 1]
temp_pressure_corr = np.corrcoef(temperature_sensor, pressure_sensor)[0, 1]
humidity_pressure_corr = np.corrcoef(humidity_sensor, pressure_sensor)[0, 1]

print(f"Temperature vs Humidity: {temp_humidity_corr:.4f}")
print(f"Temperature vs Pressure: {temp_pressure_corr:.4f}")
print(f"Humidity vs Pressure: {humidity_pressure_corr:.4f}")

# Covariance analysis
print("\nCovariance Analysis:")
temp_humidity_cov = np.cov(temperature_sensor, humidity_sensor)[0, 1]
print(f"Temperature-Humidity covariance: {temp_humidity_cov:.4f}")

# Percentile analysis for outlier detection
print("\nPercentile Analysis (Temperature):")
p10 = np.percentile(temperature_sensor, 10)
p90 = np.percentile(temperature_sensor, 90)
print(f"10th percentile: {p10:.2f}°C")
print(f"90th percentile: {p90:.2f}°C")
print(f"Normal range (10-90%): {p10:.2f}°C to {p90:.2f}°C")

# Quantile-based outlier detection
print("\nOutlier Detection using IQR Method:")
q1 = np.quantile(temperature_sensor, 0.25)
q3 = np.quantile(temperature_sensor, 0.75)
iqr = q3 - q1
lower_bound = q1 - 1.5 * iqr
upper_bound = q3 + 1.5 * iqr

print(f"Q1: {q1:.2f}°C")
print(f"Q3: {q3:.2f}°C")
print(f"IQR: {iqr:.2f}°C")
print(f"Outlier boundaries: [{lower_bound:.2f}, {upper_bound:.2f}]°C")

outliers = temperature_sensor[(temperature_sensor < lower_bound) | 
                               (temperature_sensor > upper_bound)]
if len(outliers) > 0:
    print(f"Outliers detected: {outliers}")
else:
    print("No outliers detected")

# Histogram distribution
print("\nTemperature Distribution:")
hist, bin_edges = np.histogram(temperature_sensor, bins=4)
print("Temperature bins and frequencies:")
for i in range(len(hist)):
    print(f"  {bin_edges[i]:.2f}°C - {bin_edges[i+1]:.2f}°C: {hist[i]} readings")

# Calculate coefficient of variation (CV) for sensor stability
print("\nSensor Stability (Coefficient of Variation):")
cv_temp = (np.std(temperature_sensor) / np.mean(temperature_sensor)) * 100
cv_humidity = (np.std(humidity_sensor) / np.mean(humidity_sensor)) * 100
cv_pressure = (np.std(pressure_sensor) / np.mean(pressure_sensor)) * 100

print(f"Temperature CV: {cv_temp:.2f}%")
print(f"Humidity CV: {cv_humidity:.2f}%")
print(f"Pressure CV: {cv_pressure:.2f}%")
print("\nLower CV indicates more stable sensor readings")

Output:

=== Environmental Sensor Analysis ===

Temperature Sensor Statistics:
Mean: 23.28°C
Median: 23.20°C
Standard deviation: 0.5510°C
Variance: 0.3036
Range: 1.70°C
Min: 22.50°C
Max: 24.20°C

Humidity Sensor Statistics:
Mean: 47.90%
Median: 47.50%
Standard deviation: 2.3854%
Range: 7.00%

Pressure Sensor Statistics:
Mean: 1013.70 hPa
Median: 1013.50 hPa
Standard deviation: 1.2689 hPa
Range: 4.00 hPa

Correlation Analysis:
Temperature vs Humidity: 0.9647
Temperature vs Pressure: 0.6637
Humidity vs Pressure: 0.7830

Covariance Analysis:
Temperature-Humidity covariance: 1.4089

Percentile Analysis (Temperature):
10th percentile: 22.68°C
90th percentile: 24.02°C
Normal range (10-90%): 22.68°C to 24.02°C

Outlier Detection using IQR Method:
Q1: 22.83°C
Q3: 23.73°C
IQR: 0.90°C
Outlier boundaries: [21.48, 25.08]°C
No outliers detected

Temperature Distribution:
Temperature bins and frequencies:
  22.50°C - 22.93°C: 4 readings
  22.93°C - 23.35°C: 2 readings
  23.35°C - 23.78°C: 1 readings
  23.78°C - 24.20°C: 3 readings

Sensor Stability (Coefficient of Variation):
Temperature CV: 2.37%
Humidity CV: 4.98%
Pressure CV: 0.13%

Lower CV indicates more stable sensor readings

These examples demonstrate how NumPy’s statistical functions work in real-world scenarios: analyzing student performance data, handling missing values in financial datasets, and checking the quality of sensor readings. From basic measures like the mean and median to correlation and quantile analysis, NumPy provides all the tools you need for effective data analysis in Python.