NumPy Array Concatenation and Stacking

Joining arrays is one of the most common operations when working with numerical data in Python. Whether you’re combining datasets, merging results from different computations, or organizing multidimensional data, NumPy provides concatenate(), vstack(), hstack(), dstack(), and stack() to do it efficiently. This guide covers how each function works, how the axis argument changes the result, and when to choose concatenation over stacking.

Understanding NumPy Array Concatenation

NumPy array concatenation joins two or more arrays along an existing axis, and numpy.concatenate() is the primary function for it. The input arrays must have the same number of dimensions and the same shape along every axis except the one being joined. For example, arrays of shape (2, 3) and (4, 3) can be concatenated along axis 0, but not along axis 1.

The basic syntax of np.concatenate() is straightforward: pass a tuple or list of arrays as the first argument and, optionally, the axis along which to join them. Let’s look at how concatenation works with arrays of different dimensions.

The numpy.concatenate() Function

The numpy.concatenate() function is the most versatile tool for array concatenation in NumPy. This function accepts multiple arrays and joins them along a specified axis. The concatenate function parameters include the array sequence and an optional axis parameter (default is 0).

import numpy as np

# Creating two 1D arrays for concatenation
array1 = np.array([1, 2, 3])
array2 = np.array([4, 5, 6])

# Concatenating 1D arrays
result = np.concatenate((array1, array2))
print(result)  # Output: [1 2 3 4 5 6]

In this example, we performed 1D array concatenation using np.concatenate(). The function joined the two arrays end-to-end, creating a single array. This is the simplest form of NumPy concatenation.
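
As a small sketch of the two parameters described above (the variable names are illustrative only): the sequence argument can be a list as well as a tuple, and passing axis=None flattens every input before joining.

```python
import numpy as np

a = np.array([[1, 2], [3, 4]])
b = np.array([[5, 6]])

# A list works just as well as a tuple for the sequence argument.
joined_rows = np.concatenate([a, b])  # axis defaults to 0
print(joined_rows.shape)  # (3, 2)

# axis=None flattens each input first, then joins the results end-to-end.
flat = np.concatenate([a, b], axis=None)
print(flat)  # [1 2 3 4 5 6]
```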

Concatenating Along Different Axes

When working with multidimensional array concatenation, the axis parameter becomes crucial. For 2D array concatenation, you can concatenate along axis 0 (rows) or axis 1 (columns). Understanding axis-based concatenation is essential for effective NumPy array manipulation.

import numpy as np

# Creating 2D arrays
array_a = np.array([[1, 2], [3, 4]])
array_b = np.array([[5, 6], [7, 8]])

# Concatenating along axis 0 (vertically)
vertical_concat = np.concatenate((array_a, array_b), axis=0)
print("Vertical concatenation:")
print(vertical_concat)
# Output:
# [[1 2]
#  [3 4]
#  [5 6]
#  [7 8]]

# Concatenating along axis 1 (horizontally)
horizontal_concat = np.concatenate((array_a, array_b), axis=1)
print("\nHorizontal concatenation:")
print(horizontal_concat)
# Output:
# [[1 2 5 6]
#  [3 4 7 8]]

This example demonstrates how to concatenate NumPy arrays along different axes. Concatenating along axis 0 appends rows, so the arrays are joined vertically; concatenating along axis 1 appends columns, so they are joined horizontally. This flexibility makes NumPy concatenation powerful for various data manipulation tasks.

Concatenating Multiple Arrays

The numpy.concatenate() function isn’t limited to just two arrays. You can perform multiple array concatenation by passing as many arrays as needed. This is particularly useful when working with large dataset concatenation or when combining results from multiple operations.

import numpy as np

# Creating multiple arrays
arr1 = np.array([1, 2])
arr2 = np.array([3, 4])
arr3 = np.array([5, 6])
arr4 = np.array([7, 8])

# Concatenating multiple arrays at once
multi_concat = np.concatenate((arr1, arr2, arr3, arr4))
print(multi_concat)  # Output: [1 2 3 4 5 6 7 8]

Here we concatenated four arrays in a single call. This is more efficient than concatenating pairwise, because each call to np.concatenate() allocates a new array and copies all of its inputs, whereas a single call copies each input once.
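
The difference between the two approaches can be sketched as follows. The chunks list is a hypothetical stand-in for results produced inside a loop; both approaches give the same array, but the pairwise version re-copies the accumulated data on every iteration.

```python
import numpy as np

# Hypothetical chunks, standing in for results produced inside a loop.
chunks = [np.arange(i * 3, i * 3 + 3) for i in range(4)]

# Pairwise: each call allocates a new array and copies everything so far.
pairwise = np.empty(0, dtype=chunks[0].dtype)
for chunk in chunks:
    pairwise = np.concatenate((pairwise, chunk))

# Single call: one allocation, each chunk copied once.
at_once = np.concatenate(chunks)

print(np.array_equal(pairwise, at_once))  # True
print(at_once)  # [ 0  1  2  3  4  5  6  7  8  9 10 11]
```

A common pattern is therefore to append intermediate arrays to a Python list and call np.concatenate() once at the end.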

NumPy Array Stacking Operations

NumPy array stacking is a related family of operations for joining arrays. Unlike concatenate(), which always joins along an existing axis, stack() joins arrays along a new axis, and vstack(), hstack(), and dstack() add dimensions to low-dimensional inputs where needed before joining. Each function serves a different purpose in array organization.

Vertical Stacking with vstack()

The numpy.vstack() function performs vertical array stacking, which is equivalent to concatenation along axis 0 for 2D arrays. Vertical stacking in NumPy is particularly useful when you want to add rows to existing data. The vstack function automatically handles 1D arrays by treating them as row vectors.

import numpy as np

# Creating arrays for vertical stacking
row1 = np.array([1, 2, 3])
row2 = np.array([4, 5, 6])
row3 = np.array([7, 8, 9])

# Performing vertical stacking
vstacked = np.vstack((row1, row2, row3))
print(vstacked)
# Output:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

This example shows vstack array stacking in action. The function took three 1D arrays and created a 2D array by stacking them vertically. This is a common operation in data preprocessing and matrix construction.
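
To make the "row vector" behavior concrete, here is a minimal sketch that reproduces the vstack() result by hand, using np.atleast_2d() for the promotion step. It illustrates the behavior and is not a copy of NumPy’s source.

```python
import numpy as np

row1 = np.array([1, 2, 3])
row2 = np.array([4, 5, 6])

# Promote each 1D array of shape (3,) to a row of shape (1, 3) ...
promoted = [np.atleast_2d(r) for r in (row1, row2)]
print(promoted[0].shape)  # (1, 3)

# ... and then join the rows along axis 0.
manual = np.concatenate(promoted, axis=0)
print(np.array_equal(manual, np.vstack((row1, row2))))  # True
```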

Horizontal Stacking with hstack()

The numpy.hstack() function performs horizontal array stacking. For arrays with two or more dimensions it joins along the second axis (axis 1), which is equivalent to column-wise concatenation; for 1D arrays it joins along axis 0 and returns a longer 1D array. hstack() is useful when you need to add columns to your data.

import numpy as np

# Creating column vectors for horizontal stacking
col1 = np.array([[1], [2], [3]])
col2 = np.array([[4], [5], [6]])
col3 = np.array([[7], [8], [9]])

# Performing horizontal stacking
hstacked = np.hstack((col1, col2, col3))
print(hstacked)
# Output:
# [[1 4 7]
#  [2 5 8]
#  [3 6 9]]

In this horizontal stacking example, we combined three column vectors into a single 2D array. The hstack function is perfect for scenarios where you need to merge features or combine columns from different sources.
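
One detail worth checking in your own code is what hstack() does with 1D inputs. The sketch below (variable names are illustrative) shows that 1D arrays are joined end-to-end rather than placed side by side, and that reshaping them into columns first gives the 2D layout.

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([4, 5, 6])

# For 1D inputs hstack joins along axis 0, so the result stays 1D.
flat = np.hstack((a, b))
print(flat)        # [1 2 3 4 5 6]
print(flat.shape)  # (6,)

# To place 1D arrays side by side as columns, reshape them first.
cols = np.hstack((a.reshape(-1, 1), b.reshape(-1, 1)))
print(cols.shape)  # (3, 2)
```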

Depth Stacking with dstack()

The numpy.dstack() function performs depth-wise stacking, which stacks arrays along the third axis (axis 2). Depth stacking in NumPy is particularly useful when working with image data or creating 3D arrays. The dstack operation is less commonly used but powerful for specific multidimensional data operations.

import numpy as np

# Creating 2D arrays for depth stacking
layer1 = np.array([[1, 2], [3, 4]])
layer2 = np.array([[5, 6], [7, 8]])

# Performing depth stacking
dstacked = np.dstack((layer1, layer2))
print(dstacked)
print(f"Shape: {dstacked.shape}")
# Output:
# [[[1 5]
#   [2 6]]
#
#  [[3 7]
#   [4 8]]]
# Shape: (2, 2, 2)

This example demonstrates dstack array stacking. The function created a 3D array by stacking two 2D arrays along a new third dimension. This type of depth-based stacking is crucial for operations involving RGB image channels or time-series data layers.
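
As a rough check of how dstack() relates to the other functions, the sketch below compares it with stack() for 2D inputs and shows what happens when the inputs are already 3D. The img_a and img_b arrays are made-up placeholders for an RGB image and an extra channel.

```python
import numpy as np

layer1 = np.array([[1, 2], [3, 4]])
layer2 = np.array([[5, 6], [7, 8]])

# For 2D inputs, dstack gives the same result as stack with axis=-1.
same = np.array_equal(
    np.dstack((layer1, layer2)),
    np.stack((layer1, layer2), axis=-1),
)
print(same)  # True

# Inputs that are already 3D are concatenated along the existing axis 2;
# no new axis is added.
img_a = np.zeros((4, 5, 3))  # hypothetical RGB image
img_b = np.zeros((4, 5, 1))  # hypothetical extra channel
print(np.dstack((img_a, img_b)).shape)  # (4, 5, 4)
```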

General Stacking with stack()

The numpy.stack() function is the most flexible stacking method in NumPy. It always joins arrays along a new axis, and unlike the specialized stacking functions it lets you choose where that axis is inserted. All input arrays must have the same shape.

import numpy as np

# Creating arrays for general stacking
arr_x = np.array([1, 2, 3])
arr_y = np.array([4, 5, 6])

# Stacking along axis 0 (creates new first dimension)
stack_axis0 = np.stack((arr_x, arr_y), axis=0)
print("Stacking along axis 0:")
print(stack_axis0)
# Output:
# [[1 2 3]
#  [4 5 6]]

# Stacking along axis 1 (creates new second dimension)
stack_axis1 = np.stack((arr_x, arr_y), axis=1)
print("\nStacking along axis 1:")
print(stack_axis1)
# Output:
# [[1 4]
#  [2 5]
#  [3 6]]

This example shows the versatility of np.stack() for custom axis stacking. By changing the axis parameter, you control how the arrays are combined. This makes stack() the most adaptable stacking function for various array arrangement scenarios.

Difference Between Concatenation and Stacking

Understanding the difference between concatenation and stacking is crucial for effective NumPy array manipulation. While both operations join arrays, they work differently. Concatenation joins arrays along an existing axis without changing the number of dimensions, whereas stacking can create a new axis and increase dimensionality.

When you use concatenate(), the resulting array has the same number of dimensions as the input arrays. However, when you use stack(), you’re adding a new dimension. For example, stacking two 1D arrays creates a 2D array, while concatenating them keeps the result as a 1D array. This fundamental distinction affects how you should choose between concatenation vs stacking.

import numpy as np

# Demonstrating the difference
arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Concatenation - same dimensions
concat_result = np.concatenate((arr1, arr2))
print(f"Concatenate result: {concat_result}")
print(f"Concatenate shape: {concat_result.shape}")
# Output:
# Concatenate result: [1 2 3 4 5 6]
# Concatenate shape: (6,)

# Stacking - adds new dimension
stack_result = np.stack((arr1, arr2))
print(f"\nStack result:\n{stack_result}")
print(f"Stack shape: {stack_result.shape}")
# Output:
# Stack result:
# [[1 2 3]
#  [4 5 6]]
# Stack shape: (2, 3)

This comparison clearly illustrates the conceptual difference between concatenation and stacking operations. Choose concatenation when you want to extend arrays along existing dimensions, and choose stacking when you need to organize arrays into higher-dimensional structures.
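
The relationship between the two operations can also be shown directly: stacking behaves like adding a length-1 axis to each input and then concatenating along it. The following is a minimal sketch of that idea, not NumPy’s actual implementation.

```python
import numpy as np

arr1 = np.array([1, 2, 3])
arr2 = np.array([4, 5, 6])

# Give each array a leading axis of length 1: shape (3,) -> (1, 3).
expanded = [arr[np.newaxis, :] for arr in (arr1, arr2)]

# Concatenating along that new axis reproduces np.stack(..., axis=0).
via_concat = np.concatenate(expanded, axis=0)
print(via_concat.shape)  # (2, 3)
print(np.array_equal(via_concat, np.stack((arr1, arr2))))  # True
```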

Working with Different Array Shapes

When performing NumPy concatenation and stacking, understanding array shape compatibility is essential. For concatenate(), arrays must have the same shape along all axes except the concatenation axis. For stack(), all input arrays must have exactly the same shape. vstack(), hstack(), and dstack() follow the concatenate() rule after promoting low-dimensional inputs. Violating these rules raises a ValueError.

import numpy as np

# Compatible shapes for concatenation
array_2x3 = np.array([[1, 2, 3], [4, 5, 6]])
array_1x3 = np.array([[7, 8, 9]])

# This works - same number of columns
concat_compatible = np.concatenate((array_2x3, array_1x3), axis=0)
print("Compatible concatenation:")
print(concat_compatible)
# Output:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

# Compatible shapes for vstack
row_a = np.array([1, 2, 3])
row_b = np.array([4, 5, 6])

vstacked = np.vstack((row_a, row_b))
print("\nVStack result:")
print(vstacked)
# Output:
# [[1 2 3]
#  [4 5 6]]

This example demonstrates shape-compatible array concatenation and stacking. When working with NumPy array operations, always verify that your array shapes are compatible for the operation you’re performing.
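
For contrast, here is a short sketch of what incompatible shapes look like in practice. The array names are illustrative, and the exact error text depends on the NumPy version, so it is printed rather than quoted.

```python
import numpy as np

array_2x3 = np.array([[1, 2, 3], [4, 5, 6]])
array_2x2 = np.array([[7, 8], [9, 10]])

# Axis 0 requires matching column counts (3 vs 2), so this raises ValueError.
try:
    np.concatenate((array_2x3, array_2x2), axis=0)
except ValueError as err:
    print("concatenate failed:", err)

# Axis 1 only requires matching row counts (2 and 2), so this succeeds.
print(np.concatenate((array_2x3, array_2x2), axis=1).shape)  # (2, 5)

# np.stack needs identical shapes, so (2, 3) with (2, 2) also raises ValueError.
try:
    np.stack((array_2x3, array_2x2))
except ValueError as err:
    print("stack failed:", err)
```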

Concatenating and Stacking Along Custom Axes

Custom axis operations in NumPy concatenation and stacking provide fine-grained control over how arrays are combined. For multidimensional arrays, you can specify which axis to use for the operation. Understanding axis indexing is critical for advanced array manipulation techniques.

import numpy as np

# Creating 3D arrays for custom axis operations
array_3d_1 = np.array([[[1, 2], [3, 4]], [[5, 6], [7, 8]]])
array_3d_2 = np.array([[[9, 10], [11, 12]], [[13, 14], [15, 16]]])

print("Original arrays shape:", array_3d_1.shape)  # (2, 2, 2)

# Concatenating along axis 0
concat_axis0 = np.concatenate((array_3d_1, array_3d_2), axis=0)
print(f"\nConcatenate axis 0 shape: {concat_axis0.shape}")  # (4, 2, 2)

# Concatenating along axis 1
concat_axis1 = np.concatenate((array_3d_1, array_3d_2), axis=1)
print(f"Concatenate axis 1 shape: {concat_axis1.shape}")  # (2, 4, 2)

# Concatenating along axis 2
concat_axis2 = np.concatenate((array_3d_1, array_3d_2), axis=2)
print(f"Concatenate axis 2 shape: {concat_axis2.shape}")  # (2, 2, 4)

This example shows 3D array concatenation along different axes. Each axis concatenation produces different output shapes, which is fundamental to understanding multidimensional array operations in NumPy.
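
The same axis argument applies to stack(), where it sets the position of the new axis instead of picking an existing one. The sketch below uses three placeholder arrays of shape (2, 4, 5) so that each axis length is easy to tell apart in the result.

```python
import numpy as np

# Three arrays of shape (2, 4, 5); the values are irrelevant here.
arrays = [np.zeros((2, 4, 5)) for _ in range(3)]

# The new axis of length 3 is inserted at the position given by `axis`.
print(np.stack(arrays, axis=0).shape)   # (3, 2, 4, 5)
print(np.stack(arrays, axis=1).shape)   # (2, 3, 4, 5)
print(np.stack(arrays, axis=-1).shape)  # (2, 4, 5, 3)
```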

Using Column Stack and Row Stack

NumPy provides additional convenience functions for array stacking: column_stack() and row_stack(). The column_stack() function is similar to hstack() but treats 1D arrays as columns. The row_stack() function is an alias for vstack(); it is deprecated as of NumPy 2.0, so prefer vstack() in new code.

import numpy as np

# Using column_stack
vec1 = np.array([1, 2, 3])
vec2 = np.array([4, 5, 6])
vec3 = np.array([7, 8, 9])

column_stacked = np.column_stack((vec1, vec2, vec3))
print("Column stack result:")
print(column_stacked)
# Output:
# [[1 4 7]
#  [2 5 8]
#  [3 6 9]]

# Using row_stack (alias of vstack; deprecated as of NumPy 2.0, where np.vstack is preferred)
row_stacked = np.row_stack((vec1, vec2, vec3))
print("\nRow stack result:")
print(row_stacked)
# Output:
# [[1 2 3]
#  [4 5 6]
#  [7 8 9]]

The column_stack and row_stack functions provide intuitive ways to organize 1D arrays into 2D structures. These functions are particularly useful for data organization and matrix construction tasks.
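
A case where column_stack() and hstack() behave differently is mixing a 1D array with a 2D array. This is a small sketch with made-up data. The error message is printed rather than quoted because its wording varies between NumPy versions.

```python
import numpy as np

vec1 = np.array([1, 2, 3])
matrix = np.array([[10, 20], [30, 40], [50, 60]])

# column_stack turns the 1D array into a (3, 1) column before joining.
combined = np.column_stack((vec1, matrix))
print(combined.shape)  # (3, 3)

# hstack does no such promotion, so the same inputs fail.
try:
    np.hstack((vec1, matrix))
except ValueError as err:
    print("hstack failed:", err)
```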

Practical Applications of Concatenation and Stacking

NumPy concatenation and stacking have numerous real-world applications in data science and scientific computing. These operations are essential for data preprocessing, feature engineering, batch processing, and combining model outputs. Let’s explore some practical scenarios where these techniques shine.

Combining Training and Testing Data

In machine learning workflows, you often need to combine datasets for processing. Array concatenation is perfect for merging training and testing data or combining data from multiple sources.

import numpy as np

# Simulating training and testing datasets
train_features = np.array([[1.2, 2.3, 3.4], [4.5, 5.6, 6.7]])
test_features = np.array([[7.8, 8.9, 9.0], [10.1, 11.2, 12.3]])

# Combining datasets
combined_features = np.concatenate((train_features, test_features), axis=0)
print("Combined dataset shape:", combined_features.shape)  # (4, 3)
print("Combined features:")
print(combined_features)

This demonstrates practical concatenation for data merging in machine learning pipelines.
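
If the combined array later needs to be separated again (for example after applying the same preprocessing to both parts), one possible approach is to record the number of training rows and use np.split(). The names n_train, train_again, and test_again are illustrative.

```python
import numpy as np

train_features = np.array([[1.2, 2.3, 3.4], [4.5, 5.6, 6.7]])
test_features = np.array([[7.8, 8.9, 9.0], [10.1, 11.2, 12.3]])

combined = np.concatenate((train_features, test_features), axis=0)

# Remember where the training rows end so the join can be undone.
n_train = train_features.shape[0]
train_again, test_again = np.split(combined, [n_train], axis=0)

print(np.array_equal(train_again, train_features))  # True
print(np.array_equal(test_again, test_features))    # True
```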

Creating Batch Data

Stacking arrays is commonly used to create batches of data for neural network training or batch processing operations.

import numpy as np

# Individual data samples
sample1 = np.array([0.1, 0.2, 0.3, 0.4])
sample2 = np.array([0.5, 0.6, 0.7, 0.8])
sample3 = np.array([0.9, 1.0, 1.1, 1.2])

# Creating a batch using vstack
batch_data = np.vstack((sample1, sample2, sample3))
print("Batch shape:", batch_data.shape)  # (3, 4)
print("Batch data:")
print(batch_data)

This shows how vstack creates batched data from individual samples, a common requirement in deep learning.
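
vstack() works here because each sample is 1D. When samples are themselves 2D (small images, say), vstack() would merge their rows together, and stack() is the safer choice. The following sketch uses hypothetical 2x4 samples to show the difference.

```python
import numpy as np

# Hypothetical 2D samples, e.g. three tiny 2x4 grayscale images.
samples = [np.full((2, 4), fill_value=i, dtype=float) for i in range(3)]

# vstack merges the rows of all samples: there is no separate batch axis.
print(np.vstack(samples).shape)  # (6, 4)

# stack adds a leading batch axis and keeps each sample intact.
batch = np.stack(samples, axis=0)
print(batch.shape)  # (3, 2, 4)
print(np.array_equal(batch[1], samples[1]))  # True
```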

Complete Working Examples

Now let’s put everything together with comprehensive examples that demonstrate various NumPy concatenation and stacking techniques in complete, runnable programs.

Example 1: Data Analysis Pipeline

import numpy as np

# Creating sample datasets from different sources
print("=== Data Analysis Pipeline ===\n")

# Monthly sales data from different regions
region_north = np.array([15000, 18000, 22000, 19000])
region_south = np.array([12000, 14000, 16000, 15000])
region_east = np.array([20000, 23000, 25000, 24000])
region_west = np.array([17000, 19000, 21000, 20000])

# Stacking regions into a single dataset
sales_data = np.vstack((region_north, region_south, region_east, region_west))
print("Sales Data by Region (rows) and Month (columns):")
print(sales_data)
print(f"Shape: {sales_data.shape}\n")

# Adding quarterly totals using hstack
quarterly_totals = np.sum(sales_data, axis=1).reshape(-1, 1)
sales_with_totals = np.hstack((sales_data, quarterly_totals))
print("Sales Data with Quarterly Totals:")
print(sales_with_totals)
print(f"Shape: {sales_with_totals.shape}\n")

# Concatenating with previous quarter data
previous_quarter = np.array([[14000, 17000, 21000, 18000, 70000],
                              [11000, 13000, 15000, 14000, 53000],
                              [19000, 22000, 24000, 23000, 88000],
                              [16000, 18000, 20000, 19000, 73000]])

full_year_data = np.concatenate((previous_quarter, sales_with_totals), axis=0)
print("Full Year Sales Data (2 Quarters):")
print(full_year_data)
print(f"Shape: {full_year_data.shape}\n")

# Calculate statistics
total_sales = np.sum(full_year_data[:, :-1])  # Exclude last column to avoid double counting
print(f"Total Sales: ${total_sales:,.2f}")
average_monthly_sales = np.mean(full_year_data[:, :-1])
print(f"Average Monthly Sales: ${average_monthly_sales:,.2f}")

Output:

=== Data Analysis Pipeline ===

Sales Data by Region (rows) and Month (columns):
[[15000 18000 22000 19000]
 [12000 14000 16000 15000]
 [20000 23000 25000 24000]
 [17000 19000 21000 20000]]
Shape: (4, 4)

Sales Data with Quarterly Totals:
[[15000 18000 22000 19000 74000]
 [12000 14000 16000 15000 57000]
 [20000 23000 25000 24000 92000]
 [17000 19000 21000 20000 77000]]
Shape: (4, 5)

Full Year Sales Data (2 Quarters):
[[14000 17000 21000 18000 70000]
 [11000 13000 15000 14000 53000]
 [19000 22000 24000 23000 88000]
 [16000 18000 20000 19000 73000]
 [15000 18000 22000 19000 74000]
 [12000 14000 16000 15000 57000]
 [20000 23000 25000 24000 92000]
 [17000 19000 21000 20000 77000]]
Shape: (8, 5)

Total Sales: $584,000.00
Average Monthly Sales: $18,250.00

Example 2: Image Processing Simulation

import numpy as np

print("=== Image Processing with Array Operations ===\n")

# Simulating RGB channels of a small image (3x3 pixels)
red_channel = np.array([[255, 128, 64],
                        [200, 150, 100],
                        [180, 120, 80]])

green_channel = np.array([[0, 64, 128],
                          [50, 100, 150],
                          [70, 110, 140]])

blue_channel = np.array([[0, 32, 64],
                         [40, 80, 120],
                         [60, 100, 130]])

# Stacking channels to create RGB image using dstack
rgb_image = np.dstack((red_channel, green_channel, blue_channel))
print("RGB Image Array:")
print(rgb_image)
print(f"Shape: {rgb_image.shape} (height, width, channels)\n")

# Creating a second image
red_channel2 = np.array([[100, 150, 200],
                         [120, 170, 220],
                         [140, 190, 240]])

green_channel2 = np.array([[50, 100, 150],
                           [60, 110, 160],
                           [70, 120, 170]])

blue_channel2 = np.array([[25, 75, 125],
                          [35, 85, 135],
                          [45, 95, 145]])

rgb_image2 = np.dstack((red_channel2, green_channel2, blue_channel2))

# Concatenating images horizontally (side by side)
side_by_side = np.concatenate((rgb_image, rgb_image2), axis=1)
print("Images Concatenated Horizontally:")
print(f"Shape: {side_by_side.shape}\n")

# Concatenating images vertically (top and bottom)
stacked_vertically = np.concatenate((rgb_image, rgb_image2), axis=0)
print("Images Concatenated Vertically:")
print(f"Shape: {stacked_vertically.shape}\n")

# Extracting and analyzing color information
print("Sample pixel color values (RGB):")
print(f"Image 1, Pixel [0,0]: {rgb_image[0, 0]}")
print(f"Image 1, Pixel [1,1]: {rgb_image[1, 1]}")
print(f"Image 2, Pixel [2,2]: {rgb_image2[2, 2]}")

# Calculate average color across first image
avg_red = np.mean(red_channel)
avg_green = np.mean(green_channel)
avg_blue = np.mean(blue_channel)
print(f"\nAverage color in Image 1: RGB({avg_red:.1f}, {avg_green:.1f}, {avg_blue:.1f})")

Output:

=== Image Processing with Array Operations ===

RGB Image Array:
[[[255   0   0]
  [128  64  32]
  [ 64 128  64]]

 [[200  50  40]
  [150 100  80]
  [100 150 120]]

 [[180  70  60]
  [120 110 100]
  [ 80 140 130]]]
Shape: (3, 3, 3) (height, width, channels)

Images Concatenated Horizontally:
Shape: (3, 6, 3)

Images Concatenated Vertically:
Shape: (6, 3, 3)

Sample pixel color values (RGB):
Image 1, Pixel [0,0]: [255   0   0]
Image 1, Pixel [1,1]: [150 100  80]
Image 2, Pixel [2,2]: [240 170 145]

Average color in Image 1: RGB(141.9, 90.2, 69.6)

Example 3: Time Series Data Combination

import numpy as np

print("=== Time Series Data Combination ===\n")

# Temperature readings from different sensors (hourly data)
sensor_1_temp = np.array([22.5, 23.1, 23.8, 24.2, 24.5, 24.8])
sensor_2_temp = np.array([22.3, 23.0, 23.6, 24.0, 24.3, 24.6])
sensor_3_temp = np.array([22.7, 23.2, 23.9, 24.3, 24.6, 24.9])

# Humidity readings from the same sensors
sensor_1_humidity = np.array([65, 64, 63, 62, 61, 60])
sensor_2_humidity = np.array([66, 65, 64, 63, 62, 61])
sensor_3_humidity = np.array([64, 63, 62, 61, 60, 59])

# Stacking temperature readings using vstack
temperature_matrix = np.vstack((sensor_1_temp, sensor_2_temp, sensor_3_temp))
print("Temperature Readings (°C):")
print(temperature_matrix)
print(f"Shape: {temperature_matrix.shape}\n")

# Stacking humidity readings
humidity_matrix = np.vstack((sensor_1_humidity, sensor_2_humidity, sensor_3_humidity))
print("Humidity Readings (%):")
print(humidity_matrix)
print(f"Shape: {humidity_matrix.shape}\n")

# Creating a 3D array with both measurements using stack
combined_readings = np.stack((temperature_matrix, humidity_matrix), axis=2)
print("Combined Sensor Data (sensors, hours, measurements):")
print(f"Shape: {combined_readings.shape}")
print("\nSensor 1, All Hours, Both Measurements:")
print(combined_readings[0])
print("\n")

# Calculating averages
avg_temp_per_hour = np.mean(temperature_matrix, axis=0)
avg_humidity_per_hour = np.mean(humidity_matrix, axis=0)

# Creating summary using column_stack
hours = np.arange(1, 7)
summary = np.column_stack((hours, avg_temp_per_hour, avg_humidity_per_hour))
print("Hourly Summary (Hour, Avg Temp °C, Avg Humidity %):")
print(summary)
print()

# Finding extremes
max_temp = np.max(temperature_matrix)
min_temp = np.min(temperature_matrix)
max_humidity = np.max(humidity_matrix)
min_humidity = np.min(humidity_matrix)

print(f"Temperature Range: {min_temp}°C - {max_temp}°C")
print(f"Humidity Range: {min_humidity}% - {max_humidity}%")

Output:

=== Time Series Data Combination ===

Temperature Readings (°C):
[[22.5 23.1 23.8 24.2 24.5 24.8]
 [22.3 23.  23.6 24.  24.3 24.6]
 [22.7 23.2 23.9 24.3 24.6 24.9]]
Shape: (3, 6)

Humidity Readings (%):
[[65 64 63 62 61 60]
 [66 65 64 63 62 61]
 [64 63 62 61 60 59]]
Shape: (3, 6)

Combined Sensor Data (sensors, hours, measurements):
Shape: (3, 6, 2)

Sensor 1, All Hours, Both Measurements:
[[22.5 65. ]
 [23.1 64. ]
 [23.8 63. ]
 [24.2 62. ]
 [24.5 61. ]
 [24.8 60. ]]


Hourly Summary (Hour, Avg Temp °C, Avg Humidity %):
[[ 1.          22.5         65.        ]
 [ 2.          23.1         64.        ]
 [ 3.          23.76666667  63.        ]
 [ 4.          24.16666667  62.        ]
 [ 5.          24.46666667  61.        ]
 [ 6.          24.76666667  60.        ]]

Temperature Range: 22.3°C - 24.9°C
Humidity Range: 59% - 66%

Example 4: Matrix Operations and Data Augmentation

import numpy as np

print("=== Matrix Operations with Concatenation and Stacking ===\n")

# Creating feature matrices from different data sources
features_set1 = np.array([[1.0, 2.0, 3.0],
                          [4.0, 5.0, 6.0],
                          [7.0, 8.0, 9.0]])

features_set2 = np.array([[10.0, 11.0, 12.0],
                          [13.0, 14.0, 15.0],
                          [16.0, 17.0, 18.0]])

# Vertical concatenation to combine datasets
combined_features = np.concatenate((features_set1, features_set2), axis=0)
print("Combined Feature Matrix:")
print(combined_features)
print(f"Shape: {combined_features.shape}\n")

# Creating label vectors
labels_set1 = np.array([0, 1, 0])
labels_set2 = np.array([1, 1, 0])

# Concatenating labels
combined_labels = np.concatenate((labels_set1, labels_set2))
print("Combined Labels:")
print(combined_labels)
print(f"Shape: {combined_labels.shape}\n")

# Adding computed features using hstack
feature_sums = np.sum(combined_features, axis=1).reshape(-1, 1)
feature_means = np.mean(combined_features, axis=1).reshape(-1, 1)
feature_maxs = np.max(combined_features, axis=1).reshape(-1, 1)

augmented_features = np.hstack((combined_features, feature_sums, 
                                feature_means, feature_maxs))
print("Augmented Feature Matrix (original + sum + mean + max):")
print(augmented_features)
print(f"Shape: {augmented_features.shape}\n")

# Creating multiple datasets for batch processing
batch1 = np.array([[1, 2], [3, 4]])
batch2 = np.array([[5, 6], [7, 8]])
batch3 = np.array([[9, 10], [11, 12]])

# Stacking batches along a new axis
batched_data = np.stack((batch1, batch2, batch3), axis=0)
print("Batched Data (batch, samples, features):")
print(batched_data)
print(f"Shape: {batched_data.shape}\n")

# Processing each batch
print("Batch Statistics:")
for i, batch in enumerate(batched_data):
    batch_mean = np.mean(batch)
    batch_std = np.std(batch)
    print(f"Batch {i+1}: Mean = {batch_mean:.2f}, Std = {batch_std:.2f}")

# Creating a complete dataset with multiple feature types
numerical_features = np.array([[1.5, 2.5], [3.5, 4.5], [5.5, 6.5]])
categorical_encoded = np.array([[1, 0, 0], [0, 1, 0], [0, 0, 1]])
derived_features = np.array([[7.5], [8.5], [9.5]])

complete_dataset = np.hstack((numerical_features, categorical_encoded, derived_features))
print("\nComplete Feature Set (numerical + categorical + derived):")
print(complete_dataset)
print(f"Shape: {complete_dataset.shape}")

Output:

=== Matrix Operations with Concatenation and Stacking ===

Combined Feature Matrix:
[[ 1.  2.  3.]
 [ 4.  5.  6.]
 [ 7.  8.  9.]
 [10. 11. 12.]
 [13. 14. 15.]
 [16. 17. 18.]]
Shape: (6, 3)

Combined Labels:
[0 1 0 1 1 0]
Shape: (6,)

Augmented Feature Matrix (original + sum + mean + max):
[[ 1.  2.  3.  6.  2.  3.]
 [ 4.  5.  6. 15.  5.  6.]
 [ 7.  8.  9. 24.  8.  9.]
 [10. 11. 12. 33. 11. 12.]
 [13. 14. 15. 42. 14. 15.]
 [16. 17. 18. 51. 17. 18.]]
Shape: (6, 6)

Batched Data (batch, samples, features):
[[[ 1  2]
  [ 3  4]]

 [[ 5  6]
  [ 7  8]]

 [[ 9 10]
  [11 12]]]
Shape: (3, 2, 2)

Batch Statistics:
Batch 1: Mean = 2.50, Std = 1.12
Batch 2: Mean = 6.50, Std = 1.12
Batch 3: Mean = 10.50, Std = 1.12

Complete Feature Set (numerical + categorical + derived):
[[1.5 2.5 1.  0.  0.  7.5]
 [3.5 4.5 0.  1.  0.  8.5]
 [5.5 6.5 0.  0.  1.  9.5]]
Shape: (3, 6)

These examples demonstrate the flexibility of NumPy array concatenation and stacking across data analysis, image processing, time series data, and machine learning pipelines. The concatenate(), vstack(), hstack(), dstack(), and stack() functions cover most needs for combining and organizing arrays. Once you know when to use concatenation versus stacking, and how the axis argument affects the result, you can handle complex array manipulation tasks with confidence. For more information, visit the official NumPy documentation.