基数排序 - 元素码农

发布时间: 2025-03-21 18:44

↑

# 基数排序

基数排序（Radix Sort）是一种非比较性的整数排序算法。它的基本思想是按照整数的每个位（个位、十位、百位等）分别排序，从最低位开始，逐位向高位进行。由于每个位的排序都是稳定的，所以最终得到的结果也是有序的。

## 基本原理

1. 找出数组中的最大值，确定最高位数
2. 从最低位开始，对每一位进行计数排序
3. 对每一位的排序结果作为下一位排序的输入
4. 重复步骤2-3，直到处理完最高位

## 动画演示

```
初始数组：[ 170, 45, 75, 90, 802, 24, 2, 66 ]

按个位数排序：
[ 170, 90, 802, 2, 24, 45, 75, 66 ]

按十位数排序：
[ 802, 2, 24, 45, 66, 70, 75, 90 ]

按百位数排序：
[ 2, 24, 45, 66, 70, 75, 90, 802 ]
```

## 代码实现

### 基本实现

```python
def radix_sort(arr):
    if not arr:
        return arr
    
    # 找出最大值，确定最高位数
    max_num = max(arr)
    exp = 1  # 当前处理的位数
    
    # 当前位数小于等于最大值，继续排序
    while max_num // exp > 0:
        counting_sort_by_digit(arr, exp)
        exp *= 10
    
    return arr

def counting_sort_by_digit(arr, exp):
    n = len(arr)
    output = [0] * n
    count = [0] * 10  # 0-9的计数数组
    
    # 统计当前位上每个数字出现的次数
    for i in range(n):
        digit = (arr[i] // exp) % 10
        count[digit] += 1
    
    # 修改count数组，使其包含位置信息
    for i in range(1, 10):
        count[i] += count[i - 1]
    
    # 从后向前遍历，保持稳定性
    for i in range(n - 1, -1, -1):
        digit = (arr[i] // exp) % 10
        output[count[digit] - 1] = arr[i]
        count[digit] -= 1
    
    # 将排序结果复制回原数组
    for i in range(n):
        arr[i] = output[i]

# 使用示例
arr = [170, 45, 75, 90, 802, 24, 2, 66]
print("排序前:", arr)
radix_sort(arr)
print("排序后:", arr)
```

### 支持负数的版本

```python
def radix_sort_with_negative(arr):
    if not arr:
        return arr
    
    # 分离正负数
    negative = [num for num in arr if num < 0]
    positive = [num for num in arr if num >= 0]
    
    # 将负数取绝对值并排序
    negative = [-num for num in negative]
    if negative:
        radix_sort(negative)
        negative = [-num for num in reversed(negative)]
    
    # 排序正数
    if positive:
        radix_sort(positive)
    
    # 合并结果
    return negative + positive

# 使用示例
arr = [170, -45, 75, 90, -802, 24, 2, -66]
print("排序前:", arr)
sorted_arr = radix_sort_with_negative(arr)
print("排序后:", sorted_arr)
```

### 字符串排序版本

```python
def string_radix_sort(arr):
    if not arr:
        return arr
    
    # 找出最长字符串的长度
    max_len = max(len(s) for s in arr)
    
    # 从右向左对每个位置进行排序
    for pos in range(max_len - 1, -1, -1):
        # 使用计数排序对当前位置进行排序
        counting_sort_by_char(arr, pos)
    
    return arr

def counting_sort_by_char(arr, pos):
    n = len(arr)
    output = [''] * n
    count = [0] * 128  # ASCII字符的计数数组
    
    # 统计字符出现次数
    for s in arr:
        # 如果字符串长度小于pos，则视为空字符（ASCII值为0）
        char = ord(s[pos]) if pos < len(s) else 0
        count[char] += 1
    
    # 修改count数组，使其包含位置信息
    for i in range(1, 128):
        count[i] += count[i - 1]
    
    # 从后向前遍历，保持稳定性
    for i in range(n - 1, -1, -1):
        char = ord(arr[i][pos]) if pos < len(arr[i]) else 0
        output[count[char] - 1] = arr[i]
        count[char] -= 1
    
    # 将排序结果复制回原数组
    for i in range(n):
        arr[i] = output[i]

# 使用示例
strings = ["apple", "dog", "cat", "ball", "elephant"]
print("排序前:", strings)
string_radix_sort(strings)
print("排序后:", strings)
```

## 性能分析

### 时间复杂度

- 最好情况：O(d * n)，其中d是最大数的位数，n是数组长度
- 最坏情况：O(d * n)
- 平均情况：O(d * n)

### 空间复杂度

- O(n + k)，其中k是每个位可能的数字个数（对于十进制数是10）

### 稳定性

- 稳定排序算法
- 相等元素的相对位置保持不变

## 优缺点

### 优点

1. 是稳定的排序算法
2. 适合整数和字符串排序
3. 当关键字位数较少时，性能优秀
4. 可以并行化实现

### 缺点

1. 需要额外的空间
2. 只能排序整数或可以转化为整数的数据
3. 当数字位数很多时，性能下降

## 应用场景

1. 整数排序（尤其是位数较少的整数）
2. 字符串排序
3. 日期时间排序
4. 固定格式数据的排序

## 实际应用示例

### 1. 日期排序

```python
from datetime import datetime

class DateRadixSort:
    def __init__(self):
        self.dates = []
    
    def add_date(self, date_str):
        # 假设日期格式为"YYYY-MM-DD"
        date = datetime.strptime(date_str, "%Y-%m-%d")
        self.dates.append(date)
    
    def sort_dates(self):
        if not self.dates:
            return []
        
        # 将日期转换为整数进行排序
        date_ints = [(d.year * 10000 + d.month * 100 + d.day) for d in self.dates]
        
        # 对整数进行基数排序
        sorted_indices = list(range(len(date_ints)))
        exp = 1
        max_int = max(date_ints)
        
        while max_int // exp > 0:
            self._counting_sort_by_digit(date_ints, sorted_indices, exp)
            exp *= 10
        
        # 返回排序后的日期
        return [self.dates[i] for i in sorted_indices]
    
    def _counting_sort_by_digit(self, date_ints, indices, exp):
        n = len(date_ints)
        output = [0] * n
        count = [0] * 10
        
        for i in indices:
            digit = (date_ints[i] // exp) % 10
            count[digit] += 1
        
        for i in range(1, 10):
            count[i] += count[i - 1]
        
        for i in range(n - 1, -1, -1):
            current = indices[i]
            digit = (date_ints[current] // exp) % 10
            output[count[digit] - 1] = current
            count[digit] -= 1
        
        for i in range(n):
            indices[i] = output[i]

# 使用示例
sorter = DateRadixSort()
dates = [
    "2023-05-15",
    "2023-03-10",
    "2023-07-22",
    "2023-01-05",
    "2023-04-30"
]

for date in dates:
    sorter.add_date(date)

print("排序前:")
for date in sorter.dates:
    print(date.strftime("%Y-%m-%d"))

sorted_dates = sorter.sort_dates()

print("\n排序后:")
for date in sorted_dates:
    print(date.strftime("%Y-%m-%d"))
```

### 2. 文件名排序

```python
class FileNameSorter:
    def __init__(self):
        self.files = []
    
    def add_file(self, filename):
        self.files.append(filename)
    
    def sort_filenames(self):
        # 将文件名补齐到相同长度
        max_len = max(len(name) for name in self.files)
        padded_files = [(name.ljust(max_len), name) for name in self.files]
        
        # 从右向左对每个字符进行排序
        for pos in range(max_len - 1, -1, -1):
            self._counting_sort_by_char(padded_files, pos)
        
        # 返回原始文件名
        return [original for _, original in padded_files]
    
    def _counting_sort_by_char(self, padded_files, pos):
        n = len(padded_files)
        output = [(None, None)] * n
        count = [0] * 128  # ASCII字符
        
        for padded, _ in padded_files:
            char = ord(padded[pos])
            count[char] += 1
        
        for i in range(1, 128):
            count[i] += count[i - 1]
        
        for i in range(n - 1, -1, -1):
            char = ord(padded_files[i][0][pos])
            output[count[char] - 1] = padded_files[i]
            count[char] -= 1
        
        for i in range(n):
            padded_files[i] = output[i]

# 使用示例
sorter = FileNameSorter()
filenames = [
    "document1.txt",
    "image10.jpg",
    "image2.jpg",
    "document10.txt",
    "document2.txt"
]

for filename in filenames:
    sorter.add_file(filename)

print("排序前:")
for filename in sorter.files:
    print(filename)

sorted_files = sorter.sort_filenames()

print("\n排序后:")
for filename in sorted_files:
    print(filename)
```

## 总结

基数排序是一种特殊的排序算法，它具有以下特点：

1. 非比较性排序算法
2. 时间复杂度为线性O(d * n)
3. 稳定排序
4. 适用于整数和字符串排序

在实际应用中，基数排序常用于：
- 整数排序（尤其是位数较少的整数）
- 字符串排序（如文件名、单词等）
- 日期时间排序
- 固定格式数据的排序

虽然基数排序有其局限性（只适用于可以分解为位的数据），但在适合的场景下，它是一个非常高效的排序算法。通过合理的优化和改进，基数排序可以满足不同的需求，如处理负数、字符串排序等。