希尔排序 - 元素码农

发布时间: 2025-03-21 18:47

↑

# 希尔排序

希尔排序（Shell Sort）是插入排序的一种改进版本。它通过比较相距一定间隔的元素来进行排序，逐步缩小间隔直到间隔为1，最后完成排序。这种方法可以使元素更快地移动到最终位置。

## 基本原理

1. 选择一个间隔序列（gap sequence），如n/2, n/4, ..., 1
2. 对每个间隔进行插入排序
3. 逐步减小间隔，重复步骤2
4. 最后一次使用间隔1进行插入排序

## 动画演示

```
初始数组：[ 8 9 1 7 2 3 5 4 6 0 ]

间隔为5：
比较：[8 9 1 7 2 | 3 5 4 6 0]
      [3 5 1 6 0 | 8 9 4 7 2]

间隔为2：
比较：[3 5 | 1 6 | 0 8 | 9 4 | 7 2]
      [0 2 1 4 3 5 7 6 9 8]

间隔为1：
最终插入排序：
[0 1 2 3 4 5 6 7 8 9]
```

## 代码实现

### 基本实现

```python
def shell_sort(arr):
    n = len(arr)
    gap = n // 2
    
    # 逐步缩小间隔
    while gap > 0:
        # 对每个间隔进行插入排序
        for i in range(gap, n):
            temp = arr[i]
            j = i
            
            # 比较间隔为gap的元素
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            
            arr[j] = temp
        
        gap //= 2
    
    return arr

# 使用示例
arr = [64, 34, 25, 12, 22, 11, 90]
print("排序前:", arr)
shell_sort(arr)
print("排序后:", arr)
```

### 优化版本

```python
def shell_sort_optimized(arr):
    def get_gaps(n):
        # 使用Sedgewick序列
        gaps = []
        k = 1
        gap = 4**k + 3 * 2**(k-1) + 1
        while gap < n:
            gaps.append(gap)
            k += 1
            gap = 4**k + 3 * 2**(k-1) + 1
        return gaps[::-1]  # 返回降序的间隔序列
    
    n = len(arr)
    gaps = get_gaps(n)
    
    # 对每个间隔进行插入排序
    for gap in gaps:
        for i in range(gap, n):
            temp = arr[i]
            j = i
            
            # 使用二分查找优化比较过程
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            
            arr[j] = temp
    
    return arr

# 使用示例
arr = [64, 34, 25, 12, 22, 11, 90]
print("排序前:", arr)
shell_sort_optimized(arr)
print("排序后:", arr)
```

### 多线程版本

```python
import threading
from concurrent.futures import ThreadPoolExecutor

class ParallelShellSort:
    def __init__(self, max_workers=4):
        self.max_workers = max_workers
    
    def sort_segment(self, arr, start, gap):
        # 对指定起始位置的子序列进行排序
        for i in range(start + gap, len(arr), gap):
            temp = arr[i]
            j = i
            while j >= gap and arr[j - gap] > temp:
                arr[j] = arr[j - gap]
                j -= gap
            arr[j] = temp
    
    def sort(self, arr):
        n = len(arr)
        gap = n // 2
        
        while gap > 0:
            # 使用线程池并行处理不同的子序列
            with ThreadPoolExecutor(max_workers=self.max_workers) as executor:
                futures = []
                for start in range(gap):
                    futures.append(
                        executor.submit(self.sort_segment, arr, start, gap)
                    )
                
                # 等待所有子序列排序完成
                for future in futures:
                    future.result()
            
            gap //= 2
        
        return arr

# 使用示例
sorter = ParallelShellSort()
arr = [64, 34, 25, 12, 22, 11, 90]
print("排序前:", arr)
sorter.sort(arr)
print("排序后:", arr)
```

## 性能分析

### 时间复杂度

- 最好情况：O(n log n)，使用最优间隔序列时
- 最坏情况：O(n²)，使用不当的间隔序列时
- 平均情况：取决于间隔序列的选择

### 空间复杂度

- O(1)，原地排序算法

### 稳定性

- 不稳定排序算法
- 相等元素的相对位置可能改变

## 间隔序列选择

1. Shell序列：n/2, n/4, ..., 1
2. Hibbard序列：2^k - 1
3. Sedgewick序列：4^k + 3×2^(k-1) + 1
4. Knuth序列：(3^k - 1) / 2

## 优缺点

### 优点

1. 比插入排序更高效
2. 适合中等规模数据
3. 代码实现简单
4. 原地排序，空间复杂度低

### 缺点

1. 不稳定排序
2. 性能依赖于间隔序列选择
3. 最优间隔序列选择复杂

## 应用场景

1. 中等规模数据排序
2. 嵌入式系统
3. 硬件资源受限环境
4. 数据基本有序的情况

## 实际应用示例

### 1. 成绩排序系统

```python
class Student:
    def __init__(self, name, score, student_id):
        self.name = name
        self.score = score
        self.student_id = student_id
    
    def __str__(self):
        return f"{self.name} (ID:{self.student_id}): {self.score}"

class GradingSystem:
    def __init__(self):
        self.students = []
    
    def add_student(self, name, score, student_id):
        self.students.append(Student(name, score, student_id))
    
    def sort_by_score(self):
        n = len(self.students)
        gap = n // 2
        
        while gap > 0:
            for i in range(gap, n):
                temp = self.students[i]
                j = i
                
                while j >= gap and self.students[j - gap].score > temp.score:
                    self.students[j] = self.students[j - gap]
                    j -= gap
                
                self.students[j] = temp
            
            gap //= 2
    
    def get_top_students(self, n):
        self.sort_by_score()
        return list(reversed(self.students))[:n]
    
    def get_grade_distribution(self):
        distribution = {'A': 0, 'B': 0, 'C': 0, 'D': 0, 'F': 0}
        
        for student in self.students:
            if student.score >= 90:
                distribution['A'] += 1
            elif student.score >= 80:
                distribution['B'] += 1
            elif student.score >= 70:
                distribution['C'] += 1
            elif student.score >= 60:
                distribution['D'] += 1
            else:
                distribution['F'] += 1
        
        return distribution

# 使用示例
grading_system = GradingSystem()

# 添加学生数据
students_data = [
    ("Alice", 85, "001"),
    ("Bob", 92, "002"),
    ("Charlie", 78, "003"),
    ("David", 95, "004"),
    ("Eve", 88, "005")
]

for name, score, student_id in students_data:
    grading_system.add_student(name, score, student_id)

# 排序并显示结果
print("排序前:")
for student in grading_system.students:
    print(student)

grading_system.sort_by_score()

print("\n按成绩排序后:")
for student in grading_system.students:
    print(student)

print("\n前3名学生:")
for student in grading_system.get_top_students(3):
    print(student)

print("\n成绩分布:")
for grade, count in grading_system.get_grade_distribution().items():
    print(f"{grade}: {count}人")
```

### 2. 日志分析器

```python
from datetime import datetime

class LogEntry:
    def __init__(self, timestamp, level, message):
        self.timestamp = timestamp
        self.level = level
        self.message = message
    
    def __str__(self):
        return f"[{self.timestamp}] {self.level}: {self.message}"

class LogAnalyzer:
    def __init__(self):
        self.logs = []
        self.level_priority = {
            'ERROR': 0,
            'WARNING': 1,
            'INFO': 2,
            'DEBUG': 3
        }
    
    def add_log(self, timestamp, level, message):
        self.logs.append(LogEntry(timestamp, level, message))
    
    def sort_by_priority(self):
        n = len(self.logs)
        gap = n // 2
        
        while gap > 0:
            for i in range(gap, n):
                temp = self.logs[i]
                j = i
                
                while j >= gap and \
                      self.level_priority[self.logs[j - gap].level] > \
                      self.level_priority[temp.level]:
                    self.logs[j] = self.logs[j - gap]
                    j -= gap
                
                self.logs[j] = temp
            
            gap //= 2
    
    def sort_by_timestamp(self):
        n = len(self.logs)
        gap = n // 2
        
        while gap > 0:
            for i in range(gap, n):
                temp = self.logs[i]
                j = i
                
                while j >= gap and self.logs[j - gap].timestamp > temp.timestamp:
                    self.logs[j] = self.logs[j - gap]
                    j -= gap
                
                self.logs[j] = temp
            
            gap //= 2
    
    def get_logs_by_level(self, level):
        return [log for log in self.logs if log.level == level]
    
    def get_logs_in_timerange(self, start_time, end_time):
        return [log for log in self.logs 
                if start_time <= log.timestamp <= end_time]

# 使用示例
analyzer = LogAnalyzer()

# 添加日志数据
log_data = [
    (datetime(2023, 5, 1, 10, 30), "ERROR", "Database connection failed"),
    (datetime(2023, 5, 1, 10, 31), "WARNING", "High memory usage"),
    (datetime(2023, 5, 1, 10, 32), "INFO", "Application started"),
    (datetime(2023, 5, 1, 10, 33), "ERROR", "API request timeout"),
    (datetime(2023, 5, 1, 10, 34), "DEBUG", "Cache miss")
]

for timestamp, level, message in log_data:
    analyzer.add_log(timestamp, level, message)

# 按优先级排序
print("按日志级别排序:")
analyzer.sort_by_priority()
for log in analyzer.logs:
    print(log)

# 按时间戳排序
print("\n按时间排序:")
analyzer.sort_by_timestamp()
for log in analyzer.logs:
    print(log)

# 查看特定级别的日志
print("\nERROR级别的日志:")
for log in analyzer.get_logs_by_level("ERROR"):
    print(log)

# 查看特定时间范围的日志
start_time = datetime(2023, 5, 1, 10, 31)
end_time = datetime(2023, 5, 1, 10, 33)
print(f"\n{start_time}到{end_time}之间的日志:")
for log in analyzer.get_logs_in_timerange(start_time, end_time):
    print(log)
```

## 总结

希尔排序是一种高效的排序算