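Pandas Series: A Detailed Guide to the Data Structure

1. Series Overview

A Series is the one-dimensional labeled array in Pandas, comparable to a named column or a dictionary:

Data: any type (numbers, strings, Python objects, etc.)
Index: labels such as integers, strings, or timestamps
Single dtype: each Series has exactly one dtype, backed by a NumPy array by default; mixed values fall back to the generic object dtype

import pandas as pd
import numpy as np

# Basic creation
s = pd.Series([1, 3, 5, np.nan, 6, 8])
print(s)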
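
As a small illustration of the single-dtype rule above, the sketch below shows how the values passed in decide the dtype that gets inferred. The variable names are made up for this example.

```python
import numpy as np
import pandas as pd

ints = pd.Series([1, 3, 5])            # all integers
with_nan = pd.Series([1, 3, np.nan])   # NaN is a float, so the values are upcast
mixed = pd.Series([1, "a", 2.5])       # no common numeric type

print(ints.dtype)      # an integer dtype
print(with_nan.dtype)  # float64
print(mixed.dtype)     # object
```

This is why the basic example above, which contains np.nan, prints floating-point values even though the other entries were written as integers.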

2. Creating a Series

2.1 From a List or Array

# Default integer index
s1 = pd.Series([10, 20, 30, 40])
print(s1)
# 0    10
# 1    20
# 2    30
# 3    40

# Custom index
s2 = pd.Series([10, 20, 30, 40], index=['a', 'b', 'c', 'd'])
print(s2)
# a    10
# b    20
# c    30
# d    40

2.2 From a Dictionary

# Dictionary keys become the index labels
data = {'a': 1, 'b': 2, 'c': 3}
s3 = pd.Series(data)
print(s3)
# a    1
# b    2
# c    3
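
When a dictionary is combined with an explicit index, the index decides which keys are kept and in what order. The sketch below assumes the same data dictionary as above; the label 'd' is deliberately not a key.

```python
import pandas as pd

data = {'a': 1, 'b': 2, 'c': 3}

# Only the labels listed in index are kept; 'd' is not a key, so it becomes NaN
s = pd.Series(data, index=['b', 'c', 'd'])
print(s)
# b    2.0
# c    3.0
# d    NaN
# dtype: float64
```

The values become floats because NaN cannot be stored in a NumPy integer array.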

2.3 From a Scalar

# The scalar is broadcast to every index label
s4 = pd.Series(5, index=['a', 'b', 'c'])
print(s4)
# a    5
# b    5
# c    5

2.4 Specifying the Data Type

# Set the dtype explicitly at construction time
s5 = pd.Series([1, 2, 3], dtype='float32')
s6 = pd.Series(['a', 'b', 'c'], dtype='category')
print(s5.dtype)  # float32
print(s6.dtype)  # category
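
The dtype can also be changed after construction. The following sketch uses astype and, as one possible option for integer data with gaps, the nullable 'Int64' dtype; which dtype fits best depends on the data.

```python
import pandas as pd

s = pd.Series([1, 2, 3])
as_float = s.astype('float64')   # returns a new Series; s itself is unchanged

# Nullable integer dtype (capital "I") keeps integers alongside missing values
nullable = pd.Series([1, None, 3], dtype='Int64')

print(as_float.dtype)          # float64
print(nullable.dtype)          # Int64
print(nullable.isna().sum())   # 1
```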

3. Series Attributes and Methods

3.1 Core Attributes

s = pd.Series([1, 3, 5, np.nan, 6, 8], index=['a', 'b', 'c', 'd', 'e', 'f'])

print(f"Values: {s.values}")           # NumPy array
print(f"Index: {s.index}")             # Index object
print(f"Dtype: {s.dtype}")             # dtype
print(f"Shape: {s.shape}")             # tuple (n,)
print(f"Size: {s.size}")               # number of elements
print(f"Index name: {s.index.name}")   # name of the index (None if unset)
print(f"Has NaN: {s.hasnans}")         # bool

3.2 Basic Information

print(f"Length: {len(s)}")
print(f"Missing count: {s.isnull().sum()}")
print(f"Non-missing count: {s.notnull().sum()}")

4. Indexing and Slicing

4.1 Position-Based Indexing (iloc)

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Access by position
print(s.iloc[0])     # 10 (first element)
print(s.iloc[1:3])   # positions 1 and 2: b 20, c 30
print(s.iloc[[0, 2, 4]])  # select several positions

4.2 Label-Based Indexing (loc)

# Access by label
print(s.loc['a'])      # 10
print(s.loc['b':'d'])  # b 20, c 30, d 40 (the end label is included)
print(s.loc[['a', 'c', 'e']])
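
The two slicing styles differ at the end point, which is a common source of off-by-one mistakes. A minimal side-by-side comparison on the same Series:

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

by_position = s.iloc[1:3]   # positions 1 and 2; stop position 3 is excluded
by_label = s.loc['b':'d']   # labels 'b', 'c', 'd'; stop label 'd' is included

print(list(by_position.index))  # ['b', 'c']
print(list(by_label.index))     # ['b', 'c', 'd']
```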

4.3 Boolean and Conditional Indexing

# Boolean indexing on the values
print(s[s > 25])      # elements greater than 25

# Boolean mask built from the index labels
mask = s.index.str.startswith('a')
print(s[mask])
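
Several conditions can be combined into one mask. The sketch below uses the element-wise operators & and ~; the thresholds 15 and 45 are arbitrary values chosen for the example.

```python
import pandas as pd

s = pd.Series([10, 20, 30, 40, 50], index=['a', 'b', 'c', 'd', 'e'])

# Parentheses are required because & binds tighter than the comparisons
in_range = s[(s > 15) & (s < 45)]
outside = s[~((s > 15) & (s < 45))]   # ~ negates the mask

print(in_range.tolist())   # [20, 30, 40]
print(outside.tolist())    # [10, 50]
```

Python's and / or keywords do not work here, since they try to reduce the whole Series to a single truth value.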

4.4 Fancy Indexing

# Select several elements at once by passing a list of labels or positions
indices = ['a', 'c', 'e']
print(s.loc[indices])
print(s.iloc[[0, 2, 4]])

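5. Data Operations

5.1 Assignment and Modification

s = pd.Series([1, 2, 3, 4], index=['a', 'b', 'c', 'd'])

# Assign a single element
s.loc['a'] = 10
s.iloc[1] = 20

# Assign several elements at once
s.loc[['b', 'c']] = [200, 300]
s.iloc[0:2] = [100, 200]

print(s)  # a 100, b 200, c 300, d 4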

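5.2 Arithmetic

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'd'])

# Operations align on index labels; a label present in only one Series gives NaN
print(s1 + s2)
# a    5.0
# b    7.0
# c    NaN
# d    NaN

# Scalar arithmetic applies to every element
print(s1 * 2)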
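
If the NaN results for unmatched labels are not wanted, one option is the add method with fill_value, which substitutes a default for the missing side before adding. A short sketch with the same two Series:

```python
import pandas as pd

s1 = pd.Series([1, 2, 3], index=['a', 'b', 'c'])
s2 = pd.Series([4, 5, 6], index=['a', 'b', 'd'])

# Labels missing from one Series are treated as 0 instead of producing NaN
total = s1.add(s2, fill_value=0)
print(total)
# a    5.0
# b    7.0
# c    3.0
# d    6.0
# dtype: float64
```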

5.3 Statistical Methods

s = pd.Series([1, 3, 5, np.nan, 6, 8])

# NaN is skipped by default in all of these
print(f"Mean: {s.mean()}")          # 4.6
print(f"Median: {s.median()}")      # 5.0
print(f"Standard deviation: {s.std()}")
print(f"Minimum: {s.min()}")
print(f"Maximum: {s.max()}")
print(f"75th percentile: {s.quantile(0.75)}")

# Descriptive statistics
print(s.describe())
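
The reductions above ignore NaN by default. The sketch below makes that visible and shows the skipna parameter for cases where a missing value should propagate into the result.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, 3, 5, np.nan, 6, 8])

print(s.count())              # 5, counts non-missing values only
print(s.sum())                # 23.0, NaN is skipped
print(s.mean(skipna=False))   # nan, because a missing value is present
```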

6. Handling Missing Data

6.1 Detecting Missing Values

s = pd.Series([1, np.nan, 3, np.nan, 5])

print(s.isnull())      # Boolean Series, True where a value is missing
print(s.notnull())     # the inverse
print(s.isna().sum())  # number of missing values

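6.2 Filling Missing Values

# Forward fill (fillna(method='ffill') is deprecated in recent pandas versions)
s_filled = s.ffill()

# Backward fill
s_bfill = s.bfill()

# Fill with a fixed value
s_value = s.fillna(0)

# Fill with the mean
s_mean = s.fillna(s.mean())

# Drop missing values
s_drop = s.dropna()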

6.3 Interpolation

# Linear interpolation
s_interp = s.interpolate(method='linear')
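
To make the difference between these strategies concrete, the sketch below applies each one to the same five-element Series and prints the resulting values.

```python
import numpy as np
import pandas as pd

s = pd.Series([1, np.nan, 3, np.nan, 5])

print(s.ffill().tolist())        # [1.0, 1.0, 3.0, 3.0, 5.0]
print(s.bfill().tolist())        # [1.0, 3.0, 3.0, 5.0, 5.0]
print(s.interpolate().tolist())  # [1.0, 2.0, 3.0, 4.0, 5.0]
print(s.dropna().tolist())       # [1.0, 3.0, 5.0]
```

Forward and backward fill copy a neighboring value, while linear interpolation places the gap on the straight line between its two neighbors.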

7. Sorting and Unique Values

7.1 Sorting

s = pd.Series([3, 1, 4, 1, 5], index=['d', 'a', 'b', 'c', 'e'])

# Sort by value
s_sorted = s.sort_values()
print(s_sorted)

# Sort by index
s_idx_sorted = s.sort_index()
print(s_idx_sorted)

# Descending order
s_desc = s.sort_values(ascending=False)

7.2 Unique Values and Counts

s = pd.Series(['apple', 'banana', 'apple', 'cherry', 'banana'])

print(s.unique())           # array of unique values
print(s.value_counts())     # frequency of each value
print(s.nunique())          # number of unique values
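
Counts are often easier to read as proportions. A brief sketch using the normalize parameter of value_counts on the same fruit data:

```python
import pandas as pd

s = pd.Series(['apple', 'banana', 'apple', 'cherry', 'banana'])

# normalize=True returns relative frequencies that sum to 1
shares = s.value_counts(normalize=True)
print(shares['apple'])    # 0.4
print(shares['cherry'])   # 0.2
```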

8. String Operations

s = pd.Series(['Apple', 'Banana', 'Cherry', 'apple'])

# String methods are reached through the .str accessor
print(s.str.lower())       # lowercase
print(s.str.upper())       # uppercase
print(s.str.len())         # string length
print(s.str.startswith('A'))  # starts with 'A'
print(s.str.contains('app'))  # contains the substring (case-sensitive)
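
String matches are commonly used as a filter. The sketch below assumes a Series that also contains a missing entry, and uses the case and na parameters of str.contains to get a clean Boolean mask.

```python
import pandas as pd

s = pd.Series(['Apple', 'Banana', None, 'apple'])

# case=False ignores letter case; na=False treats missing entries as non-matches
mask = s.str.contains('app', case=False, na=False)

print(mask.tolist())      # [True, False, False, True]
print(s[mask].tolist())   # ['Apple', 'apple']
```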

9. Time Series

# Create a date index
dates = pd.date_range('20230101', periods=6)
ts = pd.Series(np.random.randn(6), index=dates)

print(ts)
print(ts.index)  # DatetimeIndex

# Date-based selection
print(ts.loc['2023-01-02'])   # value for a single day
print(ts.loc['2023-01'])      # every entry in January 2023 (partial string indexing)
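
Date strings also work as slice bounds. The sketch below replaces the random values with 0.0 to 5.0 so the result is predictable; that substitution is only for the example.

```python
import numpy as np
import pandas as pd

dates = pd.date_range('2023-01-01', periods=6)
ts = pd.Series(np.arange(6, dtype=float), index=dates)

# Label-based slicing on dates includes both endpoints
window = ts.loc['2023-01-02':'2023-01-04']

print(len(window))    # 3
print(window.sum())   # 6.0 (1.0 + 2.0 + 3.0)
```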

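10. Relationship to NumPy

# A Series wraps a NumPy array and adds an index
s = pd.Series(np.array([1, 2, 3]))

print(type(s.values))  # <class 'numpy.ndarray'>
print(s.values + 1)    # NumPy vectorized arithmetic

# NumPy universal functions (ufuncs) accept a Series directly
print(np.sin(s))
print(np.exp(s))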
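
One detail worth seeing in code: a ufunc applied to a Series returns a Series with the labels preserved, whereas converting to an array drops them. A minimal sketch with a labeled Series:

```python
import numpy as np
import pandas as pd

s = pd.Series([1.0, 4.0, 9.0], index=['a', 'b', 'c'])

roots = np.sqrt(s)    # result is a Series with the same index
arr = s.to_numpy()    # plain ndarray; the labels are dropped

print(type(roots).__name__)   # Series
print(roots['c'])             # 3.0
print(type(arr).__name__)     # ndarray
```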

11. Practical Examples

11.1 Sales Data Analysis

# Create the sales data
sales = pd.Series({
    '2023-01': 1000,
    '2023-02': 1200,
    '2023-03': 900,
    '2023-04': 1500,
    '2023-05': np.nan,
    '2023-06': 1800
}, dtype=float)

# Analysis (the missing month is skipped)
print(f"Total sales: {sales.sum()}")
print(f"Average sales: {sales.mean()}")
print(f"Best month: {sales.idxmax()}")

# Handle the missing value by carrying the previous month forward
sales_filled = sales.ffill()
print(f"Total sales after filling: {sales_filled.sum()}")

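11.2 Student Score Analysis

scores = pd.Series([85, 92, 78, 95, 88, np.nan, 91],
                   index=['张三', '李四', '王五', '赵六', '孙七', '周八', '吴九'])

# Pass rate among students who have a score
# (NaN >= 60 evaluates to False, so drop missing scores first)
pass_rate = (scores.dropna() >= 60).mean()
print(f"Pass rate: {pass_rate:.1%}")

# Ranking (highest score gets rank 1; a missing score stays NaN)
scores_ranked = scores.rank(ascending=False)
print("Score ranking:\n", scores_ranked)

# Fill the missing score with the mean
scores_filled = scores.fillna(scores.mean())
print(f"Average score: {scores_filled.mean():.1f}")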

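12. Performance Notes

# Avoid chained indexing when assigning: it may write to a temporary copy
# instead of the original object, and it performs two lookups instead of one
df = pd.DataFrame({'A': [1, 2, 3]})
# Bad:  df['A'][0] = 10      # may trigger SettingWithCopyWarning
# Good: df.loc[0, 'A'] = 10

# Operations on large data
large_s = pd.Series(np.random.randn(1000000))
# %timeit is an IPython/Jupyter magic and is not valid in a plain Python script
%timeit large_s.mean()  # vectorized operations are fast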
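
Outside IPython, a similar measurement can be made with the standard-library timeit module. The sketch below compares the vectorized mean with a plain Python loop; loop_mean is a helper written for this comparison, the Series is kept at 100,000 elements so the loop finishes quickly, and the timings will vary by machine.

```python
import timeit

import numpy as np
import pandas as pd

large_s = pd.Series(np.random.randn(100_000))

def loop_mean(series):
    """Mean computed one element at a time in Python (for comparison only)."""
    total = 0.0
    for value in series:
        total += value
    return total / len(series)

# Each call is repeated 10 times; timeit returns the total elapsed seconds
vectorized = timeit.timeit(large_s.mean, number=10)
looped = timeit.timeit(lambda: loop_mean(large_s), number=10)

print(f"vectorized: {vectorized:.4f} s, Python loop: {looped:.4f} s")
```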

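13. Common Errors and Solutions

13.1 SettingWithCopyWarning

# Problematic: assigning through a filtered result
df = pd.DataFrame({'A': [1, 2]})
view = df[df['A'] > 1]
view['A'] = 10  # Warning: view may be a copy, so df may not change

# Correct: select and assign in a single .loc call
df.loc[df['A'] > 1, 'A'] = 10
# Or work on an explicit copy
df_copy = df.copy()
df_copy.loc[df_copy['A'] > 1, 'A'] = 10

Series is the foundation of data handling in Pandas, and a solid grasp of its indexing, operations, and methods is a prerequisite for learning DataFrame. Practicing creation and manipulation across different scenarios is a quick way to get comfortable with the core features of Pandas.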
