Ruby Regular Expressions

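In Ruby, a regular expression (regex) is a tool for matching, searching, and processing strings. Ruby implements regular expressions through the Regexp class; combined with string methods (such as match and gsub) and operators (such as =~), they make pattern matching and text processing efficient. This article covers creating regexes, matching, common uses, and pitfalls.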

1. Creating regular expressions

Ruby creates regular expressions with a /pattern/ literal or with Regexp.new.

Literal: wrap the pattern in forward slashes.

  regex = /ruby/
  puts regex.class  # Output: Regexp

Regexp.new: builds a regular expression at runtime, for example from a string.

  regex = Regexp.new("ruby")

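Modifiers: add flags after the closing /. Common ones:
i: ignore case.
m: multiline mode, in which . also matches a newline.

  regex = /ruby/i  # ignore case
  puts "RUBY" =~ regex  # Output: 0 (the match starts at index 0)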
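
Regexp.new is most useful when the pattern comes from data rather than from source code. The sketch below is one possible way to match arbitrary text literally; literal_regex is a hypothetical helper name, not part of Ruby, while Regexp.escape and Regexp::IGNORECASE are standard.

```ruby
# Hypothetical helper: build a case-insensitive regex that matches
# the given text literally, even if it contains metacharacters.
def literal_regex(text)
  Regexp.new(Regexp.escape(text), Regexp::IGNORECASE)
end

price = literal_regex("$9.99")
p price.match?("Only $9.99 today")             # => true
p price.match?("Only $9x99 today")             # => false (the dot is escaped)
p literal_regex("Ruby").match?("I like RUBY")  # => true
```

Without Regexp.escape, the $ and . in the input would be treated as an anchor and a wildcard.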

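2. Basic regular expression syntax

Commonly used pattern elements:

Literal characters: match themselves (for example, /cat/ matches "cat").
Metacharacters:
.: matches any character except a newline (with the m modifier it matches a newline too).
*: matches the preceding element 0 or more times.
+: matches the preceding element 1 or more times.
?: matches the preceding element 0 or 1 time.
|: alternation (for example, /cat|dog/ matches "cat" or "dog").
^: matches the start of a line (use \A for the start of the whole string).
$: matches the end of a line (use \z for the end of the whole string).
\b: word boundary.
Character classes:
[abc]: matches any one of a, b, or c.
[^abc]: matches any character other than a, b, or c.
[a-z]: matches any lowercase letter from a to z.
\d: matches a digit (equivalent to [0-9]).
\w: matches a word character (ASCII letter, digit, or underscore).
\s: matches a whitespace character (space, tab, newline, and so on).
Groups: parentheses () create a capture group.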

  regex = /(\w+)@(\w+)\.com/

Quantifiers:
{n}: matches exactly n times.
{n,}: matches n or more times.
{n,m}: matches between n and m times.
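
The elements above can be combined freely. The following is a small sketch that uses character classes, quantifiers, alternation, and the optional marker together; time_like? is a hypothetical helper introduced only for illustration.

```ruby
# Hypothetical helper: does the string look like a time such as "09:30"?
def time_like?(str)
  # \A and \z anchor the whole string; \d{2} is exactly two digits;
  # [0-5] is a character class limiting the first minute digit.
  /\A\d{2}:[0-5]\d\z/.match?(str)
end

p time_like?("09:30")                # => true
p time_like?("9:30")                 # => false (only one hour digit)
p time_like?("09:75")                # => false (7 is outside [0-5])
p "color colour"[/colou?r/]          # => "color" (? makes the u optional)
p "cat dog bird".scan(/cat|dog/)     # => ["cat", "dog"]
p "a1 b22 c333".scan(/[a-z]\d{2,}/)  # => ["b22", "c333"]
```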

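3. Matching strings

Ruby provides several methods and operators for regex matching.

The =~ operator

Returns the index where the match starts, or nil if there is no match.

str = "Hello, Ruby!"
p(/Ruby/ =~ str)    # Output: 7
p(/Python/ =~ str)  # Output: nil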
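
Because =~ returns nil on failure, it fits naturally in a condition, and a successful match also fills the special variables $~ (the MatchData) and $1, $2, and so on (the capture groups). A minimal sketch, with first_number as a hypothetical helper name:

```ruby
# Hypothetical helper: pull the first integer out of a string using =~.
def first_number(str)
  if str =~ /(\d+)/
    $1.to_i  # $1 holds the first capture group of the last match
  end        # the method returns nil when nothing matched
end

p first_number("Order 42 shipped")  # => 42
p first_number("no digits here")    # => nil
```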

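The match method

Returns a MatchData object describing the match, or nil if there is no match.

str = "Hello, Ruby!"
match = /Ruby/.match(str)
puts match[0]  # Output: Ruby (the full match)
puts match.pre_match   # Output: Hello,  (text before the match)
puts match.post_match  # Output: ! (text after the match)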
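
MatchData also reports where the match sits in the string, and Regexp#match accepts a start position as its second argument. The sketch below assumes a hypothetical helper match_span that bundles these together.

```ruby
# Hypothetical helper: describe where a pattern matched.
def match_span(regex, str, start = 0)
  m = regex.match(str, start)   # second argument: index to start searching from
  return nil unless m
  [m[0], m.begin(0), m.end(0)]  # matched text, start index, end index (exclusive)
end

p match_span(/Ruby/, "Hello, Ruby! Ruby again.")     # => ["Ruby", 7, 11]
p match_span(/Ruby/, "Hello, Ruby! Ruby again.", 8)  # => ["Ruby", 13, 17]
p match_span(/Python/, "Hello, Ruby!")               # => nil
```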

The match? method

Returns only true or false and is faster, because it does not create a MatchData object or set $~.

puts(/Ruby/.match?("Hello, Ruby!"))  # Output: true

Capture groups

Use () to capture sub-patterns, then read them from the MatchData by index or by name.

str = "Email: alice@example.com"
match = /(\w+)@(\w+)\.com/.match(str)
puts match[1]  # Output: alice
puts match[2]  # Output: example

# Named captures
match = /(?<user>\w+)@(?<domain>\w+)\.com/.match(str)
puts match[:user]    # Output: alice
puts match[:domain]  # Output: example
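
When several named groups are needed at once, MatchData#named_captures returns them as a Hash keyed by group name. A short sketch, with email_parts as a hypothetical helper:

```ruby
# Hypothetical helper: split a simple "user@domain.com" address into parts.
def email_parts(str)
  m = /(?<user>\w+)@(?<domain>\w+)\.com/.match(str)
  m && m.named_captures  # Hash with String keys, or nil if there is no match
end

parts = email_parts("Email: alice@example.com")
p parts["user"]                   # => "alice"
p parts["domain"]                 # => "example"
p email_parts("no address here")  # => nil
```

Note that the Hash keys are strings, whereas MatchData#[] accepts either a symbol or a string.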

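4. Replacing and splitting

Regular expressions are commonly used to replace parts of a string and to split it.

gsub / sub: replace matched text (gsub replaces every match, sub only the first).

  str = "I love ruby and RUBY!"
  puts str.gsub(/ruby/i, "Python")  # Output: I love Python and Python!
  puts str.sub(/ruby/i, "Python")   # Output: I love Python and RUBY!

split: splits a string on a regex pattern.

  str = "apple,banana,cherry"
  p str.split(/,/)  # Output: ["apple", "banana", "cherry"]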
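
gsub and sub also accept a block or a replacement string with backreferences, and split becomes more useful with a pattern that absorbs surrounding whitespace. The helper names below (mask_digits, swap_names, fields) are hypothetical and only illustrate these forms.

```ruby
# Block form: the block's return value replaces each match.
def mask_digits(str)
  str.gsub(/\d/) { "*" }
end

# \1 and \2 in a single-quoted replacement refer to the capture groups.
def swap_names(str)
  str.sub(/(\w+) (\w+)/, '\2 \1')
end

# Split on commas or semicolons, ignoring whitespace around them.
def fields(str)
  str.split(/\s*[,;]\s*/)
end

p mask_digits("PIN 1234")          # => "PIN ****"
p swap_names("Alice Smith")        # => "Smith Alice"
p fields("apple , banana;cherry")  # => ["apple", "banana", "cherry"]
```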

5. Regular expressions and iteration

Use the scan method to iterate over every matching substring.

str = "apple banana cherry"
str.scan(/\w+/) { |word| puts word }
# Output:
# apple
# banana
# cherry
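
When the pattern contains capture groups, scan yields the groups of each match instead of the whole matched text. The sketch below uses that behavior to parse key=value pairs; parse_pairs is a hypothetical helper.

```ruby
# Hypothetical helper: parse "key=value" pairs into a Hash.
def parse_pairs(str)
  # With two capture groups, scan returns one [key, value] array per match.
  str.scan(/(\w+)=(\w+)/).to_h
end

pairs = parse_pairs("name=ruby version=3 typed=no")
p pairs["name"]     # => "ruby"
p pairs["version"]  # => "3"
p pairs.size        # => 3
```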

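6. Regular expression modifiers

i: ignore case.

  puts(/ruby/i.match?("RUBY"))  # Output: true

m: multiline mode, in which . also matches a newline.

  str = "line1\nline2"
  p str[/line1./m]  # Output: "line1\n"

x: extended mode, which ignores whitespace and comments inside the pattern.

  regex = /
    \d+  # digits
    \s+  # whitespace
    \w+  # word characters
  /x
  puts regex.match("123 abc")  # Output: 123 abc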
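
Modifiers can be combined by writing them one after another. As an illustration, the sketch below pairs m and i to pull out text that may span lines; paragraph_body is a hypothetical helper, and real HTML is better handled by a parser.

```ruby
# Hypothetical helper: text between <p> and </p>, possibly spanning lines.
def paragraph_body(html)
  # m lets . match "\n"; i also accepts <P>; the 1 selects capture group 1.
  html[/<p>(.*?)<\/p>/mi, 1]
end

p paragraph_body("<p>one\ntwo</p>")      # => "one\ntwo"
p paragraph_body("<P>hi</P>")            # => "hi"
p "<p>one\ntwo</p>"[/<p>(.*?)<\/p>/, 1]  # => nil (without m, . stops at "\n")
```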

7. Common regular expression examples

Email validation:

  regex = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
  puts regex.match?("alice@example.com")  # Output: true

Phone number (NNN-NNN-NNNN format):

  regex = /\A\d{3}-\d{3}-\d{4}\z/
  puts regex.match?("123-456-7890")  # Output: true

Extracting numbers:

  str = "Price: $99.99"
  p str.scan(/\d+\.\d{2}/)  # Output: ["99.99"]

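8. Performance

Reuse regexes: a regex literal without interpolation is compiled once when the code is parsed, so storing it in a constant mainly helps with naming and reuse. The real saving is for patterns built at runtime (with Regexp.new or #{} interpolation): build them once and keep the result instead of rebuilding them on every call.

  EMAIL_REGEX = /\A[\w+\-.]+@[a-z\d\-]+(\.[a-z]+)*\.[a-z]+\z/i
  puts EMAIL_REGEX.match?("alice@example.com")  # reuses the named constant

Prefer match?: when only a true/false answer is needed, use match? instead of match to avoid the overhead of building a MatchData.
Anchors: \A and \z (or ^ and $ for line boundaries) restrict where a match may start and end, which can let the engine reject non-matching input sooner.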
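
For a pattern assembled at runtime, one way to avoid repeated construction is to build it once outside the loop. The sketch below uses Regexp.union, which escapes each string and joins them with |; count_matching_lines is a hypothetical helper.

```ruby
# Hypothetical helper: count lines containing any of the given keywords.
def count_matching_lines(lines, keywords)
  # Build the pattern once, outside the loop, instead of once per line.
  pattern = Regexp.union(keywords)
  lines.count { |line| pattern.match?(line) }
end

log = ["error: disk full", "ok", "warning: low memory", "ok"]
p count_matching_lines(log, ["error", "warning"])  # => 2
```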

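9. Pitfalls

Encoding: regular expressions normally operate in the string's encoding (usually UTF-8). Take care with multibyte characters such as Chinese, because \w, \d, and \s match only ASCII characters.

  str = "你好 Ruby"
  puts str[/\w+/]  # Output: Ruby (\w does not match Chinese characters)
  puts str[/[\p{Han}]+/]  # Output: 你好 (matches Han characters)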
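
Beyond \p{Han}, Ruby supports general Unicode properties such as \p{L} (any letter) and Unicode-aware POSIX bracket classes such as [[:alpha:]]. A small sketch, assuming a UTF-8 source file; unicode_words is a hypothetical helper.

```ruby
# Hypothetical helper: split mixed text into runs of letters in any script.
def unicode_words(str)
  str.scan(/\p{L}+/)  # \p{L} matches a letter in any script
end

p unicode_words("你好 Ruby, héllo")  # => ["你好", "Ruby", "héllo"]
p "héllo"[/\w+/]                    # => "h" (\w is ASCII-only)
p "héllo"[/[[:alpha:]]+/]           # => "héllo"
```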

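Greedy vs. non-greedy:
Greedy (the default): * and + match as many characters as possible.
Non-greedy: *? and +? match as few characters as possible.

  str = "<b>one</b> <b>two</b>"
  puts str[/<b>.+<\/b>/]   # Output: <b>one</b> <b>two</b> (greedy)
  puts str[/<b>.+?<\/b>/]  # Output: <b>one</b> (non-greedy)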

Exceptions: a syntax error in a regular expression raises RegexpError.

  begin
    Regexp.new("[a-z")  # unclosed character class
  rescue RegexpError => e
    puts e.message  # Output (CRuby): premature end of char-class: /[a-z/
  end
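
When patterns come from user input, it can be convenient to wrap this rescue in a helper that returns nil for invalid input. The following is a minimal sketch; safe_regex is a hypothetical name.

```ruby
# Hypothetical helper: compile a user-supplied pattern, or return nil if invalid.
def safe_regex(source)
  Regexp.new(source)
rescue RegexpError
  nil
end

p safe_regex("[a-z]+").match?("abc")  # => true
p safe_regex("[a-z")                  # => nil
```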

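Boundaries: ^ and $ match at the start and end of every line, while \A and \z match only at the start and end of the whole string, so \A and \z are the right choice for exact matching.

  str = "perl\nruby"
  p str[/^ruby/]   # Output: "ruby" (^ matches at the start of the second line)
  p str[/\Aruby/]  # Output: nil (the string does not start with "ruby")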
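
The difference matters most in input validation, where a line-anchored pattern can accept a string with extra lines attached. The sketch below compares the two forms; loose_number? and strict_number? are hypothetical helpers.

```ruby
# Line anchors: true if ANY line consists only of digits.
def loose_number?(str)
  /^\d+$/.match?(str)
end

# String anchors: true only if the WHOLE string consists of digits.
def strict_number?(str)
  /\A\d+\z/.match?(str)
end

input = "123\nrm -rf"
p loose_number?(input)     # => true (the first line is all digits)
p strict_number?(input)    # => false
p strict_number?("123")    # => true
p strict_number?("123\n")  # => false (\z does not allow a trailing newline)
```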

10. Example: putting it together

module TextProcessor
  # Unanchored, with a non-capturing group, so that scan returns whole addresses
  EMAIL_REGEX = /[\w+\-.]+@[a-z\d\-]+(?:\.[a-z]+)*\.[a-z]+/i

  def self.extract_emails(text)
    text.scan(EMAIL_REGEX)
  end
end

text = "Contact: alice@example.com, bob@company.org"
p TextProcessor.extract_emails(text)  # Output: ["alice@example.com", "bob@company.org"]

# Replace phone numbers
str = "Call me at 123-456-7890 or 987-654-3210"
puts str.gsub(/\d{3}-\d{3}-\d{4}/, "XXX-XXX-XXXX")
# Output: Call me at XXX-XXX-XXXX or XXX-XXX-XXXX

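11. Summary

Ruby's regular expressions, through the Regexp class and string methods such as match, gsub, and scan, provide powerful text processing. They support a rich set of patterns (character classes, groups, quantifiers) and modifiers (i, m, x), and suit matching, replacing, and splitting. Paying attention to encoding, greedy matching, and performance (reusing compiled patterns and preferring match?) improves both efficiency and correctness.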



likuolei
