Getting Started with Machine Learning in Python

Up and running in 5 minutes + 30 minutes of hands-on practice + code you can copy and run

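1. Environment Setup (3 minutes)

# 1. Install Python (3.9+ recommended)
# 2. Install the core libraries (with pip, as below, or with conda)
#    DecisionBoundaryDisplay, used in section 4, needs scikit-learn 1.1 or newer
pip install numpy pandas scikit-learn matplotlib jupyter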
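To confirm the install worked, a quick check like the one below can help. It is only a sketch: it imports each library and prints its version, and it assumes the four packages above were installed into the interpreter you are running. Note that scikit-learn is imported under the name `sklearn`.

```python
import importlib

# Import names, not pip names: scikit-learn is imported as "sklearn"
REQUIRED = ["numpy", "pandas", "sklearn", "matplotlib"]

versions = {}
for name in REQUIRED:
    module = importlib.import_module(name)  # raises ImportError if missing
    versions[name] = getattr(module, "__version__", "unknown")

for name, version in versions.items():
    print(f"{name:<12} {version}")
```

If any import fails, rerun the pip command in the same environment that runs this script.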

Recommended tools:

Jupyter Notebook (write code and view plots in one place)

VS Code / PyCharm

Google Colab (no setup, free GPU access)

2. First Step in Machine Learning: Get to Know the Data

# Import libraries
import pandas as pd
import matplotlib.pyplot as plt
from sklearn.datasets import load_iris

# Load the classic Iris dataset (150 samples, 3 species)
iris = load_iris()
X = iris.data    # features: sepal length, sepal width, petal length, petal width
y = iris.target  # labels: 0=Setosa, 1=Versicolor, 2=Virginica

# Convert to a DataFrame and take a look
df = pd.DataFrame(X, columns=iris.feature_names)
df['species'] = y
print(df.head())

Output:

   sepal length (cm)  sepal width (cm)  petal length (cm)  petal width (cm)  species
0                5.1               3.5                1.4               0.2        0
1                4.9               3.0                1.4               0.2        0
...
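Beyond the first few rows, it usually pays to check the shape, the class balance, and missing values before training. The sketch below does that for the same Iris data; the variable names `class_counts`, `missing`, and `summary` are illustrative choices, not part of the original walkthrough.

```python
import pandas as pd
from sklearn.datasets import load_iris

iris = load_iris()
df = pd.DataFrame(iris.data, columns=iris.feature_names)
df["species"] = iris.target

# Rows and columns: 150 samples, 4 features plus the label column
print(df.shape)

# How many samples per class? Balanced classes make accuracy easier to interpret
class_counts = df["species"].value_counts().sort_index()
print(class_counts)

# Total number of missing values across the whole table
missing = int(df.isna().sum().sum())
print("missing values:", missing)

# Per-feature summary statistics (mean, std, min, quartiles, max)
summary = df.drop(columns="species").describe()
print(summary.round(2))
```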

3. The Full Workflow: From Data to Prediction (5 Steps)

from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score, classification_report

# 1. Split into training and test sets (80% train, 20% test)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

# 2. Choose a model (decision tree)
model = DecisionTreeClassifier(max_depth=3, random_state=42)

# 3. Train the model
model.fit(X_train, y_train)

# 4. Predict
y_pred = model.predict(X_test)

# 5. Evaluate
print("Accuracy:", accuracy_score(y_test, y_pred))  # typically well above 0.9 on Iris
print(classification_report(y_test, y_pred, target_names=iris.target_names))

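Sample output (illustrative; exact numbers depend on the split and the scikit-learn version). The test set has only 30 samples, so one misclassification corresponds to 29/30:

Accuracy: 0.967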
              precision    recall  f1-score   support
      setosa       1.00      1.00      1.00        10
  versicolor       1.00      0.92      0.96        13
   virginica       0.88      1.00      0.93         7
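A single 30-sample test set gives a fairly noisy accuracy estimate. One common way to get a steadier number is k-fold cross-validation; the sketch below uses 5 stratified folds with the same tree settings as above. The names `cv` and `cv_model` are illustrative, and 5 folds is an assumption rather than a recommendation from the original text.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

# 5 stratified folds: each fold keeps the 3 classes in equal proportion
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
cv_model = DecisionTreeClassifier(max_depth=3, random_state=42)

# Train and score 5 times, each time holding out a different fold
scores = cross_val_score(cv_model, X, y, cv=cv, scoring="accuracy")

print("fold accuracies:", scores.round(3))
print(f"mean accuracy: {scores.mean():.3f} (std {scores.std():.3f})")
```

The mean across folds is usually a more trustworthy figure than any single split.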

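4. Visualization: See How the Model Separates the Classes

import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay
from sklearn.tree import DecisionTreeClassifier

# Use two features so the decision boundary can be drawn in 2D
X_vis = X[:, [0, 2]]  # sepal length + petal length
model_vis = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_vis, y)

# Draw the decision regions and the data points on the same axes
# (passing ax avoids the extra empty figure that plt.figure() would create)
fig, ax = plt.subplots(figsize=(8, 6))
DecisionBoundaryDisplay.from_estimator(
    model_vis, X_vis, cmap='Pastel1', response_method="predict", ax=ax
)
ax.scatter(X_vis[:, 0], X_vis[:, 1], c=y, edgecolor='k', cmap='Set1')
ax.set_xlabel('Sepal length (cm)')
ax.set_ylabel('Petal length (cm)')
ax.set_title('Decision Tree Decision Boundary')
plt.show()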

You should see three clearly separated regions: the model has learned to tell the species apart using sepal and petal length!
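The regions in the plot come from a handful of threshold rules inside the tree. As a rough illustration, scikit-learn's `export_text` can print those rules as plain text. The sketch below refits the same two-feature tree so it runs on its own; `feature_names` and `rules` are names chosen here for the example.

```python
from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

iris = load_iris()
X_vis = iris.data[:, [0, 2]]  # sepal length + petal length
feature_names = [iris.feature_names[0], iris.feature_names[2]]

model_vis = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_vis, iris.target)

# Each indented line is one threshold test; leaves show the predicted class index
rules = export_text(model_vis, feature_names=feature_names)
print(rules)
```

Each threshold in the printout corresponds to one straight edge between regions in the plot.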

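5. Going Further: Compare 6 Models in a Few Lines of Code

from sklearn.linear_model import LogisticRegression
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.naive_bayes import GaussianNB

models = {
    'Logistic Regression': LogisticRegression(max_iter=200),  # extra iterations avoid a convergence warning
    'KNN': KNeighborsClassifier(),
    'SVM': SVC(),
    'Decision Tree': DecisionTreeClassifier(random_state=42),
    'Random Forest': RandomForestClassifier(n_estimators=100, random_state=42),
    'Naive Bayes': GaussianNB()
}

# Reuses X_train, X_test, y_train, y_test from the Iris split in section 3.
# The loop variable is named clf so it does not overwrite the earlier `model`.
for name, clf in models.items():
    clf.fit(X_train, y_train)
    acc = clf.score(X_test, y_test)
    print(f"{name:<20}: {acc:.3f}")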

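Sample output (illustrative only; exact values depend on the split and library versions, and with 30 test samples every accuracy is a multiple of 1/30):

Logistic Regression : 1.000
KNN                 : 1.000
SVM                 : 1.000
Decision Tree       : 0.967
Random Forest       : 0.967
Naive Bayes         : 1.000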
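Scores on one small test set can tie or flip by chance, and distance-based models such as KNN and SVM are sensitive to feature scale. A possible refinement, sketched below, wraps those models in a pipeline with `StandardScaler` and compares them with 5-fold cross-validation. The choice of scaler, the 5 folds, and the names `scaled_models` and `cv_means` are assumptions for illustration.

```python
from sklearn.datasets import load_iris
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

X, y = load_iris(return_X_y=True)

# The scaler is fitted inside each fold, so test folds never leak into it
scaled_models = {
    "Logistic Regression": make_pipeline(StandardScaler(), LogisticRegression(max_iter=200)),
    "KNN": make_pipeline(StandardScaler(), KNeighborsClassifier()),
    "SVM": make_pipeline(StandardScaler(), SVC()),
}

cv_means = {}
for name, pipeline in scaled_models.items():
    scores = cross_val_score(pipeline, X, y, cv=5)
    cv_means[name] = scores.mean()
    print(f"{name:<20}: {scores.mean():.3f} +/- {scores.std():.3f}")
```

On Iris the features already share similar scales, so scaling changes little here; it tends to matter more on datasets with mixed units.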

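6. Hands-On Project: Predicting House Prices (Regression)

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

# Load the California housing data (downloaded on first use).
# Each row is a census block group; the target is the median house value in units of $100,000.
data = fetch_california_housing()
X, y = data.data, data.target  # note: this reuses the names X and y from the Iris example

# Train a linear regression
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
model = LinearRegression()
model.fit(X_train, y_train)
pred = model.predict(X_test)

print("MSE:", mean_squared_error(y_test, pred))
print(f"Predicted median house value for the first test row: ${pred[0] * 100_000:,.0f}")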
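MSE is expressed in squared target units, which makes it hard to read. A small helper like the hypothetical `regression_report` below also reports RMSE (same units as the target) and R². It is demonstrated on a tiny hand-made array so it runs without downloading anything; the conversion to dollars assumes the California housing target, which is measured in units of $100,000.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def regression_report(y_true, y_pred, unit_in_usd=100_000):
    """Return MSE, RMSE, RMSE in dollars, and R^2 for a set of predictions."""
    mse = float(mean_squared_error(y_true, y_pred))
    rmse = float(np.sqrt(mse))  # back in the target's own units
    return {
        "mse": mse,
        "rmse": rmse,
        "rmse_usd": rmse * unit_in_usd,
        "r2": float(r2_score(y_true, y_pred)),
    }


# Tiny made-up example, not real housing data
y_true_demo = np.array([1.0, 2.0, 3.0, 4.0])
y_pred_demo = np.array([1.5, 2.0, 2.5, 4.0])

report = regression_report(y_true_demo, y_pred_demo)
for key, value in report.items():
    print(f"{key:<9} {value:,.3f}")
```

With the housing model, the same helper could be called as `regression_report(y_test, pred)`.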

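7. Saving and Loading a Model (First Step Toward Deployment)

import joblib

# Save the trained model to disk
joblib.dump(model, 'house_price_model.pkl')

# Load it back and use it
# (only load files you trust: joblib uses pickle, which can execute code on load)
loaded_model = joblib.load('house_price_model.pkl')
print(loaded_model.predict(X_test[:1]))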
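When preprocessing is involved, saving only the estimator loses the fitted scaler. One option, sketched below, is to save a whole pipeline and verify that the restored copy gives the same predictions. The example uses synthetic data from `make_regression` and a temporary directory so it runs anywhere; the file name `demo_pipeline.joblib` is just a placeholder.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in data with 8 features (not the real housing data)
X_demo, y_demo = make_regression(n_samples=200, n_features=8, noise=10.0, random_state=42)

# Scaler and model are fitted together and saved as one object
pipeline = make_pipeline(StandardScaler(), LinearRegression()).fit(X_demo, y_demo)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "demo_pipeline.joblib")
    joblib.dump(pipeline, path)
    restored = joblib.load(path)

# Round-trip check: the restored pipeline should predict identically
same_predictions = np.allclose(
    pipeline.predict(X_demo[:5]), restored.predict(X_demo[:5])
)
print("predictions match after reload:", same_predictions)
```

Saved models are generally only safe to load with the same scikit-learn version that wrote them, so it is worth recording that version alongside the file.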

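8. Learning Roadmap (30 Days to Get Started)

Days	Goal	Tasks
1–3	Python basics	Variables, lists, functions, classes
4–7	NumPy + Pandas	Loading, cleaning, and summarizing data
8–12	Scikit-learn	Classification, regression, evaluation
13–18	Visualization	Matplotlib, Seaborn
19–25	Kaggle practice	Titanic / House Prices
26–30	Deployment	A small web app with Flask / Streamlit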

9. Recommended Resources (Free)

Type	Resource
Course	Andrew Ng: Machine Learning
Book	Python Data Science Handbook
Platform	Kaggle (datasets + competitions)
Tool	Google Colab (free GPU)

What can you do right now?

Copy the complete code below, save it as ml_start.py, and run it:

# ===== Machine learning in 5 minutes: complete code =====
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import accuracy_score
import matplotlib.pyplot as plt
from sklearn.inspection import DecisionBoundaryDisplay

# 1. Data
iris = load_iris()
X, y = iris.data[:, [0, 2]], iris.target  # only two features, to make plotting easy

# 2. Split
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# 3. Model
model = DecisionTreeClassifier(max_depth=3, random_state=42)
model.fit(X_train, y_train)

# 4. Predict and evaluate
pred = model.predict(X_test)
print("Accuracy:", accuracy_score(y_test, pred))

# 5. Visualize the decision boundary (regions and points share one axes)
fig, ax = plt.subplots(figsize=(8, 6))
DecisionBoundaryDisplay.from_estimator(
    model, X, cmap='Pastel1', response_method="predict", ax=ax
)
ax.scatter(X[:, 0], X[:, 1], c=y, edgecolor='k', cmap='Set1')
ax.set_xlabel('Sepal length (cm)')
ax.set_ylabel('Petal length (cm)')
ax.set_title('Iris Classification - Decision Tree')
plt.show()
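Once the script runs, a natural next experiment is classifying a flower that is not in the dataset. The sketch below retrains the same two-feature tree and asks it about one made-up measurement; the values in `new_flower` are hypothetical, chosen to have a short petal like the setosa samples.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

iris = load_iris()
X, y = iris.data[:, [0, 2]], iris.target  # sepal length + petal length
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

model = DecisionTreeClassifier(max_depth=3, random_state=42).fit(X_train, y_train)

# Hypothetical flower: sepal length 5.0 cm, petal length 1.5 cm
new_flower = [[5.0, 1.5]]

predicted_class = model.predict(new_flower)[0]
predicted_name = iris.target_names[predicted_class]

# Class probabilities come from the class mix in the leaf the sample falls into
probabilities = model.predict_proba(new_flower)[0]

print("predicted species:", predicted_name)
for species, p in zip(iris.target_names, probabilities):
    print(f"  {species:<12} {p:.2f}")
```

The input must have the same two columns, in the same order, as the training data.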

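What would you like to learn next?

1. Recognize handwritten digits (MNIST) with a neural network
2. Enter a Kaggle competition (Titanic survival prediction)
3. Deploy a model as a web page (Streamlit)
4. Learn deep learning (PyTorch)

Reply with a number from 1 to 4 and I'll walk you through it right away!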


likuolei
