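Chapter 8: File Operations and the with Statement

Learning Objectives

Open, read, and write files
Understand the with statement and context managers in depth
Handle text files in different encodings
Read and write JSON files
Work with file paths using pathlib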

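8.1 File Basics

8.1.1 What Is a File

A file is a collection of data stored on disk. In Python, an open file is represented by a file object that can be read from, written to, and closed.

Files fall into two main types:

Text files: store text in some character encoding and can be opened in a text editor
Binary files: store raw bytes, such as images, audio, and video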
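
The difference between the two types shows up as soon as the same file is opened in text mode and in binary mode. The following is a minimal sketch; the file name and the temporary directory are assumptions made only for the demonstration.

```python
import os
import tempfile

# Hypothetical sample file created in a temporary directory for the demo
path = os.path.join(tempfile.mkdtemp(), "sample.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("café")

# Text mode decodes the bytes on disk into a str
with open(path, "r", encoding="utf-8") as f:
    text = f.read()

# Binary mode returns the raw bytes unchanged
with open(path, "rb") as f:
    raw = f.read()

print(type(text), len(text))  # <class 'str'> 4
print(type(raw), len(raw))    # <class 'bytes'> 5
```

The lengths differ because "é" is one character but takes two bytes in UTF-8.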

8.1.2 Opening a File

Use the open() function to open a file:

# Basic syntax
file = open("filename.txt", "r", encoding="utf-8")
# file: the file object
# "filename.txt": the file name
# "r": the open mode
# "utf-8": the character encoding

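File open modes:

Mode	Description	If the file does not exist
`r`	Read only (default)	Raises FileNotFoundError
`w`	Write only (truncates existing content)	Creates a new file
`a`	Append	Creates a new file
`x`	Exclusive creation	Creates a new file (raises FileExistsError if it already exists)
`r+`	Read and write	Raises FileNotFoundError
`w+`	Read and write (truncates existing content)	Creates a new file
`a+`	Read and append	Creates a new file

Binary mode: append b to the mode (for example rb, wb)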
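
To make the table concrete, the sketch below exercises the `x`, `a`, and `w` modes on one file. The file name and temporary directory are assumptions for illustration only.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "modes.txt")  # hypothetical file name

# "x" creates the file, and fails if it already exists
with open(path, "x", encoding="utf-8") as f:
    f.write("first\n")

try:
    open(path, "x", encoding="utf-8")
    x_failed = False
except FileExistsError:
    x_failed = True

# "a" keeps the existing content and writes at the end
with open(path, "a", encoding="utf-8") as f:
    f.write("second\n")
with open(path, "r", encoding="utf-8") as f:
    after_append = f.read()

# "w" truncates the file before writing
with open(path, "w", encoding="utf-8") as f:
    f.write("third\n")
with open(path, "r", encoding="utf-8") as f:
    after_write = f.read()

print(x_failed)            # True
print(repr(after_append))  # 'first\nsecond\n'
print(repr(after_write))   # 'third\n'
```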

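8.1.3 Closing a File

# Close manually
file = open("example.txt", "r")
content = file.read()
file.close()  # Important: remember to close the file

# Problem: if read() raises an exception, close() is never reached and the file stays open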
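
One manual fix for this problem is a try/finally block, sketched below with a hypothetical file created just for the demonstration. It is roughly what the with statement in the next subsection does on your behalf.

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "example.txt")  # hypothetical file
with open(path, "w", encoding="utf-8") as f:
    f.write("hello")

file = open(path, "r", encoding="utf-8")
try:
    content = file.read()
finally:
    # Runs whether or not read() raised an exception
    file.close()

print(content)      # hello
print(file.closed)  # True
```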

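8.1.4 The Recommended Way: the with Statement

The with statement ensures the file is closed properly after use, even if an exception occurs:

# Using the with statement
with open("example.txt", "r", encoding="utf-8") as file:
    content = file.read()
# The file is closed automatically

# Safer and more concise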

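8.2 Reading Files

8.2.1 The read() Method

# Read the entire file
with open("example.txt", "r", encoding="utf-8") as file:
    content = file.read()
    print(content)

# Read a limited number of characters (text mode counts characters, not bytes)
with open("example.txt", "r", encoding="utf-8") as file:
    content = file.read(100)  # Read the first 100 characters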
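
Calling read() with a size is most useful in a loop, so that a large file never has to fit in memory at once. A minimal sketch follows; the helper name `count_chars` and the demo file are assumptions, not part of the original chapter.

```python
import os
import tempfile

def count_chars(path: str, chunk_size: int = 4096) -> int:
    """Count characters by reading chunk_size characters at a time."""
    total = 0
    with open(path, "r", encoding="utf-8") as file:
        while True:
            chunk = file.read(chunk_size)
            if not chunk:  # read() returns "" at end of file
                break
            total += len(chunk)
    return total

# Hypothetical demo file
demo_path = os.path.join(tempfile.mkdtemp(), "big.txt")
with open(demo_path, "w", encoding="utf-8") as f:
    f.write("x" * 10000)

print(count_chars(demo_path, chunk_size=1024))  # 10000
```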

8.2.2 The readline() Method

# Read one line
with open("example.txt", "r", encoding="utf-8") as file:
    line = file.readline()
    print(line)

# Read all lines in a loop
with open("example.txt", "r", encoding="utf-8") as file:
    while True:
        line = file.readline()
        if not line:
            break
        print(line.rstrip())  # Strip the trailing newline (and other trailing whitespace)

8.2.3 The readlines() Method

# Read all lines into a list
with open("example.txt", "r", encoding="utf-8") as file:
    lines = file.readlines()
    print(lines)  # ['Line 1\n', 'Line 2\n', 'Line 3']

# Strip the trailing newlines while building the list
with open("example.txt", "r", encoding="utf-8") as file:
    lines = [line.rstrip() for line in file]
    print(lines)

8.2.4 Iterating over the File Object

The most common way to read a file:

# Iterate directly (recommended)
with open("example.txt", "r", encoding="utf-8") as file:
    for line in file:
        print(line.rstrip())

# Use enumerate to add line numbers
with open("example.txt", "r", encoding="utf-8") as file:
    for i, line in enumerate(file, 1):
        print(f"{i}: {line.rstrip()}")

8.2.5 Reading Binary Files

# Read a binary file
with open("image.png", "rb") as file:
    data = file.read()
    print(type(data))  # <class 'bytes'>

# Read an image and show its size
with open("image.png", "rb") as file:
    data = file.read()
    print(f"Image size: {len(data)} bytes")

8.3 Writing Files

8.3.1 The write() Method

# Write a text file
with open("output.txt", "w", encoding="utf-8") as file:
    file.write("Line 1\n")
    file.write("Line 2\n")
    file.write("Line 3")

# Note: write() does not add a newline automatically

8.3.2 The writelines() Method

# Write multiple lines
lines = ["Line 1\n", "Line 2\n", "Line 3\n"]

with open("output.txt", "w", encoding="utf-8") as file:
    file.writelines(lines)

# Write from a list (each item must carry its own newline)
lines = ["line1\n", "line2\n", "line3\n"]
with open("output.txt", "w", encoding="utf-8") as file:
    file.writelines(lines)

8.3.3 Append Mode

# Append to the end of the file
with open("log.txt", "a", encoding="utf-8") as file:
    file.write("New log entry\n")

# Append multiple lines
with open("log.txt", "a", encoding="utf-8") as file:
    lines = ["entry1\n", "entry2\n"]
    file.writelines(lines)

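8.3.4 Writing Binary Files

# Write binary data (these 8 bytes are the PNG file signature)
data = b"\x89PNG\r\n\x1a\n"

with open("output.bin", "wb") as file:
    file.write(data)

# Copy a binary file (src.read() loads the whole file into memory)
with open("source.png", "rb") as src:
    with open("dest.png", "wb") as dst:
        dst.write(src.read())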

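8.4 The with Statement and Context Managers

8.4.1 What Is a Context Manager

A context manager is an object that implements the context management protocol (the __enter__ and __exit__ methods), which defines how the object works with the with statement. It ensures resources are cleaned up properly after use, even if an exception occurs.

# Basic syntax of the with statement
with expression as variable:
    # use variable
    pass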
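
The order of calls made by the with statement can be observed with a small tracing class. This is an illustrative sketch; the `Tracer` class is hypothetical and exists only to record events.

```python
class Tracer:
    """Records the order in which the with statement calls its methods."""

    def __init__(self) -> None:
        self.events = []

    def __enter__(self) -> "Tracer":
        self.events.append("enter")
        return self  # this value is bound to the name after "as"

    def __exit__(self, exc_type, exc_val, exc_tb) -> bool:
        self.events.append("exit")
        return False  # do not suppress exceptions

tracer = Tracer()
with tracer as t:
    t.events.append("body")

print(tracer.events)  # ['enter', 'body', 'exit']
```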

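8.4.2 Files Are the Most Common Context Managers

# Closed automatically
with open("file.txt", "r") as f:
    data = f.read()
# f is now closed

# Closed even if an exception occurs
try:
    with open("file.txt", "r") as f:
        data = f.read()
        raise ValueError("simulated error")
except ValueError:
    pass
# The file is still closed properly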

8.4.3 Custom Context Managers

class FileManager:
    """File context manager"""

    def __init__(self, filename: str, mode: str) -> None:
        self.filename = filename
        self.mode = mode
        self.file = None

    def __enter__(self) -> object:
        """Called when entering the context"""
        self.file = open(self.filename, self.mode)
        return self.file

    def __exit__(self, exc_type, exc_val, exc_tb) -> bool:
        """Called when leaving the context"""
        if self.file:
            self.file.close()
        # Returning True would suppress the exception; False lets it propagate
        return False


# Use the custom context manager
with FileManager("test.txt", "w") as f:
    f.write("Hello, World!")

# The file is closed automatically
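
The return value of `__exit__` decides whether an exception raised inside the block propagates. The sketch below uses a hypothetical `IgnoreMissingFile` class to show suppression; the standard library's `contextlib.suppress` offers the same behavior ready-made.

```python
class IgnoreMissingFile:
    """Suppresses FileNotFoundError raised inside the with block."""

    def __enter__(self) -> "IgnoreMissingFile":
        return self

    def __exit__(self, exc_type, exc_val, exc_tb) -> bool:
        # exc_type is None when the block finished without an exception
        return exc_type is not None and issubclass(exc_type, FileNotFoundError)

reached_end = False
with IgnoreMissingFile():
    open("this_file_should_not_exist_12345.txt", "r")
    reached_end = True  # skipped: the exception ends the block early

print(reached_end)  # False

# Other exception types are not suppressed
try:
    with IgnoreMissingFile():
        raise ValueError("not suppressed")
    propagated = False
except ValueError:
    propagated = True

print(propagated)  # True
```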

8.4.4 Using contextlib

Python's contextlib module offers a more concise way to create context managers:

from contextlib import contextmanager

@contextmanager
def file_manager(filename: str, mode: str):
    """Create a context manager from a generator"""
    f = open(filename, mode)
    try:
        yield f
    finally:
        f.close()


# Usage
with file_manager("test.txt", "w") as f:
    f.write("Hello!")

8.4.5 Other Context Managers

# Lock
from threading import Lock

lock = Lock()
with lock:
    # critical section code
    pass

# Temporary directory
import tempfile
with tempfile.TemporaryDirectory() as tmpdir:
    # code that works inside the temporary directory
    pass

# Compressed file
import gzip
with gzip.open("file.txt.gz", "wt") as f:
    f.write("compressed content")

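8.5 File Pointer Operations

8.5.1 File Pointer Basics

The file pointer indicates the current read/write position:

with open("example.txt", "r") as f:
    # Get the current position
    pos = f.tell()
    print(f"Current position: {pos}")

    # Read 5 characters
    data = f.read(5)
    print(f"Read: {data}")

    # Get the position again
    pos = f.tell()
    print(f"Current position: {pos}")

    # Move the file pointer
    f.seek(0)  # Back to the beginning
    f.seek(0, 2)  # Move to the end (offset 0 from the end)
    f.seek(5)  # Move to offset 5 (in text mode, only offsets returned by tell() are reliable)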

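8.5.2 The seek() Method

# seek(offset, whence)
# whence: 0 = start of file, 1 = current position, 2 = end of file
# (nonzero offsets with whence 1 or 2 require binary mode)

with open("example.txt", "rb") as f:
    # Offset from the start
    f.seek(10)

    # Offset from the current position
    f.seek(5, 1)

    # Offset from the end (negative)
    f.seek(-10, 2)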
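
A common use of seek() is reading only the end of a file. The sketch below is one way to do it; the helper name `read_tail` and the demo file are assumptions. It computes an absolute offset instead of calling `seek(-n, 2)` directly, because a negative offset larger than the file can fail.

```python
import os
import tempfile

def read_tail(path: str, n: int) -> bytes:
    """Return the last n bytes of a file (or the whole file if it is shorter)."""
    with open(path, "rb") as f:
        f.seek(0, os.SEEK_END)    # jump to the end (os.SEEK_END == 2)
        size = f.tell()           # the position at the end is the size in bytes
        f.seek(max(size - n, 0))  # absolute offset from the start
        return f.read()

# Hypothetical demo file
demo_path = os.path.join(tempfile.mkdtemp(), "data.bin")
with open(demo_path, "wb") as f:
    f.write(b"0123456789")

print(read_tail(demo_path, 3))    # b'789'
print(read_tail(demo_path, 100))  # b'0123456789'
```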

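8.6 Character Encoding

8.6.1 Encoding Basics

An encoding defines how characters are converted to bytes. Common encodings:

UTF-8: a variable-length encoding of Unicode, compatible with ASCII
GBK: the legacy default encoding on Simplified Chinese Windows
GB2312: a Simplified Chinese encoding
ISO-8859-1: a Latin alphabet encoding (Latin-1)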
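
The same text produces different bytes under different encodings, which a short sketch with str.encode() makes visible:

```python
text = "你好"

utf8_bytes = text.encode("utf-8")
gbk_bytes = text.encode("gbk")

print(len(text))        # 2 (characters)
print(len(utf8_bytes))  # 6 (3 bytes per character in UTF-8)
print(len(gbk_bytes))   # 4 (2 bytes per character in GBK)

# ASCII characters encode to the same single byte in both
print("A".encode("utf-8") == "A".encode("gbk"))  # True

# Decoding with the matching codec restores the original text
print(gbk_bytes.decode("gbk") == text)  # True
```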

8.6.2 Reading and Writing with an Explicit Encoding

# Read a UTF-8 file (pass encoding explicitly; the default depends on the platform)
with open("file.txt", "r", encoding="utf-8") as f:
    content = f.read()

# Read a GBK-encoded file
with open("file.txt", "r", encoding="gbk") as f:
    content = f.read()

# Write a UTF-8 file
with open("file.txt", "w", encoding="utf-8") as f:
    f.write("你好，世界！")

# Write with GBK encoding
with open("file.txt", "w", encoding="gbk") as f:
    f.write("你好")

8.6.3 Handling Encoding Errors

# Ignore undecodable bytes
with open("file.txt", "r", encoding="utf-8", errors="ignore") as f:
    content = f.read()

# Replace undecodable bytes with the replacement character U+FFFD
with open("file.txt", "r", encoding="utf-8", errors="replace") as f:
    content = f.read()

# Replace undecodable bytes with backslash escape sequences
with open("file.txt", "r", encoding="utf-8", errors="backslashreplace") as f:
    content = f.read()
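
The effect of each errors value is easiest to compare on a small byte string. The sketch below uses bytes.decode(), which accepts the same errors values as open(); the sample bytes are made up for the demonstration.

```python
raw = b"abc\xffdef"  # the byte 0xFF is never valid in UTF-8

try:
    raw.decode("utf-8")  # errors="strict" is the default
    strict_failed = False
except UnicodeDecodeError:
    strict_failed = True

ignored = raw.decode("utf-8", errors="ignore")
replaced = raw.decode("utf-8", errors="replace")
escaped = raw.decode("utf-8", errors="backslashreplace")

print(strict_failed)                 # True
print(ignored)                       # abcdef
print(replaced == "abc\ufffddef")    # True
print(escaped)                       # abc\xffdef
```

Note that "ignore" silently loses data, so "replace" or "backslashreplace" is usually easier to debug.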

8.6.4 Detecting a File's Encoding

# Use the chardet library (third-party package)
import chardet

with open("file.txt", "rb") as f:
    raw_data = f.read()
    result = chardet.detect(raw_data)
    print(result)  # e.g. {'encoding': 'utf-8', 'confidence': 0.99, ...}
    encoding = result['encoding']

# Read using the detected encoding (detection is a guess; encoding may be None)
with open("file.txt", "r", encoding=encoding) as f:
    content = f.read()

8.7 Working with JSON Files

8.7.1 JSON Basics

JSON (JavaScript Object Notation) is a lightweight data interchange format.

Mapping between Python and JSON types:

Python	JSON
`dict`	`object`
`list`, `tuple`	`array`
`str`	`string`
`int`, `float`	`number`
`True`/`False`	`true`/`false`
`None`	`null`
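
Because the mapping is not one-to-one, a round trip through JSON can change Python types. A minimal sketch with made-up data:

```python
import json

original = {"point": (1, 2), 1: "int key", "ratio": 0.5, "none": None}

encoded = json.dumps(original)
decoded = json.loads(encoded)

print(encoded)           # {"point": [1, 2], "1": "int key", "ratio": 0.5, "none": null}
print(decoded["point"])  # [1, 2]  (the tuple came back as a list)
print(list(decoded))     # ['point', '1', 'ratio', 'none']  (the int key became a string)
```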

8.7.2 Writing JSON

import json

# A Python dictionary
data = {
    "name": "Alice",
    "age": 25,
    "city": "Beijing",
    "scores": [90, 85, 92],
    "active": True,
    "email": None
}

# Convert to a JSON string
json_str = json.dumps(data)
print(json_str)

# Pretty-print (ensure_ascii=False keeps non-ASCII characters readable)
json_str = json.dumps(data, indent=4, ensure_ascii=False)
print(json_str)

# Write to a file
with open("data.json", "w", encoding="utf-8") as f:
    json.dump(data, f, indent=2, ensure_ascii=False)

8.7.3 Reading JSON

import json

# Parse from a string
json_str = '{"name": "Alice", "age": 25}'
data = json.loads(json_str)
print(data)

# Read from a file
with open("data.json", "r", encoding="utf-8") as f:
    data = json.load(f)
    print(data)

8.7.4 Custom JSON Encoding

import json
from datetime import datetime

# Custom encoder
class CustomEncoder(json.JSONEncoder):
    def default(self, obj):
        if isinstance(obj, datetime):
            return obj.isoformat()
        if isinstance(obj, set):
            return list(obj)
        return super().default(obj)

# Usage
data = {
    "created_at": datetime.now(),
    "tags": {"python", "json"},
    "name": "Alice"
}

json_str = json.dumps(data, cls=CustomEncoder, ensure_ascii=False)
print(json_str)

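# Alternatively, use the default parameter
def custom_default(obj):
    if isinstance(obj, datetime):
        return obj.isoformat()
    if isinstance(obj, set):
        return list(obj)  # data["tags"] is a set, so it must be handled here too
    raise TypeError(f"Object of type {type(obj).__name__} is not JSON serializable")

json_str = json.dumps(data, default=custom_default, ensure_ascii=False)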

8.7.5 Custom JSON Decoding

import json
from datetime import datetime

# Custom decoder hook, called for every JSON object that is decoded
def custom_object_hook(d):
    if "created_at" in d:
        d["created_at"] = datetime.fromisoformat(d["created_at"])
    return d

# Usage
json_str = '{"name": "Alice", "created_at": "2024-01-15T10:30:00"}'
data = json.loads(json_str, object_hook=custom_object_hook)
print(data)

8.8 The pathlib Module

8.8.1 Introduction to pathlib

pathlib is an object-oriented path handling module available since Python 3.4. It is more intuitive than os.path:

from pathlib import Path

# Create path objects
p = Path("example.txt")
p = Path("/home/user/documents")
p = Path(".") / "subdir" / "file.txt"

8.8.2 Path Operations

from pathlib import Path

p = Path("/home/user/documents/report.txt")

# Path components
print(p.name)           # report.txt (file name)
print(p.stem)           # report (name without the extension)
print(p.suffix)         # .txt (extension)
print(p.parent)         # /home/user/documents
print(list(p.parents))  # all ancestor directories, nearest first
print(p.anchor)         # /

# Joining paths
Path("dir") / "file.txt"
Path("dir") / Path("subdir/file.txt")

# Path queries
p.is_file()          # whether it is a regular file
p.is_dir()           # whether it is a directory
p.is_absolute()      # whether it is an absolute path
p.exists()           # whether it exists
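
Path objects can also derive new paths without touching the file system. The sketch below uses PurePosixPath so the printed results are the same on every operating system; `with_suffix`, `with_name`, `relative_to`, and `parts` go slightly beyond the attributes listed above.

```python
from pathlib import PurePosixPath

p = PurePosixPath("/home/user/documents/report.txt")

print(p.with_suffix(".md"))         # /home/user/documents/report.md
print(p.with_name("summary.txt"))   # /home/user/documents/summary.txt
print(p.relative_to("/home/user"))  # documents/report.txt
print(p.parts)                      # ('/', 'home', 'user', 'documents', 'report.txt')
print([str(x) for x in p.parents])  # ['/home/user/documents', '/home/user', '/home', '/']
```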

8.8.3 Reading and Writing Files

from pathlib import Path

# Read a file
p = Path("example.txt")
content = p.read_text(encoding="utf-8")

# Write a file
p.write_text("Hello, World!", encoding="utf-8")

# Read binary data
data = p.read_bytes()

# Write binary data
p.write_bytes(b"\x00\x01\x02")

8.8.4 Directory Operations

from pathlib import Path

# Create directories
Path("new_dir").mkdir(exist_ok=True)
Path("a/b/c").mkdir(parents=True, exist_ok=True)

# List directory contents
for p in Path(".").iterdir():
    print(p.name)

# List recursively
for p in Path(".").rglob("*.py"):
    print(p)

# glob pattern matching
list(Path(".").glob("*.txt"))
list(Path(".").glob("**/*.py"))  # recursive
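
The difference between iterdir(), glob(), and rglob() is easiest to see on a small directory tree. The sketch below builds one in a temporary directory; all file and directory names are made up for the demonstration.

```python
import tempfile
from pathlib import Path

with tempfile.TemporaryDirectory() as tmpdir:
    root = Path(tmpdir)
    (root / "pkg" / "sub").mkdir(parents=True)
    (root / "main.py").write_text("", encoding="utf-8")
    (root / "notes.txt").write_text("", encoding="utf-8")
    (root / "pkg" / "mod.py").write_text("", encoding="utf-8")
    (root / "pkg" / "sub" / "deep.py").write_text("", encoding="utf-8")

    # glob() only looks in the directory itself
    top_level = sorted(p.name for p in root.glob("*.py"))
    # rglob() also descends into subdirectories
    recursive = sorted(p.relative_to(root).as_posix() for p in root.rglob("*.py"))
    # iterdir() lists files and directories one level deep
    entries = sorted(p.name for p in root.iterdir())

print(top_level)  # ['main.py']
print(recursive)  # ['main.py', 'pkg/mod.py', 'pkg/sub/deep.py']
print(entries)    # ['main.py', 'notes.txt', 'pkg']
```

The results are sorted because these methods yield entries in no guaranteed order.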

8.8.5 File Information

from pathlib import Path
from datetime import datetime

p = Path("example.txt")

# File attributes (call stat() once and reuse the result)
info = p.stat()
info.st_size       # file size in bytes
info.st_mtime      # last modification time as a Unix timestamp

# Convert to datetime
modified = datetime.fromtimestamp(info.st_mtime)
print(f"Modified: {modified}")

8.9 Worked Examples

Example 1: Copying a File

import shutil
from pathlib import Path

def copy_file(src: str, dst: str) -> None:
    """Copy a file"""
    src_path = Path(src)
    dst_path = Path(dst)

    if not src_path.exists():
        raise FileNotFoundError(f"Source file does not exist: {src}")

    # Create the destination directory
    dst_path.parent.mkdir(parents=True, exist_ok=True)

    # Copy the file (copy2 also preserves metadata such as the modification time)
    shutil.copy2(src, dst)
    print(f"Copied: {src} -> {dst}")

# Usage
copy_file("source.txt", "backup/source.txt")

Example 2: Counting Lines in a File

from pathlib import Path

def count_lines(filepath: str) -> dict[str, int]:
    """Count the lines, words, and characters in a file"""
    path = Path(filepath)

    if not path.exists():
        raise FileNotFoundError(f"File does not exist: {filepath}")

    lines = 0
    words = 0
    chars = 0

    with open(path, "r", encoding="utf-8") as f:
        for line in f:
            lines += 1
            words += len(line.split())
            chars += len(line)

    return {
        "lines": lines,
        "words": words,
        "chars": chars
    }

# Usage
result = count_lines("example.txt")
print(f"Lines: {result['lines']}")
print(f"Words: {result['words']}")
print(f"Characters: {result['chars']}")

Example 3: Reading and Writing a Configuration File

import json
from pathlib import Path

class Config:
    """A simple configuration manager"""

    def __init__(self, config_file: str = "config.json") -> None:
        self.config_file = Path(config_file)
        self.config: dict = {}
        self.load()

    def load(self) -> None:
        """Load the configuration, falling back to defaults if the file is missing"""
        if self.config_file.exists():
            with open(self.config_file, "r", encoding="utf-8") as f:
                self.config = json.load(f)
        else:
            self.config = self.get_default_config()

    def save(self) -> None:
        """Save the configuration"""
        with open(self.config_file, "w", encoding="utf-8") as f:
            json.dump(self.config, f, indent=2, ensure_ascii=False)

    def get(self, key: str, default=None):
        return self.config.get(key, default)

    def set(self, key: str, value) -> None:
        self.config[key] = value

    @staticmethod
    def get_default_config() -> dict:
        return {
            "theme": "light",
            "language": "zh-CN",
            "font_size": 14,
            "auto_save": True
        }

# Usage
config = Config("settings.json")
print(config.get("theme"))
config.set("theme", "dark")
config.save()

Example 4: Working with a Log File

from datetime import datetime
from pathlib import Path

class SimpleLogger:
    """A simple logger"""

    def __init__(self, log_file: str = "app.log") -> None:
        self.log_file = Path(log_file)

    def _write(self, level: str, message: str) -> None:
        timestamp = datetime.now().strftime("%Y-%m-%d %H:%M:%S")
        log_line = f"[{timestamp}] [{level}] {message}\n"

        with open(self.log_file, "a", encoding="utf-8") as f:
            f.write(log_line)

    def info(self, message: str) -> None:
        self._write("INFO", message)

    def warning(self, message: str) -> None:
        self._write("WARNING", message)

    def error(self, message: str) -> None:
        self._write("ERROR", message)

    def read_logs(self, lines: int = 10) -> list[str]:
        """Read the most recent log entries"""
        with open(self.log_file, "r", encoding="utf-8") as f:
            all_lines = f.readlines()
            return all_lines[-lines:]

# Usage
logger = SimpleLogger("app.log")
logger.info("Program started")
logger.info("Handling user request")
logger.warning("Memory usage is high")
logger.error("Failed to connect to the database")

# Read the logs
recent_logs = logger.read_logs()
for log in recent_logs:
    print(log.rstrip())

Example 5: Working with CSV Files

import csv
from pathlib import Path

# Write CSV
def write_csv(filepath: str, data: list[list[str]]) -> None:
    with open(filepath, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerows(data)

# Read CSV (newline="" lets the csv module handle line endings itself)
def read_csv(filepath: str) -> list[list[str]]:
    with open(filepath, "r", newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        return list(reader)

# Write
data = [
    ["Name", "Age", "City"],
    ["Alice", "25", "Beijing"],
    ["Bob", "30", "Shanghai"],
    ["Charlie", "28", "Guangzhou"]
]
write_csv("users.csv", data)

# Read
users = read_csv("users.csv")
for row in users:
    print(",".join(row))

# Use DictReader to get each row as a dict keyed by the header row
def read_csv_dict(filepath: str) -> list[dict]:
    with open(filepath, "r", newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        return list(reader)

users_dict = read_csv_dict("users.csv")
for user in users_dict:
    print(f"Name: {user['Name']}, Age: {user['Age']}, City: {user['City']}")
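
The writing counterpart of DictReader is csv.DictWriter. The following is a minimal sketch; the helper name `write_csv_dict`, the field names, and the demo file path are assumptions for illustration.

```python
import csv
import os
import tempfile

def write_csv_dict(filepath: str, rows: list[dict], fieldnames: list[str]) -> None:
    """Write a list of dicts as CSV, with a header row taken from fieldnames."""
    with open(filepath, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        writer.writerows(rows)

# Hypothetical demo file and data
demo_path = os.path.join(tempfile.mkdtemp(), "users.csv")
rows = [
    {"name": "Alice", "age": "25"},
    {"name": "Bob", "age": "30"},
]
write_csv_dict(demo_path, rows, fieldnames=["name", "age"])

# Read the file back to confirm the round trip
with open(demo_path, "r", newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))

print(loaded == rows)  # True
```

All values come back as strings, which is why the sample ages are written as strings in the first place.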

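Best Practices

Always use the with statement: it ensures files are closed properly

# ✓ Recommended
with open("file.txt", "r", encoding="utf-8") as f:
    data = f.read()

# ✗ Not recommended
f = open("file.txt", "r", encoding="utf-8")
data = f.read()
f.close()  # Easy to forget

Specify the encoding explicitly: it avoids cross-platform problems

# ✓ Recommended
with open("file.txt", "r", encoding="utf-8") as f:
    data = f.read()

# ✗ Not recommended (relies on the system default encoding)
with open("file.txt", "r") as f:
    data = f.read()

Use pathlib for paths: it is clearer and more object-oriented

# ✓ Recommended
from pathlib import Path
path = Path("dir") / "file.txt"

# ✗ Not recommended
import os
path = os.path.join("dir", "file.txt")

Use JSON for structured data: it is easy to read and debug
Handle exceptions: file operations can fail

try:
    with open("file.txt", "r", encoding="utf-8") as f:
        data = f.read()
except FileNotFoundError:
    print("File does not exist")
except PermissionError:
    print("Permission denied")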
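
In practice the same pattern is often wrapped in a small helper that returns a fallback value instead of printing. A hedged sketch, where the function name `read_text_or_default` and the file name are hypothetical:

```python
def read_text_or_default(path: str, default: str = "") -> str:
    """Return the file's text, or default if it is missing or unreadable."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except (FileNotFoundError, PermissionError):
        return default

print(read_text_or_default("no_such_file_12345.txt", default="fallback"))  # fallback

# Both exceptions are subclasses of OSError
print(issubclass(FileNotFoundError, OSError))  # True
print(issubclass(PermissionError, OSError))    # True
```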

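Exercises

Exercise 8.1: Reading a File

Write a program that reads a text file and:

Counts the lines
Counts the words
Counts the characters
Finds the longest line

Exercise 8.2: Copying a File

Implement a file copy function that supports both text and binary files.

Exercise 8.3: Logging System

Create a simple logging system that writes log entries to a file and splits the log files by date.

Exercise 8.4: JSON Configuration

Create a configuration management system that supports:

Reading a JSON configuration file
Modifying configuration items
Saving the configuration

Exercise 8.5: Directory Traversal

Use pathlib to walk a directory and print every .py file along with its size.

Exercise 8.6: CSV Processing

Implement a CSV contact manager that supports:

Adding a contact
Listing contacts
Searching contacts

Exercise 8.7: File Search

Implement a file search tool that finds all files of a given type under a given directory.

Exercise 8.8: Context Manager

Create a context manager that times the execution of a code block.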

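Chapter Summary

This chapter covered Python file operations in detail:

File basics:
- Opening, reading, writing, and closing files
- The different open modes
The with statement:
- Automatic resource management
- Custom context managers
Encoding handling:
- UTF-8, GBK, and other encodings
- Handling encoding errors
JSON operations:
- Reading and writing JSON files
- Custom encoders
The pathlib module:
- Object-oriented path handling
- Directory and file operations
Practical applications:
- File copying and statistics
- Configuration management and logging

These are fundamental Python skills that come up constantly in real-world development.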
