Preventing Certain Fields from Being Pickled in Python

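In Python, if you want to keep certain fields out of a pickle, you can customize the pickling behavior with the __reduce__() method. __reduce__() returns a tuple containing a callable that is invoked to rebuild the object when it is unpickled, together with the arguments passed to that callable. Below is the problem I ran into and the solutions I arrived at.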


1. Problem Background

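When serializing objects with Python's pickle module, we sometimes want to exclude certain fields so that they are not written to the pickle. These fields may hold sensitive information, or they may simply be temporary values that should not be persisted.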
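To make the problem concrete, here is a minimal sketch of the default behavior. The Session class and its field names are hypothetical and exist only for this illustration. It shows that, without any customization, every entry of the instance's __dict__ ends up in the pickled bytes.

```python
import pickle


class Session:
    """Hypothetical class used only for illustration."""

    def __init__(self, user, token):
        self.user = user
        self.token = token  # pretend this value is sensitive


payload = pickle.dumps(Session("alice", "s3cr3t-token"))

# With no customization, the token is stored in the payload as plain bytes.
leaked = b"s3cr3t-token" in payload
print(leaked)  # True
```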

2. Solutions

There are several ways to keep certain fields from being pickled.

Using the __getstate__ and __setstate__ methods

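__getstate__ and __setstate__ are special methods recognized by the pickle protocol that let us customize how an object is serialized and deserialized. By overriding them, we control which fields are pickled.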

class Something(object):
    def __init__(self):
        self._thing_id = 0
        self._cached_thing = None

    def __getstate__(self):
        # Pickle only the `_thing_id` field
        return {'_thing_id': self._thing_id}

    def __setstate__(self, state):
        # Restore `_thing_id` from `state`
        self._thing_id = state['_thing_id']
        # __init__ is not called on unpickling, so reset the cache here
        self._cached_thing = None
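A quick round-trip check of this approach could look like the sketch below. It repeats the class so that it runs on its own, and the cached string is placeholder data chosen for the demonstration.

```python
import pickle


class Something:
    def __init__(self):
        self._thing_id = 0
        self._cached_thing = None

    def __getstate__(self):
        # Only `_thing_id` goes into the pickle.
        return {'_thing_id': self._thing_id}

    def __setstate__(self, state):
        # __init__ is not called on unpickling, so reset the cache here.
        self._thing_id = state['_thing_id']
        self._cached_thing = None


original = Something()
original._thing_id = 42
original._cached_thing = "expensive-to-compute value"  # placeholder data

payload = pickle.dumps(original)
restored = pickle.loads(payload)

print(restored._thing_id)                  # 42
print(restored._cached_thing)              # None
print(b"expensive-to-compute" in payload)  # False
```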

Using the __getnewargs__ and __getnewargs_ex__ methods

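__getnewargs__ and __getnewargs_ex__ are special methods that let us choose the arguments passed to __new__ when the object is unpickled (pickle protocol 2 and later). They do not filter the instance's __dict__ by themselves: it is still pickled in full unless __getstate__ says otherwise. They therefore keep a field out of the pickle only when combined with a __getstate__ that omits it. If both methods are defined, __getnewargs_ex__ is used and __getnewargs__ is ignored.

class Something(object):
    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None

    def __getnewargs__(self):
        # Positional arguments passed to __new__ on unpickling
        return (self._thing_id,)

    def __getnewargs_ex__(self):
        # Positional and keyword arguments passed to __new__ on unpickling;
        # when defined, this is used instead of __getnewargs__
        return (self._thing_id,), {}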
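The difference can be seen in a small sketch. The class names Plain and Filtered are hypothetical, and the cached string is placeholder data. Plain defines only __getnewargs__, so its cache still reaches the pickle. Filtered rebuilds its fields in __new__ and adds a __getstate__ that saves nothing else.

```python
import pickle


class Plain:
    """Only __getnewargs__: the whole __dict__ is still pickled."""

    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None

    def __getnewargs__(self):
        return (self._thing_id,)


class Filtered:
    """__getnewargs__ feeds __new__; __getstate__ keeps the cache out."""

    def __new__(cls, thing_id):
        # pickle calls __new__ (not __init__) with the __getnewargs__ tuple.
        self = super().__new__(cls)
        self._thing_id = thing_id
        self._cached_thing = None
        return self

    def __getnewargs__(self):
        return (self._thing_id,)

    def __getstate__(self):
        # Nothing else to save: __new__ already rebuilds every field.
        return {}


plain = Plain(7)
plain._cached_thing = "cached-blob"  # placeholder cache content
filtered = Filtered(7)
filtered._cached_thing = "cached-blob"

plain_payload = pickle.dumps(plain)
filtered_payload = pickle.dumps(filtered)
restored = pickle.loads(filtered_payload)

print(b"cached-blob" in plain_payload)             # True
print(b"cached-blob" in filtered_payload)          # False
print(restored._thing_id, restored._cached_thing)  # 7 None
```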

Using the __reduce__ method

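__reduce__ is a special method that lets us customize how an object is serialized. It returns the callable and the arguments used to rebuild the object, so by overriding it we control which fields end up in the pickle.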

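class Something(object):
    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None

    def __reduce__(self):
        # Rebuild the object by calling the class with only `_thing_id`;
        # __init__ then runs again and resets `_cached_thing` to None
        return (self.__class__, (self._thing_id,))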
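As a rough check of the __reduce__ approach, the sketch below pickles an object whose cache has been filled and confirms that the restored copy starts with an empty cache. The cached list is placeholder data.

```python
import pickle


class Something:
    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None

    def __reduce__(self):
        # (callable, args): unpickling calls Something(self._thing_id),
        # so __init__ runs again and the cache starts out empty.
        return (self.__class__, (self._thing_id,))


original = Something(3)
original._cached_thing = [1, 2, 3]  # placeholder cached value

restored = pickle.loads(pickle.dumps(original))
print(restored._thing_id)      # 3
print(restored._cached_thing)  # None
```

One design consequence: because the object is rebuilt through the constructor, a subclass whose __init__ takes different arguments would need its own __reduce__.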

Using a _blacklist variable

We can use a _blacklist variable to name the fields that should not be pickled, and have __getstate__ use it to filter those fields out.

class Something(object):
    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None

    # Blacklist of fields to exclude from pickling
    _blacklist = ['_cached_thing']

    def __getstate__(self):
        # Pickle every field except those listed in `_blacklist`
        return {k: v for k, v in self.__dict__.items() if k not in self._blacklist}
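One caveat with filtering in __getstate__ alone is that the excluded attributes do not exist on the unpickled object. A possible way to handle this, sketched here with a hypothetical BlacklistPickleMixin helper that is not part of any library, is to add a __setstate__ that puts the excluded names back as None.

```python
import pickle


class BlacklistPickleMixin:
    """Hypothetical helper: subclasses list excluded attributes in `_blacklist`."""

    _blacklist = ()

    def __getstate__(self):
        return {k: v for k, v in self.__dict__.items()
                if k not in self._blacklist}

    def __setstate__(self, state):
        self.__dict__.update(state)
        # Excluded attributes come back as None so later code can test them.
        for name in self._blacklist:
            self.__dict__.setdefault(name, None)


class Something(BlacklistPickleMixin):
    _blacklist = ('_cached_thing',)

    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None


obj = Something(5)
obj._cached_thing = "big cached result"  # placeholder data
restored = pickle.loads(pickle.dumps(obj))
print(restored.__dict__)  # {'_thing_id': 5, '_cached_thing': None}
```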

Using a naming convention

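To avoid declaring a _blacklist variable in every class, we can mark the fields that should not be pickled with a naming convention instead, for example by giving them a _cached_ prefix such as _cached_thing. __getstate__ can then simply drop every field whose name starts with _cached_.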

class Something(object):
    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None

    def __getstate__(self):
        # Pickle every field except those whose names start with "_cached_"
        return {k: v for k, v in self.__dict__.items() if not k.startswith('_cached_')}
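The naming convention pairs naturally with lazy recomputation. The sketch below adds a hypothetical thing property (not in the class above) that rebuilds the cache on first access, using a formatted string as a stand-in for a costly lookup. It tolerates the attribute being absent right after unpickling.

```python
import pickle


class Something:
    def __init__(self, thing_id):
        self._thing_id = thing_id
        self._cached_thing = None

    def __getstate__(self):
        return {k: v for k, v in self.__dict__.items()
                if not k.startswith('_cached_')}

    @property
    def thing(self):
        # getattr with a default copes with the attribute being absent
        # right after unpickling.
        if getattr(self, '_cached_thing', None) is None:
            # Stand-in for a costly lookup.
            self._cached_thing = f"thing-{self._thing_id}"
        return self._cached_thing


obj = Something(9)
_ = obj.thing  # populate the cache before pickling
restored = pickle.loads(pickle.dumps(obj))

has_cache_after_load = '_cached_thing' in restored.__dict__
print(has_cache_after_load)  # False
print(restored.thing)        # thing-9
```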

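The same idea applies to sensitive data. Consider a class MyClass with two fields, sensitive_data and non_sensitive_data. By defining __reduce__(), we specify the callable used to rebuild the object and its arguments. If we pass only non_sensitive_data and leave out self.sensitive_data, the sensitive data never reaches the pickle.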

You can tailor __reduce__() to your own needs to choose which fields are pickled and which are not.
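A minimal sketch of such a MyClass follows. The constructor signature (non_sensitive_data first, sensitive_data optional) and the sample values are assumptions made for this illustration.

```python
import pickle


class MyClass:
    # Assumed constructor signature; sensitive_data defaults to None.
    def __init__(self, non_sensitive_data, sensitive_data=None):
        self.non_sensitive_data = non_sensitive_data
        self.sensitive_data = sensitive_data

    def __reduce__(self):
        # Only non_sensitive_data is handed to the constructor on unpickling.
        return (self.__class__, (self.non_sensitive_data,))


obj = MyClass("public profile", sensitive_data="hunter2")
payload = pickle.dumps(obj)
restored = pickle.loads(payload)

print(restored.non_sensitive_data)  # public profile
print(restored.sensitive_data)      # None
print(b"hunter2" in payload)        # False
```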

If you have any questions, feel free to leave a comment to discuss.
