Python Cool Library Tour: Third-Party Library Pandas (128)

I. Usage in Detail

571. pandas.DataFrame.T property

571-1. Syntax

# 571. pandas.DataFrame.T property
property DataFrame.T
The transpose of the DataFrame.

Returns:
DataFrame
The transposed DataFrame.

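571-2. Parameters

None.

571-3. Function

T is a property that transposes a DataFrame: rows become columns and columns become rows. It is an accessor for the transpose() method.

571-4. Return value

Returns a new DataFrame in which the rows and columns of the original DataFrame are swapped.

571-5. Notes

571-5-1. The dtypes of the transposed DataFrame may differ from the original, especially when the original columns do not all share one dtype. In that case the transposed columns typically fall back to the object dtype.

571-5-2. Transposing a DataFrame may increase memory usage, especially when the DataFrame is very large.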
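The dtype note above can be checked with a small sketch. The frames `mixed` and `uniform` and their column names are made up for illustration; the sketch only assumes that one frame mixes integers with strings while the other holds a single integer dtype.

```python
import pandas as pd

# Hypothetical mixed-dtype frame: one integer column, one string column.
mixed = pd.DataFrame({"id": [1, 2], "name": ["x", "y"]})
mixed_t = mixed.T

# Hypothetical single-dtype frame for comparison.
uniform = pd.DataFrame({"A": [5, 11], "B": [10, 24]})
uniform_t = uniform.T

print(mixed.dtypes)      # one dtype per original column
print(mixed_t.dtypes)    # each transposed column must hold an int and a string
print(uniform_t.dtypes)  # the integer dtype survives the transpose
```

Each column of `mixed_t` has to store both an integer and a string, so pandas falls back to a generic dtype, while `uniform_t` keeps the integer dtype of its source.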

571-6. Usage

571-6-1. Data preparation

None.

571-6-2. Code example

# 571. pandas.DataFrame.T property
import pandas as pd
# Create a sample DataFrame
data = {'A': [5, 11], 'B': [10, 24]}
df = pd.DataFrame(data)
# Transpose it
transposed_df = df.T
# Print the result
print(transposed_df)

571-6-3. Output

# 571. pandas.DataFrame.T property
#     0   1
# A   5  11
# B  10  24

572. pandas.DataFrame.transpose method

572-1. Syntax

# 572. pandas.DataFrame.transpose method
pandas.DataFrame.transpose(*args, copy=False)
Transpose index and columns.

Reflect the DataFrame over its main diagonal by writing rows as columns and vice-versa. The property T is an accessor to the method transpose().

Parameters:
*args : tuple, optional
Accepted for compatibility with NumPy.
copy : bool, default False
Whether to copy the data after transposing, even for DataFrames with a single dtype.
Note that a copy is always required for mixed dtype DataFrames, or for DataFrames with any extension types.

Note
The copy keyword will change behavior in pandas 3.0. Copy-on-Write will be enabled by default, which means that all methods with a copy keyword will use a lazy copy mechanism to defer the copy and ignore the copy keyword. The copy keyword will be removed in a future version of pandas.
You can already get the future behavior and improvements through enabling copy on write pd.options.mode.copy_on_write = True

Returns:
DataFrame
The transposed DataFrame.

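572-2. Parameters

572-2-1. *args (optional): a tuple accepted only for compatibility with NumPy. It does not select index levels: on a DataFrame with a MultiIndex, all levels are transposed together.

572-2-2. copy (optional, default False): Boolean. If True, the data is copied after transposing, even for a DataFrame with a single dtype; if False, a view is returned where possible. A copy is always made for mixed-dtype DataFrames and for DataFrames with extension types. With Copy-on-Write enabled (the default from pandas 3.0), the keyword is ignored.

572-3. Function

Transposes a DataFrame, that is, swaps its rows and columns.

572-4. Return value

Returns a new DataFrame in which the rows and columns are swapped.

572-5. Notes

572-5-1. When the DataFrame has a MultiIndex on its rows or columns, the whole index moves to the other axis. Individual levels cannot be picked for transposition through the method's arguments.

572-5-2. When working with large datasets, keep the copy parameter in mind to avoid unnecessary memory consumption.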
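As a minimal sketch of how transpose() treats a MultiIndex, the example below builds a frame whose columns have two levels and transposes it. The frame `wide` and its labels ("sales", "cost", "north", and so on) are invented for illustration.

```python
import pandas as pd

# Hypothetical frame whose columns form a two-level MultiIndex.
columns = pd.MultiIndex.from_tuples(
    [("sales", "q1"), ("sales", "q2"), ("cost", "q1")],
    names=["metric", "quarter"],
)
wide = pd.DataFrame(
    [[1, 2, 3], [4, 5, 6]],
    index=["north", "south"],
    columns=columns,
)

tall = wide.transpose()

# Both column levels move to the row index together.
print(list(tall.index.names))
print(tall.shape)

# .T is shorthand for the same call.
print(tall.equals(wide.T))
```

To move only one level between axes, stack() and unstack() are the usual tools; they are outside the scope of this section.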

572-6. Usage

572-6-1. Data preparation

None.

572-6-2. Code example

# 572. pandas.DataFrame.transpose method
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Transpose the DataFrame
transposed_df = df.transpose()
print(transposed_df)

572-6-3. Output

# 572. pandas.DataFrame.transpose method
#    0  1  2
# A  1  2  3
# B  4  5  6

573. pandas.DataFrame.assign method

573-1. Syntax

# 573. pandas.DataFrame.assign method
pandas.DataFrame.assign(**kwargs)
Assign new columns to a DataFrame.

Returns a new object with all original columns in addition to new ones. Existing columns that are re-assigned will be overwritten.

Parameters:
**kwargs : dict of {str: callable or Series}
The column names are keywords. If the values are callable, they are computed on the DataFrame and assigned to the new columns. The callable must not change input DataFrame (though pandas doesn't check it). If the values are not callable, (e.g. a Series, scalar, or array), they are simply assigned.

Returns:
DataFrame
A new DataFrame with the new columns in addition to all the existing columns.

Notes
Assigning multiple columns within the same assign is possible. Later items in '**kwargs' may refer to newly created or modified columns in 'df'; items are computed and assigned into 'df' in order.

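573-2. Parameters

573-2-1. **kwargs (optional): keyword arguments in which each keyword is a column name and each value is that column's content. A value can be a scalar, a Series, an array, or a callable that receives the DataFrame and returns the column values.

573-3. Function

Add new columns: pass a new column name together with a value or computation to add a column on top of the original DataFrame.
Modify existing columns: if the column name already exists in the DataFrame, that column's values are overwritten.
Computed columns: a callable (for example a lambda) can generate a column dynamically from the DataFrame.

573-4. Return value

Returns a new DataFrame containing the original DataFrame's data plus the columns added or modified by assign. The original DataFrame is not modified.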
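Two properties of assign are easy to confirm in a short sketch: keyword arguments are evaluated in order, so a later callable can use a column created earlier in the same call, and the calling DataFrame is left untouched. The column names C and D are chosen only for illustration.

```python
import pandas as pd

df = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# C is created first; the callable for D can already see C.
result = df.assign(
    C=lambda x: x["A"] + x["B"],
    D=lambda x: x["C"] * 10,
)

print(result)
print(list(df.columns))  # the original frame still has only A and B
```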

573-5. Notes

None.

573-6. Usage

573-6-1. Data preparation

None.

573-6-2. Code example

# 573. pandas.DataFrame.assign method
import pandas as pd
# Create a sample DataFrame
df = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
# Use assign to add a new column and update an existing one
new_df = df.assign(
    C=lambda x: x['A'] + x['B'],  # Add new column C = A + B (uses the original B)
    B=lambda x: x['B'] * 2        # Overwrite existing column B with doubled values
)
print(new_df)

573-6-3. Output

# 573. pandas.DataFrame.assign method
#    A   B  C
# 0  1   8  5
# 1  2  10  7
# 2  3  12  9

574. pandas.DataFrame.compare method

574-1. Syntax

# 574. pandas.DataFrame.compare method
pandas.DataFrame.compare(other, align_axis=1, keep_shape=False, keep_equal=False, result_names=('self', 'other'))
Compare to another DataFrame and show the differences.

Parameters:
other : DataFrame
Object to compare with.
align_axis : {0 or 'index', 1 or 'columns'}, default 1
Determine which axis to align the comparison on.
0, or 'index': Resulting differences are stacked vertically with rows drawn alternately from self and other.
1, or 'columns': Resulting differences are aligned horizontally with columns drawn alternately from self and other.
keep_shape : bool, default False
If true, all rows and columns are kept. Otherwise, only the ones with different values are kept.
keep_equal : bool, default False
If true, the result keeps values that are equal. Otherwise, equal values are shown as NaNs.
result_names : tuple, default ('self', 'other')
Set the dataframes names in the comparison.
New in version 1.5.0.

Returns:
DataFrame
DataFrame that shows the differences stacked side by side.
The resulting index will be a MultiIndex with 'self' and 'other' stacked alternately at the inner level.

Raises:
ValueError
When the two DataFrames don't have identical labels or shape.

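574-2. Parameters

574-2-1. other (required): the DataFrame to compare with. It must have the same index and column labels as the calling DataFrame; otherwise a ValueError is raised.

574-2-2. align_axis (optional, default 1): 0 or 'index', 1 or 'columns'. Specifies the axis along which the comparison is laid out. With 0, differences are stacked vertically, with rows drawn alternately from self and other; with 1, they are aligned horizontally, with columns drawn alternately from self and other.

574-2-3. keep_shape (optional, default False): Boolean. If True, all rows and columns are kept in the result, even those that are identical in both DataFrames. By default, only rows and columns that contain a difference are shown.

574-2-4. keep_equal (optional, default False): Boolean. If True, equal values are kept in the result; otherwise equal values are shown as NaN.

574-2-5. result_names (optional, default ('self', 'other')): a tuple of two names that label the values taken from the calling DataFrame and from other in the result. Added in pandas 1.5.0.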
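The effect of align_axis and result_names is easiest to see next to each other. The sketch below reuses two small frames and the made-up labels "old" and "new"; it assumes pandas 1.5.0 or later, where result_names is available.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})
df2 = pd.DataFrame({"A": [1, 2, 4], "B": [4, 7, 6]})

# Default layout: only rows and columns with a difference are kept,
# and equal cells are shown as NaN.
side_by_side = df1.compare(df2, result_names=("old", "new"))
print(side_by_side)

# align_axis=0 stacks the two sources as alternating rows instead.
stacked = df1.compare(df2, align_axis=0, result_names=("old", "new"))
print(stacked)
```

In `side_by_side` the labels appear as the inner column level; in `stacked` they appear as the inner row level.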

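574-3. Function

Provides a detailed comparison of two DataFrames so that differences are easy to inspect. You can choose whether to keep equal values and unchanged rows and columns, which controls how much of the data appears in the output.

574-4. Return value

Returns a DataFrame that shows the differing values:

With the default align_axis=1, the row index comes from the original DataFrames, and each differing column is split into two sub-columns labeled with the names given in result_names.
If there are no differences (and keep_shape is False), an empty DataFrame is returned.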
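A short sketch of the two edge cases: comparing a frame with an identical copy, and comparing frames whose labels differ. The frame `other` with a column named C is hypothetical and exists only to trigger the error.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2, 3], "B": [4, 5, 6]})

# Identical data: nothing differs, so the default result is empty.
no_diff = df1.compare(df1.copy())
print(no_diff.empty)

# Frames with different column labels cannot be compared.
other = pd.DataFrame({"A": [1, 2, 3], "C": [4, 5, 6]})
try:
    df1.compare(other)
    raised = False
except ValueError as exc:
    raised = True
    print(exc)
```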

574-5. Notes

None.

574-6. Usage

574-6-1. Data preparation

None.

574-6-2. Code example

# 574. pandas.DataFrame.compare method
import pandas as pd
# Create sample DataFrames
df1 = pd.DataFrame({'A': [1, 2, 3], 'B': [4, 5, 6]})
df2 = pd.DataFrame({'A': [1, 2, 4], 'B': [4, 7, 6]})
# Compare them, keeping every row and column as well as equal values
result = df1.compare(df2, keep_shape=True, keep_equal=True)
print(result)

574-6-3. Output

# 574. pandas.DataFrame.compare method
#      A          B      
#   self other self other
# 0    1     1    4     4
# 1    2     2    5     7
# 2    3     4    6     6

575. pandas.DataFrame.join method

575-1. Syntax

# 575. pandas.DataFrame.join method
pandas.DataFrame.join(other, on=None, how='left', lsuffix='', rsuffix='', sort=False, validate=None)
Join columns of another DataFrame.

Join columns with other DataFrame either on index or on a key column. Efficiently join multiple DataFrame objects by index at once by passing a list.

Parameters:
other : DataFrame, Series, or a list containing any combination of them
Index should be similar to one of the columns in this one. If a Series is passed, its name attribute must be set, and that will be used as the column name in the resulting joined DataFrame.
on : str, list of str, or array-like, optional
Column or index level name(s) in the caller to join on the index in other, otherwise joins index-on-index. If multiple values given, the other DataFrame must have a MultiIndex. Can pass an array as the join key if it is not already contained in the calling DataFrame. Like an Excel VLOOKUP operation.
how : {'left', 'right', 'outer', 'inner', 'cross'}, default 'left'
How to handle the operation of the two objects.
left: use calling frame's index (or column if on is specified)
right: use other's index.
outer: form union of calling frame's index (or column if on is specified) with other's index, and sort it lexicographically.
inner: form intersection of calling frame's index (or column if on is specified) with other's index, preserving the order of the calling's one.
cross: creates the cartesian product from both frames, preserves the order of the left keys.
lsuffix : str, default ''
Suffix to use from left frame's overlapping columns.
rsuffix : str, default ''
Suffix to use from right frame's overlapping columns.
sort : bool, default False
Order result DataFrame lexicographically by the join key. If False, the order of the join key depends on the join type (how keyword).
validate : str, optional
If specified, checks if join is of specified type.
"one_to_one" or "1:1": check if join keys are unique in both left and right datasets.
"one_to_many" or "1:m": check if join keys are unique in left dataset.
"many_to_one" or "m:1": check if join keys are unique in right dataset.
"many_to_many" or "m:m": allowed, but does not result in checks.
New in version 1.5.0.

Returns:
DataFrame
A dataframe containing columns from both the caller and other.

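575-2. Parameters

575-2-1. other (required): a DataFrame, a Series, or a list containing any combination of them. If a Series is passed, its name attribute must be set; the name becomes the column name in the result.

575-2-2. on (optional, default None): str, list of str, or array-like. Column or index level name(s) in the calling DataFrame to match against the index of other. If omitted, the join is index-on-index. If several names are given, other must have a MultiIndex.

575-2-3. how (optional, default 'left'): a string specifying the join type. Possible values:

'left': use the calling DataFrame's index (or the on column, if specified); all rows of the left DataFrame are returned.
'right': use the index of other; all rows of the right DataFrame are returned.
'outer': use the union of both keys; all rows are returned, with NaN where there is no match.
'inner': use the intersection of both keys; only rows present in both DataFrames are returned.
'cross': form the Cartesian product of the rows of both DataFrames.

575-2-4. lsuffix (optional, default ''): a string appended to the left DataFrame's column names when both DataFrames have columns with the same name.

575-2-5. rsuffix (optional, default ''): a string appended to the right DataFrame's column names when both DataFrames have columns with the same name.

575-2-6. sort (optional, default False): Boolean. If True, the result is sorted lexicographically by the join key; if False, the order of the join key depends on the join type (how).

575-2-7. validate (optional, default None): a string. If given, pandas checks that the join is of the specified type and raises an error when the check fails. Added in pandas 1.5.0. Possible values:

'one_to_one' or '1:1': check that the join keys are unique in both the left and right datasets.
'one_to_many' or '1:m': check that the join keys are unique in the left dataset.
'many_to_one' or 'm:1': check that the join keys are unique in the right dataset.
'many_to_many' or 'm:m': allowed, but no check is performed.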
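The on and validate parameters can be illustrated with a lookup-style join. The tables `orders`, `customers`, and `dup_customers` and their contents are hypothetical; the sketch assumes pandas 1.5.0 or later, where join accepts validate.

```python
import pandas as pd

# Hypothetical order table with a key column, and a lookup table indexed by that key.
orders = pd.DataFrame({"order_id": [1, 2, 3], "customer": ["a", "b", "a"]})
customers = pd.DataFrame({"city": ["Paris", "Rome"]}, index=["a", "b"])

# `on` names a column in the caller; it is matched against the index of `other`.
joined = orders.join(customers, on="customer", validate="m:1")
print(joined)

# The same check fails when the right-hand keys are not unique.
dup_customers = pd.DataFrame(
    {"city": ["Paris", "Rome", "Oslo"]}, index=["a", "b", "a"]
)
try:
    orders.join(dup_customers, on="customer", validate="m:1")
    failed = False
except pd.errors.MergeError as exc:
    failed = True
    print(exc)
```

Here "m:1" fits because several orders may share one customer, while each customer key should appear only once on the right.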

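575-3. Function

Joins the columns of several DataFrames efficiently. Overlapping column names are resolved with suffixes, the join type (left, right, outer, inner, or cross) can be chosen to suit the task, and the join can be made on a key column of the caller instead of its index, which gives finer control over how DataFrames are combined.

575-4. Return value

Returns a new DataFrame containing the joined result:

The index of the new DataFrame depends on how; with the default 'left' it is the index of the left DataFrame.
It contains all columns from both DataFrames; columns with the same name are distinguished by lsuffix and rsuffix.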
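The role of the suffixes shows up as soon as both frames share a column name. In this sketch, which reuses two small frames with an overlapping column B, calling join without any suffix raises a ValueError, and supplying only rsuffix renames just the right-hand column.

```python
import pandas as pd

df1 = pd.DataFrame({"A": [1, 2], "B": [3, 4]}, index=["a", "b"])
df2 = pd.DataFrame({"B": [5, 6], "C": [7, 8]}, index=["a", "c"])

# Both frames have a column named B, so a suffix is required.
try:
    df1.join(df2)
    overlap_error = None
except ValueError as exc:
    overlap_error = str(exc)
print(overlap_error)

# An inner join keeps only the shared index label 'a'.
inner = df1.join(df2, how="inner", rsuffix="_right")
print(inner)
```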

575-5. Notes

None.

575-6. Usage

575-6-1. Data preparation

None.

575-6-2. Code example

# 575. pandas.DataFrame.join method
import pandas as pd
# Create sample DataFrames
df1 = pd.DataFrame({'A': [1, 2], 'B': [3, 4]}, index=['a', 'b'])
df2 = pd.DataFrame({'B': [5, 6], 'C': [7, 8]}, index=['a', 'c'])
# Outer join on the index, with suffixes for the overlapping column B
result = df1.join(df2, how='outer', lsuffix='_left', rsuffix='_right')
print(result)

575-6-3. Output

# 575. pandas.DataFrame.join method
#      A  B_left  B_right    C
# a  1.0     3.0      5.0  7.0
# b  2.0     4.0      NaN  NaN
# c  NaN     NaN      6.0  8.0