Python Data Processing | J&Ocean BLOG

Notes

Published: 2023-07-06

_hello="helloworld"
score=0
y=20
y=True

print(_hello)

helloworld

print(score)

0

print(y)

True

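Variables

Python is a dynamically typed language: variables are not declared with a type, so a name that holds one type of value can later be assigned a value of a different type.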
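A minimal sketch of what dynamic typing means in practice. The name `value` is only illustrative: the same name is rebound to objects of different types, and `type()` reports the type of whatever object it currently refers to.

```python
# The name `value` is hypothetical; any variable behaves the same way.
value = 20
first_type = type(value).__name__    # type of the object currently bound

value = "twenty"                     # rebinding to a str is allowed
second_type = type(value).__name__

print(first_type, second_type)
```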

a=b=c=10

Python supports chained assignment.

print(a)

10
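Chained assignment binds every name to the same object. With immutable values such as integers this is harmless, but with a mutable object the names are aliases of one another. A short sketch, with variable names chosen only for illustration:

```python
a = b = c = 10      # all three names refer to the same int object
b = 99              # rebinding b does not affect a or c

x = y = []          # both names refer to the SAME list object
x.append(1)         # so the change is visible through y as well

print(a, b, c)
print(y, x is y)
```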

#coding=utf-8
#file:chapter4/4.4/hello.py

_hello="helloworld"
score_for_student=10.0 # no error occurs
y=20

name1="Tom";name2="Tony"
# chained assignment
a=b=c=10

if y>10:
    print(y)
    print(score_for_student)
else:
    print(y*10)
print(_hello)

20
10.0
helloworld

# coding=utf-8
import module1
from module1 import z

y=20

print(y)
print(module1.y)
print(z)

20
True
10.0

import com.pkg2.hello as module1
from com.pkg2.hello import z as x
print(x)
y=20
print(y)
print(module1.y)
print(z)

10.1
20
True
10.0

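Coding conventions

Naming conventions

Package names: all lowercase letters; words may be separated by dots, and underscores are not recommended. Because a package name acts as a namespace it should be unique; the reversed domain name of the company or organization is recommended, e.g. com.apple.quicktime.v2.
Module names: all lowercase letters; multiple words may be separated by underscores, e.g. dummy_threading.
Class names: upper camel case (CapWords), e.g. SplitViewController.
Exception names: exceptions are classes, so they are named like classes, but should use Error as a suffix, e.g. FileNotFoundError.
Variable names: all lowercase letters; multiple words may be separated by underscores. A variable used only inside a module or function may start with a single underscore; a variable used privately inside a class may start with a double underscore. Do not create names that both start and end with double underscores; these are reserved by Python. Also avoid lowercase l, uppercase O, and uppercase I as variable names.
Function and method names: named like variables, e.g. balance_account, push_cm_exit.
Constant names: all uppercase letters; multiple words may be separated by underscores, e.g. YEAR and WEEK_OF_MONTH.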
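The naming rules above can be seen together in one small module. This is only a sketch: every name in it (`MAX_RETRY_COUNT`, `InsufficientFundsError`, `BankAccount`, `withdraw_cash`) is hypothetical and exists purely to illustrate the conventions.

```python
# All names below are hypothetical and only illustrate the conventions.
MAX_RETRY_COUNT = 3                       # constant: upper case with underscores


class InsufficientFundsError(Exception):  # exception: CapWords with an Error suffix
    pass


class BankAccount:                        # class: CapWords (upper camel case)
    def __init__(self, owner_name, balance):
        self.owner_name = owner_name      # variable: lower case with underscores
        self._balance = balance           # single leading underscore: internal use

    def withdraw_cash(self, amount):      # method: same style as variables
        if amount > self._balance:
            raise InsufficientFundsError("balance too low")
        self._balance -= amount
        return self._balance


account = BankAccount("Tom", 100)
print(account.withdraw_cash(30))
```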

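Comment conventions

Single-line comments, multi-line comments, and documentation comments

File comments

A file comment is placed at the top of every file, written as a multi-line comment. It usually includes: copyright information, file name, containing module, author, version history, and the file's content and purpose.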

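#
# Copyright 2015 北京智捷东方科技有限公司
# See the LICENSE.txt file for license information
# Description:
#   Implements basic date functionality
# History:
#   2015-7-22: created by 关东升
#   2015-8-20: added the socket library
#   2015-8-22: added the math library
#

The comment above only provides copyright information, a description of the file, and version history; what a file comment includes should be decided case by case.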

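Documentation comments

Code comments

Using TODO comments

Import conventions

Import statements should be grouped from general to specific: standard library → third-party libraries → your own modules. Leave one blank line between groups, and sort the modules within each group alphabetically.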

import io
import os
import pkgutil
import platform
import re
import sys
import time
from html import unescape

from com.pkgl import example

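Code layout

Blank lines

Leave two blank lines before and after the block of import statements
Leave two blank lines before a function definition
Leave two blank lines before a class definition
Leave one blank line before a method definition
Leave one blank line between two logical blocks of code

Spaces

Put one space on each side of the assignment operator "="
Separate every binary operator from its operands with spaces
Unary operators, such as arithmetic negation "-" and bitwise NOT "~", take no space between the operator and its operand
Do not put spaces just inside brackets; Python's brackets are parentheses "()", square brackets "[]", and braces "{}"
Do not put a space before a comma, semicolon, or colon; put one space after it unless it is at the end of the line
Do not put a space before the opening bracket of an argument list, index, or slice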
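A short sketch of the spacing rules applied to real statements. The names are hypothetical. One detail not listed above but stated in PEP 8: the "=" of a default argument or keyword argument is written without surrounding spaces.

```python
# Hypothetical names, used only to show the spacing rules.
def scale(values, factor=2):      # no spaces around "=" in a default argument
    return [v * factor for v in values]


prices = [10, 20, 30]             # a space after each comma, none before it
total = prices[0] + prices[1]     # binary operators surrounded by single spaces
negated = -total                  # no space after a unary operator
first_two = scale(prices)[:2]     # no space before "(" or "[", none inside brackets

print(total, negated, first_two)
```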

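Indentation

Four spaces are the usual unit for one level of indentation. Tabs can be used for indentation during development, and a tab is equivalent to 8 spaces by default, but different IDEs map a tab to different numbers of spaces, so do not indent with tabs.

Line breaks

A line of code should be at most 79 characters; for documentation comments and multi-line comments the limit is 72 characters, although a comment containing a URL is exempt. Lines that exceed the limit must be broken, following these general rules:

Break after a comma
Break before an operator
Avoid the line continuation character "\" where possible; when there are brackets (braces, square brackets, or parentheses), break inside the brackets so that no continuation character is needed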
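The line-breaking rules can be sketched as follows. The variable names and values are hypothetical, chosen only to make the statements long enough to wrap.

```python
# Hypothetical values, chosen only to make the expression long enough to wrap.
base_salary = 5000
performance_bonus = 1200
overtime_pay = 300
tax_deduction = 800

# Break inside the parentheses, before each binary operator; no backslash needed.
net_income = (base_salary
              + performance_bonus
              + overtime_pay
              - tax_deduction)

# Break after commas inside a call's parentheses.
message = "{0} + {1} + {2} - {3} = {4}".format(
    base_salary, performance_bonus, overtime_pay,
    tax_deduction, net_income)

print(message)
```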

Data types

Numeric types

Integer type

28

0b11100

28

0o34

28

0x1c

28

Floating-point type

1.0

1.0

0.0

0.0

3.36e2

336.0

1.56e-2

0.0156

Complex type

1+2j

(1+2j)

(1+2j)+(1+2j)

(2+4j)

Boolean type

bool(0)

False

bool(2)

True

bool(1)

True

bool('')

False

bool(' ')

True

bool([])

False

bool({})

False

Converting between numeric types

Implicit type conversion

a=1+True

print(a)

2

a=1.0+1

type(a)

float

print(a)

2.0

a=1.0+True

print(a)

2.0

a=1.0+1+True

print(a)

3.0

a=1.0+1+False

print(a)

2.0

Explicit type conversion

int(False)

0

int(True)

1

int(19.6)

19

float(5)

5.0

float(False)

0.0

float(True)

1.0

String type

String representations

s = 'Hello World'

print(s)

Hello World

s="Hello World"

print(s)

Hello World

s='\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064'

print(s)

Hello World

s="\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064"

print(s)

Hello World

Escape sequences

s='Hello\n World'

print(s)

Hello
 World

s='Hello\t World'

print(s)

Hello	 World

s='Hello \'World'
print(s)

Hello 'World

s="hello'world"
print(s)

hello'world

s='hello"world'
print(s)

hello"world

s='hello\\world'
print(s)

hello\world

s='hello\u005c world'
print(s)

hello\ world

Raw strings

s='hello\tworld'
print(s)

hello	world

s=r'hello\tworld'
print(s)

hello\tworld

Long strings

s='''hello
world'''
print(s)

hello
world

s='''hello
\tworld'''
print(s)

hello
    world

String formatting

name='Mary'
age=18
s='她的年龄是{0}岁。'.format(age)
print(s)

她的年龄是18岁。

s='{0}芳龄是{1}岁'.format(name,age)
print(s)

Mary芳龄是18岁

s='{1}芳龄是{0}岁'.format(age,name)
print(s)

Mary芳龄是18岁

s='{n}芳龄是{a}岁'.format(n=name,a=age)
print(s)

Mary芳龄是18岁

name='Mary'
age=18
money=1234.5678
"{0}芳龄是{1:d}岁。".format(name,age)

'Mary芳龄是18岁。'

"{1}芳龄是{0:5d}岁。".format(age,name)

'Mary芳龄是   18岁。'

"{0}今天收入是{1:f}元".format(name,money)

'Mary今天收入是1234.567800元'

"{0}今天收入是{1:.2f}".format(name,money)

'Mary今天收入是1234.57'

"{0}今天收入是{1:10.2f}".format(name,money)

'Mary今天收入是   1234.57'

"{0}今天收入是{1:g}".format(name,money)

'Mary今天收入是1234.57'

"{0}今天收入是{1:G}".format(name,money)

'Mary今天收入是1234.57'

"{0}今天收入是{1:e}".format(name,money)

'Mary今天收入是1.234568e+03'

"{0}今天收入是{1:E}".format(name,money)

'Mary今天收入是1.234568E+03'

String searching

source_str="there is a string accessing example"
len(source_str)

35

source_str[16]

'g'

source_str.find('r')

3

source_str.rfind('r')

13

source_str.find('ing')

14

source_str.rfind('ing')

24

source_str.find('e',15)

21

source_str.find('ing',5)

14

source_str.rfind('ing',5)

24

source_str.find('ing',18,28)

24

source_str.find('ingg',5)

-1

Converting between strings and numbers

int('9')

9

int('9.6')

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [24], in <cell line: 1>()
----> 1 int('9.6')


ValueError: invalid literal for int() with base 10: '9.6'

float('9.6')

9.6

int('AB')

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [26], in <cell line: 1>()
----> 1 int('AB')


ValueError: invalid literal for int() with base 10: 'AB'

str(3.24)

'3.24'

str(True)

'True'

str([])

'[]'

str([1,2,3])

'[1, 2, 3]'

str(34)

'34'

'{0:2f}'.format(3.24)

'3.240000'

'{:.1f}'.format(3.24)

'3.2'

'{:10.1f}'.format(3.24)

'       3.2'

Operators

Arithmetic operators

Unary operators

a=12
-a

-12

Binary operators

1+2

3

2-1

1

2*3

6

3/2

1.5

3%2

1

3//2

1

-3//2

-2

10**2

100

10.22+10

20.22

10.0+True+2

13.0

'hello'+'world'

'helloworld'

'hello'+2

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

Input In [47], in <cell line: 1>()
----> 1 'hello'+2


TypeError: can only concatenate str (not "int") to str

'hello'*2

'hellohello'

Relational operators

a=1
b=2
a>b

False

a<b

True

a>=b

False

a<=b

True

1.0!=1

False

a='hello'
b='hello'
a==b

True

a='World'
a>b

False

a<b

True

a=[]
b=[1,2]
a==b

False

a<b

True

a=[1,2]
a==b

True

Logical operators

i=0
a=10
b=9

if a>b or i==1:
    print("或运算为真")
else:
    print("或运算为假")
    
if a<b and i==1:
    print("与运算为真")
else:
    print("与运算为假")
    

def f1():
    return a>b

def f2():
    print('--f2--')
    return a==b

print(f1() or f2())

或运算为真
与运算为假
True

Bitwise operators

a=0b10110010
b=0b01011110
print("a|b={0}".format(a|b))
print("a&b={0}".format(a&b))
print("a^b={0}".format(a^b))
print("~a={0}".format(~a))
print("a>>2={0}".format(a>>2))
print("a<<2={0}".format(a<<2))
c=-0b1100
print("c>>2={0}".format(c>>2))
print("c<<2={0}".format(c<<2))

a|b=254
a&b=18
a^b=236
~a=-179
a>>2=44
a<<2=712
c>>2=-3
c<<2=-48

Assignment operators

a=1
b=2

a+=b
print(a)

a+=b+3
print(a)

a-=b
print(a)

a*=b
print(a)

a/=b
print(a)

a%=b
print(a)

3
8
6
12
6.0
0.0

a=0b10110010
b=0b01011110

a|=b
print(a)

a^=b
print(a)

254
160

Other operators

Identity test operators

Membership test operators

class Person:
    def __init__(self,name,age):
        self.name=name
        self.age=age
        
p1=Person('Tony',18)
p2=Person('Tony',18)

print(p1==p2)
print(p1 is p2)

print(p1!=p2)
print(p1 is not p2)

False
False
True
True

class Person:
    def __init__(self,name,age):
        self.name=name
        self.age=age
    
    def __eq__(self,other):
        if self.name==other.name and self.age==other.age:
            return True
        else:
            return False
  
        
p1=Person('Tony',18)
p2=Person('Tony',18)

print(p1==p2)
print(p1 is p2)

print(p1!=p2)
print(p1 is not p2)

True
False
False
True

string_a='hello'
print('e' in string_a)
print('ell' not in string_a)

list_a=[1,2]
print(2 in list_a)
print(1 not in list_a)

True
False
True
False

Control statements

Branch statements

The if structure

score=5

if score>=85:
    print('perfect')
if score<60:
    print('hard')
if score>=60 and score<85:
    print('justsoso')

hard

The if-else structure

score=75

if score>=60:
    print('justsoso')
    if score>=90:
        print('perfect')
else:
    print("不及格")

justsoso

The elif structure

score=80

if score>=90:
    grade='A'
elif score>=80:
    grade='B'
elif score>=70:
    grade='C'
elif score>=60:
    grade='D'
else:
    grade='F'
    
print(grade)

B

Conditional expressions

score=85
result='justsoso' if score>=60 else 'hard'
print(result)

justsoso

Loop statements

The while statement

i=0

while i*i<100_000:
    i+=1

print(i)
print(i*i)

317
100489

The for statement

print('----范围----')
for num in range(1,10):
    print("{0}*{0}={1}".format(num,num*num))

print('----字符串----')
for item in "hello":
    print(item)
    
print('----整数列表----')
numbers=[43,32,53,54,75,7,10]
for item in numbers:
    print(item)

----范围----
1*1=1
2*2=4
3*3=9
4*4=16
5*5=25
6*6=36
7*7=49
8*8=64
9*9=81
----字符串----
h
e
l
l
o
----整数列表----
43
32
53
54
75
7
10

Jump statements

The break statement

for item in range(10):
    if item==3:
        break
    print(item)

0
1
2

The continue statement

for item in range(10):
    if item==3:
        continue
    print(item)

0
1
2
4
5
6
7
8
9

The else clause in while and for loops

i=0

while i*i<10:
    i+=1
    print("{0}*{0}={1}".format(i,i*i))
else:
    print("whileover")
    
print('----------')

for item in range(10):
    if item==3:
        break
    print(item)
else:
    print('forover')

1*1=1
2*2=4
3*3=9
4*4=16
whileover
----------
0
1
2

Using ranges

Syntax of the range() function:

range([start,] stop[, step])
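A quick sketch of how the optional `start` and `step` arguments change the result. The values are wrapped in `list()` only so they can be printed; the variable names are illustrative.

```python
only_stop = list(range(5))            # only stop: starts at 0, step 1
start_stop = list(range(2, 5))        # start and stop; stop itself is excluded
counting_down = list(range(10, 0, -3))  # a negative step counts down
empty = list(range(5, 2))             # start already past stop: nothing produced

print(only_stop, start_stop, counting_down, empty)
```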

for item in range(1,10,2):
    print(item)
print('------------')

for item in range(1,-10,-3):
    print(item)

1
3
5
7
9
------------
1
-2
-5
-8

Data structures

Tuples

Sequences

Indexing

a='hello'

a[0]

'h'

a[1]

'e'

a[2]

'l'

a[3]

'l'

a[4]

'o'

a[5]

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

Input In [7], in <cell line: 1>()
----> 1 a[5]


IndexError: string index out of range

max(a)

'o'

min(a)

'e'

len(a)

5

The + and * operators on sequences

a*3

'hellohellohello'

print(a)

hello

a+=' '

a+='world'

print(a)

hello world

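Sequence slicing

[start:end]: start is the start index and end is the end index
[start:end:step]: start is the start index, end is the end index, and step is the step size, which may be a positive or negative integer
The slice actually taken is the half-open range [start, end)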
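The half-open range and the step argument can be sketched with a standalone string. The variable `text` is hypothetical and independent of the `a` used in the surrounding cells.

```python
text = 'hello world'

middle = text[3:8]          # indexes 3..7; the end index 8 is excluded
every_other = text[::2]     # step 2 over the whole string
reversed_text = text[::-1]  # a negative step walks backwards
last_five = text[-5:]       # a negative index counts from the end

print(middle, every_other, reversed_text, last_five)
```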

a[1:3]

'el'

a[:3]

'hel'

a[0:3]

'hel'

a[0:]

'hello world'

a[0:5]

'hello'

a[:]

'hello world'

a[1:-1]

'ello worl'

a[1:5]

'ello'

a[1:5:2]

'el'

Creating tuples

21，32，43，45

  Input In [26]
    21，32，43，45
      ^
SyntaxError: invalid character '，' (U+FF0C)

21,32,43,45

(21, 32, 43, 45)

(21,32,43,45)

(21, 32, 43, 45)

print(a)

hello world

a=(21,32,43,45)

print(a)

(21, 32, 43, 45)

('hello','world')

('hello', 'world')

('hello','world',1,2,3)

('hello', 'world', 1, 2, 3)

tuple([21,32,43,45])

(21, 32, 43, 45)

a=(21)

type(a)

int

a=(21,)

type(a)

tuple

a=()

type(a)

tuple

Accessing tuples

a=('hello','world',1,2,3)

a[1]

'world'

a[1:3]

('world', 1)

a[2:]

(1, 2, 3)

a[:2]

('hello', 'world')

str1,str2,n1,n2,n3=a

str1

'hello'

str2

'world'

n1

1

n2

2

n3

3

str1,str2,*n=a

str1

'hello'

str2

'world'

n

[1, 2, 3]

str1,_,n1,n2,_=a

str1

'hello'

n1

1

n2

2

Iterating over tuples

a=(21,32,43,45)

for item in a:
    print(item)

print('---------------------')
for i,item in enumerate(a):
    print('{0}-{1}'.format(i,item))

21
32
43
45
---------------------
0-21
1-32
2-43
3-45

Lists

Creating lists

[20,10,50,40,30]

[20, 10, 50, 40, 30]

[]

[]

['hello','world',1,2,3]

['hello', 'world', 1, 2, 3]

a=[10]

type(a)

list

a=[10,]

type(a)

list

list((20,10,50,40,30))

[20, 10, 50, 40, 30]

Appending elements

list.append(x)
list.extend(t)

student_list=['张三','李四','王五']

student_list.append('董六')

student_list

['张三', '李四', '王五', '董六']

student_list+=['刘备','关羽']

student_list

['张三', '李四', '王五', '董六', '刘备', '关羽']

student_list.extend(['张飞','赵云'])

student_list

['张三', '李四', '王五', '董六', '刘备', '关羽', '张飞', '赵云']

Inserting elements

list.insert(i,x)

student_list=['zhangsan','lisi','wangwu']

student_list.insert(2,'liubei')

student_list

['zhangsan', 'lisi', 'liubei', 'wangwu']

Replacing elements

student_list=['zhangsan','lisi','wangwu']

student_list[0]='zhugeliang'

student_list

['zhugeliang', 'lisi', 'wangwu']

Removing elements

The remove() method

If the value occurs more than once, only the first occurrence is removed

student_list=['zhangsan','lisi','wangwu','wangwu']

student_list.remove('wangwu')

student_list

['zhangsan', 'lisi', 'wangwu']

student_list.remove('wangwu')

student_list

['zhangsan', 'lisi']

The pop() method

item=list.pop([i])

i is the index of the element to remove; if it is omitted, the last element is removed and returned

student_list=['zhangsan','lisi','wangwu']

student_list.pop()

'wangwu'

student_list

['zhangsan', 'lisi']

student_list.pop(0)

'zhangsan'

student_list

['lisi']

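Other common methods

reverse(): reverses the list in place
copy(): returns a copy of the list
clear(): removes all elements from the list
index(x[,i[,j]]): returns the index of the first occurrence of x, where i is the index at which the search starts and j the index at which it ends; shared with other sequence types
count(x): returns the number of occurrences of x; shared with other sequence types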
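A small sketch of `index()` with its optional search bounds, including what happens when the value is not found in the requested range (a `ValueError` is raised). The list `names` is hypothetical.

```python
names = ['zhangsan', 'lisi', 'wangwu', 'lisi']

first = names.index('lisi')       # first occurrence
second = names.index('lisi', 2)   # search starts at index 2
occurrences = names.count('lisi')

try:
    names.index('lisi', 2, 3)     # the range [2, 3) contains only 'wangwu'
    not_found = False
except ValueError:
    not_found = True              # index() raises instead of returning -1

print(first, second, occurrences, not_found)
```

Unlike `str.find()`, which returns -1, `index()` signals a missing value with an exception.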

a=[21,32,43,45]

a.reverse()
a

[45, 43, 32, 21]

b=a.copy()
b

[45, 43, 32, 21]

a.clear()
a

[]

b

[45, 43, 32, 21]

a=[45,43,32,21,32]

a.count(32)

2

student_list=['zhangsan','lisi','wangwu']

student_list.index('wangwu')

2

student_tuple=('zhangsan','lisi','wangwu')

student_tuple.index('wangwu')

2

student_tuple.index('lisi',1,2)

1

List comprehensions

n_list=[]
for x in range(10):
    if x%2==0:
        n_list.append(x**2)
print(n_list)

[0, 4, 16, 36, 64]

n_list=[x**2 for x in range(10) if x%2==0]

n_list

[0, 4, 16, 36, 64]

n_list=[x for x in range(100) if x%2==0 if x%5==0]

n_list

[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

Sets

Creating mutable sets

a={'zhangsan','lisi','wangwu'}
a

{'lisi', 'wangwu', 'zhangsan'}

a={'zhangsan','lisi','wangwu','wangwu'}

len(a)

3

a

{'lisi', 'wangwu', 'zhangsan'}

set((20,10,50,40,30))

{10, 20, 30, 40, 50}

b={}

type(b)

dict

b=set()

type(b)

set

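Modifying mutable sets

add(elem): adds an element; if it is already present the set is unchanged
remove(elem): removes an element; raises an error if it is not present
discard(elem): removes an element; does not raise an error if it is not present
pop(): removes and returns an arbitrary element of the set
clear(): removes all elements from the set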

student_set={'zhangsan','lisi','wangwu'}

student_set.add('dongliu')

student_set

{'dongliu', 'lisi', 'wangwu', 'zhangsan'}

student_set.remove('lisi')

student_set

{'dongliu', 'wangwu', 'zhangsan'}

student_set.remove('lisi')

---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

Input In [129], in <cell line: 1>()
----> 1 student_set.remove('lisi')


KeyError: 'lisi'

student_set.discard('lisi')

student_set

{'dongliu', 'wangwu', 'zhangsan'}

student_set.discard('wangwu')

student_set

{'dongliu', 'zhangsan'}

student_set.pop()

'dongliu'

student_set

{'zhangsan'}

student_set.clear()

student_set

set()

Iterating over sets

student_set={'zhangsan','lisi','wangwu'}

for item in student_set:
    print(item)
    
print('----------')
for i,item in enumerate(student_set):
    print('{0}-{1}'.format(i,item))

lisi
wangwu
zhangsan
----------
0-lisi
1-wangwu
2-zhangsan

Immutable sets

student_set=frozenset({'zhangsan','lisi','wangwu'})

student_set

frozenset({'lisi', 'wangwu', 'zhangsan'})

type(student_set)

frozenset

student_set.add('dongliu')

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [142], in <cell line: 1>()
----> 1 student_set.add('dongliu')


AttributeError: 'frozenset' object has no attribute 'add'

a=(21,32,43,45)

seta=frozenset(a)

seta

frozenset({21, 32, 43, 45})

Set comprehensions

n_set={x for x in range(100) if x%2==0 if x%5==0}
print(n_set)

{0, 70, 40, 10, 80, 50, 20, 90, 60, 30}

input_list=[2,3,2,4,5,6,6,6]
n_list=[x**2 for x in input_list]

n_list

[4, 9, 4, 16, 25, 36, 36, 36]

n_set={x**2 for x in input_list}

n_set

{4, 9, 16, 25, 36}

Dictionaries

Creating dictionaries

dict1={102:'zhangsan',105:'lisi',109:'wangwu'}

len(dict1)

3

dict1

{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}

type(dict1)

dict

dict1={}

dict1

{}

dict({102:'zhangsan',105:'lisi',109:'wangwu'})

{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}

dict(((102,'zhangsan'),(105,'lisi'),(109,'wangwu')))

{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}

dict([(102,'zhangsan'),(105,'lisi'),(109,'wangwu')])

{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}

t1=(102,'zhangsan')

t2=(105,'lisi')

t3=(109,'wangwu')

t=(t1,t2,t3)

dict(t)

{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}

list1=[t1,t2,t3]

dict(list1)

{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}

dict(zip([102,105,109],['zhangsan','lisi','wangwu']))

{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}

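Accessing dictionaries

get(key[,default]): returns the value for key, or default if the key is not present (None when no default is given)
items(): returns a view of all key-value pairs in the dictionary
keys(): returns a view of the dictionary's keys
values(): returns a view of the dictionary's values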
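"View" here means the returned object stays connected to the dictionary rather than being a snapshot. A minimal sketch, using a hypothetical `student_dict`:

```python
student_dict = {102: 'zhangsan', 105: 'lisi'}

keys_view = student_dict.keys()     # a live view, not a copy
student_dict[109] = 'wangwu'        # the view reflects the new key
keys_now = list(keys_view)

missing = student_dict.get(101)               # no default given: returns None
fallback = student_dict.get(101, 'unknown')   # default returned for a missing key

print(keys_now, missing, fallback)
```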

dict1={102:'zhangsan',105:'lisi',109:'wangwu'}

dict1.get(105)

'lisi'

dict1.get(101)

dict1.get(101,'dongliu')

'dongliu'

dict1.items()

dict_items([(102, 'zhangsan'), (105, 'lisi'), (109, 'wangwu')])

dict1.keys()

dict_keys([102, 105, 109])

dict1.values()

dict_values(['zhangsan', 'lisi', 'wangwu'])

student_dict={102:'zhangsan',105:'lisi',109:'wangwu'}

102 in student_dict

True

'lisi' in student_dict

False

print('---bianlijian---')
for student_id in student_dict.keys():
    print('xuehao:'+str(student_id))
    
print('---bianlizhi---')
for student_name in student_dict.values():
    print('xuesheng:'+student_name)
    
print('---bianlijian:zhi---')
for student_id,student_name in student_dict.items():
    print('xuehao:{0}-xuesheng:{1}'.format(student_id,student_name))

---bianlijian---
xuehao:102
xuehao:105
xuehao:109
---bianlizhi---
xuesheng:zhangsan
xuesheng:lisi
xuesheng:wangwu
---bianlijian:zhi---
xuehao:102-xuesheng:zhangsan
xuehao:105-xuesheng:lisi
xuehao:109-xuesheng:wangwu

Dictionary comprehensions

input_dict={'one':1,'two':2,'three':3,'four':4}

output_dict={k:v for k,v in input_dict.items() if v%2==0}
output_dict

{'two': 2, 'four': 4}

keys=[k for k,v in input_dict.items() if v%2==0]

keys

['two', 'four']

Functional programming

Defining functions

def function_name(parameter_list):
    function body
    return return_value

def rectangle_area(width,height):
    area=width*height
    return area

r_area=rectangle_area(320,420)
print("320*420的矩形面积{0}".format(r_area))

320*420的矩形面积134400

Function parameters

Calling functions with keyword arguments

def print_area(width,height):
    area=width*height
    print("{0}*{1}矩形的面积是：{2}".format(width,height,area))
    
print_area(320,420)
print_area(width=320,height=420)
print_area(320,height=420)
print_area(height=420,width=320)

320*420矩形的面积是：134400
320*420矩形的面积是：134400
320*420矩形的面积是：134400
320*420矩形的面积是：134400

Default parameter values

def make_coffee(name="Cappuccino"):
    return "制作一杯{0}".format(name)

coffee1=make_coffee("Latte")
coffee2=make_coffee()

print(coffee1)
print(coffee2)

制作一杯Latte
制作一杯Cappuccino

Variable-length parameters

The * variable-length parameter

def sum(*numbers,multiple=1):
    total=0
    for number in numbers:
        total+=number
    return total*multiple

print(sum(100.0,20.0,30.0))
print(sum(80,30))
print(sum(30,80,multiple=2))
double_tuple=(50.0,60.0,0.0)
print(sum(30,80,*double_tuple))

150.0
110
220
220.0

The ** variable-length parameter

def show(sep=':', **info):
    print('----info----')
    for key, value in info.items():
        print('{0} {2} {1}'.format(key, value, sep))


show('->', name='tony', age=18, sex = True)
show(student_name='tony',student_no='1000',sep='=')
stu_dict={'name':'tony','age':18}
show(**stu_dict,sex=True,sep='=')

----info----
name -> tony
age -> 18
sex -> True
----info----
student_name = tony
student_no = 1000
----info----
name = tony
age = 18
sex = True

Function return values

Functions without a return value

def show(sep=':', **info):
    print('----info----')
    for key, value in info.items():
        print('{0} {2} {1}'.format(key, value, sep))
    return

result=show('->', name='tony', age=18, sex = True)
print(result)

def sum(*numbers,multiple=1):
    total=0
    for number in numbers:
        total+=number
    return total*multiple

print(sum(100.0,20.0,30.0))
print(sum(80,30))

----info----
name -> tony
age -> 18
sex -> True
None
150.0
110

Functions with multiple return values

def position(dt,speed):
    posx=speed[0]*dt
    posy=speed[1]*dt
    return(posx,posy)

move=position(60,(10,-5))
print("物体位移：({0},{1})".format(move[0],move[1]))

物体位移：(600,-300)

Variable scope in functions

x=20
def print_value():
    print("函数中x={0}".format(x))
    
print_value()
print("全局变量={0}".format(x))

函数中x=20
全局变量=20

x=20
def print_value():
    x=10
    print("函数中x={0}".format(x))
    
print_value()
print("全局变量={0}".format(x))

函数中x=10
全局变量=20

x=20
def print_value():
    global x
    x=10
    print("函数中x={0}".format(x))
    
print_value()
print("全局变量={0}".format(x))

函数中x=10
全局变量=10

Generators

def square(num):
    n_list=[]
    
    for i in range(1,num+1):
        n_list.append(i*i)
        
    return n_list

for i in square(5):
    print(i,end=' ')

1 4 9 16 25

def square(num):
    n_list=[]
    
    for i in range(1,num+1):
        yield i*i
        
    return n_list

for i in square(5):
    print(i,end=' ')

1 4 9 16 25

def square(num):
    for i in range(1,num+1):
        yield i*i

n_seq=square(5)

n_seq.__next__()

1

n_seq.__next__()

4

n_seq.__next__()

9

n_seq.__next__()

16

n_seq.__next__()

25

n_seq.__next__()

---------------------------------------------------------------------------

StopIteration                             Traceback (most recent call last)

Input In [14], in <cell line: 1>()
----> 1 n_seq.__next__()


StopIteration:

Nested functions

def calculate(n1,n2,opr):
    multiple=2
    
    def add(a,b):
        return (a+b)*multiple
    
    def sub(a,b):
        return (a-b)*multiple
    
    if opr=='+':
        return add(n1,n2)
    else:
        return sub(n1,n2)
    
print(calculate(10,5,'+'))

30

Functional programming basics

Function types

def calculate_fun(opr):
    def add(a,b):
        return a+b
    
    def sub(a,b):
        return a-b
    
    if opr=='+':
        return add
    else:
        return sub

f1=calculate_fun('+')
f2=calculate_fun('-')

print(type(f1))

print('10+5={0}'.format(f1(10,5)))
print('10-5={0}'.format(f2(10,5)))

<class 'function'>
10+5=15
10-5=5

Lambda expressions

def calculate_fun(opr):
    if opr=='+':
        return lambda a,b:(a+b)
    else:
        return lambda a,b:(a-b)

f1=calculate_fun('+')
f2=calculate_fun('-')

print(type(f1))

print('10+5={0}'.format(f1(10,5)))
print('10-5={0}'.format(f2(10,5)))

<class 'function'>
10+5=15
10-5=5

The three basic functions

filter()

users=['tony','tom','ben','alex']
users_filter=filter(lambda u:u.startswith('t'),users)
print(list(users_filter))

['tony', 'tom']

number_list=range(1,11)
number_filter=filter(lambda it:it%2==0,number_list)
print(list(number_filter))

[2, 4, 6, 8, 10]

map()

users=['tony','tom','ben','alex']
users_map=map(lambda u:u.lower(),users)
print(list(users_map))

['tony', 'tom', 'ben', 'alex']

users=['tony','tom','ben','alex']
users_filter=filter(lambda u:u.startswith('t'),users)
users_map=map(lambda u:u.lower(),filter(lambda u:u.startswith('t'),users))
print(list(users_map))

['tony', 'tom']

reduce()

from functools import reduce
a={1,2,3,4}
a_reduce=reduce(lambda acc,i:acc+i,a)
print(a_reduce)

10

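Object-oriented programming

Overview of object-oriented programming (OOP)

The three basic features of object orientation

Encapsulation

Inheritance

Polymorphism

Classes and objects

Defining classes

class ClassName[(ParentClass)]:
    class body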

class Animal(object):
    
    pass

Creating and using objects

animal=Animal()

print(animal)

<__main__.Animal object at 0x00000222D7FA4160>

Instance variables

class Animal(object):
    def __init__(self,age,sex,weight):
        self.age=age
        self.sex=sex
        self.weight=weight

animal=Animal(2,1,10.0)

print('age:{0}'.format(animal.age))
print('sex:{0}'.format('female' if animal.sex==0 else 'male'))
print('weight:{0}'.format(animal.weight))

age:2
sex:male
weight:10.0

Class variables

class Account:
    interest_rate=0.0668
    
    def __init__(self,owner,amount):
        self.owner=owner
        self.amount=amount
        
account=Account('tony',1_800_000.0)
print('account:{0}'.format(account.owner))
print('amount:{0}'.format(account.amount))
print('interest_rate:{0}'.format(account.interest_rate))

account:tony
amount:1800000.0
interest_rate:0.0668

Constructors

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.weight=weight
        
a1=Animal(2,1,10.0)
a2=Animal(1,weight=5.0)
a3=Animal(1,sex=0)
print('age:{0}'.format(a1.age))
print('sex:{0}'.format('female' if a3.sex==0 else 'male'))
print('weight:{0}'.format(a2.weight))

age:2
sex:female
weight:5.0

Instance methods

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.weight=weight
    def eat(self):
        self.weight+=0.05
        print('eat')
    def run(self):
        self.weight-=0.01
        print('run')
        
a1=Animal(2,0,10.0)
print(a1.weight)
a1.eat()
print(a1.weight)
a1.run()
print(a1.weight)

10.0
eat
10.05
run
10.040000000000001

Class methods

class Account:
    interest_rate=0.0668
    
    def __init__(self,owner,amount):
        self.owner=owner
        self.amount=amount
        
    @classmethod
    def interest_by(cls,amt):
        return cls.interest_rate*amt
    
interest=Account.interest_by(12000.0)
print(interest)

801.6

Static methods

class Account:
    interest_rate=0.0668
    
    def __init__(self,owner,amount):
        self.owner=owner
        self.amount=amount
        
    @classmethod
    def interest_by(cls,amt):
        return cls.interest_rate*amt
    
    @staticmethod
    def interest_with(amt):
        return Account.interest_by(amt)
    
interest1=Account.interest_by(12000.0)
print(interest1)
interest2=Account.interest_with(12000.0)
print(interest2)

801.6
801.6

Encapsulation

Private variables

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.__weight=weight
    def eat(self):
        self.__weight+=0.05
        print('eat')
    def run(self):
        self.__weight-=0.01
        print('run')
        
a1=Animal(2,0,10.0)
print(a1.weight)
a1.eat()
a1.run()

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [4], in <cell line: 14>()
     11         print('run')
     13 a1=Animal(2,0,10.0)
---> 14 print(a1.weight)
     15 a1.eat()
     16 a1.run()


AttributeError: 'Animal' object has no attribute 'weight'

Private methods

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.__weight=weight
    def eat(self):
        self.__weight+=0.05
        print('eat')
    def __run(self):
        self.__weight-=0.01
        print('run')
        
a1=Animal(2,0,10.0)
a1.eat()
a1.run()

eat



---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [5], in <cell line: 15>()
     13 a1=Animal(2,0,10.0)
     14 a1.eat()
---> 15 a1.run()


AttributeError: 'Animal' object has no attribute 'run'

Defining properties

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.__weight=weight
        
    def set_weight(self,weight):
        self.__weight=weight
    def get_weight(self):
        return self.__weight
    
a1=Animal(2,0,10.0)
print(a1.get_weight())
a1.set_weight(123.45)
print(a1.get_weight())

10.0
123.45

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.__weight=weight
        
    @property
    def weight(self):
        return self.__weight
    
    @weight.setter
    def weight(self,weight):
        self.__weight=weight
        
a1=Animal(2,0,10.0)
print(a1.weight)
a1.weight=123.45
print(a1.weight)

10.0
123.45

Inheritance

The concept of inheritance

class Person:
    def __init__(self,name,age):
        self.name=name
        self.age=age
        
    def info(self):
        template='Person[name={0},age={1}]'
        s=template.format(self.name,self.age)
        return s

class Student(Person):
    def __init__(self,name,age,school):
        super().__init__(name,age)
        self.school=school

Overriding methods

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.weight=weight
        
    def eat(self):
        self.weight+=0.05
        print('eat')
        
class Dog(Animal):
    def eat(self):
        self.weight+=0.1
        print('gougouchi...')
        
a1=Dog(2,0,10.0)
a1.eat()

gougouchi...

Multiple inheritance

class ParentClass1:
    def run(self):
        print('ParentClass1 run...')
        
class ParentClass2:
    def run(self):
        print('ParentClass2 run...')
        
class SubClass1(ParentClass1,ParentClass2):
    pass

class SubClass2(ParentClass2,ParentClass1):
    pass

class SubClass3(ParentClass1,ParentClass2):
    def run(self):
        print('SubClass3 run...')
        
sub1=SubClass1()
sub1.run()

sub2=SubClass2()
sub2.run()

sub3=SubClass3()
sub3.run()

ParentClass1 run...
ParentClass2 run...
SubClass3 run...

Polymorphism

The concept of polymorphism

class Figure:
    def draw(self):
        print('draw figure...')
        
class Ellipse(Figure):
    def draw(self):
        print('draw Ellipse')
        
class Triangle(Figure):
    def draw(self):
        print('draw Triangle')
        
f1=Figure()
f1.draw()

f2=Ellipse()
f2.draw()

f3=Triangle()
f3.draw()

draw figure...
draw Ellipse
draw Triangle

Type checking

class Figure:
    def draw(self):
        print('draw figure...')
        
class Ellipse(Figure):
    def draw(self):
        print('draw Ellipse')
        
class Triangle(Figure):
    def draw(self):
        print('draw Triangle')
        
f1=Figure()
f1.draw()

f2=Ellipse()
f2.draw()

f3=Triangle()
f3.draw()


print(isinstance(f1,Triangle))
print(isinstance(f2,Triangle))
print(isinstance(f3,Triangle))
print(isinstance(f2,Figure))

draw figure...
draw Ellipse
draw Triangle
False
False
True
True

Duck typing

class Animal(object):
    def run(self):
        print('animal run')
        
class Dog(Animal):
    def run(self):
        print('dog run')
        
class Car(object):
    def run(self):
        print('car run')
        
def go(animal):
    animal.run()
    
go(Animal())
go(Dog())
go(Car())

animal run
dog run
car run

The Python root class: object

Two important methods

__str__()
__eq__(other)

The __str__() method

class Person:
    def __init__(self,name,age):
        self.name=name
        self.age=age
        
    def __str__(self):
        template='Person [name={0},age={1}]'
        s=template.format(self.name,self.age)
        return s

person=Person('Tony',18)
print(person)

Person [name=Tony,age=18]

Object comparison method

class Person:
    def __init__(self,name,age):
        self.name=name
        self.age=age
        
    def __str__(self):
        template='Person [name={0},age={1}]'
        s=template.format(self.name,self.age)
        return s
    
    def __eq__(self,other):
        if self.name==other.name and self.age==other.age:
            return True
        else:
            return False
        
p1=Person('Tony',18)
p2=Person('Tony',18)

print(p1==p2)

True

Enumeration classes

Defining enumeration classes

import enum

class WeekDays(enum.Enum):
    MONDAY=1
    TUESDAY=2
    WEDNESDAY=3
    THURSDAY=4
    FRIDAY=5
    
day=WeekDays.FRIDAY

print(day)
print(day.value)
print(day.name)

WeekDays.FRIDAY
5
FRIDAY

Restricting enumeration classes

import enum

@enum.unique
class WeekDays(enum.IntEnum):
    MONDAY=1
    TUESDAY=2
    WEDNESDAY=3
    THURSDAY=4
    FRIDAY=5
    
day=WeekDays.FRIDAY

print(day)
print(day.value)
print(day.name)

WeekDays.FRIDAY
5
FRIDAY

Using enumeration classes

import enum

@enum.unique
class WeekDays(enum.IntEnum):
    MONDAY=1
    TUESDAY=2
    WEDNESDAY=3
    THURSDAY=4
    FRIDAY=5
    
day=WeekDays.FRIDAY

if day==WeekDays.MONDAY:
    print('work')
elif day==WeekDays.FRIDAY:
    print('study')

study

Exception handling

Common exceptions

The AttributeError exception

class Animal(object):
    pass

al=Animal()

al.run()

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [3], in <cell line: 1>()
----> 1 al.run()


AttributeError: 'Animal' object has no attribute 'run'

print(al.age)

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [4], in <cell line: 1>()
----> 1 print(al.age)


AttributeError: 'Animal' object has no attribute 'age'

print(Animal.weight)

---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [5], in <cell line: 1>()
----> 1 print(Animal.weight)


AttributeError: type object 'Animal' has no attribute 'weight'

OSError

f=open('abc.txt')

---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

Input In [6], in <cell line: 1>()
----> 1 f=open('abc.txt')


FileNotFoundError: [Errno 2] No such file or directory: 'abc.txt'

IndexError

code_list=[125,56,89,36]
code_list[4]

---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

Input In [7], in <cell line: 2>()
      1 code_list=[125,56,89,36]
----> 2 code_list[4]


IndexError: list index out of range

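KeyError

Raised when a key that does not exist in a dictionary is accessed. Note that the run recorded below never reaches the key lookup: dict1 was not defined in that session, so Python raised NameError instead of KeyError.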

dict1[104]

---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

Input In [9], in <cell line: 1>()
----> 1 dict1[104]


NameError: name 'dict1' is not defined
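A minimal sketch of an actual KeyError, assuming a small sample dictionary (the contents of dict1 here are made up for illustration):

```python
dict1 = {101: 'Tom', 102: 'Tony'}   # hypothetical sample data

try:
    dict1[104]                      # key 104 is not present
except KeyError as e:
    missing_key = e.args[0]         # the key that was looked up
    print('KeyError:', missing_key)

# dict.get() returns a default instead of raising
fallback = dict1.get(104, 'unknown')
print(fallback)
```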

NameError

value1

---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

Input In [10], in <cell line: 1>()
----> 1 value1


NameError: name 'value1' is not defined

a=value1

---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

Input In [11], in <cell line: 1>()
----> 1 a=value1


NameError: name 'value1' is not defined

value1=10

TypeError

i='2'

print(5/i)

---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

Input In [14], in <cell line: 1>()
----> 1 print(5/i)


TypeError: unsupported operand type(s) for /: 'int' and 'str'

ValueError

i='QWE'

print(5/int(i))

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [16], in <cell line: 1>()
----> 1 print(5/int(i))


ValueError: invalid literal for int() with base 10: 'QWE'

Catching exceptions

The try-except statement

import datetime as dt

def read_date(in_date):
    try:
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError:
        print('处理ValueError异常')


str_date='2018-8-18'
print('日期={0}'.format(read_date(str_date)))

日期=2018-08-18 00:00:00

def read_date(in_date):
    try:
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('处理ValueError异常')
        print(e)

str_date='201B-8-18'
print('日期={0}'.format(read_date(str_date)))

处理ValueError异常
time data '201B-8-18' does not match format '%Y-%m-%d'
日期=None

Multiple except blocks

import datetime as dt


def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('处理ValueError异常')
        print(e)
    except FileNotFoundError as e:
        print('处理FileNotFoundError异常')
        print(e)
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))

处理FileNotFoundError异常
[Errno 2] No such file or directory: 'read.txt'
日期=None

import datetime as dt


def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('处理ValueError异常')
        print(e)
    except FileNotFoundError as e:
        print('处理FileNotFoundError异常')
        print(e)
    except OSError as e:
        print('处理OSError异常')
        print(e)
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))

处理FileNotFoundError异常
[Errno 2] No such file or directory: 'read.txt'
日期=None
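The order of the except clauses matters here because FileNotFoundError is a subclass of OSError: Python uses the first clause that matches. The two helper functions below are hypothetical and exist only to show which handler runs in each ordering.

```python
def specific_first(exc):
    try:
        raise exc
    except FileNotFoundError:
        return 'FileNotFoundError handler'
    except OSError:
        return 'OSError handler'


def general_first(exc):
    try:
        raise exc
    except OSError:
        return 'OSError handler'
    except FileNotFoundError:      # never reached: OSError already matched
        return 'FileNotFoundError handler'


print(issubclass(FileNotFoundError, OSError))
print(specific_first(FileNotFoundError()))
print(general_first(FileNotFoundError()))
print(specific_first(PermissionError()))
```

Listing the most specific exception first keeps each handler reachable.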

Nested try-except statements

import datetime as dt

def read_date_from_file(filename):
    try:
        file=open(filename)
        try:
            in_date = file.read()
            in_date = in_date.strip()
            date = dt.datetime.strptime(in_date, '%Y-%m-%d')
            return date
        except ValueError as e:
            print('处理ValueError异常')
            print(e)
    except FileNotFoundError as e:
        print('处理FileNotFoundError异常')
        print(e)
    except OSError as e:
        print('处理OSError异常')
        print(e)
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))

处理FileNotFoundError异常
[Errno 2] No such file or directory: 'read.txt'
日期=None

Catching several exceptions in one clause

import datetime as dt
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except (ValueError,OSError) as e:
        print('调用---')
        print(e)
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))

调用---
[Errno 2] No such file or directory: 'read.txt'
日期=None

Exception stack traces

import datetime as dt
import traceback as tb
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except (ValueError,OSError) as e:
        print('调用---')
        print(e)
        tb.print_exc()
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))

调用---
[Errno 2] No such file or directory: 'read.txt'
日期=None


Traceback (most recent call last):
  File "C:\Users\HP\AppData\Local\Temp\ipykernel_8772\538862610.py", line 5, in read_date_from_file
    file=open(filename)
FileNotFoundError: [Errno 2] No such file or directory: 'read.txt'

Releasing resources

The finally block

import datetime as dt
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('处理ValueError异常')
        print(e)
    except FileNotFoundError as e:
        print('处理FileNotFoundError异常')
        print(e)
    except OSError as e:
        print('处理OSError异常')
        print(e)
    finally:
        file.close()
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))

处理FileNotFoundError异常
[Errno 2] No such file or directory: 'read.txt'



---------------------------------------------------------------------------

UnboundLocalError                         Traceback (most recent call last)

Input In [7], in <cell line: 21>()
     18     finally:
     19         file.close()
---> 21 date=read_date_from_file('read.txt')
     22 print('日期={0}'.format(date))


Input In [7], in read_date_from_file(filename)
     17     print(e)
     18 finally:
---> 19     file.close()


UnboundLocalError: local variable 'file' referenced before assignment
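The UnboundLocalError above happens because open() failed before file was ever assigned, so the finally block has nothing to close. One common remedy, sketched here under the assumption that returning None on failure is acceptable, is to initialise file to None and close it only if it was opened. The file name is a placeholder chosen so that the open fails.

```python
import datetime as dt


def read_date_from_file(filename):
    file = None                      # the name exists even if open() fails
    try:
        file = open(filename)
        in_date = file.read().strip()
        return dt.datetime.strptime(in_date, '%Y-%m-%d')
    except ValueError as e:
        print('ValueError:', e)
    except OSError as e:
        print('OSError:', e)
    finally:
        if file is not None:         # close only what was actually opened
            file.close()


date = read_date_from_file('no_such_file_for_demo.txt')
print(date)
```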

The else block

import datetime as dt
import traceback as tb

def read_date_from_file(filename):
    try:
        file=open(filename)
    except OSError as e:
        print('打开文件失败')
    else:
        print('打开文件成功')
        try:
            in_date = file.read()
            in_date = in_date.strip()
            date = dt.datetime.strptime(in_date, '%Y-%m-%d')
            return date
        except ValueError as e:
            print('处理ValueError异常')
            print(e)
        except OSError as e:
            print('处理OSError异常')
            print(e)
        finally:
            file.close()
            
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))

打开文件失败
日期=None

Automatic resource management with the with-as block

import datetime as dt
def read_date_from_file(filename):
    try:
        with open(filename) as file:
            in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('处理ValueError异常')
        print(e)
    except OSError as e:
        print('处理OSError异常')
        print(e)
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))

处理OSError异常
[Errno 2] No such file or directory: 'read.txt'
日期=None

Custom exception classes

class MyException(Exception):
    def __init__(self,message):
        super().__init__(message)

Raising exceptions explicitly

import datetime as dt
class MyException(Exception):
    def __init__(self,message):
        super().__init__(message)
        
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        raise MyException('不是有效日期')
    except FileNotFoundError as e:
        raise MyException('文件找不到')
    except OSError as e:
        raise MyException('文件无法打开或无法读取')

date=read_date_from_file('read.txt')
print('日期={0}'.format(date))

---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

Input In [4], in read_date_from_file(filename)
      7 try:
----> 8     file=open(filename)
      9     in_date=file.read()


FileNotFoundError: [Errno 2] No such file or directory: 'read.txt'

During handling of the above exception, another exception occurred:


MyException                               Traceback (most recent call last)

Input In [4], in <cell line: 20>()
     17     except OSError as e:
     18         raise MyException('文件无法打开或无法读取')
---> 20 date=read_date_from_file('read.txt')
     21 print('日期={0}'.format(date))


Input In [4], in read_date_from_file(filename)
     14     raise MyException('不是有效日期')
     15 except FileNotFoundError as e:
---> 16     raise MyException('文件找不到')
     17 except OSError as e:
     18     raise MyException('文件无法打开或无法读取')


MyException: 文件找不到
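The traceback above says "During handling of the above exception, another exception occurred", which is how Python reports an exception raised inside an except block without explicit chaining. A possible refinement, not used in the original notes, is raise ... from e, which records the original exception as the direct cause. The function name and file name below are placeholders.

```python
class MyException(Exception):
    pass


def open_or_raise(filename):
    try:
        with open(filename) as file:
            return file.read()
    except FileNotFoundError as e:
        # "from e" stores the original exception in __cause__
        raise MyException('file not found') from e


try:
    open_or_raise('no_such_file_for_demo.txt')
except MyException as e:
    cause = e.__cause__
    print(type(cause).__name__)
```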

Common modules

The math module

Rounding functions

import math

math.ceil(1.4)

math.floor(1.4)

round(1.4)

math.ceil(1.5)

math.floor(1.5)

math.ceil(1.6)

math.floor(1.6)

round(1.5)

round(1.6)
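The notes list these calls without their results. The short script below collects them in one place; round(2.5) is an extra case added here to show that the built-in round() rounds ties to the nearest even integer.

```python
import math

results = [
    ('math.ceil(1.4)', math.ceil(1.4)),
    ('math.floor(1.4)', math.floor(1.4)),
    ('round(1.4)', round(1.4)),
    ('math.ceil(1.5)', math.ceil(1.5)),
    ('math.floor(1.5)', math.floor(1.5)),
    ('round(1.5)', round(1.5)),
    ('round(2.5)', round(2.5)),      # tie: rounds to the even neighbour
    ('math.ceil(1.6)', math.ceil(1.6)),
    ('math.floor(1.6)', math.floor(1.6)),
    ('round(1.6)', round(1.6)),
]

for expression, value in results:
    print(expression, '=', value)
```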

Power and logarithm functions

math.log(8,2)

3.0

math.pow(2,3)

8.0

math.log(8)

2.0794415416798357

math.sqrt(1.6)

1.2649110640673518

Trigonometric functions

math.degrees(0.5*math.pi)

90.0

math.radians(180/math.pi)

1.0

a=math.radians(45/math.pi)

a

0.25

math.sin(a)

0.24740395925452294

math.asin(math.sin(a))

0.25

math.asin(0.2474)

0.24999591371483254

math.asin(0.24740395925452294)

0.25

math.cos(a)

0.9689124217106447

math.acos(0.9689124217106447)

0.2500000000000002

math.acos(math.cos(a))

0.2500000000000002

math.tan(a)

0.25534192122103627

math.atan(math.tan(a))

0.25

math.atan(0.25534192122103627)

0.25

The random module

import random
print('0.0<=x<1.0 random')
for i in range(0,10):
    x=random.random()
    print(x)
print('0<=x<5 random')
for i in range(0,10):
    x=random.randrange(5)
    print(x)
print('5<=x<10 random')
for i in range(0,10):
    x=random.randrange(5,10)
    print(x)
print('5<=x<=10 random')
for i in range(0,10):
    x=random.randint(5,10)
    print(x)

0.0<=x<1.0 random
0.3905863037934756
0.8922407632329942
0.21352047760461534
0.5211523015401928
0.30030870435664747
0.9862984919490358
0.21171993560160762
0.6653280107488534
0.32488043176197134
0.3562099773397064
0<=x<5 random
0
0
4
0
2
1
3
3
0
4
5<=x<10 random
7
8
7
6
8
5
9
8
7
7
5<=x<=10 random
5
5
8
7
7
9
9
8
7
5

The datetime module

The datetime, date, and time classes

The datetime class

import datetime

dt=datetime.datetime(2018,2,29)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [37], in <cell line: 1>()
----> 1 dt=datetime.datetime(2018,2,29)


ValueError: day is out of range for month

dt=datetime.datetime(2018,2,28)

dt

datetime.datetime(2018, 2, 28, 0, 0)

dt=datetime.datetime(2018,2,28,23,60,59,10000)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [40], in <cell line: 1>()
----> 1 dt=datetime.datetime(2018,2,28,23,60,59,10000)


ValueError: minute must be in 0..59

dt=datetime.datetime(2018,2,28,23,30,59,10000)

dt

datetime.datetime(2018, 2, 28, 23, 30, 59, 10000)

datetime.datetime.today()

datetime.datetime(2023, 3, 21, 18, 2, 6, 436821)

datetime.datetime.now()

datetime.datetime(2023, 3, 21, 18, 2, 32, 837270)

datetime.datetime.utcnow()

datetime.datetime(2023, 3, 21, 10, 2, 48, 100681)

datetime.datetime.fromtimestamp(999999999.999)

datetime.datetime(2001, 9, 9, 9, 46, 39, 999000)

datetime.datetime.utcfromtimestamp(999999999.999)

datetime.datetime(2001, 9, 9, 1, 46, 39, 999000)
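utcnow() and utcfromtimestamp() return naive datetime objects (no tzinfo attached), and both are deprecated as of Python 3.12. A sketch of the timezone-aware equivalents, using the same timestamp as above:

```python
from datetime import datetime, timezone

# Current time as an aware datetime in UTC
aware_now = datetime.now(timezone.utc)

# Same timestamp as in the notes, but with tzinfo attached
aware_from_ts = datetime.fromtimestamp(999999999.999, tz=timezone.utc)

print(aware_now.tzinfo)
print(aware_from_ts)
```

Aware objects carry their offset with them, so later conversions with astimezone() do not depend on the machine's local time zone.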

The date class

d=datetime.date(2018,2,29)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [48], in <cell line: 1>()
----> 1 d=datetime.date(2018,2,29)


ValueError: day is out of range for month

d=datetime.date(2018,2,28)

d

datetime.date(2018, 2, 28)

datetime.date.today()

datetime.date(2023, 3, 21)

datetime.date.fromtimestamp(999999999.999)

datetime.date(2001, 9, 9)

The time class

datetime.time(24,59,58,1999)

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [53], in <cell line: 1>()
----> 1 datetime.time(24,59,58,1999)


ValueError: hour must be in 0..23

datetime.time(23,59,58,1999)

datetime.time(23, 59, 58, 1999)

Date and time arithmetic

datetime.date.today()

datetime.date(2023, 3, 21)

d=datetime.date.today()

delta=datetime.timedelta(10)

d+=delta

d

datetime.date(2023, 3, 31)

d=datetime.date(2018,1,1)

delta=datetime.timedelta(weeks=5)

d-=delta

d

datetime.date(2017, 11, 27)

Formatting and parsing dates and times

d=datetime.datetime.today()

d.strftime('%Y-%m-%d %H:%M:%S')

'2023-03-21 18:10:33'

d.strftime('%Y-%m-%d')

'2023-03-21'

str_date='2018-02-29 10:40:26'

date=datetime.datetime.strptime(in_date,'%Y-%m-%d %H:%M:%S')

---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

Input In [69], in <cell line: 1>()
----> 1 date=datetime.datetime.strptime(in_date,'%Y-%m-%d %H:%M:%S')


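NameError: name 'in_date' is not defined

The NameError above comes from a name mix-up: the string was assigned to str_date, but in_date was passed to strptime(). Passing str_date instead would still fail, this time with ValueError, because 2018-02-29 is not a valid date.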

str_date='2018-02-28 10:40:26'

date=datetime.datetime.strptime(str_date,'%Y-%m-%d %H:%M:%S')

date

datetime.datetime(2018, 2, 28, 10, 40, 26)

date=datetime.datetime.strptime(str_date,'%Y-%m-%d')

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [74], in <cell line: 1>()
----> 1 date=datetime.datetime.strptime(str_date,'%Y-%m-%d')


File E:\anaconda\lib\_strptime.py:568, in _strptime_datetime(cls, data_string, format)
    565 def _strptime_datetime(cls, data_string, format="%a %b %d %H:%M:%S %Y"):
    566     """Return a class cls instance based on the input string and the
    567     format string."""
--> 568     tt, fraction, gmtoff_fraction = _strptime(data_string, format)
    569     tzname, gmtoff = tt[-2:]
    570     args = tt[:6] + (fraction,)


File E:\anaconda\lib\_strptime.py:352, in _strptime(data_string, format)
    349     raise ValueError("time data %r does not match format %r" %
    350                      (data_string, format))
    351 if len(data_string) != found.end():
--> 352     raise ValueError("unconverted data remains: %s" %
    353                       data_string[found.end():])
    355 iso_year = year = None
    356 month = day = 1


ValueError: unconverted data remains:  10:40:26

Time zones

from datetime import datetime,timezone,timedelta

utc_dt=datetime(2008,8,19,23,59,59,tzinfo=timezone.utc)

utc_dt

datetime.datetime(2008, 8, 19, 23, 59, 59, tzinfo=datetime.timezone.utc)

utc_dt.strftime('%Y-%m-%d %H:%M:%S')

'2008-08-19 23:59:59'

utc_dt.strftime('%Y-%m-%d %H:%M:%S %z')

'2008-08-19 23:59:59 +0000'

bj_tz=timezone(offset=timedelta(hours=8),name='Asia/Beijing')

bj_tz

datetime.timezone(datetime.timedelta(seconds=28800), 'Asia/Beijing')

bj_dt=utc_dt.astimezone(bj_tz)

bj_dt

datetime.datetime(2008, 8, 20, 7, 59, 59, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800), 'Asia/Beijing'))

bj_dt.strftime('%Y-%m-%d %H:%M:%S %Z')

'2008-08-20 07:59:59 Asia/Beijing'

bj_dt.strftime('%Y-%m-%d %H:%M:%S %z')

'2008-08-20 07:59:59 +0800'

bj_tz=timezone(timedelta(hours=8))

bj_dt=utc_dt.astimezone(bj_tz)

bj_dt.strftime('%Y-%m-%d %H:%M:%S %z')

'2008-08-20 07:59:59 +0800'

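The logging module

Log levels

Note: logging.basicConfig() configures the root logger only if it has no handlers yet, so in a long-running notebook session a later call is silently ignored. The outputs recorded in this section appear to have been captured after an earlier cell had already set level=INFO and a custom format, which is why they show INFO and WARNING records in that format rather than what the level argument in each listing would produce on a fresh interpreter.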

import logging
logging.basicConfig(level=logging.ERROR)

logging.debug('this is debug')
logging.info('this is info')
logging.warning('this is warning')
logging.error('this is error')
logging.critical('this is critical')

2023-03-21 20:15:10,230-MainThread-root-<cell line: 5>-INFO-this is info
2023-03-21 20:15:10,246-MainThread-root-<cell line: 6>-WARNING-this is warning
2023-03-21 20:15:10,247-MainThread-root-<cell line: 7>-ERROR-this is error
2023-03-21 20:15:10,248-MainThread-root-<cell line: 8>-CRITICAL-this is critical

import logging
logging.basicConfig(level=logging.DEBUG)
logger=logging.getLogger(__name__)

logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')

2023-03-21 20:14:59,694-MainThread-__main__-<cell line: 6>-INFO-this is info
2023-03-21 20:14:59,694-MainThread-__main__-<cell line: 7>-WARNING-this is warning
2023-03-21 20:14:59,698-MainThread-__main__-<cell line: 8>-ERROR-this is error
2023-03-21 20:14:59,700-MainThread-__main__-<cell line: 9>-CRITICAL-this is critical
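To see what level=logging.ERROR does regardless of earlier configuration, basicConfig() accepts force=True (Python 3.8 and later), which removes existing root handlers first. The sketch below logs to an in-memory stream, an assumption made here so the result can be inspected, and uses the default format.

```python
import io
import logging

stream = io.StringIO()   # in-memory target instead of the console

# force=True discards any handlers left over from earlier basicConfig() calls
logging.basicConfig(level=logging.ERROR, stream=stream, force=True)

logging.debug('this is debug')
logging.info('this is info')
logging.warning('this is warning')
logging.error('this is error')
logging.critical('this is critical')

captured = stream.getvalue()
print(captured)
```

Only the ERROR and CRITICAL records pass the threshold; the three lower levels are dropped.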

Formatting log messages

import logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s-%(threadName)s-'
                    '%(name)s-%(funcName)s-%(levelname)s-%(message)s')
logger=logging.getLogger(__name__)

logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')

def funlog():
    logger.info('enter funlog')
    
logger.info('use funlog')
funlog()

2023-03-21 20:14:51,110-MainThread-__main__-<cell line: 8>-INFO-this is info
2023-03-21 20:14:51,120-MainThread-__main__-<cell line: 9>-WARNING-this is warning
2023-03-21 20:14:51,122-MainThread-__main__-<cell line: 10>-ERROR-this is error
2023-03-21 20:14:51,123-MainThread-__main__-<cell line: 11>-CRITICAL-this is critical
2023-03-21 20:14:51,124-MainThread-__main__-<cell line: 16>-INFO-use funlog
2023-03-21 20:14:51,124-MainThread-__main__-funlog-INFO-enter funlog

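Redirecting log output

The listing below is identical to the one in the previous section and still logs to the console. Redirecting to a file requires a filename (or handlers) argument to basicConfig(), which these notes do not show.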

import logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s-%(threadName)s-'
                    '%(name)s-%(funcName)s-%(levelname)s-%(message)s')
logger=logging.getLogger(__name__)

logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')

def funlog():
    logger.info('enter funlog')
    
logger.info('use funlog')
funlog()

2023-03-21 20:17:26,157-MainThread-__main__-<cell line: 8>-INFO-this is info
2023-03-21 20:17:26,165-MainThread-__main__-<cell line: 9>-WARNING-this is warning
2023-03-21 20:17:26,166-MainThread-__main__-<cell line: 10>-ERROR-this is error
2023-03-21 20:17:26,167-MainThread-__main__-<cell line: 11>-CRITICAL-this is critical
2023-03-21 20:17:26,169-MainThread-__main__-<cell line: 16>-INFO-use funlog
2023-03-21 20:17:26,171-MainThread-__main__-funlog-INFO-enter funlog
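A minimal sketch of redirection to a file, assuming the filename argument of basicConfig() is what the section intends. The log path is a temporary, made-up location, and force=True (Python 3.8+) is added so the call takes effect even if logging was configured earlier.

```python
import logging
import os
import tempfile

# Hypothetical log location inside a fresh temporary directory
log_path = os.path.join(tempfile.mkdtemp(), 'test.log')

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s-%(threadName)s-'
                    '%(name)s-%(funcName)s-%(levelname)s-%(message)s',
                    filename=log_path,
                    force=True)
logger = logging.getLogger(__name__)

logger.debug('this is debug')   # below INFO: not written
logger.info('this is info')
logger.error('this is error')

logging.shutdown()              # flush and close the file handler

with open(log_path) as f:
    log_text = f.read()
print(log_text)
```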

Using a configuration file

import logging
import logging.config

logging.config.fileConfig("logger.conf")
logger=logging.getLogger('loggerl')

logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')

def funlog():
    logger.info('enter funlog')
logger.info('use funlog')
funlog()

this is debug
this is info
this is warning
this is error
this is critical
use funlog
enter funlog
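The contents of logger.conf are not shown in these notes. The sketch below writes a hypothetical configuration to a temporary file to illustrate the section layout that logging.config.fileConfig() expects; the logger name logger1, the handler name, and the message-only format are all assumptions.

```python
import logging
import logging.config
import os
import tempfile

# Hypothetical configuration; the real logger.conf is not shown in the notes.
CONF_TEXT = """
[loggers]
keys=root,logger1

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=WARNING
handlers=consoleHandler

[logger_logger1]
level=DEBUG
handlers=consoleHandler
qualname=logger1
propagate=0

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(message)s
"""

conf_path = os.path.join(tempfile.mkdtemp(), 'logger.conf')
with open(conf_path, 'w') as f:
    f.write(CONF_TEXT)

logging.config.fileConfig(conf_path)
logger = logging.getLogger('logger1')   # must match qualname in the file

logger.debug('this is debug')
logger.info('this is info')
```

Setting propagate=0 keeps records from logger1 from also being handled by the root logger, which would otherwise print each message twice.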

Regular expressions

Regular expression strings

Ordinary characters
Metacharacters

Metacharacters

Escaping characters

Start and end anchors

import re

p1 = r'\w+@zhijieketang\.com'
p2 = r'^\w+@zhijieketang\.com$'

text = "Tony's email is tony_guan588@zhijieketang.com."
m = re.search(p1, text)
print(m)  # matches
m = re.search(p2, text)
print(m)  # no match

email = 'tony_guan588@zhijieketang.com'
m = re.search(p2, email)
print(m)

<re.Match object; span=(16, 45), match='tony_guan588@zhijieketang.com'>
None
<re.Match object; span=(0, 29), match='tony_guan588@zhijieketang.com'>

Character classes

Defining a character class

import re

p = r'[Jj]ava'
# p = r'Java|java|JAVA'

m = re.search(p, 'I like Java and Python.')
print(m)  # matches

m = re.search(p, 'I like JAVA and Python.')
print(m)  # no match

m = re.search(p, 'I like java and Python.')
print(m)  # matches

<re.Match object; span=(7, 11), match='Java'>
None
<re.Match object; span=(7, 11), match='java'>

Negated character classes

import re

p = r'[^0123456789]'

m = re.search(p, '1000')
print(m)  # no match

m = re.search(p, 'Python 3')
print(m)  # matches

None
<re.Match object; span=(0, 1), match='P'>

Ranges

import re

m = re.search(r'[A-Za-z0-9]', 'A10.3')
print(m)  # matches

m = re.search(r'[0-25-7]', 'A3489C')
print(m)  # no match

<re.Match object; span=(0, 1), match='A'>
None

Predefined character classes

import re

# p = r'[^0123456789]'
p = r'\D'

m = re.search(p, '1000')
print(m)  # no match

m = re.search(p, 'Python 3')
print(m)  # matches

text = '你们好Hello'
m = re.search(r'\w', text)
print(m)  # matches

None
<re.Match object; span=(0, 1), match='P'>
<re.Match object; span=(0, 1), match='你'>

Quantifiers

Using quantifiers

import re

m = re.search(r'\d?', '87654321')  # a digit occurs once
print(m)  # matches '8'

m = re.search(r'\d?', 'ABC')  # a digit occurs zero times
print(m)  # matches the empty string ''

m = re.search(r'\d*', '87654321')  # digits occur many times
print(m)  # matches '87654321'

m = re.search(r'\d*', 'ABC')  # a digit occurs zero times
print(m)  # matches the empty string ''

m = re.search(r'\d+', '87654321')  # digits occur many times
print(m)  # matches '87654321'

m = re.search(r'\d+', 'ABC')
print(m)  # no match

m = re.search(r'\d{8}', '87654321')  # exactly 8 digits
print('8765432', m)  # matches '87654321'

m = re.search(r'\d{8}', 'ABC')
print(m)  # no match

m = re.search(r'\d{7,8}', '87654321')  # 7 to 8 digits; greedy, so 8 are taken
print(m)  # matches '87654321'

m = re.search(r'\d{9,}', '87654321')
print(m)  # no match

<re.Match object; span=(0, 1), match='8'>
<re.Match object; span=(0, 0), match=''>
<re.Match object; span=(0, 8), match='87654321'>
<re.Match object; span=(0, 0), match=''>
<re.Match object; span=(0, 8), match='87654321'>
None
8765432 <re.Match object; span=(0, 8), match='87654321'>
None
<re.Match object; span=(0, 8), match='87654321'>
None

Greedy and lazy quantifiers

import re

# greedy quantifier
m = re.search(r'\d{5,8}', '87654321')  # takes as many digits as allowed: 8
print(m)  # matches '87654321'

# lazy quantifier
m = re.search(r'\d{5,8}?', '87654321')  # takes as few digits as allowed: 5
print(m)  # matches '87654'

<re.Match object; span=(0, 8), match='87654321'>
<re.Match object; span=(0, 5), match='87654'>
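The difference matters most with open-ended quantifiers such as + and *. A small illustration on a made-up HTML fragment (the sample string is hypothetical):

```python
import re

html = '<b>bold</b> and <i>italic</i>'   # hypothetical sample text

# Greedy: .+ runs to the last '>' in the string
greedy = re.search(r'<.+>', html).group()

# Lazy: .+? stops at the first '>' it can
lazy = re.search(r'<.+?>', html).group()

# Lazy matching makes findall() return each tag separately
tags = re.findall(r'<.+?>', html)

print(greedy)
print(lazy)
print(tags)
```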

Groups

Using groups

import re

p = r'(121){2}'
m = re.search(p, '121121abcabc')
print(m)  # matches
print(m.group())  # the whole matched string
print(m.group(1))  # contents of the first group

p = r'(\d{3,4})-(\d{7,8})'
m = re.search(p, '010-87654321')
print(m)  # matches
print(m.group())  # the whole matched string
print(m.groups())  # contents of all groups

<re.Match object; span=(0, 6), match='121121'>
121121
121
<re.Match object; span=(0, 12), match='010-87654321'>
010-87654321
('010', '87654321')

Named groups

import re

p = r'(?P<area_code>\d{3,4})-(?P<phone_code>\d{7,8})'
m = re.search(p, '010-87654321')
print(m)  # matches
print(m.group())  # the whole matched string
print(m.groups())  # contents of all groups

# get group contents by group number
print(m.group(1))
print(m.group(2))

# get group contents by group name
print(m.group('area_code'))
print(m.group('phone_code'))

<re.Match object; span=(0, 12), match='010-87654321'>
010-87654321
('010', '87654321')
010
87654321
010
87654321

Backreferences to groups

import re

# p = r'<([\w]+)>.*</([\w]+)>'
p = r'<([\w]+)>.*</\1>'  # \1 must repeat what group 1 matched

m = re.search(p, '<a>abc</a>')
print(m)  # matches

m = re.search(p, '<a>abc</b>')
print(m)  # no match

<re.Match object; span=(0, 10), match='<a>abc</a>'>
None

Non-capturing groups

import re

s = 'img1.jpg,img2.jpg,img3.bmp'

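# capturing group: findall() returns only the group contents
p = r'\w+(\.jpg)'
mlist = re.findall(p, s)
print(mlist)

# non-capturing group: findall() returns the whole match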
p = r'\w+(?:\.jpg)'
mlist = re.findall(p, s)
print(mlist)

['.jpg', '.jpg']
['img1.jpg', 'img2.jpg']

The re module

The search() and match() functions

import re

p = r'\w+@zhijieketang\.com'

text = "Tony's email is tony_guan588@zhijieketang.com."
m = re.search(p, text)
print(m)  # matches

m = re.match(p, text)
print(m)  # no match: match() only tries at the start of the string

email = 'tony_guan588@zhijieketang.com'
m = re.search(p, email)
print(m)  # matches

m = re.match(p, email)
print(m)  # matches

# some methods of the match object
print('match对象几个方法:')
print(m.group())
print(m.start())
print(m.end())
print(m.span())

<re.Match object; span=(16, 45), match='tony_guan588@zhijieketang.com'>
None
<re.Match object; span=(0, 29), match='tony_guan588@zhijieketang.com'>
<re.Match object; span=(0, 29), match='tony_guan588@zhijieketang.com'>
match对象几个方法:
tony_guan588@zhijieketang.com
0
29
(0, 29)

The findall() and finditer() functions

import re

p = r'[Jj]ava'
text = 'I like Java and java.'

match_list = re.findall(p, text)
print(match_list)  # list of all matches

match_iter = re.finditer(p, text)
for m in match_iter:
    print(m.group())

['Java', 'java']
Java
java

Splitting strings

import re

p = r'\d+'
text = 'AB12CD34EF'

clist = re.split(p, text)
print(clist)

clist = re.split(p, text, maxsplit=1)
print(clist)

clist = re.split(p, text, maxsplit=2)
print(clist)

['AB', 'CD', 'EF']
['AB', 'CD34EF']
['AB', 'CD', 'EF']

Replacing strings

import re

p = r'\d+'
text = 'AB12CD34EF'

replace_text = re.sub(p, ' ', text)
print(replace_text)

replace_text = re.sub(p, ' ', text, count=1)
print(replace_text)

replace_text = re.sub(p, ' ', text, count=2)
print(replace_text)

AB CD EF
AB CD34EF
AB CD EF
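The replacement argument of re.sub() does not have to be a fixed string. As a small extension of the example above (not part of the original notes), it can refer back to captured groups or be a function that computes the replacement from each match:

```python
import re

text = 'AB12CD34EF'

# \1 in the replacement refers to the text captured by group 1
bracketed = re.sub(r'(\d+)', r'[\1]', text)

# A function receives each match object and returns the replacement string
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), text)

print(bracketed)
print(doubled)
```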

Compiling regular expressions

re.compile(pattern[,flags=0])

Compiled regular expression objects

import re

p = r'\w+@zhijieketang\.com'
regex = re.compile(p)

text = "Tony's email is tony_guan588@zhijieketang.com."
m = regex.search(text)
print(m)  # matches

m = regex.match(text)
print(m)  # no match

p = r'[Jj]ava'
regex = re.compile(p)
text = 'I like Java and java.'

match_list = regex.findall(text)
print(match_list)  # list of all matches

match_iter = regex.finditer(text)
for m in match_iter:
    print(m.group())

p = r'\d+'
regex = re.compile(p)
text = 'AB12CD34EF'

clist = regex.split(text)
print(clist)

replace_text = regex.sub(' ', text)
print(replace_text)

<re.Match object; span=(16, 45), match='tony_guan588@zhijieketang.com'>
None
['Java', 'java']
Java
java
['AB', 'CD', 'EF']
AB CD EF

Compilation flags

ASCII and Unicode

import re

text = '你们好Hello'

p = r'\w+'
regex = re.compile(p, re.U)

m = regex.search(text)
print(m)  # matches

m = regex.match(text)
print(m)  # matches

regex = re.compile(p, re.A)

m = regex.search(text)
print(m)  # matches only the ASCII part

m = regex.match(text)
print(m)  # no match: the string starts with non-ASCII characters

<re.Match object; span=(0, 8), match='你们好Hello'>
<re.Match object; span=(0, 8), match='你们好Hello'>
<re.Match object; span=(3, 8), match='Hello'>
None

Ignoring case

import re

p = r'(java).*(python)'
regex = re.compile(p, re.I)

m = regex.search('I like Java and Python.')
print(m)  # matches

m = regex.search('I like JAVA and Python.')
print(m)  # matches

m = regex.search('I like java and Python.')
print(m)  # matches

<re.Match object; span=(7, 22), match='Java and Python'>
<re.Match object; span=(7, 22), match='JAVA and Python'>
<re.Match object; span=(7, 22), match='java and Python'>

Making the dot metacharacter match newlines

import re

p = r'.+'
regex = re.compile(p)

m = regex.search('Hello\nWorld.')
print(m)  # matches up to the newline

regex = re.compile(p, re.DOTALL)

m = regex.search('Hello\nWorld.')
print(m)  # matches across the newline

<re.Match object; span=(0, 5), match='Hello'>
<re.Match object; span=(0, 12), match='Hello\nWorld.'>

Multiline mode

import re

p = r'^World'
regex = re.compile(p)

m = regex.search('Hello\nWorld.')
print(m)  # no match

regex = re.compile(p, re.M)

m = regex.search('Hello\nWorld.')
print(m)  # matches

None
<re.Match object; span=(6, 11), match='World'>

Verbose mode

import re

p = """(java)     # match the string java
        .*        # match zero or more of any character
        (python)  # match the string python
    """

regex = re.compile(p, re.I | re.VERBOSE)

m = regex.search('I like Java and Python.')
print(m)  # matches

m = regex.search('I like JAVA and Python.')
print(m)  # matches

m = regex.search('I like java and Python.')
print(m)  # matches

<re.Match object; span=(7, 22), match='Java and Python'>
<re.Match object; span=(7, 22), match='JAVA and Python'>
<re.Match object; span=(7, 22), match='java and Python'>

Data interchange formats

The CSV data interchange format

The reader() function

import csv

with open('data/books.csv', 'r',  encoding='gbk') as rf:
    reader = csv.reader(rf, dialect=csv.excel)
    for row in reader:
        print('|'.join(row))

1|软件工程|戴国强|机械工业出版社|19980528|2
2|汇编语言|李利光|北京大学出版社|19980318|2
3|计算机基础|王飞|经济科学出版社|19980218|1
4|FLASH精选|刘扬|中国纺织出版社|19990312|2
5|java基础|王一|电子工业出版社|19990528|3
6|世界杯|柳飞|世界出版社|19990412|2
7|JAVA程序设计|张余|人民邮电出版社|19990613|1
8|新概念3|余智|外语出版社|19990723|2
9|军事要闻|张强|解放军出版社|19990722|3
10|大众生活|许阳|电子出版社|19990819|3
11|南方旅游|王爱国|南方出版社|19990930|2
13|幽灵|钱力华|华光出版社|19991008|1
14|期货分析|孙宝|飞鸟出版社|19991122|3
15|人工智能|周未|机械工业出版社|19991223|3
16|数据库系统概念|吴红|机械工业出版社|20000328|3
17|计算机理论基础|戴家|机械工业出版社|20000218|4
18|编译原理|郑键|机械工业出版社|20000415|2
19|通讯与网络|欧阳杰|机械工业出版社|20000517|1
20|现代操作系统|王小国|机械工业出版社|20010128|1
21|网络基础|王大尉|北京大学出版社|20000617|1
22|万紫千红|丛丽|北京大学出版社|20000702|3
23|经济概论|思佳|北京大学出版社|20000819|3
24|经济与科学|毛波|经济科学出版社|20000923|2
25|计算机体系结构|方丹|机械工业出版社|20000328|4
26|软件工程|牛田|经济科学出版社|20000328|4
27|世界语言大观|候丙辉|经济科学出版社|20000814|2
28|高级语言程序设计|寇国华|清华大学出版社|20000117|3
29|操作系统概论|聂元名|清华大学出版社|20001028|1
30|数据库及应用|孙家萧|清华大学出版社|20000619|1
31|软件工程|戴志名|电子工业出版社|20000324|3
32|SOL使用手册|贺民|电子工业出版社|19990425|2
33|模拟电路|邓英才|电子工业出版社|20000527|2
34|集邮爱好者|李云|人民邮电出版社|20000630|1
36|高等数学|李放|人民邮电出版社|20000812|1
37|南方周末|邓光明|南方出版社|20000923|3
38|十大旅游胜地|潭晓明|南方出版社|20000403|2
39|黑幕|李仪|华光出版社|20000508|24

The writer() function

import csv

with open('data/books.csv', 'r', encoding='gbk') as rf:
    reader = csv.reader(rf)
    with open('data/books2.csv', 'w', newline='', encoding='gbk') as wf:
        writer = csv.writer(wf, delimiter='\t')
        for row in reader:
            print('|'.join(row))
            writer.writerow(row)

1|软件工程|戴国强|机械工业出版社|19980528|2
2|汇编语言|李利光|北京大学出版社|19980318|2
3|计算机基础|王飞|经济科学出版社|19980218|1
4|FLASH精选|刘扬|中国纺织出版社|19990312|2
5|java基础|王一|电子工业出版社|19990528|3
6|世界杯|柳飞|世界出版社|19990412|2
7|JAVA程序设计|张余|人民邮电出版社|19990613|1
8|新概念3|余智|外语出版社|19990723|2
9|军事要闻|张强|解放军出版社|19990722|3
10|大众生活|许阳|电子出版社|19990819|3
11|南方旅游|王爱国|南方出版社|19990930|2
13|幽灵|钱力华|华光出版社|19991008|1
14|期货分析|孙宝|飞鸟出版社|19991122|3
15|人工智能|周未|机械工业出版社|19991223|3
16|数据库系统概念|吴红|机械工业出版社|20000328|3
17|计算机理论基础|戴家|机械工业出版社|20000218|4
18|编译原理|郑键|机械工业出版社|20000415|2
19|通讯与网络|欧阳杰|机械工业出版社|20000517|1
20|现代操作系统|王小国|机械工业出版社|20010128|1
21|网络基础|王大尉|北京大学出版社|20000617|1
22|万紫千红|丛丽|北京大学出版社|20000702|3
23|经济概论|思佳|北京大学出版社|20000819|3
24|经济与科学|毛波|经济科学出版社|20000923|2
25|计算机体系结构|方丹|机械工业出版社|20000328|4
26|软件工程|牛田|经济科学出版社|20000328|4
27|世界语言大观|候丙辉|经济科学出版社|20000814|2
28|高级语言程序设计|寇国华|清华大学出版社|20000117|3
29|操作系统概论|聂元名|清华大学出版社|20001028|1
30|数据库及应用|孙家萧|清华大学出版社|20000619|1
31|软件工程|戴志名|电子工业出版社|20000324|3
32|SOL使用手册|贺民|电子工业出版社|19990425|2
33|模拟电路|邓英才|电子工业出版社|20000527|2
34|集邮爱好者|李云|人民邮电出版社|20000630|1
36|高等数学|李放|人民邮电出版社|20000812|1
37|南方周末|邓光明|南方出版社|20000923|3
38|十大旅游胜地|潭晓明|南方出版社|20000403|2
39|黑幕|李仪|华光出版社|20000508|24
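When a CSV file has a header row, csv.DictReader and csv.DictWriter let the code refer to columns by name instead of position. The sketch below uses an in-memory sample rather than data/books.csv; the column names and the two rows are made up for illustration.

```python
import csv
import io

# Hypothetical sample with a header line; column names are assumptions
sample = 'id,title,author\n1,Software Engineering,Dai\n2,Assembly Language,Li\n'

# Each row becomes a dict keyed by the header names
rows = list(csv.DictReader(io.StringIO(sample)))
titles = [row['title'] for row in rows]

# Write back only two of the columns; extra keys are ignored
out = io.StringIO(newline='')
writer = csv.DictWriter(out, fieldnames=['id', 'title'], extrasaction='ignore')
writer.writeheader()
writer.writerows(rows)
written_lines = out.getvalue().splitlines()

print(titles)
print(written_lines)
```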

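The XML data interchange format

XML document structure

Declaration
Root element
Child elements
Attributes
Namespaces
Qualified names

Parsing XML documents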

import xml.etree.ElementTree as ET

tree = ET.parse('data1/Notes.xml')  # build the XML document tree
print(type(tree))  # xml.etree.ElementTree.ElementTree

root = tree.getroot()  # root is the root element
print(type(root))  # xml.etree.ElementTree.Element
print(root.tag)  # Notes

for index, child in enumerate(root):
    print('第{0}个{1}元素，属性：{2}'.format(index, child.tag, child.attrib))
    for i, child_child in enumerate(child):
        print('    标签：{0}，内容：{1}'.format(child_child.tag, child_child.text))

<class 'xml.etree.ElementTree.ElementTree'>
<class 'xml.etree.ElementTree.Element'>
Notes
第0个Note元素，属性：{'id': '1'}
    标签：CDate，内容：2018-3-21
    标签：Content，内容：发布Python0
    标签：UserID，内容：tony
第1个Note元素，属性：{'id': '2'}
    标签：CDate，内容：2018-3-22
    标签：Content，内容：发布Python1
    标签：UserID，内容：tony
第2个Note元素，属性：{'id': '3'}
    标签：CDate，内容：2018-3-23
    标签：Content，内容：发布Python2
    标签：UserID，内容：tony
第3个Note元素，属性：{'id': '4'}
    标签：CDate，内容：2018-3-24
    标签：Content，内容：发布Python3
    标签：UserID，内容：tony
第4个Note元素，属性：{'id': '5'}
    标签：CDate，内容：2018-3-25
    标签：Content，内容：发布Python4
    标签：UserID，内容：tony

XPath

find(match, namespaces=None)
findall(match, namespaces=None)
findtext(match, default=None, namespaces=None)

import xml.etree.ElementTree as ET

tree = ET.parse('data1/Notes.xml')
root = tree.getroot()

node = root.find("./Note")  # first Note child of the current node
print(node.tag, node.attrib)
node = root.find("./Note/CDate")  # first CDate node under a Note child
print(node.text)
node = root.find("./Note/CDate/..")  # parent of that CDate node, i.e. the Note node
print(node.tag, node.attrib)
node = root.find(".//CDate")  # first CDate node among all descendants of the current node
print(node.text)

node = root.find("./Note[@id]")  # first Note node that has an id attribute
print(node.tag, node.attrib)

node = root.find("./Note[@id='2']")  # Note node whose id attribute equals '2'
print(node.tag, node.attrib)

node = root.find("./Note[2]")  # second Note node
print(node.tag, node.attrib)

node = root.find("./Note[last()]")  # last Note node
print(node.tag, node.attrib)

node = root.find("./Note[last()-2]")  # third Note node from the end
print(node.tag, node.attrib)

Note {'id': '1'}
2018-3-21
Note {'id': '1'}
2018-3-21
Note {'id': '1'}
Note {'id': '2'}
Note {'id': '2'}
Note {'id': '5'}
Note {'id': '3'}
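
The example above only exercises `find()`. The other two methods can be sketched as follows; the inline XML is a hypothetical stand-in for `data1/Notes.xml`, reduced to two notes.

```python
import xml.etree.ElementTree as ET

# Hypothetical stand-in for data1/Notes.xml with two notes.
root = ET.fromstring(
    "<Notes>"
    "<Note id='1'><CDate>2018-3-21</CDate><UserID>tony</UserID></Note>"
    "<Note id='2'><CDate>2018-3-22</CDate><UserID>tony</UserID></Note>"
    "</Notes>"
)

# findall() returns every matching element, in document order.
dates = [node.text for node in root.findall("./Note/CDate")]
# findtext() returns the text of the first match.
first_user = root.findtext("./Note/UserID")
# default is returned when nothing matches.
missing = root.findtext("./Note/Title", default="(none)")

print(dates)
print(first_user)
print(missing)
```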

JSON data interchange format

JSON document structure

Encoding JSON data

import json

# Prepare the data
py_dict = {'name': 'tony', 'age': 30, 'sex': True}  # create a dict
py_list = [1, 3]  # create a list
py_tuple = ('A', 'B', 'C')  # create a tuple

py_dict['a'] = py_list  # add the list to the dict
py_dict['b'] = py_tuple  # add the tuple to the dict

print(py_dict)
print(type(py_dict))  ## <class 'dict'>

# Encode to a JSON string
json_obj = json.dumps(py_dict)
print(json_obj)
print(type(json_obj))  ## <class 'str'>

# Encode again, this time with indentation
json_obj = json.dumps(py_dict, indent=4)
# Print the formatted string
print(json_obj)

# Write the JSON data to data1.json
with open('data2/data1.json', 'w') as f:
    json.dump(py_dict, f)

# Write the JSON data to data2.json, indented
with open('data2/data2.json', 'w') as f:
    json.dump(py_dict, f, indent=4)

{'name': 'tony', 'age': 30, 'sex': True, 'a': [1, 3], 'b': ('A', 'B', 'C')}
<class 'dict'>
{"name": "tony", "age": 30, "sex": true, "a": [1, 3], "b": ["A", "B", "C"]}
<class 'str'>
{
    "name": "tony",
    "age": 30,
    "sex": true,
    "a": [
        1,
        3
    ],
    "b": [
        "A",
        "B",
        "C"
    ]
}
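
The data above is pure ASCII, so the default settings are enough. With Chinese text, `json.dumps` escapes every non-ASCII character unless `ensure_ascii=False` is passed. A minimal sketch follows; the sample dictionary is assumed for illustration.

```python
import json

note = {'Content': '发布Python0', 'UserID': 'tony'}  # sample data, assumed for illustration

escaped = json.dumps(note)                       # non-ASCII becomes \uXXXX escapes
readable = json.dumps(note, ensure_ascii=False)  # keeps the characters as they are

print(escaped)
print(readable)
```

Both strings decode back to the same dictionary. When writing the readable form to a file, it is safer to open the file with an explicit `encoding='utf-8'`.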

Decoding JSON data

import json

# Prepare the data
json_obj = r'{"name": "tony", "age": 30, "sex": true, "a": [1, 3], "b": ["A", "B", "C"]}'
#json_obj = "{'name': 'tony', 'age': 30, 'sex': true, 'a': [1, 3], 'b': ['A', 'B', 'C']}"

py_dict = json.loads(json_obj)
print(type(py_dict))  ## <class 'dict'>
print(py_dict['name'])
print(py_dict['age'])
print(py_dict['sex'])

py_lista = py_dict['a']  # get the list stored under 'a'
print(py_lista)
py_listb = py_dict['b']  # get the list stored under 'b' (the tuple was encoded as a JSON array)
print(py_listb)

# Read JSON data from data2.json
with open('data2/data2.json', 'r') as f:
    data = json.load(f)
    print(data)
    print(type(data))  ## <class 'dict'>

<class 'dict'>
tony
30
True
[1, 3]
['A', 'B', 'C']
{'name': 'tony', 'age': 30, 'sex': True, 'a': [1, 3], 'b': ['A', 'B', 'C']}
<class 'dict'>
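
The commented-out `json_obj` line in the code above uses single quotes. That is a valid Python literal but not valid JSON, and `json.loads` rejects it. A small sketch of what happens (the shortened sample string is an assumption):

```python
import json

bad_text = "{'name': 'tony', 'age': 30}"  # single quotes: a Python literal, not JSON

try:
    json.loads(bad_text)
    error_message = None
except json.JSONDecodeError as exc:
    # JSON requires double quotes around strings and property names.
    error_message = str(exc)

print(error_message)
```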

Configuration files

Configuration file structure

Reading a configuration file

import configparser

config = configparser.ConfigParser()  # create the config parser

config.read('data3/Setup.ini', encoding='utf-8')  # read and parse the configuration file

print(config.sections())  # list all sections

section1 = config['Startup']  # get the Startup section
print(config.options('Startup'))

print(section1['RequireOS'])
print(section1['RequireIE'])

print(config['Product']['msi'])

print(config['Windows 2000']['MajorVersion'])  # value of MajorVersion
print(config['Windows 2000']['ServicePackMajor'])

value = config.get('Windows 2000', 'MajorVersion')  # get() returns the value as a string
print(type(value))  # <class 'str'>

value = config.getint('Windows 2000', 'MajorVersion')  # getint() converts the value to int
print(type(value))  # <class 'int'>

['Startup', 'Product', 'Windows 2000']
['requireos', 'requiremsi', 'requireie']
Windows 2000
6.0.2600.0
AcroRead.msi
5
4
<class 'str'>
<class 'int'>
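
The contents of `Setup.ini` are not shown in these notes. The sketch below parses an INI text that would be consistent with the output above; the section and option names come from that output, while the `RequireMSI` value is made up.

```python
import configparser

# Assumed contents, reconstructed from the printed output; RequireMSI's value is invented.
INI_TEXT = """
[Startup]
RequireOS = Windows 2000
RequireMSI = 2.0
RequireIE = 6.0.2600.0

[Product]
msi = AcroRead.msi

[Windows 2000]
MajorVersion = 5
ServicePackMajor = 4
"""

config = configparser.ConfigParser()
config.read_string(INI_TEXT)  # parse from a string instead of a file

print(config.sections())
print(config.options('Startup'))  # option names are lower-cased by default
print(config.getint('Windows 2000', 'MajorVersion'))
```

Option lookup is case-insensitive by default, which is why `section1['RequireOS']` works even though `options()` reports `requireos`.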

Writing a configuration file

import configparser

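config = configparser.ConfigParser()  # create the config parser

config.read('data3/Setup.ini', encoding='utf-8')  # read and parse the configuration file

# Change or add options in existing sections
config['Startup']['RequireMSI'] = '8.0'
config['Product']['RequireMSI'] = '4.0'

config.add_section('Section2')   # add a section
config.set('Section2', 'name', 'Mac')   # add an option

# Write back with the same encoding that was used for reading
with open('data3/Setup.ini', 'w', encoding='utf-8') as fw:
    config.write(fw)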

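Database programming

Overview of data persistence techniques

Text files
Databases

The MySQL database management system

Python DB-API

Establishing a database connection

Creating a cursor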
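
The connection and cursor steps are the same for any DB-API driver. The sketch below uses the standard-library `sqlite3` module so it runs without a MySQL server. That substitution is an assumption made for illustration; note that `sqlite3` uses `?` placeholders where PyMySQL uses `%s`. The table and rows mirror the `user` table used later in these notes.

```python
import sqlite3

connection = sqlite3.connect(':memory:')  # 1. open a connection (in-memory database)
try:
    cursor = connection.cursor()          # 2. create a cursor
    cursor.execute('create table user (userid integer primary key, name text)')
    cursor.executemany('insert into user (userid, name) values (?, ?)',
                       [(1, 'Tom'), (2, 'Ben')])
    connection.commit()                   # 3. commit the changes

    cursor.execute('select name, userid from user where userid > ? order by userid', (0,))
    rows = cursor.fetchall()              # 4. fetch the result set
    cursor.close()                        # 5. close the cursor
finally:
    connection.close()                    # 6. close the connection

print(rows)
```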

Case study: CRUD operations on a MySQL database

Installing the PyMySQL module

Querying data

Conditional query code

import pymysql

# 1. Open the database connection
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')

try:
    # 2. Create a cursor object
    with connection.cursor() as cursor:

        # 3. Execute the SQL statement
        # sql = 'select name, userid from user where userid >%s'
        # cursor.execute(sql, [0])
        sql = 'select name, userid from user where userid >%(id)s'
        cursor.execute(sql, {'id': 0})

        # 4. Fetch the result set
        result_set = cursor.fetchall()

        for row in result_set:
            print('id：{0} - name：{1}'.format(row[1], row[0]))

    # 5. The cursor is closed when the with block ends

finally:
    # 6. Close the database connection
    connection.close()

id：1 - name：Tom
id：2 - name：Ben

Unconditional query code

import pymysql

# 1. Open the database connection
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')

try:
    # 2. Create a cursor object
    with connection.cursor() as cursor:

        # 3. Execute the SQL statement
        sql = 'select max(userid) from user'
        cursor.execute(sql)

        # 4. Fetch the single result row
        row = cursor.fetchone()

        if row is not None:
            print('最大用户Id ：{0}'.format(row[0]))

    # 5. The cursor is closed when the with block ends

finally:
    # 6. Close the database connection
    connection.close()

最大用户Id ：2

Modifying data

Inserting data

import pymysql


# Query the largest user id
def read_max_userid():
    # 1. Open the database connection
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='986370165',
                                 database='MyDB',
                                 charset='utf8')

    try:
        # 2. Create a cursor object
        with connection.cursor() as cursor:

            # 3. Execute the SQL statement
            sql = 'select max(userid) from user'
            cursor.execute(sql)

            # 4. Fetch the single result row
            row = cursor.fetchone()

            if row is not None:
                print('最大用户Id ：{0}'.format(row[0]))
                return row[0]

        # 5. The cursor is closed when the with block ends

    finally:
        # 6. Close the database connection
        connection.close()


# 1. Open the database connection
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')

# Look up the current largest id
maxid = read_max_userid()

try:
    # 2. Create a cursor object
    with connection.cursor() as cursor:

        # 3. Execute the SQL statement
        sql = 'insert into user (userid, name) values (%s,%s)'
        nextid = maxid + 1
        name = 'Tony' + str(nextid)
        affectedcount = cursor.execute(sql, (nextid, name))

        print('影响的数据行数：{0}'.format(affectedcount))
        # 4. Commit the database transaction
        connection.commit()

    # 5. The cursor is closed when the with block ends

except pymysql.DatabaseError:
    # 4. Roll back the database transaction
    connection.rollback()
finally:
    # 6. Close the database connection
    connection.close()

最大用户Id ：2
影响的数据行数：1

Updating data

import pymysql

# 1. Open the database connection
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')

try:
    # 2. Create a cursor object
    with connection.cursor() as cursor:

        # 3. Execute the SQL statement
        sql = 'update user set name = %s where userid > %s'
        affectedcount = cursor.execute(sql, ('Tom', 2))

        print('影响的数据行数：{0}'.format(affectedcount))
        # 4. Commit the database transaction
        connection.commit()

    # 5. The cursor is closed when the with block ends

except pymysql.DatabaseError as e:
    # 4. Roll back the database transaction
    connection.rollback()
    print(e)
finally:
    # 6. Close the database connection
    connection.close()

影响的数据行数：1

Deleting data

import pymysql


# Query the largest user id
def read_max_userid():
    # 1. Open the database connection
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='986370165',
                                 database='MyDB',
                                 charset='utf8')

    try:
        # 2. Create a cursor object
        with connection.cursor() as cursor:

            # 3. Execute the SQL statement
            sql = 'select max(userid) from user'
            cursor.execute(sql)

            # 4. Fetch the single result row
            row = cursor.fetchone()

            if row is not None:
                print('最大用户Id ：{0}'.format(row[0]))
                return row[0]

        # 5. The cursor is closed when the with block ends

    finally:
        # 6. Close the database connection
        connection.close()


# 1. Open the database connection
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')

# Look up the current largest id
maxid = read_max_userid()

try:
    # 2. Create a cursor object
    with connection.cursor() as cursor:

        # 3. Execute the SQL statement
        sql = 'delete from user where userid = %s'
        # (maxid,) is a one-element tuple; (maxid) would just be the bare value
        affectedcount = cursor.execute(sql, (maxid,))

        print('影响的数据行数：{0}'.format(affectedcount))
        # 4. Commit the database transaction
        connection.commit()

    # 5. The cursor is closed when the with block ends

except pymysql.DatabaseError:
    # 4. Roll back the database transaction
    connection.rollback()
finally:
    # 6. Close the database connection
    connection.close()

最大用户Id ：3
影响的数据行数：1
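
The commit/rollback pattern used in the insert, update and delete examples can be tried without a MySQL server. The sketch below swaps in the standard-library `sqlite3` module, which is an assumption made for illustration (`?` placeholders instead of `%s`), and provokes a failure with a duplicate primary key.

```python
import sqlite3

connection = sqlite3.connect(':memory:')
connection.execute('create table user (userid integer primary key, name text)')
connection.execute("insert into user (userid, name) values (1, 'Tom')")
connection.commit()

try:
    cursor = connection.cursor()
    cursor.execute('insert into user (userid, name) values (?, ?)', (2, 'Ben'))
    # Duplicate primary key: raises sqlite3.IntegrityError, a DatabaseError subclass.
    cursor.execute('insert into user (userid, name) values (?, ?)', (1, 'Tony'))
    connection.commit()
except sqlite3.DatabaseError:
    connection.rollback()  # also undoes the insert of userid 2

count = connection.execute('select count(*) from user').fetchone()[0]
connection.close()
print(count)
```

Because both inserts belong to the same transaction, the rollback leaves only the row that was committed before the `try` block.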

NoSQL data storage

Opening and closing a dbm database

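dbm.open(file, flag='r')

flag: 'r' (read-only, the default), 'w' (read-write, existing database), 'c' (read-write, create if missing), 'n' (always create a new, empty database)

with dbm.open(file, 'c') as db:
    pass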
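
The effect of the flags can be checked with a throwaway database. This is a minimal sketch; the temporary directory and the database name `demo` are assumptions, and the backend that `dbm` picks depends on the platform.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, 'demo')  # hypothetical database name

    try:
        dbm.open(path, 'r')           # 'r' never creates a missing database
        missing_raised = False
    except dbm.error:                 # dbm.error is a tuple of backend exceptions
        missing_raised = True

    with dbm.open(path, 'c') as db:   # 'c' creates the database if needed
        db['name'] = 'tony'

    with dbm.open(path, 'r') as db:   # reopen read-only
        stored = db['name']           # values come back as bytes

    with dbm.open(path, 'n') as db:   # 'n' always starts from an empty database
        keys_after_new = list(db.keys())

print(missing_raised, stored, keys_after_new)
```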

Storing data in dbm

import dbm

with dbm.open('mydb', 'c') as db:
    db['name'] = 'tony'  # store or update a value
    print(db['name'].decode())  # read it back (values are bytes)

    age = int(db.get('age', b'18').decode())  # read with a default for a missing key
    print(age)

    if 'age' in db:  # check whether the age key exists
        db['age'] = '20'  # or b'20'

    del db['name']  # delete the name entry

tony
18

GUI programming with wxPython

Python GUI toolkits

Tkinter
PyQt
wxPython

Installing wxPython

wxPython basics

Windows
Controls
Event handling
Layout management

wxPython class hierarchy

A first wxPython program

import wx

# Create the application object
app = wx.App()
# Create the window object
frm = wx.Frame(None, title="第一个GUI程序!", size=(400, 300), pos=(100, 100))

frm.Show()  # show the window

app.MainLoop()  # enter the main event loop

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title="第一个GUI程序!", size=(400, 300), pos=(100, 100))
        ## TODO


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True

    def OnExit(self):
        print('应用程序退出')
        return 0


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

应用程序退出

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title="第一个GUI程序!", size=(400, 300))
        self.Centre()  # center the window on screen
        panel = wx.Panel(parent=self)
        statictext = wx.StaticText(parent=panel, label='Hello World!', pos=(10, 10))


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

wxPython UI construction hierarchy

Event handling

One-to-one event handling

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='一对一事件处理', size=(300, 180))
        self.Centre()  # center the window on screen
        panel = wx.Panel(parent=self)
        self.statictext = wx.StaticText(parent=panel, pos=(110, 20))
        b = wx.Button(parent=panel, label='OK', pos=(100, 50))
        self.Bind(wx.EVT_BUTTON, self.on_click, b)

    def on_click(self, event):
        print(type(event))  ## <class 'wx._core.CommandEvent'>
        self.statictext.SetLabelText('Hello, world.')


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

<class 'wx._core.CommandEvent'>
<class 'wx._core.CommandEvent'>
<class 'wx._core.CommandEvent'>

One-to-many event handling

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='一对多事件处理', size=(300, 180))
        self.Centre()  # center the window on screen
        panel = wx.Panel(parent=self)
        self.statictext = wx.StaticText(parent=panel, pos=(110, 15))
        b1 = wx.Button(parent=panel, id=10, label='Button1', pos=(100, 45))
        b2 = wx.Button(parent=panel, id=11, label='Button2', pos=(100, 85))
        ## self.Bind(wx.EVT_BUTTON, self.on_click, b1)
        ## self.Bind(wx.EVT_BUTTON, self.on_click, id=11)
        self.Bind(wx.EVT_BUTTON, self.on_click, id=10, id2=20)

    def on_click(self, event):
        event_id = event.GetId()
        print(event_id)
        if event_id == 10:
            self.statictext.SetLabelText('Button1单击')
        else:
            self.statictext.SetLabelText('Button2单击')


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

Example: handling mouse events

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title="鼠标事件处理", size=(400, 300))
        self.Centre()  # center the window on screen
        self.Bind(wx.EVT_LEFT_DOWN, self.on_left_down)
        self.Bind(wx.EVT_LEFT_UP, self.on_left_up)
        self.Bind(wx.EVT_MOTION, self.on_mouse_move)

    def on_left_down(self, evt):
        print('鼠标按下')

    def on_left_up(self, evt):
        print('鼠标释放')

    def on_mouse_move(self, event):
        if event.Dragging() and event.LeftIsDown():
            pos = event.GetPosition()
            print(pos)


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

鼠标按下
鼠标释放
鼠标按下
鼠标释放
鼠标按下
鼠标释放
鼠标按下
(129, 99)
鼠标释放
鼠标按下
鼠标释放
鼠标按下
鼠标释放
鼠标按下
(58, 114)
(60, 115)
(61, 116)
(62, 117)
(63, 117)
(64, 117)
(66, 118)
(67, 119)
(69, 119)
(73, 119)
(79, 119)
鼠标释放
鼠标按下
鼠标释放
鼠标按下
(81, 169)
(75, 170)
(72, 170)
(68, 171)
(65, 171)
(63, 171)
(61, 171)
(60, 171)
鼠标释放
鼠标按下
鼠标释放
鼠标按下
鼠标释放
鼠标按下
(201, 55)
(202, 55)
(204, 57)
(206, 59)
(208, 61)
(211, 63)
(214, 65)
(217, 68)
(221, 71)
(224, 74)
(228, 77)
(232, 80)
(235, 83)
(239, 85)
(241, 87)
(243, 88)
(245, 89)
(246, 91)
(249, 92)
(251, 92)
(252, 93)
(253, 93)
(254, 93)
(255, 93)
(256, 93)
(257, 94)
(259, 94)
(260, 94)
(261, 94)
(262, 94)
(264, 94)
(265, 94)
(266, 94)
(267, 94)
(269, 94)
(270, 94)
(272, 94)
(273, 94)
(275, 94)
(276, 94)
(277, 94)
(277, 93)
(278, 92)
(279, 91)
(279, 90)
(279, 89)
(279, 88)
(279, 87)
(280, 86)
(280, 85)
(280, 84)
(280, 83)
(279, 83)
(278, 84)
(277, 85)
(274, 87)
(272, 88)
(268, 91)
(264, 94)
(259, 97)
(253, 102)
(247, 107)
(240, 111)
(233, 116)
(227, 120)
(222, 123)
(219, 125)
(215, 128)
(211, 131)
(207, 133)
(201, 135)
(197, 137)
(194, 138)
(190, 139)
(186, 140)
(184, 141)
(180, 141)
(177, 141)
(175, 141)
(171, 141)
(169, 141)
(166, 140)
(162, 139)
(158, 137)
(154, 135)
(153, 133)
(149, 131)
(143, 127)
(138, 123)
(133, 120)
(129, 116)
(125, 113)
(121, 108)
(117, 104)
(114, 100)
(112, 97)
(111, 94)
(108, 88)
(106, 84)
(105, 80)
(105, 77)
(105, 73)
(105, 70)
(106, 67)
(107, 63)
(108, 61)
(110, 58)
(112, 55)
(114, 53)
(116, 51)
(119, 48)
(122, 46)
(125, 44)
(128, 43)
(132, 41)
(135, 40)
(140, 39)
(145, 38)
(150, 38)
(155, 37)
(161, 37)
(166, 37)
(171, 37)
(175, 37)
(179, 37)
(181, 37)
(185, 38)
(189, 40)
(191, 41)
(194, 43)
(197, 45)
(200, 47)
(202, 48)
(205, 50)
(208, 52)
(209, 55)
(212, 57)
(214, 59)
(216, 62)
(217, 65)
(219, 67)
(221, 70)
(222, 73)
(224, 75)
(224, 78)
(224, 79)
(224, 81)
(224, 83)
(224, 86)
(224, 88)
(224, 90)
(224, 92)
(224, 94)
(223, 96)
(222, 99)
(220, 100)
(219, 102)
(216, 104)
(213, 107)
(209, 109)
(205, 111)
(201, 113)
(196, 114)
(191, 115)
(187, 115)
(182, 116)
(179, 116)
(174, 116)
(167, 116)
(163, 116)
(158, 116)
(153, 116)
(149, 116)
(145, 115)
(143, 114)
(139, 113)
(135, 111)
(132, 110)
(130, 108)
(128, 107)
(126, 106)
(125, 105)
(123, 103)
(122, 102)
(120, 100)
(118, 96)
(116, 92)
(115, 87)
(115, 83)
(114, 79)
(114, 76)
(114, 72)
(115, 68)
(116, 65)
(117, 63)
(119, 59)
(122, 55)
(125, 52)
(128, 48)
(134, 45)
(138, 42)
(145, 39)
(153, 37)
(162, 34)
(168, 34)
(181, 34)
(191, 34)
(200, 34)
(210, 35)
(218, 37)
(228, 41)
(237, 45)
(246, 50)
(253, 54)
(259, 59)
(265, 64)
(270, 69)
(276, 74)
(281, 80)
(283, 84)
(286, 91)
(288, 96)
(292, 103)
(293, 107)
(294, 112)
(294, 116)
(294, 121)
(294, 124)
(294, 127)
(294, 129)
(292, 132)
(291, 135)
(291, 137)
(289, 139)
(287, 142)
(284, 144)
(283, 145)
(280, 147)
(277, 149)
(276, 150)
(273, 151)
(269, 153)
(264, 153)
(259, 154)
(254, 154)
(249, 154)
(244, 154)
(237, 152)
(232, 151)
(228, 150)
(223, 147)
(216, 145)
(212, 142)
(206, 138)
(203, 135)
(200, 133)
(198, 130)
(195, 127)
(194, 123)
(192, 122)
(192, 118)
(191, 116)
(191, 112)
(191, 109)
(192, 106)
(193, 104)
(195, 101)
(196, 100)
(198, 98)
(200, 96)
(201, 95)
(203, 95)
(206, 94)
(208, 93)
(211, 93)
(214, 93)
(216, 93)
(219, 94)
(221, 94)
(223, 96)
(226, 99)
(229, 102)
(232, 107)
(237, 113)
(240, 119)
(242, 126)
(245, 131)
(247, 136)
(247, 140)
(247, 145)
(247, 150)
(247, 153)
(246, 157)
(243, 161)
(240, 165)
(237, 168)
(233, 171)
(227, 175)
(223, 177)
(214, 180)
(203, 182)
(190, 182)
(178, 182)
(166, 182)
(152, 181)
(139, 180)
(126, 177)
(113, 175)
(105, 171)
(96, 167)
(91, 165)
(87, 163)
(85, 163)
(85, 162)
(84, 161)
(84, 160)
(83, 158)
鼠标释放

Layout management

Box sizer

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Box布局', size=(300, 120))
        self.Centre()  # center the window on screen
        panel = wx.Panel(parent=self)
        # Create a vertical box sizer
        vbox = wx.BoxSizer(wx.VERTICAL)
        self.statictext = wx.StaticText(parent=panel, label='Button1单击')
        # Add the static text to the vertical box sizer
        vbox.Add(self.statictext, proportion=2, flag=wx.FIXED_MINSIZE | wx.TOP | wx.CENTER, border=10)

        b1 = wx.Button(parent=panel, id=10, label='Button1')
        b2 = wx.Button(parent=panel, id=11, label='Button2')
        self.Bind(wx.EVT_BUTTON, self.on_click, id=10, id2=20)
        # Create a horizontal box sizer
        hbox = wx.BoxSizer(wx.HORIZONTAL)
        # Add b1 to the horizontal box sizer
        hbox.Add(b1, 0, wx.EXPAND | wx.BOTTOM, 5)
        # Add b2 to the horizontal box sizer
        hbox.Add(b2, 0, wx.EXPAND | wx.BOTTOM, 5)

        # Nest the horizontal box sizer inside the vertical one
        vbox.Add(hbox, proportion=1, flag=wx.CENTER)

        panel.SetSizer(vbox)

    def on_click(self, event):
        event_id = event.GetId()
        print(event_id)
        if event_id == 10:
            self.statictext.SetLabelText('Button1单击')
        else:
            self.statictext.SetLabelText('Button2单击')


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

StaticBox layout

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='StaticBox布局', size=(300, 120))
        self.Centre()  # center the window on screen
        panel = wx.Panel(parent=self)
        # Create a vertical box sizer
        vbox = wx.BoxSizer(wx.VERTICAL)
        self.statictext = wx.StaticText(parent=panel, label='Button1单击')
        # Add the static text to the vertical box sizer
        vbox.Add(self.statictext, proportion=2, flag=wx.FIXED_MINSIZE | wx.TOP | wx.CENTER, border=10)

        b1 = wx.Button(parent=panel, id=10, label='Button1')
        b2 = wx.Button(parent=panel, id=11, label='Button2')
        self.Bind(wx.EVT_BUTTON, self.on_click, id=10, id2=20)

        # Create the static box
        sb = wx.StaticBox(panel, label="按钮框")
        # Create a horizontal StaticBox sizer around it
        hsbox = wx.StaticBoxSizer(sb, wx.HORIZONTAL)
        # Add b1 to the horizontal StaticBox sizer
        hsbox.Add(b1, 0, wx.EXPAND | wx.BOTTOM, 5)
        # Add b2 to the horizontal StaticBox sizer
        hsbox.Add(b2, 0, wx.EXPAND | wx.BOTTOM, 5)

        # Add hsbox to vbox
        vbox.Add(hsbox, proportion=1, flag=wx.CENTER)

        panel.SetSizer(vbox)

    def on_click(self, event):
        event_id = event.GetId()
        print(event_id)
        if event_id == 10:
            self.statictext.SetLabelText('Button1单击')
        else:
            self.statictext.SetLabelText('Button2单击')


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

11
10
11

Grid layout

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Grid布局', size=(300, 300))
        self.Centre()  # center the window on screen
        panel = wx.Panel(self)
        btn1 = wx.Button(panel, label='1')
        btn2 = wx.Button(panel, label='2')
        btn3 = wx.Button(panel, label='3')
        btn4 = wx.Button(panel, label='4')
        btn5 = wx.Button(panel, label='5')
        btn6 = wx.Button(panel, label='6')
        btn7 = wx.Button(panel, label='7')
        btn8 = wx.Button(panel, label='8')
        btn9 = wx.Button(panel, label='9')

        grid = wx.GridSizer(cols=3, rows=3, vgap=0, hgap=0)

        ## grid.AddMany([
        ##     (btn1, 0, wx.EXPAND),
        ##     (btn2, 0, wx.EXPAND),
        ##     (btn3, 0, wx.EXPAND),
        ##     (btn4, 0, wx.EXPAND),
        ##     (btn5, 0, wx.EXPAND),
        ##     (btn6, 0, wx.EXPAND),
        ##     (btn7, 0, wx.EXPAND),
        ##     (btn8, 0, wx.EXPAND),
        ##     (btn9, 0, wx.EXPAND)
        ## ])

        grid.Add(btn1, 0, wx.EXPAND)
        grid.Add(btn2, 0, wx.EXPAND)
        grid.Add(btn3, 0, wx.EXPAND)
        grid.Add(btn4, 0, wx.EXPAND)
        grid.Add(btn5, 0, wx.EXPAND)
        grid.Add(btn6, 0, wx.EXPAND)
        grid.Add(btn7, 0, wx.EXPAND)
        grid.Add(btn8, 0, wx.EXPAND)
        grid.Add(btn9, 0, wx.EXPAND)

        panel.SetSizer(grid)


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

FlexGrid layout

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='FlexGrid布局', size=(400, 200))
        self.Centre()  # center the window on screen
        panel = wx.Panel(parent=self)

        fgs = wx.FlexGridSizer(3, 2, 10, 10)

        title = wx.StaticText(panel, label="标题：")
        author = wx.StaticText(panel, label="作者名：")
        review = wx.StaticText(panel, label="内容：")

        tc1 = wx.TextCtrl(panel)
        tc2 = wx.TextCtrl(panel)
        tc3 = wx.TextCtrl(panel, style=wx.TE_MULTILINE)

        fgs.AddMany([title, (tc1, 1, wx.EXPAND),
                     author, (tc2, 1, wx.EXPAND),
                     review, (tc3, 1, wx.EXPAND)])

        fgs.AddGrowableRow(0, 1)
        fgs.AddGrowableRow(1, 1)
        fgs.AddGrowableRow(2, 3)
        fgs.AddGrowableCol(0, 1)
        fgs.AddGrowableCol(1, 2)

        hbox = wx.BoxSizer(wx.HORIZONTAL)
        hbox.Add(fgs, proportion=1, flag=wx.ALL | wx.EXPAND, border=15)

        panel.SetSizer(hbox)


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

wxPython controls

Static text and buttons

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='静态文本和按钮', size=(300, 200))
        self.Centre()  # center the window on screen
        panel = wx.Panel(parent=self)
        # Create a vertical box sizer
        vbox = wx.BoxSizer(wx.VERTICAL)

        self.statictext = wx.StaticText(parent=panel, label='StaticText1', style=wx.ALIGN_CENTRE_HORIZONTAL)
        b1 = wx.Button(parent=panel, label='OK')
        self.Bind(wx.EVT_BUTTON, self.on_click, b1)

        b2 = wx.ToggleButton(panel, -1, 'ToggleButton')
        self.Bind(wx.EVT_TOGGLEBUTTON, self.on_click, b2)  # toggle buttons emit EVT_TOGGLEBUTTON, not EVT_BUTTON

        bmp = wx.Bitmap('icon/1.png', wx.BITMAP_TYPE_PNG)
        b3 = wx.BitmapButton(panel, -1, bmp)
        self.Bind(wx.EVT_BUTTON, self.on_click, b3)

        # Add a spacer, the static text and the buttons to the box sizer
        vbox.Add(100, 10, proportion=1, flag=wx.CENTER | wx.FIXED_MINSIZE)
        vbox.Add(self.statictext, proportion=1, flag=wx.CENTER | wx.FIXED_MINSIZE)
        vbox.Add(b1, proportion=1, flag=wx.CENTER | wx.EXPAND)
        vbox.Add(b2, proportion=1, flag=wx.CENTER | wx.EXPAND)
        vbox.Add(b3, proportion=1, flag=wx.CENTER | wx.EXPAND)

        panel.SetSizer(vbox)

    def on_click(self, event):
        self.statictext.SetLabelText('Hello, world.')


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

Text input controls

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='文本框', size=(400, 200))
        self.Centre()  # center the window on screen
        panel = wx.Panel(self)

        hbox = wx.BoxSizer(wx.HORIZONTAL)

        fgs = wx.FlexGridSizer(3, 2, 10, 10)

        userid = wx.StaticText(panel, label="用户ID：")
        pwd = wx.StaticText(panel, label="密码：")
        content = wx.StaticText(panel, label="多行文本：")

        tc1 = wx.TextCtrl(panel)
        tc2 = wx.TextCtrl(panel, style=wx.TE_PASSWORD)
        tc3 = wx.TextCtrl(panel, style=wx.TE_MULTILINE)

        # Set the initial value of tc1
        tc1.SetValue('tony')
        # Read the value of tc1 back
        print('读取用户ID：{0}'.format(tc1.GetValue()))

        fgs.AddMany([userid, (tc1, 1, wx.EXPAND),
                     pwd, (tc2, 1, wx.EXPAND),
                     content, (tc3, 1, wx.EXPAND)])
        fgs.AddGrowableRow(0, 1)
        fgs.AddGrowableRow(1, 1)
        fgs.AddGrowableRow(2, 3)
        fgs.AddGrowableCol(0, 1)
        fgs.AddGrowableCol(1, 2)
        hbox.Add(fgs, proportion=1, flag=wx.ALL | wx.EXPAND, border=15)
        panel.SetSizer(hbox)


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

读取用户ID：tony

Check boxes and radio buttons

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='复选框和单选按钮', size=(400, 130))
        self.Centre()  # center the window on screen
        panel = wx.Panel(self)

        hbox1 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择你喜欢的编程语言：')
        cb1 = wx.CheckBox(panel, 1, 'Python')
        cb2 = wx.CheckBox(panel, 2, 'Java')
        cb2.SetValue(True)
        cb3 = wx.CheckBox(panel, 3, 'C++')
        self.Bind(wx.EVT_CHECKBOX, self.on_checkbox_click, id=1, id2=3)

        hbox1.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox1.Add(cb1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox1.Add(cb2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox1.Add(cb3, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        hbox2 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择性别：')
        radio1 = wx.RadioButton(panel, 4, '男', style=wx.RB_GROUP)
        radio2 = wx.RadioButton(panel, 5, '女')
        self.Bind(wx.EVT_RADIOBUTTON, self.on_radio1_click, id=4, id2=5)

        hbox2.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox2.Add(radio1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox2.Add(radio2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        hbox3 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择你最喜欢吃的水果：')
        radio3 = wx.RadioButton(panel, 6, '苹果', style=wx.RB_GROUP)
        radio4 = wx.RadioButton(panel, 7, '橘子')
        radio5 = wx.RadioButton(panel, 8, '香蕉')
        self.Bind(wx.EVT_RADIOBUTTON, self.on_radio2_click, id=6, id2=8)

        hbox3.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox3.Add(radio3, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox3.Add(radio4, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox3.Add(radio5, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(hbox1, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox3, 1, flag=wx.ALL | wx.EXPAND, border=5)
        panel.SetSizer(vbox)

    def on_checkbox_click(self, event):
        cb = event.GetEventObject()
        print('选择 {0}，状态{1}'.format(cb.GetLabel(), event.IsChecked()))

    def on_radio1_click(self, event):
        rb = event.GetEventObject()
        print('第一组 {0} 被选中'.format(rb.GetLabel()))

    def on_radio2_click(self, event):
        rb = event.GetEventObject()
        print('第二组 {0} 被选中'.format(rb.GetLabel()))

class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

第二组 橘子 被选中
第二组 香蕉 被选中
第一组 女 被选中
选择 C++，状态True
选择 Python，状态True

Drop-down lists

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='下拉列表', size=(400, 130))
        self.Centre()  # center the window on screen
        panel = wx.Panel(self)

        hbox1 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择你喜欢的编程语言：')

        list1 = ['Python', 'C++', 'Java']
        ch1 = wx.ComboBox(panel, -1, value='C', choices=list1, style=wx.CB_SORT)
        self.Bind(wx.EVT_COMBOBOX, self.on_combobox, ch1)

        hbox1.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox1.Add(ch1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        hbox2 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择性别：')
        list2 = ['男', '女']
        ch2 = wx.Choice(panel, -1, choices=list2)
        self.Bind(wx.EVT_CHOICE, self.on_choice, ch2)

        hbox2.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox2.Add(ch2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(hbox1, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        panel.SetSizer(vbox)

    def on_combobox(self, event):
        print('选择 {0}'.format(event.GetString()))

    def on_choice(self, event):
        print('选择 {0}'.format(event.GetString()))


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

选择 Java

List boxes

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='列表', size=(350, 180))
        self.Centre()  # center the window on screen
        panel = wx.Panel(self)

        hbox1 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择你喜欢的编程语言：')

        list1 = ['Python', 'C++', 'Java']
        lb1 = wx.ListBox(panel, -1, choices=list1, style=wx.LB_SINGLE)
        self.Bind(wx.EVT_LISTBOX, self.on_listbox1, lb1)

        hbox1.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox1.Add(lb1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        hbox2 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择你喜欢吃的水果：')
        list2 = ['苹果', '橘子', '香蕉']
        lb2 = wx.ListBox(panel, -1, choices=list2, style=wx.LB_EXTENDED)
        self.Bind(wx.EVT_LISTBOX, self.on_listbox2, lb2)

        hbox2.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox2.Add(lb2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(hbox1, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        panel.SetSizer(vbox)

    def on_listbox1(self, event):
        listbox = event.GetEventObject()
        print('选择 {0}'.format(listbox.GetSelection()))

    def on_listbox2(self, event):
        listbox = event.GetEventObject()
        print('选择 {0}'.format(listbox.GetSelections()))


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

选择 1
选择 2

Static bitmap control

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='静态图片控件', size=(300, 300))
        self.bmps = [wx.Bitmap('images/bird5.gif', wx.BITMAP_TYPE_GIF),
                     wx.Bitmap('images/bird4.gif', wx.BITMAP_TYPE_GIF),
                     wx.Bitmap('images/bird3.gif', wx.BITMAP_TYPE_GIF)]

        self.Centre()  # center the window on screen
        self.panel = wx.Panel(parent=self)
        # Create a vertical BoxSizer layout manager
        vbox = wx.BoxSizer(wx.VERTICAL)

        b1 = wx.Button(parent=self.panel, id=1, label='Button1')
        b2 = wx.Button(self.panel, id=2, label='Button2')
        self.Bind(wx.EVT_BUTTON, self.on_click, id=1, id2=2)

        self.image = wx.StaticBitmap(self.panel, -1, self.bmps[0])

        # Add the controls to the BoxSizer layout manager
        # (wx.EXPAND already fills the sizer width, so no alignment flag is combined with it)
        vbox.Add(b1, proportion=1, flag=wx.EXPAND)
        vbox.Add(b2, proportion=1, flag=wx.EXPAND)
        vbox.Add(self.image, proportion=3, flag=wx.CENTER)

        self.panel.SetSizer(vbox)

    def on_click(self, event):
        event_id = event.GetId()
        if event_id == 1:
            self.image.SetBitmap(self.bmps[1])
        else:
            self.image.SetBitmap(self.bmps[2])
        self.panel.Layout()


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

Advanced windows

Splitter window

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='分隔窗口', size=(350, 180))
        self.Centre()  # center the window on screen

        splitter = wx.SplitterWindow(self, -1)
        leftpanel = wx.Panel(splitter)
        rightpanel = wx.Panel(splitter)
        splitter.SplitVertically(leftpanel, rightpanel, 100)
        splitter.SetMinimumPaneSize(80)

        list2 = ['苹果', '橘子', '香蕉']
        lb2 = wx.ListBox(leftpanel, -1, choices=list2, style=wx.LB_SINGLE)
        self.Bind(wx.EVT_LISTBOX, self.on_listbox, lb2)

        vbox1 = wx.BoxSizer(wx.VERTICAL)
        vbox1.Add(lb2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        leftpanel.SetSizer(vbox1)

        vbox2 = wx.BoxSizer(wx.VERTICAL)
        self.content = wx.StaticText(rightpanel, label='右侧面板')
        vbox2.Add(self.content, 1, flag=wx.ALL | wx.EXPAND, border=5)
        rightpanel.SetSizer(vbox2)

    def on_listbox(self, event):
        s = '选择 {0}'.format(event.GetString())
        self.content.SetLabel(s)


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

Using a tree control

import wx


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='树控件', size=(500, 400))
        self.Centre()  # center the window on screen

        splitter = wx.SplitterWindow(self)
        leftpanel = wx.Panel(splitter)
        rightpanel = wx.Panel(splitter)
        splitter.SplitVertically(leftpanel, rightpanel, 200)
        splitter.SetMinimumPaneSize(80)

        self.tree = self.CreateTreeCtrl(leftpanel)
        self.Bind(wx.EVT_TREE_SEL_CHANGING, self.on_click, self.tree)
        vbox1 = wx.BoxSizer(wx.VERTICAL)
        vbox1.Add(self.tree, 1, flag=wx.ALL | wx.EXPAND, border=5)
        leftpanel.SetSizer(vbox1)

        vbox2 = wx.BoxSizer(wx.VERTICAL)
        self.content = wx.StaticText(rightpanel, label='右侧面板')
        vbox2.Add(self.content, 1, flag=wx.ALL | wx.EXPAND, border=5)
        rightpanel.SetSizer(vbox2)

    def on_click(self, event):
        item = event.GetItem()
        self.content.SetLabel(self.tree.GetItemText(item))

    def CreateTreeCtrl(self, parent):
        tree = wx.TreeCtrl(parent)

        items = []

        imglist = wx.ImageList(16, 16, True, 2)
        imglist.Add(wx.ArtProvider.GetBitmap(wx.ART_FOLDER, size=wx.Size(16, 16)))
        imglist.Add(wx.ArtProvider.GetBitmap(wx.ART_NORMAL_FILE, size=wx.Size(16, 16)))
        tree.AssignImageList(imglist)

        root = tree.AddRoot("TreeRoot", image=0)

        items.append(tree.AppendItem(root, "Item 1", 0))
        items.append(tree.AppendItem(root, "Item 2", 0))
        items.append(tree.AppendItem(root, "Item 3", 0))
        items.append(tree.AppendItem(root, "Item 4", 0))
        items.append(tree.AppendItem(root, "Item 5", 0))

        for item in items:
            # Give every first-level item five child nodes (image index 1 = file icon)
            for n in range(1, 6):
                tree.AppendItem(item, "Subitem {0}".format(n), 1)

        tree.Expand(root)  # expand the children of the root
        tree.Expand(items[0])  # expand the children of Item 1
        tree.Expand(items[3])  # expand the children of Item 4
        tree.SelectItem(root)  # select the root node

        return tree


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

Using a grid

import wx
import wx.grid

data = [['0036', '高等数学', '李放', '人民邮电出版社', '20000812', '1'],
        ['0004', 'FLASH精选', '刘扬', '中国纺织出版社', '19990312', '2'],
        ['0026', '软件工程', '牛田', '经济科学出版社', '20000328', '4'],
        ['0015', '人工智能', '周未', '机械工业出版社', '19991223', '3'],
        ['0037', '南方周末', '邓光明', '南方出版社', '20000923', '3'],
        ['0008', '新概念3', '余智', '外语出版社', '19990723', '2'],
        ['0019', '通讯与网络', '欧阳杰', '机械工业出版社', '20000517', '1'],
        ['0014', '期货分析', '孙宝', '飞鸟出版社', '19991122', '3'],
        ['0023', '经济概论', '思佳', '北京大学出版社', '20000819', '3'],
        ['0017', '计算机理论基础', '戴家', '机械工业出版社', '20000218', '4'],
        ['0002', '汇编语言', '李利光', '北京大学出版社', '19980318', '2'],
        ['0033', '模拟电路', '邓英才', '电子工业出版社', '20000527', '2'],
        ['0011', '南方旅游', '王爱国', '南方出版社', '19990930', '2'],
        ['0039', '黑幕', '李仪', '华光出版社', '20000508', '14'],
        ['0001', '软件工程', '戴国强', '机械工业出版社', '19980528', '2'],
        ['0034', '集邮爱好者', '李云', '人民邮电出版社', '20000630', '1'],
        ['0031', '软件工程', '戴志名', '电子工业出版社', '20000324', '3'],
        ['0030', '数据库及应用', '孙家萧', '清华大学出版社', '20000619', '1'],
        ['0024', '经济与科学', '毛波', '经济科学出版社', '20000923', '2'],
        ['0009', '军事要闻', '张强', '解放军出版社', '19990722', '3'],
        ['0003', '计算机基础', '王飞', '经济科学出版社', '19980218', '1'],
        ['0020', '现代操作系统', '王小国', '机械工业出版社', '20010128', '1'],
        ['0025', '计算机体系结构', '方丹', '机械工业出版社', '20000328', '4'],
        ['0010', '大众生活', '许阳', '电子出版社', '19990819', '3'],
        ['0021', '网络基础', '王大尉', '北京大学出版社', '20000617', '1'],
        ['0006', '世界杯', '柳飞', '世界出版社', '19990412', '2'],
        ['0028', '高级语言程序设计', '寇国华', '清华大学出版社', '20000117', '3'],
        ['0038', '十大旅游胜地', '潭晓明', '南方出版社', '20000403', '2'],
        ['0018', '编译原理', '郑键', '机械工业出版社', '20000415', '2'],
        ['0007', 'JAVA程序设计', '张余', '人民邮电出版社', '19990613', '1'],
        ['0013', '幽灵', '钱力华', '华光出版社', '19991008', '1'],
        ['0022', '万紫千红', '丛丽', '北京大学出版社', '20000702', '3'],
        ['0027', '世界语言大观', '候丙辉', '经济科学出版社', '20000814', '2'],
        ['0029', '操作系统概论', '聂元名', '清华大学出版社', '20001028', '1'],
        ['0016', '数据库系统概念', '吴红', '机械工业出版社', '20000328', '3'],
        ['0005', 'java基础', '王一', '电子工业出版社', '19990528', '3'],
        ['0032', 'SQL使用手册', '贺民', '电子工业出版社', '19990425', '2']]

column_names = ['书籍编号', '书籍名称', '作者', '出版社', '出版日期', '库存数量']


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='网格控件', size=(550, 500))
        self.Centre()  # center the window on screen
        self.grid = self.CreateGrid(self)
        self.Bind(wx.grid.EVT_GRID_LABEL_LEFT_CLICK, self.OnLabelLeftClick)

    def OnLabelLeftClick(self, event):
        print("RowIdx：{0}".format(event.GetRow()))
        print("ColIdx：{0}".format(event.GetCol()))
        if event.GetRow() >= 0:  # GetRow() is -1 when a column label is clicked
            print(data[event.GetRow()])
        event.Skip()

    def CreateGrid(self, parent):
        grid = wx.grid.Grid(parent)
        grid.CreateGrid(len(data), len(data[0]))

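        # Column labels only need to be set once, not once per row
        for col, name in enumerate(column_names):
            grid.SetColLabelValue(col, name)
        for row in range(len(data)):
            for col in range(len(data[row])):
                grid.SetCellValue(row, col, data[row][col])
        # Auto-size rows and columns to fit their contents
        grid.AutoSize()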

        return grid


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

import wx
import wx.grid

data = [['0036', '高等数学', '李放', '人民邮电出版社', '20000812', '1'],
        ['0004', 'FLASH精选', '刘扬', '中国纺织出版社', '19990312', '2'],
        ['0026', '软件工程', '牛田', '经济科学出版社', '20000328', '4'],
        ['0015', '人工智能', '周未', '机械工业出版社', '19991223', '3'],
        ['0037', '南方周末', '邓光明', '南方出版社', '20000923', '3'],
        ['0008', '新概念3', '余智', '外语出版社', '19990723', '2'],
        ['0019', '通讯与网络', '欧阳杰', '机械工业出版社', '20000517', '1'],
        ['0014', '期货分析', '孙宝', '飞鸟出版社', '19991122', '3'],
        ['0023', '经济概论', '思佳', '北京大学出版社', '20000819', '3'],
        ['0017', '计算机理论基础', '戴家', '机械工业出版社', '20000218', '4'],
        ['0002', '汇编语言', '李利光', '北京大学出版社', '19980318', '2'],
        ['0033', '模拟电路', '邓英才', '电子工业出版社', '20000527', '2'],
        ['0011', '南方旅游', '王爱国', '南方出版社', '19990930', '2'],
        ['0039', '黑幕', '李仪', '华光出版社', '20000508', '14'],
        ['0001', '软件工程', '戴国强', '机械工业出版社', '19980528', '2'],
        ['0034', '集邮爱好者', '李云', '人民邮电出版社', '20000630', '1'],
        ['0031', '软件工程', '戴志名', '电子工业出版社', '20000324', '3'],
        ['0030', '数据库及应用', '孙家萧', '清华大学出版社', '20000619', '1'],
        ['0024', '经济与科学', '毛波', '经济科学出版社', '20000923', '2'],
        ['0009', '军事要闻', '张强', '解放军出版社', '19990722', '3'],
        ['0003', '计算机基础', '王飞', '经济科学出版社', '19980218', '1'],
        ['0020', '现代操作系统', '王小国', '机械工业出版社', '20010128', '1'],
        ['0025', '计算机体系结构', '方丹', '机械工业出版社', '20000328', '4'],
        ['0010', '大众生活', '许阳', '电子出版社', '19990819', '3'],
        ['0021', '网络基础', '王大尉', '北京大学出版社', '20000617', '1'],
        ['0006', '世界杯', '柳飞', '世界出版社', '19990412', '2'],
        ['0028', '高级语言程序设计', '寇国华', '清华大学出版社', '20000117', '3'],
        ['0038', '十大旅游胜地', '潭晓明', '南方出版社', '20000403', '2'],
        ['0018', '编译原理', '郑键', '机械工业出版社', '20000415', '2'],
        ['0007', 'JAVA程序设计', '张余', '人民邮电出版社', '19990613', '1'],
        ['0013', '幽灵', '钱力华', '华光出版社', '19991008', '1'],
        ['0022', '万紫千红', '丛丽', '北京大学出版社', '20000702', '3'],
        ['0027', '世界语言大观', '候丙辉', '经济科学出版社', '20000814', '2'],
        ['0029', '操作系统概论', '聂元名', '清华大学出版社', '20001028', '1'],
        ['0016', '数据库系统概念', '吴红', '机械工业出版社', '20000328', '3'],
        ['0005', 'java基础', '王一', '电子工业出版社', '19990528', '3'],
        ['0032', 'SQL使用手册', '贺民', '电子工业出版社', '19990425', '2']]

column_names = ['书籍编号', '书籍名称', '作者', '出版社', '出版日期', '库存数量']


class MyGridTable(wx.grid.GridTableBase):
    def __init__(self):
        super().__init__()
        self.colLabels = column_names

    def GetNumberRows(self):
        return len(data)

    def GetNumberCols(self):
        return len(data[0])

    def GetValue(self, row, col):
        return data[row][col]

    def GetColLabelValue(self, col):
        return self.colLabels[col]


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='网格控件', size=(550, 500))
        self.Centre()  # center the window on screen
        self.grid = self.CreateGrid(self)
        self.Bind(wx.grid.EVT_GRID_LABEL_LEFT_CLICK, self.OnLabelLeftClick)

    def OnLabelLeftClick(self, event):
        print("RowIdx：{0}".format(event.GetRow()))
        print("ColIdx：{0}".format(event.GetCol()))
        if event.GetRow() >= 0:  # GetRow() is -1 when a column label is clicked
            print(data[event.GetRow()])
        event.Skip()

    def CreateGrid(self, parent):
        grid = wx.grid.Grid(parent)
        tablebase = MyGridTable()
        grid.SetTable(tablebase, True)
        # Auto-size rows and columns to fit their contents
        grid.AutoSize()

        return grid


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

Using menus

import wx
import wx.grid


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='使用菜单', size=(550, 500))
        self.Centre()  # center the window on screen

        # wx.EXPAND is a sizer flag, not a window style, so only TE_MULTILINE is passed
        self.text = wx.TextCtrl(self, -1, style=wx.TE_MULTILINE)
        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(self.text, proportion=1, flag=wx.EXPAND | wx.ALL, border=1)
        self.SetSizer(vbox)

        menubar = wx.MenuBar()

        file_menu = wx.Menu()
        new_item = wx.MenuItem(file_menu, wx.ID_NEW, text="新建", kind=wx.ITEM_NORMAL)
        self.Bind(wx.EVT_MENU, self.on_newitem_click, id=wx.ID_NEW)
        file_menu.Append(new_item)
        file_menu.AppendSeparator()

        edit_menu = wx.Menu()
        copy_item = wx.MenuItem(edit_menu, 100, text="复制", kind=wx.ITEM_NORMAL)
        edit_menu.Append(copy_item)

        cut_item = wx.MenuItem(edit_menu, 101, text="剪切", kind=wx.ITEM_NORMAL)
        edit_menu.Append(cut_item)

        paste_item = wx.MenuItem(edit_menu, 102, text="粘贴", kind=wx.ITEM_NORMAL)
        edit_menu.Append(paste_item)

        self.Bind(wx.EVT_MENU, self.on_editmenu_click, id=100, id2=102)

        # AppendSubMenu replaces the deprecated Append(id, text, submenu) form
        file_menu.AppendSubMenu(edit_menu, "编辑")

        menubar.Append(file_menu, '文件')
        self.SetMenuBar(menubar)

    def on_newitem_click(self, event):
        # SetValue is the TextCtrl method for replacing the displayed text
        self.text.SetValue('单击【新建】菜单')

    def on_editmenu_click(self, event):
        event_id = event.GetId()
        if event_id == 100:
            self.text.SetValue('单击【复制】菜单')
        elif event_id == 101:
            self.text.SetValue('单击【剪切】菜单')
        else:
            self.text.SetValue('单击【粘贴】菜单')


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop

Note: attaching a submenu with file_menu.Append(wx.ID_ANY, "编辑", edit_menu) raises "DeprecationWarning: Menu.Append() is deprecated" in wxPython 4; file_menu.AppendSubMenu(edit_menu, "编辑") is the non-deprecated form.

Using a toolbar

import wx
import wx.grid


# Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='使用工具栏', size=(550, 500))
        self.Centre()  # center the window on screen
        self.Show(True)

        # wx.EXPAND is a sizer flag, not a window style, so only TE_MULTILINE is passed
        self.text = wx.TextCtrl(self, -1, style=wx.TE_MULTILINE)
        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(self.text, proportion=1, flag=wx.EXPAND | wx.ALL, border=1)
        self.SetSizer(vbox)

        menubar = wx.MenuBar()

        file_menu = wx.Menu()
        new_item = wx.MenuItem(file_menu, wx.ID_NEW, text="新建", kind=wx.ITEM_NORMAL)
        file_menu.Append(new_item)
        file_menu.AppendSeparator()

        edit_menu = wx.Menu()
        copy_item = wx.MenuItem(edit_menu, 100, text="复制", kind=wx.ITEM_NORMAL)
        edit_menu.Append(copy_item)

        cut_item = wx.MenuItem(edit_menu, 101, text="剪切", kind=wx.ITEM_NORMAL)
        edit_menu.Append(cut_item)

        paste_item = wx.MenuItem(edit_menu, 102, text="粘贴", kind=wx.ITEM_NORMAL)
        edit_menu.Append(paste_item)

        # AppendSubMenu replaces the deprecated Append(id, text, submenu) form
        file_menu.AppendSubMenu(edit_menu, "编辑")

        menubar.Append(file_menu, '文件')
        self.SetMenuBar(menubar)

        tb = wx.ToolBar(self, wx.ID_ANY)
        self.ToolBar = tb
        tsize = (24, 24)
        new_bmp = wx.ArtProvider.GetBitmap(wx.ART_NEW, wx.ART_TOOLBAR, tsize)
        open_bmp = wx.ArtProvider.GetBitmap(wx.ART_FILE_OPEN, wx.ART_TOOLBAR, tsize)
        copy_bmp = wx.ArtProvider.GetBitmap(wx.ART_COPY, wx.ART_TOOLBAR, tsize)
        paste_bmp = wx.ArtProvider.GetBitmap(wx.ART_PASTE, wx.ART_TOOLBAR, tsize)

        tb.AddTool(10, "New", new_bmp, kind=wx.ITEM_NORMAL, shortHelp="New")
        tb.AddTool(20, "Open", open_bmp, kind=wx.ITEM_NORMAL, shortHelp="Open")
        tb.AddSeparator()
        tb.AddTool(30, "Copy", copy_bmp, kind=wx.ITEM_NORMAL, shortHelp="Copy")
        tb.AddTool(40, "Paste", paste_bmp, kind=wx.ITEM_NORMAL, shortHelp="Paste")
        tb.AddSeparator()

        tb.AddTool(201, "back", wx.Bitmap("menu_icon/back.png"), kind=wx.ITEM_NORMAL, shortHelp="Back")
        tb.AddTool(202, "forward", wx.Bitmap("menu_icon/forward.png"), kind=wx.ITEM_NORMAL, shortHelp="Forward")
        self.Bind(wx.EVT_MENU, self.on_click, id=201, id2=202)
        tb.AddSeparator()

        tb.Realize()

    def on_click(self, event):
        event_id = event.GetId()
        # SetValue is the TextCtrl method for replacing the displayed text
        if event_id == 201:
            self.text.SetValue('单击【Back】按钮')
        else:
            self.text.SetValue('单击【Forward】按钮')


class App(wx.App):

    def OnInit(self):
        # Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  # enter the main event loop


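Project 1: Web crawlers and scraping stock data

Overview of web crawler technologies

Network communication

Multithreading

Data exchange

Web front-end technologies

Data storage

Scraping data

Static and dynamic data in web pages

Scraping data with urllib

Fetching static data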

import urllib.request


url = "file:///C:/Users/HP/nasdaq-Apple1.html"
req = urllib.request.Request(url)

with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode()
    print(htmlstr)

<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="Generator" content="EditPlus®">
    <meta name="Author" content="">
    <meta name="Keywords" content="">
    <meta name="Description" content="">
    <title>Document</title>
</head>
<body>
<div id="quotes_content_left_pnlAJAX">
    <table class="historical-data__table">
        <thead class="historical-data__table-headings">
        <tr class="historical-data__row historical-data__row--headings">
            <th class="historical-data__table-heading" scope="col">Date</th>
            <th class="historical-data__table-heading" scope="col">Open</th>
            <th class="historical-data__table-heading" scope="col">High</th>
            <th class="historical-data__table-heading" scope="col">Low</th>
            <th class="historical-data__table-heading" scope="col">Close/Last</th>
            <th class="historical-data__table-heading" scope="col">Volume</th>
        </tr>
        </thead>
        <tbody class="historical-data__table-body">
        <tr class="historical-data__row">
            <th>10/04/2019</th>
            <td>225.64</td>
            <td>227.49</td>
            <td>223.89</td>
            <td>227.01</td>
            <td>34,755,550</td>
        </tr>
        <tr class="historical-data__row">
            <th>10/03/2019</th>
            <td>218.43</td>
            <td>220.96</td>
            <td>215.132</td>
            <td>220.82</td>
            <td>30,352,690</td>
        </tr>
        <tr class="historical-data__row">
            <th>10/02/2019</th>
            <td>223.06</td>
            <td>223.58</td>
            <td>217.93</td>
            <td>218.96</td>
            <td>35,767,260</td>
        </tr>
        <tr class="historical-data__row">
            <th>10/01/2019</th>
            <td>225.07</td>
            <td>228.22</td>
            <td>224.2</td>
            <td>224.59</td>
            <td>36,187,160</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/30/2019</th>
            <td>220.9</td>
            <td>224.58</td>
            <td>220.79</td>
            <td>223.97</td>
            <td>26,318,580</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/27/2019</th>
            <td>220.54</td>
            <td>220.96</td>
            <td>217.2814</td>
            <td>218.82</td>
            <td>25,361,290</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/26/2019</th>
            <td>220</td>
            <td>220.94</td>
            <td>218.83</td>
            <td>219.89</td>
            <td>19,088,310</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/25/2019</th>
            <td>218.55</td>
            <td>221.5</td>
            <td>217.1402</td>
            <td>221.03</td>
            <td>22,481,010</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/24/2019</th>
            <td>221.03</td>
            <td>222.49</td>
            <td>217.19</td>
            <td>217.68</td>
            <td>31,434,370</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/23/2019</th>
            <td>218.95</td>
            <td>219.84</td>
            <td>217.65</td>
            <td>218.72</td>
            <td>19,419,650</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/20/2019</th>
            <td>221.38</td>
            <td>222.56</td>
            <td>217.473</td>
            <td>217.73</td>
            <td>57,977,090</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/19/2019</th>
            <td>222.01</td>
            <td>223.76</td>
            <td>220.37</td>
            <td>220.96</td>
            <td>22,187,880</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/18/2019</th>
            <td>221.06</td>
            <td>222.85</td>
            <td>219.44</td>
            <td>222.77</td>
            <td>25,643,090</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/17/2019</th>
            <td>219.96</td>
            <td>220.82</td>
            <td>219.12</td>
            <td>220.7</td>
            <td>18,386,470</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/16/2019</th>
            <td>217.73</td>
            <td>220.13</td>
            <td>217.56</td>
            <td>219.9</td>
            <td>21,158,140</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/13/2019</th>
            <td>220</td>
            <td>220.79</td>
            <td>217.02</td>
            <td>218.75</td>
            <td>39,763,300</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/12/2019</th>
            <td>224.8</td>
            <td>226.42</td>
            <td>222.86</td>
            <td>223.085</td>
            <td>32,226,670</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/11/2019</th>
            <td>218.07</td>
            <td>223.71</td>
            <td>217.73</td>
            <td>223.59</td>
            <td>44,289,650</td>
        </tr>
        </tbody>
    </table>
</div>
</body>
</html>
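
The table in the page above can be turned into Python records with the standard library alone. The following is a minimal sketch, not the method used in these notes: the class name HistoricalTableParser, the function parse_history and the shortened sample string are hypothetical names introduced for illustration, and the two sample rows are copied from the output above. It assumes every body row has exactly six cells, as in that page.

```python
from html.parser import HTMLParser


class HistoricalTableParser(HTMLParser):
    """Collect the text of every <th>/<td> cell, one list per <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows
        self._row = None        # cells of the row being read, or None outside <tr>
        self._in_cell = False   # True while inside a <th> or <td>

    def handle_starttag(self, tag, attrs):
        if tag == 'tr':
            self._row = []
        elif tag in ('th', 'td') and self._row is not None:
            self._in_cell = True
            self._row.append('')

    def handle_endtag(self, tag):
        if tag in ('th', 'td'):
            self._in_cell = False
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data.strip()


def parse_history(htmlstr):
    """Return (header, records); price and volume columns are converted to numbers."""
    parser = HistoricalTableParser()
    parser.feed(htmlstr)
    header, *body = parser.rows
    records = []
    for date, open_, high, low, close, volume in body:
        records.append({
            'Date': date,
            'Open': float(open_),
            'High': float(high),
            'Low': float(low),
            'Close': float(close),
            'Volume': int(volume.replace(',', '')),  # '34,755,550' -> 34755550
        })
    return header, records


# Shortened sample: the heading row and the first two data rows of the page above.
sample = """
<table>
<thead><tr><th>Date</th><th>Open</th><th>High</th><th>Low</th><th>Close/Last</th><th>Volume</th></tr></thead>
<tbody>
<tr><th>10/04/2019</th><td>225.64</td><td>227.49</td><td>223.89</td><td>227.01</td><td>34,755,550</td></tr>
<tr><th>10/03/2019</th><td>218.43</td><td>220.96</td><td>215.132</td><td>220.82</td><td>30,352,690</td></tr>
</tbody>
</table>
"""

header, records = parse_history(sample)
print(header)
print(records[0])
```

In the urllib example, the htmlstr read from the response could be passed to parse_history in place of sample.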

Fetching dynamic data

import re
import urllib.request

url = 'http://q.stock.sohu.com/hisHq?code=cn_600519&stat=1&order=D&period=d&callback=historySearchHandler&rt=jsonp&0.8115656498417958'
req = urllib.request.Request(url)

with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode('gbk')
    print(htmlstr)
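    # Strip only the JSONP wrapper (callback name and outermost parentheses),
    # so a ')' inside the data itself would be left untouched
    htmlstr = re.sub(r'^historySearchHandler\(|\)\s*$', '', htmlstr.strip())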
    print('替换后的：', htmlstr)

historySearchHandler([{"status":0,"hq":[["2023-04-18","1753.00","1758.00","5.00","0.29%","1746.02","1769.00","18314","322010.75","0.15%"],["2023-04-17","1740.00","1753.00","39.58","2.31%","1728.00","1753.00","30467","530340.12","0.24%"],["2023-04-14","1726.00","1713.42","-9.58","-0.56%","1704.80","1733.00","21232","364652.69","0.17%"],["2023-04-13","1690.00","1723.00","28.90","1.71%","1684.01","1723.59","29543","504931.03","0.24%"],["2023-04-12","1747.26","1694.10","-51.40","-2.94%","1692.82","1750.00","51105","873265.75","0.41%"],["2023-04-11","1793.00","1745.50","-26.20","-1.48%","1744.00","1793.00","29209","513885.44","0.23%"],["2023-04-10","1790.88","1771.70","-19.29","-1.08%","1744.00","1790.88","29418","517115.03","0.23%"],["2023-04-07","1795.00","1790.99","-5.97","-0.33%","1788.34","1806.01","13525","242816.05","0.11%"],["2023-04-06","1805.00","1796.96","-17.63","-0.97%","1788.22","1815.90","14874","267625.19","0.12%"],["2023-04-04","1812.00","1814.59","12.52","0.69%","1787.00","1815.17","20066","361427.53","0.16%"],["2023-04-03","1825.00","1802.07","-17.93","-0.99%","1800.08","1827.77","21417","387581.16","0.17%"],["2023-03-31","1825.00","1820.00","20.00","1.11%","1819.00","1848.00","27446","502479.06","0.22%"],["2023-03-30","1793.00","1800.00","10.00","0.56%","1779.00","1805.00","19257","345357.31","0.15%"],["2023-03-29","1799.00","1790.00","8.20","0.46%","1785.07","1800.00","15393","276190.94","0.12%"],["2023-03-28","1770.00","1781.80","14.01","0.79%","1765.02","1790.00","17261","307311.31","0.14%"],["2023-03-27","1778.60","1767.79","-10.83","-0.61%","1756.00","1778.60","15296","270075.59","0.12%"],["2023-03-24","1769.08","1778.62","3.76","0.21%","1766.00","1783.60","12770","226964.92","0.10%"],["2023-03-23","1766.00","1774.86","1.51","0.09%","1765.01","1791.11","17356","308282.16","0.14%"],["2023-03-22","1780.00","1773.35","-1.65","-0.09%","1765.55","1793.00","15330","272764.88","0.12%"],["2023-03-21","1735.00","1775.00","45.40","2.62%","1723.97","1785.85
","31142","549105.19","0.25%"],["2023-03-20","1751.00","1729.60","-12.40","-0.71%","1728.00","1755.00","20491","355787.22","0.16%"],["2023-03-17","1770.00","1742.00","-9.99","-0.57%","1736.00","1775.89","27023","474424.94","0.22%"],["2023-03-16","1740.00","1751.99","1.07","0.06%","1739.01","1770.00","17646","309679.09","0.14%"],["2023-03-15","1778.37","1750.92","-15.08","-0.85%","1750.12","1784.88","19213","339269.84","0.15%"],["2023-03-14","1763.78","1766.00","4.00","0.23%","1738.50","1779.88","23705","417728.91","0.19%"],["2023-03-13","1751.00","1762.00","12.00","0.69%","1749.00","1775.00","20560","362647.62","0.16%"],["2023-03-10","1751.57","1750.00","-20.02","-1.13%","1750.00","1781.00","21161","372513.91","0.17%"],["2023-03-09","1768.00","1770.02","-0.40","-0.02%","1740.00","1785.00","27612","488144.28","0.22%"],["2023-03-08","1780.02","1770.42","-17.88","-1.00%","1761.12","1785.94","22764","403578.72","0.18%"],["2023-03-07","1805.98","1788.30","-18.84","-1.04%","1788.00","1816.60","22785","410130.25","0.18%"],["2023-03-06","1818.18","1807.14","-10.90","-0.60%","1796.77","1818.50","20646","373007.94","0.16%"],["2023-03-03","1839.77","1818.04","-9.96","-0.54%","1802.48","1841.61","16198","294684.25","0.13%"],["2023-03-02","1829.00","1828.00","-10.53","-0.57%","1821.10","1838.99","13144","240529.23","0.10%"],["2023-03-01","1813.00","1838.53","24.79","1.37%","1803.23","1848.00","24458","447559.22","0.19%"],["2023-02-28","1819.00","1813.74","3.33","0.18%","1783.30","1822.01","23952","431487.69","0.19%"],["2023-02-27","1778.50","1810.41","22.41","1.25%","1775.02","1815.00","22065","397812.88","0.18%"],["2023-02-24","1810.11","1788.00","-30.00","-1.65%","1782.18","1810.19","24635","441562.16","0.20%"],["2023-02-23","1840.00","1818.00","-18.00","-0.98%","1805.25","1848.80","21881","398399.12","0.17%"],["2023-02-22","1855.01","1836.00","-31.00","-1.66%","1831.80","1863.90","21869","403101.59","0.17%"],["2023-02-21","1874.00","1867.00","-8.00","-0.43%","1851.00","1874.0
0","18751","349163.34","0.15%"],["2023-02-20","1821.00","1875.00","54.22","2.98%","1817.20","1878.80","29669","548880.00","0.24%"],["2023-02-17","1850.16","1820.78","-41.04","-2.20%","1820.05","1873.00","26443","488032.88","0.21%"],["2023-02-16","1841.34","1861.82","20.82","1.13%","1828.00","1887.00","33246","619691.50","0.26%"],["2023-02-15","1843.78","1841.00","-2.79","-0.15%","1835.81","1855.30","18177","335142.22","0.14%"],["2023-02-14","1856.46","1843.79","-12.56","-0.68%","1835.00","1857.40","19566","360176.94","0.16%"],["2023-02-13","1810.00","1856.35","46.35","2.56%","1810.00","1874.50","38147","705838.25","0.30%"],["2023-02-10","1810.10","1810.00","-8.00","-0.44%","1801.05","1818.49","17985","325385.94","0.14%"],["2023-02-09","1778.00","1818.00","34.00","1.91%","1775.01","1829.75","29754","540139.94","0.24%"],["2023-02-08","1800.01","1784.00","-13.00","-0.72%","1775.00","1805.97","16676","298057.47","0.13%"],["2023-02-07","1808.08","1797.00","2.00","0.11%","1787.73","1808.80","24322","437367.19","0.19%"],["2023-02-06","1780.00","1795.00","-23.00","-1.27%","1760.00","1795.00","42661","759573.94","0.34%"],["2023-02-03","1820.00","1818.00","-18.11","-0.99%","1795.68","1826.00","34945","632463.50","0.28%"],["2023-02-02","1848.38","1836.11","-8.86","-0.48%","1826.00","1859.00","29759","546550.94","0.24%"],["2023-02-01","1854.98","1844.97","-0.79","-0.04%","1811.40","1859.00","33974","624467.94","0.27%"],["2023-01-31","1896.50","1845.76","-42.24","-2.24%","1833.07","1899.95","32991","612831.12","0.26%"],["2023-01-30","1909.00","1888.00","27.99","1.50%","1880.00","1909.00","35923","679975.69","0.29%"],["2023-01-20","1889.19","1860.01","-20.20","-1.07%","1858.00","1898.25","25609","480735.59","0.20%"],["2023-01-19","1892.50","1880.21","-12.79","-0.68%","1866.00","1892.52","23439","440199.44","0.19%"],["2023-01-18","1914.00","1893.00","-15.00","-0.79%","1890.00","1925.30","21063","400866.53","0.17%"],["2023-01-17","1913.16","1908.00","-4.90","-0.26%","1895.00","1923
.00","21299","406832.16","0.17%"],["2023-01-16","1886.00","1912.90","25.90","1.37%","1881.00","1935.00","36848","705998.31","0.29%"],["2023-01-13","1844.18","1887.00","53.00","2.89%","1840.00","1888.00","31940","596987.62","0.25%"],["2023-01-12","1848.00","1834.00","-10.95","-0.59%","1833.00","1856.00","17193","316263.72","0.14%"],["2023-01-11","1856.00","1844.95","-9.50","-0.51%","1836.84","1860.00","22720","420148.78","0.18%"],["2023-01-10","1839.06","1854.45","13.25","0.72%","1830.50","1864.50","22732","420478.38","0.18%"],["2023-01-09","1835.00","1841.20","37.43","2.08%","1807.82","1849.98","30977","568418.12","0.25%"],["2023-01-06","1806.12","1803.77","2.77","0.15%","1787.00","1811.90","24904","448083.88","0.20%"],["2023-01-05","1737.00","1801.00","75.99","4.41%","1733.00","1801.00","47943","854158.69","0.38%"],["2023-01-04","1730.00","1725.01","-5.00","-0.29%","1716.00","1738.70","20416","352358.22","0.16%"],["2023-01-03","1731.20","1730.01","3.01","0.17%","1706.01","1738.43","26034","448776.03","0.21%"],["2022-12-30","1736.00","1727.00","8.00","0.47%","1727.00","1752.99","25333","440954.41","0.20%"],["2022-12-29","1717.00","1719.00","-14.00","-0.81%","1701.05","1726.99","22418","384449.97","0.18%"],["2022-12-28","1745.88","1733.00","0.00","0.00%","1708.01","1747.00","21438","369994.91","0.17%"],["2022-12-27","1738.00","1733.00","12.85","0.75%","1725.50","1747.15","17905","310927.03","0.14%"],["2022-12-26","1771.00","1742.06","-28.94","-1.63%","1735.02","1771.00","21384","374912.09","0.17%"],["2022-12-23","1752.40","1771.00","3.00","0.17%","1745.00","1782.00","17319","306360.84","0.14%"],["2022-12-22","1756.70","1768.00","29.00","1.67%","1745.00","1783.00","23175","409386.16","0.18%"],["2022-12-21","1724.00","1739.00","24.00","1.40%","1717.65","1739.00","22816","394892.62","0.18%"],["2022-12-20","1765.33","1715.00","-58.00","-3.27%","1682.45","1765.33","46198","794412.06","0.37%"],["2022-12-19","1798.80","1773.00","-13.87","-0.78%","1760.17","1798.80","24987",
"444723.66","0.20%"]],"code":"cn_600519","stat":["累计:","2022-12-19至2023-04-18","-28.87","-1.62%",1682.45,1935,1961308,35261288.98,"15.59%"]}])

After replacement: [{"status":0,"hq":[["2023-04-18","1753.00","1758.00","5.00","0.29%","1746.02","1769.00","18314","322010.75","0.15%"], ... (intermediate rows omitted) ... ,["2022-12-19","1798.80","1773.00","-13.87","-0.78%","1760.17","1798.80","24987","444723.66","0.20%"]],"code":"cn_600519","stat":["累计:","2022-12-19至2023-04-18","-28.87","-1.62%",1682.45,1935,1961308,35261288.98,"15.59%"]}]

Masquerading as a browser

import urllib.request

url = 'http://www.ctrip.com/'

req = urllib.request.Request(url)

# Send an iPhone Safari User-Agent so the server treats the request as coming from a mobile browser
req.add_header('User-Agent',
               'Mozilla/5.0 (iPhone; CPU iPhone OS 10_2_1 like Mac OS X) AppleWebKit/602.4.6 (KHTML, like Gecko) Version/10.0 Mobile/14D27 Safari/602.1')

with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode()
    # Treat the page as the mobile version if its HTML contains 'mobile'
    if htmlstr.find('mobile') != -1:
        print('Mobile version')

Mobile version
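
Headers can also be passed to the Request constructor as a dict, and the prepared request can be inspected before anything is sent. A minimal sketch, assuming a placeholder URL (example.com) and a shortened User-Agent string that are not from the notes above. Note that urllib stores header names capitalized, so the lookup key is 'User-agent'.

```python
import urllib.request

# Placeholder URL and shortened User-Agent, for illustration only
url = 'http://www.example.com/'
headers = {'User-Agent': 'Mozilla/5.0 (iPhone; CPU iPhone OS 10_2_1 like Mac OS X)'}

# Equivalent to calling req.add_header() once per dict entry
req = urllib.request.Request(url, headers=headers)

# Nothing is sent until urlopen(req) is called, so this runs offline
print(req.has_header('User-agent'))
print(req.get_header('User-agent'))
print(req.get_method())
```

Checking the request object this way is a cheap sanity test before pointing the crawler at a real site.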

Scraping data with Selenium

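from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get('http://q.stock.sohu.com/cn/600519/lshq.shtml')
# Selenium 4 style lookup: find_element(By.ID, ...) replaces find_element_by_id(...)
em = driver.find_element(By.ID, 'BIZ_hq_historySearch')
print(em.text)
# driver.close()
driver.quit()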

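The traceback below was produced by an earlier version of this cell that called driver.find_element_by_id('BIZ_hq_historySearch'); recent Selenium 4 releases no longer provide the find_element_by_* methods.

---------------------------------------------------------------------------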

AttributeError                            Traceback (most recent call last)

Input In [2], in <cell line: 6>()
      3 driver = webdriver.Chrome()
      5 driver.get('http://q.stock.sohu.com/cn/600519/lshq.shtml')
----> 6 em = driver.find_element_by_id('BIZ_hq_historySearch')
      7 print(em.text)
      8 ## driver.close()


AttributeError: 'WebDriver' object has no attribute 'find_element_by_id'

Parsing the data

Using regular expressions

import os
import re
import urllib.request

url = 'http://p.weather.com.cn/'


def findallimageurl(htmlstr):
    """Find all image URLs (.png or .jpg) in the HTML source."""

    # Regular expression: an http:// URL ending in .png or .jpg
    pattern = r'http://\S+(?:\.png|\.jpg)'
    return re.findall(pattern, htmlstr)


def getfilename(urlstr):
    """Extract the image file name from its URL."""

    pos = urlstr.rfind('/')
    return urlstr[pos + 1:]


# Fetch the page and collect the image URLs
url_list = []
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode()

    url_list = findallimageurl(htmlstr)

for imagesrc in url_list:
    # Download the image at this URL
    req = urllib.request.Request(imagesrc)
    with urllib.request.urlopen(req) as response:
        data = response.read()
        # Skip images smaller than 100 KB
        if len(data) < 1024 * 100:
            continue

        # Create the download folder if it does not exist yet
        if not os.path.exists('download'):
            os.mkdir('download')

        # Build the local file name
        filename = getfilename(imagesrc)
        filename = 'download/' + filename
        # Save the image to disk
        with open(filename, 'wb') as f:
            f.write(data)

    print('Downloaded image', filename)

Downloaded image download/20230412105733E6869CA2C51FC9659543B01BCAD594C0.jpg
Downloaded image download/2023041210583373DC4BF4E9ABC5CC8C084D45FB133E3A.jpg
Downloaded image download/20230412105932202830A62B6E006C698504271BA9D52C.jpg
Downloaded image download/20230406160425985ECFF0D26CB2A423DAECD29141F4EE.jpg
Downloaded image download/20220401091431D32C5DA957F3441693885B05E271420C.jpg
Downloaded image download/2023041812043228512B6723F81BA42BC286530A7AD859.jpg
Downloaded image download/20230416152716215BBBA7CCF443222A245DA84B742444.jpg
Downloaded image download/202304160947448C2B8A7CF30225471547902BD50AB088.jpg
Downloaded image download/20230316141537671B47C5E4F520E11EE0E489187E624F.png
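
To see what the pattern in findallimageurl does and does not match, it can be tried on a small string without touching the network. The HTML fragment and the second pattern below are made up for illustration; the variant that also accepts https is only one possible adjustment, not part of the original script.

```python
import re

# Made-up HTML fragment for illustration
sample_html = '''
<img src="http://example.com/a/one.jpg" alt="x">
<img src="https://example.com/b/two.png">
<a href="http://example.com/page.html">link</a>
'''

# Pattern from the script: http:// only, followed by a greedy run of non-space characters
pattern_http = r'http://\S+(?:\.png|\.jpg)'
# Hypothetical variant: also accepts https and stops at quotes or angle brackets
pattern_any = r'https?://[^\s"\'<>]+?\.(?:png|jpg)'

http_only = re.findall(pattern_http, sample_html)
any_scheme = re.findall(pattern_any, sample_html)
print(http_only)   # ['http://example.com/a/one.jpg']
print(any_scheme)  # ['http://example.com/a/one.jpg', 'https://example.com/b/two.png']
```

The original pattern silently skips images served over https, which may or may not matter for the target site.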

Using the BeautifulSoup library

import os
import urllib.request

from bs4 import BeautifulSoup

url = 'http://p.weather.com.cn/'


def findallimageurl(htmlstr):
    """Collect the src of every .png or .jpg <img> tag in the HTML source."""

    sp = BeautifulSoup(htmlstr, 'html.parser')  # use Python's built-in HTML parser
    # All <img> tag objects
    imgtaglist = sp.find_all('img')

    # The src attribute of each tag (None when a tag has no src)
    srclist = [img.get('src') for img in imgtaglist]
    # Keep only src strings ending in .png or .jpg
    filtered_srclist = [u for u in srclist
                        if u and u.lower().endswith(('.png', '.jpg'))]

    return filtered_srclist


def getfilename(urlstr):
    """Extract the image file name from its URL."""

    pos = urlstr.rfind('/')
    return urlstr[pos + 1:]


# Fetch the page and collect the image URLs
url_list = []
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode()

    url_list = findallimageurl(htmlstr)

for imagesrc in url_list:
    # Download the image at this URL
    req = urllib.request.Request(imagesrc)
    with urllib.request.urlopen(req) as response:
        data = response.read()
        # Skip images smaller than 100 KB
        if len(data) < 1024 * 100:
            continue

        # Create the download1 folder if it does not exist yet
        if not os.path.exists('download1'):
            os.mkdir('download1')

        # Build the local file name
        filename = getfilename(imagesrc)
        filename = 'download1/' + filename
        # Save the image to disk
        with open(filename, 'wb') as f:
            f.write(data)

    print('Downloaded image', filename)

Downloaded image download1/20230412105733E6869CA2C51FC9659543B01BCAD594C0.jpg
Downloaded image download1/2023041210583373DC4BF4E9ABC5CC8C084D45FB133E3A.jpg
Downloaded image download1/20230412105932202830A62B6E006C698504271BA9D52C.jpg
Downloaded image download1/20230406160425985ECFF0D26CB2A423DAECD29141F4EE.jpg
Downloaded image download1/20220401091431D32C5DA957F3441693885B05E271420C.jpg
Downloaded image download1/2023041812043228512B6723F81BA42BC286530A7AD859.jpg
Downloaded image download1/20230416152716215BBBA7CCF443222A245DA84B742444.jpg
Downloaded image download1/202304160947448C2B8A7CF30225471547902BD50AB088.jpg
Downloaded image download1/20230316141537671B47C5E4F520E11EE0E489187E624F.png
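
The src attribute of an <img> tag is not always an absolute URL, and some tags have no src at all, while urllib.request.Request needs a full URL. One possible clean-up step, sketched with the standard library only. The helper name normalize_srcs and the sample values are hypothetical and not part of the script above.

```python
from urllib.parse import urljoin

page_url = 'http://p.weather.com.cn/'


def normalize_srcs(srclist, base_url):
    """Drop missing or non-image src values and resolve the rest against base_url."""
    result = []
    for src in srclist:
        if not src:
            continue  # <img> tag without a src attribute
        if not src.lower().endswith(('.png', '.jpg')):
            continue  # not a .png or .jpg file
        result.append(urljoin(base_url, src))
    return result


# Made-up src values: absolute, root-relative, missing, and a non-image file
sample = ['http://pic.example.com/a.jpg', '/images/b.PNG', None, 'logo.gif']
cleaned = normalize_srcs(sample, page_url)
print(cleaned)
# ['http://pic.example.com/a.jpg', 'http://p.weather.com.cn/images/b.PNG']
```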

Scraping Nasdaq stock data

import datetime
import hashlib
import logging
import os
import re
import threading
import time
import urllib.request

from bs4 import BeautifulSoup

# Project-local database helper
from db.db_access import insert_hisq_data

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(threadName)s - '
                           '%(name)s - %(funcName)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# url = 'https://www.nasdaq.com/symbol/aapl/historical#.UWdnJBDMhHk'
# Replace with the path to your own saved copy of the page
url = 'file:///C:/Users/HP/nasdaq-Apple1.html'


def validateUpdate(html):
    """Check whether the data has changed: True if updated, False otherwise."""

    # Compute the MD5 digest of the current content
    md5obj = hashlib.md5()
    md5obj.update(html.encode(encoding='utf-8'))
    md5code = md5obj.hexdigest()

    old_md5code = ''
    f_name = 'md5.txt'

    if os.path.exists(f_name):  # read the previous digest if the file exists
        with open(f_name, 'r', encoding='utf-8') as f:
            old_md5code = f.read()

    if md5code == old_md5code:
        logger.info('Data not updated')
        return False
    else:
        # Store the new digest in the file
        with open(f_name, 'w', encoding='utf-8') as f:
            f.write(md5code)
        logger.info('Data updated')
        return True


# Flag that keeps the threads running
isrunning = True
# Crawl interval in seconds
interval = 5


def controlthread_body():
    """Body of the control thread."""

    global interval, isrunning

    while isrunning:
        # Read commands that control the crawler's schedule
        i = input('Enter Bye to stop the crawler, or a number to change the crawl interval in seconds: ')
        logger.info('Control input {0}'.format(i))
        try:
            interval = int(i)
        except ValueError:
            if i.lower() == 'bye':
                isrunning = False


def istradtime():
    """Return True during trading hours."""

    now = datetime.datetime.now()
    df = '%H%M%S'
    strnow = now.strftime(df)
    # Session bounds as local-time strings: 21:30 to 04:00
    # (reading these as the US session in Beijing time is an assumption)
    starttime = datetime.time(hour=21, minute=30).strftime(df)
    endtime = datetime.time(hour=4, minute=0).strftime(df)

    if now.weekday() == 5 \
            or now.weekday() == 6 \
            or (endtime < strnow < starttime):
        # Outside trading hours
        return False
    # Trading hours
    return True


def workthread_body():
    """Body of the worker thread."""

    global interval, isrunning

    while isrunning:

        if istradtime():
            # Do not crawl during trading hours
            logger.info('Trading hours, crawler sleeping for 1 hour...')
            time.sleep(60 * 60)
            continue

        logger.info('Crawler starting work...')
        req = urllib.request.Request(url)

        with urllib.request.urlopen(req) as response:
            data = response.read()
            html = data.decode()

            sp = BeautifulSoup(html, 'html.parser')
            # List of div tags matching the CSS selector
            div = sp.select('div#quotes_content_left_pnlAJAX')
            # Take the first match and serialize it to a string for hashing
            divstring = str(div[0])

            if validateUpdate(divstring):  # the data has changed
                # Parse the table rows
                trlist = sp.select('div#quotes_content_left_pnlAJAX table tbody tr')

                records = []

                for tr in trlist:
                    trtext = tr.text.strip('\n\r ')
                    if trtext == '':
                        continue

                    rows = re.split(r'\s+', trtext)
                    fields = {}
                    try:
                        df = '%m/%d/%Y'
                        fields['Date'] = datetime.datetime.strptime(rows[0], df)
                    except ValueError:
                        # Skip the real-time row (it has only a time, e.g. 10:12)
                        continue
                    fields['Open'] = float(rows[1])
                    fields['High'] = float(rows[2])
                    fields['Low'] = float(rows[3])
                    fields['Close'] = float(rows[4])
                    fields['Volume'] = int(rows[5].replace(',', ''))
                    records.append(fields)

                # Save the rows to the database
                for row in records:
                    row['Symbol'] = 'AAPL'
                    insert_hisq_data(row)

            # Sleep until the next crawl
            logger.info('Crawler sleeping for {0} seconds...'.format(interval))
            time.sleep(interval)


def main():
    """Entry point."""

    # Create the worker thread
    workthread = threading.Thread(target=workthread_body, name='WorkThread')
    # Start the worker thread
    workthread.start()

    # Create the control thread
    controlthread = threading.Thread(target=controlthread_body, name='ControlThread')
    # Start the control thread
    controlthread.start()


if __name__ == '__main__':
    main()

2023-04-19 15:46:27,709 - WorkThread - __main__ - workthread_body - INFO - Crawler starting work...
2023-04-19 15:46:28,157 - WorkThread - __main__ - validateUpdate - INFO - Data updated
2023-04-19 15:46:28,236 - WorkThread - __main__ - workthread_body - INFO - Crawler sleeping for 5 seconds...
2023-04-19 15:46:33,247 - WorkThread - __main__ - workthread_body - INFO - Crawler starting work...
2023-04-19 15:46:33,255 - WorkThread - __main__ - validateUpdate - INFO - Data not updated
2023-04-19 15:46:33,256 - WorkThread - __main__ - workthread_body - INFO - Crawler sleeping for 5 seconds...

Enter Bye to stop the crawler, or a number to change the crawl interval in seconds: 3600

2023-04-19 15:46:36,048 - ControlThread - __main__ - controlthread_body - INFO - Control input 3600

Enter Bye to stop the crawler, or a number to change the crawl interval in seconds: 

Exception in thread ControlThread:
Traceback (most recent call last):
  File "E:\anaconda\lib\threading.py", line 973, in _bootstrap_inner
    self.run()
  File "E:\anaconda\lib\threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\HP\AppData\Local\Temp\ipykernel_22288\985097547.py", line 66, in controlthread_body
EOFError: EOF when reading a line
2023-04-19 15:46:38,259 - WorkThread - __main__ - workthread_body - INFO - Crawler starting work...
2023-04-19 15:46:38,267 - WorkThread - __main__ - validateUpdate - INFO - Data not updated
2023-04-19 15:46:38,267 - WorkThread - __main__ - workthread_body - INFO - Crawler sleeping for 3600 seconds...
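
Two pieces of the crawler are easier to check in isolation: the MD5 change detection and the trading-hours test. The sketch below restates both as pure functions with no file or clock access. The names digest and is_trading_time are hypothetical, and the time comparison uses datetime.time objects rather than the formatted strings used in the script.

```python
import datetime
import hashlib


def digest(text):
    """Return the MD5 hex digest of a string, as validateUpdate computes it."""
    return hashlib.md5(text.encode('utf-8')).hexdigest()


def is_trading_time(now):
    """Mirror istradtime(): weekdays only, outside the 04:00-21:30 gap."""
    if now.weekday() >= 5:  # Saturday or Sunday
        return False
    start = datetime.time(hour=21, minute=30)
    end = datetime.time(hour=4, minute=0)
    # The session wraps past midnight, so it is the complement of (end, start)
    return not (end < now.time() < start)


old = digest('<div>page v1</div>')
print(digest('<div>page v1</div>') == old)  # True: same content, same digest
print(digest('<div>page v2</div>') == old)  # False: content changed

# 2023-04-19 is a Wednesday, 2023-04-22 a Saturday
print(is_trading_time(datetime.datetime(2023, 4, 19, 22, 0)))   # True
print(is_trading_time(datetime.datetime(2023, 4, 19, 15, 46)))  # False
print(is_trading_time(datetime.datetime(2023, 4, 22, 22, 0)))   # False
```

The 15:46 case matches the log above, where the crawler works instead of sleeping for an hour. Like the original check, this version treats all of Saturday as non-trading, including the hours after midnight.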

Advanced pandas

import numpy as np
import pandas as pd

data = pd.Series(np.random.randn(9), index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'], [1, 2, 3, 1, 3, 1, 2, 2, 3]])
data

a  1   -0.018841
   2    0.291057
   3   -0.869647
b  1    0.500437
   3   -1.678710
c  1   -1.957127
   2   -0.563527
d  2    0.454833
   3   -0.343765
dtype: float64

data.index

MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 3),
            ('c', 1),
            ('c', 2),
            ('d', 2),
            ('d', 3)],
           )

data['b']

1    0.500437
3   -1.678710
dtype: float64

data['b':'c']

b  1    0.500437
   3   -1.678710
c  1   -1.957127
   2   -0.563527
dtype: float64

data.loc[['b','d']]

b  1    0.500437
   3   -1.678710
d  2    0.454833
   3   -0.343765
dtype: float64

data.loc[:,2]

a    0.291057
c   -0.563527
d    0.454833
dtype: float64
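
The selections above can be reproduced with fixed values, which makes the results easy to verify by eye. A small sketch, assuming integer data in place of the random numbers. xs is used here as an alternative spelling of the inner-level selection.

```python
import pandas as pd

# Fixed values instead of random ones so the result is reproducible
data = pd.Series(range(9),
                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 3, 1, 2, 2, 3]])

# Cross-section on the inner level: selects the same rows as data.loc[:, 2]
inner = data.xs(2, level=1)
print(inner)

# A single element is addressed by a tuple of (outer, inner) labels
print(data.loc[('b', 3)])

# Pivot the inner level into columns; missing combinations become NaN
wide = data.unstack()
print(wide)
```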

frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])

frame

		Ohio		Colorado
		Green	Red	Green
a	1	0	1	2
a	2	3	4	5
b	1	6	7	8
b	2	9	10	11

frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
frame

	state	Ohio		Colorado
	color	Green	Red	Green
key1	key2
a	1	0	1	2
a	2	3	4	5
b	1	6	7	8
b	2	9	10	11

frame['Ohio']

	color	Green	Red
key1	key2
a	1	0	1
a	2	3	4
b	1	6	7
b	2	9	10

pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']], names=['state', 'color'])

MultiIndex([(    'Ohio', 'Green'),
            (    'Ohio',   'Red'),
            ('Colorado', 'Green')],
           names=['state', 'color'])

frame.swaplevel('key1', 'key2')

	state	Ohio		Colorado
	color	Green	Red	Green
key2	key1
1	a	0	1	2
2	a	3	4	5
1	b	6	7	8
2	b	9	10	11

frame.sort_index(level=1)

	state	Ohio		Colorado
	color	Green	Red	Green
key1	key2
a	1	0	1	2
b	1	6	7	8
a	2	3	4	5
b	2	9	10	11

frame.swaplevel(0, 1).sort_index(level=0)

	state	Ohio		Colorado
	color	Green	Red	Green
key2	key1
1	a	0	1	2
1	b	6	7	8
2	a	3	4	5
2	b	9	10	11

# sum(level=...) is deprecated and was removed in pandas 2.0; group by the index level instead
frame.groupby(level='key2').sum()

state	Ohio		Colorado
color	Green	Red	Green
key2
1	6	8	10
2	12	14	16

# Same aggregation across columns: transpose, group by the 'color' level, transpose back
# (replaces the deprecated frame.sum(level='color', axis=1))
frame.T.groupby(level='color').sum().T

	color	Green	Red
key1	key2
a	1	2	1
a	2	8	4
b	1	14	7
b	2	20	10

frame.describe()

state	Ohio		Colorado
color	Green	Red	Green
count	4.000000	4.000000	4.000000
mean	4.500000	5.500000	6.500000
std	3.872983	3.872983	3.872983
min	0.000000	1.000000	2.000000
25%	2.250000	3.250000	4.250000
50%	4.500000	5.500000	6.500000
75%	6.750000	7.750000	8.750000
max	9.000000	10.000000	11.000000

frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1), 'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'], 'd': [0, 1, 2, 0, 1, 2, 3]})

frame

	a	b	c	d
0	0	7	one	0
1	1	6	one	1
2	2	5	one	2
3	3	4	two	0
4	4	3	two	1
5	5	2	two	2
6	6	1	two	3

frame2 = frame.set_index(['c', 'd'])
frame2

		a	b
c	d
one	0	0	7
	1	1	6
	2	2	5
two	0	3	4
	1	4	3
	2	5	2
	3	6	1

frame.set_index(['c', 'd'], drop=False)

		a	b	c	d
c	d
one	0	0	7	one	0
	1	1	6	one	1
	2	2	5	one	2
two	0	3	4	two	0
	1	4	3	two	1
	2	5	2	two	2
	3	6	1	two	3

frame2.reset_index()

	c	d	a	b
0	one	0	0	7
1	one	1	1	6
2	one	2	2	5
3	two	0	3	4
4	two	1	4	3
5	two	2	5	2
6	two	3	6	1
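
A short sketch of what the MultiIndex built by set_index is good for, and of the round trip back through reset_index. It reuses the frame from above and introduces nothing new apart from the variable names indexed and restored.

```python
import pandas as pd

frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                      'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                      'd': [0, 1, 2, 0, 1, 2, 3]})

indexed = frame.set_index(['c', 'd'])
print(list(indexed.index.names))

# With the MultiIndex in place, a row is addressed by a (c, d) tuple
print(indexed.loc[('two', 1), 'a'])

# reset_index moves the levels back into columns, placed before the others
restored = indexed.reset_index()
print(list(restored.columns))
```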

import pandas as pd
import numpy as np
df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

df1

	key	data1
0	b	0
1	b	1
2	a	2
3	c	3
4	a	4
5	a	5
6	b	6

df2

	key	data2
0	a	0
1	b	1
2	d	2

pd.merge(df1,df2)

	key	data1	data2
0	b	0	1
1	b	1	1
2	b	6	1
3	a	2	0
4	a	4	0
5	a	5	0

pd.merge(df1,df2,on='key')

	key	data1	data2
0	b	0	1
1	b	1	1
2	b	6	1
3	a	2	0
4	a	4	0
5	a	5	0

df3 = pd.DataFrame({'lkey': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df4 = pd.DataFrame({'rkey': ['a', 'b', 'd'], 'data2': range(3)})
pd.merge(df3, df4, left_on='lkey', right_on='rkey')  # name the key column of each side separately

	lkey	data1	rkey	data2
0	b	0	b	1
1	b	1	b	1
2	b	6	b	1
3	a	2	a	0
4	a	4	a	0
5	a	5	a	0

df3

	lkey	data1
0	b	0
1	b	1
2	a	2
3	c	3
4	a	4
5	a	5
6	b	6

df4

	rkey	data2
0	a	0
1	b	1
2	d	2

pd.merge(df1,df2,how='outer')

	key	data1	data2
0	b	0.0	1.0
1	b	1.0	1.0
2	b	6.0	1.0
3	a	2.0	0.0
4	a	4.0	0.0
5	a	5.0	0.0
6	c	3.0	NaN
7	d	NaN	2.0

pd.merge(df1,df2,how='left')

	key	data1	data2
0	b	0	1.0
1	b	1	1.0
2	a	2	0.0
3	c	3	NaN
4	a	4	0.0
5	a	5	0.0
6	b	6	1.0

pd.merge(df1,df2,how='right')

	key	data1	data2
0	a	2.0	0
1	a	4.0	0
2	a	5.0	0
3	b	0.0	1
4	b	1.0	1
5	b	6.0	1
6	d	NaN	2
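
When comparing join types it helps to see where each row came from. A minimal sketch using the indicator parameter of pd.merge on the same df1 and df2. The sizes dict is only a convenience for this illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'
merged = pd.merge(df1, df2, on='key', how='outer', indicator=True)
print(merged)

# Row counts for each join type on the same inputs
sizes = {how: len(pd.merge(df1, df2, on='key', how=how))
         for how in ['inner', 'left', 'right', 'outer']}
print(sizes)  # {'inner': 6, 'left': 7, 'right': 7, 'outer': 8}
```

The counts agree with the tables above: key 'c' exists only on the left and key 'd' only on the right.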

df1

	key	data1
0	b	0
1	b	1
2	a	2
3	c	3
4	a	4
5	a	5
6	b	6

df2

	key	data2
0	a	0
1	b	1
2	d	2

left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'], 'key2': ['one', 'two', 'one'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'], 'key2': ['one', 'one', 'one', 'two'], 'rval': [4, 5, 6, 7]})
pd.merge(left, right, on=['key1', 'key2'], how='outer')

	key1	key2	lval	rval
0	foo	one	1.0	4.0
1	foo	one	1.0	5.0
2	foo	two	2.0	NaN
3	bar	one	3.0	6.0
4	bar	two	NaN	7.0

pd.merge(left, right, on='key1')

	key1	key2_x	lval	key2_y	rval
0	foo	one	1	one	4
1	foo	one	1	one	5
2	foo	two	2	one	4
3	foo	two	2	one	5
4	bar	one	3	one	6
5	bar	one	3	two	7

left

	key1	key2	lval
0	foo	one	1
1	foo	two	2
2	bar	one	3

right

	key1	key2	rval
0	foo	one	4
1	foo	one	5
2	bar	one	6
3	bar	two	7

pd.merge(left, right, on='key1', suffixes=('_left', '_right'))

	key1	key2_left	lval	key2_right	rval
0	foo	one	1	one	4
1	foo	one	1	one	5
2	foo	two	2	one	4
3	foo	two	2	one	5
4	bar	one	3	one	6
5	bar	one	3	two	7

import pandas as pd
left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])
left1

	key	value
0	a	0
1	b	1
2	a	2
3	a	3
4	b	4
5	c	5

right1

	group_val
a	3.5
b	7.0

pd.merge(left1, right1, left_on='key', right_index=True)

	key	value	group_val
0	a	0	3.5
2	a	2	3.5
3	a	3	3.5
1	b	1	7.0
4	b	4	7.0

import pandas as pd

import numpy as np
lefth = pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 'key2': [2000, 2001, 2002, 2001, 2002], 'data': np.arange(5.)})
righth = pd.DataFrame(np.arange(12).reshape((6, 2)),  index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'], [2001, 2000, 2000, 2000, 2001, 2002]], columns=['event1', 'event2'])
lefth

	key1	key2	data
0	Ohio	2000	0.0
1	Ohio	2001	1.0
2	Ohio	2002	2.0
3	Nevada	2001	3.0
4	Nevada	2002	4.0

righth

		event1	event2
Nevada	2001	0	1
Nevada	2000	2	3
Ohio	2000	4	5
	2000	6	7
	2001	8	9
	2002	10	11

pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True)

	key1	key2	data	event1	event2
0	Ohio	2000	0.0	4	5
0	Ohio	2000	0.0	6	7
1	Ohio	2001	1.0	8	9
2	Ohio	2002	2.0	10	11
3	Nevada	2001	3.0	0	1

pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True, how='outer')

	key1	key2	data	event1	event2
0	Ohio	2000	0.0	4.0	5.0
0	Ohio	2000	0.0	6.0	7.0
1	Ohio	2001	1.0	8.0	9.0
2	Ohio	2002	2.0	10.0	11.0
3	Nevada	2001	3.0	0.0	1.0
4	Nevada	2002	4.0	NaN	NaN
4	Nevada	2000	NaN	2.0	3.0

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]], index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13, 14]], index=['b', 'c', 'd', 'e'], columns=['Missouri', 'Alabama'])
left2

	Ohio	Nevada
a	1.0	2.0
c	3.0	4.0
e	5.0	6.0

right2

	Missouri	Alabama
b	7.0	8.0
c	9.0	10.0
d	11.0	12.0
e	13.0	14.0

pd.merge(left2, right2, how='outer', left_index=True, right_index=True)

	Ohio	Nevada	Missouri	Alabama
a	1.0	2.0	NaN	NaN
b	NaN	NaN	7.0	8.0
c	3.0	4.0	9.0	10.0
d	NaN	NaN	11.0	12.0
e	5.0	6.0	13.0	14.0

left2.join(right2, how='outer')

	Ohio	Nevada	Missouri	Alabama
a	1.0	2.0	NaN	NaN
b	NaN	NaN	7.0	8.0
c	3.0	4.0	9.0	10.0
d	NaN	NaN	11.0	12.0
e	5.0	6.0	13.0	14.0

another = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [16., 17.]],
                       index=['a', 'c', 'e', 'f'], columns=['New York', 'Oregon'])
another

	New York	Oregon
a	7.0	8.0
c	9.0	10.0
e	11.0	12.0
f	16.0	17.0

left2.join(right2)

	Ohio	Nevada	Missouri	Alabama
a	1.0	2.0	NaN	NaN
c	3.0	4.0	9.0	10.0
e	5.0	6.0	13.0	14.0

left2.join([right2, another])

	Ohio	Nevada	Missouri	Alabama	New York	Oregon
a	1.0	2.0	NaN	NaN	7.0	8.0
c	3.0	4.0	9.0	10.0	9.0	10.0
e	5.0	6.0	13.0	14.0	11.0	12.0

left2.join([right2, another], how='outer')

	Ohio	Nevada	Missouri	Alabama	New York	Oregon
a	1.0	2.0	NaN	NaN	7.0	8.0
c	3.0	4.0	9.0	10.0	9.0	10.0
e	5.0	6.0	13.0	14.0	11.0	12.0
b	NaN	NaN	7.0	8.0	NaN	NaN
d	NaN	NaN	11.0	12.0	NaN	NaN
f	NaN	NaN	NaN	NaN	16.0	17.0
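
join is essentially a shorthand for an index-on-index merge. The sketch below checks that on left2 and right2 and shows the inner variant, which the cells above do not run.

```python
import pandas as pd

left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]],
                     index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13., 14.]],
                      index=['b', 'c', 'd', 'e'], columns=['Missouri', 'Alabama'])

# join defaults to a left join on the index ...
joined = left2.join(right2)
# ... which corresponds to an index-on-index merge with how='left'
merged = pd.merge(left2, right2, how='left', left_index=True, right_index=True)
print(joined.equals(merged))

# An inner join keeps only the labels present in both frames
inner = left2.join(right2, how='inner')
print(list(inner.index))  # ['c', 'e']
```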

s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])
pd.concat([s1, s2, s3])

a    0
b    1
c    2
d    3
e    4
f    5
g    6
dtype: int64

pd.concat([s1, s2, s3], axis=1)  # the result is a DataFrame

	0	1	2
a	0.0	NaN	NaN
b	1.0	NaN	NaN
c	NaN	2.0	NaN
d	NaN	3.0	NaN
e	NaN	4.0	NaN
f	NaN	NaN	5.0
g	NaN	NaN	6.0

s4 = pd.concat([s1, s3])
s4

a    0
b    1
f    5
g    6
dtype: int64

pd.concat([s1, s4], axis=1)

	0	1
a	0.0	0
b	1.0	1
f	NaN	5
g	NaN	6

pd.concat([s1, s4], axis=1, join='inner')

	0	1
a	0	0
b	1	1

# join_axes is no longer accepted by pd.concat (recent pandas raises
# "TypeError: concat() got an unexpected keyword argument 'join_axes'");
# reindex the concatenated result instead
pd.concat([s1, s4], axis=1).reindex(['a', 'c', 'b', 'e'])

	0	1
a	0.0	0.0
c	NaN	NaN
b	1.0	1.0
e	NaN	NaN

result = pd.concat([s1, s1, s3], keys=['one', 'two', 'three'])
result

one    a    0
       b    1
two    a    0
       b    1
three  f    5
       g    6
dtype: int64

result.unstack()

	a	b	f	g
one	0.0	1.0	NaN	NaN
two	0.0	1.0	NaN	NaN
three	NaN	NaN	5.0	6.0

pd.concat([s1, s2, s3], axis=1, keys=['one', 'two', 'three'])

	one	two	three
a	0.0	NaN	NaN
b	1.0	NaN	NaN
c	NaN	2.0	NaN
d	NaN	3.0	NaN
e	NaN	4.0	NaN
f	NaN	NaN	5.0
g	NaN	NaN	6.0
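
The keys argument behaves differently depending on the axis. A compact sketch with two of the Series from above. The dict form is shown as an equivalent spelling.

```python
import pandas as pd

s1 = pd.Series([0, 1], index=['a', 'b'])
s3 = pd.Series([5, 6], index=['f', 'g'])

# Along axis=0 the keys become the outer level of a MultiIndex
result = pd.concat([s1, s3], keys=['one', 'three'])
print(result.loc['three'])

# A dict works too: its keys play the role of the keys argument
same = pd.concat({'one': s1, 'three': s3})
print(result.equals(same))

# Along axis=1 the keys become column names instead
wide = pd.concat([s1, s3], axis=1, keys=['one', 'three'])
print(list(wide.columns))  # ['one', 'three']
```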

df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'], columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'], columns=['three', 'four'])
df1

	one	two
a	0	1
b	2	3
c	4	5

df2

	three	four
a	5	6
c	7	8

pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])

	level1		level2
	one	two	three	four
a	0	1	5.0	6.0
b	2	3	NaN	NaN
c	4	5	7.0	8.0

pd.concat({'level1': df1, 'level2': df2}, axis=1)

	level1		level2
	one	two	three	four
a	0	1	5.0	6.0
b	2	3	NaN	NaN
c	4	5	7.0	8.0

pd.concat([df1, df2], axis=1, keys=['level1', 'level2'], names=['upper', 'lower'])

upper	level1		level2
lower	one	two	three	four
a	0	1	5.0	6.0
b	2	3	NaN	NaN
c	4	5	7.0	8.0

df1 = pd.DataFrame(np.random.randn(3, 4), columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['b', 'd', 'a'])
df1

	a	b	c	d
0	0.527674	2.145525	1.979097	1.702063
1	-0.350557	-0.511584	-1.061349	-0.702928
2	-1.239068	-1.240555	-0.295705	0.209181

df2

	b	d	a
0	1.718647	-2.931403	0.129779
1	1.482412	-1.022705	-1.186445

pd.concat([df1, df2], ignore_index=True)

	a	b	c	d
0	0.527674	2.145525	1.979097	1.702063
1	-0.350557	-0.511584	-1.061349	-0.702928
2	-1.239068	-1.240555	-0.295705	0.209181
3	0.129779	1.718647	NaN	-2.931403
4	-1.186445	1.482412	NaN	-1.022705

a = pd.Series([np.nan, 2.5, np.nan, 3.5, 4.5, np.nan], index=['f', 'e', 'd', 'c', 'b', 'a'])
b = pd.Series(np.arange(len(a), dtype=np.float64), index=['f', 'e', 'd', 'c', 'b', 'a'])
b.iloc[-1] = np.nan  # set the last element by position
a

f    NaN
e    2.5
d    NaN
c    3.5
b    4.5
a    NaN
dtype: float64

b

f    0.0
e    1.0
d    2.0
c    3.0
b    4.0
a    NaN
dtype: float64

np.where(pd.isnull(a), b, a)

array([0. , 2.5, 2. , 3.5, 4.5, nan])

b[:-2].combine_first(a[2:])

a    NaN
b    4.5
c    3.0
d    2.0
e    1.0
f    0.0
dtype: float64

df1 = pd.DataFrame({'a': [1., np.nan, 5., np.nan], 'b': [np.nan, 2., np.nan, 6.], 'c': range(2, 18, 4)})
df2 = pd.DataFrame({'a': [5., 4., np.nan, 3., 7.], 'b': [np.nan, 3., 4., 6., 8.]})
df1

	a	b	c
0	1.0	NaN	2
1	NaN	2.0	6
2	5.0	NaN	10
3	NaN	6.0	14

df2

	a	b
0	5.0	NaN
1	4.0	3.0
2	NaN	4.0
3	3.0	6.0
4	7.0	8.0

df1.combine_first(df2)

	a	b	c
0	1.0	NaN	2.0
1	4.0	2.0	6.0
2	5.0	4.0	10.0
3	3.0	6.0	14.0
4	7.0	8.0	NaN
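
combine_first differs from a plain fillna in how it treats labels that exist only in the second object. A small sketch with made-up Series (the labels w to z are arbitrary):

```python
import numpy as np
import pandas as pd

a = pd.Series([np.nan, 2.5, np.nan], index=['x', 'y', 'z'])
b = pd.Series([10.0, 20.0, 30.0, 40.0], index=['w', 'x', 'y', 'z'])

# combine_first aligns on labels: holes in a are filled from b,
# and labels that exist only in b are added to the result
patched = a.combine_first(b)
print(patched)

# fillna also aligns on labels but keeps only the labels of a
filled = a.fillna(b)
print(filled)
```

This is why df1.combine_first(df2) above gains a row 4 that df1 never had.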

data = pd.DataFrame(np.arange(6).reshape((2, 3)), index=pd.Index(['Ohio', 'Colorado'], name='state'), columns=pd.Index(['one', 'two', 'three'], name='number'))

data

number	one	two	three
state
Ohio	0	1	2
Colorado	3	4	5

result = data.stack()
result

state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32

result.unstack()  # by default the innermost level is unstacked

number	one	two	three
state
Ohio	0	1	2
Colorado	3	4	5

result.unstack(0)  # pass a level number to unstack that level instead (0 is the outer level, state)

state	Ohio	Colorado
number
one	0	3
two	1	4
three	2	5
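
A level can also be given by name rather than by number, and unstacking the innermost level reverses the stack. A short sketch using the same data as above:

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(np.arange(6).reshape((2, 3)),
                    index=pd.Index(['Ohio', 'Colorado'], name='state'),
                    columns=pd.Index(['one', 'two', 'three'], name='number'))
result = data.stack()

# Level 0 is 'state' and level 1 is 'number', so these two calls agree.
by_number = result.unstack(0)
by_name = result.unstack('state')

# Unstacking the innermost level ('number') undoes the stack.
round_trip = result.unstack()
```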

s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])
data2

one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64

data2.unstack()  # unstacking introduces NaN where a label is missing from one of the groups

	a	b	c	d	e
one	0.0	1.0	2.0	3.0	NaN
two	NaN	NaN	4.0	5.0	6.0

data2.unstack().stack()

one  a    0.0
     b    1.0
     c    2.0
     d    3.0
two  c    4.0
     d    5.0
     e    6.0
dtype: float64

data2.unstack().stack(dropna=False)

one  a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    NaN
two  a    NaN
     b    NaN
     c    4.0
     d    5.0
     e    6.0
dtype: float64

df = pd.DataFrame({'left': result, 'right': result + 5}, columns=pd.Index(['left', 'right'], name='side'))

df

	side	left	right
state	number
Ohio	one	0	5
	two	1	6
	three	2	7
Colorado	one	3	8
	two	4	9
	three	5	10

df.unstack('state')

side	left		right
state	Ohio	Colorado	Ohio	Colorado
number
one	0	3	5	8
two	1	4	6	9
three	2	5	7	10

df.unstack('state').stack('side')

	state	Colorado	Ohio
number	side
one	left	3	0
one	right	8	5
two	left	4	1
two	right	9	6
three	left	5	2
three	right	10	7

data

number	one	two	three
state
Ohio	0	1	2
Colorado	3	4	5

data = pd.DataFrame({'k1': ['one', 'two'] * 3 + ['two'], 'k2': [1, 1, 2, 3, 3, 4, 4]})
data

	k1	k2
0	one	1
1	two	1
2	one	2
3	two	3
4	one	3
5	two	4
6	two	4

data.duplicated()  # by default all columns are compared

0    False
1    False
2    False
3    False
4    False
5    False
6     True
dtype: bool

data.drop_duplicates()  # by default the first occurrence is kept

	k1	k2
0	one	1
1	two	1
2	one	2
3	two	3
4	one	3
5	two	4

data['v1'] = range(7)
data.drop_duplicates(['k1'])

	k1	k2	v1
0	one	1	0
1	two	1	1

data.drop_duplicates(['k1', 'k2'], keep='last')

	k1	k2	v1
0	one	1	0
1	two	1	1
2	one	2	2
3	two	3	3
4	one	3	4
6	two	4	6
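
The keep argument is also accepted by duplicated, and keep=False flags every member of a duplicate group. A small sketch on the same k1/k2 data:

```python
import pandas as pd

data = pd.DataFrame({'k1': ['one', 'two'] * 3 + ['two'],
                     'k2': [1, 1, 2, 3, 3, 4, 4]})

first_kept = data.duplicated()             # only the later copy is flagged
last_kept = data.duplicated(keep='last')   # only the earlier copy is flagged
all_copies = data.duplicated(keep=False)   # every row of a duplicate group is flagged

# Rows that have no duplicate at all.
unique_rows = data[~all_copies]
```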

data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'Pastrami', 'corned beef', 'Bacon',
                'pastrami', 'honey ham', 'nova lox'], 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
data

	food	ounces
0	bacon	4.0
1	pulled pork	3.0
2	bacon	12.0
3	Pastrami	6.0
4	corned beef	7.5
5	Bacon	8.0
6	pastrami	3.0
7	honey ham	5.0
8	nova lox	6.0

meat_to_animal = {
  'bacon': 'pig',
  'pulled pork': 'pig',
  'pastrami': 'cow',
  'corned beef': 'cow',
  'honey ham': 'pig',
  'nova lox': 'salmon'
}

lowercased = data['food'].str.lower()
lowercased

0          bacon
1    pulled pork
2          bacon
3       pastrami
4    corned beef
5          bacon
6       pastrami
7      honey ham
8       nova lox
Name: food, dtype: object

data['animal'] = lowercased.map(meat_to_animal)
data

	food	ounces	animal
0	bacon	4.0	pig
1	pulled pork	3.0	pig
2	bacon	12.0	pig
3	Pastrami	6.0	cow
4	corned beef	7.5	cow
5	Bacon	8.0	pig
6	pastrami	3.0	cow
7	honey ham	5.0	pig
8	nova lox	6.0	salmon

data['food'].map(lambda x: meat_to_animal[x.lower()])

0       pig
1       pig
2       pig
3       cow
4       cow
5       pig
6       cow
7       pig
8    salmon
Name: food, dtype: object
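
The dict form and the lambda form of map give the same result only while every value has an entry in the dict. The sketch below adds a made-up unmapped value ('tofu') to show where they differ:

```python
import pandas as pd

meat_to_animal = {'bacon': 'pig', 'pastrami': 'cow'}
food = pd.Series(['bacon', 'Pastrami', 'tofu'])   # 'tofu' is a made-up unmapped value

# Mapping with a dict: unmapped values become NaN.
via_dict = food.str.lower().map(meat_to_animal)

# Mapping with a function that indexes the dict: unmapped values raise KeyError.
try:
    food.map(lambda x: meat_to_animal[x.lower()])
    raised = False
except KeyError:
    raised = True

# dict.get makes the function form tolerate unmapped values as well.
via_get = food.map(lambda x: meat_to_animal.get(x.lower()))
```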

data = pd.Series([1., -999., 2., -999., -1000., 3.])
data

0       1.0
1    -999.0
2       2.0
3    -999.0
4   -1000.0
5       3.0
dtype: float64

data.replace(-999, np.nan)

0       1.0
1       NaN
2       2.0
3       NaN
4   -1000.0
5       3.0
dtype: float64

data.replace([-999, -1000], [np.nan, 0])

0    1.0
1    NaN
2    2.0
3    NaN
4    0.0
5    3.0
dtype: float64

data.replace({-999: np.nan, -1000: 0})

0    1.0
1    NaN
2    2.0
3    NaN
4    0.0
5    3.0
dtype: float64

data = pd.DataFrame(np.arange(12).reshape((3, 4)), index=['Ohio', 'Colorado', 'New York'], columns=['one', 'two', 'three', 'four'])
data.rename(index=str.title, columns=str.upper)

	ONE	TWO	THREE	FOUR
Ohio	0	1	2	3
Colorado	4	5	6	7
New York	8	9	10	11

transform = lambda x: x[:4].upper()
data.index.map(transform)

Index(['OHIO', 'COLO', 'NEW '], dtype='object')

data.index=data.index.map(transform)
data

	one	two	three	four
OHIO	0	1	2	3
COLO	4	5	6	7
NEW	8	9	10	11

data.rename(index={'OHIO': 'INDIANA'},  columns={'three': 'peekaboo'})

	one	two	peekaboo	four
INDIANA	0	1	2	3
COLO	4	5	6	7
NEW	8	9	10	11

data.rename(index={'OHIO': 'INDIANA'}, inplace=True)
data

	one	two	three	four
INDIANA	0	1	2	3
COLO	4	5	6	7
NEW	8	9	10	11

ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
cats = pd.cut(ages, bins)
cats

[(18, 25], (18, 25], (18, 25], (25, 35], (18, 25], ..., (25, 35], (60, 100], (35, 60], (35, 60], (25, 35]]
Length: 12
Categories (4, interval[int64, right]): [(18, 25] < (25, 35] < (35, 60] < (60, 100]]

cats.codes  # integer bin code assigned to each value

array([0, 0, 0, 1, 0, 0, 2, 1, 3, 2, 2, 1], dtype=int8)

cats.categories

IntervalIndex([(18, 25], (25, 35], (35, 60], (60, 100]], dtype='interval[int64, right]')

pd.value_counts(cats)

(18, 25]     5
(25, 35]     3
(35, 60]     3
(60, 100]    1
dtype: int64

pd.cut(ages, [18, 26, 36, 61, 100], right=False)

[[18, 26), [18, 26), [18, 26), [26, 36), [18, 26), ..., [26, 36), [61, 100), [36, 61), [36, 61), [26, 36)]
Length: 12
Categories (4, interval[int64, left]): [[18, 26) < [26, 36) < [36, 61) < [61, 100)]

group_names = ['Youth', 'YoungAdult', 'MiddleAged', 'Senior']
pd.cut(ages, bins, labels=group_names)

['Youth', 'Youth', 'Youth', 'YoungAdult', 'Youth', ..., 'YoungAdult', 'Senior', 'MiddleAged', 'MiddleAged', 'YoungAdult']
Length: 12
Categories (4, object): ['Youth' < 'YoungAdult' < 'MiddleAged' < 'Senior']
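
The right argument only matters for values that sit exactly on a bin edge. A small sketch with three such ages (chosen for illustration) and the same bins as above:

```python
import pandas as pd

ages = [25, 35, 60]            # values sitting exactly on bin edges
bins = [18, 25, 35, 60, 100]

right_closed = pd.cut(ages, bins)               # intervals like (18, 25]
left_closed = pd.cut(ages, bins, right=False)   # intervals like [18, 25)

right_codes = right_closed.codes.tolist()
left_codes = left_closed.codes.tolist()
```

With right-closed intervals the three ages land in bins 0, 1, 2; with left-closed intervals each moves up one bin to 1, 2, 3.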

data = np.random.rand(20)

data

array([0.12967787, 0.87168374, 0.24167497, 0.56688941, 0.22964312,
       0.30205167, 0.88297675, 0.22349301, 0.18292263, 0.81072534,
       0.25054152, 0.99378214, 0.78439125, 0.3970331 , 0.89049743,
       0.51677834, 0.76808437, 0.54701119, 0.79386529, 0.25451132])

temp = pd.cut(data, 4, precision=2)  # an integer is the number of bins, not the edges; the min..max range is split into equal-width bins
temp

[(0.13, 0.35], (0.78, 0.99], (0.13, 0.35], (0.56, 0.78], (0.13, 0.35], ..., (0.35, 0.56], (0.56, 0.78], (0.35, 0.56], (0.78, 0.99], (0.13, 0.35]]
Length: 20
Categories (4, interval[float64, right]): [(0.13, 0.35] < (0.35, 0.56] < (0.56, 0.78] < (0.78, 0.99]]

pd.value_counts(temp)

(0.13, 0.35]    8
(0.78, 0.99]    7
(0.35, 0.56]    3
(0.56, 0.78]    2
dtype: int64
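
To see which edges pd.cut computes for an integer bin count, retbins=True returns them alongside the categories. A sketch on a made-up sample of nine evenly spaced values:

```python
import numpy as np
import pandas as pd

values = np.arange(9, dtype=float)   # 0.0 .. 8.0, a made-up sample

# retbins=True also returns the edges that pd.cut computed.
cats, edges = pd.cut(values, 4, retbins=True)

# How many values fall in each of the 4 bins.
counts = np.bincount(cats.codes, minlength=4)
```

The inner edges fall at 2, 4, 6 and 8; the lowest edge is placed slightly below the minimum so that the minimum itself is included in the first bin.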

data = np.random.randn(1000)  # normally distributed
cats = pd.qcut(data, 4)  # split at the sample quartiles so each of the 4 bins holds the same number of values
cats

[(-0.726, -0.00747], (-0.00747, 0.636], (-3.057, -0.726], (-3.057, -0.726], (-0.00747, 0.636], ..., (-0.726, -0.00747], (-0.726, -0.00747], (-0.726, -0.00747], (0.636, 2.834], (-3.057, -0.726]]
Length: 1000
Categories (4, interval[float64, right]): [(-3.057, -0.726] < (-0.726, -0.00747] < (-0.00747, 0.636] < (0.636, 2.834]]

pd.value_counts(cats)

(-3.057, -0.726]      250
(-0.726, -0.00747]    250
(-0.00747, 0.636]     250
(0.636, 2.834]        250
dtype: int64

pd.qcut(data, [0, 0.1, 0.5, 0.9, 1.])

[(-1.239, -0.00747], (-0.00747, 1.338], (-3.057, -1.239], (-3.057, -1.239], (-0.00747, 1.338], ..., (-1.239, -0.00747], (-1.239, -0.00747], (-1.239, -0.00747], (1.338, 2.834], (-3.057, -1.239]]
Length: 1000
Categories (4, interval[float64, right]): [(-3.057, -1.239] < (-1.239, -0.00747] < (-0.00747, 1.338] < (1.338, 2.834]]
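
The difference between cut and qcut shows up most clearly on skewed data. The sketch below uses a made-up right-skewed sample (the squares of 0..99) and counts the members of each bin:

```python
import numpy as np
import pandas as pd

skewed = np.arange(100) ** 2        # made-up, right-skewed sample: 0, 1, 4, ..., 9801

equal_width = pd.cut(skewed, 4)     # 4 bins of equal width
equal_count = pd.qcut(skewed, 4)    # 4 bins of equal population

width_counts = np.bincount(equal_width.codes, minlength=4)
quantile_counts = np.bincount(equal_count.codes, minlength=4)
```

Equal-width bins put half of the sample (50 values) into the first bin, while the quantile bins hold 25 values each.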

data = pd.DataFrame(np.random.randn(1000, 4))
data.describe()

	0	1	2	3
count	1000.000000	1000.000000	1000.000000	1000.000000
mean	-0.068024	0.015781	0.048655	-0.019467
std	1.050557	0.963683	0.972374	1.031390
min	-3.617567	-2.550853	-3.372664	-3.196753
25%	-0.718715	-0.591289	-0.606569	-0.712316
50%	-0.066156	0.004574	0.068207	0.000122
75%	0.627520	0.662984	0.747493	0.673216
max	2.940831	2.865724	3.369795	3.364796

col=data[2]
col[np.abs(col) > 3]

340    3.196054
445   -3.159953
533   -3.156547
628    3.369795
698   -3.372664
Name: 2, dtype: float64

data[(np.abs(data) > 3).any(axis=1)]  # select the rows that contain any value whose absolute value exceeds 3

	0	1	2	3
55	-3.157032	-0.841691	1.018759	-0.018302
340	0.456149	0.854559	3.196054	0.353166
343	-3.283047	-0.316560	-0.121576	0.584322
407	-0.089158	-0.604724	1.028259	3.364796
445	0.300672	-0.848071	-3.159953	0.870023
533	-0.048864	0.152498	-3.156547	-0.968370
628	1.119083	0.171787	3.369795	-0.550373
698	-0.517293	-1.208259	-3.372664	-0.418606
824	-3.459360	-0.702142	0.325501	0.653165
873	-3.617567	-1.302917	-0.577524	0.859530
923	-0.920904	-0.103102	-0.581829	-3.196753
981	0.672200	-0.274157	-0.883970	-3.038320

data[np.abs(data) > 3] = np.sign(data) * 3
data.describe()

	0	1	2	3
count	1000.000000	1000.000000	1000.000000	1000.000000
mean	-0.066507	0.015781	0.048779	-0.019597
std	1.045975	0.963683	0.968296	1.029555
min	-3.000000	-2.550853	-3.000000	-3.000000
25%	-0.718715	-0.591289	-0.606569	-0.712316
50%	-0.066156	0.004574	0.068207	0.000122
75%	0.627520	0.662984	0.747493	0.673216
max	2.940831	2.865724	3.000000	3.000000
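
The same cap can also be written with DataFrame.clip. A sketch comparing the two approaches on seeded random data (the seed and the variable names are chosen for this example only):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)   # fixed seed so the sketch is repeatable
data = pd.DataFrame(rng.standard_normal((1000, 4)))

# The approach used above: overwrite out-of-range cells with +3 or -3.
capped = data.copy()
capped[np.abs(capped) > 3] = np.sign(capped) * 3

# The same cap expressed with clip.
clipped = data.clip(lower=-3, upper=3)
```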

df = pd.DataFrame(np.arange(5 * 4).reshape((5, 4)))
sampler = np.random.permutation(5)  # array giving the new row order
sampler

array([1, 2, 0, 4, 3])

df

	0	1	2	3
0	0	1	2	3
1	4	5	6	7
2	8	9	10	11
3	12	13	14	15
4	16	17	18	19

df.take(sampler)

	0	1	2	3
1	4	5	6	7
2	8	9	10	11
0	0	1	2	3
4	16	17	18	19
3	12	13	14	15

df.sample(n=3)

	0	1	2	3
4	16	17	18	19
1	4	5	6	7
2	8	9	10	11

choices = pd.Series([5, 7, -1, 6, 4])
draws = choices.sample(n=10, replace=True)
draws

1    7
0    5
4    4
0    5
3    6
3    6
1    7
4    4
1    7
1    7
dtype: int64
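
The outputs above change on every run. When a repeatable result is needed, the random source can be seeded; a sketch, assuming a seeded NumPy generator for the permutation and random_state for sample:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(5 * 4).reshape((5, 4)))

# take() with a permutation reorders rows by position, the same as iloc.
order = np.random.default_rng(42).permutation(len(df))
shuffled = df.take(order)

# random_state pins the draw so repeated calls return the same rows.
first = df.sample(n=3, random_state=1)
second = df.sample(n=3, random_state=1)
```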

df = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'],  'data1': range(6)})
df

	key	data1
0	b	0
1	b	1
2	a	2
3	c	3
4	a	4
5	b	5

pd.get_dummies(df['key'])

	a	b	c
0	0	1	0
1	0	1	0
2	1	0	0
3	0	0	1
4	1	0	0
5	0	1	0

dummies = pd.get_dummies(df['key'], prefix='key')
df_with_dummy = df[['data1']].join(dummies)
df_with_dummy

	data1	key_a	key_b	key_c
0	0	0	1	0
1	1	0	1	0
2	2	1	0	0
3	3	0	0	1
4	4	1	0	0
5	5	0	1	0
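
The 0/1 columns shown above come from an older pandas; as far as I know, pandas 2.0 and later return True/False columns by default. Passing dtype=int asks for integers explicitly, which should give the output above on either version:

```python
import pandas as pd

df = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'], 'data1': range(6)})

# dtype=int requests 0/1 integers whatever the library default is.
dummies = pd.get_dummies(df['key'], prefix='key', dtype=int)
df_with_dummy = df[['data1']].join(dummies)
```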

data = {'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com', 'Rob': 'rob@gmail.com', 'Wes': np.nan}
data = pd.Series(data)
data

Dave     dave@google.com
Steve    steve@gmail.com
Rob        rob@gmail.com
Wes                  NaN
dtype: object

data.isnull()

Dave     False
Steve    False
Rob      False
Wes       True
dtype: bool

data.str.contains('gmail')  # data.map can apply a string function to each value but raises on NaN; the .str methods skip NaN

Dave     False
Steve     True
Rob       True
Wes        NaN
dtype: object

import re
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9._-]+)\.([A-Z]{2,4})'
data.str.findall(pattern, flags=re.IGNORECASE)

Dave     [(dave, google, com)]
Steve    [(steve, gmail, com)]
Rob        [(rob, gmail, com)]
Wes                        NaN
dtype: object
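
findall returns a list of tuples per row. When each row has at most one match, str.extract may be more convenient because it puts each capture group in its own column. A sketch with the same data and pattern; the column names are chosen for this example:

```python
import re

import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com',
                  'Rob': 'rob@gmail.com', 'Wes': np.nan})
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9._-]+)\.([A-Z]{2,4})'

# extract returns one column per capture group; a NaN input gives a row of NaN.
parts = data.str.extract(pattern, flags=re.IGNORECASE)
parts.columns = ['user', 'domain', 'suffix']   # names chosen for this sketch
```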

df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'], 'key2' : ['one', 'two', 'one', 'two', 'one'],
                   'data1' : np.random.randn(5), 'data2' : np.random.randn(5)})
df

	key1	key2	data1	data2
0	a	one	-0.083293	0.456279
1	a	two	-0.442362	-0.337304
2	b	one	0.244770	0.943875
3	b	two	0.862879	0.444040
4	a	one	0.858584	0.527193

grouped = df['data1'].groupby(df['key1'])
grouped

<pandas.core.groupby.generic.SeriesGroupBy object at 0x0000024CEBF7B880>

grouped.mean()

key1
a    0.110977
b    0.553824
Name: data1, dtype: float64

means = df['data1'].groupby([df['key1'], df['key2']]).mean()
means

key1  key2
a     one     0.387646
      two    -0.442362
b     one     0.244770
      two     0.862879
Name: data1, dtype: float64

means.unstack()

key2	one	two
key1
a	0.387646	-0.442362
b	0.244770	0.862879

states = np.array(['Ohio', 'California', 'California', 'Ohio', 'Ohio'])
years = np.array([2005, 2005, 2006, 2005, 2006])
df['data1'].groupby([states, years]).mean()

California  2005   -0.442362
            2006    0.244770
Ohio        2005    0.389793
            2006    0.858584
Name: data1, dtype: float64

df.groupby('key1').mean(numeric_only=True)  # numeric_only=True skips the string column key2

	data1	data2
key1
a	0.110977	0.215390
b	0.553824	0.693958

df.groupby(['key1', 'key2']).mean()

		data1	data2
key1	key2
a	one	0.387646	0.491736
a	two	-0.442362	-0.337304
b	one	0.244770	0.943875
b	two	0.862879	0.444040

df.groupby(['key1', 'key2']).size()  # number of rows per group; rows with a missing group key are excluded

key1  key2
a     one     2
      two     1
b     one     1
      two     1
dtype: int64
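
The df above has no missing values, so the exclusion is not visible there. The sketch below uses a small made-up frame with one missing key and one missing value to show how size, count and dropna=False treat them:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'key': ['a', 'a', np.nan, 'b'],    # one missing group key
                   'val': [1.0, np.nan, 3.0, 4.0]})   # one missing value

sizes = df.groupby('key').size()                    # rows per group, NaN key dropped
counts = df.groupby('key')['val'].count()           # non-null values per group
with_nan = df.groupby('key', dropna=False).size()   # keep the NaN key as its own group
```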

for name, group in df.groupby('key1'):
    print(name)
    print(group)

a
  key1 key2     data1     data2
0    a  one -0.083293  0.456279
1    a  two -0.442362 -0.337304
4    a  one  0.858584  0.527193
b
  key1 key2     data1     data2
2    b  one  0.244770  0.943875
3    b  two  0.862879  0.444040

for (k1, k2), group in df.groupby(['key1', 'key2']):
    print((k1, k2))
    print(group)

('a', 'one')
  key1 key2     data1     data2
0    a  one -0.083293  0.456279
4    a  one  0.858584  0.527193
('a', 'two')
  key1 key2     data1     data2
1    a  two -0.442362 -0.337304
('b', 'one')
  key1 key2    data1     data2
2    b  one  0.24477  0.943875
('b', 'two')
  key1 key2     data1    data2
3    b  two  0.862879  0.44404

pieces = dict(list(df.groupby('key1')))
pieces['b']

	key1	key2	data1	data2
2	b	one	0.244770	0.943875
3	b	two	0.862879	0.444040

df.dtypes

key1      object
key2      object
data1    float64
data2    float64
dtype: object

grouped = df.groupby(df.dtypes, axis=1)
for dtype, group in grouped:
    print(dtype)
    print(group)

float64
      data1     data2
0 -0.083293  0.456279
1 -0.442362 -0.337304
2  0.244770  0.943875
3  0.862879  0.444040
4  0.858584  0.527193
object
  key1 key2
0    a  one
1    a  two
2    b  one
3    b  two
4    a  one

people = pd.DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
people.iloc[2:3, [1, 2]] = np.nan  # add a few NA values
people

	a	b	c	d	e
Joe	0.231080	-0.440371	0.409642	-0.114867	0.328406
Steve	-0.775944	-1.258328	-2.723042	0.615950	-1.263696
Wes	1.965413	NaN	NaN	-1.284734	0.204553
Jim	-0.097869	0.182042	0.061867	-0.648661	-0.217448
Travis	-0.006042	-0.612533	0.537186	0.646037	1.339316

mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red', 'f' : 'orange'}
by_column = people.groupby(mapping, axis=1)
by_column.sum()

	blue	red
Joe	0.294776	0.119115
Steve	-2.107092	-3.297967
Wes	-1.284734	2.169965
Jim	-0.586794	-0.133276
Travis	1.183222	0.720741

map_series = pd.Series(mapping)
map_series

a       red
b       red
c      blue
d      blue
e       red
f    orange
dtype: object

people.groupby(map_series, axis=1).count()

	blue	red
Joe	2	3
Steve	2	3
Wes	1	2
Jim	2	3
Travis	2	3

people.groupby(len).sum()

	a	b	c	d	e
3	2.098624	-0.258329	0.471509	-2.048262	0.315510
5	-0.775944	-1.258328	-2.723042	0.615950	-1.263696
6	-0.006042	-0.612533	0.537186	0.646037	1.339316

key_list = ['one', 'one', 'one', 'two', 'two']
people.groupby([len, key_list]).min()

		a	b	c	d	e
3	one	0.231080	-0.440371	0.409642	-1.284734	0.204553
3	two	-0.097869	0.182042	0.061867	-0.648661	-0.217448
5	one	-0.775944	-1.258328	-2.723042	0.615950	-1.263696
6	two	-0.006042	-0.612533	0.537186	0.646037	1.339316

columns = pd.MultiIndex.from_arrays([['US', 'US', 'US', 'JP', 'JP'], [1, 3, 5, 1, 3]], names=['cty', 'tenor'])
hier_df = pd.DataFrame(np.random.randn(4, 5), columns=columns)
hier_df

cty	US			JP
tenor	1	3	5	1	3
0	0.860698	-0.379994	0.644758	-0.231480	0.346634
1	1.237142	0.038387	0.600247	0.431467	0.137392
2	-2.211133	1.528952	0.056726	-0.629724	-0.125510
3	-1.272170	-1.088555	-1.950819	-0.253229	0.910727

hier_df.groupby(level='cty', axis=1).count()

cty	JP	US
0	2	3
1	2	3
2	2	3
3	2	3

df

	key1	key2	data1	data2
0	a	one	-0.083293	0.456279
1	a	two	-0.442362	-0.337304
2	b	one	0.244770	0.943875
3	b	two	0.862879	0.444040
4	a	one	0.858584	0.527193

grouped = df.groupby('key1')
grouped['data1'].quantile(0.9)  # value at the 0.9 quantile; interpolated when it falls between two observations

key1
a    0.670209
b    0.801068
Name: data1, dtype: float64
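
The interpolation can be checked by hand. The sketch below takes the three data1 values of group a as printed above (rounded to six decimals) and applies linear interpolation between the two neighbouring order statistics, which is the default method of quantile:

```python
import numpy as np
import pandas as pd

# The three data1 values of group 'a' as printed above.
a_values = pd.Series([-0.083293, -0.442362, 0.858584])

ordered = np.sort(a_values.to_numpy())
position = 0.9 * (len(ordered) - 1)     # 1.8: between the 2nd and 3rd sorted value
lower = int(np.floor(position))
fraction = position - lower
by_hand = ordered[lower] + fraction * (ordered[lower + 1] - ordered[lower])

from_pandas = a_values.quantile(0.9)
```

Both come out at about 0.670209, the value shown for group a.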

def peak_to_peak(arr):
    return arr.max() - arr.min()

# select the numeric columns first; key2 holds strings and cannot be aggregated this way
grouped[['data1', 'data2']].agg(peak_to_peak)

	data1	data2
key1
a	1.300946	0.864497
b	0.618109	0.499836
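
agg also accepts a list, which may mix built-in function names with custom functions. A sketch with made-up values:

```python
import pandas as pd

df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'data1': [1.0, 4.0, 2.0, 7.0, 3.0]})   # made-up values

def peak_to_peak(arr):
    return arr.max() - arr.min()

# Each entry of the list becomes one column of the result.
summary = df.groupby('key1')['data1'].agg(['min', 'max', peak_to_peak])
```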

grouped.describe()

	data1								data2
	count	mean	std	min	25%	50%	75%	max	count	mean	std	min	25%	50%	75%	max
key1
a	3.0	0.110977	0.671878	-0.442362	-0.262827	-0.083293	0.387646	0.858584	3.0	0.215390	0.479958	-0.337304	0.059488	0.456279	0.491736	0.527193
b	2.0	0.553824	0.437069	0.244770	0.399297	0.553824	0.708352	0.862879	2.0	0.693958	0.353437	0.444040	0.568999	0.693958	0.818916	0.943875

# requires a local copy of examples/tips.csv; without it read_csv raises
# FileNotFoundError: [Errno 2] No such file or directory: 'examples/tips.csv'
tips = pd.read_csv('examples/tips.csv')
tips['tip_pct'] = tips['tip'] / tips['total_bill']
tips[:4]

frame = pd.DataFrame({'data1': np.random.randn(1000), 'data2': np.random.randn(1000)})
quartiles = pd.cut(frame.data1, 4)
quartiles[:10]

0     (-0.436, 1.211]
1     (-0.436, 1.211]
2      (1.211, 2.858]
3     (-0.436, 1.211]
4      (1.211, 2.858]
5    (-2.083, -0.436]
6    (-2.083, -0.436]
7      (1.211, 2.858]
8     (-0.436, 1.211]
9    (-2.083, -0.436]
Name: data1, dtype: category
Categories (4, interval[float64, right]): [(-3.737, -2.083] < (-2.083, -0.436] < (-0.436, 1.211] < (1.211, 2.858]]

def get_stats(group):
    return {'min': group.min(), 'max': group.max(), 'count': group.count(), 'mean': group.mean()}
grouped = frame.data2.groupby(quartiles)
grouped.apply(get_stats).unstack()

	min	max	count	mean
data1
(-3.737, -2.083]	-1.417666	1.053207	15.0	-0.021193
(-2.083, -0.436]	-2.815877	2.712397	296.0	-0.097675
(-0.436, 1.211]	-2.950480	3.093977	568.0	0.006707
(1.211, 2.858]	-2.621023	2.433423	121.0	-0.102975

grouping = pd.qcut(frame.data1, 10, labels=False)  # 10 equal-sized buckets; labels=False returns the bucket numbers
grouped = frame.data2.groupby(grouping)
grouped.apply(get_stats).unstack()

	min	max	count	mean
data1
0	-1.979885	2.546603	100.0	0.015369
1	-2.815877	2.560098	100.0	-0.189069
2	-2.367227	2.290479	100.0	-0.097540
3	-2.057884	3.093977	100.0	0.000976
4	-2.314728	2.157829	100.0	0.125600
5	-2.944465	1.991280	100.0	-0.092507
6	-2.503720	2.415097	100.0	-0.045530
7	-2.950480	2.553021	100.0	0.057286
8	-2.688502	2.356049	100.0	-0.059030
9	-2.339079	2.433423	100.0	-0.094356

# requires the tips DataFrame loaded from examples/tips.csv; without it this raises NameError
tips.pivot_table(index=['day', 'smoker'])  # by default the group means are computed

from io import StringIO
data = """\
Sample  Nationality  Handedness
1   USA  Right-handed
2   Japan    Left-handed
3   USA  Right-handed
4   Japan    Right-handed
5   Japan    Left-handed
6   Japan    Right-handed
7   USA  Right-handed
8   USA  Left-handed
9   Japan    Right-handed
10  USA  Right-handed"""
data = pd.read_table(StringIO(data), sep=r'\s+')  # raw string avoids the invalid-escape warning for \s

pd.crosstab(data.Nationality, data.Handedness, margins=True)

Handedness	Left-handed	Right-handed	All
Nationality
Japan	2	3	5
USA	1	4	5
All	3	7	10
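
A crosstab is a table of group counts, so the same numbers can be obtained with groupby. A sketch that rebuilds the ten samples above as lists and compares the two:

```python
import pandas as pd

data = pd.DataFrame({
    'Nationality': ['USA', 'Japan', 'USA', 'Japan', 'Japan',
                    'Japan', 'USA', 'USA', 'Japan', 'USA'],
    'Handedness': ['Right-handed', 'Left-handed', 'Right-handed', 'Right-handed',
                   'Left-handed', 'Right-handed', 'Right-handed', 'Left-handed',
                   'Right-handed', 'Right-handed'],
})

table = pd.crosstab(data.Nationality, data.Handedness)

# Count rows per (Nationality, Handedness) pair, then pivot the inner level to columns.
by_groupby = data.groupby(['Nationality', 'Handedness']).size().unstack(fill_value=0)
```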

# requires the tips DataFrame loaded from examples/tips.csv; without it this raises NameError
pd.crosstab([tips.time, tips.day], tips.smoker, margins=True)

J&Ocean

https://jiang-wu-19.github.io/2023/07/06/Python%E6%95%B0%E6%8D%AE%E5%A4%84%E7%90%86/

Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit J&Ocean as the source when reposting!

