Python Data Processing


_hello="helloworld"
score=0
y=20
y=True
print(_hello)
helloworld
print(score)
0
print(y)
True

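Variables

Python is a dynamically typed language: variables carry no declared type, so a name can later be bound to a value of a different type (above, y is first 20 and then True).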

a=b=c=10
  • Python supports chained assignment
print(a)
10
#coding=utf-8
#file:chapter4/4.4/hello.py

_hello="helloworld"
score_for_student=10.0 # no error is raised
y=20

name1="Tom";name2="Tony"
# chained assignment
a=b=c=10

if y>10:
    print(y)
    print(score_for_student)
else:
    print(y*10)
print(_hello)
20
10.0
helloworld
# coding=utf-8
import module1
from module1 import z

y=20

print(y)
print(module1.y)
print(z)
20
True
10.0
import com.pkg2.hello as module1
from com.pkg2.hello import z as x
print(x)
y=20
print(y)
print(module1.y)
print(z)
10.1
20
True
10.0

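Coding Conventions

Naming Conventions

  • Package names: all lowercase letters; parts may be separated by dots, and underscores are not recommended. Because a package acts as a namespace, its name should be unique; the reversed domain name of a company or organization is recommended, e.g. com.apple.quicktime.v2.
  • Module names: all lowercase letters; multiple words may be separated by underscores, e.g. dummy_threading.
  • Class names: upper camel case (CapWords), e.g. SplitViewController.
  • Exception names: exceptions are classes, so they follow the class naming rule but should end with the suffix Error, e.g. FileNotFoundError.
  • Variable names: all lowercase letters; multiple words may be separated by underscores. A variable used only inside a module or function may start with a single underscore; a variable private to a class may start with a double underscore. Do not create names that both begin and end with double underscores; those are reserved by Python. Also avoid lowercase l, uppercase O, and uppercase I as variable names.
  • Function and method names: same rule as variable names, e.g. balance_account, push_cm_exit.
  • Constant names: all uppercase letters; multiple words may be separated by underscores, e.g. YEAR and WEEK_OF_MONTH.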
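
The naming rules above can be seen together in one small module. This is only an illustrative sketch: every name in it (MAX_RETRY_COUNT, AccountNotFoundError, SavingsAccount, and the sample values) is hypothetical and does not come from the original notes.

```python
# Constant: all upper case, words separated by underscores.
MAX_RETRY_COUNT = 3


class AccountNotFoundError(Exception):
    """Exception class: CapWords name with an Error suffix."""


class SavingsAccount:
    """Class: CapWords (upper camel case) name."""

    def __init__(self, owner, amount):
        self.owner = owner      # public instance variable, lower case
        self._amount = amount   # single leading underscore: internal use

    def balance_account(self):
        """Method: lower case with underscores, like a variable name."""
        return self._amount


account = SavingsAccount("Tony", 100.0)
print(account.balance_account())
```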

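Comment Conventions

Single-line comments, multi-line comments, and documentation comments

File Comments

A file comment is added at the top of every file and is written as a multi-line comment. It usually includes the copyright, file name, containing module, author, version history, and the file's content and purpose.

#
# Copyright 2015 北京智捷东方科技有限公司
# For license information see LICENSE.txt
# Description:
#   Implements basic date functionality
# History:
#   2015-7-22: created by 关东升
#   2015-8-20: added the socket library
#   2015-8-22: added the math library
#

The comment above only provides the copyright, the file's content, and the version history; what a file comment contains should be adapted to the actual situation.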

Documentation Comments

Code Comments

Using TODO Comments
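
A minimal sketch of the three kinds of comment named in the headings above, assuming the usual Python practice: a documentation comment is a docstring, a code comment starts with #, and a TODO comment marks unfinished work. The TODO text is made up for illustration.

```python
def rectangle_area(width, height):
    """Return the area of a rectangle.

    Documentation comment (docstring): a triple-quoted string placed as
    the first statement of a module, class, or function.
    """
    # TODO: reject negative width or height
    return width * height  # code comment: explains a single line


print(rectangle_area.__doc__.splitlines()[0])
print(rectangle_area(3, 4))
```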

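Import Conventions

Import statements should be grouped from general to specific, in this order: standard library → third-party libraries → your own modules. Separate the groups with one blank line, and sort the modules within each group alphabetically.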

import io
import os
import pkgutil
import platform
import re
import sys
import time
from html import unescape

from com.pkg1 import example

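Code Layout Conventions

Blank Lines

  • Keep two blank lines before and after the block of import statements
  • Keep two blank lines before a function definition
  • Keep two blank lines before a class definition
  • Keep one blank line before a method definition
  • Keep one blank line between two logical blocks of code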

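Spaces

  • Put one space on each side of the assignment operator "="
  • Separate every binary operator from its operands with spaces
  • Unary operators, that is arithmetic negation "-" and bitwise inversion "~", are written without a space before their operand
  • Do not put spaces just inside brackets; brackets in Python are parentheses "()", square brackets "[]", and braces "{}"
  • Do not put a space before a comma, semicolon, or colon; put one space after it, unless it is already at the end of the line
  • Do not put a space before the opening bracket of an argument list, index, or slice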
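
The spacing rules above might look like this in practice. The variable names are hypothetical and serve only as illustration.

```python
numbers = [1, 2, 3]                  # spaces around "=", one space after each comma
total = numbers[0] + numbers[1] * 2  # binary operators separated from operands
negated = -total                     # no space after a unary operator
inverted = ~total
first_two = numbers[:2]              # no space before "[" or just inside it
print(total, negated, inverted, first_two)
```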

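Indentation

Four spaces are the usual unit for one level of indentation. Programmers can indent with tabs during development, and by default one tab equals eight spaces, but different IDEs map a tab to different numbers of spaces, so do not indent with tabs.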

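Line Breaks

A line of code holds at most 79 characters; a line in a documentation comment or multi-line comment holds at most 72 characters, although a comment that contains a URL is exempt from this limit. A longer line must be broken, following these general rules.

  • Break after a comma
  • Break before an operator
  • Avoid the line-continuation character "\" where possible; when brackets are present (braces, square brackets, or parentheses), break inside the brackets so that no continuation character is needed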
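
A short sketch of the line-breaking rules above, with made-up names and values: the expression is broken inside parentheses and before an operator, and the list is broken after commas, so no "\" is needed.

```python
unit_price = 12.5
quantity = 8
shipping_fee = 6.0
discount = 10.0

# Break inside parentheses and before a binary operator.
total_cost = (unit_price * quantity
              + shipping_fee
              - discount)

# Break after a comma inside square brackets.
weekdays = [
    "Monday", "Tuesday", "Wednesday",
    "Thursday", "Friday",
]
print(total_cost, len(weekdays))
```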

Data Types

Numeric Types

Integer Type

28
28
0b11100
28
0o34
28
0x1c
28

Floating-Point Type

1.0
1.0
0.0
0.0
3.36e2
336.0
1.56e-2
0.0156

Complex Type

1+2j
(1+2j)
(1+2j)+(1+2j)
(2+4j)

Boolean Type

bool(0)
False
bool(2)
True
bool(1)
True
bool('')
False
bool(' ')
True
bool([])
False
bool({})
False

Conversion Between Numeric Types

Implicit Type Conversion

a=1+True
print(a)
2
a=1.0+1
type(a)
float
print(a)
2.0
a=1.0+True
print(a)
2.0
a=1.0+1+True
print(a)
3.0
a=1.0+1+False
print(a)
2.0

Explicit Type Conversion

int(False)
0
int(True)
1
int(19.6)
19
float(5)
5.0
float(False)
0.0
float(True)
1.0

String Type

String Literals

s = 'Hello World'
print(s)
Hello World
s="Hello World"
print(s)
Hello World
s='\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064'
print(s)
Hello World
s="\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064"
print(s)
Hello World
  • Escape sequences
s='Hello\n World'
print(s)
Hello
 World
s='Hello\t World'
print(s)
Hello	 World
s='Hello \'World'
print(s)
Hello 'World
s="hello'world"
print(s)
hello'world
s='hello"world'
print(s)
hello"world
s='hello\\world'
print(s)
hello\world
s='hello\u005c world'
print(s)
hello\ world
  • Raw strings
s='hello\tworld'
print(s)
hello	world
s=r'hello\tworld'
print(s)
hello\tworld
  • Long (triple-quoted) strings
s='''hello
world'''
print(s)
hello
world
s='''hello
\tworld'''
print(s)
hello
    world

String Formatting

name='Mary'
age=18
s='她的年龄是{0}岁。'.format(age)
print(s)
她的年龄是18岁。
s='{0}芳龄是{1}岁'.format(name,age)
print(s)
Mary芳龄是18岁
s='{1}芳龄是{0}岁'.format(age,name)
print(s)
Mary芳龄是18岁
s='{n}芳龄是{a}岁'.format(n=name,a=age)
print(s)
Mary芳龄是18岁
name='Mary'
age=18
money=1234.5678
"{0}芳龄是{1:d}岁。".format(name,age)
'Mary芳龄是18岁。'
"{1}芳龄是{0:5d}岁。".format(age,name)
'Mary芳龄是   18岁。'
"{0}今天收入是{1:f}元".format(name,money)
'Mary今天收入是1234.567800元'
"{0}今天收入是{1:.2f}".format(name,money)
'Mary今天收入是1234.57'
"{0}今天收入是{1:10.2f}".format(name,money)
'Mary今天收入是   1234.57'
"{0}今天收入是{1:g}".format(name,money)
'Mary今天收入是1234.57'
"{0}今天收入是{1:G}".format(name,money)
'Mary今天收入是1234.57'
"{0}今天收入是{1:e}".format(name,money)
'Mary今天收入是1.234568e+03'
"{0}今天收入是{1:E}".format(name,money)
'Mary今天收入是1.234568E+03'

Searching Strings

source_str="there is a string accessing example"
len(source_str)
35
source_str[16]
'g'
source_str.find('r')
3
source_str.rfind('r')
13
source_str.find('ing')
14
source_str.rfind('ing')
24
source_str.find('e',15)
21
source_str.find('ing',5)
14
source_str.rfind('ing',5)
24
source_str.find('ing',18,28)
24
source_str.find('ingg',5)
-1
Converting Between Strings and Numbers
int('9')
9
int('9.6')
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [24], in <cell line: 1>()
----> 1 int('9.6')


ValueError: invalid literal for int() with base 10: '9.6'
float('9.6')
9.6
int('AB')
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [26], in <cell line: 1>()
----> 1 int('AB')


ValueError: invalid literal for int() with base 10: 'AB'
str(3.24)
'3.24'
str(True)
'True'
str([])
'[]'
str([1,2,3])
'[1, 2, 3]'
str(34)
'34'
'{0:2f}'.format(3.24)
'3.240000'
'{:.1f}'.format(3.24)
'3.2'
'{:10.1f}'.format(3.24)
'       3.2'

Operators

Arithmetic Operators

Unary Operators

a=12
-a
-12

Binary Operators

1+2
3
2-1
1
2*3
6
3/2
1.5
3%2
1
3//2
1
-3//2
-2
10**2
100
10.22+10
20.22
10.0+True+2
13.0
'hello'+'world'
'helloworld'
'hello'+2
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

Input In [47], in <cell line: 1>()
----> 1 'hello'+2


TypeError: can only concatenate str (not "int") to str
'hello'*2
'hellohello'

Relational Operators

a=1
b=2
a>b
False
a<b
True
a>=b
False
a<=b
True
1.0!=1
False
a='hello'
b='hello'
a==b
True
a='World'
a>b
False
a<b
True
a=[]
b=[1,2]
a==b
False
a<b
True
a=[1,2]
a==b
True

Logical Operators

i=0
a=10
b=9

if a>b or i==1:
    print("或运算为真")
else:
    print("或运算为假")
    
if a<b and i==1:
    print("与运算为真")
else:
    print("与运算为假")
    

def f1():
    return a>b

def f2():
    print('--f2--')
    return a==b

print(f1() or f2())
或运算为真
与运算为假
True

Bitwise Operators

a=0b10110010
b=0b01011110
print("a|b={0}".format(a|b))
print("a&b={0}".format(a&b))
print("a^b={0}".format(a^b))
print("~a={0}".format(~a))
print("a>>2={0}".format(a>>2))
print("a<<2={0}".format(a<<2))
c=-0b1100
print("c>>2={0}".format(c>>2))
print("c<<2={0}".format(c<<2))
a|b=254
a&b=18
a^b=236
~a=-179
a>>2=44
a<<2=712
c>>2=-3
c<<2=-48

Assignment Operators

a=1
b=2

a+=b
print(a)

a+=b+3
print(a)

a-=b
print(a)

a*=b
print(a)

a/=b
print(a)

a%=b
print(a)

a=0b10110010
b=0b01011110

a|=b
print(a)

a^=b
print(a)
3
8
6
12
6.0
0.0
254
160

Other Operators

Identity Test Operators

Membership Test Operators

class Person:
    def __init__(self,name,age):
        self.name=name
        self.age=age
        
p1=Person('Tony',18)
p2=Person('Tony',18)

print(p1==p2)
print(p1 is p2)

print(p1!=p2)
print(p1 is not p2)
False
False
True
True
class Person:
    def __init__(self,name,age):
        self.name=name
        self.age=age
    
    def __eq__(self,other):
        if self.name==other.name and self.age==other.age:
            return True
        else:
            return False
  
        
p1=Person('Tony',18)
p2=Person('Tony',18)

print(p1==p2)
print(p1 is p2)

print(p1!=p2)
print(p1 is not p2)
    
True
False
False
True
string_a='hello'
print('e' in string_a)
print('ell' not in string_a)

list_a=[1,2]
print(2 in list_a)
print(1 not in list_a)
True
False
True
False

Control Statements

Branch Statements

The if Structure

score=5

if score>=85:
    print('perfect')
if score<60:
    print('hard')
if score>=60 and score<85:
    print('justsoso')
hard

The if-else Structure

score=75

if score>=60:
    print('justsoso')
    if score>=90:
        print('perfect')
else:
    print("不及格")
justsoso

The elif Structure

score=80

if score>=90:
    grade='A'
elif score>=80:
    grade='B'
elif score>=70:
    grade='C'
elif score>=60:
    grade='D'
else:
    grade='F'
    
print(grade)
B

Conditional Expressions

score=85
result='justsoso' if score>=60 else 'hard'
print(result)
justsoso

Loop Statements

The while Statement

i=0

while i*i<100_000:
    i+=1

print(i)
print(i*i)
317
100489

The for Statement

print('----范围----')
for num in range(1,10):
    print("{0}*{0}={1}".format(num,num*num))

print('----字符串----')
for item in "hello":
    print(item)
    
print('----整数列表----')
numbers=[43,32,53,54,75,7,10]
for item in numbers:
    print(item)
----范围----
1*1=1
2*2=4
3*3=9
4*4=16
5*5=25
6*6=36
7*7=49
8*8=64
9*9=81
----字符串----
h
e
l
l
o
----整数列表----
43
32
53
54
75
7
10

Jump Statements

The break Statement

for item in range(10):
    if item==3:
        break
    print(item)
0
1
2

The continue Statement

for item in range(10):
    if item==3:
        continue
    print(item)
0
1
2
4
5
6
7
8
9

The else Clause in while and for Loops

i=0

while i*i<10:
    i+=1
    print("{0}*{0}={1}".format(i,i*i))
else:
    print("whileover")

print('----------')

for item in range(10):
    if item==3:
        break
    print(item)
else:
    print('forover')
1*1=1
2*2=4
3*3=9
4*4=16
whileover
----------
0
1
2

Using Ranges

Syntax of the range() function:
range([start,]stop[,step])

for item in range(1,10,2):
    print(item)
print('------------')

for item in range(1,-10,-3):
    print(item)
1
3
5
7
9
------------
1
-2
-5
-8

Data Structures

Tuples

Sequences

Indexing
a='hello'
a[0]
'h'
a[1]
'e'
a[2]
'l'
a[3]
'l'
a[4]
'o'
a[5]
---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

Input In [7], in <cell line: 1>()
----> 1 a[5]


IndexError: string index out of range
max(a)
'o'
min(a)
'e'
len(a)
5
The + and * Operators on Sequences
a*3
'hellohellohello'
print(a)
hello
a+=' '
a+='world'
print(a)
hello world
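Slicing Sequences
  • [start:end]: start is the start index and end is the end index
  • [start:end:step]: start is the start index, end is the end index, and step is the step size, which may be a positive or negative integer
    The slice actually taken is the half-open interval [start,end)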
a[1:3]
'el'
a[:3]
'hel'
a[0:3]
'hel'
a[0:]
'hello world'
a[0:5]
'hello'
a[:]
'hello world'
a[1:-1]
'ello worl'
a[1:5]
'ello'
a[1:5:2]
'el'
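
The slicing examples above use only positive steps. As a small supplementary sketch on the same string, a negative step walks the sequence backwards, and the end index is still excluded.

```python
a = 'hello world'
print(a[::-1])    # the whole string in reverse order
print(a[4:0:-1])  # from index 4 down to index 1; index 0 is excluded
print(a[::2])     # every second character, starting at index 0
```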

Creating Tuples

21,32,43,45
  Input In [26]
    21,32,43,45
      ^
SyntaxError: invalid character ',' (U+FF0C)
21,32,43,45
(21, 32, 43, 45)
(21,32,43,45)
(21, 32, 43, 45)
print(a)
hello world
a=(21,32,43,45)
print(a)
(21, 32, 43, 45)
('hello','world')
('hello', 'world')
('hello','world',1,2,3)
('hello', 'world', 1, 2, 3)
tuple([21,32,43,45])
(21, 32, 43, 45)
a=(21)
type(a)
int
a=(21,)
type(a)
tuple
a=()
type(a)
tuple

Accessing Tuples

a=('hello','world',1,2,3)
a[1]
'world'
a[1:3]
('world', 1)
a[2:]
(1, 2, 3)
a[:2]
('hello', 'world')
str1,str2,n1,n2,n3=a
str1
'hello'
str2
'world'
n1
1
n2
2
n3
3
str1,str2,*n=a
str1
'hello'
str2
'world'
n
[1, 2, 3]
str1,_,n1,n2,_=a
str1
'hello'
n1
1
n2
2

Iterating Over Tuples

a=(21,32,43,45)

for item in a:
    print(item)

print('---------------------')
for i,item in enumerate(a):
    print('{0}-{1}'.format(i,item))
21
32
43
45
---------------------
0-21
1-32
2-43
3-45

Lists

Creating Lists

[20,10,50,40,30]
[20, 10, 50, 40, 30]
[]
[]
['hello','world',1,2,3]
['hello', 'world', 1, 2, 3]
a=[10]
type(a)
list
a=[10,]
type(a)
list
list((20,10,50,40,30))
[20, 10, 50, 40, 30]

Appending Elements

list.append(x)
list.extend(t)
student_list=['张三','李四','王五']
student_list.append('董六')
student_list
['张三', '李四', '王五', '董六']
student_list+=['刘备','关羽']
student_list
['张三', '李四', '王五', '董六', '刘备', '关羽']
student_list.extend(['张飞','赵云'])
student_list
['张三', '李四', '王五', '董六', '刘备', '关羽', '张飞', '赵云']

Inserting Elements

list.insert(i,x)
student_list=['zhangsan','lisi','wangwu']
student_list.insert(2,'liubei')
student_list
['zhangsan', 'lisi', 'liubei', 'wangwu']

Replacing Elements

student_list=['zhangsan','lisi','wangwu']
student_list[0]='zhugeliang'
student_list
['zhugeliang', 'lisi', 'wangwu']

Removing Elements

The remove() Method

If several matching elements are found, only the first one is removed.

student_list=['zhangsan','lisi','wangwu','wangwu']
student_list.remove('wangwu')
student_list
['zhangsan', 'lisi', 'wangwu']
student_list.remove('wangwu')
student_list
['zhangsan', 'lisi']
The pop() Method
item=list.pop([i])

i is the index of the element to remove; if it is omitted, the last element is removed and returned.

student_list=['zhangsan','lisi','wangwu']
student_list.pop()
'wangwu'
student_list
['zhangsan', 'lisi']
student_list.pop(0)
'zhangsan'
student_list
['lisi']

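Other Common Methods

  • reverse(): reverses the list in place
  • copy(): returns a copy of the list
  • clear(): removes all elements from the list
  • index(x[,i[,j]]): returns the index of the first occurrence of x; i is the start index of the search and j is the end index; inherited from sequences
  • count(x): returns the number of occurrences of x; inherited from sequences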
a=[21,32,43,45]
a.reverse()
a
[45, 43, 32, 21]
b=a.copy()
b
[45, 43, 32, 21]
a.clear()
a
[]
b
[45, 43, 32, 21]
a=[45,43,32,21,32]
a.count(32)
2
student_list=['zhangsan','lisi','wangwu']
student_list.index('wangwu')
2
student_tuple=('zhangsan','lisi','wangwu')
student_tuple.index('wangwu')
2
student_tuple.index('lisi',1,2)
1

List Comprehensions

n_list=[]
for x in range(10):
    if x%2==0:
        n_list.append(x**2)
print(n_list)
[0, 4, 16, 36, 64]
n_list=[x**2 for x in range(10) if x%2==0]
n_list
[0, 4, 16, 36, 64]
n_list=[x for x in range(100) if x%2==0 if x%5==0]
n_list
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]

Sets

Creating Mutable Sets

a={'zhangsan','lisi','wangwu'}
a
{'lisi', 'wangwu', 'zhangsan'}
a={'zhangsan','lisi','wangwu','wangwu'}
len(a)
3
a
{'lisi', 'wangwu', 'zhangsan'}
set((20,10,50,40,30))
{10, 20, 30, 40, 50}
b={}
type(b)
dict
b=set()
type(b)
set

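Modifying Mutable Sets

  • add(elem): adds an element; adding one that already exists has no effect
  • remove(elem): removes an element; raises KeyError if it does not exist
  • discard(elem): removes an element; raises nothing if it does not exist
  • pop(): removes an arbitrary element from the set and returns it
  • clear(): removes all elements from the set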
student_set={'zhangsan','lisi','wangwu'}
student_set.add('dongliu')
student_set
{'dongliu', 'lisi', 'wangwu', 'zhangsan'}
student_set.remove('lisi')
student_set
{'dongliu', 'wangwu', 'zhangsan'}
student_set.remove('lisi')
---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

Input In [129], in <cell line: 1>()
----> 1 student_set.remove('lisi')


KeyError: 'lisi'
student_set.discard('lisi')
student_set
{'dongliu', 'wangwu', 'zhangsan'}
student_set.discard('wangwu')
student_set
{'dongliu', 'zhangsan'}
student_set.pop()
'dongliu'
student_set
{'zhangsan'}
student_set.clear()
student_set
set()

Iterating Over Sets

student_set={'zhangsan','lisi','wangwu'}

for item in student_set:
    print(item)
    
print('----------')
for i,item in enumerate(student_set):
    print('{0}-{1}'.format(i,item))
lisi
wangwu
zhangsan
----------
0-lisi
1-wangwu
2-zhangsan

Immutable Sets

student_set=frozenset({'zhangsan','lisi','wangwu'})
student_set
frozenset({'lisi', 'wangwu', 'zhangsan'})
type(student_set)
frozenset
student_set.add('dongliu')
---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [142], in <cell line: 1>()
----> 1 student_set.add('dongliu')


AttributeError: 'frozenset' object has no attribute 'add'
a=(21,32,43,45)
seta=frozenset(a)
seta
frozenset({21, 32, 43, 45})

Set Comprehensions

n_set={x for x in range(100) if x%2==0 if x%5==0}
print(n_set)
{0, 70, 40, 10, 80, 50, 20, 90, 60, 30}
input_list=[2,3,2,4,5,6,6,6]
n_list=[x**2 for x in input_list]
n_list
[4, 9, 4, 16, 25, 36, 36, 36]
n_set={x**2 for x in input_list}
n_set
{4, 9, 16, 25, 36}

Dictionaries

Creating Dictionaries

dict1={102:'zhangsan',105:'lisi',109:'wangwu'}
len(dict1)
3
dict1
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
type(dict1)
dict
dict1={}
dict1
{}
dict({102:'zhangsan',105:'lisi',109:'wangwu'})
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
dict(((102,'zhangsan'),(105,'lisi'),(109,'wangwu')))
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
dict([(102,'zhangsan'),(105,'lisi'),(109,'wangwu')])
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
t1=(102,'zhangsan')
t2=(105,'lisi')
t3=(109,'wangwu')
t=(t1,t2,t3)
dict(t)
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
list1=[t1,t2,t3]
dict(list1)
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
dict(zip([102,105,109],['zhangsan','lisi','wangwu']))
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}

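Accessing Dictionaries

  • get(key[,default]): returns the value for key; if the key does not exist, returns default (None when omitted)
  • items(): returns a view of all key-value pairs
  • keys(): returns a view of the keys
  • values(): returns a view of the values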
dict1={102:'zhangsan',105:'lisi',109:'wangwu'}
dict1.get(105)
'lisi'
dict1.get(101)
dict1.get(101,'dongliu')
'dongliu'
dict1.items()
dict_items([(102, 'zhangsan'), (105, 'lisi'), (109, 'wangwu')])
dict1.keys()
dict_keys([102, 105, 109])
dict1.values()
dict_values(['zhangsan', 'lisi', 'wangwu'])
student_dict={102:'zhangsan',105:'lisi',109:'wangwu'}
102 in student_dict
True
'lisi' in student_dict
False
print('---bianlijian---')
for student_id in student_dict.keys():
    print('xuehao:'+str(student_id))
    
print('---bianlizhi---')
for student_name in student_dict.values():
    print('xuesheng:'+student_name)
    
print('---bianlijian:zhi---')
for student_id,student_name in student_dict.items():
    print('xuehao:{0}-xuesheng:{1}'.format(student_id,student_name))
---bianlijian---
xuehao:102
xuehao:105
xuehao:109
---bianlizhi---
xuesheng:zhangsan
xuesheng:lisi
xuesheng:wangwu
---bianlijian:zhi---
xuehao:102-xuesheng:zhangsan
xuehao:105-xuesheng:lisi
xuehao:109-xuesheng:wangwu

Dictionary Comprehensions

input_dict={'one':1,'two':2,'three':3,'four':4}

output_dict={k:v for k,v in input_dict.items() if v%2==0}
output_dict
{'two': 2, 'four': 4}
keys=[k for k,v in input_dict.items() if v%2==0]
keys
['two', 'four']

Functional Programming

Defining Functions

def ---:
    ---
    return ---
def rectangle_area(width,height):
    area=width*height
    return area

r_area=rectangle_area(320,420)
print("320*420的矩形面积{0}".format(r_area))
320*420的矩形面积134400

Function Parameters

Calling Functions with Keyword Arguments

def print_area(width,height):
    area=width*height
    print("{0}*{1}矩形的面积是:{2}".format(width,height,area))
    
print_area(320,420)
print_area(width=320,height=420)
print_area(320,height=420)
print_area(height=420,width=320)
320*420矩形的面积是:134400
320*420矩形的面积是:134400
320*420矩形的面积是:134400
320*420矩形的面积是:134400

Default Parameter Values

def make_coffee(name="Cappuccino"):
    return "制作一杯{0}".format(name)

coffee1=make_coffee("Latte")
coffee2=make_coffee()

print(coffee1)
print(coffee2)
制作一杯Latte
制作一杯Cappuccino

Variable-Length Parameters

The * Variable-Length Parameter
def sum(*numbers,multiple=1):
    total=0
    for number in numbers:
        total+=number
    return total*multiple

print(sum(100.0,20.0,30.0))
print(sum(80,30))
print(sum(30,80,multiple=2))
double_tuple=(50.0,60.0,0.0)
print(sum(30,80,*double_tuple))
150.0
110
220
220.0
The ** Variable-Length Parameter
def show(sep=':', **info):
    print('----info----')
    for key, value in info.items():
        print('{0} {2} {1}'.format(key, value, sep))


show('->', name='tony', age=18, sex = True)
show(student_name='tony',student_no='1000',sep='=')
stu_dict={'name':'tony','age':18}
show(**stu_dict,sex=True,sep='=')
----info----
name -> tony
age -> 18
sex -> True
----info----
student_name = tony
student_no = 1000
----info----
name = tony
age = 18
sex = True

Function Return Values

Functions Without a Return Value

def show(sep=':', **info):
    print('----info----')
    for key, value in info.items():
        print('{0} {2} {1}'.format(key, value, sep))
    return

result=show('->', name='tony', age=18, sex = True)
print(result)

def sum(*numbers,multiple=1):
    total=0
    for number in numbers:
        total+=number
    return total*multiple

print(sum(100.0,20.0,30.0))
print(sum(80,30))
----info----
name -> tony
age -> 18
sex -> True
None
150.0
110

Functions with Multiple Return Values

def position(dt,speed):
    posx=speed[0]*dt
    posy=speed[1]*dt
    return(posx,posy)

move=position(60,(10,-5))
print("物体位移:({0},{1})".format(move[0],move[1]))
物体位移:(600,-300)

Variable Scope in Functions

x=20
def print_value():
    print("函数中x={0}".format(x))
    
print_value()
print("全局变量={0}".format(x))
函数中x=20
全局变量=20
x=20
def print_value():
    x=10
    print("函数中x={0}".format(x))
    
print_value()
print("全局变量={0}".format(x))
函数中x=10
全局变量=20
x=20
def print_value():
    global x
    x=10
    print("函数中x={0}".format(x))
    
print_value()
print("全局变量={0}".format(x))
函数中x=10
全局变量=10

Generators

def square(num):
    n_list=[]
    
    for i in range(1,num+1):
        n_list.append(i*i)
        
    return n_list

for i in square(5):
    print(i,end=' ')
1 4 9 16 25 
def square(num):
    for i in range(1,num+1):
        yield i*i

for i in square(5):
    print(i,end=' ')
1 4 9 16 25 
def square(num):
    for i in range(1,num+1):
        yield i*i
n_seq=square(5)
n_seq.__next__()
1
n_seq.__next__()
4
n_seq.__next__()
9
n_seq.__next__()
16
n_seq.__next__()
25
n_seq.__next__()
---------------------------------------------------------------------------

StopIteration                             Traceback (most recent call last)

Input In [14], in <cell line: 1>()
----> 1 n_seq.__next__()


StopIteration: 
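
Calling __next__() directly works, as shown above, but the built-in next() function is the more usual spelling. A minimal sketch using the same square generator; passing a default value as the second argument is one way to avoid the StopIteration error once the generator is exhausted.

```python
def square(num):
    for i in range(1, num + 1):
        yield i * i


n_seq = square(2)
print(next(n_seq))        # same effect as n_seq.__next__()
print(next(n_seq))
print(next(n_seq, None))  # exhausted: the default is returned, nothing is raised
```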

Nested Functions

def calculate(n1,n2,opr):
    multiple=2
    
    def add(a,b):
        return (a+b)*multiple
    
    def sub(a,b):
        return (a-b)*multiple
    
    if opr=='+':
        return add(n1,n2)
    else:
        return sub(n1,n2)
    
print(calculate(10,5,'+'))
30

Functional Programming Basics

Function Types

def calculate_fun(opr):
    def add(a,b):
        return a+b
    
    def sub(a,b):
        return a-b
    
    if opr=='+':
        return add
    else:
        return sub

f1=calculate_fun('+')
f2=calculate_fun('-')

print(type(f1))

print('10+5={0}'.format(f1(10,5)))
print('10-5={0}'.format(f2(10,5)))
<class 'function'>
10+5=15
10-5=5

Lambda Expressions

def calculate_fun(opr):
    if opr=='+':
        return lambda a,b:(a+b)
    else:
        return lambda a,b:(a-b)

f1=calculate_fun('+')
f2=calculate_fun('-')

print(type(f1))

print('10+5={0}'.format(f1(10,5)))
print('10-5={0}'.format(f2(10,5)))
<class 'function'>
10+5=15
10-5=5

Three Basic Functions

filter()
users=['tony','tom','ben','alex']
users_filter=filter(lambda u:u.startswith('t'),users)
print(list(users_filter))
['tony', 'tom']
number_list=range(1,11)
number_filter=filter(lambda it:it%2==0,number_list)
print(list(number_filter))
[2, 4, 6, 8, 10]
map()
users=['tony','tom','ben','alex']
users_map=map(lambda u:u.lower(),users)
print(list(users_map))
['tony', 'tom', 'ben', 'alex']
users=['tony','tom','ben','alex']
users_filter=filter(lambda u:u.startswith('t'),users)
users_map=map(lambda u:u.lower(),filter(lambda u:u.startswith('t'),users))
print(list(users_map))
['tony', 'tom']
reduce()
from functools import reduce
a={1,2,3,4}
a_reduce=reduce(lambda acc,i:acc+i,a)
print(a_reduce)
10

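Object-Oriented Programming

Overview of Object-Oriented Programming (OOP)

Three Basic Features of Object Orientation

Encapsulation

Inheritance

Polymorphism

Classes and Objects

Defining Classes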

class ClassName[(ParentClass)]:
    class body
class Animal(object):
    
    pass

Creating and Using Objects

animal=Animal()
print(animal)
<__main__.Animal object at 0x00000222D7FA4160>

Instance Variables

class Animal(object):
    def __init__(self,age,sex,weight):
        self.age=age
        self.sex=sex
        self.weight=weight

animal=Animal(2,1,10.0)

print('age:{0}'.format(animal.age))
print('sex:{0}'.format('female' if animal.sex==0 else 'male'))
print('weight:{0}'.format(animal.weight))
age:2
sex:male
weight:10.0

Class Variables

class Account:
    interest_rate=0.0668
    
    def __init__(self,owner,amount):
        self.owner=owner
        self.amount=amount
        
account=Account('tony',1_800_000.0)
print('account:{0}'.format(account.owner))
print('amount:{0}'.format(account.amount))
print('interest_rate:{0}'.format(account.interest_rate))
account:tony
amount:1800000.0
interest_rate:0.0668

Constructors

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.weight=weight
        
a1=Animal(2,1,10.0)
a2=Animal(1,weight=5.0)
a3=Animal(1,sex=0)
print('age:{0}'.format(a1.age))
print('sex:{0}'.format('female' if a3.sex==0 else 'male'))
print('weight:{0}'.format(a2.weight))
age:2
sex:female
weight:5.0

Instance Methods

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.weight=weight
    def eat(self):
        self.weight+=0.05
        print('eat')
    def run(self):
        self.weight-=0.01
        print('run')
        
a1=Animal(2,0,10.0)
print(a1.weight)
a1.eat()
print(a1.weight)
a1.run()
print(a1.weight)
10.0
eat
10.05
run
10.040000000000001

Class Methods

class Account:
    interest_rate=0.0668
    
    def __init__(self,owner,amount):
        self.owner=owner
        self.amount=amount
        
    @classmethod
    def interest_by(cls,amt):
        return cls.interest_rate*amt
    
interest=Account.interest_by(12000.0)
print(interest)
801.6

Static Methods

class Account:
    interest_rate=0.0668
    
    def __init__(self,owner,amount):
        self.owner=owner
        self.amount=amount
        
    @classmethod
    def interest_by(cls,amt):
        return cls.interest_rate*amt
    
    @staticmethod
    def interest_with(amt):
        return Account.interest_by(amt)
    
interest1=Account.interest_by(12000.0)
print(interest1)
interest2=Account.interest_with(12000.0)
print(interest2)
801.6
801.6

Encapsulation

Private Variables

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.__weight=weight
    def eat(self):
        self.weight+=0.05
        print('eat')
    def run(self):
        self.weight-=0.01
        print('run')
        
a1=Animal(2,0,10.0)
print(a1.weight)
a1.eat()
a1.run()
---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [4], in <cell line: 14>()
     11         print('run')
     13 a1=Animal(2,0,10.0)
---> 14 print(a1.weight)
     15 a1.eat()
     16 a1.run()


AttributeError: 'Animal' object has no attribute 'weight'

Private Methods

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.__weight=weight
    def eat(self):
        self.__weight+=0.05
        print('eat')
    def __run(self):
        self.__weight-=0.01
        print('run')
        
a1=Animal(2,0,10.0)
a1.eat()
a1.run()
eat



---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [5], in <cell line: 15>()
     13 a1=Animal(2,0,10.0)
     14 a1.eat()
---> 15 a1.run()


AttributeError: 'Animal' object has no attribute 'run'

Defining Properties

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.__weight=weight
        
    def set_weight(self,weight):
        self.__weight=weight
    def get_weight(self):
        return self.__weight
    
a1=Animal(2,0,10.0)
print(a1.get_weight())
a1.set_weight(123.45)
print(a1.get_weight())
10.0
123.45
class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.__weight=weight
        
    @property
    def weight(self):
        return self.__weight
    
    @weight.setter
    def weight(self,weight):
        self.__weight=weight
        
a1=Animal(2,0,10.0)
print(a1.weight)
a1.weight=123.45
print(a1.weight)
10.0
123.45

Inheritance

The Concept of Inheritance

class Person:
    def __init__(self,name,age):
        self.name=name
        self.age=age
        
    def info(self):
        template='Person[name={0},age={1}]'
        s=template.format(self.name,self.age)
        return s
    
class Student(Person):
    def __init__(self,name,age,school):
        super().__init__(name,age)
        self.school=school
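
The two classes above are defined but never used. A minimal usage sketch follows; it repeats the definitions so that it runs on its own, and the sample values ('Tony', 18, 'Tsinghua') are made up for illustration.

```python
class Person:
    def __init__(self, name, age):
        self.name = name
        self.age = age

    def info(self):
        return 'Person[name={0},age={1}]'.format(self.name, self.age)


class Student(Person):
    def __init__(self, name, age, school):
        super().__init__(name, age)  # initialize the inherited attributes
        self.school = school


student = Student('Tony', 18, 'Tsinghua')
print(student.info())               # info() is inherited from Person
print(student.school)
print(isinstance(student, Person))  # a Student is also a Person
```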

Overriding Methods

class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.weight=weight
        
    def eat(self):
        self.weight+=0.05
        print('eat')
        
class Dog(Animal):
    def eat(self):
        self.weight+=0.1
        print('gougouchi...')
        
a1=Dog(2,0,10.0)
a1.eat()
gougouchi...

Multiple Inheritance

class ParentClass1:
    def run(self):
        print('ParentClass1 run...')
        
class ParentClass2:
    def run(self):
        print('ParentClass2 run...')
        
class SubClass1(ParentClass1,ParentClass2):
    pass

class SubClass2(ParentClass2,ParentClass1):
    pass

class SubClass3(ParentClass1,ParentClass2):
    def run(self):
        print('SubClass3 run...')
        
sub1=SubClass1()
sub1.run()

sub2=SubClass2()
sub2.run()

sub3=SubClass3()
sub3.run()
ParentClass1 run...
ParentClass2 run...
SubClass3 run...

Polymorphism

The Concept of Polymorphism

class Figure:
    def draw(self):
        print('draw figure...')
        
class Ellipse(Figure):
    def draw(self):
        print('draw Ellipse')
        
class Triangle(Figure):
    def draw(self):
        print('draw Triangle')
        
f1=Figure()
f1.draw()

f2=Ellipse()
f2.draw()

f3=Triangle()
f3.draw()
draw figure...
draw Ellipse
draw Triangle

Type Checking

class Figure:
    def draw(self):
        print('draw figure...')
        
class Ellipse(Figure):
    def draw(self):
        print('draw Ellipse')
        
class Triangle(Figure):
    def draw(self):
        print('draw Triangle')
        
f1=Figure()
f1.draw()

f2=Ellipse()
f2.draw()

f3=Triangle()
f3.draw()


print(isinstance(f1,Triangle))
print(isinstance(f2,Triangle))
print(isinstance(f3,Triangle))
print(isinstance(f2,Figure))
draw figure...
draw Ellipse
draw Triangle
False
False
True
True

Duck Typing

class Animal(object):
    def run(self):
        print('animal run')
        
class Dog(Animal):
    def run(self):
        print('dog run')
        
class Car(object):
    def run(self):
        print('car run')
        
def go(animal):
    animal.run()
    
go(Animal())
go(Dog())
go(Car())
animal run
dog run
car run

The Python Root Class: object

Two important methods

  • __str__()
  • __eq__(other)

The __str__() Method

class Person:
    def __init__(self,name,age):
        self.name=name
        self.age=age
        
    def __str__(self):
        template='Person [name={0},age={1}]'
        s=template.format(self.name,self.age)
        return s

person=Person('Tony',18)
print(person)
Person [name=Tony,age=18]

Object Comparison Method

class Person:
    def __init__(self,name,age):
        self.name=name
        self.age=age
        
    def __str__(self):
        template='Person [name={0},age={1}]'
        s=template.format(self.name,self.age)
        return s
    
    def __eq__(self,other):
        if self.name==other.name and self.age==other.age:
            return True
        else:
            return False
        
p1=Person('Tony',18)
p2=Person('Tony',18)

print(p1==p2)
True

Enumeration Classes

Defining Enumeration Classes

import enum

class WeekDays(enum.Enum):
    MONDAY=1
    TUESDAY=2
    WEDNESDAY=3
    THURSDAY=4
    FRIDAY=5
    
day=WeekDays.FRIDAY

print(day)
print(day.value)
print(day.name)
WeekDays.FRIDAY
5
FRIDAY

Restricting Enumeration Classes

import enum

@enum.unique
class WeekDays(enum.IntEnum):
    MONDAY=1
    TUESDAY=2
    WEDNESDAY=3
    THURSDAY=4
    FRIDAY=5
    
day=WeekDays.FRIDAY

print(day)
print(day.value)
print(day.name)
WeekDays.FRIDAY
5
FRIDAY

Using Enumeration Classes

import enum

@enum.unique
class WeekDays(enum.IntEnum):
    MONDAY=1
    TUESDAY=2
    WEDNESDAY=3
    THURSDAY=4
    FRIDAY=5
    
day=WeekDays.FRIDAY

if day==WeekDays.MONDAY:
    print('work')
elif day==WeekDays.FRIDAY:
    print('study')
study

Exception Handling

Common Exceptions

AttributeError

class Animal(object):
    pass
al=Animal()
al.run()
---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [3], in <cell line: 1>()
----> 1 al.run()


AttributeError: 'Animal' object has no attribute 'run'
print(al.age)
---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [4], in <cell line: 1>()
----> 1 print(al.age)


AttributeError: 'Animal' object has no attribute 'age'
print(Animal.weight)
---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [5], in <cell line: 1>()
----> 1 print(Animal.weight)


AttributeError: type object 'Animal' has no attribute 'weight'

OSError

f=open('abc.txt')
---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

Input In [6], in <cell line: 1>()
----> 1 f=open('abc.txt')


FileNotFoundError: [Errno 2] No such file or directory: 'abc.txt'

IndexError

code_list=[125,56,89,36]
code_list[4]
---------------------------------------------------------------------------

IndexError                                Traceback (most recent call last)

Input In [7], in <cell line: 2>()
      1 code_list=[125,56,89,36]
----> 2 code_list[4]


IndexError: list index out of range

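KeyError

Raised when a key that does not exist in the dictionary is accessed.

# dict1 was never defined in the original cell; a sample dictionary is assumed here
dict1={102:'Tony',103:'Tom'}
dict1[104]
---------------------------------------------------------------------------

KeyError                                  Traceback (most recent call last)

Input In [9], in <cell line: 3>()
      1 # dict1 was never defined in the original cell; a sample dictionary is assumed here
      2 dict1={102:'Tony',103:'Tom'}
----> 3 dict1[104]


KeyError: 104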
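
A `KeyError` can be handled after the fact or avoided altogether. The sketch below shows three common options; the dictionary `students` and its contents are illustrative only.

```python
# Sample data; the keys and names are illustrative only
students = {101: 'Tom', 102: 'Tony'}

# Option 1: catch the exception
try:
    name = students[104]
except KeyError as e:
    name = None
    missing_key = e.args[0]
    print('missing key:', missing_key)   # missing key: 104

# Option 2: dict.get() returns a default instead of raising
print(students.get(104))                 # None
print(students.get(104, 'unknown'))      # unknown

# Option 3: test membership first
print(104 in students)                   # False
```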

NameError

value1
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

Input In [10], in <cell line: 1>()
----> 1 value1


NameError: name 'value1' is not defined
a=value1
---------------------------------------------------------------------------

NameError                                 Traceback (most recent call last)

Input In [11], in <cell line: 1>()
----> 1 a=value1


NameError: name 'value1' is not defined
value1=10

TypeError

i='2'
print(5/i)
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

Input In [14], in <cell line: 1>()
----> 1 print(5/i)


TypeError: unsupported operand type(s) for /: 'int' and 'str'

ValueError

i='QWE'
print(5/int(i))
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [16], in <cell line: 1>()
----> 1 print(5/int(i))


ValueError: invalid literal for int() with base 10: 'QWE'

Catching exceptions

The try-except statement

import datetime as dt

def read_date(in_date):
    try:
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError:
        print('处理ValueError异常')


str_date='2018-8-18'
print('日期={0}'.format(read_date(str_date)))
日期=2018-08-18 00:00:00
def read_date(in_date):
    try:
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('处理ValueError异常')
        print(e)
str_date='201B-8-18'
print('日期={0}'.format(read_date(str_date)))
处理ValueError异常
time data '201B-8-18' does not match format '%Y-%m-%d'
日期=None

Multiple except blocks

import datetime as dt


def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('处理ValueError异常')
        print(e)
    except FileNotFoundError as e:
        print('处理FileNotFoundError异常')
        print(e)
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))
处理FileNotFoundError异常
[Errno 2] No such file or directory: 'read.txt'
日期=None
import datetime as dt


def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('处理ValueError异常')
        print(e)
    except FileNotFoundError as e:
        print('处理FileNotFoundError异常')
        print(e)
    except OSError as e:
        print('处理OSError异常')
        print(e)
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))
处理FileNotFoundError异常
[Errno 2] No such file or directory: 'read.txt'
日期=None
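
The order of the except clauses matters: Python runs the first clause whose class matches, so a subclass such as `FileNotFoundError` has to be listed before its parent `OSError`. The helper `classify` below is a hypothetical function written only to make that visible.

```python
def classify(exc):
    """Raise exc and report which except clause handles it."""
    try:
        raise exc
    except FileNotFoundError:
        return 'FileNotFoundError clause'
    except OSError:
        return 'OSError clause'
    except ValueError:
        return 'ValueError clause'


print(issubclass(FileNotFoundError, OSError))   # True
print(classify(FileNotFoundError('read.txt')))  # FileNotFoundError clause
print(classify(PermissionError('read.txt')))    # OSError clause
print(classify(ValueError('bad date')))         # ValueError clause
```

If the `OSError` clause came first, it would also catch `FileNotFoundError` and the more specific clause would never run.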

Nested try-except statements

import datetime as dt

def read_date_from_file(filename):
    try:
        file=open(filename)
        try:
            in_date = file.read()
            in_date = in_date.strip()
            date = dt.datetime.strptime(in_date, '%Y-%m-%d')
            return date
        except ValueError as e:
            print('处理ValueError异常')
            print(e)
    except FileNotFoundError as e:
        print('处理FileNotFoundError异常')
        print(e)
    except OSError as e:
        print('处理OSError异常')
        print(e)
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))
处理FileNotFoundError异常
[Errno 2] No such file or directory: 'read.txt'
日期=None

Catching several exception types in one except clause

import datetime as dt
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except (ValueError,OSError) as e:
        print('调用---')
        print(e)
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))
调用---
[Errno 2] No such file or directory: 'read.txt'
日期=None

Exception stack traces

import datetime as dt
import traceback as tb
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except (ValueError,OSError) as e:
        print('调用---')
        print(e)
        tb.print_exc()
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))
调用---
[Errno 2] No such file or directory: 'read.txt'
日期=None


Traceback (most recent call last):
  File "C:\Users\HP\AppData\Local\Temp\ipykernel_8772\538862610.py", line 5, in read_date_from_file
    file=open(filename)
FileNotFoundError: [Errno 2] No such file or directory: 'read.txt'

Releasing resources

The finally block

import datetime as dt
def read_date_from_file(filename):
    file=None  # remains None if open() fails
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('处理ValueError异常')
        print(e)
    except FileNotFoundError as e:
        print('处理FileNotFoundError异常')
        print(e)
    except OSError as e:
        print('处理OSError异常')
        print(e)
    finally:
        # finally always runs; close the file only if it was actually opened
        if file is not None:
            file.close()
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))
处理FileNotFoundError异常
[Errno 2] No such file or directory: 'read.txt'
日期=None

The else block

import datetime as dt
import traceback as tb

def read_date_from_file(filename):
    try:
        file=open(filename)
    except OSError as e:
        print('打开文件失败')
    else:
        print('打开文件成功')
        try:
            in_date = file.read()
            in_date = in_date.strip()
            date = dt.datetime.strptime(in_date, '%Y-%m-%d')
            return date
        except ValueError as e:
            print('处理ValueError异常')
            print(e)
        except OSError as e:
            print('处理OSError异常')
            print(e)
        finally:
            file.close()
            
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))
打开文件失败
日期=None

Automatic resource management with the with-as block

import datetime as dt
def read_date_from_file(filename):
    try:
        with open(filename) as file:
            in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('处理ValueError异常')
        print(e)
    except OSError as e:
        print('处理OSError异常')
        print(e)
        
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))
处理OSError异常
[Errno 2] No such file or directory: 'read.txt'
日期=None
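
The point of `with` is that the file is closed even when the body raises. The following sketch checks this directly; it writes a throwaway file to a temporary directory, and both the path and the invalid date string are illustrative.

```python
import datetime as dt
import os
import tempfile

# Create a throwaway file holding an invalid date (illustrative data)
path = os.path.join(tempfile.mkdtemp(), 'read.txt')
with open(path, 'w') as f:
    f.write('201B-8-18')

try:
    with open(path) as file:
        # strptime raises ValueError inside the with block
        date = dt.datetime.strptime(file.read().strip(), '%Y-%m-%d')
except ValueError as e:
    error_message = str(e)
    print(error_message)

# The name is still bound, but the file has already been closed
print(file.closed)   # True
```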

Custom exception classes

class MyException(Exception):
    def __init__(self,message):
        super().__init__(message)

Raising exceptions explicitly

import datetime as dt
class MyException(Exception):
    def __init__(self,message):
        super().__init__(message)
        
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        raise MyException('不是有效日期')
    except FileNotFoundError as e:
        raise MyException('文件找不到')
    except OSError as e:
        raise MyException('文件无法打开或无法读取')

date=read_date_from_file('read.txt')
print('日期={0}'.format(date))
---------------------------------------------------------------------------

FileNotFoundError                         Traceback (most recent call last)

Input In [4], in read_date_from_file(filename)
      7 try:
----> 8     file=open(filename)
      9     in_date=file.read()


FileNotFoundError: [Errno 2] No such file or directory: 'read.txt'

During handling of the above exception, another exception occurred:


MyException                               Traceback (most recent call last)

Input In [4], in <cell line: 20>()
     17     except OSError as e:
     18         raise MyException('文件无法打开或无法读取')
---> 20 date=read_date_from_file('read.txt')
     21 print('日期={0}'.format(date))


Input In [4], in read_date_from_file(filename)
     14     raise MyException('不是有效日期')
     15 except FileNotFoundError as e:
---> 16     raise MyException('文件找不到')
     17 except OSError as e:
     18     raise MyException('文件无法打开或无法读取')


MyException: 文件找不到
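
The traceback above reports "During handling of the above exception, another exception occurred" because the new exception is raised inside an except block. Writing `raise ... from e` makes the link explicit and stores the original error in `__cause__`. A minimal sketch, with a hypothetical helper `parse_age`:

```python
class MyException(Exception):
    pass


def parse_age(text):
    try:
        return int(text)
    except ValueError as e:
        # "from e" records the original error as the direct cause
        raise MyException('not a valid age: {0!r}'.format(text)) from e


try:
    parse_age('QWE')
except MyException as e:
    cause = e.__cause__
    print(e)              # not a valid age: 'QWE'
    print(type(cause))    # <class 'ValueError'>
```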

Commonly used modules

The math module

Rounding functions

import math
math.ceil(1.4)
2
math.floor(1.4)
1
round(1.4)
1
math.ceil(1.5)
2
math.floor(1.5)
1
math.ceil(1.6)
2
math.floor(1.6)
1
round(1.5)
2
round(1.6)
2
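
Two details are easy to miss in the results above: the built-in `round()` sends exact halves to the nearest even integer rather than always rounding up, and for negative numbers `math.floor()` and `math.trunc()` give different answers. A short sketch:

```python
import math

# round() rounds exact halves to the nearest even integer
halves = [round(0.5), round(1.5), round(2.5), round(3.5)]
print(halves)             # [0, 2, 2, 4]

# For negative numbers floor and trunc differ
print(math.floor(-1.5))   # -2
print(math.ceil(-1.5))    # -1
print(math.trunc(-1.5))   # -1
```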

Power and logarithm functions

math.log(8,2)
3.0
math.pow(2,3)
8.0
math.log(8)
2.0794415416798357
math.sqrt(1.6)
1.2649110640673518

Trigonometric functions

math.degrees(0.5*math.pi)
90.0
math.radians(180/math.pi)
1.0
a=math.radians(45/math.pi)
a
0.25
math.sin(a)
0.24740395925452294
math.asin(math.sin(a))
0.25
math.asin(0.2474)
0.24999591371483254
math.asin(0.24740395925452294)
0.25
math.cos(a)
0.9689124217106447
math.acos(0.9689124217106447)
0.2500000000000002
math.acos(math.cos(a))
0.2500000000000002
math.tan(a)
0.25534192122103627
math.atan(math.tan(a))
0.25
math.atan(0.25534192122103627)
0.25

The random module

import random
print('0.0<=x<1.0 random')
for i in range(0,10):
    x=random.random()
    print(x)
print('0<=x<5 random')
for i in range(0,10):
    x=random.randrange(5)
    print(x)
print('5<=x<10 random')
for i in range(0,10):
    x=random.randrange(5,10)
    print(x)
print('5<=x<=10 random')
for i in range(0,10):
    x=random.randint(5,10)
    print(x)
0.0<=x<1.0 random
0.3905863037934756
0.8922407632329942
0.21352047760461534
0.5211523015401928
0.30030870435664747
0.9862984919490358
0.21171993560160762
0.6653280107488534
0.32488043176197134
0.3562099773397064
0<=x<5 random
0
0
4
0
2
1
3
3
0
4
5<=x<10 random
7
8
7
6
8
5
9
8
7
7
5<=x<=10 random
5
5
8
7
7
9
9
8
7
5
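
The numbers above change on every run. When repeatable results are needed, for example in tests, a generator can be seeded. The sketch below uses private `random.Random` instances; the seed value 42 and the list of names are arbitrary choices.

```python
import random

# Two private generators with the same seed (the value 42 is arbitrary)
rng1 = random.Random(42)
rng2 = random.Random(42)

# Same seed, same sequence
a = [rng1.randint(5, 10) for _ in range(5)]
b = [rng2.randint(5, 10) for _ in range(5)]
print(a == b)   # True

names = ['Tom', 'Tony', 'Ben', 'Sam']    # illustrative data
picked = rng1.choice(names)              # one element
pair = rng1.sample(names, 2)             # two distinct elements
rng1.shuffle(names)                      # shuffles the list in place
print(picked, pair, names)
```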

The datetime module

The datetime, date, and time classes

The datetime class
import datetime
dt=datetime.datetime(2018,2,29)
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [37], in <cell line: 1>()
----> 1 dt=datetime.datetime(2018,2,29)


ValueError: day is out of range for month
dt=datetime.datetime(2018,2,28)
dt
datetime.datetime(2018, 2, 28, 0, 0)
dt=datetime.datetime(2018,2,28,23,60,59,10000)
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [40], in <cell line: 1>()
----> 1 dt=datetime.datetime(2018,2,28,23,60,59,10000)


ValueError: minute must be in 0..59
dt=datetime.datetime(2018,2,28,23,30,59,10000)
dt
datetime.datetime(2018, 2, 28, 23, 30, 59, 10000)
datetime.datetime.today()
datetime.datetime(2023, 3, 21, 18, 2, 6, 436821)
datetime.datetime.now()
datetime.datetime(2023, 3, 21, 18, 2, 32, 837270)
datetime.datetime.utcnow()
datetime.datetime(2023, 3, 21, 10, 2, 48, 100681)
datetime.datetime.fromtimestamp(999999999.999)
datetime.datetime(2001, 9, 9, 9, 46, 39, 999000)
datetime.datetime.utcfromtimestamp(999999999.999)
datetime.datetime(2001, 9, 9, 1, 46, 39, 999000)
The date class
d=datetime.date(2018,2,29)
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [48], in <cell line: 1>()
----> 1 d=datetime.date(2018,2,29)


ValueError: day is out of range for month
d=datetime.date(2018,2,28)
d
datetime.date(2018, 2, 28)
datetime.date.today()
datetime.date(2023, 3, 21)
datetime.date.fromtimestamp(999999999.999)
datetime.date(2001, 9, 9)
The time class
datetime.time(24,59,58,1999)
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [53], in <cell line: 1>()
----> 1 datetime.time(24,59,58,1999)


ValueError: hour must be in 0..23
datetime.time(23,59,58,1999)
datetime.time(23, 59, 58, 1999)

Date and time arithmetic

datetime.date.today()
datetime.date(2023, 3, 21)
d=datetime.date.today()
delta=datetime.timedelta(10)
d+=delta
d
datetime.date(2023, 3, 31)
d=datetime.date(2018,1,1)
delta=datetime.timedelta(weeks=5)
d-=delta
d
datetime.date(2017, 11, 27)
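
Arithmetic also works in the other direction: subtracting two dates or two datetimes yields a `timedelta`. A short sketch with arbitrary example dates:

```python
import datetime

start = datetime.date(2018, 1, 1)
end = datetime.date(2018, 3, 1)

delta = end - start           # subtracting two dates gives a timedelta
print(delta.days)             # 59

dt1 = datetime.datetime(2018, 2, 28, 23, 30, 59)
dt2 = dt1 + datetime.timedelta(hours=1, minutes=30)
print(dt2)                           # 2018-03-01 01:00:59
print((dt2 - dt1).total_seconds())   # 5400.0
```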

Formatting and parsing dates and times

d=datetime.datetime.today()
d.strftime('%Y-%m-%d %H:%M:%S')
'2023-03-21 18:10:33'
d.strftime('%Y-%m-%d')
'2023-03-21'
str_date='2018-02-29 10:40:26'
date=datetime.datetime.strptime(str_date,'%Y-%m-%d %H:%M:%S')
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [69], in <cell line: 1>()
----> 1 date=datetime.datetime.strptime(str_date,'%Y-%m-%d %H:%M:%S')


ValueError: day is out of range for month
str_date='2018-02-28 10:40:26'
date=datetime.datetime.strptime(str_date,'%Y-%m-%d %H:%M:%S')
date
datetime.datetime(2018, 2, 28, 10, 40, 26)
date=datetime.datetime.strptime(str_date,'%Y-%m-%d')
---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

Input In [74], in <cell line: 1>()
----> 1 date=datetime.datetime.strptime(str_date,'%Y-%m-%d')


File E:\anaconda\lib\_strptime.py:568, in _strptime_datetime(cls, data_string, format)
    565 def _strptime_datetime(cls, data_string, format="%a %b %d %H:%M:%S %Y"):
    566     """Return a class cls instance based on the input string and the
    567     format string."""
--> 568     tt, fraction, gmtoff_fraction = _strptime(data_string, format)
    569     tzname, gmtoff = tt[-2:]
    570     args = tt[:6] + (fraction,)


File E:\anaconda\lib\_strptime.py:352, in _strptime(data_string, format)
    349     raise ValueError("time data %r does not match format %r" %
    350                      (data_string, format))
    351 if len(data_string) != found.end():
--> 352     raise ValueError("unconverted data remains: %s" %
    353                       data_string[found.end():])
    355 iso_year = year = None
    356 month = day = 1


ValueError: unconverted data remains:  10:40:26

Time zones

from datetime import datetime,timezone,timedelta
utc_dt=datetime(2008,8,19,23,59,59,tzinfo=timezone.utc)
utc_dt
datetime.datetime(2008, 8, 19, 23, 59, 59, tzinfo=datetime.timezone.utc)
utc_dt.strftime('%Y-%m-%d %H:%M:%S')
'2008-08-19 23:59:59'
utc_dt.strftime('%Y-%m-%d %H:%M:%S %z')
'2008-08-19 23:59:59 +0000'
bj_tz=timezone(offset=timedelta(hours=8),name='Asia/Beijing')
bj_tz
datetime.timezone(datetime.timedelta(seconds=28800), 'Asia/Beijing')
bj_dt=utc_dt.astimezone(bj_tz)
bj_dt
datetime.datetime(2008, 8, 20, 7, 59, 59, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800), 'Asia/Beijing'))
bj_dt.strftime('%Y-%m-%d %H:%M:%S %Z')
'2008-08-20 07:59:59 Asia/Beijing'
bj_dt.strftime('%Y-%m-%d %H:%M:%S %z')
'2008-08-20 07:59:59 +0800'
bj_tz=timezone(timedelta(hours=8))
bj_dt=utc_dt.astimezone(bj_tz)
bj_dt.strftime('%Y-%m-%d %H:%M:%S %z')
'2008-08-20 07:59:59 +0800'
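
`datetime.utcnow()` and `datetime.utcfromtimestamp()` return naive objects with no `tzinfo`, and both are deprecated as of Python 3.12. The sketch below shows the timezone-aware equivalents, using the fixed timestamp 999999999 so that the output is reproducible.

```python
from datetime import datetime, timezone, timedelta

# A fixed UTC timestamp keeps the example reproducible
utc_dt = datetime.fromtimestamp(999999999, tz=timezone.utc)
print(utc_dt)     # 2001-09-09 01:46:39+00:00

bj_tz = timezone(timedelta(hours=8))
bj_dt = utc_dt.astimezone(bj_tz)
print(bj_dt)      # 2001-09-09 09:46:39+08:00

# Both objects describe the same instant, so they compare equal
print(utc_dt == bj_dt)   # True

# Aware replacement for datetime.utcnow()
now_utc = datetime.now(timezone.utc)
print(now_utc.tzinfo)    # UTC
```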

The logging module

Log levels

import logging
# basicConfig() only takes effect the first time it is called in a process,
# so run each example in a fresh interpreter (or pass force=True, Python 3.8+)
logging.basicConfig(level=logging.ERROR)

logging.debug('this is debug')
logging.info('this is info')
logging.warning('this is warning')
logging.error('this is error')
logging.critical('this is critical')
ERROR:root:this is error
CRITICAL:root:this is critical
import logging
# run in a fresh interpreter; a second basicConfig() call in the same process is ignored
logging.basicConfig(level=logging.DEBUG)
logger=logging.getLogger(__name__)

logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')
DEBUG:__main__:this is debug
INFO:__main__:this is info
WARNING:__main__:this is warning
ERROR:__main__:this is error
CRITICAL:__main__:this is critical

Formatting log messages

import logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s-%(threadName)s-'
                    '%(name)s-%(funcName)s-%(levelname)s-%(message)s')
logger=logging.getLogger(__name__)

logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')

def funlog():
    logger.info('enter funlog')
    
logger.info('use funlog')
funlog()
2023-03-21 20:14:51,110-MainThread-__main__-<cell line: 8>-INFO-this is info
2023-03-21 20:14:51,120-MainThread-__main__-<cell line: 9>-WARNING-this is warning
2023-03-21 20:14:51,122-MainThread-__main__-<cell line: 10>-ERROR-this is error
2023-03-21 20:14:51,123-MainThread-__main__-<cell line: 11>-CRITICAL-this is critical
2023-03-21 20:14:51,124-MainThread-__main__-<cell line: 16>-INFO-use funlog
2023-03-21 20:14:51,124-MainThread-__main__-funlog-INFO-enter funlog

Redirecting log output

import logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s-%(threadName)s-'
                    '%(name)s-%(funcName)s-%(levelname)s-%(message)s')
logger=logging.getLogger(__name__)

logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')

def funlog():
    logger.info('enter funlog')
    
logger.info('use funlog')
funlog()
2023-03-21 20:17:26,157-MainThread-__main__-<cell line: 8>-INFO-this is info
2023-03-21 20:17:26,165-MainThread-__main__-<cell line: 9>-WARNING-this is warning
2023-03-21 20:17:26,166-MainThread-__main__-<cell line: 10>-ERROR-this is error
2023-03-21 20:17:26,167-MainThread-__main__-<cell line: 11>-CRITICAL-this is critical
2023-03-21 20:17:26,169-MainThread-__main__-<cell line: 16>-INFO-use funlog
2023-03-21 20:17:26,171-MainThread-__main__-funlog-INFO-enter funlog
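
The listing above still writes to the console. To send records to a file, the shortest route is to pass `filename='test.log'` to `logging.basicConfig()`. The sketch below instead attaches a `FileHandler` to a single logger, which also works when the root logger has already been configured. The logger name `redirect_demo` and the temporary log path are assumptions made for illustration.

```python
import logging
import os
import tempfile

# Hypothetical log location; any writable path would do
log_path = os.path.join(tempfile.mkdtemp(), 'test.log')

logger = logging.getLogger('redirect_demo')   # logger name is illustrative
logger.setLevel(logging.INFO)
logger.propagate = False                      # keep records away from the root handlers

handler = logging.FileHandler(log_path, encoding='utf-8')
handler.setFormatter(logging.Formatter(
    '%(asctime)s-%(threadName)s-%(name)s-%(funcName)s-%(levelname)s-%(message)s'))
logger.addHandler(handler)

logger.debug('this is debug')   # below INFO, dropped
logger.info('this is info')
logger.error('this is error')

handler.close()                 # flush and release the file
logger.removeHandler(handler)

with open(log_path, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(len(lines))   # 2
```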

Using a configuration file

import logging
import logging.config

logging.config.fileConfig("logger.conf")
logger=logging.getLogger('loggerl')

logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')

def funlog():
    logger.info('enter funlog')
logger.info('use funlog')
funlog()
this is debug
this is info
this is warning
this is error
this is critical
use funlog
enter funlog
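
The contents of `logger.conf` are not shown in these notes. The sketch below writes one possible minimal configuration to a temporary file and loads it with `fileConfig()`. Every section value here, including the logger name `logger1`, is an assumption and not the original file.

```python
import logging
import logging.config
import os
import tempfile

# Assumed contents for a minimal logger.conf; the original file is not shown
CONF = """
[loggers]
keys=root,logger1

[handlers]
keys=consoleHandler

[formatters]
keys=simpleFormatter

[logger_root]
level=DEBUG
handlers=consoleHandler

[logger_logger1]
level=DEBUG
handlers=consoleHandler
qualname=logger1
propagate=0

[handler_consoleHandler]
class=StreamHandler
level=DEBUG
formatter=simpleFormatter
args=(sys.stdout,)

[formatter_simpleFormatter]
format=%(message)s
"""

conf_path = os.path.join(tempfile.mkdtemp(), 'logger.conf')
with open(conf_path, 'w') as f:
    f.write(CONF)

logging.config.fileConfig(conf_path)
logger = logging.getLogger('logger1')
logger.debug('this is debug')
logger.info('this is info')
```

With the format `%(message)s`, each record is printed as the bare message, which is the shape of the output shown above.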

Regular expressions

Regular expression strings

  • Ordinary characters
  • Metacharacters

Metacharacters

Character escaping

Start and end anchors

import re

p1 = r'\w+@zhijieketang\.com'
p2 = r'^\w+@zhijieketang\.com$'

text = "Tony's email is tony_guan588@zhijieketang.com."
m = re.search(p1, text)
print(m)  ## matches
m = re.search(p2, text)
print(m)  ## no match

email = 'tony_guan588@zhijieketang.com'
m = re.search(p2, email)
print(m)
<re.Match object; span=(16, 45), match='tony_guan588@zhijieketang.com'>
None
<re.Match object; span=(0, 29), match='tony_guan588@zhijieketang.com'>

Character classes

Defining a character class

import re

p = r'[Jj]ava'
## p = r'Java|java|JAVA'

m = re.search(p, 'I like Java and Python.')
print(m)  ## matches

m = re.search(p, 'I like JAVA and Python.')
print(m)  ## no match

m = re.search(p, 'I like java and Python.')
print(m)  ## matches
<re.Match object; span=(7, 11), match='Java'>
None
<re.Match object; span=(7, 11), match='java'>

Negated character classes

import re

p = r'[^0123456789]'

m = re.search(p, '1000')
print(m)  ## no match

m = re.search(p, 'Python 3')
print(m)  ## matches
None
<re.Match object; span=(0, 1), match='P'>

Ranges

import re

m = re.search(r'[A-Za-z0-9]', 'A10.3')
print(m)  ## matches

m = re.search(r'[0-25-7]', 'A3489C')
print(m)  ## no match
<re.Match object; span=(0, 1), match='A'>
None

Predefined character classes

import re

## p = r'[^0123456789]'
p = r'\D'

m = re.search(p, '1000')
print(m)  ## no match

m = re.search(p, 'Python 3')
print(m)  ## matches

text = '你们好Hello'
m = re.search(r'\w', text)
print(m)  ## matches
None
<re.Match object; span=(0, 1), match='P'>
<re.Match object; span=(0, 1), match='你'>

Quantifiers

Using quantifiers

import re

m = re.search(r'\d?', '87654321')  ## a digit occurs once
print(m)  ## matches '8'

m = re.search(r'\d?', 'ABC')  ## a digit occurs zero times
print(m)  ## matches ''

m = re.search(r'\d*', '87654321')  ## a digit occurs many times
print(m)  ## matches '87654321'

m = re.search(r'\d*', 'ABC')  ## a digit occurs zero times
print(m)  ## matches ''

m = re.search(r'\d+', '87654321')  ## a digit occurs many times
print(m)  ## matches '87654321'

m = re.search(r'\d+', 'ABC')
print(m)  ## no match

m = re.search(r'\d{8}', '87654321')  ## a digit occurs 8 times
print(m)  ## matches '87654321'

m = re.search(r'\d{8}', 'ABC')
print(m)  ## no match

m = re.search(r'\d{7,8}', '87654321')  ## a digit occurs 8 times
print(m)  ## matches '87654321'

m = re.search(r'\d{9,}', '87654321')
print(m)  ## no match
<re.Match object; span=(0, 1), match='8'>
<re.Match object; span=(0, 0), match=''>
<re.Match object; span=(0, 8), match='87654321'>
<re.Match object; span=(0, 0), match=''>
<re.Match object; span=(0, 8), match='87654321'>
None
<re.Match object; span=(0, 8), match='87654321'>
None
<re.Match object; span=(0, 8), match='87654321'>
None

Greedy and lazy quantifiers

import re

## greedy quantifier
m = re.search(r'\d{5,8}', '87654321')  ## a digit occurs 8 times
print(m)  ## matches '87654321'

## lazy quantifier
m = re.search(r'\d{5,8}?', '87654321')  ## a digit occurs 5 times
print(m)  ## matches '87654'
<re.Match object; span=(0, 8), match='87654321'>
<re.Match object; span=(0, 5), match='87654'>
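
The difference between greedy and lazy matching is most visible with delimiters such as markup tags. A small sketch, using an illustrative input string:

```python
import re

text = '<a>abc</a><b>def</b>'   # illustrative input

# Greedy: .* runs to the last '>' in the string
greedy = re.search(r'<.*>', text).group()
print(greedy)   # <a>abc</a><b>def</b>

# Lazy: .*? stops at the first '>' it can
lazy = re.search(r'<.*?>', text).group()
print(lazy)     # <a>

# All tags, found lazily
tags = re.findall(r'<.*?>', text)
print(tags)     # ['<a>', '</a>', '<b>', '</b>']
```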

Groups

Using groups

import re

p = r'(121){2}'
m = re.search(p, '121121abcabc')
print(m)  ## matches
print(m.group())  ## returns the whole matched string
print(m.group(1))  ## contents of the first group

p = r'(\d{3,4})-(\d{7,8})'
m = re.search(p, '010-87654321')
print(m)  ## matches
print(m.group())  ## returns the whole matched string
print(m.groups())  ## contents of all groups
<re.Match object; span=(0, 6), match='121121'>
121121
121
<re.Match object; span=(0, 12), match='010-87654321'>
010-87654321
('010', '87654321')

Named groups

import re

p = r'(?P<area_code>\d{3,4})-(?P<phone_code>\d{7,8})'
m = re.search(p, '010-87654321')
print(m)  ## matches
print(m.group())  ## returns the whole matched string
print(m.groups())  ## contents of all groups

## get group contents by group number
print(m.group(1))
print(m.group(2))

## get group contents by group name
print(m.group('area_code'))
print(m.group('phone_code'))
<re.Match object; span=(0, 12), match='010-87654321'>
010-87654321
('010', '87654321')
010
87654321
010
87654321
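
Named groups can also be read all at once as a dictionary, queried for their position, and reused in a template. A brief sketch with the same pattern; the input string `'Tel: 010-87654321'` is an illustrative variation.

```python
import re

p = r'(?P<area_code>\d{3,4})-(?P<phone_code>\d{7,8})'
m = re.search(p, 'Tel: 010-87654321')

# All named groups as a dict
info = m.groupdict()
print(info)                      # {'area_code': '010', 'phone_code': '87654321'}

# Position of one named group within the input
print(m.span('phone_code'))      # (9, 17)

# Build a new string from the groups with \g<name>
formatted = m.expand(r'(\g<area_code>) \g<phone_code>')
print(formatted)                 # (010) 87654321
```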

Backreferences to groups

import re

## p = r'<([\w]+)>.*</([\w]+)>'
p = r'<([\w]+)>.*</\1>'  ## use a backreference

m = re.search(p, '<a>abc</a>')
print(m)  ## matches

m = re.search(p, '<a>abc</b>')
print(m)  ## no match
<re.Match object; span=(0, 10), match='<a>abc</a>'>
None

Non-capturing groups

import re

s = 'img1.jpg,img2.jpg,img3.bmp'

# capturing group
p = r'\w+(\.jpg)'
mlist = re.findall(p, s)
print(mlist)

# non-capturing group
p = r'\w+(?:\.jpg)'
mlist = re.findall(p, s)
print(mlist)
['.jpg', '.jpg']
['img1.jpg', 'img2.jpg']

The re module

The search() and match() functions

import re

p = r'\w+@zhijieketang\.com'

text = "Tony's email is tony_guan588@zhijieketang.com."
m = re.search(p, text)
print(m)  ## matches

m = re.match(p, text)
print(m)  ## no match

email = 'tony_guan588@zhijieketang.com'
m = re.search(p, email)
print(m)  ## matches

m = re.match(p, email)
print(m)  ## matches

## a few methods of the match object
print('match对象几个方法:')
print(m.group())
print(m.start())
print(m.end())
print(m.span())
<re.Match object; span=(16, 45), match='tony_guan588@zhijieketang.com'>
None
<re.Match object; span=(0, 29), match='tony_guan588@zhijieketang.com'>
<re.Match object; span=(0, 29), match='tony_guan588@zhijieketang.com'>
match对象几个方法:
tony_guan588@zhijieketang.com
0
29
(0, 29)

The findall() and finditer() functions

import re

p = r'[Jj]ava'
text = 'I like Java and java.'

match_list = re.findall(p, text)
print(match_list)  ## matches

match_iter = re.finditer(p, text)
for m in match_iter:
    print(m.group())
['Java', 'java']
Java
java

Splitting strings

import re

p = r'\d+'
text = 'AB12CD34EF'

clist = re.split(p, text)
print(clist)

clist = re.split(p, text, maxsplit=1)
print(clist)

clist = re.split(p, text, maxsplit=2)
print(clist)
['AB', 'CD', 'EF']
['AB', 'CD34EF']
['AB', 'CD', 'EF']

Replacing within strings

import re

p = r'\d+'
text = 'AB12CD34EF'

replace_text = re.sub(p, ' ', text)
print(replace_text)

replace_text = re.sub(p, ' ', text, count=1)
print(replace_text)

replace_text = re.sub(p, ' ', text, count=2)
print(replace_text)
AB CD EF
AB CD34EF
AB CD EF
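
The replacement passed to `re.sub()` does not have to be a fixed string. It can refer back to captured groups, or it can be a function that computes the replacement from each match. A short sketch; the helper name `double` is illustrative.

```python
import re

# Swap the two captured parts with backreferences \1 and \2
date = re.sub(r'(\d{4})-(\d{2})', r'\2/\1', '2018-08')
print(date)     # 08/2018


# A function receives each match object and returns the replacement text
def double(m):
    return str(int(m.group()) * 2)


doubled = re.sub(r'\d+', double, 'AB12CD34EF')
print(doubled)  # AB24CD68EF

# subn() also reports how many replacements were made
result, count = re.subn(r'\d+', ' ', 'AB12CD34EF')
print(result, count)   # AB CD EF 2
```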

Compiling regular expressions

re.compile(pattern[,flags=0])

Compiled regular expression objects

import re

p = r'\w+@zhijieketang\.com'
regex = re.compile(p)

text = "Tony's email is tony_guan588@zhijieketang.com."
m = regex.search(text)
print(m)  ## matches

m = regex.match(text)
print(m)  ## no match

p = r'[Jj]ava'
regex = re.compile(p)
text = 'I like Java and java.'

match_list = regex.findall(text)
print(match_list)  ## matches

match_iter = regex.finditer(text)
for m in match_iter:
    print(m.group())

p = r'\d+'
regex = re.compile(p)
text = 'AB12CD34EF'

clist = regex.split(text)
print(clist)

replace_text = regex.sub(' ', text)
print(replace_text)
<re.Match object; span=(16, 45), match='tony_guan588@zhijieketang.com'>
None
['Java', 'java']
Java
java
['AB', 'CD', 'EF']
AB CD EF

Compilation flags

ASCII and Unicode
import re

text = '你们好Hello'

p = r'\w+'
regex = re.compile(p, re.U)

m = regex.search(text)
print(m)  ## matches

m = regex.match(text)
print(m)  ## matches

regex = re.compile(p, re.A)

m = regex.search(text)
print(m)  ## matches

m = regex.match(text)
print(m)  ## no match
<re.Match object; span=(0, 8), match='你们好Hello'>
<re.Match object; span=(0, 8), match='你们好Hello'>
<re.Match object; span=(3, 8), match='Hello'>
None
Ignoring case
import re

p = r'(java).*(python)'
regex = re.compile(p, re.I)

m = regex.search('I like Java and Python.')
print(m)  ## matches

m = regex.search('I like JAVA and Python.')
print(m)  ## matches

m = regex.search('I like java and Python.')
print(m)  ## matches
<re.Match object; span=(7, 22), match='Java and Python'>
<re.Match object; span=(7, 22), match='JAVA and Python'>
<re.Match object; span=(7, 22), match='java and Python'>
Letting the dot metacharacter match newlines
import re

p = r'.+'
regex = re.compile(p)

m = regex.search('Hello\nWorld.')
print(m)  ## matches up to the newline only

regex = re.compile(p, re.DOTALL)

m = regex.search('Hello\nWorld.')
print(m)  ## matches across the newline
<re.Match object; span=(0, 5), match='Hello'>
<re.Match object; span=(0, 12), match='Hello\nWorld.'>
Multiline mode
import re

p = r'^World'
regex = re.compile(p)

m = regex.search('Hello\nWorld.')
print(m)  ## no match

regex = re.compile(p, re.M)

m = regex.search('Hello\nWorld.')
print(m)  ## matches
None
<re.Match object; span=(6, 11), match='World'>
Verbose mode
import re

p = """(java)     # match the string java
        .*        # match zero or more of any character
        (python)  # match the string python
    """

regex = re.compile(p, re.I | re.VERBOSE)

m = regex.search('I like Java and Python.')
print(m)  ## matches

m = regex.search('I like JAVA and Python.')
print(m)  ## matches

m = regex.search('I like java and Python.')
print(m)  ## matches
<re.Match object; span=(7, 22), match='Java and Python'>
<re.Match object; span=(7, 22), match='JAVA and Python'>
<re.Match object; span=(7, 22), match='java and Python'>

Data interchange formats

The CSV data interchange format

The reader() function

import csv

with open('data/books.csv', 'r',  encoding='gbk') as rf:
    reader = csv.reader(rf, dialect=csv.excel)
    for row in reader:
        print('|'.join(row))
1|软件工程|戴国强|机械工业出版社|19980528|2
2|汇编语言|李利光|北京大学出版社|19980318|2
3|计算机基础|王飞|经济科学出版社|19980218|1
4|FLASH精选|刘扬|中国纺织出版社|19990312|2
5|java基础|王一|电子工业出版社|19990528|3
6|世界杯|柳飞|世界出版社|19990412|2
7|JAVA程序设计|张余|人民邮电出版社|19990613|1
8|新概念3|余智|外语出版社|19990723|2
9|军事要闻|张强|解放军出版社|19990722|3
10|大众生活|许阳|电子出版社|19990819|3
11|南方旅游|王爱国|南方出版社|19990930|2
13|幽灵|钱力华|华光出版社|19991008|1
14|期货分析|孙宝|飞鸟出版社|19991122|3
15|人工智能|周未|机械工业出版社|19991223|3
16|数据库系统概念|吴红|机械工业出版社|20000328|3
17|计算机理论基础|戴家|机械工业出版社|20000218|4
18|编译原理|郑键|机械工业出版社|20000415|2
19|通讯与网络|欧阳杰|机械工业出版社|20000517|1
20|现代操作系统|王小国|机械工业出版社|20010128|1
21|网络基础|王大尉|北京大学出版社|20000617|1
22|万紫千红|丛丽|北京大学出版社|20000702|3
23|经济概论|思佳|北京大学出版社|20000819|3
24|经济与科学|毛波|经济科学出版社|20000923|2
25|计算机体系结构|方丹|机械工业出版社|20000328|4
26|软件工程|牛田|经济科学出版社|20000328|4
27|世界语言大观|候丙辉|经济科学出版社|20000814|2
28|高级语言程序设计|寇国华|清华大学出版社|20000117|3
29|操作系统概论|聂元名|清华大学出版社|20001028|1
30|数据库及应用|孙家萧|清华大学出版社|20000619|1
31|软件工程|戴志名|电子工业出版社|20000324|3
32|SOL使用手册|贺民|电子工业出版社|19990425|2
33|模拟电路|邓英才|电子工业出版社|20000527|2
34|集邮爱好者|李云|人民邮电出版社|20000630|1
36|高等数学|李放|人民邮电出版社|20000812|1
37|南方周末|邓光明|南方出版社|20000923|3
38|十大旅游胜地|潭晓明|南方出版社|20000403|2
39|黑幕|李仪|华光出版社|20000508|24

The writer() function

import csv

with open('data/books.csv', 'r', encoding='gbk') as rf:
    reader = csv.reader(rf)
    with open('data/books2.csv', 'w', newline='', encoding='gbk') as wf:
        writer = csv.writer(wf, delimiter='\t')
        for row in reader:
            print('|'.join(row))
            writer.writerow(row)
1|软件工程|戴国强|机械工业出版社|19980528|2
2|汇编语言|李利光|北京大学出版社|19980318|2
3|计算机基础|王飞|经济科学出版社|19980218|1
4|FLASH精选|刘扬|中国纺织出版社|19990312|2
5|java基础|王一|电子工业出版社|19990528|3
6|世界杯|柳飞|世界出版社|19990412|2
7|JAVA程序设计|张余|人民邮电出版社|19990613|1
8|新概念3|余智|外语出版社|19990723|2
9|军事要闻|张强|解放军出版社|19990722|3
10|大众生活|许阳|电子出版社|19990819|3
11|南方旅游|王爱国|南方出版社|19990930|2
13|幽灵|钱力华|华光出版社|19991008|1
14|期货分析|孙宝|飞鸟出版社|19991122|3
15|人工智能|周未|机械工业出版社|19991223|3
16|数据库系统概念|吴红|机械工业出版社|20000328|3
17|计算机理论基础|戴家|机械工业出版社|20000218|4
18|编译原理|郑键|机械工业出版社|20000415|2
19|通讯与网络|欧阳杰|机械工业出版社|20000517|1
20|现代操作系统|王小国|机械工业出版社|20010128|1
21|网络基础|王大尉|北京大学出版社|20000617|1
22|万紫千红|丛丽|北京大学出版社|20000702|3
23|经济概论|思佳|北京大学出版社|20000819|3
24|经济与科学|毛波|经济科学出版社|20000923|2
25|计算机体系结构|方丹|机械工业出版社|20000328|4
26|软件工程|牛田|经济科学出版社|20000328|4
27|世界语言大观|候丙辉|经济科学出版社|20000814|2
28|高级语言程序设计|寇国华|清华大学出版社|20000117|3
29|操作系统概论|聂元名|清华大学出版社|20001028|1
30|数据库及应用|孙家萧|清华大学出版社|20000619|1
31|软件工程|戴志名|电子工业出版社|20000324|3
32|SOL使用手册|贺民|电子工业出版社|19990425|2
33|模拟电路|邓英才|电子工业出版社|20000527|2
34|集邮爱好者|李云|人民邮电出版社|20000630|1
36|高等数学|李放|人民邮电出版社|20000812|1
37|南方周末|邓光明|南方出版社|20000923|3
38|十大旅游胜地|潭晓明|南方出版社|20000403|2
39|黑幕|李仪|华光出版社|20000508|24
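
When a CSV file has a header row, `csv.DictReader` and `csv.DictWriter` let each row be handled as a dictionary keyed by column name. The sketch below works on in-memory text so that it runs without any data file. The column names `id`, `title`, and `author` are assumptions, since the `books.csv` used above has no header row.

```python
import csv
import io

# Illustrative in-memory data; the column names are assumptions
source = io.StringIO(
    'id,title,author\n'
    '1,软件工程,戴国强\n'
    '2,汇编语言,李利光\n'
)

# Each row becomes a dict keyed by the header names
rows = list(csv.DictReader(source))
print(rows[0]['title'])    # 软件工程

# Write the same rows back out, tab-separated, with a header line
out = io.StringIO(newline='')
writer = csv.DictWriter(out, fieldnames=['id', 'title', 'author'], delimiter='\t')
writer.writeheader()
writer.writerows(rows)
print(out.getvalue())
```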

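The XML data interchange format

XML document structure

  1. Declaration
  2. Root element
  3. Child elements
  4. Attributes
  5. Namespaces
  6. Qualified names

Parsing an XML document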

import xml.etree.ElementTree as ET

tree = ET.parse('data1/Notes.xml')  ## build the XML document tree
print(type(tree))  ## xml.etree.ElementTree.ElementTree

root = tree.getroot()  ## root is the root element
print(type(root))  ## xml.etree.ElementTree.Element
print(root.tag)  ## Notes

for index, child in enumerate(root):
    print('第{0}个{1}元素,属性:{2}'.format(index, child.tag, child.attrib))
    for i, child_child in enumerate(child):
        print('    标签:{0},内容:{1}'.format(child_child.tag, child_child.text))
<class 'xml.etree.ElementTree.ElementTree'>
<class 'xml.etree.ElementTree.Element'>
Notes
第0个Note元素,属性:{'id': '1'}
    标签:CDate,内容:2018-3-21
    标签:Content,内容:发布Python0
    标签:UserID,内容:tony
第1个Note元素,属性:{'id': '2'}
    标签:CDate,内容:2018-3-22
    标签:Content,内容:发布Python1
    标签:UserID,内容:tony
第2个Note元素,属性:{'id': '3'}
    标签:CDate,内容:2018-3-23
    标签:Content,内容:发布Python2
    标签:UserID,内容:tony
第3个Note元素,属性:{'id': '4'}
    标签:CDate,内容:2018-3-24
    标签:Content,内容:发布Python3
    标签:UserID,内容:tony
第4个Note元素,属性:{'id': '5'}
    标签:CDate,内容:2018-3-25
    标签:Content,内容:发布Python4
    标签:UserID,内容:tony

XPath

  1. find(match, namespaces=None)
  2. findall(match, namespaces=None)
  3. findtext(match, default=None, namespaces=None)
import xml.etree.ElementTree as ET

tree = ET.parse('data1/Notes.xml')
root = tree.getroot()

node = root.find("./Note")  ## First Note child of the current node
print(node.tag, node.attrib)
node = root.find("./Note/CDate")  ## First CDate node under a Note child
print(node.text)
node = root.find("./Note/CDate/..")  ## Parent of that CDate node, i.e. the Note node
print(node.tag, node.attrib)
node = root.find(".//CDate")  ## First CDate node among all descendants of the current node
print(node.text)

node = root.find("./Note[@id]")  ## First Note node that has an id attribute
print(node.tag, node.attrib)

node = root.find("./Note[@id='2']")  ## Note node whose id attribute equals '2'
print(node.tag, node.attrib)

node = root.find("./Note[2]")  ## Second Note node (positions are 1-based)
print(node.tag, node.attrib)

node = root.find("./Note[last()]")  ## Last Note node
print(node.tag, node.attrib)

node = root.find("./Note[last()-2]")  ## Third Note node from the end
print(node.tag, node.attrib)
Note {'id': '1'}
2018-3-21
Note {'id': '1'}
2018-3-21
Note {'id': '1'}
Note {'id': '2'}
Note {'id': '2'}
Note {'id': '5'}
Note {'id': '3'}
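
The example above only uses find(). As a complement, here is a minimal sketch of findall() and findtext(), and of what find() returns when nothing matches. The inline XML is a cut-down stand-in for Notes.xml, and the `Title` element is hypothetical, chosen because it does not exist.

```python
import xml.etree.ElementTree as ET

# Cut-down stand-in for Notes.xml.
xml_text = """<Notes>
    <Note id="1"><CDate>2018-3-21</CDate><UserID>tony</UserID></Note>
    <Note id="2"><CDate>2018-3-22</CDate><UserID>tony</UserID></Note>
</Notes>"""
root = ET.fromstring(xml_text)

# findall() returns every match as a list of elements.
ids = [note.get('id') for note in root.findall('./Note')]

# findtext() returns the text of the first match.
first_date = root.findtext('./Note/CDate')

# findtext() falls back to `default` when nothing matches.
missing = root.findtext('./Note/Title', default='(none)')

# find() returns None when nothing matches, so check before using .tag.
no_match = root.find("./Note[@id='9']")

print(ids, first_date, missing, no_match)
```

Because find() can return None, code such as `node.tag` raises AttributeError when the path matches nothing; a guard is worth adding when the input is not under your control.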

JSON Data Interchange Format

JSON Document Structure

Encoding JSON Data

import json

## Prepare the data
py_dict = {'name': 'tony', 'age': 30, 'sex': True}  ## Create a dict
py_list = [1, 3]  ## Create a list
py_tuple = ('A', 'B', 'C')  ## Create a tuple

py_dict['a'] = py_list  ## Add the list to the dict
py_dict['b'] = py_tuple  ## Add the tuple to the dict

print(py_dict)
print(type(py_dict))  ## <class 'dict'>

## Encode to a compact JSON string
json_obj = json.dumps(py_dict)
print(json_obj)
print(type(json_obj))  ## <class 'str'>

## Encode again, this time indented by 4 spaces
json_obj = json.dumps(py_dict, indent=4)
## Print the formatted string
print(json_obj)

## Write the JSON data to data1.json
with open('data2/data1.json', 'w') as f:
    json.dump(py_dict, f)

## Write the indented JSON data to data2.json
with open('data2/data2.json', 'w') as f:
    json.dump(py_dict, f, indent=4)
{'name': 'tony', 'age': 30, 'sex': True, 'a': [1, 3], 'b': ('A', 'B', 'C')}
<class 'dict'>
{"name": "tony", "age": 30, "sex": true, "a": [1, 3], "b": ["A", "B", "C"]}
<class 'str'>
{
    "name": "tony",
    "age": 30,
    "sex": true,
    "a": [
        1,
        3
    ],
    "b": [
        "A",
        "B",
        "C"
    ]
}

Decoding JSON Data

import json

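## Prepare the data
json_obj = r'{"name": "tony", "age": 30, "sex": true, "a": [1, 3], "b": ["A", "B", "C"]}'
## Invalid JSON, kept for contrast: single-quoted strings make json.loads raise json.JSONDecodeError
#json_obj = "{'name': 'tony', 'age': 30, 'sex': true, 'a': [1, 3], 'b': ['A', 'B', 'C']}"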

py_dict = json.loads(json_obj)
print(type(py_dict))  ## <class 'dict'>
print(py_dict['name'])
print(py_dict['age'])
print(py_dict['sex'])

py_lista = py_dict['a']  ## Get the list stored under 'a'
print(py_lista)
py_listb = py_dict['b']  ## Also a list: the original tuple was encoded as a JSON array
print(py_listb)

## Read the JSON data back from data2.json
with open('data2/data2.json', 'r') as f:
    data = json.load(f)
    print(data)
    print(type(data))  ## <class 'dict'>
<class 'dict'>
tony
30
True
[1, 3]
['A', 'B', 'C']
{'name': 'tony', 'age': 30, 'sex': True, 'a': [1, 3], 'b': ['A', 'B', 'C']}
<class 'dict'>
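
The encode and decode examples together show that a round trip is not always the identity: tuples come back as lists. A minimal sketch of that behavior follows; the `none` key is a hypothetical addition, included to show that None maps to null and back.

```python
import json

# 'none' is a hypothetical extra key added for illustration.
original = {'name': 'tony', 'sex': True, 'b': ('A', 'B', 'C'), 'none': None}

text = json.dumps(original)   # Python -> JSON text
restored = json.loads(text)   # JSON text -> Python

print(text)
print(restored)

# JSON only has arrays, so the tuple does not survive the round trip.
tuple_became_list = isinstance(restored['b'], list)

# Single-quoted strings are not valid JSON and are rejected.
try:
    json.loads("{'name': 'tony'}")
    invalid_rejected = False
except json.JSONDecodeError:
    invalid_rejected = True
```

If code relies on a value being a tuple after loading, it has to convert it back explicitly, for example with `tuple(restored['b'])`.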

Configuration Files

Configuration File Structure

Reading a Configuration File

import configparser

config = configparser.ConfigParser()  ## Create the config parser object

config.read('data3/Setup.ini', encoding='utf-8')  ## Read and parse the config file

print(config.sections())  ## List all sections

section1 = config['Startup']  ## Get the Startup section
print(config.options('Startup'))  ## Option names are lowercased by default

print(section1['RequireOS'])
print(section1['RequireIE'])

print(config['Product']['msi'])

print(config['Windows 2000']['MajorVersion'])  ## Get the MajorVersion value
print(config['Windows 2000']['ServicePackMajor'])

value = config.get('Windows 2000', 'MajorVersion')  ## get() returns the value as a string
print(type(value))  ## <class 'str'>

value = config.getint('Windows 2000', 'MajorVersion')  ## getint() converts the value to an int
print(type(value))  ## <class 'int'>
['Startup', 'Product', 'Windows 2000']
['requireos', 'requiremsi', 'requireie']
Windows 2000
6.0.2600.0
AcroRead.msi
5
4
<class 'str'>
<class 'int'>
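
Setup.ini itself is not reproduced in these notes, so the following sketch parses an inline string instead. The section and option names are taken from the output above; the `RequireMSI` value and the `Timeout` option are assumptions made only for illustration.

```python
import configparser

# Inline stand-in for Setup.ini; the RequireMSI value is assumed.
ini_text = """
[Startup]
RequireOS = Windows 2000
RequireMSI = 3.1
RequireIE = 6.0.2600.0

[Windows 2000]
MajorVersion = 5
ServicePackMajor = 4
"""

config = configparser.ConfigParser()
config.read_string(ini_text)

# Option names are normalized to lower case when parsed.
options = config.options('Startup')

# getint() converts the stored string to an int.
major = config.getint('Windows 2000', 'MajorVersion')

# 'Timeout' is a hypothetical missing option; fallback avoids NoOptionError.
timeout = config.getint('Startup', 'Timeout', fallback=30)

# Lookups are case-insensitive for option names.
same = config['Startup']['requireos'] == config['Startup']['RequireOS']

print(options, major, timeout, same)
```

Section names, unlike option names, stay case-sensitive, so `config['startup']` would raise KeyError here.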

Writing a Configuration File

import configparser

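config = configparser.ConfigParser()  ## Create the config parser object

config.read('data3/Setup.ini', encoding='utf-8')  ## Read and parse the config file

## Modify existing options
config['Startup']['RequireMSI'] = '8.0'
config['Product']['RequireMSI'] = '4.0'

config.add_section('Section2')   # Add a section
config.set('Section2', 'name', 'Mac')   # Add an option

## Write the configuration back, using the same encoding it was read with
with open('data3/Setup.ini', 'w', encoding='utf-8') as fw:
    config.write(fw)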
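
To see what write() actually emits without touching a file, here is a small sketch that writes to an in-memory buffer. The starting content is an assumed one-option stand-in for Setup.ini.

```python
import configparser
import io

config = configparser.ConfigParser()
config.read_string("[Startup]\nRequireMSI = 3.1\n")  # assumed initial content

config['Startup']['RequireMSI'] = '8.0'   # update an existing option
config.add_section('Section2')            # add a section
config.set('Section2', 'name', 'Mac')     # add an option

buffer = io.StringIO()   # stands in for the opened file
config.write(buffer)
written = buffer.getvalue()
print(written)
```

Two things are worth knowing before overwriting a hand-edited file this way: option names are written in lower case, and comments in the original file are not preserved.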

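Database Programming

Overview of Data Persistence Techniques

  1. Text files
  2. Databases

The MySQL Database Management System

Python DB-API

Establishing a Connection

Creating a Cursor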
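
The connect, cursor, execute, fetch and close steps are the same for any DB-API driver. The sketch below uses the standard library's sqlite3 module with an in-memory database rather than MySQL, so it runs without a server. This is a substitution for illustration, not the driver used in these notes. The table and rows mirror the `user` table queried later.

```python
import sqlite3

# 1. Open a connection (in-memory database, nothing is written to disk).
connection = sqlite3.connect(':memory:')
try:
    # 2. Create a cursor.
    cursor = connection.cursor()
    try:
        # 3. Execute SQL; parameters are passed separately from the SQL text.
        cursor.execute('create table user (userid integer primary key, name text)')
        cursor.executemany('insert into user (userid, name) values (?, ?)',
                           [(1, 'Tom'), (2, 'Ben')])
        connection.commit()

        cursor.execute('select name, userid from user where userid > ? order by userid',
                       (0,))
        # 4. Fetch the result set.
        rows = cursor.fetchall()
    finally:
        # 5. Close the cursor.
        cursor.close()
finally:
    # 6. Close the connection.
    connection.close()

for name, userid in rows:
    print('id:{0} - name:{1}'.format(userid, name))
```

The main visible difference between drivers is the placeholder style: sqlite3 uses `?`, while PyMySQL uses `%s` or `%(name)s`.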

Case Study: MySQL CRUD Operations

Installing the PyMySQL Module

Query Operations

Conditional query implementation
import pymysql

## 1. 建立数据库连接
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')

try:
    ## 2. 创建游标对象
    with connection.cursor() as cursor:

        ## 3. 执行SQL操作
        ## sql = 'select name, userid from user where userid >%s'
        ## cursor.execute(sql, [0])
        sql = 'select name, userid from user where userid >%(id)s'
        cursor.execute(sql, {'id': 0})

        ## 4. 提取结果集
        result_set = cursor.fetchall()

        for row in result_set:
            print('id:{0} - name:{1}'.format(row[1], row[0]))

    ## with代码块结束 5. 关闭游标

finally:
    ## 6. 关闭数据连接
    connection.close()
id:1 - name:Tom
id:2 - name:Ben
Unconditional query implementation
import pymysql

## 1. 建立数据库连接
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')

try:
    ## 2. 创建游标对象
    with connection.cursor() as cursor:

        ## 3. 执行SQL操作
        sql = 'select max(userid) from user'
        cursor.execute(sql)

        ## 4. 提取结果集
        row = cursor.fetchone()

        if row is not None:
            print('最大用户Id :{0}'.format(row[0]))

    ## with代码块结束 5. 关闭游标

finally:
    ## 6. 关闭数据连接
    connection.close()
最大用户Id :2

Data Modification Operations

  1. Data insertion
import pymysql


## 查询最大用户Id
def read_max_userid():
    ## 1. 建立数据库连接
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='986370165',
                                 database='MyDB',
                                 charset='utf8')

    try:
        ## 2. 创建游标对象
        with connection.cursor() as cursor:

            ## 3. 执行SQL操作
            sql = 'select max(userid) from user'
            cursor.execute(sql)

            ## 4. 提取结果集
            row = cursor.fetchone()

            if row is not None:
                print('最大用户Id :{0}'.format(row[0]))
                return row[0]

        ## with代码块结束 5. 关闭游标

    finally:
        ## 6. 关闭数据连接
        connection.close()


## 1. 建立数据库连接
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')

## 查询最大值
maxid = read_max_userid()

try:
    ## 2. 创建游标对象
    with connection.cursor() as cursor:

        ## 3. 执行SQL操作
        sql = 'insert into user (userid, name) values (%s,%s)'
        nextid = maxid + 1
        name = 'Tony' + str(nextid)
        affectedcount = cursor.execute(sql, (nextid, name))

        print('影响的数据行数:{0}'.format(affectedcount))
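        ## 4. Commit the database transaction
        connection.commit()

    ## End of the with block: 5. the cursor is closed

except pymysql.DatabaseError:
    ## 4. Roll back the database transaction
    connection.rollback()
finally:
    ## 6. Close the database connection
    connection.close()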
最大用户Id :2
影响的数据行数:1
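
One edge case in the insert example: `select max(userid)` returns NULL on an empty table, so `maxid + 1` would fail with a TypeError. The sketch below shows one way to guard against that while keeping the commit/rollback pattern. It uses sqlite3 with an in-memory database so it can run without a MySQL server. The helper name `insert_next_user` is hypothetical.

```python
import sqlite3


def insert_next_user(connection):
    """Insert a user with id max(userid) + 1 and return the new id."""
    cursor = connection.cursor()
    try:
        cursor.execute('select max(userid) from user')
        maxid = cursor.fetchone()[0]   # None when the table is empty
        nextid = (maxid or 0) + 1      # treat an empty table as max id 0
        cursor.execute('insert into user (userid, name) values (?, ?)',
                       (nextid, 'Tony' + str(nextid)))
        connection.commit()            # make the insert permanent
        return nextid
    except sqlite3.DatabaseError:
        connection.rollback()          # undo the partial transaction
        raise
    finally:
        cursor.close()


connection = sqlite3.connect(':memory:')
connection.execute('create table user (userid integer primary key, name text)')
first = insert_next_user(connection)
second = insert_next_user(connection)
names = [row[0] for row in
         connection.execute('select name from user order by userid')]
connection.close()
print(first, second, names)
```

Doing the read and the insert on the same connection also avoids opening a second connection just to find the maximum id.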
  2. Data update
import pymysql

## 1. 建立数据库连接
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')

try:
    ## 2. 创建游标对象
    with connection.cursor() as cursor:

        ## 3. 执行SQL操作
        sql = 'update user set name = %s where userid > %s'
        affectedcount = cursor.execute(sql, ('Tom', 2))

        print('影响的数据行数:{0}'.format(affectedcount))
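        ## 4. Commit the database transaction
        connection.commit()

    ## End of the with block: 5. the cursor is closed

except pymysql.DatabaseError as e:
    ## 4. Roll back the database transaction
    connection.rollback()
    print(e)
finally:
    ## 6. Close the database connection
    connection.close()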
影响的数据行数:1
  3. Data deletion
import pymysql


## 查询最大用户Id
def read_max_userid():
    ## 1. 建立数据库连接
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='986370165',
                                 database='MyDB',
                                 charset='utf8')

    try:
        ## 2. 创建游标对象
        with connection.cursor() as cursor:

            ## 3. 执行SQL操作
            sql = 'select max(userid) from user'
            cursor.execute(sql)

            ## 4. 提取结果集
            row = cursor.fetchone()

            if row is not None:
                print('最大用户Id :{0}'.format(row[0]))
                return row[0]

        ## with代码块结束 5. 关闭游标

    finally:
        ## 6. 关闭数据连接
        connection.close()


## 1. 建立数据库连接
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')

## 查询最大值
maxid = read_max_userid()

try:
    ## 2. 创建游标对象
    with connection.cursor() as cursor:

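        ## 3. Execute the SQL statement
        sql = 'delete from user where userid = %s'
        affectedcount = cursor.execute(sql, (maxid,))  ## One-element tuple; (maxid) is just maxid

        print('影响的数据行数:{0}'.format(affectedcount))
        ## 4. Commit the database transaction
        connection.commit()

    ## End of the with block: 5. the cursor is closed

except pymysql.DatabaseError:
    ## 4. Roll back the database transaction
    connection.rollback()
finally:
    ## 6. Close the database connection
    connection.close()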
最大用户Id :3
影响的数据行数:1

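NoSQL Data Storage

Opening and Closing a dbm Database

dbm.open(file, flag='r')

The flag is one of 'r' (read-only, the default), 'w' (read-write on an existing database), 'c' (read-write, creating the database if it does not exist) or 'n' (always create a new, empty database).

with dbm.open(file, 'c') as db:
    pass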
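
A short sketch of how the flags behave in practice. It works in a temporary directory so nothing is left behind, and the file name `demo_db` is arbitrary.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'demo_db')   # arbitrary name for the demo

    # 'c': open for reading and writing, creating the database if needed.
    with dbm.open(path, 'c') as db:
        db['name'] = 'tony'

    # 'r': open an existing database read-only; values come back as bytes.
    with dbm.open(path, 'r') as db:
        stored = db['name']

    # 'n': always start from a new, empty database.
    with dbm.open(path, 'n') as db:
        keys_after_new = list(db.keys())

print(stored, keys_after_new)
```

The with statement closes the database at the end of each block, which is what the "closing" half of this heading refers to.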

Storing Data with dbm

import dbm

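with dbm.open('mydb', 'c') as db:
    db['name'] = 'tony'  ## Store (or update) a value
    print(db['name'].decode())  ## Read it back; values are returned as bytes

    age = int(db.get('age', b'18').decode())  ## Read with a default for a missing key
    print(age)

    if 'age' in db:  ## Check whether an age entry exists
        db['age'] = '20'  ## or b'20'

    del db['name']  ## Delete the name entry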
tony
18

wxPython GUI Programming

Python GUI Toolkits

  • Tkinter
  • PyQt
  • wxPython

Installing wxPython

wxPython Basics

  • Windows
  • Controls
  • Event handling
  • Layout management

wxPython Class Hierarchy

A First wxPython Program

import wx

## Create the application object
app = wx.App()
## Create the window object
frm = wx.Frame(None, title="第一个GUI程序!", size=(400, 300), pos=(100, 100))

frm.Show()  ## Show the window

app.MainLoop()  ## Enter the main event loop
0
import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title="第一个GUI程序!", size=(400, 300), pos=(100, 100))
        ## TODO


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True

    def OnExit(self):
        print('应用程序退出')
        return 0


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环
应用程序退出
import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title="第一个GUI程序!", size=(400, 300))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(parent=self)
        statictext = wx.StaticText(parent=panel, label='Hello World!', pos=(10, 10))


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环

wxPython UI Construction Hierarchy

Event Handling

One-to-One Event Handling

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='一对一事件处理', size=(300, 180))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(parent=self)
        self.statictext = wx.StaticText(parent=panel, pos=(110, 20))
        b = wx.Button(parent=panel, label='OK', pos=(100, 50))
        self.Bind(wx.EVT_BUTTON, self.on_click, b)

    def on_click(self, event):
        print(type(event))  ## <class 'wx._core.CommandEvent'>
        self.statictext.SetLabelText('Hello, world.')


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环
<class 'wx._core.CommandEvent'>
<class 'wx._core.CommandEvent'>
<class 'wx._core.CommandEvent'>

One-to-Many Event Handling

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='一对多事件处理', size=(300, 180))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(parent=self)
        self.statictext = wx.StaticText(parent=panel, pos=(110, 15))
        b1 = wx.Button(parent=panel, id=10, label='Button1', pos=(100, 45))
        b2 = wx.Button(parent=panel, id=11, label='Button2', pos=(100, 85))
        ## self.Bind(wx.EVT_BUTTON, self.on_click, b1)
        ## self.Bind(wx.EVT_BUTTON, self.on_click, id=11)
        self.Bind(wx.EVT_BUTTON, self.on_click, id=10, id2=20)

    def on_click(self, event):
        event_id = event.GetId()
        print(event_id)
        if event_id == 10:
            self.statictext.SetLabelText('Button1单击')
        else:
            self.statictext.SetLabelText('Button2单击')


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环
10
11
10
11
10
11
10
11
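
The call `self.Bind(wx.EVT_BUTTON, self.on_click, id=10, id2=20)` attaches one handler to every control whose id lies between 10 and 20. The following is a toy model of that idea in plain Python. It is not how wxPython is implemented, and `RangeDispatcher` and its methods are hypothetical names.

```python
class RangeDispatcher:
    """Toy model: bind one handler to an inclusive range of control ids."""

    def __init__(self):
        self._bindings = []  # list of (first_id, last_id, handler)

    def bind(self, handler, first_id, last_id=None):
        # Without last_id the binding covers a single id.
        if last_id is None:
            last_id = first_id
        self._bindings.append((first_id, last_id, handler))

    def dispatch(self, event_id):
        # Call the first handler whose id range contains event_id.
        for first_id, last_id, handler in self._bindings:
            if first_id <= event_id <= last_id:
                return handler(event_id)
        return None  # no binding matched


def on_click(event_id):
    # Same branching as the on_click method in the example above.
    return 'Button1 clicked' if event_id == 10 else 'Button2 clicked'


dispatcher = RangeDispatcher()
dispatcher.bind(on_click, 10, 20)

results = [dispatcher.dispatch(i) for i in (10, 11, 20, 21)]
print(results)
```

Note the consequence of the `else` branch: any id in the range other than 10 is reported as Button2. With only two buttons, a range of 10 to 11 would describe the intent more tightly.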

Example: Mouse Event Handling

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title="鼠标事件处理", size=(400, 300))
        self.Centre()  ## 设置窗口居中
        self.Bind(wx.EVT_LEFT_DOWN, self.on_left_down)
        self.Bind(wx.EVT_LEFT_UP, self.on_left_up)
        self.Bind(wx.EVT_MOTION, self.on_mouse_move)

    def on_left_down(self, evt):
        print('鼠标按下')

    def on_left_up(self, evt):
        print('鼠标释放')

    def on_mouse_move(self, event):
        if event.Dragging() and event.LeftIsDown():
            pos = event.GetPosition()
            print(pos)


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环
鼠标按下
鼠标释放
鼠标按下
鼠标释放
鼠标按下
鼠标释放
鼠标按下
(129, 99)
鼠标释放
鼠标按下
鼠标释放
鼠标按下
鼠标释放
鼠标按下
(58, 114)
(60, 115)
(61, 116)
(62, 117)
(63, 117)
(64, 117)
(66, 118)
(67, 119)
(69, 119)
(73, 119)
(79, 119)
鼠标释放
鼠标按下
鼠标释放
鼠标按下
(81, 169)
(75, 170)
(72, 170)
(68, 171)
(65, 171)
(63, 171)
(61, 171)
(60, 171)
鼠标释放
鼠标按下
鼠标释放
鼠标按下
鼠标释放
鼠标按下
(201, 55)
(202, 55)
(204, 57)
(206, 59)
(208, 61)
(211, 63)
(214, 65)
(217, 68)
(221, 71)
(224, 74)
(228, 77)
(232, 80)
(235, 83)
(239, 85)
(241, 87)
(243, 88)
(245, 89)
(246, 91)
(249, 92)
(251, 92)
(252, 93)
(253, 93)
(254, 93)
(255, 93)
(256, 93)
(257, 94)
(259, 94)
(260, 94)
(261, 94)
(262, 94)
(264, 94)
(265, 94)
(266, 94)
(267, 94)
(269, 94)
(270, 94)
(272, 94)
(273, 94)
(275, 94)
(276, 94)
(277, 94)
(277, 93)
(278, 92)
(279, 91)
(279, 90)
(279, 89)
(279, 88)
(279, 87)
(280, 86)
(280, 85)
(280, 84)
(280, 83)
(279, 83)
(278, 84)
(277, 85)
(274, 87)
(272, 88)
(268, 91)
(264, 94)
(259, 97)
(253, 102)
(247, 107)
(240, 111)
(233, 116)
(227, 120)
(222, 123)
(219, 125)
(215, 128)
(211, 131)
(207, 133)
(201, 135)
(197, 137)
(194, 138)
(190, 139)
(186, 140)
(184, 141)
(180, 141)
(177, 141)
(175, 141)
(171, 141)
(169, 141)
(166, 140)
(162, 139)
(158, 137)
(154, 135)
(153, 133)
(149, 131)
(143, 127)
(138, 123)
(133, 120)
(129, 116)
(125, 113)
(121, 108)
(117, 104)
(114, 100)
(112, 97)
(111, 94)
(108, 88)
(106, 84)
(105, 80)
(105, 77)
(105, 73)
(105, 70)
(106, 67)
(107, 63)
(108, 61)
(110, 58)
(112, 55)
(114, 53)
(116, 51)
(119, 48)
(122, 46)
(125, 44)
(128, 43)
(132, 41)
(135, 40)
(140, 39)
(145, 38)
(150, 38)
(155, 37)
(161, 37)
(166, 37)
(171, 37)
(175, 37)
(179, 37)
(181, 37)
(185, 38)
(189, 40)
(191, 41)
(194, 43)
(197, 45)
(200, 47)
(202, 48)
(205, 50)
(208, 52)
(209, 55)
(212, 57)
(214, 59)
(216, 62)
(217, 65)
(219, 67)
(221, 70)
(222, 73)
(224, 75)
(224, 78)
(224, 79)
(224, 81)
(224, 83)
(224, 86)
(224, 88)
(224, 90)
(224, 92)
(224, 94)
(223, 96)
(222, 99)
(220, 100)
(219, 102)
(216, 104)
(213, 107)
(209, 109)
(205, 111)
(201, 113)
(196, 114)
(191, 115)
(187, 115)
(182, 116)
(179, 116)
(174, 116)
(167, 116)
(163, 116)
(158, 116)
(153, 116)
(149, 116)
(145, 115)
(143, 114)
(139, 113)
(135, 111)
(132, 110)
(130, 108)
(128, 107)
(126, 106)
(125, 105)
(123, 103)
(122, 102)
(120, 100)
(118, 96)
(116, 92)
(115, 87)
(115, 83)
(114, 79)
(114, 76)
(114, 72)
(115, 68)
(116, 65)
(117, 63)
(119, 59)
(122, 55)
(125, 52)
(128, 48)
(134, 45)
(138, 42)
(145, 39)
(153, 37)
(162, 34)
(168, 34)
(181, 34)
(191, 34)
(200, 34)
(210, 35)
(218, 37)
(228, 41)
(237, 45)
(246, 50)
(253, 54)
(259, 59)
(265, 64)
(270, 69)
(276, 74)
(281, 80)
(283, 84)
(286, 91)
(288, 96)
(292, 103)
(293, 107)
(294, 112)
(294, 116)
(294, 121)
(294, 124)
(294, 127)
(294, 129)
(292, 132)
(291, 135)
(291, 137)
(289, 139)
(287, 142)
(284, 144)
(283, 145)
(280, 147)
(277, 149)
(276, 150)
(273, 151)
(269, 153)
(264, 153)
(259, 154)
(254, 154)
(249, 154)
(244, 154)
(237, 152)
(232, 151)
(228, 150)
(223, 147)
(216, 145)
(212, 142)
(206, 138)
(203, 135)
(200, 133)
(198, 130)
(195, 127)
(194, 123)
(192, 122)
(192, 118)
(191, 116)
(191, 112)
(191, 109)
(192, 106)
(193, 104)
(195, 101)
(196, 100)
(198, 98)
(200, 96)
(201, 95)
(203, 95)
(206, 94)
(208, 93)
(211, 93)
(214, 93)
(216, 93)
(219, 94)
(221, 94)
(223, 96)
(226, 99)
(229, 102)
(232, 107)
(237, 113)
(240, 119)
(242, 126)
(245, 131)
(247, 136)
(247, 140)
(247, 145)
(247, 150)
(247, 153)
(246, 157)
(243, 161)
(240, 165)
(237, 168)
(233, 171)
(227, 175)
(223, 177)
(214, 180)
(203, 182)
(190, 182)
(178, 182)
(166, 182)
(152, 181)
(139, 180)
(126, 177)
(113, 175)
(105, 171)
(96, 167)
(91, 165)
(87, 163)
(85, 163)
(85, 162)
(84, 161)
(84, 160)
(83, 158)
鼠标释放

Layout Management

Box Sizer

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Box布局', size=(300, 120))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(parent=self)
        ## 创建垂直方向Box布局管理器对象
        vbox = wx.BoxSizer(wx.VERTICAL)
        self.statictext = wx.StaticText(parent=panel, label='Button1单击')
        ## 添加静态文本到Box布局管理器
        vbox.Add(self.statictext, proportion=2, flag=wx.FIXED_MINSIZE | wx.TOP | wx.CENTER, border=10)

        b1 = wx.Button(parent=panel, id=10, label='Button1')
        b2 = wx.Button(parent=panel, id=11, label='Button2')
        self.Bind(wx.EVT_BUTTON, self.on_click, id=10, id2=20)
        ## 创建水平方向的Box布局管理器对象
        hbox = wx.BoxSizer(wx.HORIZONTAL)
        ## 添加b1到水平Box布局管理
        hbox.Add(b1, 0, wx.EXPAND | wx.BOTTOM, 5)
        ## 添加b2到水平Box布局管理
        hbox.Add(b2, 0, wx.EXPAND | wx.BOTTOM, 5)

        ## Add the horizontal Box sizer to the vertical Box sizer
        vbox.Add(hbox, proportion=1, flag=wx.CENTER)

        panel.SetSizer(vbox)

    def on_click(self, event):
        event_id = event.GetId()
        print(event_id)
        if event_id == 10:
            self.statictext.SetLabelText('Button1单击')
        else:
            self.statictext.SetLabelText('Button2单击')


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环
10
11
10
11
10
11
10
11
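
The `proportion` argument controls how a Box sizer shares spare space along its main direction: items with proportion 0 keep their minimum size, and the rest split the remainder in the ratio of their proportions. The sketch below is a simplified model of that rule in plain Python, not wxPython's actual algorithm. The function name `distribute` and the minimum sizes used are assumptions.

```python
def distribute(total, items):
    """Split `total` pixels among items given as (min_size, proportion).

    Simplified model: each item gets its minimum size plus a share of
    the spare space proportional to its proportion value.
    """
    fixed = sum(min_size for min_size, _ in items)
    extra = max(total - fixed, 0)                 # spare space to hand out
    total_prop = sum(prop for _, prop in items)
    sizes = []
    for min_size, prop in items:
        share = extra * prop // total_prop if total_prop else 0
        sizes.append(min_size + share)
    return sizes


# Vertical sizer from the example: static text (proportion 2) above the
# button row (proportion 1). Minimum heights of 20 and 30 are assumed.
vertical = distribute(110, [(20, 2), (30, 1)])

# Horizontal sizer: both buttons use proportion 0, so neither grows.
horizontal = distribute(200, [(80, 0), (80, 0)])

print(vertical, horizontal)
```

In the example above this is why the static text area takes roughly twice as much of the spare height as the button row, while the two buttons stay at their natural width.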

StaticBox Layout

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='StaticBox布局', size=(300, 120))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(parent=self)
        ## 创建垂直方向的Box布局管理器对象
        vbox = wx.BoxSizer(wx.VERTICAL)
        self.statictext = wx.StaticText(parent=panel, label='Button1单击')
        ## 添加静态文本到Box布局管理器
        vbox.Add(self.statictext, proportion=2, flag=wx.FIXED_MINSIZE | wx.TOP | wx.CENTER, border=10)

        b1 = wx.Button(parent=panel, id=10, label='Button1')
        b2 = wx.Button(parent=panel, id=11, label='Button2')
        self.Bind(wx.EVT_BUTTON, self.on_click, id=10, id2=20)

        ## 创建静态框对象
        sb = wx.StaticBox(panel, label="按钮框")
        ## 创建水平方向的StaticBox布局管理器
        hsbox = wx.StaticBoxSizer(sb, wx.HORIZONTAL)
        ## 添加b1到水平StaticBox布局管理
        hsbox.Add(b1, 0, wx.EXPAND | wx.BOTTOM, 5)
        ## 添加b2到水平StaticBox布局管理
        hsbox.Add(b2, 0, wx.EXPAND | wx.BOTTOM, 5)

        ## Add hsbox to vbox
        vbox.Add(hsbox, proportion=1, flag=wx.CENTER)

        panel.SetSizer(vbox)

    def on_click(self, event):
        event_id = event.GetId()
        print(event_id)
        if event_id == 10:
            self.statictext.SetLabelText('Button1单击')
        else:
            self.statictext.SetLabelText('Button2单击')


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环
11
10
11

Grid Layout

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Grid布局', size=(300, 300))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(self)
        btn1 = wx.Button(panel, label='1')
        btn2 = wx.Button(panel, label='2')
        btn3 = wx.Button(panel, label='3')
        btn4 = wx.Button(panel, label='4')
        btn5 = wx.Button(panel, label='5')
        btn6 = wx.Button(panel, label='6')
        btn7 = wx.Button(panel, label='7')
        btn8 = wx.Button(panel, label='8')
        btn9 = wx.Button(panel, label='9')

        grid = wx.GridSizer(cols=3, rows=3, vgap=0, hgap=0)

        ## grid.AddMany([
        ##     (btn1, 0, wx.EXPAND),
        ##     (btn2, 0, wx.EXPAND),
        ##     (btn3, 0, wx.EXPAND),
        ##     (btn4, 0, wx.EXPAND),
        ##     (btn5, 0, wx.EXPAND),
        ##     (btn6, 0, wx.EXPAND),
        ##     (btn7, 0, wx.EXPAND),
        ##     (btn8, 0, wx.EXPAND),
        ##     (btn9, 0, wx.EXPAND)
        ## ])

        grid.Add(btn1, 0, wx.EXPAND)
        grid.Add(btn2, 0, wx.EXPAND)
        grid.Add(btn3, 0, wx.EXPAND)
        grid.Add(btn4, 0, wx.EXPAND)
        grid.Add(btn5, 0, wx.EXPAND)
        grid.Add(btn6, 0, wx.EXPAND)
        grid.Add(btn7, 0, wx.EXPAND)
        grid.Add(btn8, 0, wx.EXPAND)
        grid.Add(btn9, 0, wx.EXPAND)

        panel.SetSizer(grid)


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环

FlexGrid Layout

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='FlexGrid布局', size=(400, 200))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(parent=self)

        fgs = wx.FlexGridSizer(3, 2, 10, 10)

        title = wx.StaticText(panel, label="标题:")
        author = wx.StaticText(panel, label="作者名:")
        review = wx.StaticText(panel, label="内容:")

        tc1 = wx.TextCtrl(panel)
        tc2 = wx.TextCtrl(panel)
        tc3 = wx.TextCtrl(panel, style=wx.TE_MULTILINE)

        fgs.AddMany([title, (tc1, 1, wx.EXPAND),
                     author, (tc2, 1, wx.EXPAND),
                     review, (tc3, 1, wx.EXPAND)])

        fgs.AddGrowableRow(0, 1)
        fgs.AddGrowableRow(1, 1)
        fgs.AddGrowableRow(2, 3)
        fgs.AddGrowableCol(0, 1)
        fgs.AddGrowableCol(1, 2)

        hbox = wx.BoxSizer(wx.HORIZONTAL)
        hbox.Add(fgs, proportion=1, flag=wx.ALL | wx.EXPAND, border=15)

        panel.SetSizer(hbox)


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环

wxPython Controls

Static Text and Buttons

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='静态文本和按钮', size=(300, 200))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(parent=self)
        ## 创建垂直方向的Box布局管理器
        vbox = wx.BoxSizer(wx.VERTICAL)

        self.statictext = wx.StaticText(parent=panel, label='StaticText1', style=wx.ALIGN_CENTRE_HORIZONTAL)
        b1 = wx.Button(parent=panel, label='OK')
        self.Bind(wx.EVT_BUTTON, self.on_click, b1)

        b2 = wx.ToggleButton(panel, -1, 'ToggleButton')
        ## A toggle button emits wx.EVT_TOGGLEBUTTON, not wx.EVT_BUTTON
        self.Bind(wx.EVT_TOGGLEBUTTON, self.on_click, b2)

        bmp = wx.Bitmap('icon/1.png', wx.BITMAP_TYPE_PNG)
        b3 = wx.BitmapButton(panel, -1, bmp)
        self.Bind(wx.EVT_BUTTON, self.on_click, b3)

        ## 添加静态文本和按钮到Box布局管理器
        vbox.Add(100, 10, proportion=1, flag=wx.CENTER | wx.FIXED_MINSIZE)
        vbox.Add(self.statictext, proportion=1, flag=wx.CENTER | wx.FIXED_MINSIZE)
        vbox.Add(b1, proportion=1, flag=wx.CENTER | wx.EXPAND)
        vbox.Add(b2, proportion=1, flag=wx.CENTER | wx.EXPAND)
        vbox.Add(b3, proportion=1, flag=wx.CENTER | wx.EXPAND)

        panel.SetSizer(vbox)

    def on_click(self, event):
        self.statictext.SetLabelText('Hello, world.')


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环

Text Input Controls

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='文本框', size=(400, 200))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(self)

        hbox = wx.BoxSizer(wx.HORIZONTAL)

        fgs = wx.FlexGridSizer(3, 2, 10, 10)

        userid = wx.StaticText(panel, label="用户ID:")
        pwd = wx.StaticText(panel, label="密码:")
        content = wx.StaticText(panel, label="多行文本:")

        tc1 = wx.TextCtrl(panel)
        tc2 = wx.TextCtrl(panel, style=wx.TE_PASSWORD)
        tc3 = wx.TextCtrl(panel, style=wx.TE_MULTILINE)

        ## 设置tc1初始值
        tc1.SetValue('tony')
        ## 获取tc1值
        print('读取用户ID:{0}'.format(tc1.GetValue()))

        fgs.AddMany([userid, (tc1, 1, wx.EXPAND),
                     pwd, (tc2, 1, wx.EXPAND),
                     content, (tc3, 1, wx.EXPAND)])
        fgs.AddGrowableRow(0, 1)
        fgs.AddGrowableRow(1, 1)
        fgs.AddGrowableRow(2, 3)
        fgs.AddGrowableCol(0, 1)
        fgs.AddGrowableCol(1, 2)
        hbox.Add(fgs, proportion=1, flag=wx.ALL | wx.EXPAND, border=15)
        panel.SetSizer(hbox)


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环
读取用户ID:tony

Check Boxes and Radio Buttons

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='复选框和单选按钮', size=(400, 130))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(self)

        hbox1 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择你喜欢的编程语言:')
        cb1 = wx.CheckBox(panel, 1, 'Python')
        cb2 = wx.CheckBox(panel, 2, 'Java')
        cb2.SetValue(True)
        cb3 = wx.CheckBox(panel, 3, 'C++')
        self.Bind(wx.EVT_CHECKBOX, self.on_checkbox_click, id=1, id2=3)

        hbox1.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox1.Add(cb1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox1.Add(cb2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox1.Add(cb3, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        hbox2 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择性别:')
        radio1 = wx.RadioButton(panel, 4, '男', style=wx.RB_GROUP)
        radio2 = wx.RadioButton(panel, 5, '女')
        self.Bind(wx.EVT_RADIOBUTTON, self.on_radio1_click, id=4, id2=5)

        hbox2.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox2.Add(radio1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox2.Add(radio2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        hbox3 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择你最喜欢吃的水果:')
        radio3 = wx.RadioButton(panel, 6, '苹果', style=wx.RB_GROUP)
        radio4 = wx.RadioButton(panel, 7, '橘子')
        radio5 = wx.RadioButton(panel, 8, '香蕉')
        self.Bind(wx.EVT_RADIOBUTTON, self.on_radio2_click, id=6, id2=8)

        hbox3.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox3.Add(radio3, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox3.Add(radio4, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox3.Add(radio5, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(hbox1, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox3, 1, flag=wx.ALL | wx.EXPAND, border=5)
        panel.SetSizer(vbox)

    def on_checkbox_click(self, event):
        cb = event.GetEventObject()
        print('选择 {0},状态{1}'.format(cb.GetLabel(), event.IsChecked()))

    def on_radio1_click(self, event):
        rb = event.GetEventObject()
        print('第一组 {0} 被选中'.format(rb.GetLabel()))

    def on_radio2_click(self, event):
        rb = event.GetEventObject()
        print('第二组 {0} 被选中'.format(rb.GetLabel()))

class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环
第二组 橘子 被选中
第二组 香蕉 被选中
第一组 女 被选中
选择 C++,状态True
选择 Python,状态True

Drop-Down Lists

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='下拉列表', size=(400, 130))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(self)

        hbox1 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择你喜欢的编程语言:')

        list1 = ['Python', 'C++', 'Java']
        ch1 = wx.ComboBox(panel, -1, value='C', choices=list1, style=wx.CB_SORT)
        self.Bind(wx.EVT_COMBOBOX, self.on_combobox, ch1)

        hbox1.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox1.Add(ch1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        hbox2 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择性别:')
        list2 = ['男', '女']
        ch2 = wx.Choice(panel, -1, choices=list2)
        self.Bind(wx.EVT_CHOICE, self.on_choice, ch2)

        hbox2.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox2.Add(ch2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(hbox1, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        panel.SetSizer(vbox)

    def on_combobox(self, event):
        print('选择 {0}'.format(event.GetString()))

    def on_choice(self, event):
        print('选择 {0}'.format(event.GetString()))


class App(wx.App):

    def OnInit(self):
        ## 创建窗口对象
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## 进入主事件循环
选择 Java

List Boxes

import wx


## 自定义窗口类MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='列表', size=(350, 180))
        self.Centre()  ## 设置窗口居中
        panel = wx.Panel(self)

        hbox1 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='选择你喜欢的编程语言:')

        list1 = ['Python', 'C++', 'Java']
        lb1 = wx.ListBox(panel, -1, choices=list1, style=wx.LB_SINGLE)
        self.Bind(wx.EVT_LISTBOX, self.on_listbox1, lb1)

        hbox1.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox1.Add(lb1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        hbox2 = wx.BoxSizer(wx.HORIZONTAL)

        statictext = wx.StaticText(panel, label='Choose the fruit you like:')
        list2 = ['Apple', 'Orange', 'Banana']
        lb2 = wx.ListBox(panel, -1, choices=list2, style=wx.LB_EXTENDED)
        self.Bind(wx.EVT_LISTBOX, self.on_listbox2, lb2)

        hbox2.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox2.Add(lb2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)

        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(hbox1, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        panel.SetSizer(vbox)

    def on_listbox1(self, event):
        listbox = event.GetEventObject()
        print('Selected {0}'.format(listbox.GetSelection()))

    def on_listbox2(self, event):
        listbox = event.GetEventObject()
        print('Selected {0}'.format(listbox.GetSelections()))


class App(wx.App):

    def OnInit(self):
        ## Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## Enter the main event loop
Selected 1
Selected 2

Static bitmap control

import wx


## Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Static bitmap control', size=(300, 300))
        self.bmps = [wx.Bitmap('images/bird5.gif', wx.BITMAP_TYPE_GIF),
                     wx.Bitmap('images/bird4.gif', wx.BITMAP_TYPE_GIF),
                     wx.Bitmap('images/bird3.gif', wx.BITMAP_TYPE_GIF)]

        self.Centre()  ## Center the window
        self.panel = wx.Panel(parent=self)
        ## Create a vertical Box sizer
        vbox = wx.BoxSizer(wx.VERTICAL)

        b1 = wx.Button(parent=self.panel, id=1, label='Button1')
        b2 = wx.Button(self.panel, id=2, label='Button2')
        self.Bind(wx.EVT_BUTTON, self.on_click, id=1, id2=2)

        self.image = wx.StaticBitmap(self.panel, -1, self.bmps[0])

        ## Add the controls to the Box sizer
        vbox.Add(b1, proportion=1, flag=wx.CENTER | wx.EXPAND)
        vbox.Add(b2, proportion=1, flag=wx.CENTER | wx.EXPAND)
        vbox.Add(self.image, proportion=3, flag=wx.CENTER)

        self.panel.SetSizer(vbox)

    def on_click(self, event):
        event_id = event.GetId()
        if event_id == 1:
            self.image.SetBitmap(self.bmps[1])
        else:
            self.image.SetBitmap(self.bmps[2])
        self.panel.Layout()


class App(wx.App):

    def OnInit(self):
        ## Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## Enter the main event loop

Advanced windows

Splitter window

import wx


## Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Splitter window', size=(350, 180))
        self.Centre()  ## Center the window

        splitter = wx.SplitterWindow(self, -1)
        leftpanel = wx.Panel(splitter)
        rightpanel = wx.Panel(splitter)
        splitter.SplitVertically(leftpanel, rightpanel, 100)
        splitter.SetMinimumPaneSize(80)

        list2 = ['Apple', 'Orange', 'Banana']
        lb2 = wx.ListBox(leftpanel, -1, choices=list2, style=wx.LB_SINGLE)
        self.Bind(wx.EVT_LISTBOX, self.on_listbox, lb2)

        vbox1 = wx.BoxSizer(wx.VERTICAL)
        vbox1.Add(lb2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        leftpanel.SetSizer(vbox1)

        vbox2 = wx.BoxSizer(wx.VERTICAL)
        self.content = wx.StaticText(rightpanel, label='Right panel')
        vbox2.Add(self.content, 1, flag=wx.ALL | wx.EXPAND, border=5)
        rightpanel.SetSizer(vbox2)

    def on_listbox(self, event):
        s = 'Selected {0}'.format(event.GetString())
        self.content.SetLabel(s)


class App(wx.App):

    def OnInit(self):
        ## Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## Enter the main event loop

Using a tree control

import wx


## Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Tree control', size=(500, 400))
        self.Centre()  ## Center the window

        splitter = wx.SplitterWindow(self)
        leftpanel = wx.Panel(splitter)
        rightpanel = wx.Panel(splitter)
        splitter.SplitVertically(leftpanel, rightpanel, 200)
        splitter.SetMinimumPaneSize(80)

        self.tree = self.CreateTreeCtrl(leftpanel)
        self.Bind(wx.EVT_TREE_SEL_CHANGING, self.on_click, self.tree)
        vbox1 = wx.BoxSizer(wx.VERTICAL)
        vbox1.Add(self.tree, 1, flag=wx.ALL | wx.EXPAND, border=5)
        leftpanel.SetSizer(vbox1)

        vbox2 = wx.BoxSizer(wx.VERTICAL)
        self.content = wx.StaticText(rightpanel, label='Right panel')
        vbox2.Add(self.content, 1, flag=wx.ALL | wx.EXPAND, border=5)
        rightpanel.SetSizer(vbox2)

    def on_click(self, event):
        item = event.GetItem()
        self.content.SetLabel(self.tree.GetItemText(item))

    def CreateTreeCtrl(self, parent):
        tree = wx.TreeCtrl(parent)

        items = []

        imglist = wx.ImageList(16, 16, True, 2)
        imglist.Add(wx.ArtProvider.GetBitmap(wx.ART_FOLDER, size=wx.Size(16, 16)))
        imglist.Add(wx.ArtProvider.GetBitmap(wx.ART_NORMAL_FILE, size=wx.Size(16, 16)))
        tree.AssignImageList(imglist)

        root = tree.AddRoot("TreeRoot", image=0)

        items.append(tree.AppendItem(root, "Item 1", 0))
        items.append(tree.AppendItem(root, "Item 2", 0))
        items.append(tree.AppendItem(root, "Item 3", 0))
        items.append(tree.AppendItem(root, "Item 4", 0))
        items.append(tree.AppendItem(root, "Item 5", 0))

        ## Iterate over the items directly; this also avoids shadowing the built-in id()
        for item in items:
            tree.AppendItem(item, "Subitem 1", 1)
            tree.AppendItem(item, "Subitem 2", 1)
            tree.AppendItem(item, "Subitem 3", 1)
            tree.AppendItem(item, "Subitem 4", 1)
            tree.AppendItem(item, "Subitem 5", 1)

        tree.Expand(root)  ## Expand the children of the root
        tree.Expand(items[0])  ## Expand the children of Item 1
        tree.Expand(items[3])  ## Expand the children of Item 4
        tree.SelectItem(root)  ## Select the root node

        return tree


class App(wx.App):

    def OnInit(self):
        ## Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## Enter the main event loop

Using a grid

import wx
import wx.grid

data = [['0036', '高等数学', '李放', '人民邮电出版社', '20000812', '1'],
        ['0004', 'FLASH精选', '刘扬', '中国纺织出版社', '19990312', '2'],
        ['0026', '软件工程', '牛田', '经济科学出版社', '20000328', '4'],
        ['0015', '人工智能', '周未', '机械工业出版社', '19991223', '3'],
        ['0037', '南方周末', '邓光明', '南方出版社', '20000923', '3'],
        ['0008', '新概念3', '余智', '外语出版社', '19990723', '2'],
        ['0019', '通讯与网络', '欧阳杰', '机械工业出版社', '20000517', '1'],
        ['0014', '期货分析', '孙宝', '飞鸟出版社', '19991122', '3'],
        ['0023', '经济概论', '思佳', '北京大学出版社', '20000819', '3'],
        ['0017', '计算机理论基础', '戴家', '机械工业出版社', '20000218', '4'],
        ['0002', '汇编语言', '李利光', '北京大学出版社', '19980318', '2'],
        ['0033', '模拟电路', '邓英才', '电子工业出版社', '20000527', '2'],
        ['0011', '南方旅游', '王爱国', '南方出版社', '19990930', '2'],
        ['0039', '黑幕', '李仪', '华光出版社', '20000508', '14'],
        ['0001', '软件工程', '戴国强', '机械工业出版社', '19980528', '2'],
        ['0034', '集邮爱好者', '李云', '人民邮电出版社', '20000630', '1'],
        ['0031', '软件工程', '戴志名', '电子工业出版社', '20000324', '3'],
        ['0030', '数据库及应用', '孙家萧', '清华大学出版社', '20000619', '1'],
        ['0024', '经济与科学', '毛波', '经济科学出版社', '20000923', '2'],
        ['0009', '军事要闻', '张强', '解放军出版社', '19990722', '3'],
        ['0003', '计算机基础', '王飞', '经济科学出版社', '19980218', '1'],
        ['0020', '现代操作系统', '王小国', '机械工业出版社', '20010128', '1'],
        ['0025', '计算机体系结构', '方丹', '机械工业出版社', '20000328', '4'],
        ['0010', '大众生活', '许阳', '电子出版社', '19990819', '3'],
        ['0021', '网络基础', '王大尉', '北京大学出版社', '20000617', '1'],
        ['0006', '世界杯', '柳飞', '世界出版社', '19990412', '2'],
        ['0028', '高级语言程序设计', '寇国华', '清华大学出版社', '20000117', '3'],
        ['0038', '十大旅游胜地', '潭晓明', '南方出版社', '20000403', '2'],
        ['0018', '编译原理', '郑键', '机械工业出版社', '20000415', '2'],
        ['0007', 'JAVA程序设计', '张余', '人民邮电出版社', '19990613', '1'],
        ['0013', '幽灵', '钱力华', '华光出版社', '19991008', '1'],
        ['0022', '万紫千红', '丛丽', '北京大学出版社', '20000702', '3'],
        ['0027', '世界语言大观', '候丙辉', '经济科学出版社', '20000814', '2'],
        ['0029', '操作系统概论', '聂元名', '清华大学出版社', '20001028', '1'],
        ['0016', '数据库系统概念', '吴红', '机械工业出版社', '20000328', '3'],
        ['0005', 'java基础', '王一', '电子工业出版社', '19990528', '3'],
        ['0032', 'SQL使用手册', '贺民', '电子工业出版社', '19990425', '2']]

column_names = ['Book ID', 'Title', 'Author', 'Publisher', 'Publication date', 'Stock']


## Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Grid control', size=(550, 500))
        self.Centre()  ## Center the window
        self.grid = self.CreateGrid(self)
        self.Bind(wx.grid.EVT_GRID_LABEL_LEFT_CLICK, self.OnLabelLeftClick)

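    def OnLabelLeftClick(self, event):
        print("RowIdx:{0}".format(event.GetRow()))
        print("ColIdx:{0}".format(event.GetCol()))
        ## GetRow() is -1 when a column label is clicked, so only print row data for row labels
        if event.GetRow() >= 0:
            print(data[event.GetRow()])
        event.Skip()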

    def CreateGrid(self, parent):
        grid = wx.grid.Grid(parent)
        grid.CreateGrid(len(data), len(data[0]))

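        ## Set the column labels once, then fill in the cell values
        for col, name in enumerate(column_names):
            grid.SetColLabelValue(col, name)
        for row in range(len(data)):
            for col in range(len(data[row])):
                grid.SetCellValue(row, col, data[row][col])
        ## Auto-size rows and columns
        grid.AutoSize()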

        return grid


class App(wx.App):

    def OnInit(self):
        ## Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## Enter the main event loop

import wx
import wx.grid

data = [['0036', '高等数学', '李放', '人民邮电出版社', '20000812', '1'],
        ['0004', 'FLASH精选', '刘扬', '中国纺织出版社', '19990312', '2'],
        ['0026', '软件工程', '牛田', '经济科学出版社', '20000328', '4'],
        ['0015', '人工智能', '周未', '机械工业出版社', '19991223', '3'],
        ['0037', '南方周末', '邓光明', '南方出版社', '20000923', '3'],
        ['0008', '新概念3', '余智', '外语出版社', '19990723', '2'],
        ['0019', '通讯与网络', '欧阳杰', '机械工业出版社', '20000517', '1'],
        ['0014', '期货分析', '孙宝', '飞鸟出版社', '19991122', '3'],
        ['0023', '经济概论', '思佳', '北京大学出版社', '20000819', '3'],
        ['0017', '计算机理论基础', '戴家', '机械工业出版社', '20000218', '4'],
        ['0002', '汇编语言', '李利光', '北京大学出版社', '19980318', '2'],
        ['0033', '模拟电路', '邓英才', '电子工业出版社', '20000527', '2'],
        ['0011', '南方旅游', '王爱国', '南方出版社', '19990930', '2'],
        ['0039', '黑幕', '李仪', '华光出版社', '20000508', '14'],
        ['0001', '软件工程', '戴国强', '机械工业出版社', '19980528', '2'],
        ['0034', '集邮爱好者', '李云', '人民邮电出版社', '20000630', '1'],
        ['0031', '软件工程', '戴志名', '电子工业出版社', '20000324', '3'],
        ['0030', '数据库及应用', '孙家萧', '清华大学出版社', '20000619', '1'],
        ['0024', '经济与科学', '毛波', '经济科学出版社', '20000923', '2'],
        ['0009', '军事要闻', '张强', '解放军出版社', '19990722', '3'],
        ['0003', '计算机基础', '王飞', '经济科学出版社', '19980218', '1'],
        ['0020', '现代操作系统', '王小国', '机械工业出版社', '20010128', '1'],
        ['0025', '计算机体系结构', '方丹', '机械工业出版社', '20000328', '4'],
        ['0010', '大众生活', '许阳', '电子出版社', '19990819', '3'],
        ['0021', '网络基础', '王大尉', '北京大学出版社', '20000617', '1'],
        ['0006', '世界杯', '柳飞', '世界出版社', '19990412', '2'],
        ['0028', '高级语言程序设计', '寇国华', '清华大学出版社', '20000117', '3'],
        ['0038', '十大旅游胜地', '潭晓明', '南方出版社', '20000403', '2'],
        ['0018', '编译原理', '郑键', '机械工业出版社', '20000415', '2'],
        ['0007', 'JAVA程序设计', '张余', '人民邮电出版社', '19990613', '1'],
        ['0013', '幽灵', '钱力华', '华光出版社', '19991008', '1'],
        ['0022', '万紫千红', '丛丽', '北京大学出版社', '20000702', '3'],
        ['0027', '世界语言大观', '候丙辉', '经济科学出版社', '20000814', '2'],
        ['0029', '操作系统概论', '聂元名', '清华大学出版社', '20001028', '1'],
        ['0016', '数据库系统概念', '吴红', '机械工业出版社', '20000328', '3'],
        ['0005', 'java基础', '王一', '电子工业出版社', '19990528', '3'],
        ['0032', 'SQL使用手册', '贺民', '电子工业出版社', '19990425', '2']]

column_names = ['Book ID', 'Title', 'Author', 'Publisher', 'Publication date', 'Stock']


class MyGridTable(wx.grid.GridTableBase):
    def __init__(self):
        super().__init__()
        self.colLabels = column_names

    def GetNumberRows(self):
        return len(data)

    def GetNumberCols(self):
        return len(data[0])

    def GetValue(self, row, col):
        return data[row][col]

    def GetColLabelValue(self, col):
        return self.colLabels[col]


## Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Grid control', size=(550, 500))
        self.Centre()  ## Center the window
        self.grid = self.CreateGrid(self)
        self.Bind(wx.grid.EVT_GRID_LABEL_LEFT_CLICK, self.OnLabelLeftClick)

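    def OnLabelLeftClick(self, event):
        print("RowIdx:{0}".format(event.GetRow()))
        print("ColIdx:{0}".format(event.GetCol()))
        ## GetRow() is -1 when a column label is clicked, so only print row data for row labels
        if event.GetRow() >= 0:
            print(data[event.GetRow()])
        event.Skip()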

    def CreateGrid(self, parent):
        grid = wx.grid.Grid(parent)
        tablebase = MyGridTable()
        grid.SetTable(tablebase, True)
        ## Auto-size rows and columns
        grid.AutoSize()

        return grid


class App(wx.App):

    def OnInit(self):
        ## Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## Enter the main event loop

Using menus

import wx
import wx.grid


## Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Using menus', size=(550, 500))
        self.Centre()  ## Center the window

        ## wx.EXPAND is a sizer flag, not a window style, so only TE_MULTILINE is passed here
        self.text = wx.TextCtrl(self, -1, style=wx.TE_MULTILINE)
        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(self.text, proportion=1, flag=wx.EXPAND | wx.ALL, border=1)
        self.SetSizer(vbox)

        menubar = wx.MenuBar()

        file_menu = wx.Menu()
        new_item = wx.MenuItem(file_menu, wx.ID_NEW, text="New", kind=wx.ITEM_NORMAL)
        self.Bind(wx.EVT_MENU, self.on_newitem_click, id=wx.ID_NEW)
        file_menu.Append(new_item)
        file_menu.AppendSeparator()

        edit_menu = wx.Menu()
        copy_item = wx.MenuItem(edit_menu, 100, text="Copy", kind=wx.ITEM_NORMAL)
        edit_menu.Append(copy_item)

        cut_item = wx.MenuItem(edit_menu, 101, text="Cut", kind=wx.ITEM_NORMAL)
        edit_menu.Append(cut_item)

        paste_item = wx.MenuItem(edit_menu, 102, text="Paste", kind=wx.ITEM_NORMAL)
        edit_menu.Append(paste_item)

        self.Bind(wx.EVT_MENU, self.on_editmenu_click, id=100, id2=102)

        ## AppendSubMenu replaces the deprecated Menu.Append(id, text, submenu) form
        file_menu.AppendSubMenu(edit_menu, "Edit")

        menubar.Append(file_menu, 'File')
        self.SetMenuBar(menubar)

    def on_newitem_click(self, event):
        ## SetValue sets the text shown in a TextCtrl
        self.text.SetValue('Clicked the [New] menu item')

    def on_editmenu_click(self, event):
        event_id = event.GetId()
        if event_id == 100:
            self.text.SetValue('Clicked the [Copy] menu item')
        elif event_id == 101:
            self.text.SetValue('Clicked the [Cut] menu item')
        else:
            self.text.SetValue('Clicked the [Paste] menu item')


class App(wx.App):

    def OnInit(self):
        ## Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## Enter the main event loop

Using a toolbar

import wx
import wx.grid


## Custom window class MyFrame
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Using a toolbar', size=(550, 500))
        self.Centre()  ## Center the window

        ## wx.EXPAND is a sizer flag, not a window style, so only TE_MULTILINE is passed here
        self.text = wx.TextCtrl(self, -1, style=wx.TE_MULTILINE)
        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(self.text, proportion=1, flag=wx.EXPAND | wx.ALL, border=1)
        self.SetSizer(vbox)

        menubar = wx.MenuBar()

        file_menu = wx.Menu()
        new_item = wx.MenuItem(file_menu, wx.ID_NEW, text="New", kind=wx.ITEM_NORMAL)
        file_menu.Append(new_item)
        file_menu.AppendSeparator()

        edit_menu = wx.Menu()
        copy_item = wx.MenuItem(edit_menu, 100, text="Copy", kind=wx.ITEM_NORMAL)
        edit_menu.Append(copy_item)

        cut_item = wx.MenuItem(edit_menu, 101, text="Cut", kind=wx.ITEM_NORMAL)
        edit_menu.Append(cut_item)

        paste_item = wx.MenuItem(edit_menu, 102, text="Paste", kind=wx.ITEM_NORMAL)
        edit_menu.Append(paste_item)

        ## AppendSubMenu replaces the deprecated Menu.Append(id, text, submenu) form
        file_menu.AppendSubMenu(edit_menu, "Edit")

        menubar.Append(file_menu, 'File')
        self.SetMenuBar(menubar)

        tb = wx.ToolBar(self, wx.ID_ANY)
        self.ToolBar = tb
        tsize = (24, 24)
        new_bmp = wx.ArtProvider.GetBitmap(wx.ART_NEW, wx.ART_TOOLBAR, tsize)
        open_bmp = wx.ArtProvider.GetBitmap(wx.ART_FILE_OPEN, wx.ART_TOOLBAR, tsize)
        copy_bmp = wx.ArtProvider.GetBitmap(wx.ART_COPY, wx.ART_TOOLBAR, tsize)
        paste_bmp = wx.ArtProvider.GetBitmap(wx.ART_PASTE, wx.ART_TOOLBAR, tsize)

        tb.AddTool(10, "New", new_bmp, kind=wx.ITEM_NORMAL, shortHelp="New")
        tb.AddTool(20, "Open", open_bmp, kind=wx.ITEM_NORMAL, shortHelp="Open")
        tb.AddSeparator()
        tb.AddTool(30, "Copy", copy_bmp, kind=wx.ITEM_NORMAL, shortHelp="Copy")
        tb.AddTool(40, "Paste", paste_bmp, kind=wx.ITEM_NORMAL, shortHelp="Paste")
        tb.AddSeparator()

        tb.AddTool(201, "back", wx.Bitmap("menu_icon/back.png"), kind=wx.ITEM_NORMAL, shortHelp="Back")
        tb.AddTool(202, "forward", wx.Bitmap("menu_icon/forward.png"), kind=wx.ITEM_NORMAL, shortHelp="Forward")
        self.Bind(wx.EVT_MENU, self.on_click, id=201, id2=202)
        tb.AddSeparator()

        tb.Realize()

    def on_click(self, event):
        event_id = event.GetId()
        ## SetValue sets the text shown in a TextCtrl
        if event_id == 201:
            self.text.SetValue('Clicked the [Back] button')
        else:
            self.text.SetValue('Clicked the [Forward] button')


class App(wx.App):

    def OnInit(self):
        ## Create the window object
        frame = MyFrame()
        frame.Show()
        return True


if __name__ == '__main__':
    app = App()
    app.MainLoop()  ## Enter the main event loop

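Project 1: Web crawler and scraping stock data

Overview of web crawler technologies

Network communication

Multithreading

Data exchange

Web front-end technologies

Data storage

Scraping data

Static and dynamic data in web pages

Scraping data with urllib

  1. Get static data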
import urllib.request


url = "file:///C:/Users/HP/nasdaq-Apple1.html"
req = urllib.request.Request(url)

with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode()
    print(htmlstr)
<!doctype html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <meta name="Generator" content="EditPlus®">
    <meta name="Author" content="">
    <meta name="Keywords" content="">
    <meta name="Description" content="">
    <title>Document</title>
</head>
<body>
<div id="quotes_content_left_pnlAJAX">
    <table class="historical-data__table">
        <thead class="historical-data__table-headings">
        <tr class="historical-data__row historical-data__row--headings">
            <th class="historical-data__table-heading" scope="col">Date</th>
            <th class="historical-data__table-heading" scope="col">Open</th>
            <th class="historical-data__table-heading" scope="col">High</th>
            <th class="historical-data__table-heading" scope="col">Low</th>
            <th class="historical-data__table-heading" scope="col">Close/Last</th>
            <th class="historical-data__table-heading" scope="col">Volume</th>
        </tr>
        </thead>
        <tbody class="historical-data__table-body">
        <tr class="historical-data__row">
            <th>10/04/2019</th>
            <td>225.64</td>
            <td>227.49</td>
            <td>223.89</td>
            <td>227.01</td>
            <td>34,755,550</td>
        </tr>
        <tr class="historical-data__row">
            <th>10/03/2019</th>
            <td>218.43</td>
            <td>220.96</td>
            <td>215.132</td>
            <td>220.82</td>
            <td>30,352,690</td>
        </tr>
        <tr class="historical-data__row">
            <th>10/02/2019</th>
            <td>223.06</td>
            <td>223.58</td>
            <td>217.93</td>
            <td>218.96</td>
            <td>35,767,260</td>
        </tr>
        <tr class="historical-data__row">
            <th>10/01/2019</th>
            <td>225.07</td>
            <td>228.22</td>
            <td>224.2</td>
            <td>224.59</td>
            <td>36,187,160</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/30/2019</th>
            <td>220.9</td>
            <td>224.58</td>
            <td>220.79</td>
            <td>223.97</td>
            <td>26,318,580</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/27/2019</th>
            <td>220.54</td>
            <td>220.96</td>
            <td>217.2814</td>
            <td>218.82</td>
            <td>25,361,290</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/26/2019</th>
            <td>220</td>
            <td>220.94</td>
            <td>218.83</td>
            <td>219.89</td>
            <td>19,088,310</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/25/2019</th>
            <td>218.55</td>
            <td>221.5</td>
            <td>217.1402</td>
            <td>221.03</td>
            <td>22,481,010</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/24/2019</th>
            <td>221.03</td>
            <td>222.49</td>
            <td>217.19</td>
            <td>217.68</td>
            <td>31,434,370</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/23/2019</th>
            <td>218.95</td>
            <td>219.84</td>
            <td>217.65</td>
            <td>218.72</td>
            <td>19,419,650</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/20/2019</th>
            <td>221.38</td>
            <td>222.56</td>
            <td>217.473</td>
            <td>217.73</td>
            <td>57,977,090</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/19/2019</th>
            <td>222.01</td>
            <td>223.76</td>
            <td>220.37</td>
            <td>220.96</td>
            <td>22,187,880</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/18/2019</th>
            <td>221.06</td>
            <td>222.85</td>
            <td>219.44</td>
            <td>222.77</td>
            <td>25,643,090</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/17/2019</th>
            <td>219.96</td>
            <td>220.82</td>
            <td>219.12</td>
            <td>220.7</td>
            <td>18,386,470</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/16/2019</th>
            <td>217.73</td>
            <td>220.13</td>
            <td>217.56</td>
            <td>219.9</td>
            <td>21,158,140</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/13/2019</th>
            <td>220</td>
            <td>220.79</td>
            <td>217.02</td>
            <td>218.75</td>
            <td>39,763,300</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/12/2019</th>
            <td>224.8</td>
            <td>226.42</td>
            <td>222.86</td>
            <td>223.085</td>
            <td>32,226,670</td>
        </tr>
        <tr class="historical-data__row">
            <th>09/11/2019</th>
            <td>218.07</td>
            <td>223.71</td>
            <td>217.73</td>
            <td>223.59</td>
            <td>44,289,650</td>
        </tr>
        </tbody>
    </table>
</div>
</body>
</html>
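
The page printed above is still raw HTML. As a minimal sketch of one way to turn such a table into Python rows, the example below uses only the standard library's html.parser. The class name HistoryTableParser and the shortened sample string are illustrative assumptions and are not part of the original notebook; in practice the htmlstr value read above would be fed to the parser instead.

```python
from html.parser import HTMLParser


class HistoryTableParser(HTMLParser):
    """Collect the cell text of every <tr> inside <tbody> (hypothetical helper)."""

    def __init__(self):
        super().__init__()
        self.rows = []          # finished rows, each a list of strings
        self._row = None        # row being built, or None outside a body row
        self._in_body = False   # True between <tbody> and </tbody>
        self._in_cell = False   # True between <th>/<td> and its end tag

    def handle_starttag(self, tag, attrs):
        if tag == 'tbody':
            self._in_body = True
        elif tag == 'tr' and self._in_body:
            self._row = []
        elif tag in ('th', 'td') and self._row is not None:
            self._in_cell = True
            self._row.append('')

    def handle_endtag(self, tag):
        if tag == 'tbody':
            self._in_body = False
        elif tag == 'tr' and self._row is not None:
            self.rows.append(self._row)
            self._row = None
        elif tag in ('th', 'td'):
            self._in_cell = False

    def handle_data(self, data):
        # Only text inside a body cell is kept; header cells are skipped
        if self._in_cell:
            self._row[-1] += data.strip()


# Shortened sample with the same structure as the page above (assumption)
sample = """
<table><thead><tr><th>Date</th><th>Open</th></tr></thead>
<tbody>
<tr class="historical-data__row"><th>10/04/2019</th><td>225.64</td><td>227.49</td>
<td>223.89</td><td>227.01</td><td>34,755,550</td></tr>
</tbody></table>
"""

parser = HistoryTableParser()
parser.feed(sample)
for date, open_, high, low, close, volume in parser.rows:
    # Convert the close price and the comma-grouped volume to numbers
    print(date, float(close), int(volume.replace(',', '')))
```

The header row is skipped because it sits in thead rather than tbody, so every collected row has the six columns Date, Open, High, Low, Close/Last and Volume.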
  2. Get dynamic data
import re
import urllib.request

url = 'http://q.stock.sohu.com/hisHq?code=cn_600519&stat=1&order=D&period=d&callback=historySearchHandler&rt=jsonp&0.8115656498417958'
req = urllib.request.Request(url)

with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode('gbk')
    print(htmlstr)
    htmlstr = htmlstr.replace('historySearchHandler(', '')
    htmlstr = htmlstr.replace(')', '')
    print('替换后的:', htmlstr)
historySearchHandler([{"status":0,"hq":[["2023-04-18","1753.00","1758.00","5.00","0.29%","1746.02","1769.00","18314","322010.75","0.15%"],["2023-04-17","1740.00","1753.00","39.58","2.31%","1728.00","1753.00","30467","530340.12","0.24%"],["2023-04-14","1726.00","1713.42","-9.58","-0.56%","1704.80","1733.00","21232","364652.69","0.17%"],["2023-04-13","1690.00","1723.00","28.90","1.71%","1684.01","1723.59","29543","504931.03","0.24%"],["2023-04-12","1747.26","1694.10","-51.40","-2.94%","1692.82","1750.00","51105","873265.75","0.41%"],["2023-04-11","1793.00","1745.50","-26.20","-1.48%","1744.00","1793.00","29209","513885.44","0.23%"],["2023-04-10","1790.88","1771.70","-19.29","-1.08%","1744.00","1790.88","29418","517115.03","0.23%"],["2023-04-07","1795.00","1790.99","-5.97","-0.33%","1788.34","1806.01","13525","242816.05","0.11%"],["2023-04-06","1805.00","1796.96","-17.63","-0.97%","1788.22","1815.90","14874","267625.19","0.12%"],["2023-04-04","1812.00","1814.59","12.52","0.69%","1787.00","1815.17","20066","361427.53","0.16%"],["2023-04-03","1825.00","1802.07","-17.93","-0.99%","1800.08","1827.77","21417","387581.16","0.17%"],["2023-03-31","1825.00","1820.00","20.00","1.11%","1819.00","1848.00","27446","502479.06","0.22%"],["2023-03-30","1793.00","1800.00","10.00","0.56%","1779.00","1805.00","19257","345357.31","0.15%"],["2023-03-29","1799.00","1790.00","8.20","0.46%","1785.07","1800.00","15393","276190.94","0.12%"],["2023-03-28","1770.00","1781.80","14.01","0.79%","1765.02","1790.00","17261","307311.31","0.14%"],["2023-03-27","1778.60","1767.79","-10.83","-0.61%","1756.00","1778.60","15296","270075.59","0.12%"],["2023-03-24","1769.08","1778.62","3.76","0.21%","1766.00","1783.60","12770","226964.92","0.10%"],["2023-03-23","1766.00","1774.86","1.51","0.09%","1765.01","1791.11","17356","308282.16","0.14%"],["2023-03-22","1780.00","1773.35","-1.65","-0.09%","1765.55","1793.00","15330","272764.88","0.12%"],["2023-03-21","1735.00","1775.00","45.40","2.62%","1723.97","1785.85
","31142","549105.19","0.25%"],["2023-03-20","1751.00","1729.60","-12.40","-0.71%","1728.00","1755.00","20491","355787.22","0.16%"],["2023-03-17","1770.00","1742.00","-9.99","-0.57%","1736.00","1775.89","27023","474424.94","0.22%"],["2023-03-16","1740.00","1751.99","1.07","0.06%","1739.01","1770.00","17646","309679.09","0.14%"],["2023-03-15","1778.37","1750.92","-15.08","-0.85%","1750.12","1784.88","19213","339269.84","0.15%"],["2023-03-14","1763.78","1766.00","4.00","0.23%","1738.50","1779.88","23705","417728.91","0.19%"],["2023-03-13","1751.00","1762.00","12.00","0.69%","1749.00","1775.00","20560","362647.62","0.16%"],["2023-03-10","1751.57","1750.00","-20.02","-1.13%","1750.00","1781.00","21161","372513.91","0.17%"],["2023-03-09","1768.00","1770.02","-0.40","-0.02%","1740.00","1785.00","27612","488144.28","0.22%"],["2023-03-08","1780.02","1770.42","-17.88","-1.00%","1761.12","1785.94","22764","403578.72","0.18%"],["2023-03-07","1805.98","1788.30","-18.84","-1.04%","1788.00","1816.60","22785","410130.25","0.18%"],["2023-03-06","1818.18","1807.14","-10.90","-0.60%","1796.77","1818.50","20646","373007.94","0.16%"],["2023-03-03","1839.77","1818.04","-9.96","-0.54%","1802.48","1841.61","16198","294684.25","0.13%"],["2023-03-02","1829.00","1828.00","-10.53","-0.57%","1821.10","1838.99","13144","240529.23","0.10%"],["2023-03-01","1813.00","1838.53","24.79","1.37%","1803.23","1848.00","24458","447559.22","0.19%"],["2023-02-28","1819.00","1813.74","3.33","0.18%","1783.30","1822.01","23952","431487.69","0.19%"],["2023-02-27","1778.50","1810.41","22.41","1.25%","1775.02","1815.00","22065","397812.88","0.18%"],["2023-02-24","1810.11","1788.00","-30.00","-1.65%","1782.18","1810.19","24635","441562.16","0.20%"],["2023-02-23","1840.00","1818.00","-18.00","-0.98%","1805.25","1848.80","21881","398399.12","0.17%"],["2023-02-22","1855.01","1836.00","-31.00","-1.66%","1831.80","1863.90","21869","403101.59","0.17%"],["2023-02-21","1874.00","1867.00","-8.00","-0.43%","1851.00","1874.0
0","18751","349163.34","0.15%"],["2023-02-20","1821.00","1875.00","54.22","2.98%","1817.20","1878.80","29669","548880.00","0.24%"],["2023-02-17","1850.16","1820.78","-41.04","-2.20%","1820.05","1873.00","26443","488032.88","0.21%"],["2023-02-16","1841.34","1861.82","20.82","1.13%","1828.00","1887.00","33246","619691.50","0.26%"],["2023-02-15","1843.78","1841.00","-2.79","-0.15%","1835.81","1855.30","18177","335142.22","0.14%"],["2023-02-14","1856.46","1843.79","-12.56","-0.68%","1835.00","1857.40","19566","360176.94","0.16%"],["2023-02-13","1810.00","1856.35","46.35","2.56%","1810.00","1874.50","38147","705838.25","0.30%"],["2023-02-10","1810.10","1810.00","-8.00","-0.44%","1801.05","1818.49","17985","325385.94","0.14%"],["2023-02-09","1778.00","1818.00","34.00","1.91%","1775.01","1829.75","29754","540139.94","0.24%"],["2023-02-08","1800.01","1784.00","-13.00","-0.72%","1775.00","1805.97","16676","298057.47","0.13%"],["2023-02-07","1808.08","1797.00","2.00","0.11%","1787.73","1808.80","24322","437367.19","0.19%"],["2023-02-06","1780.00","1795.00","-23.00","-1.27%","1760.00","1795.00","42661","759573.94","0.34%"],["2023-02-03","1820.00","1818.00","-18.11","-0.99%","1795.68","1826.00","34945","632463.50","0.28%"],["2023-02-02","1848.38","1836.11","-8.86","-0.48%","1826.00","1859.00","29759","546550.94","0.24%"],["2023-02-01","1854.98","1844.97","-0.79","-0.04%","1811.40","1859.00","33974","624467.94","0.27%"],["2023-01-31","1896.50","1845.76","-42.24","-2.24%","1833.07","1899.95","32991","612831.12","0.26%"],["2023-01-30","1909.00","1888.00","27.99","1.50%","1880.00","1909.00","35923","679975.69","0.29%"],["2023-01-20","1889.19","1860.01","-20.20","-1.07%","1858.00","1898.25","25609","480735.59","0.20%"],["2023-01-19","1892.50","1880.21","-12.79","-0.68%","1866.00","1892.52","23439","440199.44","0.19%"],["2023-01-18","1914.00","1893.00","-15.00","-0.79%","1890.00","1925.30","21063","400866.53","0.17%"],["2023-01-17","1913.16","1908.00","-4.90","-0.26%","1895.00","1923
.00","21299","406832.16","0.17%"],["2023-01-16","1886.00","1912.90","25.90","1.37%","1881.00","1935.00","36848","705998.31","0.29%"],["2023-01-13","1844.18","1887.00","53.00","2.89%","1840.00","1888.00","31940","596987.62","0.25%"],["2023-01-12","1848.00","1834.00","-10.95","-0.59%","1833.00","1856.00","17193","316263.72","0.14%"],["2023-01-11","1856.00","1844.95","-9.50","-0.51%","1836.84","1860.00","22720","420148.78","0.18%"],["2023-01-10","1839.06","1854.45","13.25","0.72%","1830.50","1864.50","22732","420478.38","0.18%"],["2023-01-09","1835.00","1841.20","37.43","2.08%","1807.82","1849.98","30977","568418.12","0.25%"],["2023-01-06","1806.12","1803.77","2.77","0.15%","1787.00","1811.90","24904","448083.88","0.20%"],["2023-01-05","1737.00","1801.00","75.99","4.41%","1733.00","1801.00","47943","854158.69","0.38%"],["2023-01-04","1730.00","1725.01","-5.00","-0.29%","1716.00","1738.70","20416","352358.22","0.16%"],["2023-01-03","1731.20","1730.01","3.01","0.17%","1706.01","1738.43","26034","448776.03","0.21%"],["2022-12-30","1736.00","1727.00","8.00","0.47%","1727.00","1752.99","25333","440954.41","0.20%"],["2022-12-29","1717.00","1719.00","-14.00","-0.81%","1701.05","1726.99","22418","384449.97","0.18%"],["2022-12-28","1745.88","1733.00","0.00","0.00%","1708.01","1747.00","21438","369994.91","0.17%"],["2022-12-27","1738.00","1733.00","12.85","0.75%","1725.50","1747.15","17905","310927.03","0.14%"],["2022-12-26","1771.00","1742.06","-28.94","-1.63%","1735.02","1771.00","21384","374912.09","0.17%"],["2022-12-23","1752.40","1771.00","3.00","0.17%","1745.00","1782.00","17319","306360.84","0.14%"],["2022-12-22","1756.70","1768.00","29.00","1.67%","1745.00","1783.00","23175","409386.16","0.18%"],["2022-12-21","1724.00","1739.00","24.00","1.40%","1717.65","1739.00","22816","394892.62","0.18%"],["2022-12-20","1765.33","1715.00","-58.00","-3.27%","1682.45","1765.33","46198","794412.06","0.37%"],["2022-12-19","1798.80","1773.00","-13.87","-0.78%","1760.17","1798.80","24987",
"444723.66","0.20%"]],"code":"cn_600519","stat":["累计:","2022-12-19至2023-04-18","-28.87","-1.62%",1682.45,1935,1961308,35261288.98,"15.59%"]}])

替换后的: [{"status":0,"hq":[["2023-04-18","1753.00","1758.00","5.00","0.29%","1746.02","1769.00","18314","322010.75","0.15%"],["2023-04-17","1740.00","1753.00","39.58","2.31%","1728.00","1753.00","30467","530340.12","0.24%"],["2023-04-14","1726.00","1713.42","-9.58","-0.56%","1704.80","1733.00","21232","364652.69","0.17%"],["2023-04-13","1690.00","1723.00","28.90","1.71%","1684.01","1723.59","29543","504931.03","0.24%"],["2023-04-12","1747.26","1694.10","-51.40","-2.94%","1692.82","1750.00","51105","873265.75","0.41%"],["2023-04-11","1793.00","1745.50","-26.20","-1.48%","1744.00","1793.00","29209","513885.44","0.23%"],["2023-04-10","1790.88","1771.70","-19.29","-1.08%","1744.00","1790.88","29418","517115.03","0.23%"],["2023-04-07","1795.00","1790.99","-5.97","-0.33%","1788.34","1806.01","13525","242816.05","0.11%"],["2023-04-06","1805.00","1796.96","-17.63","-0.97%","1788.22","1815.90","14874","267625.19","0.12%"],["2023-04-04","1812.00","1814.59","12.52","0.69%","1787.00","1815.17","20066","361427.53","0.16%"],["2023-04-03","1825.00","1802.07","-17.93","-0.99%","1800.08","1827.77","21417","387581.16","0.17%"],["2023-03-31","1825.00","1820.00","20.00","1.11%","1819.00","1848.00","27446","502479.06","0.22%"],["2023-03-30","1793.00","1800.00","10.00","0.56%","1779.00","1805.00","19257","345357.31","0.15%"],["2023-03-29","1799.00","1790.00","8.20","0.46%","1785.07","1800.00","15393","276190.94","0.12%"],["2023-03-28","1770.00","1781.80","14.01","0.79%","1765.02","1790.00","17261","307311.31","0.14%"],["2023-03-27","1778.60","1767.79","-10.83","-0.61%","1756.00","1778.60","15296","270075.59","0.12%"],["2023-03-24","1769.08","1778.62","3.76","0.21%","1766.00","1783.60","12770","226964.92","0.10%"],["2023-03-23","1766.00","1774.86","1.51","0.09%","1765.01","1791.11","17356","308282.16","0.14%"],["2023-03-22","1780.00","1773.35","-1.65","-0.09%","1765.55","1793.00","15330","272764.88","0.12%"],["2023-03-21","1735.00","1775.00","45.40","2.62%","1723.97","1785.85","31142","5491
05.19","0.25%"],["2023-03-20","1751.00","1729.60","-12.40","-0.71%","1728.00","1755.00","20491","355787.22","0.16%"],["2023-03-17","1770.00","1742.00","-9.99","-0.57%","1736.00","1775.89","27023","474424.94","0.22%"],["2023-03-16","1740.00","1751.99","1.07","0.06%","1739.01","1770.00","17646","309679.09","0.14%"],["2023-03-15","1778.37","1750.92","-15.08","-0.85%","1750.12","1784.88","19213","339269.84","0.15%"],["2023-03-14","1763.78","1766.00","4.00","0.23%","1738.50","1779.88","23705","417728.91","0.19%"],["2023-03-13","1751.00","1762.00","12.00","0.69%","1749.00","1775.00","20560","362647.62","0.16%"],["2023-03-10","1751.57","1750.00","-20.02","-1.13%","1750.00","1781.00","21161","372513.91","0.17%"],["2023-03-09","1768.00","1770.02","-0.40","-0.02%","1740.00","1785.00","27612","488144.28","0.22%"],["2023-03-08","1780.02","1770.42","-17.88","-1.00%","1761.12","1785.94","22764","403578.72","0.18%"],["2023-03-07","1805.98","1788.30","-18.84","-1.04%","1788.00","1816.60","22785","410130.25","0.18%"],["2023-03-06","1818.18","1807.14","-10.90","-0.60%","1796.77","1818.50","20646","373007.94","0.16%"],["2023-03-03","1839.77","1818.04","-9.96","-0.54%","1802.48","1841.61","16198","294684.25","0.13%"],["2023-03-02","1829.00","1828.00","-10.53","-0.57%","1821.10","1838.99","13144","240529.23","0.10%"],["2023-03-01","1813.00","1838.53","24.79","1.37%","1803.23","1848.00","24458","447559.22","0.19%"],["2023-02-28","1819.00","1813.74","3.33","0.18%","1783.30","1822.01","23952","431487.69","0.19%"],["2023-02-27","1778.50","1810.41","22.41","1.25%","1775.02","1815.00","22065","397812.88","0.18%"],["2023-02-24","1810.11","1788.00","-30.00","-1.65%","1782.18","1810.19","24635","441562.16","0.20%"],["2023-02-23","1840.00","1818.00","-18.00","-0.98%","1805.25","1848.80","21881","398399.12","0.17%"],["2023-02-22","1855.01","1836.00","-31.00","-1.66%","1831.80","1863.90","21869","403101.59","0.17%"],["2023-02-21","1874.00","1867.00","-8.00","-0.43%","1851.00","1874.00","18751","349
163.34","0.15%"],["2023-02-20","1821.00","1875.00","54.22","2.98%","1817.20","1878.80","29669","548880.00","0.24%"],["2023-02-17","1850.16","1820.78","-41.04","-2.20%","1820.05","1873.00","26443","488032.88","0.21%"],["2023-02-16","1841.34","1861.82","20.82","1.13%","1828.00","1887.00","33246","619691.50","0.26%"],["2023-02-15","1843.78","1841.00","-2.79","-0.15%","1835.81","1855.30","18177","335142.22","0.14%"],["2023-02-14","1856.46","1843.79","-12.56","-0.68%","1835.00","1857.40","19566","360176.94","0.16%"],["2023-02-13","1810.00","1856.35","46.35","2.56%","1810.00","1874.50","38147","705838.25","0.30%"],["2023-02-10","1810.10","1810.00","-8.00","-0.44%","1801.05","1818.49","17985","325385.94","0.14%"],["2023-02-09","1778.00","1818.00","34.00","1.91%","1775.01","1829.75","29754","540139.94","0.24%"],["2023-02-08","1800.01","1784.00","-13.00","-0.72%","1775.00","1805.97","16676","298057.47","0.13%"],["2023-02-07","1808.08","1797.00","2.00","0.11%","1787.73","1808.80","24322","437367.19","0.19%"],["2023-02-06","1780.00","1795.00","-23.00","-1.27%","1760.00","1795.00","42661","759573.94","0.34%"],["2023-02-03","1820.00","1818.00","-18.11","-0.99%","1795.68","1826.00","34945","632463.50","0.28%"],["2023-02-02","1848.38","1836.11","-8.86","-0.48%","1826.00","1859.00","29759","546550.94","0.24%"],["2023-02-01","1854.98","1844.97","-0.79","-0.04%","1811.40","1859.00","33974","624467.94","0.27%"],["2023-01-31","1896.50","1845.76","-42.24","-2.24%","1833.07","1899.95","32991","612831.12","0.26%"],["2023-01-30","1909.00","1888.00","27.99","1.50%","1880.00","1909.00","35923","679975.69","0.29%"],["2023-01-20","1889.19","1860.01","-20.20","-1.07%","1858.00","1898.25","25609","480735.59","0.20%"],["2023-01-19","1892.50","1880.21","-12.79","-0.68%","1866.00","1892.52","23439","440199.44","0.19%"],["2023-01-18","1914.00","1893.00","-15.00","-0.79%","1890.00","1925.30","21063","400866.53","0.17%"],["2023-01-17","1913.16","1908.00","-4.90","-0.26%","1895.00","1923.00","21299","4
06832.16","0.17%"],["2023-01-16","1886.00","1912.90","25.90","1.37%","1881.00","1935.00","36848","705998.31","0.29%"],["2023-01-13","1844.18","1887.00","53.00","2.89%","1840.00","1888.00","31940","596987.62","0.25%"],["2023-01-12","1848.00","1834.00","-10.95","-0.59%","1833.00","1856.00","17193","316263.72","0.14%"],["2023-01-11","1856.00","1844.95","-9.50","-0.51%","1836.84","1860.00","22720","420148.78","0.18%"],["2023-01-10","1839.06","1854.45","13.25","0.72%","1830.50","1864.50","22732","420478.38","0.18%"],["2023-01-09","1835.00","1841.20","37.43","2.08%","1807.82","1849.98","30977","568418.12","0.25%"],["2023-01-06","1806.12","1803.77","2.77","0.15%","1787.00","1811.90","24904","448083.88","0.20%"],["2023-01-05","1737.00","1801.00","75.99","4.41%","1733.00","1801.00","47943","854158.69","0.38%"],["2023-01-04","1730.00","1725.01","-5.00","-0.29%","1716.00","1738.70","20416","352358.22","0.16%"],["2023-01-03","1731.20","1730.01","3.01","0.17%","1706.01","1738.43","26034","448776.03","0.21%"],["2022-12-30","1736.00","1727.00","8.00","0.47%","1727.00","1752.99","25333","440954.41","0.20%"],["2022-12-29","1717.00","1719.00","-14.00","-0.81%","1701.05","1726.99","22418","384449.97","0.18%"],["2022-12-28","1745.88","1733.00","0.00","0.00%","1708.01","1747.00","21438","369994.91","0.17%"],["2022-12-27","1738.00","1733.00","12.85","0.75%","1725.50","1747.15","17905","310927.03","0.14%"],["2022-12-26","1771.00","1742.06","-28.94","-1.63%","1735.02","1771.00","21384","374912.09","0.17%"],["2022-12-23","1752.40","1771.00","3.00","0.17%","1745.00","1782.00","17319","306360.84","0.14%"],["2022-12-22","1756.70","1768.00","29.00","1.67%","1745.00","1783.00","23175","409386.16","0.18%"],["2022-12-21","1724.00","1739.00","24.00","1.40%","1717.65","1739.00","22816","394892.62","0.18%"],["2022-12-20","1765.33","1715.00","-58.00","-3.27%","1682.45","1765.33","46198","794412.06","0.37%"],["2022-12-19","1798.80","1773.00","-13.87","-0.78%","1760.17","1798.80","24987","444723.66","0.
20%"]],"code":"cn_600519","stat":["累计:","2022-12-19至2023-04-18","-28.87","-1.62%",1682.45,1935,1961308,35261288.98,"15.59%"]}]
  1. Masquerading as a browser
import urllib.request


url = 'http://www.ctrip.com/'

req = urllib.request.Request(url)

req.add_header('User-Agent',
               'Mozilla/5.0 (iPhone; CPU iPhone OS 10_2_1 like Mac OS X) AppleWebKit/602.4.6 (KHTML, like Gecko) Version/10.0 Mobile/14D27 Safari/602.1')

with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode()
    if 'mobile' in htmlstr:
        print('Mobile version')
Mobile version
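
The User-Agent header can also be supplied when the Request object is created. The following is a minimal sketch, assuming a hypothetical helper `build_request` and a placeholder URL; it only builds the request and does not send it, so the header can be inspected offline.

```python
import urllib.request

# Same mobile Safari User-Agent string as in the cell above.
MOBILE_UA = ('Mozilla/5.0 (iPhone; CPU iPhone OS 10_2_1 like Mac OS X) '
             'AppleWebKit/602.4.6 (KHTML, like Gecko) Version/10.0 '
             'Mobile/14D27 Safari/602.1')


def build_request(url, user_agent=MOBILE_UA):
    """Hypothetical helper: return a Request that carries a User-Agent header."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


# Placeholder URL for illustration; nothing is sent over the network here.
req = build_request('http://www.example.com/')
# urllib stores header names capitalized, so the key is 'User-agent'.
print(req.get_header('User-agent'))
```

Passing `headers=` in the constructor is equivalent to calling `add_header` afterwards; it simply keeps the request definition in one place.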

Scraping data with Selenium

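from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()

driver.get('http://q.stock.sohu.com/cn/600519/lshq.shtml')
# Selenium 4 no longer provides find_element_by_id; use find_element with By.ID
em = driver.find_element(By.ID, 'BIZ_hq_historySearch')
print(em.text)
# driver.close()
driver.quit()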
Output from an earlier run of this cell, which still called find_element_by_id:
---------------------------------------------------------------------------

AttributeError                            Traceback (most recent call last)

Input In [2], in <cell line: 6>()
      3 driver = webdriver.Chrome()
      5 driver.get('http://q.stock.sohu.com/cn/600519/lshq.shtml')
----> 6 em = driver.find_element_by_id('BIZ_hq_historySearch')
      7 print(em.text)
      8 ## driver.close()


AttributeError: 'WebDriver' object has no attribute 'find_element_by_id'

Parsing data

Using regular expressions

import urllib.request

import os
import re

url = 'http://p.weather.com.cn/'


def findallimageurl(htmlstr):
    """Find all image URLs in the HTML source."""

    # Regular expression matching http URLs that end in .png or .jpg
    pattern = r'http://\S+(?:\.png|\.jpg)'
    return re.findall(pattern, htmlstr)


def getfilename(urlstr):
    """Extract the image file name from its URL."""

    pos = urlstr.rfind('/')
    return urlstr[pos + 1:]


# Fetch the page and collect the image URLs
url_list = []
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode()

    url_list = findallimageurl(htmlstr)

for imagesrc in url_list:
    # Download each image by its URL
    req = urllib.request.Request(imagesrc)
    with urllib.request.urlopen(req) as response:
        data = response.read()
        # Skip images smaller than 100 KB
        if len(data) < 1024 * 100:
            continue

        # Create the download folder if it does not exist
        if not os.path.exists('download'):
            os.mkdir('download')

        # Build the local file name
        filename = getfilename(imagesrc)
        filename = 'download/' + filename
        # Save the image locally
        with open(filename, 'wb') as f:
            f.write(data)

    print('Downloaded image', filename)
Downloaded image download/20230412105733E6869CA2C51FC9659543B01BCAD594C0.jpg
Downloaded image download/2023041210583373DC4BF4E9ABC5CC8C084D45FB133E3A.jpg
Downloaded image download/20230412105932202830A62B6E006C698504271BA9D52C.jpg
Downloaded image download/20230406160425985ECFF0D26CB2A423DAECD29141F4EE.jpg
Downloaded image download/20220401091431D32C5DA957F3441693885B05E271420C.jpg
Downloaded image download/2023041812043228512B6723F81BA42BC286530A7AD859.jpg
Downloaded image download/20230416152716215BBBA7CCF443222A245DA84B742444.jpg
Downloaded image download/202304160947448C2B8A7CF30225471547902BD50AB088.jpg
Downloaded image download/20230316141537671B47C5E4F520E11EE0E489187E624F.png
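
The pattern in the cell above uses a greedy `\S+`, so if a second `.jpg` or `.png` appears before the next whitespace character, both URLs end up in a single match. The sketch below, which assumes a made-up input string and a hypothetical pattern name `IMG_URL`, compares it with a stricter, non-greedy variant that also accepts https.

```python
import re

# Hypothetical stricter variant: stops at quotes, whitespace and angle
# brackets, matches lazily, and accepts both http and https.
IMG_URL = re.compile(r'https?://[^\s"\'<>]+?\.(?:png|jpg)', re.IGNORECASE)

# Made-up text with two image URLs and no whitespace between them.
css = 'url(http://example.com/a.jpg),url(http://example.com/b.jpg)'

greedy = re.findall(r'http://\S+(?:\.png|\.jpg)', css)
lazy = IMG_URL.findall(css)

print(len(greedy))  # the greedy pattern merges both URLs into one match
print(lazy)         # the lazy pattern returns them separately
```

The lazy variant stops at the first image extension it meets, so a name such as `a.jpg.bak` would be cut at `a.jpg`; whether that trade-off is acceptable depends on the pages being scraped.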

Using the BeautifulSoup library

import os
import urllib.request

from bs4 import BeautifulSoup

url = 'http://p.weather.com.cn/'


def findallimageurl(htmlstr):
    """Find the src of every .png or .jpg img tag in the HTML source."""

    sp = BeautifulSoup(htmlstr, 'html.parser')  # Python's built-in HTML parser
    # All img tag objects
    imgtaglist = sp.find_all('img')

    # Collect the src attribute of each img tag, skipping tags without one
    srclist = [u.get('src') for u in imgtaglist if u.get('src')]
    # Keep only src strings that end in .png or .jpg
    filtered_srclist = filter(lambda u: u.lower().endswith('.png')
                                        or u.lower().endswith('.jpg'), srclist)

    return filtered_srclist


def getfilename(urlstr):
    """Extract the image file name from its URL."""

    pos = urlstr.rfind('/')
    return urlstr[pos + 1:]


# Fetch the page and collect the image URLs
url_list = []
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode()

    url_list = findallimageurl(htmlstr)

for imagesrc in url_list:
    # Download each image by its URL
    req = urllib.request.Request(imagesrc)
    with urllib.request.urlopen(req) as response:
        data = response.read()
        # Skip images smaller than 100 KB
        if len(data) < 1024 * 100:
            continue

        # Create the download1 folder if it does not exist
        if not os.path.exists('download1'):
            os.mkdir('download1')

        # Build the local file name
        filename = getfilename(imagesrc)
        filename = 'download1/' + filename
        # Save the image locally
        with open(filename, 'wb') as f:
            f.write(data)

    print('Downloaded image', filename)
Downloaded image download1/20230412105733E6869CA2C51FC9659543B01BCAD594C0.jpg
Downloaded image download1/2023041210583373DC4BF4E9ABC5CC8C084D45FB133E3A.jpg
Downloaded image download1/20230412105932202830A62B6E006C698504271BA9D52C.jpg
Downloaded image download1/20230406160425985ECFF0D26CB2A423DAECD29141F4EE.jpg
Downloaded image download1/20220401091431D32C5DA957F3441693885B05E271420C.jpg
Downloaded image download1/2023041812043228512B6723F81BA42BC286530A7AD859.jpg
Downloaded image download1/20230416152716215BBBA7CCF443222A245DA84B742444.jpg
Downloaded image download1/202304160947448C2B8A7CF30225471547902BD50AB088.jpg
Downloaded image download1/20230316141537671B47C5E4F520E11EE0E489187E624F.png
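
The src attributes on a page are not always absolute http URLs; relative and protocol-relative paths are common, and urllib.request.urlopen cannot open those directly. The sketch below assumes a hypothetical helper `absolute_image_urls` and made-up src values. It resolves each src against the page URL with `urllib.parse.urljoin` before filtering by extension.

```python
from urllib.parse import urljoin, urlparse


def absolute_image_urls(page_url, srcs):
    """Hypothetical helper: resolve src values and keep .png/.jpg images."""
    result = []
    for src in srcs:
        if not src:  # img tags without a src attribute yield None
            continue
        full = urljoin(page_url, src)
        # Check the path only, so a query string does not hide the extension
        path = urlparse(full).path.lower()
        if path.endswith(('.png', '.jpg')):
            result.append(full)
    return result


# Made-up src values for illustration
srcs = ['/img/a.jpg', '//cdn.example.com/b.PNG?v=2', None, 'logo.gif',
        'http://example.com/c.jpg']
print(absolute_image_urls('http://p.weather.com.cn/', srcs))
```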

Scraping Nasdaq stock data

import datetime
import hashlib
import logging
import os
import re
import threading
import time
import urllib.request

from bs4 import BeautifulSoup

from db.db_access import insert_hisq_data





logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(threadName)s - '
                           '%(name)s - %(funcName)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

# url = 'https://www.nasdaq.com/symbol/aapl/historical#.UWdnJBDMhHk'
# Replace with your own path
url = 'file:///C:/Users/HP/nasdaq-Apple1.html'

def validateUpdate(html):
    """Check whether the data changed: return True if updated, False otherwise."""

    # Create the md5 object
    md5obj = hashlib.md5()
    md5obj.update(html.encode(encoding='utf-8'))
    md5code = md5obj.hexdigest()

    old_md5code = ''
    f_name = 'md5.txt'

    if os.path.exists(f_name):  # If the file exists, read the previous digest
        with open(f_name, 'r', encoding='utf-8') as f:
            old_md5code = f.read()

    if md5code == old_md5code:
        logger.info('Data not updated')
        return False
    else:
        # Write the new md5 digest to the file
        with open(f_name, 'w', encoding='utf-8') as f:
            f.write(md5code)
        logger.info('Data updated')
        return True


# Thread running flag
isrunning = True
# Crawler work interval in seconds
interval = 5


def controlthread_body():
    """Control thread body."""

    global interval, isrunning

    while isrunning:
        # Control the crawler schedule
        i = input('Enter Bye to stop the crawler, or a number to change the crawl interval in seconds: ')
        logger.info('Control input {0}'.format(i))
        try:
            interval = int(i)
        except ValueError:
            if i.lower() == 'bye':
                isrunning = False


def istradtime():
    """Return True during trading hours."""

    now = datetime.datetime.now()
    df = '%H%M%S'
    strnow = now.strftime(df)
    starttime = datetime.time(hour=21, minute=30).strftime(df)
    endtime = datetime.time(hour=4, minute=0).strftime(df)

    if now.weekday() == 5 \
            or now.weekday() == 6 \
            or (endtime < strnow < starttime):
        # Outside trading hours
        return False
    # Trading hours
    return True


def workthread_body():
    """Worker thread body."""

    global interval, isrunning

    while isrunning:

        if istradtime():
            # Do not crawl during trading hours
            logger.info('Trading hours, crawler sleeping for 1 hour...')
            time.sleep(60 * 60)
            continue

        logger.info('Crawler starting work...')
        req = urllib.request.Request(url)

        with urllib.request.urlopen(req) as response:
            data = response.read()
            html = data.decode()

            sp = BeautifulSoup(html, 'html.parser')
            # Select the div tags matching the CSS selector
            div = sp.select('div#quotes_content_left_pnlAJAX')
            # Take the first element of the list
            divstring = div[0]

            if validateUpdate(divstring):  # The data was updated
                # Parse the data
                trlist = sp.select('div#quotes_content_left_pnlAJAX table tbody tr')

                data = []

                for tr in trlist:
                    trtext = tr.text.strip('\n\r ')
                    if trtext == '':
                        continue

                    rows = re.split(r'\s+', trtext)
                    fields = {}
                    try:
                        df = '%m/%d/%Y'
                        fields['Date'] = datetime.datetime.strptime(rows[0], df)
                    except ValueError:
                        # Skip real-time rows (time only, e.g. 10:12)
                        continue
                    fields['Open'] = float(rows[1])
                    fields['High'] = float(rows[2])
                    fields['Low'] = float(rows[3])
                    fields['Close'] = float(rows[4])
                    fields['Volume'] = int(rows[5].replace(',', ''))
                    data.append(fields)

                # Save the data to the database
                for row in data:
                    row['Symbol'] = 'AAPL'
                    insert_hisq_data(row)

            # Put the crawler to sleep
            logger.info('Crawler sleeping for {0} seconds...'.format(interval))
            time.sleep(interval)


def main():
    """Main function."""

    global interval, isrunning
    # Create the worker thread object workthread
    workthread = threading.Thread(target=workthread_body, name='WorkThread')
    # Start thread workthread
    workthread.start()

    # Create the control thread object controlthread
    controlthread = threading.Thread(target=controlthread_body, name='ControlThread')
    # Start thread controlthread
    controlthread.start()


if __name__ == '__main__':
    main()
2023-04-19 15:46:27,709 - WorkThread - __main__ - workthread_body - INFO - Crawler starting work...
2023-04-19 15:46:28,157 - WorkThread - __main__ - validateUpdate - INFO - Data updated
2023-04-19 15:46:28,236 - WorkThread - __main__ - workthread_body - INFO - Crawler sleeping for 5 seconds...
2023-04-19 15:46:33,247 - WorkThread - __main__ - workthread_body - INFO - Crawler starting work...
2023-04-19 15:46:33,255 - WorkThread - __main__ - validateUpdate - INFO - Data not updated
2023-04-19 15:46:33,256 - WorkThread - __main__ - workthread_body - INFO - Crawler sleeping for 5 seconds...


Enter Bye to stop the crawler, or a number to change the crawl interval in seconds: 3600


2023-04-19 15:46:36,048 - ControlThread - __main__ - controlthread_body - INFO - Control input 3600


Enter Bye to stop the crawler, or a number to change the crawl interval in seconds: 

Exception in thread ControlThread:
Traceback (most recent call last):
  File "E:\anaconda\lib\threading.py", line 973, in _bootstrap_inner
    self.run()
  File "E:\anaconda\lib\threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\HP\AppData\Local\Temp\ipykernel_22288\985097547.py", line 66, in controlthread_body
EOFError: EOF when reading a line
2023-04-19 15:46:38,259 - WorkThread - __main__ - workthread_body - INFO - Crawler starting work...
2023-04-19 15:46:38,267 - WorkThread - __main__ - validateUpdate - INFO - Data not updated
2023-04-19 15:46:38,267 - WorkThread - __main__ - workthread_body - INFO - Crawler sleeping for 3600 seconds...
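
The last log line shows the worker thread entering a 3600-second time.sleep. During that sleep it cannot react to the isrunning flag, which is only checked between sleeps. A minimal sketch of one alternative follows, assuming the hypothetical names `stop_event` and `worker_body`; it uses `threading.Event.wait` as an interruptible sleep.

```python
import threading
import time

# Hypothetical names for illustration; not part of the crawler above.
stop_event = threading.Event()
ticks = []


def worker_body(interval):
    """Do one unit of work, then wait up to `interval` seconds or until stopped."""
    while not stop_event.is_set():
        ticks.append(time.monotonic())  # stands in for one crawl round
        # wait() returns early as soon as stop_event is set
        stop_event.wait(timeout=interval)


worker = threading.Thread(target=worker_body, args=(3600,), name='WorkThread')
started = time.monotonic()
worker.start()

time.sleep(0.1)    # give the worker time to start its first round
stop_event.set()   # request shutdown; the 3600 s wait ends immediately
worker.join(timeout=5)
elapsed = time.monotonic() - started

print('worker alive:', worker.is_alive())
print('rounds done:', len(ticks))
```

With an Event, the control thread only needs to call `stop_event.set()` when the user types Bye, and the worker exits without finishing its current wait.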

Advanced Pandas

import numpy as np
import pandas as pd
data = pd.Series(np.random.randn(9), index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'], [1, 2, 3, 1, 3, 1, 2, 2, 3]])
data
a  1   -0.018841
   2    0.291057
   3   -0.869647
b  1    0.500437
   3   -1.678710
c  1   -1.957127
   2   -0.563527
d  2    0.454833
   3   -0.343765
dtype: float64
data.index
MultiIndex([('a', 1),
            ('a', 2),
            ('a', 3),
            ('b', 1),
            ('b', 3),
            ('c', 1),
            ('c', 2),
            ('d', 2),
            ('d', 3)],
           )
data['b']
1    0.500437
3   -1.678710
dtype: float64
data['b':'c']
b  1    0.500437
   3   -1.678710
c  1   -1.957127
   2   -0.563527
dtype: float64
data.loc[['b','d']]
b  1    0.500437
   3   -1.678710
d  2    0.454833
   3   -0.343765
dtype: float64
data.loc[:,2]
a    0.291057
c   -0.563527
d    0.454833
dtype: float64
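
Selecting on the inner level with `data.loc[:, 2]` can also be written with `xs`, which names the level explicitly. The sketch below uses the same index as above but replaces the random numbers with fixed values (an assumption made so the result is reproducible).

```python
import numpy as np
import pandas as pd

# Same two-level index as above, with fixed values instead of random ones
data = pd.Series(np.arange(9.0),
                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 3, 1, 2, 2, 3]])

inner = data.xs(2, level=1)   # all entries whose inner-level label is 2
same = data.loc[:, 2]         # equivalent selection with loc

print(inner)
```

Because 'b' has no inner label 2, it does not appear in the result, just as in the `data.loc[:, 2]` output above.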
frame = pd.DataFrame(np.arange(12).reshape((4, 3)),
                     index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]],
                     columns=[['Ohio', 'Ohio', 'Colorado'],
                              ['Green', 'Red', 'Green']])
frame

     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
frame

state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11
frame['Ohio']

color      Green  Red
key1 key2
a    1         0    1
     2         3    4
b    1         6    7
     2         9   10
pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']], names=['state', 'color'])
MultiIndex([(    'Ohio', 'Green'),
            (    'Ohio',   'Red'),
            ('Colorado', 'Green')],
           names=['state', 'color'])
frame.swaplevel('key1', 'key2')

state      Ohio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
2    a        3   4        5
1    b        6   7        8
2    b        9  10       11
frame.sort_index(level=1)

state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
b    1        6   7        8
a    2        3   4        5
b    2        9  10       11
frame.swaplevel(0, 1).sort_index(level=0)

state      Ohio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
     b        6   7        8
2    a        3   4        5
     b        9  10       11
frame.groupby(level='key2').sum()  # replaces the deprecated frame.sum(level='key2')

state  Ohio     Colorado
color Green Red    Green
key2
1         6   8       10
2        12  14       16
frame.T.groupby(level='color').sum().T  # replaces the deprecated frame.sum(level='color', axis=1)

color      Green  Red
key1 key2
a    1         2    1
     2         8    4
b    1        14    7
     2        20   10
frame.describe()

state       Ohio              Colorado
color      Green        Red      Green
count   4.000000   4.000000   4.000000
mean    4.500000   5.500000   6.500000
std     3.872983   3.872983   3.872983
min     0.000000   1.000000   2.000000
25%     2.250000   3.250000   4.250000
50%     4.500000   5.500000   6.500000
75%     6.750000   7.750000   8.750000
max     9.000000  10.000000  11.000000
frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                      'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                      'd': [0, 1, 2, 0, 1, 2, 3]})
frame

   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3
frame2 = frame.set_index(['c', 'd'])
frame2

       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
frame.set_index(['c', 'd'], drop=False)

       a  b    c  d
c   d
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3
frame2.reset_index()

     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1
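
Apart from column order, set_index and reset_index undo each other. A small sketch that rebuilds the same frame and checks the round trip:

```python
import pandas as pd

frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1),
                      'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'],
                      'd': [0, 1, 2, 0, 1, 2, 3]})

indexed = frame.set_index(['c', 'd'])   # columns c and d become index levels
restored = indexed.reset_index()        # the levels move back to the front as columns

# Only the column order differs, so reorder before comparing
print(restored[frame.columns].equals(frame))

# The hierarchical index allows label-based lookups, e.g. all rows with c == 'two'
print(indexed.loc['two'])
```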
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})
df1

key data1
0 b 0
1 b 1
2 a 2
3 c 3
4 a 4
5 a 5
6 b 6
df2

key data2
0 a 0
1 b 1
2 d 2
pd.merge(df1,df2)

key data1 data2
0 b 0 1
1 b 1 1
2 b 6 1
3 a 2 0
4 a 4 0
5 a 5 0
pd.merge(df1,df2,on='key')

key data1 data2
0 b 0 1
1 b 1 1
2 b 6 1
3 a 2 0
4 a 4 0
5 a 5 0
df3 = pd.DataFrame({'lkey': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df4 = pd.DataFrame({'rkey': ['a', 'b', 'd'], 'data2': range(3)})
pd.merge(df3, df4, left_on='lkey', right_on='rkey')  # specify the key column of each side separately

lkey data1 rkey data2
0 b 0 b 1
1 b 1 b 1
2 b 6 b 1
3 a 2 a 0
4 a 4 a 0
5 a 5 a 0
df3

lkey data1
0 b 0
1 b 1
2 a 2
3 c 3
4 a 4
5 a 5
6 b 6
df4

rkey data2
0 a 0
1 b 1
2 d 2
pd.merge(df1,df2,how='outer') 

key data1 data2
0 b 0.0 1.0
1 b 1.0 1.0
2 b 6.0 1.0
3 a 2.0 0.0
4 a 4.0 0.0
5 a 5.0 0.0
6 c 3.0 NaN
7 d NaN 2.0
pd.merge(df1,df2,how='left')

key data1 data2
0 b 0 1.0
1 b 1 1.0
2 a 2 0.0
3 c 3 NaN
4 a 4 0.0
5 a 5 0.0
6 b 6 1.0
pd.merge(df1,df2,how='right') 

key data1 data2
0 a 2.0 0
1 a 4.0 0
2 a 5.0 0
3 b 0.0 1
4 b 1.0 1
5 b 6.0 1
6 d NaN 2
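
When comparing the inner, outer, left and right results, it can help to see which side each row came from. The sketch below uses the `indicator=True` option of `pd.merge` on the same df1 and df2; the variable name `origin` is only for illustration.

```python
import pandas as pd

df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})

# indicator=True adds a _merge column: 'both', 'left_only' or 'right_only'
merged = pd.merge(df1, df2, on='key', how='outer', indicator=True)

# Map each key to the side(s) it was found on
origin = dict(zip(merged['key'], merged['_merge'].astype(str)))
print(origin)
```

Keys marked 'both' are the rows an inner merge keeps; a left merge adds the 'left_only' rows, and a right merge adds the 'right_only' rows.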
df1

key data1
0 b 0
1 b 1
2 a 2
3 c 3
4 a 4
5 a 5
6 b 6
df2

key data2
0 a 0
1 b 1
2 d 2
left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'], 'key2': ['one', 'two', 'one'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'], 'key2': ['one', 'one', 'one', 'two'], 'rval': [4, 5, 6, 7]})
pd.merge(left, right, on=['key1', 'key2'], how='outer')

key1 key2 lval rval
0 foo one 1.0 4.0
1 foo one 1.0 5.0
2 foo two 2.0 NaN
3 bar one 3.0 6.0
4 bar two NaN 7.0
pd.merge(left, right, on='key1')

key1 key2_x lval key2_y rval
0 foo one 1 one 4
1 foo one 1 one 5
2 foo two 2 one 4
3 foo two 2 one 5
4 bar one 3 one 6
5 bar one 3 two 7
left

key1 key2 lval
0 foo one 1
1 foo two 2
2 bar one 3
right

key1 key2 rval
0 foo one 4
1 foo one 5
2 bar one 6
3 bar two 7
pd.merge(left, right, on='key1', suffixes=('_left', '_right')) 

key1 key2_left lval key2_right rval
0 foo one 1 one 4
1 foo one 1 one 5
2 foo two 2 one 4
3 foo two 2 one 5
4 bar one 3 one 6
5 bar one 3 two 7
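
Merging on key1 alone is a many-to-many join, which is why it returns six rows. A sketch of how the `validate` argument of `pd.merge` can flag this situation, using the same left and right frames (the variable name `unique_keys` is only for illustration):

```python
import pandas as pd

left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'],
                     'key2': ['one', 'two', 'one'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'],
                      'key2': ['one', 'one', 'one', 'two'], 'rval': [4, 5, 6, 7]})

# Many-to-many: 2 x 2 'foo' combinations plus 1 x 2 'bar' combinations
many = pd.merge(left, right, on='key1')
print(len(many))

# validate= asks pandas to check the expected key relationship first
try:
    pd.merge(left, right, on='key1', validate='one_to_one')
    unique_keys = True
except pd.errors.MergeError:
    unique_keys = False
print(unique_keys)
```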
import pandas as pd
left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])
left1

key value
0 a 0
1 b 1
2 a 2
3 a 3
4 b 4
5 c 5
right1

group_val
a 3.5
b 7.0
pd.merge(left1, right1, left_on='key', right_index=True)

key value group_val
0 a 0 3.5
2 a 2 3.5
3 a 3 3.5
1 b 1 7.0
4 b 4 7.0
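
The same lookup against the index of right1 can be written with DataFrame.join and its `on` argument. A sketch with the frames from above; note that join performs a left join by default, unlike pd.merge.

```python
import pandas as pd

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

# Left join: the unmatched key 'c' is kept and gets NaN
joined = left1.join(right1, on='key')
print(joined)

# how='inner' drops the unmatched row, like the pd.merge call above
inner = left1.join(right1, on='key', how='inner')
print(sorted(inner.index))
```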
import pandas as pd
import numpy as np
lefth = pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'],
                      'key2': [2000, 2001, 2002, 2001, 2002],
                      'data': np.arange(5.)})
righth = pd.DataFrame(np.arange(12).reshape((6, 2)),
                      index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'],
                             [2001, 2000, 2000, 2000, 2001, 2002]],
                      columns=['event1', 'event2'])
lefth

     key1  key2  data
0    Ohio  2000   0.0
1    Ohio  2001   1.0
2    Ohio  2002   2.0
3  Nevada  2001   3.0
4  Nevada  2002   4.0
righth

             event1  event2
Nevada 2001       0       1
       2000       2       3
Ohio   2000       4       5
       2000       6       7
       2001       8       9
       2002      10      11
pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True)

     key1  key2  data  event1  event2
0    Ohio  2000   0.0       4       5
0    Ohio  2000   0.0       6       7
1    Ohio  2001   1.0       8       9
2    Ohio  2002   2.0      10      11
3  Nevada  2001   3.0       0       1
pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True, how='outer')

     key1  key2  data  event1  event2
0    Ohio  2000   0.0     4.0     5.0
0    Ohio  2000   0.0     6.0     7.0
1    Ohio  2001   1.0     8.0     9.0
2    Ohio  2002   2.0    10.0    11.0
3  Nevada  2001   3.0     0.0     1.0
4  Nevada  2002   4.0     NaN     NaN
4  Nevada  2000   NaN     2.0     3.0
left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]], index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13, 14]], index=['b', 'c', 'd', 'e'], columns=['Missouri', 'Alabama'])
left2         

Ohio Nevada
a 1.0 2.0
c 3.0 4.0
e 5.0 6.0
right2

Missouri Alabama
b 7.0 8.0
c 9.0 10.0
d 11.0 12.0
e 13.0 14.0
pd.merge(left2, right2, how='outer', left_index=True, right_index=True)

Ohio Nevada Missouri Alabama
a 1.0 2.0 NaN NaN
b NaN NaN 7.0 8.0
c 3.0 4.0 9.0 10.0
d NaN NaN 11.0 12.0
e 5.0 6.0 13.0 14.0
left2.join(right2, how='outer')

Ohio Nevada Missouri Alabama
a 1.0 2.0 NaN NaN
b NaN NaN 7.0 8.0
c 3.0 4.0 9.0 10.0
d NaN NaN 11.0 12.0
e 5.0 6.0 13.0 14.0
another = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [16., 17.]], index=['a', 'c', 'e', 'f'], columns=['New York', 'Oregon'])
another

New York Oregon
a 7.0 8.0
c 9.0 10.0
e 11.0 12.0
f 16.0 17.0
left2.join(right2)

Ohio Nevada Missouri Alabama
a 1.0 2.0 NaN NaN
c 3.0 4.0 9.0 10.0
e 5.0 6.0 13.0 14.0
left2.join([right2, another])

Ohio Nevada Missouri Alabama New York Oregon
a 1.0 2.0 NaN NaN 7.0 8.0
c 3.0 4.0 9.0 10.0 9.0 10.0
e 5.0 6.0 13.0 14.0 11.0 12.0
left2.join([right2, another], how='outer')

Ohio Nevada Missouri Alabama New York Oregon
a 1.0 2.0 NaN NaN 7.0 8.0
c 3.0 4.0 9.0 10.0 9.0 10.0
e 5.0 6.0 13.0 14.0 11.0 12.0
b NaN NaN 7.0 8.0 NaN NaN
d NaN NaN 11.0 12.0 NaN NaN
f NaN NaN NaN NaN 16.0 17.0
s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])
pd.concat([s1, s2, s3])
a    0
b    1
c    2
d    3
e    4
f    5
g    6
dtype: int64
pd.concat([s1, s2, s3], axis=1)  # the result is a DataFrame

0 1 2
a 0.0 NaN NaN
b 1.0 NaN NaN
c NaN 2.0 NaN
d NaN 3.0 NaN
e NaN 4.0 NaN
f NaN NaN 5.0
g NaN NaN 6.0
s4 = pd.concat([s1, s3])
s4
a    0
b    1
f    5
g    6
dtype: int64
pd.concat([s1, s4], axis=1)

0 1
a 0.0 0
b 1.0 1
f NaN 5
g NaN 6
pd.concat([s1, s4], axis=1, join='inner')

0 1
a 0 0
b 1 1
pd.concat([s1, s4], axis=1, join_axes=[['a', 'c', 'b', 'e']])
---------------------------------------------------------------------------

TypeError                                 Traceback (most recent call last)

Input In [20], in <cell line: 1>()
----> 1 pd.concat([s1, s4], axis=1, join_axes=[['a', 'c', 'b', 'e']])


File E:\anaconda\lib\site-packages\pandas\util\_decorators.py:311, in deprecate_nonkeyword_arguments.<locals>.decorate.<locals>.wrapper(*args, **kwargs)
    305 if len(args) > num_allow_args:
    306     warnings.warn(
    307         msg.format(arguments=arguments),
    308         FutureWarning,
    309         stacklevel=stacklevel,
    310     )
--> 311 return func(*args, **kwargs)


TypeError: concat() got an unexpected keyword argument 'join_axes'
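
The `join_axes` argument is not accepted by the pandas version that produced this traceback. One way to get a result on a chosen set of labels is to call reindex on the concatenated frame, sketched below with the same s1 and s4:

```python
import pandas as pd

s1 = pd.Series([0, 1], index=['a', 'b'])
s3 = pd.Series([5, 6], index=['f', 'g'])
s4 = pd.concat([s1, s3])

# Concatenate first, then pick (and order) the row labels with reindex
result = pd.concat([s1, s4], axis=1).reindex(['a', 'c', 'b', 'e'])
print(result)
```

Labels that exist in neither input ('c' and 'e' here) come back as rows of NaN.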
result = pd.concat([s1, s1, s3], keys=['one', 'two', 'three'])
result
one    a    0
       b    1
two    a    0
       b    1
three  f    5
       g    6
dtype: int64
result.unstack()

a b f g
one 0.0 1.0 NaN NaN
two 0.0 1.0 NaN NaN
three NaN NaN 5.0 6.0
pd.concat([s1, s2, s3], axis=1, keys=['one', 'two', 'three'])

one two three
a 0.0 NaN NaN
b 1.0 NaN NaN
c NaN 2.0 NaN
d NaN 3.0 NaN
e NaN 4.0 NaN
f NaN NaN 5.0
g NaN NaN 6.0
df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'], columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'], columns=['three', 'four'])
df1

one two
a 0 1
b 2 3
c 4 5
df2

three four
a 5 6
c 7 8
pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])

  level1     level2
     one two  three four
a      0   1    5.0  6.0
b      2   3    NaN  NaN
c      4   5    7.0  8.0
pd.concat({'level1': df1, 'level2': df2}, axis=1)

  level1     level2
     one two  three four
a      0   1    5.0  6.0
b      2   3    NaN  NaN
c      4   5    7.0  8.0
pd.concat([df1, df2], axis=1, keys=['level1', 'level2'], names=['upper', 'lower'])

upper level1     level2
lower    one two  three four
a          0   1    5.0  6.0
b          2   3    NaN  NaN
c          4   5    7.0  8.0
df1 = pd.DataFrame(np.random.randn(3, 4), columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['b', 'd', 'a'])
df1

a b c d
0 0.527674 2.145525 1.979097 1.702063
1 -0.350557 -0.511584 -1.061349 -0.702928
2 -1.239068 -1.240555 -0.295705 0.209181
df2

b d a
0 1.718647 -2.931403 0.129779
1 1.482412 -1.022705 -1.186445
pd.concat([df1, df2], ignore_index=True)

a b c d
0 0.527674 2.145525 1.979097 1.702063
1 -0.350557 -0.511584 -1.061349 -0.702928
2 -1.239068 -1.240555 -0.295705 0.209181
3 0.129779 1.718647 NaN -2.931403
4 -1.186445 1.482412 NaN -1.022705
a = pd.Series([np.nan, 2.5, np.nan, 3.5, 4.5, np.nan], index=['f', 'e', 'd', 'c', 'b', 'a'])
b = pd.Series(np.arange(len(a), dtype=np.float64), index=['f', 'e', 'd', 'c', 'b', 'a'])
b.iloc[-1] = np.nan  # positional assignment; b[-1] relies on a deprecated integer fallback
a
f    NaN
e    2.5
d    NaN
c    3.5
b    4.5
a    NaN
dtype: float64
b
f    0.0
e    1.0
d    2.0
c    3.0
b    4.0
a    NaN
dtype: float64
np.where(pd.isnull(a), b, a)
array([0. , 2.5, 2. , 3.5, 4.5, nan])
b[:-2].combine_first(a[2:])
a    NaN
b    4.5
c    3.0
d    2.0
e    1.0
f    0.0
dtype: float64
df1 = pd.DataFrame({'a': [1., np.nan, 5., np.nan], 'b': [np.nan, 2., np.nan, 6.], 'c': range(2, 18, 4)})
df2 = pd.DataFrame({'a': [5., 4., np.nan, 3., 7.], 'b': [np.nan, 3., 4., 6., 8.]})
df1

a b c
0 1.0 NaN 2
1 NaN 2.0 6
2 5.0 NaN 10
3 NaN 6.0 14
df2

a b
0 5.0 NaN
1 4.0 3.0
2 NaN 4.0
3 3.0 6.0
4 7.0 8.0
df1.combine_first(df2)

a b c
0 1.0 NaN 2.0
1 4.0 2.0 6.0
2 5.0 4.0 10.0
3 3.0 6.0 14.0
4 7.0 8.0 NaN
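
np.where and combine_first give the same values only when both objects share the same label order. np.where works by position, while combine_first aligns on labels. A sketch with made-up Series whose labels are deliberately in a different order:

```python
import numpy as np
import pandas as pd

# Made-up data: b has the same labels as a, but in reverse order
a = pd.Series([np.nan, 2.5, np.nan], index=['x', 'y', 'z'])
b = pd.Series([10.0, 20.0, 30.0], index=['z', 'y', 'x'])

by_position = np.where(pd.isnull(a), b, a)   # ignores labels
by_label = a.combine_first(b)                # fills gaps in a from b by label

print(by_position)
print(by_label)
```

Here the missing value at 'x' is filled with 30.0 by combine_first, while np.where takes the first element of b (10.0), which actually belongs to 'z'.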
data = pd.DataFrame(np.arange(6).reshape((2, 3)), index=pd.Index(['Ohio', 'Colorado'], name='state'), columns=pd.Index(['one', 'two', 'three'], name='number'))
data

number one two three
state
Ohio 0 1 2
Colorado 3 4 5
result = data.stack()
result
state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32
result.unstack()  # by default the innermost level is unstacked

number one two three
state
Ohio 0 1 2
Colorado 3 4 5
result.unstack(0)  # pass a level number to unstack a different level; 0 is the outermost level

state Ohio Colorado
number
one 0 3
two 1 4
three 2 5
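The level passed to `unstack` can also be given by name instead of by number. A small sketch, reusing the `data` frame from above (the variables `by_number` and `by_name` are illustrative names, not part of the original):

```python
import numpy as np
import pandas as pd

data = pd.DataFrame(
    np.arange(6).reshape((2, 3)),
    index=pd.Index(['Ohio', 'Colorado'], name='state'),
    columns=pd.Index(['one', 'two', 'three'], name='number'),
)
result = data.stack()

by_number = result.unstack(0)        # level 0 is the outermost level, 'state'
by_name = result.unstack('state')    # the same level, selected by its name

print(by_number.equals(by_name))
```

Selecting by name tends to be easier to read once an index has more than two levels.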
s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])
data2
one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64
data2.unstack()  # unstacking can introduce missing data when a subgroup lacks some labels

a b c d e
one 0.0 1.0 2.0 3.0 NaN
two NaN NaN 4.0 5.0 6.0
data2.unstack().stack()
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
two  c    4.0
     d    5.0
     e    6.0
dtype: float64
data2.unstack().stack(dropna=False)
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    NaN
two  a    NaN
     b    NaN
     c    4.0
     d    5.0
     e    6.0
dtype: float64
df = pd.DataFrame({'left': result, 'right': result + 5}, columns=pd.Index(['left', 'right'], name='side'))
df

side             left  right
state    number
Ohio     one        0      5
         two        1      6
         three      2      7
Colorado one        3      8
         two        4      9
         three      5     10
df.unstack('state')

side   left          right
state  Ohio Colorado  Ohio Colorado
number
one       0        3     5        8
two       1        4     6        9
three     2        5     7       10
df.unstack('state').stack('side')

state         Colorado  Ohio
number side
one    left          3     0
       right         8     5
two    left          4     1
       right         9     6
three  left          5     2
       right        10     7
data

number one two three
state
Ohio 0 1 2
Colorado 3 4 5
data = pd.DataFrame({'k1': ['one', 'two'] * 3 + ['two'], 'k2': [1, 1, 2, 3, 3, 4, 4]})
data

k1 k2
0 one 1
1 two 1
2 one 2
3 two 3
4 one 3
5 two 4
6 two 4
data.duplicated()  # by default all columns are compared
0    False
1    False
2    False
3    False
4    False
5    False
6     True
dtype: bool
data.drop_duplicates()  # by default the first occurrence is kept

k1 k2
0 one 1
1 two 1
2 one 2
3 two 3
4 one 3
5 two 4
data['v1'] = range(7)
data.drop_duplicates(['k1'])

k1 k2 v1
0 one 1 0
1 two 1 1
data.drop_duplicates(['k1', 'k2'], keep='last')

k1 k2 v1
0 one 1 0
1 two 1 1
2 one 2 2
3 two 3 3
4 one 3 4
6 two 4 6
data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'Pastrami', 'corned beef', 'Bacon',
                'pastrami', 'honey ham', 'nova lox'], 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
data

food ounces
0 bacon 4.0
1 pulled pork 3.0
2 bacon 12.0
3 Pastrami 6.0
4 corned beef 7.5
5 Bacon 8.0
6 pastrami 3.0
7 honey ham 5.0
8 nova lox 6.0
meat_to_animal = {
  'bacon': 'pig',
  'pulled pork': 'pig',
  'pastrami': 'cow',
  'corned beef': 'cow',
  'honey ham': 'pig',
  'nova lox': 'salmon'
}
lowercased = data['food'].str.lower()
lowercased
0          bacon
1    pulled pork
2          bacon
3       pastrami
4    corned beef
5          bacon
6       pastrami
7      honey ham
8       nova lox
Name: food, dtype: object
data['animal'] = lowercased.map(meat_to_animal)
data

food ounces animal
0 bacon 4.0 pig
1 pulled pork 3.0 pig
2 bacon 12.0 pig
3 Pastrami 6.0 cow
4 corned beef 7.5 cow
5 Bacon 8.0 pig
6 pastrami 3.0 cow
7 honey ham 5.0 pig
8 nova lox 6.0 salmon
data['food'].map(lambda x: meat_to_animal[x.lower()])
0       pig
1       pig
2       pig
3       cow
4       cow
5       pig
6       cow
7       pig
8    salmon
Name: food, dtype: object
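The dict-based `map` and the lambda version give the same result here, but they behave differently when a value has no entry in the mapping. A minimal sketch, assuming a hypothetical unmapped item `'tofu'` that is not in the original data:

```python
import pandas as pd

foods = pd.Series(['bacon', 'Pastrami', 'tofu'])   # 'tofu' is a made-up unmapped item
meat_to_animal = {'bacon': 'pig', 'pastrami': 'cow'}

# Passing the dict directly: unmapped values become NaN
mapped = foods.str.lower().map(meat_to_animal)
print(mapped)

# Indexing the dict inside a lambda: an unmapped value raises KeyError
try:
    foods.map(lambda x: meat_to_animal[x.lower()])
    raised = False
except KeyError:
    raised = True
print(raised)
```

If the lambda form is preferred, `meat_to_animal.get(x.lower())` would avoid the exception by returning `None` for unknown keys.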
data = pd.Series([1., -999., 2., -999., -1000., 3.])
data
0       1.0
1    -999.0
2       2.0
3    -999.0
4   -1000.0
5       3.0
dtype: float64
data.replace(-999, np.nan)
0       1.0
1       NaN
2       2.0
3       NaN
4   -1000.0
5       3.0
dtype: float64
data.replace([-999, -1000], [np.nan, 0])
0    1.0
1    NaN
2    2.0
3    NaN
4    0.0
5    3.0
dtype: float64
data.replace({-999: np.nan, -1000: 0})
0    1.0
1    NaN
2    2.0
3    NaN
4    0.0
5    3.0
dtype: float64
data = pd.DataFrame(np.arange(12).reshape((3, 4)), index=['Ohio', 'Colorado', 'New York'], columns=['one', 'two', 'three', 'four'])
data.rename(index=str.title, columns=str.upper)

ONE TWO THREE FOUR
Ohio 0 1 2 3
Colorado 4 5 6 7
New York 8 9 10 11
transform = lambda x: x[:4].upper()
data.index.map(transform)
Index(['OHIO', 'COLO', 'NEW '], dtype='object')
data.index=data.index.map(transform)
data

one two three four
OHIO 0 1 2 3
COLO 4 5 6 7
NEW 8 9 10 11
data.rename(index={'OHIO': 'INDIANA'},  columns={'three': 'peekaboo'})

one two peekaboo four
INDIANA 0 1 2 3
COLO 4 5 6 7
NEW 8 9 10 11
data.rename(index={'OHIO': 'INDIANA'}, inplace=True)
data 

one two three four
INDIANA 0 1 2 3
COLO 4 5 6 7
NEW 8 9 10 11
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
cats = pd.cut(ages, bins)
cats
[(18, 25], (18, 25], (18, 25], (25, 35], (18, 25], ..., (25, 35], (60, 100], (35, 60], (35, 60], (25, 35]]
Length: 12
Categories (4, interval[int64, right]): [(18, 25] < (25, 35] < (35, 60] < (60, 100]]
cats.codes  # integer bin code assigned to each value
array([0, 0, 0, 1, 0, 0, 2, 1, 3, 2, 2, 1], dtype=int8)
cats.categories
IntervalIndex([(18, 25], (25, 35], (35, 60], (60, 100]], dtype='interval[int64, right]')
pd.value_counts(cats)
(18, 25]     5
(25, 35]     3
(35, 60]     3
(60, 100]    1
dtype: int64
pd.cut(ages, [18, 26, 36, 61, 100], right=False)
[[18, 26), [18, 26), [18, 26), [26, 36), [18, 26), ..., [26, 36), [61, 100), [36, 61), [36, 61), [26, 36)]
Length: 12
Categories (4, interval[int64, left]): [[18, 26) < [26, 36) < [36, 61) < [61, 100)]
group_names = ['Youth', 'YoungAdult', 'MiddleAged', 'Senior']
pd.cut(ages, bins, labels=group_names)
['Youth', 'Youth', 'Youth', 'YoungAdult', 'Youth', ..., 'YoungAdult', 'Senior', 'MiddleAged', 'MiddleAged', 'YoungAdult']
Length: 12
Categories (4, object): ['Youth' < 'YoungAdult' < 'MiddleAged' < 'Senior']
data = np.random.rand(20)
data
array([0.12967787, 0.87168374, 0.24167497, 0.56688941, 0.22964312,
       0.30205167, 0.88297675, 0.22349301, 0.18292263, 0.81072534,
       0.25054152, 0.99378214, 0.78439125, 0.3970331 , 0.89049743,
       0.51677834, 0.76808437, 0.54701119, 0.79386529, 0.25451132])
temp = pd.cut(data, 4, precision=2)  # 4 is the number of bins, not the edges; equal-width bins are computed from the min and max
temp
[(0.13, 0.35], (0.78, 0.99], (0.13, 0.35], (0.56, 0.78], (0.13, 0.35], ..., (0.35, 0.56], (0.56, 0.78], (0.35, 0.56], (0.78, 0.99], (0.13, 0.35]]
Length: 20
Categories (4, interval[float64, right]): [(0.13, 0.35] < (0.35, 0.56] < (0.56, 0.78] < (0.78, 0.99]]
pd.value_counts(temp)
(0.13, 0.35]    8
(0.78, 0.99]    7
(0.35, 0.56]    3
(0.56, 0.78]    2
dtype: int64
data = np.random.randn(1000)  # normally distributed
cats = pd.qcut(data, 4)  # cut at the sample quartiles so that each bin holds the same number of values
cats
[(-0.726, -0.00747], (-0.00747, 0.636], (-3.057, -0.726], (-3.057, -0.726], (-0.00747, 0.636], ..., (-0.726, -0.00747], (-0.726, -0.00747], (-0.726, -0.00747], (0.636, 2.834], (-3.057, -0.726]]
Length: 1000
Categories (4, interval[float64, right]): [(-3.057, -0.726] < (-0.726, -0.00747] < (-0.00747, 0.636] < (0.636, 2.834]]
pd.value_counts(cats)
(-3.057, -0.726]      250
(-0.726, -0.00747]    250
(-0.00747, 0.636]     250
(0.636, 2.834]        250
dtype: int64
pd.qcut(data, [0, 0.1, 0.5, 0.9, 1.])
[(-1.239, -0.00747], (-0.00747, 1.338], (-3.057, -1.239], (-3.057, -1.239], (-0.00747, 1.338], ..., (-1.239, -0.00747], (-1.239, -0.00747], (-1.239, -0.00747], (1.338, 2.834], (-3.057, -1.239]]
Length: 1000
Categories (4, interval[float64, right]): [(-3.057, -1.239] < (-1.239, -0.00747] < (-0.00747, 1.338] < (1.338, 2.834]]
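The difference between `cut` and `qcut` shows up most clearly in the bin counts. The sketch below uses a fixed seed (an assumption, chosen only so the run is repeatable) and compares the two on the same normally distributed sample:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)          # fixed seed, for repeatability only
values = rng.standard_normal(1000)

# cut: 4 bins of equal width, so the counts follow the shape of the distribution
equal_width = pd.Series(pd.cut(values, 4)).value_counts(sort=False)

# qcut: 4 bins with edges at the sample quartiles, so the counts are equal
equal_count = pd.Series(pd.qcut(values, 4)).value_counts(sort=False)

print(equal_width.tolist())
print(equal_count.tolist())
```

With 1000 distinct values, `qcut` puts 250 values in each bin, while the outer bins from `cut` hold far fewer values than the middle ones.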
data = pd.DataFrame(np.random.randn(1000, 4))
data.describe()

0 1 2 3
count 1000.000000 1000.000000 1000.000000 1000.000000
mean -0.068024 0.015781 0.048655 -0.019467
std 1.050557 0.963683 0.972374 1.031390
min -3.617567 -2.550853 -3.372664 -3.196753
25% -0.718715 -0.591289 -0.606569 -0.712316
50% -0.066156 0.004574 0.068207 0.000122
75% 0.627520 0.662984 0.747493 0.673216
max 2.940831 2.865724 3.369795 3.364796
col=data[2]
col[np.abs(col) > 3]
340    3.196054
445   -3.159953
533   -3.156547
628    3.369795
698   -3.372664
Name: 2, dtype: float64
data[(np.abs(data) > 3).any(axis=1)]  # select the rows containing any value whose absolute value exceeds 3

0 1 2 3
55 -3.157032 -0.841691 1.018759 -0.018302
340 0.456149 0.854559 3.196054 0.353166
343 -3.283047 -0.316560 -0.121576 0.584322
407 -0.089158 -0.604724 1.028259 3.364796
445 0.300672 -0.848071 -3.159953 0.870023
533 -0.048864 0.152498 -3.156547 -0.968370
628 1.119083 0.171787 3.369795 -0.550373
698 -0.517293 -1.208259 -3.372664 -0.418606
824 -3.459360 -0.702142 0.325501 0.653165
873 -3.617567 -1.302917 -0.577524 0.859530
923 -0.920904 -0.103102 -0.581829 -3.196753
981 0.672200 -0.274157 -0.883970 -3.038320
data[np.abs(data) > 3] = np.sign(data) * 3
data.describe()

0 1 2 3
count 1000.000000 1000.000000 1000.000000 1000.000000
mean -0.066507 0.015781 0.048779 -0.019597
std 1.045975 0.963683 0.968296 1.029555
min -3.000000 -2.550853 -3.000000 -3.000000
25% -0.718715 -0.591289 -0.606569 -0.712316
50% -0.066156 0.004574 0.068207 0.000122
75% 0.627520 0.662984 0.747493 0.673216
max 2.940831 2.865724 3.000000 3.000000
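Capping values to the interval [-3, 3] with `np.sign` can also be written with `DataFrame.clip`. A minimal sketch on freshly generated data (the seed and the names `capped` and `clipped` are assumptions made for illustration):

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)                      # fixed seed, for repeatability only
data = pd.DataFrame(rng.standard_normal((1000, 4)))

# Approach used above: overwrite outliers with +3 or -3 depending on their sign
capped = data.copy()
capped[np.abs(capped) > 3] = np.sign(capped) * 3

# Alternative: clip every value to the closed interval [-3, 3]
clipped = data.clip(lower=-3, upper=3)

print(np.allclose(capped, clipped))
```

Both approaches leave values inside the interval untouched; `clip` simply states the bounds directly.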
df = pd.DataFrame(np.arange(5 * 4).reshape((5, 4)))
sampler = np.random.permutation(5)  # array of integers giving the new row order
sampler
array([1, 2, 0, 4, 3])
df

0 1 2 3
0 0 1 2 3
1 4 5 6 7
2 8 9 10 11
3 12 13 14 15
4 16 17 18 19
df.take(sampler)

0 1 2 3
1 4 5 6 7
2 8 9 10 11
0 0 1 2 3
4 16 17 18 19
3 12 13 14 15
df.sample(n=3)

0 1 2 3
4 16 17 18 19
1 4 5 6 7
2 8 9 10 11
choices = pd.Series([5, 7, -1, 6, 4])
draws = choices.sample(n=10, replace=True)
draws
1    7
0    5
4    4
0    5
3    6
3    6
1    7
4    4
1    7
1    7
dtype: int64
df = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'],  'data1': range(6)})
df

key data1
0 b 0
1 b 1
2 a 2
3 c 3
4 a 4
5 b 5
pd.get_dummies(df['key'])

a b c
0 0 1 0
1 0 1 0
2 1 0 0
3 0 0 1
4 1 0 0
5 0 1 0
dummies = pd.get_dummies(df['key'], prefix='key')
df_with_dummy = df[['data1']].join(dummies)
df_with_dummy

data1 key_a key_b key_c
0 0 0 1 0
1 1 0 1 0
2 2 1 0 0
3 3 0 0 1
4 4 1 0 0
5 5 0 1 0
data = {'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com', 'Rob': 'rob@gmail.com', 'Wes': np.nan}
data = pd.Series(data)
data
Dave     dave@google.com
Steve    steve@gmail.com
Rob        rob@gmail.com
Wes                  NaN
dtype: object
data.isnull()
Dave     False
Steve    False
Rob      False
Wes       True
dtype: bool
data.str.contains('gmail')  # data.map can apply a string function to each value but fails on NaN; the .str methods skip NaN
Dave     False
Steve     True
Rob       True
Wes        NaN
dtype: object
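The comment on `str.contains` can be checked directly. In this sketch (same e-mail data as above, with the variable name `emails` introduced for illustration), a plain `map` with a string test raises on the missing value, while the `.str` accessor passes it through, or fills it when `na=` is given:

```python
import numpy as np
import pandas as pd

emails = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com', 'Wes': np.nan})

# map applies the function to NaN as well, and "'gmail' in nan" is a TypeError
try:
    emails.map(lambda s: 'gmail' in s)
    map_failed = False
except TypeError:
    map_failed = True

# The .str accessor skips missing values; na=False turns them into False
has_gmail = emails.str.contains('gmail', na=False)

print(map_failed)
print(has_gmail)
```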
import re
pattern='([A-Z0-9._%+-]+)@([A-Z0-9._-]+)\\.([A-Z]{2,4})'
data.str.findall(pattern, flags=re.IGNORECASE)
Dave     [(dave, google, com)]
Steve    [(steve, gmail, com)]
Rob        [(rob, gmail, com)]
Wes                        NaN
dtype: object
df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'], 'key2' : ['one', 'two', 'one', 'two', 'one'],
                   'data1' : np.random.randn(5), 'data2' : np.random.randn(5)})
df

key1 key2 data1 data2
0 a one -0.083293 0.456279
1 a two -0.442362 -0.337304
2 b one 0.244770 0.943875
3 b two 0.862879 0.444040
4 a one 0.858584 0.527193
grouped = df['data1'].groupby(df['key1'])
grouped
<pandas.core.groupby.generic.SeriesGroupBy object at 0x0000024CEBF7B880>
grouped.mean()
key1
a    0.110977
b    0.553824
Name: data1, dtype: float64
means = df['data1'].groupby([df['key1'], df['key2']]).mean()
means
key1  key2
a     one     0.387646
      two    -0.442362
b     one     0.244770
      two     0.862879
Name: data1, dtype: float64
means.unstack()

key2 one two
key1
a 0.387646 -0.442362
b 0.244770 0.862879
states = np.array(['Ohio', 'California', 'California', 'Ohio', 'Ohio'])
years = np.array([2005, 2005, 2006, 2005, 2006])
df['data1'].groupby([states, years]).mean()
California  2005   -0.442362
            2006    0.244770
Ohio        2005    0.389793
            2006    0.858584
Name: data1, dtype: float64
df.groupby('key1').mean()

data1 data2
key1
a 0.110977 0.215390
b 0.553824 0.693958
df.groupby(['key1', 'key2']).mean()

              data1     data2
key1 key2
a    one   0.387646  0.491736
     two  -0.442362 -0.337304
b    one   0.244770  0.943875
     two   0.862879  0.444040
df.groupby(['key1', 'key2']).size()  # number of rows per group; rows whose group key is missing are excluded
key1  key2
a     one     2
      two     1
b     one     1
      two     1
dtype: int64
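How `size()` treats missing values depends on where they occur. The sketch below uses a small made-up frame (not from the original) with one missing group key and one missing data value, and compares `size()` with `count()`:

```python
import numpy as np
import pandas as pd

# Hypothetical data: the last row has no group key, the second row has no value
demo = pd.DataFrame({'key': ['a', 'a', 'b', None],
                     'val': [1.0, np.nan, 3.0, 4.0]})

sizes = demo.groupby('key').size()             # rows per group, NaN values included
counts = demo.groupby('key')['val'].count()    # non-missing values per group

print(sizes)
print(counts)
```

The row with the missing key appears in neither result, while the missing `val` in group `a` is counted by `size()` but not by `count()`.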
for name, group in df.groupby('key1'):
    print(name)
    print(group)  
a
  key1 key2     data1     data2
0    a  one -0.083293  0.456279
1    a  two -0.442362 -0.337304
4    a  one  0.858584  0.527193
b
  key1 key2     data1     data2
2    b  one  0.244770  0.943875
3    b  two  0.862879  0.444040
for (k1, k2), group in df.groupby(['key1', 'key2']):
    print((k1, k2))
    print(group)
('a', 'one')
  key1 key2     data1     data2
0    a  one -0.083293  0.456279
4    a  one  0.858584  0.527193
('a', 'two')
  key1 key2     data1     data2
1    a  two -0.442362 -0.337304
('b', 'one')
  key1 key2    data1     data2
2    b  one  0.24477  0.943875
('b', 'two')
  key1 key2     data1    data2
3    b  two  0.862879  0.44404
pieces = dict(list(df.groupby('key1')))
pieces['b']

key1 key2 data1 data2
2 b one 0.244770 0.943875
3 b two 0.862879 0.444040
df.dtypes
key1      object
key2      object
data1    float64
data2    float64
dtype: object
grouped = df.groupby(df.dtypes, axis=1)
for dtype, group in grouped:
    print(dtype)
    print(group)  
float64
      data1     data2
0 -0.083293  0.456279
1 -0.442362 -0.337304
2  0.244770  0.943875
3  0.862879  0.444040
4  0.858584  0.527193
object
  key1 key2
0    a  one
1    a  two
2    b  one
3    b  two
4    a  one
people = pd.DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
people.iloc[2:3, [1, 2]] = np.nan ## Add a few NA values
people

a b c d e
Joe 0.231080 -0.440371 0.409642 -0.114867 0.328406
Steve -0.775944 -1.258328 -2.723042 0.615950 -1.263696
Wes 1.965413 NaN NaN -1.284734 0.204553
Jim -0.097869 0.182042 0.061867 -0.648661 -0.217448
Travis -0.006042 -0.612533 0.537186 0.646037 1.339316
mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red', 'f' : 'orange'}
by_column = people.groupby(mapping, axis=1)
by_column.sum()

blue red
Joe 0.294776 0.119115
Steve -2.107092 -3.297967
Wes -1.284734 2.169965
Jim -0.586794 -0.133276
Travis 1.183222 0.720741
map_series = pd.Series(mapping)
map_series
a       red
b       red
c      blue
d      blue
e       red
f    orange
dtype: object
people.groupby(map_series, axis=1).count()

blue red
Joe 2 3
Steve 2 3
Wes 1 2
Jim 2 3
Travis 2 3
people.groupby(len).sum()

a b c d e
3 2.098624 -0.258329 0.471509 -2.048262 0.315510
5 -0.775944 -1.258328 -2.723042 0.615950 -1.263696
6 -0.006042 -0.612533 0.537186 0.646037 1.339316
key_list = ['one', 'one', 'one', 'two', 'two']
people.groupby([len, key_list]).min()

              a         b         c         d         e
3 one  0.231080 -0.440371  0.409642 -1.284734  0.204553
  two -0.097869  0.182042  0.061867 -0.648661 -0.217448
5 one -0.775944 -1.258328 -2.723042  0.615950 -1.263696
6 two -0.006042 -0.612533  0.537186  0.646037  1.339316
columns = pd.MultiIndex.from_arrays([['US', 'US', 'US', 'JP', 'JP'], [1, 3, 5, 1, 3]], names=['cty', 'tenor'])
hier_df = pd.DataFrame(np.random.randn(4, 5), columns=columns)
hier_df

cty          US                            JP
tenor         1         3         5         1         3
0      0.860698 -0.379994  0.644758 -0.231480  0.346634
1      1.237142  0.038387  0.600247  0.431467  0.137392
2     -2.211133  1.528952  0.056726 -0.629724 -0.125510
3     -1.272170 -1.088555 -1.950819 -0.253229  0.910727
hier_df.groupby(level='cty', axis=1).count()

cty JP US
0 2 3
1 2 3
2 2 3
3 2 3
df

key1 key2 data1 data2
0 a one -0.083293 0.456279
1 a two -0.442362 -0.337304
2 b one 0.244770 0.943875
3 b two 0.862879 0.444040
4 a one 0.858584 0.527193
grouped = df.groupby('key1')
grouped['data1'].quantile(0.9)  # value at the 0.9 quantile; interpolated when it falls between two data points
key1
a    0.670209
b    0.801068
Name: data1, dtype: float64
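The interpolation can be reproduced by hand. This sketch takes the three `data1` values of group `a` shown above and applies linear interpolation, which is the default method of `Series.quantile`; the variable names are illustrative:

```python
import numpy as np
import pandas as pd

# data1 values of group 'a', copied from the frame printed above
values = pd.Series([-0.083293, -0.442362, 0.858584])

q = values.quantile(0.9)

# Manual linear interpolation between the two neighbouring sorted values
s = np.sort(values.to_numpy())
pos = 0.9 * (len(s) - 1)            # fractional position 1.8 in the sorted data
lower = int(np.floor(pos))          # index 1
frac = pos - lower                  # about 0.8
manual = s[lower] + frac * (s[lower + 1] - s[lower])

print(q, manual)
```

Both values agree with the 0.670209 reported for group `a`.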
def peak_to_peak(arr):
    # range of each group: largest value minus smallest value
    return arr.max() - arr.min()
grouped[['data1', 'data2']].agg(peak_to_peak)  # aggregate only the numeric columns; the string column key2 cannot be subtracted

data1 data2
key1
a 1.300946 0.864497
b 0.618109 0.499836
grouped.describe()

     data1
     count      mean       std       min       25%       50%       75%       max
key1
a      3.0  0.110977  0.671878 -0.442362 -0.262827 -0.083293  0.387646  0.858584
b      2.0  0.553824  0.437069  0.244770  0.399297  0.553824  0.708352  0.862879

     data2
     count      mean       std       min       25%       50%       75%       max
key1
a      3.0  0.215390  0.479958 -0.337304  0.059488  0.456279  0.491736  0.527193
b      2.0  0.693958  0.353437  0.444040  0.568999  0.693958  0.818916  0.943875
tips = pd.read_csv('examples/tips.csv')
tips['tip_pct'] = tips['tip'] / tips['total_bill']
tips[:4]
FileNotFoundError: [Errno 2] No such file or directory: 'examples/tips.csv'
frame = pd.DataFrame({'data1': np.random.randn(1000), 'data2': np.random.randn(1000)})
quartiles = pd.cut(frame.data1, 4)
quartiles[:10]
0     (-0.436, 1.211]
1     (-0.436, 1.211]
2      (1.211, 2.858]
3     (-0.436, 1.211]
4      (1.211, 2.858]
5    (-2.083, -0.436]
6    (-2.083, -0.436]
7      (1.211, 2.858]
8     (-0.436, 1.211]
9    (-2.083, -0.436]
Name: data1, dtype: category
Categories (4, interval[float64, right]): [(-3.737, -2.083] < (-2.083, -0.436] < (-0.436, 1.211] < (1.211, 2.858]]
def get_stats(group):
    return {'min': group.min(), 'max': group.max(), 'count': group.count(), 'mean': group.mean()}
grouped = frame.data2.groupby(quartiles)
grouped.apply(get_stats).unstack()

min max count mean
data1
(-3.737, -2.083] -1.417666 1.053207 15.0 -0.021193
(-2.083, -0.436] -2.815877 2.712397 296.0 -0.097675
(-0.436, 1.211] -2.950480 3.093977 568.0 0.006707
(1.211, 2.858] -2.621023 2.433423 121.0 -0.102975
grouping = pd.qcut(frame.data1, 10, labels=False)  # 10 equal-sized buckets; labels=False returns the bucket numbers
grouped = frame.data2.groupby(grouping)
grouped.apply(get_stats).unstack()

min max count mean
data1
0 -1.979885 2.546603 100.0 0.015369
1 -2.815877 2.560098 100.0 -0.189069
2 -2.367227 2.290479 100.0 -0.097540
3 -2.057884 3.093977 100.0 0.000976
4 -2.314728 2.157829 100.0 0.125600
5 -2.944465 1.991280 100.0 -0.092507
6 -2.503720 2.415097 100.0 -0.045530
7 -2.950480 2.553021 100.0 0.057286
8 -2.688502 2.356049 100.0 -0.059030
9 -2.339079 2.433423 100.0 -0.094356
tips.pivot_table(index=['day', 'smoker'])  # the default aggregation is the group mean
NameError: name 'tips' is not defined
from io import StringIO
data = """\
Sample  Nationality  Handedness
1   USA  Right-handed
2   Japan    Left-handed
3   USA  Right-handed
4   Japan    Right-handed
5   Japan    Left-handed
6   Japan    Right-handed
7   USA  Right-handed
8   USA  Left-handed
9   Japan    Right-handed
10  USA  Right-handed"""
data = pd.read_table(StringIO(data), sep=r'\s+')  # raw string, so \s is passed to the regex engine unchanged
pd.crosstab(data.Nationality, data.Handedness, margins=True)

Handedness Left-handed Right-handed All
Nationality
Japan 2 3 5
USA 1 4 5
All 3 7 10
pd.crosstab([tips.time, tips.day], tips.smoker, margins=True)
NameError: name 'tips' is not defined
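The `tips` calls in this section fail only because `examples/tips.csv` was not available. To show what they compute, the sketch below builds a tiny stand-in frame; its four rows are invented for illustration and are not the real tips dataset, and only the column names are taken from the code above.

```python
import pandas as pd

# Hypothetical stand-in for examples/tips.csv (made-up rows)
tips = pd.DataFrame({
    'total_bill': [10.0, 20.0, 30.0, 40.0],
    'tip': [1.0, 3.0, 3.0, 8.0],
    'smoker': ['No', 'Yes', 'No', 'No'],
    'day': ['Sun', 'Sun', 'Sat', 'Sat'],
    'time': ['Dinner', 'Dinner', 'Dinner', 'Lunch'],
})
tips['tip_pct'] = tips['tip'] / tips['total_bill']

# Group means of the chosen numeric columns, indexed by day and smoker
table = tips.pivot_table(values=['tip_pct', 'total_bill'], index=['day', 'smoker'])
print(table)

# Frequency table of smoker by (time, day), with row and column totals
ct = pd.crosstab([tips.time, tips.day], tips.smoker, margins=True)
print(ct)
```

Naming the `values` columns explicitly keeps the text column `time` out of the mean calculation.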

Author: J&Ocean
Copyright notice: Unless otherwise stated, all articles on this blog are licensed under CC BY 4.0. Please credit J&Ocean when reposting.