_hello="helloworld"
score=0
y=20
y=True
print(_hello)
helloworld
print(score)
0
print(y)
True
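Variables
Python is a dynamically typed language: a variable is not bound to a fixed data type,
so the same variable can later be assigned a value of another type.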
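As a small illustration of dynamic typing, the sketch below rebinds one name to values of different types and records the type each time. The variable names are chosen only for this example.

```python
# The same name can be rebound to objects of different types.
value = 20
first_type = type(value).__name__    # type of the int object

value = True
second_type = type(value).__name__   # type of the bool object

value = "helloworld"
third_type = type(value).__name__    # type of the str object

print(first_type, second_type, third_type)
```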
a=b=c=10
print(a)
10
_hello="helloworld"
score_for_student=10.0
y=20
name1="Tom";name2="Tony"
a=b=c=10
if y>10:
    print(y)
    print(score_for_student)
else:
    print(y*10)
print(_hello)
20
10.0
helloworld
import module1
from module1 import z
y=20
print(y)
print(module1.y)
print(z)
20
True
10.0
import com.pkg2.hello as module1
from com.pkg2.hello import z as x
print(x)
y=20
print(y)
print(module1.y)
print(z)
10.1
20
True
10.0
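Coding conventions
Naming conventions
- Package names: all lowercase letters; components may be separated by dots, and underscores are discouraged. Because a package acts as a namespace, its name should be unique; a common recommendation is to use the reversed domain name of the company or organization, such as com.apple.quicktime.v2.
- Module names: all lowercase letters; multiple words may be separated by underscores, such as dummy_threading.
- Class names: upper camel case, such as SplitViewController.
- Exception names: exceptions are classes, so they are named like classes, but should use Error as a suffix, such as FileNotFoundError.
- Variable names: all lowercase letters; multiple words may be separated by underscores. A variable used only inside a module or function may start with a single underscore; a variable private to a class may start with a double underscore. Do not create names that both start and end with double underscores, because those are reserved by Python. Also avoid lowercase l, uppercase O, and uppercase I as variable names.
- Function and method names: named like variables, such as balance_account and push_cm_exit.
- Constant names: all uppercase letters; multiple words may be separated by underscores, such as YEAR and WEEK_OF_MONTH.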
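The naming rules above can be seen together in one short sketch. Every name in it (MAX_RETRY_COUNT, AccountNotFoundError, BankAccount, owner_name) is hypothetical and chosen only to illustrate the conventions.

```python
# Hypothetical names, used only to illustrate the naming conventions.
MAX_RETRY_COUNT = 3                      # constant: upper case with underscores


class AccountNotFoundError(Exception):   # exception: class-style name with an Error suffix
    pass


class BankAccount:                       # class: upper camel case
    def __init__(self, owner_name):
        self.owner_name = owner_name     # variable: lower case with underscores
        self.__balance = 0.0             # double underscore: private to the class

    def balance_account(self):           # method: named like a variable
        return self.__balance


account = BankAccount("Tom")
print(account.balance_account())
```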
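Comment conventions
Single-line comments, multi-line comments, and documentation comments
File comments
A file comment is a multi-line comment placed at the top of every file. It usually includes the copyright notice, file name, containing module, author, version history, and a description of the file's contents and purpose.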
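#
# Copyright 2015 北京智捷东方科技有限公司
# See the LICENSE.txt file for license information
# Description:
## Implements basic date functions
# Revision history:
## 2015-7-22: created by 关东升
## 2015-8-20: added the socket library
## 2015-8-22: added the math library
#
The comment above only provides the copyright notice, a description of the file's contents, and the revision history; what a file comment includes should depend on the actual situation.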
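Documentation comments
Code comments
Using TODO comments
Import conventions
Import statements should be grouped from general to specific, in this order: standard library → third-party libraries → your own modules. Groups are separated by one blank line, and the modules within each group are sorted alphabetically.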
import io
import os
import pkgutil
import platform
import re
import sys
import time
from html import unescape

from com.pkg1 import example
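Code layout conventions
Blank lines
- Keep two blank lines before and after the block of import statements
- Keep two blank lines before a function definition
- Keep two blank lines before a class definition
- Keep one blank line before a method definition
- Keep one blank line between two logical blocks of code
Spaces
- Put one space on each side of the assignment operator "="
- Separate every binary operator from its operands with spaces
- Unary operators, that is arithmetic negation "-" and bitwise inversion "~", are written without a space before their operand
- Do not put spaces just inside brackets; Python's brackets are parentheses "()", square brackets "[]", and braces "{}"
- Do not put a space before a comma, semicolon, or colon; put one space after it, unless it is already at the end of the line
- Do not put a space before the opening bracket of an argument list, index, or slice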
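The spacing rules can be checked against a few lines of code. This is only a sketch; the list and variable names are made up for the example.

```python
# Hypothetical values, used only to show the spacing rules.
numbers = [43, 32, 53]                  # space after each comma, none just inside the brackets
total = numbers[0] + numbers[1] * 2     # spaces around "=" and around binary operators
negated = -total                        # no space between a unary operator and its operand
first_two = numbers[0:2]                # no space before the opening bracket of a slice

print(total, negated, first_two)
```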
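Indentation
Four spaces are commonly used as one indentation level. Although a programmer can indent with tabs during development, and a tab is equivalent to 8 spaces by default, different IDEs map a tab to different numbers of spaces, so do not indent with tabs.
Line breaks
A line of code should contain at most 79 characters, and a line of a documentation comment or multi-line comment at most 72 characters; a comment containing a URL is exempt from this limit. Lines that exceed the limit should be broken, following these general rules:
- Break after a comma
- Break before an operator
- Avoid the line continuation character "\" where possible; when there are brackets (braces, square brackets, or parentheses), break inside the brackets so that no continuation character is needed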
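The line-breaking rules might look like this in practice. The function total_price and its arguments are hypothetical, and the lines are short here only to keep the sketch readable.

```python
# Hypothetical function and values, used only to show where lines are broken.
def total_price(unit_price, quantity, discount, shipping_fee):
    # Break before binary operators; the parentheses make "\" unnecessary.
    return (unit_price * quantity
            - discount
            + shipping_fee)


# Break after a comma inside the parentheses of the call.
price = total_price(10.0, 3,
                    2.5, 4.0)
print(price)
```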
Data types
Numeric types
Integer type
28
28
0b11100
28
0o34
28
0x1c
28
Floating-point type
1.0
1.0
0.0
0.0
3.36e2
336.0
1.56e-2
0.0156
Complex type
1+2j
(1+2j)
(1+2j)+(1+2j)
(2+4j)
Boolean type
bool(0)
False
bool(2)
True
bool(1)
True
bool('')
False
bool(' ')
True
bool([])
False
bool({})
False
Conversion between numeric types
Implicit type conversion
a=1+True
print(a)
2
a=1.0+1
type(a)
float
print(a)
2.0
a=1.0+True
print(a)
2.0
a=1.0+1+True
print(a)
3.0
a=1.0+1+False
print(a)
2.0
Explicit type conversion
int(False)
0
int(True)
1
int(19.6)
19
float(5)
5.0
float(False)
0.0
float(True)
1.0
String type
String literals
s = 'Hello World'
print(s)
Hello World
s="Hello World"
print(s)
Hello World
s='\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064'
print(s)
Hello World
s="\u0048\u0065\u006c\u006c\u006f\u0020\u0057\u006f\u0072\u006c\u0064"
print(s)
Hello World
s='Hello\n World'
print(s)
Hello
World
s='Hello\t World'
print(s)
Hello World
s='Hello \'World'
print(s)
Hello 'World
s="hello'world"
print(s)
hello'world
s='hello"world'
print(s)
hello"world
s='hello\\world'
print(s)
hello\world
s='hello\u005c world'
print(s)
hello\ world
s='hello\tworld'
print(s)
hello world
s=r'hello\tworld'
print(s)
hello\tworld
s='''hello
world'''
print(s)
hello
world
s='''hello
\tworld'''
print(s)
hello
world
String formatting
name='Mary'
age=18
s='她的年龄是{0}岁。'.format(age)
print(s)
她的年龄是18岁。
s='{0}芳龄是{1}岁'.format(name,age)
print(s)
Mary芳龄是18岁
s='{1}芳龄是{0}岁'.format(age,name)
print(s)
Mary芳龄是18岁
s='{n}芳龄是{a}岁'.format(n=name,a=age)
print(s)
Mary芳龄是18岁
name='Mary'
age=18
money=1234.5678
"{0}芳龄是{1:d}岁。".format(name,age)
'Mary芳龄是18岁。'
"{1}芳龄是{0:5d}岁。".format(age,name)
'Mary芳龄是 18岁。'
"{0}今天收入是{1:f}元".format(name,money)
'Mary今天收入是1234.567800元'
"{0}今天收入是{1:.2f}".format(name,money)
'Mary今天收入是1234.57'
"{0}今天收入是{1:10.2f}".format(name,money)
'Mary今天收入是 1234.57'
"{0}今天收入是{1:g}".format(name,money)
'Mary今天收入是1234.57'
"{0}今天收入是{1:G}".format(name,money)
'Mary今天收入是1234.57'
"{0}今天收入是{1:e}".format(name,money)
'Mary今天收入是1.234568e+03'
"{0}今天收入是{1:E}".format(name,money)
'Mary今天收入是1.234568E+03'
Searching strings
source_str="there is a string accessing example"
len(source_str)
35
source_str[16]
'g'
source_str.find('r')
3
source_str.rfind('r')
13
source_str.find('ing')
14
source_str.rfind('ing')
24
source_str.find('e',15)
21
source_str.find('ing',5)
14
source_str.rfind('ing',5)
24
source_str.find('ing',18,28)
24
source_str.find('ingg',5)
-1
Converting between strings and numbers
int('9')
9
int('9.6')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [24], in <cell line: 1>()
----> 1 int('9.6')
ValueError: invalid literal for int() with base 10: '9.6'
float('9.6')
9.6
int('AB')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [26], in <cell line: 1>()
----> 1 int('AB')
ValueError: invalid literal for int() with base 10: 'AB'
str(3.24)
'3.24'
str(True)
'True'
str([])
'[]'
str([1,2,3])
'[1, 2, 3]'
str(34)
'34'
'{0:2f}'.format(3.24)
'3.240000'
'{:.1f}'.format(3.24)
'3.2'
'{:10.1f}'.format(3.24)
' 3.2'
Operators
Arithmetic operators
Unary operators
a=12
-a
-12
Binary operators
1+2
3
2-1
1
2*3
6
3/2
1.5
3%2
1
3//2
1
-3//2
-2
10**2
100
10.22+10
20.22
10.0+True+2
13.0
'hello'+'world'
'helloworld'
'hello'+2
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [47], in <cell line: 1>()
----> 1 'hello'+2
TypeError: can only concatenate str (not "int") to str
'hello'*2
'hellohello'
Relational operators
a=1
b=2
a>b
False
a<b
True
a>=b
False
a<=b
True
1.0!=1
False
a='hello'
b='hello'
a==b
True
a='World'
a>b
False
a<b
True
a=[]
b=[1,2]
a==b
False
a<b
True
a=[1,2]
a==b
True
Logical operators
i=0
a=10
b=9
if a>b or i==1:
    print("或运算为真")
else:
    print("或运算为假")
if a<b and i==1:
    print("与运算为真")
else:
    print("与运算为假")
def f1():
    return a>b
def f2():
    print('--f2--')
    return a==b
print(f1() or f2())
或运算为真
与运算为假
True
Bitwise operators
a=0b10110010
b=0b01011110
print("a|b={0}".format(a|b))
print("a&b={0}".format(a&b))
print("a^b={0}".format(a^b))
print("~a={0}".format(~a))
print("a>>2={0}".format(a>>2))
print("a<<2={0}".format(a<<2))
c=-0b1100
print("c>>2={0}".format(c>>2))
print("c<<2={0}".format(c<<2))
a|b=254
a&b=18
a^b=236
~a=-179
a>>2=44
a<<2=712
c>>2=-3
c<<2=-48
Assignment operators
a=1
b=2
a+=b
print(a)
a+=b+3
print(a)
a-=b
print(a)
a*=b
print(a)
a/=b
print(a)
a%=b
print(a)
a=0b10110010
b=0b01011110
a|=b
print(a)
a^=b
print(a)
3
8
6
12
6.0
0.0
254
160
Other operators
Identity test operators
Membership test operators
class Person:
def __init__(self,name,age):
self.name=name
self.age=age
p1=Person('Tony',18)
p2=Person('Tony',18)
print(p1==p2)
print(p1 is p2)
print(p1!=p2)
print(p1 is not p2)
False
False
True
True
class Person:
def __init__(self,name,age):
self.name=name
self.age=age
def __eq__(self,other):
if self.name==other.name and self.age==other.age:
return True
else:
return False
p1=Person('Tony',18)
p2=Person('Tony',18)
print(p1==p2)
print(p1 is p2)
print(p1!=p2)
print(p1 is not p2)
True
False
False
True
string_a='hello'
print('e' in string_a)
print('ell' not in string_a)
list_a=[1,2]
print(2 in list_a)
print(1 not in list_a)
True
False
True
False
Control statements
Branch statements
if structure
score=5
if score>=85:
    print('perfect')
if score<60:
    print('hard')
if score>=60 and score<85:
    print('justsoso')
hard
if-else structure
score=75
if score>=60:
    print('justsoso')
    if score>=90:
        print('perfect')
else:
    print("不及格")
justsoso
elif structure
score=80
if score>=90:
    grade='A'
elif score>=80:
    grade='B'
elif score>=70:
    grade='C'
elif score>=60:
    grade='D'
else:
    grade='F'
print(grade)
B
Conditional expressions
score=85
result='justsoso' if score>=60 else 'hard'
print(result)
justsoso
Loop statements
while statement
i=0
while i*i<100_000:
    i+=1
print(i)
print(i*i)
317
100489
for statement
print('----范围----')
for num in range(1,10):
    print("{0}*{0}={1}".format(num,num*num))
print('----字符串----')
for item in "hello":
    print(item)
print('----整数列表----')
numbers=[43,32,53,54,75,7,10]
for item in numbers:
    print(item)
----范围----
1*1=1
2*2=4
3*3=9
4*4=16
5*5=25
6*6=36
7*7=49
8*8=64
9*9=81
----字符串----
h
e
l
l
o
----整数列表----
43
32
53
54
75
7
10
Jump statements
break statement
for item in range(10):
    if item==3:
        break
    print(item)
0
1
2
continue statement
for item in range(10):
    if item==3:
        continue
    print(item)
0
1
2
4
5
6
7
8
9
else clause in while and for statements
i=0
while i*i<10:
    i+=1
    print("{0}*{0}={1}".format(i,i*i))
else:
    print("whileover")
print('----------')
for item in range(10):
    if item==3:
        break
    print(item)
else:
    print('forover')
1*1=1
2*2=4
3*3=9
4*4=16
whileover
----------
0
1
2
Using ranges
Syntax of the range() function:
$$
range([start,]stop[,step])
$$
for item in range(1,10,2):
    print(item)
print('------------')
for item in range(1,-10,-3):
    print(item)
1
3
5
7
9
------------
1
-2
-5
-8
Data structures
Tuples
Sequences
Index operations
a='hello'
a[0]
'h'
a[1]
'e'
a[2]
'l'
a[3]
'l'
a[4]
'o'
a[5]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Input In [7], in <cell line: 1>()
----> 1 a[5]
IndexError: string index out of range
max(a)
'o'
min(a)
'e'
len(a)
5
Sequence + and * operators
a*3
'hellohellohello'
print(a)
hello
a+=' '
a+='world'
print(a)
hello world
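Sequence slicing
- [start:end]: start is the starting index and end is the ending index
- [start:end:step]: start is the starting index, end is the ending index, and step is the stride, which may be a positive or negative integer
The slice actually taken is the half-open range [start,end)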
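The half-open range and the step argument can be tried on a short string. This is a minimal sketch; the variable names are chosen only for the example.

```python
text = 'hello world'

middle = text[1:5]           # indices 1 to 4; index 5 is excluded
every_second = text[::2]     # step 2 over the whole string
reversed_text = text[::-1]   # a negative step walks from the end to the start
last_five = text[-1:-6:-1]   # indices -1 down to -5; index -6 is excluded

print(middle)
print(every_second)
print(reversed_text)
print(last_five)
```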
a[1:3]
'el'
a[:3]
'hel'
a[0:3]
'hel'
a[0:]
'hello world'
a[0:5]
'hello'
a[:]
'hello world'
a[1:-1]
'ello worl'
a[1:5]
'ello'
a[1:5:2]
'el'
Creating tuples
21,32,43,45
Input In [26]
21,32,43,45
^
SyntaxError: invalid character ',' (U+FF0C)
21,32,43,45
(21, 32, 43, 45)
(21,32,43,45)
(21, 32, 43, 45)
print(a)
hello world
a=(21,32,43,45)
print(a)
(21, 32, 43, 45)
('hello','world')
('hello', 'world')
('hello','world',1,2,3)
('hello', 'world', 1, 2, 3)
tuple([21,32,43,45])
(21, 32, 43, 45)
a=(21)
type(a)
int
a=(21,)
type(a)
tuple
a=()
type(a)
tuple
Accessing tuples
a=('hello','world',1,2,3)
a[1]
'world'
a[1:3]
('world', 1)
a[2:]
(1, 2, 3)
a[:2]
('hello', 'world')
str1,str2,n1,n2,n3=a
str1
'hello'
str2
'world'
n1
1
n2
2
n3
3
str1,str2,*n=a
str1
'hello'
str2
'world'
n
[1, 2, 3]
str1,_,n1,n2,_=a
str1
'hello'
n1
1
n2
2
Iterating over tuples
a=(21,32,43,45)
for item in a:
    print(item)
print('---------------------')
for i,item in enumerate(a):
    print('{0}-{1}'.format(i,item))
21
32
43
45
---------------------
0-21
1-32
2-43
3-45
Lists
Creating lists
[20,10,50,40,30]
[20, 10, 50, 40, 30]
[]
[]
['hello','world',1,2,3]
['hello', 'world', 1, 2, 3]
a=[10]
type(a)
list
a=[10,]
type(a)
list
list((20,10,50,40,30))
[20, 10, 50, 40, 30]
Appending elements
list.append(x)
list.extend(t)
student_list=['张三','李四','王五']
student_list.append('董六')
student_list
['张三', '李四', '王五', '董六']
student_list+=['刘备','关羽']
student_list
['张三', '李四', '王五', '董六', '刘备', '关羽']
student_list.extend(['张飞','赵云'])
student_list
['张三', '李四', '王五', '董六', '刘备', '关羽', '张飞', '赵云']
Inserting elements
list.insert(i,x)
student_list=['zhangsan','lisi','wangwu']
student_list.insert(2,'liubei')
student_list
['zhangsan', 'lisi', 'liubei', 'wangwu']
Replacing elements
student_list=['zhangsan','lisi','wangwu']
student_list[0]='zhugeliang'
student_list
['zhugeliang', 'lisi', 'wangwu']
Deleting elements
remove() method
If the value occurs more than once, only the first occurrence is removed
student_list=['zhangsan','lisi','wangwu','wangwu']
student_list.remove('wangwu')
student_list
['zhangsan', 'lisi', 'wangwu']
student_list.remove('wangwu')
student_list
['zhangsan', 'lisi']
pop() method
item=list.pop([i])
i is the index of the element to remove; if it is omitted, the last element is removed
student_list=['zhangsan','lisi','wangwu']
student_list.pop()
'wangwu'
student_list
['zhangsan', 'lisi']
student_list.pop(0)
'zhangsan'
student_list
['lisi']
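Other common methods
- reverse(): reverses the list in place
- copy(): returns a shallow copy of the list
- clear(): removes all elements from the list
- index(x[,i[,j]]): returns the index of the first occurrence of x; i is the index where the search starts and j is the index where it ends. This method is inherited from sequences
- count(x): returns the number of occurrences of x. This method is inherited from sequences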
a=[21,32,43,45]
a.reverse()
a
[45, 43, 32, 21]
b=a.copy()
b
[45, 43, 32, 21]
a.clear()
a
[]
b
[45, 43, 32, 21]
a=[45,43,32,21,32]
a.count(32)
2
student_list=['zhangsan','lisi','wangwu']
student_list.index('wangwu')
2
student_tuple=('zhangsan','lisi','wangwu')
student_tuple.index('wangwu')
2
student_tuple.index('lisi',1,2)
1
List comprehensions
n_list=[]
for x in range(10):
    if x%2==0:
        n_list.append(x**2)
print(n_list)
[0, 4, 16, 36, 64]
n_list=[x**2 for x in range(10) if x%2==0]
n_list
[0, 4, 16, 36, 64]
n_list=[x for x in range(100) if x%2==0 if x%5==0]
n_list
[0, 10, 20, 30, 40, 50, 60, 70, 80, 90]
Sets
Creating mutable sets
a={'zhangsan','lisi','wangwu'}
a
{'lisi', 'wangwu', 'zhangsan'}
a={'zhangsan','lisi','wangwu','wangwu'}
len(a)
3
a
{'lisi', 'wangwu', 'zhangsan'}
set((20,10,50,40,30))
{10, 20, 30, 40, 50}
b={}
type(b)
dict
b=set()
type(b)
set
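Modifying mutable sets
- add(elem): adds an element; has no effect if the element is already present
- remove(elem): removes an element; raises KeyError if it is not present
- discard(elem): removes an element; raises no error if it is not present
- pop(): removes an arbitrary element from the set and returns it
- clear(): removes all elements from the set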
student_set={'zhangsan','lisi','wangwu'}
student_set.add('dongliu')
student_set
{'dongliu', 'lisi', 'wangwu', 'zhangsan'}
student_set.remove('lisi')
student_set
{'dongliu', 'wangwu', 'zhangsan'}
student_set.remove('lisi')
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [129], in <cell line: 1>()
----> 1 student_set.remove('lisi')
KeyError: 'lisi'
student_set.discard('lisi')
student_set
{'dongliu', 'wangwu', 'zhangsan'}
student_set.discard('wangwu')
student_set
{'dongliu', 'zhangsan'}
student_set.pop()
'dongliu'
student_set
{'zhangsan'}
student_set.clear()
student_set
set()
Iterating over sets
student_set={'zhangsan','lisi','wangwu'}
for item in student_set:
    print(item)
print('----------')
for i,item in enumerate(student_set):
    print('{0}-{1}'.format(i,item))
lisi
wangwu
zhangsan
----------
0-lisi
1-wangwu
2-zhangsan
Immutable sets
student_set=frozenset({'zhangsan','lisi','wangwu'})
student_set
frozenset({'lisi', 'wangwu', 'zhangsan'})
type(student_set)
frozenset
student_set.add('dongliu')
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [142], in <cell line: 1>()
----> 1 student_set.add('dongliu')
AttributeError: 'frozenset' object has no attribute 'add'
a=(21,32,43,45)
seta=frozenset(a)
seta
frozenset({21, 32, 43, 45})
Set comprehensions
n_set={x for x in range(100) if x%2==0 if x%5==0}
print(n_set)
{0, 70, 40, 10, 80, 50, 20, 90, 60, 30}
input_list=[2,3,2,4,5,6,6,6]
n_list=[x**2 for x in input_list]
n_list
[4, 9, 4, 16, 25, 36, 36, 36]
n_set={x**2 for x in input_list}
n_set
{4, 9, 16, 25, 36}
Dictionaries
Creating dictionaries
dict1={102:'zhangsan',105:'lisi',109:'wangwu'}
len(dict1)
3
dict1
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
type(dict1)
dict
dict1={}
dict1
{}
dict({102:'zhangsan',105:'lisi',109:'wangwu'})
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
dict(((102,'zhangsan'),(105,'lisi'),(109,'wangwu')))
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
dict([(102,'zhangsan'),(105,'lisi'),(109,'wangwu')])
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
t1=(102,'zhangsan')
t2=(105,'lisi')
t3=(109,'wangwu')
t=(t1,t2,t3)
dict(t)
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
list1=[t1,t2,t3]
dict(list1)
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
dict(zip([102,105,109],['zhangsan','lisi','wangwu']))
{102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
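Accessing dictionaries
- get(key[,default]): returns the value for key; if the key does not exist, returns default (None when default is omitted)
- items(): returns a view of all key-value pairs in the dictionary
- keys(): returns a view of the dictionary's keys
- values(): returns a view of the dictionary's values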
dict1={102:'zhangsan',105:'lisi',109:'wangwu'}
dict1.get(105)
'lisi'
dict1.get(101)
dict1.get(101,'dongliu')
'dongliu'
dict1.items()
dict_items([(102, 'zhangsan'), (105, 'lisi'), (109, 'wangwu')])
dict1.keys()
dict_keys([102, 105, 109])
dict1.values()
dict_values(['zhangsan', 'lisi', 'wangwu'])
student_dict={102:'zhangsan',105:'lisi',109:'wangwu'}
102 in student_dict
True
'lisi' in student_dict
False
print('---bianlijian---')
for student_id in student_dict.keys():
print('xuehao:'+str(student_id))
print('---bianlizhi---')
for student_name in student_dict.values():
print('xuesheng:'+student_name)
print('---bianlijian:zhi---')
for student_id,student_name in student_dict.items():
print('xuehao:{0}-xuesheng:{1}'.format(student_id,student_name))
---bianlijian---
xuehao:102
xuehao:105
xuehao:109
---bianlizhi---
xuesheng:zhangsan
xuesheng:lisi
xuesheng:wangwu
---bianlijian:zhi---
xuehao:102-xuesheng:zhangsan
xuehao:105-xuesheng:lisi
xuehao:109-xuesheng:wangwu
Dictionary comprehensions
input_dict={'one':1,'two':2,'three':3,'four':4}
output_dict={k:v for k,v in input_dict.items() if v%2==0}
output_dict
{'two': 2, 'four': 4}
keys=[k for k,v in input_dict.items() if v%2==0]
keys
['two', 'four']
Functional programming
Defining functions
def function_name(parameter_list):
    function_body
    return return_value
def rectangle_area(width,height):
    area=width*height
    return area
r_area=rectangle_area(320,420)
print("320*420的矩形面积{0}".format(r_area))
320*420的矩形面积134400
Function parameters
Calling functions with keyword arguments
def print_area(width,height):
    area=width*height
    print("{0}*{1}矩形的面积是:{2}".format(width,height,area))
print_area(320,420)
print_area(width=320,height=420)
print_area(320,height=420)
print_area(height=420,width=320)
320*420矩形的面积是:134400
320*420矩形的面积是:134400
320*420矩形的面积是:134400
320*420矩形的面积是:134400
Default parameter values
def make_coffee(name="Cappuccino"):
return "制作一杯{0}".format(name)
coffee1=make_coffee("Latte")
coffee2=make_coffee()
print(coffee1)
print(coffee2)
制作一杯Latte
制作一杯Cappuccino
Variable-length parameters
* variable-length parameters
def sum(*numbers,multiple=1):
    total=0
    for number in numbers:
        total+=number
    return total*multiple
print(sum(100.0,20.0,30.0))
print(sum(80,30))
print(sum(30,80,multiple=2))
double_tuple=(50.0,60.0,0.0)
print(sum(30,80,*double_tuple))
150.0
110
220
220.0
** variable-length parameters
def show(sep=':', **info):
print('----info----')
for key, value in info.items():
print('{0} {2} {1}'.format(key, value, sep))
show('->', name='tony', age=18, sex = True)
show(student_name='tony',student_no='1000',sep='=')
stu_dict={'name':'tony','age':18}
show(**stu_dict,sex=True,sep='=')
----info----
name -> tony
age -> 18
sex -> True
----info----
student_name = tony
student_no = 1000
----info----
name = tony
age = 18
sex = True
Function return values
Functions with no return value
def show(sep=':', **info):
print('----info----')
for key, value in info.items():
print('{0} {2} {1}'.format(key, value, sep))
return
result=show('->', name='tony', age=18, sex = True)
print(result)
def sum(*numbers,multiple=1):
total=0
for number in numbers:
total+=number
return total*multiple
print(sum(100.0,20.0,30.0))
print(sum(80,30))
----info----
name -> tony
age -> 18
sex -> True
None
150.0
110
Functions with multiple return values
def position(dt,speed):
posx=speed[0]*dt
posy=speed[1]*dt
return(posx,posy)
move=position(60,(10,-5))
print("物体位移:({0},{1})".format(move[0],move[1]))
物体位移:(600,-300)
Variable scope in functions
x=20
def print_value():
print("函数中x={0}".format(x))
print_value()
print("全局变量={0}".format(x))
函数中x=20
全局变量=20
x=20
def print_value():
x=10
print("函数中x={0}".format(x))
print_value()
print("全局变量={0}".format(x))
函数中x=10
全局变量=20
x=20
def print_value():
global x
x=10
print("函数中x={0}".format(x))
print_value()
print("全局变量={0}".format(x))
函数中x=10
全局变量=10
Generators
def square(num):
n_list=[]
for i in range(1,num+1):
n_list.append(i*i)
return n_list
for i in square(5):
print(i,end=' ')
1 4 9 16 25
def square(num):
n_list=[]
for i in range(1,num+1):
yield i*i
return n_list
for i in square(5):
print(i,end=' ')
1 4 9 16 25
def square(num):
for i in range(1,num+1):
yield i*i
n_seq=square(5)
n_seq.__next__()
1
n_seq.__next__()
4
n_seq.__next__()
9
n_seq.__next__()
16
n_seq.__next__()
25
n_seq.__next__()
---------------------------------------------------------------------------
StopIteration Traceback (most recent call last)
Input In [14], in <cell line: 1>()
----> 1 n_seq.__next__()
StopIteration:
Nested functions
def calculate(n1,n2,opr):
multiple=2
def add(a,b):
return (a+b)*multiple
def sub(a,b):
return (a-b)*multiple
if opr=='+':
return add(n1,n2)
else:
return sub(n1,n2)
print(calculate(10,5,'+'))
30
Functional programming basics
Function types
def calculate_fun(opr):
def add(a,b):
return a+b
def sub(a,b):
return a-b
if opr=='+':
return add
else:
return sub
f1=calculate_fun('+')
f2=calculate_fun('-')
print(type(f1))
print('10+5={0}'.format(f1(10,5)))
print('10-5={0}'.format(f2(10,5)))
<class 'function'>
10+5=15
10-5=5
Lambda expressions
def calculate_fun(opr):
if opr=='+':
return lambda a,b:(a+b)
else:
return lambda a,b:(a-b)
f1=calculate_fun('+')
f2=calculate_fun('-')
print(type(f1))
print('10+5={0}'.format(f1(10,5)))
print('10-5={0}'.format(f2(10,5)))
<class 'function'>
10+5=15
10-5=5
Three fundamental functions
filter()
users=['tony','tom','ben','alex']
users_filter=filter(lambda u:u.startswith('t'),users)
print(list(users_filter))
['tony', 'tom']
number_list=range(1,11)
number_filter=filter(lambda it:it%2==0,number_list)
print(list(number_filter))
[2, 4, 6, 8, 10]
map()
users=['tony','tom','ben','alex']
users_map=map(lambda u:u.lower(),users)
print(list(users_map))
['tony', 'tom', 'ben', 'alex']
users=['tony','tom','ben','alex']
users_filter=filter(lambda u:u.startswith('t'),users)
users_map=map(lambda u:u.lower(),filter(lambda u:u.startswith('t'),users))
print(list(users_map))
['tony', 'tom']
reduce()
from functools import reduce
a={1,2,3,4}
a_reduce=reduce(lambda acc,i:acc+i,a)
print(a_reduce)
10
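Object-oriented programming
Overview of object-oriented programming (OOP)
Three basic features of object orientation
Encapsulation
Inheritance
Polymorphism
Classes and objects
Defining classes
class ClassName[(ParentClass)]:
    class body
class Animal(object):
    pass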
Creating and using objects
animal=Animal()
print(animal)
<__main__.Animal object at 0x00000222D7FA4160>
Instance variables
class Animal(object):
def __init__(self,age,sex,weight):
self.age=age
self.sex=sex
self.weight=weight
animal=Animal(2,1,10.0)
print('age:{0}'.format(animal.age))
print('sex:{0}'.format('female' if animal.sex==0 else 'male'))
print('weight:{0}'.format(animal.weight))
age:2
sex:male
weight:10.0
Class variables
class Account:
interest_rate=0.0668
def __init__(self,owner,amount):
self.owner=owner
self.amount=amount
account=Account('tony',1_800_000.0)
print('account:{0}'.format(account.owner))
print('amount:{0}'.format(account.amount))
print('interest_rate:{0}'.format(account.interest_rate))
account:tony
amount:1800000.0
interest_rate:0.0668
Constructor method
class Animal(object):
def __init__(self,age,sex=1,weight=0.0):
self.age=age
self.sex=sex
self.weight=weight
a1=Animal(2,1,10.0)
a2=Animal(1,weight=5.0)
a3=Animal(1,sex=0)
print('age:{0}'.format(a1.age))
print('sex:{0}'.format('female' if a3.sex==0 else 'male'))
print('weight:{0}'.format(a2.weight))
age:2
sex:female
weight:5.0
Instance methods
class Animal(object):
def __init__(self,age,sex=1,weight=0.0):
self.age=age
self.sex=sex
self.weight=weight
def eat(self):
self.weight+=0.05
print('eat')
def run(self):
self.weight-=0.01
print('run')
a1=Animal(2,0,10.0)
print(a1.weight)
a1.eat()
print(a1.weight)
a1.run()
print(a1.weight)
10.0
eat
10.05
run
10.040000000000001
Class methods
class Account:
interest_rate=0.0668
def __init__(self,owner,amount):
self.owner=owner
self.amount=amount
@classmethod
def interest_by(cls,amt):
return cls.interest_rate*amt
interest=Account.interest_by(12000.0)
print(interest)
801.6
Static methods
class Account:
interest_rate=0.0668
def __init__(self,owner,amount):
self.owner=owner
self.amount=amount
@classmethod
def interest_by(cls,amt):
return cls.interest_rate*amt
@staticmethod
def interest_with(amt):
return Account.interest_by(amt)
interest1=Account.interest_by(12000.0)
print(interest1)
interest2=Account.interest_with(12000.0)
print(interest2)
801.6
801.6
Encapsulation
Private variables
class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.__weight=weight
    def eat(self):
        self.__weight+=0.05
        print('eat')
    def run(self):
        self.__weight-=0.01
        print('run')
a1=Animal(2,0,10.0)
print(a1.weight)
a1.eat()
a1.run()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [4], in <cell line: 14>()
11 print('run')
13 a1=Animal(2,0,10.0)
---> 14 print(a1.weight)
15 a1.eat()
16 a1.run()
AttributeError: 'Animal' object has no attribute 'weight'
Private methods
class Animal(object):
def __init__(self,age,sex=1,weight=0.0):
self.age=age
self.sex=sex
self.__weight=weight
def eat(self):
self.__weight+=0.05
print('eat')
def __run(self):
self.__weight-=0.01
print('run')
a1=Animal(2,0,10.0)
a1.eat()
a1.run()
eat
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [5], in <cell line: 15>()
13 a1=Animal(2,0,10.0)
14 a1.eat()
---> 15 a1.run()
AttributeError: 'Animal' object has no attribute 'run'
Defining properties
class Animal(object):
    def __init__(self,age,sex=1,weight=0.0):
        self.age=age
        self.sex=sex
        self.__weight=weight
    def set_weight(self,weight):
        self.__weight=weight
    def get_weight(self):
        return self.__weight
a1=Animal(2,0,10.0)
print(a1.get_weight())
a1.set_weight(123.45)
print(a1.get_weight())
10.0
123.45
class Animal(object):
def __init__(self,age,sex=1,weight=0.0):
self.age=age
self.sex=sex
self.__weight=weight
@property
def weight(self):
return self.__weight
@weight.setter
def weight(self,weight):
self.__weight=weight
a1=Animal(2,0,10.0)
print(a1.weight)
a1.weight=123.45
print(a1.weight)
10.0
123.45
Inheritance
The concept of inheritance
class Person:
def __init__(self,name,age):
self.name=name
self.age=age
def info(self):
template='Person[name={0},age={1}]'
s=template.format(self.name,self.age)
return s
class Student(Person):
def __init__(self,name,age,school):
super().__init__(name,age)
self.school=school
Overriding methods
class Animal(object):
def __init__(self,age,sex=1,weight=0.0):
self.age=age
self.sex=sex
self.weight=weight
def eat(self):
self.weight+=0.05
print('eat')
class Dog(Animal):
def eat(self):
self.weight+=0.1
print('gougouchi...')
a1=Dog(2,0,10.0)
a1.eat()
gougouchi...
Multiple inheritance
class ParentClass1:
def run(self):
print('ParentClass1 run...')
class ParentClass2:
def run(self):
print('ParentClass2 run...')
class SubClass1(ParentClass1,ParentClass2):
pass
class SubClass2(ParentClass2,ParentClass1):
pass
class SubClass3(ParentClass1,ParentClass2):
def run(self):
print('SubClass3 run...')
sub1=SubClass1()
sub1.run()
sub2=SubClass2()
sub2.run()
sub3=SubClass3()
sub3.run()
ParentClass1 run...
ParentClass2 run...
SubClass3 run...
Polymorphism
The concept of polymorphism
class Figure:
def draw(self):
print('draw figure...')
class Ellipse(Figure):
def draw(self):
print('draw Ellipse')
class Triangle(Figure):
def draw(self):
print('draw Triangle')
f1=Figure()
f1.draw()
f2=Ellipse()
f2.draw()
f3=Triangle()
f3.draw()
draw figure...
draw Ellipse
draw Triangle
Type checking
class Figure:
def draw(self):
print('draw figure...')
class Ellipse(Figure):
def draw(self):
print('draw Ellipse')
class Triangle(Figure):
def draw(self):
print('draw Triangle')
f1=Figure()
f1.draw()
f2=Ellipse()
f2.draw()
f3=Triangle()
f3.draw()
print(isinstance(f1,Triangle))
print(isinstance(f2,Triangle))
print(isinstance(f3,Triangle))
print(isinstance(f2,Figure))
draw figure...
draw Ellipse
draw Triangle
False
False
True
True
Duck typing
class Animal(object):
def run(self):
print('animal run')
class Dog(Animal):
def run(self):
print('dog run')
class Car(object):
def run(self):
print('car run')
def go(animal):
animal.run()
go(Animal())
go(Dog())
go(Car())
animal run
dog run
car run
Python's root class: object
Two important methods
__str__() method
class Person:
def __init__(self,name,age):
self.name=name
self.age=age
def __str__(self):
template='Person [name={0},age={1}]'
s=template.format(self.name,self.age)
return s
person=Person('Tony',18)
print(person)
Person [name=Tony,age=18]
Object comparison method
class Person:
def __init__(self,name,age):
self.name=name
self.age=age
def __str__(self):
template='Person [name={0},age={1}]'
s=template.format(self.name,self.age)
return s
def __eq__(self,other):
if self.name==other.name and self.age==other.age:
return True
else:
return False
p1=Person('Tony',18)
p2=Person('Tony',18)
print(p1==p2)
True
Enumeration classes
Defining enumeration classes
import enum
class WeekDays(enum.Enum):
MONDAY=1
TUESDAY=2
WEDNESDAY=3
THURSDAY=4
FRIDAY=5
day=WeekDays.FRIDAY
print(day)
print(day.value)
print(day.name)
WeekDays.FRIDAY
5
FRIDAY
Restricting enumeration classes
import enum
@enum.unique
class WeekDays(enum.IntEnum):
MONDAY=1
TUESDAY=2
WEDNESDAY=3
THURSDAY=4
FRIDAY=5
day=WeekDays.FRIDAY
print(day)
print(day.value)
print(day.name)
WeekDays.FRIDAY
5
FRIDAY
Using enumeration classes
import enum
@enum.unique
class WeekDays(enum.IntEnum):
MONDAY=1
TUESDAY=2
WEDNESDAY=3
THURSDAY=4
FRIDAY=5
day=WeekDays.FRIDAY
if day==WeekDays.MONDAY:
print('work')
elif day==WeekDays.FRIDAY:
print('study')
study
Exception handling
Common exceptions
AttributeError exception
class Animal(object):
pass
al=Animal()
al.run()
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [3], in <cell line: 1>()
----> 1 al.run()
AttributeError: 'Animal' object has no attribute 'run'
print(al.age)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [4], in <cell line: 1>()
----> 1 print(al.age)
AttributeError: 'Animal' object has no attribute 'age'
print(Animal.weight)
---------------------------------------------------------------------------
AttributeError Traceback (most recent call last)
Input In [5], in <cell line: 1>()
----> 1 print(Animal.weight)
AttributeError: type object 'Animal' has no attribute 'weight'
OSError exception
f=open('abc.txt')
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Input In [6], in <cell line: 1>()
----> 1 f=open('abc.txt')
FileNotFoundError: [Errno 2] No such file or directory: 'abc.txt'
IndexError exception
code_list=[125,56,89,36]
code_list[4]
---------------------------------------------------------------------------
IndexError Traceback (most recent call last)
Input In [7], in <cell line: 2>()
1 code_list=[125,56,89,36]
----> 2 code_list[4]
IndexError: list index out of range
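KeyError exception
Raised when a key that does not exist in a dictionary is accessed
dict1={102:'zhangsan',105:'lisi',109:'wangwu'}
dict1[104]
---------------------------------------------------------------------------
KeyError Traceback (most recent call last)
Input In [9], in <cell line: 2>()
1 dict1={102:'zhangsan',105:'lisi',109:'wangwu'}
----> 2 dict1[104]
KeyError: 104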
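One way to avoid a KeyError is to test for the key first or to use get() with a default. The sketch below is only illustrative; student_dict and the fallback value 'unknown' are example values.

```python
student_dict = {102: 'zhangsan', 105: 'lisi', 109: 'wangwu'}
key = 104  # a key that is not in the dictionary

# Option 1: test membership before indexing.
if key in student_dict:
    name = student_dict[key]
else:
    name = 'unknown'

# Option 2: get() returns the default instead of raising KeyError.
same_name = student_dict.get(key, 'unknown')

print(name, same_name)
```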
NameError exception
value1
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [10], in <cell line: 1>()
----> 1 value1
NameError: name 'value1' is not defined
a=value1
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [11], in <cell line: 1>()
----> 1 a=value1
NameError: name 'value1' is not defined
value1=10
TypeError exception
i='2'
print(5/i)
---------------------------------------------------------------------------
TypeError Traceback (most recent call last)
Input In [14], in <cell line: 1>()
----> 1 print(5/i)
TypeError: unsupported operand type(s) for /: 'int' and 'str'
ValueError exception
i='QWE'
print(5/int(i))
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [16], in <cell line: 1>()
----> 1 print(5/int(i))
ValueError: invalid literal for int() with base 10: 'QWE'
Catching exceptions
try-except statement
import datetime as dt
def read_date(in_date):
try:
date=dt.datetime.strptime(in_date,'%Y-%m-%d')
return date
except ValueError:
print('处理ValueError异常')
str_date='2018-8-18'
print('日期={0}'.format(read_date(str_date)))
日期=2018-08-18 00:00:00
def read_date(in_date):
try:
date=dt.datetime.strptime(in_date,'%Y-%m-%d')
return date
except ValueError as e:
print('处理ValueError异常')
print(e)
str_date='201B-8-18'
print('日期={0}'.format(read_date(str_date)))
处理ValueError异常
time data '201B-8-18' does not match format '%Y-%m-%d'
日期=None
Multiple except blocks
import datetime as dt
def read_date_from_file(filename):
try:
file=open(filename)
in_date=file.read()
in_date=in_date.strip()
date=dt.datetime.strptime(in_date,'%Y-%m-%d')
return date
except ValueError as e:
print('处理ValueError异常')
print(e)
except FileNotFoundError as e:
print('处理FileNotFoundError异常')
print(e)
date=read_date_from_file('read.txt')
print('日期={0}'.format(date))
处理FileNotFoundError异常
[Errno 2] No such file or directory: 'read.txt'
日期=None
import datetime as dt
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('Handling ValueError')
        print(e)
    except FileNotFoundError as e:
        print('Handling FileNotFoundError')
        print(e)
    except OSError as e:
        print('Handling OSError')
        print(e)
date=read_date_from_file('read.txt')
print('Date={0}'.format(date))
Handling FileNotFoundError
[Errno 2] No such file or directory: 'read.txt'
Date=None
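The example above lists FileNotFoundError before OSError. The order matters because FileNotFoundError is a subclass of OSError and Python uses the first except clause that matches. A minimal sketch of what happens when the broader class comes first (describe_open_failure and the temporary path are hypothetical names used only for illustration):

```python
import os
import tempfile

def describe_open_failure(path):
    """Return a label for the except clause that handled the failure."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        # Listed first, so it also catches FileNotFoundError (a subclass).
        return 'OSError clause'
    except FileNotFoundError:
        # Never reached: the broader clause above always matches first.
        return 'FileNotFoundError clause'

# A path inside a fresh temporary directory, so the file cannot exist.
missing_path = os.path.join(tempfile.mkdtemp(), 'missing.txt')

print(issubclass(FileNotFoundError, OSError))  # True
print(describe_open_failure(missing_path))     # OSError clause
```

Putting the most specific exception classes first keeps every clause reachable.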
Nested try-except statements
import datetime as dt
def read_date_from_file(filename):
    try:
        file=open(filename)
        try:
            in_date = file.read()
            in_date = in_date.strip()
            date = dt.datetime.strptime(in_date, '%Y-%m-%d')
            return date
        except ValueError as e:
            print('Handling ValueError')
            print(e)
    except FileNotFoundError as e:
        print('Handling FileNotFoundError')
        print(e)
    except OSError as e:
        print('Handling OSError')
        print(e)
date=read_date_from_file('read.txt')
print('Date={0}'.format(date))
Handling FileNotFoundError
[Errno 2] No such file or directory: 'read.txt'
Date=None
Catching several exception types in one except clause
import datetime as dt
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except (ValueError,OSError) as e:
        print('Called---')
        print(e)
date=read_date_from_file('read.txt')
print('Date={0}'.format(date))
Called---
[Errno 2] No such file or directory: 'read.txt'
Date=None
Exception stack traces
import datetime as dt
import traceback as tb
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except (ValueError,OSError) as e:
        print('Called---')
        print(e)
        tb.print_exc()
date=read_date_from_file('read.txt')
print('Date={0}'.format(date))
Called---
[Errno 2] No such file or directory: 'read.txt'
Date=None
Traceback (most recent call last):
File "C:\Users\HP\AppData\Local\Temp\ipykernel_8772\538862610.py", line 5, in read_date_from_file
file=open(filename)
FileNotFoundError: [Errno 2] No such file or directory: 'read.txt'
Releasing resources
The finally block
import datetime as dt
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('Handling ValueError')
        print(e)
    except FileNotFoundError as e:
        print('Handling FileNotFoundError')
        print(e)
    except OSError as e:
        print('Handling OSError')
        print(e)
    finally:
        file.close()
date=read_date_from_file('read.txt')
print('Date={0}'.format(date))
Handling FileNotFoundError
[Errno 2] No such file or directory: 'read.txt'
---------------------------------------------------------------------------
UnboundLocalError Traceback (most recent call last)
Input In [7], in <cell line: 21>()
18 finally:
19 file.close()
---> 21 date=read_date_from_file('read.txt')
22 print('Date={0}'.format(date))
Input In [7], in read_date_from_file(filename)
17 print(e)
18 finally:
---> 19 file.close()
UnboundLocalError: local variable 'file' referenced before assignment
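The UnboundLocalError above arises because open() failed before file was ever assigned, yet the finally block still runs. One common way around this, shown here as a sketch rather than the notes' own solution, is to bind the name before the try and close it only if it was set. The temporary directory and file names are assumptions made for the demonstration:

```python
import datetime as dt
import os
import tempfile

def read_date_from_file(filename):
    file = None  # bound before the try, so finally can always inspect it
    try:
        file = open(filename)
        return dt.datetime.strptime(file.read().strip(), '%Y-%m-%d')
    except ValueError as e:
        print('ValueError:', e)
    except OSError as e:
        print('OSError:', e)
    finally:
        if file is not None:  # close only if open() actually succeeded
            file.close()

# Hypothetical scratch directory holding one valid file.
demo_dir = tempfile.mkdtemp()
good_path = os.path.join(demo_dir, 'read.txt')
with open(good_path, 'w') as f:
    f.write('2018-08-18\n')

found = read_date_from_file(good_path)
missing = read_date_from_file(os.path.join(demo_dir, 'missing.txt'))
print(found, missing)
```

The else block and the with statement covered next reach the same goal with less bookkeeping.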
The else block
import datetime as dt
import traceback as tb
def read_date_from_file(filename):
    try:
        file=open(filename)
    except OSError as e:
        print('Failed to open the file')
    else:
        print('File opened successfully')
        try:
            in_date = file.read()
            in_date = in_date.strip()
            date = dt.datetime.strptime(in_date, '%Y-%m-%d')
            return date
        except ValueError as e:
            print('Handling ValueError')
            print(e)
        except OSError as e:
            print('Handling OSError')
            print(e)
        finally:
            file.close()
date=read_date_from_file('read.txt')
print('Date={0}'.format(date))
Failed to open the file
Date=None
Automatic resource management with the with-as block
import datetime as dt
def read_date_from_file(filename):
    try:
        with open(filename) as file:
            in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        print('Handling ValueError')
        print(e)
    except OSError as e:
        print('Handling OSError')
        print(e)
date=read_date_from_file('read.txt')
print('Date={0}'.format(date))
Handling OSError
[Errno 2] No such file or directory: 'read.txt'
Date=None
Custom exception classes
class MyException(Exception):
    def __init__(self,message):
        super().__init__(message)
Raising exceptions explicitly
import datetime as dt
class MyException(Exception):
    def __init__(self,message):
        super().__init__(message)
def read_date_from_file(filename):
    try:
        file=open(filename)
        in_date=file.read()
        in_date=in_date.strip()
        date=dt.datetime.strptime(in_date,'%Y-%m-%d')
        return date
    except ValueError as e:
        raise MyException('Not a valid date')
    except FileNotFoundError as e:
        raise MyException('File not found')
    except OSError as e:
        raise MyException('The file cannot be opened or read')
date=read_date_from_file('read.txt')
print('Date={0}'.format(date))
---------------------------------------------------------------------------
FileNotFoundError Traceback (most recent call last)
Input In [4], in read_date_from_file(filename)
7 try:
----> 8 file=open(filename)
9 in_date=file.read()
FileNotFoundError: [Errno 2] No such file or directory: 'read.txt'
During handling of the above exception, another exception occurred:
MyException Traceback (most recent call last)
Input In [4], in <cell line: 20>()
17 except OSError as e:
18 raise MyException('The file cannot be opened or read')
---> 20 date=read_date_from_file('read.txt')
21 print('Date={0}'.format(date))
Input In [4], in read_date_from_file(filename)
14 raise MyException('Not a valid date')
15 except FileNotFoundError as e:
---> 16 raise MyException('File not found')
17 except OSError as e:
18 raise MyException('The file cannot be opened or read')
MyException: File not found
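The traceback above says "During handling of the above exception, another exception occurred", which is what Python prints when a new exception is raised inside an except block without an explicit link. If the original error is the intended cause, `raise ... from e` records it in `__cause__`. A small sketch, with parse_port as a hypothetical helper that is not part of the notes:

```python
class MyException(Exception):
    pass

def parse_port(text):
    """Hypothetical helper: convert text to int, wrapping failures."""
    try:
        return int(text)
    except ValueError as e:
        # 'from e' marks the ValueError as the direct cause of MyException.
        raise MyException('not a valid port number') from e

try:
    parse_port('80a')
except MyException as e:
    caught = e  # keep a reference; 'e' itself is cleared after the block
    print(repr(e), '<-', repr(e.__cause__))
```

With explicit chaining the traceback wording changes to "The above exception was the direct cause of the following exception".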
Common modules
The math module
Rounding functions
import math
math.ceil(1.4)
2
math.floor(1.4)
1
round(1.4)
1
math.ceil(1.5)
2
math.floor(1.5)
1
math.ceil(1.6)
2
math.floor(1.6)
1
round(1.5)
2
round(1.6)
2
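One detail the calls above do not reveal: the built-in round() rounds exact halves to the nearest even integer, so round(1.5) gives 2 but so does round(2.5). The sketch below contrasts this with half-up rounding through the decimal module; round_half_up is a hypothetical helper name, and converting via str() is a simplification that suits short literals like these.

```python
from decimal import Decimal, ROUND_HALF_UP

# Built-in round(): ties go to the nearest even integer.
print(round(0.5), round(1.5), round(2.5))  # 0 2 2

def round_half_up(value):
    """Hypothetical helper: round ties away from zero using Decimal."""
    return int(Decimal(str(value)).quantize(Decimal('1'), rounding=ROUND_HALF_UP))

print(round_half_up(0.5), round_half_up(1.5), round_half_up(2.5))  # 1 2 3
```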
Power and logarithm functions
math.log(8,2)
3.0
math.pow(2,3)
8.0
math.log(8)
2.0794415416798357
math.sqrt(1.6)
1.2649110640673518
Trigonometric functions
math.degrees(0.5*math.pi)
90.0
math.radians(180/math.pi)
1.0
a=math.radians(45/math.pi)
a
0.25
math.sin(a)
0.24740395925452294
math.asin(math.sin(a))
0.25
math.asin(0.2474)
0.24999591371483254
math.asin(0.24740395925452294)
0.25
math.cos(a)
0.9689124217106447
math.acos(0.9689124217106447)
0.2500000000000002
math.acos(math.cos(a))
0.2500000000000002
math.tan(a)
0.25534192122103627
math.atan(math.tan(a))
0.25
math.atan(0.25534192122103627)
0.25
The random module
import random
print('0.0<=x<1.0 random')
for i in range(0,10):
    x=random.random()
    print(x)
print('0<=x<5 random')
for i in range(0,10):
    x=random.randrange(5)
    print(x)
print('5<=x<10 random')
for i in range(0,10):
    x=random.randrange(5,10)
    print(x)
print('5<=x<=10 random')
for i in range(0,10):
    x=random.randint(5,10)
    print(x)
0.0<=x<1.0 random
0.3905863037934756
0.8922407632329942
0.21352047760461534
0.5211523015401928
0.30030870435664747
0.9862984919490358
0.21171993560160762
0.6653280107488534
0.32488043176197134
0.3562099773397064
0<=x<5 random
0
0
4
0
2
1
3
3
0
4
5<=x<10 random
7
8
7
6
8
5
9
8
7
7
5<=x<=10 random
5
5
8
7
7
9
9
8
7
5
The datetime module
The datetime, date, and time classes
The datetime class
import datetime
dt=datetime.datetime(2018,2,29)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [37], in <cell line: 1>()
----> 1 dt=datetime.datetime(2018,2,29)
ValueError: day is out of range for month
dt=datetime.datetime(2018,2,28)
dt
datetime.datetime(2018, 2, 28, 0, 0)
dt=datetime.datetime(2018,2,28,23,60,59,10000)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [40], in <cell line: 1>()
----> 1 dt=datetime.datetime(2018,2,28,23,60,59,10000)
ValueError: minute must be in 0..59
dt=datetime.datetime(2018,2,28,23,30,59,10000)
dt
datetime.datetime(2018, 2, 28, 23, 30, 59, 10000)
datetime.datetime.today()
datetime.datetime(2023, 3, 21, 18, 2, 6, 436821)
datetime.datetime.now()
datetime.datetime(2023, 3, 21, 18, 2, 32, 837270)
datetime.datetime.utcnow()
datetime.datetime(2023, 3, 21, 10, 2, 48, 100681)
datetime.datetime.fromtimestamp(999999999.999)
datetime.datetime(2001, 9, 9, 9, 46, 39, 999000)
datetime.datetime.utcfromtimestamp(999999999.999)
datetime.datetime(2001, 9, 9, 1, 46, 39, 999000)
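utcnow() and utcfromtimestamp() return naive datetimes (no tzinfo attached), and both are deprecated as of Python 3.12. A hedged sketch of the timezone-aware equivalents, reusing the timestamp from the calls above:

```python
from datetime import datetime, timezone

ts = 999999999.999

# Aware counterpart of utcfromtimestamp(): the result carries tzinfo=UTC.
aware = datetime.fromtimestamp(ts, tz=timezone.utc)
print(aware)  # 2001-09-09 01:46:39.999000+00:00

# Aware counterpart of utcnow().
now_utc = datetime.now(timezone.utc)
print(now_utc.tzinfo)  # UTC
```

Aware values can be compared and converted across time zones safely, which naive values cannot.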
The date class
d=datetime.date(2018,2,29)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [48], in <cell line: 1>()
----> 1 d=datetime.date(2018,2,29)
ValueError: day is out of range for month
d=datetime.date(2018,2,28)
d
datetime.date(2018, 2, 28)
datetime.date.today()
datetime.date(2023, 3, 21)
datetime.date.fromtimestamp(999999999.999)
datetime.date(2001, 9, 9)
The time class
datetime.time(24,59,58,1999)
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [53], in <cell line: 1>()
----> 1 datetime.time(24,59,58,1999)
ValueError: hour must be in 0..23
datetime.time(23,59,58,1999)
datetime.time(23, 59, 58, 1999)
Date and time arithmetic
datetime.date.today()
datetime.date(2023, 3, 21)
d=datetime.date.today()
delta=datetime.timedelta(10)
d+=delta
d
datetime.date(2023, 3, 31)
d=datetime.date(2018,1,1)
delta=datetime.timedelta(weeks=5)
d-=delta
d
datetime.date(2017, 11, 27)
Formatting and parsing dates and times
d=datetime.datetime.today()
d.strftime('%Y-%m-%d %H:%M:%S')
'2023-03-21 18:10:33'
d.strftime('%Y-%m-%d')
'2023-03-21'
str_date='2018-02-29 10:40:26'
date=datetime.datetime.strptime(in_date,'%Y-%m-%d %H:%M:%S')
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [69], in <cell line: 1>()
----> 1 date=datetime.datetime.strptime(in_date,'%Y-%m-%d %H:%M:%S')
NameError: name 'in_date' is not defined
str_date='2018-02-28 10:40:26'
date=datetime.datetime.strptime(str_date,'%Y-%m-%d %H:%M:%S')
date
datetime.datetime(2018, 2, 28, 10, 40, 26)
date=datetime.datetime.strptime(str_date,'%Y-%m-%d')
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
Input In [74], in <cell line: 1>()
----> 1 date=datetime.datetime.strptime(str_date,'%Y-%m-%d')
File E:\anaconda\lib\_strptime.py:568, in _strptime_datetime(cls, data_string, format)
565 def _strptime_datetime(cls, data_string, format="%a %b %d %H:%M:%S %Y"):
566 """Return a class cls instance based on the input string and the
567 format string."""
--> 568 tt, fraction, gmtoff_fraction = _strptime(data_string, format)
569 tzname, gmtoff = tt[-2:]
570 args = tt[:6] + (fraction,)
File E:\anaconda\lib\_strptime.py:352, in _strptime(data_string, format)
349 raise ValueError("time data %r does not match format %r" %
350 (data_string, format))
351 if len(data_string) != found.end():
--> 352 raise ValueError("unconverted data remains: %s" %
353 data_string[found.end():])
355 iso_year = year = None
356 month = day = 1
ValueError: unconverted data remains: 10:40:26
Time zones
from datetime import datetime,timezone,timedelta
utc_dt=datetime(2008,8,19,23,59,59,tzinfo=timezone.utc)
utc_dt
datetime.datetime(2008, 8, 19, 23, 59, 59, tzinfo=datetime.timezone.utc)
utc_dt.strftime('%Y-%m-%d %H:%M:%S')
'2008-08-19 23:59:59'
utc_dt.strftime('%Y-%m-%d %H:%M:%S %z')
'2008-08-19 23:59:59 +0000'
bj_tz=timezone(offset=timedelta(hours=8),name='Asia/Beijing')
bj_tz
datetime.timezone(datetime.timedelta(seconds=28800), 'Asia/Beijing')
bj_dt=utc_dt.astimezone(bj_tz)
bj_dt
datetime.datetime(2008, 8, 20, 7, 59, 59, tzinfo=datetime.timezone(datetime.timedelta(seconds=28800), 'Asia/Beijing'))
bj_dt.strftime('%Y-%m-%d %H:%M:%S %Z')
'2008-08-20 07:59:59 Asia/Beijing'
bj_dt.strftime('%Y-%m-%d %H:%M:%S %z')
'2008-08-20 07:59:59 +0800'
bj_tz=timezone(timedelta(hours=8))
bj_dt=utc_dt.astimezone(bj_tz)
bj_dt.strftime('%Y-%m-%d %H:%M:%S %z')
'2008-08-20 07:59:59 +0800'
The logging module
Log levels
import logging
logging.basicConfig(level=logging.ERROR)
logging.debug('this is debug')
logging.info('this is info')
logging.warning('this is warning')
logging.error('this is error')
logging.critical('this is critical')
2023-03-21 20:15:10,230-MainThread-root-<cell line: 5>-INFO-this is info
2023-03-21 20:15:10,246-MainThread-root-<cell line: 6>-WARNING-this is warning
2023-03-21 20:15:10,247-MainThread-root-<cell line: 7>-ERROR-this is error
2023-03-21 20:15:10,248-MainThread-root-<cell line: 8>-CRITICAL-this is critical
import logging
logging.basicConfig(level=logging.DEBUG)
logger=logging.getLogger(__name__)
logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')
2023-03-21 20:14:59,694-MainThread-__main__-<cell line: 6>-INFO-this is info
2023-03-21 20:14:59,694-MainThread-__main__-<cell line: 7>-WARNING-this is warning
2023-03-21 20:14:59,698-MainThread-__main__-<cell line: 8>-ERROR-this is error
2023-03-21 20:14:59,700-MainThread-__main__-<cell line: 9>-CRITICAL-this is critical
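The output recorded for the two cells above contains INFO and WARNING lines in a custom format, which basicConfig(level=logging.ERROR) on its own would not produce. A likely explanation (an assumption, since the kernel history is not shown) is that basicConfig had already been called earlier in the same session; later calls do nothing once the root logger has handlers. The sketch below uses force=True (Python 3.8+) to replace any earlier configuration and writes to an in-memory buffer so the result can be inspected:

```python
import io
import logging

buffer = io.StringIO()  # stands in for the console

# force=True removes handlers left over from an earlier basicConfig call.
logging.basicConfig(level=logging.ERROR, stream=buffer, force=True)

logging.debug('this is debug')
logging.info('this is info')
logging.warning('this is warning')
logging.error('this is error')
logging.critical('this is critical')

captured = buffer.getvalue().splitlines()
print(captured)  # ['ERROR:root:this is error', 'CRITICAL:root:this is critical']
```

In a notebook, restarting the kernel has the same effect as force=True.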
Formatting log messages
import logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s-%(threadName)s-'
                           '%(name)s-%(funcName)s-%(levelname)s-%(message)s')
logger=logging.getLogger(__name__)
logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')
def funlog():
    logger.info('enter funlog')
logger.info('use funlog')
funlog()
2023-03-21 20:14:51,110-MainThread-__main__-<cell line: 8>-INFO-this is info
2023-03-21 20:14:51,120-MainThread-__main__-<cell line: 9>-WARNING-this is warning
2023-03-21 20:14:51,122-MainThread-__main__-<cell line: 10>-ERROR-this is error
2023-03-21 20:14:51,123-MainThread-__main__-<cell line: 11>-CRITICAL-this is critical
2023-03-21 20:14:51,124-MainThread-__main__-<cell line: 16>-INFO-use funlog
2023-03-21 20:14:51,124-MainThread-__main__-funlog-INFO-enter funlog
Redirecting log output
import logging
logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s-%(threadName)s-'
                           '%(name)s-%(funcName)s-%(levelname)s-%(message)s')
logger=logging.getLogger(__name__)
logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')
def funlog():
    logger.info('enter funlog')
logger.info('use funlog')
funlog()
2023-03-21 20:17:26,157-MainThread-__main__-<cell line: 8>-INFO-this is info
2023-03-21 20:17:26,165-MainThread-__main__-<cell line: 9>-WARNING-this is warning
2023-03-21 20:17:26,166-MainThread-__main__-<cell line: 10>-ERROR-this is error
2023-03-21 20:17:26,167-MainThread-__main__-<cell line: 11>-CRITICAL-this is critical
2023-03-21 20:17:26,169-MainThread-__main__-<cell line: 16>-INFO-use funlog
2023-03-21 20:17:26,171-MainThread-__main__-funlog-INFO-enter funlog
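The cell above still logs to the console, so it does not yet show any redirection. basicConfig accepts a filename argument for that purpose; the sketch below instead attaches a FileHandler to a dedicated logger, which works even when the root logger was configured earlier in the session. The logger name file_demo and the temporary log path are assumptions made for the example.

```python
import logging
import os
import tempfile

# Hypothetical log location inside a fresh temporary directory.
log_path = os.path.join(tempfile.mkdtemp(), 'test.log')

handler = logging.FileHandler(log_path, encoding='utf-8')
handler.setFormatter(logging.Formatter(
    '%(name)s-%(funcName)s-%(levelname)s-%(message)s'))

file_logger = logging.getLogger('file_demo')
file_logger.setLevel(logging.INFO)
file_logger.propagate = False  # keep these records off the console
file_logger.addHandler(handler)

def funlog():
    file_logger.info('enter funlog')

file_logger.debug('this is debug')  # below INFO, so it is dropped
file_logger.info('this is info')
funlog()

# Flush and release the file before reading it back.
handler.close()
file_logger.removeHandler(handler)

with open(log_path, encoding='utf-8') as f:
    lines = f.read().splitlines()
print(lines)
```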
Using a configuration file
import logging
import logging.config
logging.config.fileConfig("logger.conf")
logger=logging.getLogger('loggerl')
logger.debug('this is debug')
logger.info('this is info')
logger.warning('this is warning')
logger.error('this is error')
logger.critical('this is critical')
def funlog():
    logger.info('enter funlog')
logger.info('use funlog')
funlog()
this is debug
this is info
this is warning
this is error
this is critical
use funlog
enter funlog
Regular expressions
Regular expression strings
Metacharacters
Character escaping
Start and end anchors
import re
p1 = r'\w+@zhijieketang\.com'
p2 = r'^\w+@zhijieketang\.com$'
text = "Tony's email is tony_guan588@zhijieketang.com."
m = re.search(p1, text)
print(m)
m = re.search(p2, text)
print(m)
email = 'tony_guan588@zhijieketang.com'
m = re.search(p2, email)
print(m)
<re.Match object; span=(16, 45), match='tony_guan588@zhijieketang.com'>
None
<re.Match object; span=(0, 29), match='tony_guan588@zhijieketang.com'>
Character classes
Defining a character class
import re
p = r'[Jj]ava'
m = re.search(p, 'I like Java and Python.')
print(m)
m = re.search(p, 'I like JAVA and Python.')
print(m)
m = re.search(p, 'I like java and Python.')
print(m)
<re.Match object; span=(7, 11), match='Java'>
None
<re.Match object; span=(7, 11), match='java'>
Negated character classes
import re
p = r'[^0123456789]'
m = re.search(p, '1000')
print(m)
m = re.search(p, 'Python 3')
print(m)
None
<re.Match object; span=(0, 1), match='P'>
Ranges
import re
m = re.search(r'[A-Za-z0-9]', 'A10.3')
print(m)
m = re.search(r'[0-25-7]', 'A3489C')
print(m)
<re.Match object; span=(0, 1), match='A'>
None
Predefined character classes
import re
p = r'\D'
m = re.search(p, '1000')
print(m)
m = re.search(p, 'Python 3')
print(m)
text = '你们好Hello'
m = re.search(r'\w', text)
print(m)
None
<re.Match object; span=(0, 1), match='P'>
<re.Match object; span=(0, 1), match='你'>
Quantifiers
Using quantifiers
import re
m = re.search(r'\d?', '87654321')
print(m)
m = re.search(r'\d?', 'ABC')
print(m)
m = re.search(r'\d*', '87654321')
print(m)
m = re.search(r'\d*', 'ABC')
print(m)
m = re.search(r'\d+', '87654321')
print(m)
m = re.search(r'\d+', 'ABC')
print(m)
m = re.search(r'\d{8}', '87654321')
print('8765432', m)
m = re.search(r'\d{8}', 'ABC')
print(m)
m = re.search(r'\d{7,8}', '87654321')
print(m)
m = re.search(r'\d{9,}', '87654321')
print(m)
<re.Match object; span=(0, 1), match='8'>
<re.Match object; span=(0, 0), match=''>
<re.Match object; span=(0, 8), match='87654321'>
<re.Match object; span=(0, 0), match=''>
<re.Match object; span=(0, 8), match='87654321'>
None
8765432 <re.Match object; span=(0, 8), match='87654321'>
None
<re.Match object; span=(0, 8), match='87654321'>
None
Greedy and lazy quantifiers
import re
m = re.search(r'\d{5,8}', '87654321')
print(m)
m = re.search(r'\d{5,8}?', '87654321')
print(m)
<re.Match object; span=(0, 8), match='87654321'>
<re.Match object; span=(0, 5), match='87654'>
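The difference between greedy and lazy matching becomes more visible on text with repeated delimiters. A short sketch using an HTML-like string chosen only for illustration:

```python
import re

text = '<a>abc</a>'

greedy = re.search(r'<.+>', text)   # .+ grabs as much as it can
lazy = re.search(r'<.+?>', text)    # .+? stops at the first possible '>'

print(greedy.group())  # <a>abc</a>
print(lazy.group())    # <a>
```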
Groups
Using groups
import re
p = r'(121){2}'
m = re.search(p, '121121abcabc')
print(m)
print(m.group())
print(m.group(1))
p = r'(\d{3,4})-(\d{7,8})'
m = re.search(p, '010-87654321')
print(m)
print(m.group())
print(m.groups())
<re.Match object; span=(0, 6), match='121121'>
121121
121
<re.Match object; span=(0, 12), match='010-87654321'>
010-87654321
('010', '87654321')
Named groups
import re
p = r'(?P<area_code>\d{3,4})-(?P<phone_code>\d{7,8})'
m = re.search(p, '010-87654321')
print(m)
print(m.group())
print(m.groups())
print(m.group(1))
print(m.group(2))
print(m.group('area_code'))
print(m.group('phone_code'))
<re.Match object; span=(0, 12), match='010-87654321'>
010-87654321
('010', '87654321')
010
87654321
010
87654321
Backreferences to groups
import re
p = r'<([\w]+)>.*</\1>'
m = re.search(p, '<a>abc</a>')
print(m)
m = re.search(p, '<a>abc</b>')
print(m)
<re.Match object; span=(0, 10), match='<a>abc</a>'>
None
Non-capturing groups
import re
s = 'img1.jpg,img2.jpg,img3.bmp'
p = r'\w+(\.jpg)'
mlist = re.findall(p, s)
print(mlist)
p = r'\w+(?:\.jpg)'
mlist = re.findall(p, s)
print(mlist)
['.jpg', '.jpg']
['img1.jpg', 'img2.jpg']
The re module
The search() and match() functions
import re
p = r'\w+@zhijieketang\.com'
text = "Tony's email is tony_guan588@zhijieketang.com."
m = re.search(p, text)
print(m)
m = re.match(p, text)
print(m)
email = 'tony_guan588@zhijieketang.com'
m = re.search(p, email)
print(m)
m = re.match(p, email)
print(m)
print('Some methods of the match object:')
print(m.group())
print(m.start())
print(m.end())
print(m.span())
<re.Match object; span=(16, 45), match='tony_guan588@zhijieketang.com'>
None
<re.Match object; span=(0, 29), match='tony_guan588@zhijieketang.com'>
<re.Match object; span=(0, 29), match='tony_guan588@zhijieketang.com'>
Some methods of the match object:
tony_guan588@zhijieketang.com
0
29
(0, 29)
The findall() and finditer() functions
import re
p = r'[Jj]ava'
text = 'I like Java and java.'
match_list = re.findall(p, text)
print(match_list)
match_iter = re.finditer(p, text)
for m in match_iter:
    print(m.group())
['Java', 'java']
Java
java
Splitting strings
import re
p = r'\d+'
text = 'AB12CD34EF'
clist = re.split(p, text)
print(clist)
clist = re.split(p, text, maxsplit=1)
print(clist)
clist = re.split(p, text, maxsplit=2)
print(clist)
['AB', 'CD', 'EF']
['AB', 'CD34EF']
['AB', 'CD', 'EF']
Replacing text in strings
import re
p = r'\d+'
text = 'AB12CD34EF'
replace_text = re.sub(p, ' ', text)
print(replace_text)
replace_text = re.sub(p, ' ', text, count=1)
print(replace_text)
replace_text = re.sub(p, ' ', text, count=2)
print(replace_text)
AB CD EF
AB CD34EF
AB CD EF
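re.sub() can do more than insert a fixed string: the replacement may refer back to groups, or be a function that receives each match. A brief sketch reusing the phone-number pattern and sample text from earlier in these notes:

```python
import re

# Swap the two named groups using \g<name> references in the replacement.
p = r'(?P<area_code>\d{3,4})-(?P<phone_code>\d{7,8})'
swapped = re.sub(p, r'\g<phone_code>-\g<area_code>', '010-87654321')
print(swapped)  # 87654321-010

# Use a function to compute each replacement from the match object.
doubled = re.sub(r'\d+', lambda m: str(int(m.group()) * 2), 'AB12CD34EF')
print(doubled)  # AB24CD68EF
```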
Compiling regular expressions
re.compile(pattern, flags=0)
Compiled regular expression objects
import re
p = r'\w+@zhijieketang\.com'
regex = re.compile(p)
text = "Tony's email is tony_guan588@zhijieketang.com."
m = regex.search(text)
print(m)
m = regex.match(text)
print(m)
p = r'[Jj]ava'
regex = re.compile(p)
text = 'I like Java and java.'
match_list = regex.findall(text)
print(match_list)
match_iter = regex.finditer(text)
for m in match_iter:
    print(m.group())
p = r'\d+'
regex = re.compile(p)
text = 'AB12CD34EF'
clist = regex.split(text)
print(clist)
replace_text = regex.sub(' ', text)
print(replace_text)
<re.Match object; span=(16, 45), match='tony_guan588@zhijieketang.com'>
None
['Java', 'java']
Java
java
['AB', 'CD', 'EF']
AB CD EF
Compilation flags
ASCII and Unicode
import re
text = '你们好Hello'
p = r'\w+'
regex = re.compile(p, re.U)
m = regex.search(text)
print(m)
m = regex.match(text)
print(m)
regex = re.compile(p, re.A)
m = regex.search(text)
print(m)
m = regex.match(text)
print(m)
<re.Match object; span=(0, 8), match='你们好Hello'>
<re.Match object; span=(0, 8), match='你们好Hello'>
<re.Match object; span=(3, 8), match='Hello'>
None
Ignoring case
import re
p = r'(java).*(python)'
regex = re.compile(p, re.I)
m = regex.search('I like Java and Python.')
print(m)
m = regex.search('I like JAVA and Python.')
print(m)
m = regex.search('I like java and Python.')
print(m)
<re.Match object; span=(7, 22), match='Java and Python'>
<re.Match object; span=(7, 22), match='JAVA and Python'>
<re.Match object; span=(7, 22), match='java and Python'>
Making the dot metacharacter match newlines
import re
p = r'.+'
regex = re.compile(p)
m = regex.search('Hello\nWorld.')
print(m)
regex = re.compile(p, re.DOTALL)
m = regex.search('Hello\nWorld.')
print(m)
<re.Match object; span=(0, 5), match='Hello'>
<re.Match object; span=(0, 12), match='Hello\nWorld.'>
Multiline mode
import re
p = r'^World'
regex = re.compile(p)
m = regex.search('Hello\nWorld.')
print(m)
regex = re.compile(p, re.M)
m = regex.search('Hello\nWorld.')
print(m)
None
<re.Match object; span=(6, 11), match='World'>
Verbose mode
import re
p = r"""(java)     # match the string java
        .*         # match zero or more of any character
        (python)   # match the string python
     """
regex = re.compile(p, re.I | re.VERBOSE)
m = regex.search('I like Java and Python.')
print(m)
m = regex.search('I like JAVA and Python.')
print(m)
m = regex.search('I like java and Python.')
print(m)
<re.Match object; span=(7, 22), match='Java and Python'>
<re.Match object; span=(7, 22), match='JAVA and Python'>
<re.Match object; span=(7, 22), match='java and Python'>
Data interchange formats
The CSV data interchange format
The reader() function
import csv
with open('data/books.csv', 'r', encoding='gbk') as rf:
    reader = csv.reader(rf, dialect=csv.excel)
    for row in reader:
        print('|'.join(row))
1|软件工程|戴国强|机械工业出版社|19980528|2
2|汇编语言|李利光|北京大学出版社|19980318|2
3|计算机基础|王飞|经济科学出版社|19980218|1
4|FLASH精选|刘扬|中国纺织出版社|19990312|2
5|java基础|王一|电子工业出版社|19990528|3
6|世界杯|柳飞|世界出版社|19990412|2
7|JAVA程序设计|张余|人民邮电出版社|19990613|1
8|新概念3|余智|外语出版社|19990723|2
9|军事要闻|张强|解放军出版社|19990722|3
10|大众生活|许阳|电子出版社|19990819|3
11|南方旅游|王爱国|南方出版社|19990930|2
13|幽灵|钱力华|华光出版社|19991008|1
14|期货分析|孙宝|飞鸟出版社|19991122|3
15|人工智能|周未|机械工业出版社|19991223|3
16|数据库系统概念|吴红|机械工业出版社|20000328|3
17|计算机理论基础|戴家|机械工业出版社|20000218|4
18|编译原理|郑键|机械工业出版社|20000415|2
19|通讯与网络|欧阳杰|机械工业出版社|20000517|1
20|现代操作系统|王小国|机械工业出版社|20010128|1
21|网络基础|王大尉|北京大学出版社|20000617|1
22|万紫千红|丛丽|北京大学出版社|20000702|3
23|经济概论|思佳|北京大学出版社|20000819|3
24|经济与科学|毛波|经济科学出版社|20000923|2
25|计算机体系结构|方丹|机械工业出版社|20000328|4
26|软件工程|牛田|经济科学出版社|20000328|4
27|世界语言大观|候丙辉|经济科学出版社|20000814|2
28|高级语言程序设计|寇国华|清华大学出版社|20000117|3
29|操作系统概论|聂元名|清华大学出版社|20001028|1
30|数据库及应用|孙家萧|清华大学出版社|20000619|1
31|软件工程|戴志名|电子工业出版社|20000324|3
32|SOL使用手册|贺民|电子工业出版社|19990425|2
33|模拟电路|邓英才|电子工业出版社|20000527|2
34|集邮爱好者|李云|人民邮电出版社|20000630|1
36|高等数学|李放|人民邮电出版社|20000812|1
37|南方周末|邓光明|南方出版社|20000923|3
38|十大旅游胜地|潭晓明|南方出版社|20000403|2
39|黑幕|李仪|华光出版社|20000508|24
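When each column has a known meaning, csv.DictReader lets rows be accessed by name instead of position. The sketch below reads two of the rows printed above from an in-memory string; the column names are guesses made for illustration, since the notes do not show a header row for books.csv.

```python
import csv
import io

# Two sample rows copied from the output above, held in memory.
sample = io.StringIO(
    '1,软件工程,戴国强,机械工业出版社,19980528,2\n'
    '2,汇编语言,李利光,北京大学出版社,19980318,2\n'
)

# Assumed column names; the real file's meaning may differ.
fields = ['id', 'title', 'author', 'publisher', 'date', 'category']

reader = csv.DictReader(sample, fieldnames=fields)
rows = list(reader)
for row in rows:
    print(row['id'], row['title'], row['publisher'])
```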
The writer() function
import csv
with open('data/books.csv', 'r', encoding='gbk') as rf:
    reader = csv.reader(rf)
    with open('data/books2.csv', 'w', newline='', encoding='gbk') as wf:
        writer = csv.writer(wf, delimiter='\t')
        for row in reader:
            print('|'.join(row))
            writer.writerow(row)
1|软件工程|戴国强|机械工业出版社|19980528|2
2|汇编语言|李利光|北京大学出版社|19980318|2
3|计算机基础|王飞|经济科学出版社|19980218|1
4|FLASH精选|刘扬|中国纺织出版社|19990312|2
5|java基础|王一|电子工业出版社|19990528|3
6|世界杯|柳飞|世界出版社|19990412|2
7|JAVA程序设计|张余|人民邮电出版社|19990613|1
8|新概念3|余智|外语出版社|19990723|2
9|军事要闻|张强|解放军出版社|19990722|3
10|大众生活|许阳|电子出版社|19990819|3
11|南方旅游|王爱国|南方出版社|19990930|2
13|幽灵|钱力华|华光出版社|19991008|1
14|期货分析|孙宝|飞鸟出版社|19991122|3
15|人工智能|周未|机械工业出版社|19991223|3
16|数据库系统概念|吴红|机械工业出版社|20000328|3
17|计算机理论基础|戴家|机械工业出版社|20000218|4
18|编译原理|郑键|机械工业出版社|20000415|2
19|通讯与网络|欧阳杰|机械工业出版社|20000517|1
20|现代操作系统|王小国|机械工业出版社|20010128|1
21|网络基础|王大尉|北京大学出版社|20000617|1
22|万紫千红|丛丽|北京大学出版社|20000702|3
23|经济概论|思佳|北京大学出版社|20000819|3
24|经济与科学|毛波|经济科学出版社|20000923|2
25|计算机体系结构|方丹|机械工业出版社|20000328|4
26|软件工程|牛田|经济科学出版社|20000328|4
27|世界语言大观|候丙辉|经济科学出版社|20000814|2
28|高级语言程序设计|寇国华|清华大学出版社|20000117|3
29|操作系统概论|聂元名|清华大学出版社|20001028|1
30|数据库及应用|孙家萧|清华大学出版社|20000619|1
31|软件工程|戴志名|电子工业出版社|20000324|3
32|SOL使用手册|贺民|电子工业出版社|19990425|2
33|模拟电路|邓英才|电子工业出版社|20000527|2
34|集邮爱好者|李云|人民邮电出版社|20000630|1
36|高等数学|李放|人民邮电出版社|20000812|1
37|南方周末|邓光明|南方出版社|20000923|3
38|十大旅游胜地|潭晓明|南方出版社|20000403|2
39|黑幕|李仪|华光出版社|20000508|24
The XML data interchange format
XML document structure
- Declaration
- Root element
- Child elements
- Attributes
- Namespaces
- Qualified names
Parsing XML documents
import xml.etree.ElementTree as ET
tree = ET.parse('data1/Notes.xml')
print(type(tree))
root = tree.getroot()
print(type(root))
print(root.tag)
for index, child in enumerate(root):
    print('Element {0}: {1}, attributes: {2}'.format(index, child.tag, child.attrib))
    for i, child_child in enumerate(child):
        print(' Tag: {0}, text: {1}'.format(child_child.tag, child_child.text))
<class 'xml.etree.ElementTree.ElementTree'>
<class 'xml.etree.ElementTree.Element'>
Notes
Element 0: Note, attributes: {'id': '1'}
 Tag: CDate, text: 2018-3-21
 Tag: Content, text: 发布Python0
 Tag: UserID, text: tony
Element 1: Note, attributes: {'id': '2'}
 Tag: CDate, text: 2018-3-22
 Tag: Content, text: 发布Python1
 Tag: UserID, text: tony
Element 2: Note, attributes: {'id': '3'}
 Tag: CDate, text: 2018-3-23
 Tag: Content, text: 发布Python2
 Tag: UserID, text: tony
Element 3: Note, attributes: {'id': '4'}
 Tag: CDate, text: 2018-3-24
 Tag: Content, text: 发布Python3
 Tag: UserID, text: tony
Element 4: Note, attributes: {'id': '5'}
 Tag: CDate, text: 2018-3-25
 Tag: Content, text: 发布Python4
 Tag: UserID, text: tony
XPath
- find(match, namespaces=None)
- findall(match, namespaces=None)
- findtext(match, default=None, namespaces=None)
import xml.etree.ElementTree as ET
tree = ET.parse('data1/Notes.xml')
root = tree.getroot()
node = root.find("./Note")
print(node.tag, node.attrib)
node = root.find("./Note/CDate")
print(node.text)
node = root.find("./Note/CDate/..")
print(node.tag, node.attrib)
node = root.find(".//CDate")
print(node.text)
node = root.find("./Note[@id]")
print(node.tag, node.attrib)
node = root.find("./Note[@id='2']")
print(node.tag, node.attrib)
node = root.find("./Note[2]")
print(node.tag, node.attrib)
node = root.find("./Note[last()]")
print(node.tag, node.attrib)
node = root.find("./Note[last()-2]")
print(node.tag, node.attrib)
Note {'id': '1'}
2018-3-21
Note {'id': '1'}
2018-3-21
Note {'id': '1'}
Note {'id': '2'}
Note {'id': '2'}
Note {'id': '5'}
Note {'id': '3'}
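The calls above only exercise find(). The other two methods in the list, findall() and findtext(), can be sketched as follows; the inline XML is a shortened stand-in whose structure is inferred from the printed output, not the actual contents of Notes.xml.

```python
import xml.etree.ElementTree as ET

# Assumed structure, reduced to two Note elements for brevity.
xml_text = """<Notes>
  <Note id="1"><CDate>2018-3-21</CDate><UserID>tony</UserID></Note>
  <Note id="2"><CDate>2018-3-22</CDate><UserID>tony</UserID></Note>
</Notes>"""

root = ET.fromstring(xml_text)

# findall() returns every matching element, in document order.
ids = [note.get('id') for note in root.findall('./Note')]
print(ids)  # ['1', '2']

# findtext() returns the text of the first match directly.
cdate = root.findtext("./Note[@id='2']/CDate")
print(cdate)  # 2018-3-22

# The default is returned when nothing matches.
fallback = root.findtext('./Note/Missing', default='n/a')
print(fallback)  # n/a
```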
The JSON data interchange format
JSON document structure
Encoding JSON data
import json
py_dict = {'name': 'tony', 'age': 30, 'sex': True}
py_list = [1, 3]
py_tuple = ('A', 'B', 'C')
py_dict['a'] = py_list
py_dict['b'] = py_tuple
print(py_dict)
print(type(py_dict))
json_obj = json.dumps(py_dict)
print(json_obj)
print(type(json_obj))
json_obj = json.dumps(py_dict, indent=4)
print(json_obj)
with open('data2/data1.json', 'w') as f:
    json.dump(py_dict, f)
with open('data2/data2.json', 'w') as f:
    json.dump(py_dict, f, indent=4)
{'name': 'tony', 'age': 30, 'sex': True, 'a': [1, 3], 'b': ('A', 'B', 'C')}
<class 'dict'>
{"name": "tony", "age": 30, "sex": true, "a": [1, 3], "b": ["A", "B", "C"]}
<class 'str'>
{
    "name": "tony",
    "age": 30,
    "sex": true,
    "a": [
        1,
        3
    ],
    "b": [
        "A",
        "B",
        "C"
    ]
}
Decoding JSON data
import json
json_obj = r'{"name": "tony", "age": 30, "sex": true, "a": [1, 3], "b": ["A", "B", "C"]}'
py_dict = json.loads(json_obj)
print(type(py_dict))
print(py_dict['name'])
print(py_dict['age'])
print(py_dict['sex'])
py_lista = py_dict['a']
print(py_lista)
py_listb = py_dict['b']
print(py_listb)
with open('data2/data2.json', 'r') as f:
    data = json.load(f)
    print(data)
    print(type(data))
<class 'dict'>
tony
30
True
[1, 3]
['A', 'B', 'C']
{'name': 'tony', 'age': 30, 'sex': True, 'a': [1, 3], 'b': ['A', 'B', 'C']}
<class 'dict'>
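Two behaviours are worth noting next to the examples above: tuples come back as lists after a round trip, and non-ASCII text is escaped unless ensure_ascii=False is passed. A small sketch with made-up sample data:

```python
import json

original = {'title': '软件工程', 'tags': ('A', 'B')}

escaped = json.dumps(original)                       # non-ASCII becomes \uXXXX escapes
readable = json.dumps(original, ensure_ascii=False)  # characters kept as-is

print(escaped)
print(readable)  # {"title": "软件工程", "tags": ["A", "B"]}

restored = json.loads(readable)
print(restored['tags'])  # ['A', 'B']  (the tuple is now a list)
```

Both strings decode to the same Python object; only the on-disk representation differs.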
Configuration files
Configuration file structure
Reading a configuration file
import configparser
config = configparser.ConfigParser()
config.read('data3/Setup.ini', encoding='utf-8')
print(config.sections())
section1 = config['Startup']
print(config.options('Startup'))
print(section1['RequireOS'])
print(section1['RequireIE'])
print(config['Product']['msi'])
print(config['Windows 2000']['MajorVersion'])
print(config['Windows 2000']['ServicePackMajor'])
value = config.get('Windows 2000', 'MajorVersion')
print(type(value))
value = config.getint('Windows 2000', 'MajorVersion')
print(type(value))
['Startup', 'Product', 'Windows 2000']
['requireos', 'requiremsi', 'requireie']
Windows 2000
6.0.2600.0
AcroRead.msi
5
4
<class 'str'>
<class 'int'>
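The options list printed above is all lowercase even though the code asks for RequireOS and RequireIE. That is configparser's default behaviour: option names are passed through optionxform, which lowercases them, so lookups are case-insensitive. A sketch using an inline string in place of Setup.ini (the section contents here are abbreviated assumptions):

```python
import configparser

config = configparser.ConfigParser()
config.read_string("""
[Startup]
RequireOS = Windows 2000
RequireIE = 6.0.2600.0
""")

print(config.options('Startup'))       # ['requireos', 'requireie']
print(config['Startup']['REQUIREOS'])  # Windows 2000

# To keep option names exactly as written, replace optionxform before reading.
preserving = configparser.ConfigParser()
preserving.optionxform = str
preserving.read_string('[Startup]\nRequireOS = Windows 2000\n')
print(preserving.options('Startup'))   # ['RequireOS']
```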
Writing a configuration file
import configparser
config = configparser.ConfigParser()
config.read('data3/Setup.ini', encoding='utf-8')
config['Startup']['RequireMSI'] = '8.0'
config['Product']['RequireMSI'] = '4.0'
config.add_section('Section2')
config.set('Section2', 'name', 'Mac')
with open('data3/Setup.ini', 'w') as fw:
    config.write(fw)
Database programming
Overview of data persistence techniques
- Text files
- Databases
The MySQL database management system
Python DB-API
Establishing a database connection
Creating a cursor
Case study: CRUD operations on a MySQL database
Installing the PyMySQL module
Query operations
Code for a conditional query
import pymysql
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')
try:
    with connection.cursor() as cursor:
        sql = 'select name, userid from user where userid >%(id)s'
        cursor.execute(sql, {'id': 0})
        result_set = cursor.fetchall()
        for row in result_set:
            print('id:{0} - name:{1}'.format(row[1], row[0]))
finally:
    connection.close()
id:1 - name:Tom
id:2 - name:Ben
Code for an unconditional query
import pymysql
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')
try:
    with connection.cursor() as cursor:
        sql = 'select max(userid) from user'
        cursor.execute(sql)
        row = cursor.fetchone()
        if row is not None:
            print('Max user id: {0}'.format(row[0]))
finally:
    connection.close()
Max user id: 2
Data modification operations
- Inserting data
import pymysql
def read_max_userid():
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='986370165',
                                 database='MyDB',
                                 charset='utf8')
    try:
        with connection.cursor() as cursor:
            sql = 'select max(userid) from user'
            cursor.execute(sql)
            row = cursor.fetchone()
            if row is not None:
                print('Max user id: {0}'.format(row[0]))
                return row[0]
    finally:
        connection.close()
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')
maxid = read_max_userid()
try:
    with connection.cursor() as cursor:
        sql = 'insert into user (userid, name) values (%s,%s)'
        nextid = maxid + 1
        name = 'Tony' + str(nextid)
        affectedcount = cursor.execute(sql, (nextid, name))
        print('Rows affected: {0}'.format(affectedcount))
        connection.commit()
except pymysql.DatabaseError:
    connection.rollback()
finally:
    connection.close()
Max user id: 2
Rows affected: 1
- Updating data
import pymysql
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')
try:
    with connection.cursor() as cursor:
        sql = 'update user set name = %s where userid > %s'
        affectedcount = cursor.execute(sql, ('Tom', 2))
        print('Rows affected: {0}'.format(affectedcount))
        connection.commit()
except pymysql.DatabaseError as e:
    connection.rollback()
    print(e)
finally:
    connection.close()
Rows affected: 1
- Deleting data
import pymysql
def read_max_userid():
    connection = pymysql.connect(host='localhost',
                                 user='root',
                                 password='986370165',
                                 database='MyDB',
                                 charset='utf8')
    try:
        with connection.cursor() as cursor:
            sql = 'select max(userid) from user'
            cursor.execute(sql)
            row = cursor.fetchone()
            if row is not None:
                print('Max user id: {0}'.format(row[0]))
                return row[0]
    finally:
        connection.close()
connection = pymysql.connect(host='localhost',
                             user='root',
                             password='986370165',
                             database='MyDB',
                             charset='utf8')
maxid = read_max_userid()
try:
    with connection.cursor() as cursor:
        sql = 'delete from user where userid = %s'
        # Pass the parameter as a one-element tuple; (maxid) alone is just maxid.
        affectedcount = cursor.execute(sql, (maxid,))
        print('Rows affected: {0}'.format(affectedcount))
        connection.commit()
except pymysql.DatabaseError:
    connection.rollback()
finally:
    connection.close()
Max user id: 3
Rows affected: 1
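The same DB-API pattern (connect, cursor, parameterized execute, commit or rollback, close) can be tried without a MySQL server. The sketch below substitutes the standard-library sqlite3 module with an in-memory database; it is not the PyMySQL code from these notes, and sqlite3 uses ? placeholders where PyMySQL uses %s. The table layout is assumed from the queries above.

```python
import sqlite3

connection = sqlite3.connect(':memory:')  # throwaway in-memory database
try:
    cursor = connection.cursor()
    # Assumed schema mirroring the 'user' table queried in the notes.
    cursor.execute('create table user (userid integer primary key, name text)')
    cursor.executemany('insert into user (userid, name) values (?, ?)',
                       [(1, 'Tom'), (2, 'Ben')])
    connection.commit()

    cursor.execute('select max(userid) from user')
    maxid = cursor.fetchone()[0]

    try:
        nextid = maxid + 1
        cursor.execute('insert into user (userid, name) values (?, ?)',
                       (nextid, 'Tony' + str(nextid)))
        connection.commit()
    except sqlite3.DatabaseError:
        connection.rollback()  # undo the partial change on failure

    cursor.execute('select userid, name from user where userid > ? order by userid',
                   (0,))
    rows = cursor.fetchall()
    print(rows)  # [(1, 'Tom'), (2, 'Ben'), (3, 'Tony3')]
finally:
    connection.close()
```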
NoSQL data storage
Opening and closing a dbm database
dbm.open(file, flag='r')
'r', 'w', 'c', 'n'
with dbm.open(file, 'c') as db:
    pass
Storing data with dbm
import dbm
with dbm.open('mydb', 'c') as db:
    db['name'] = 'tony'
    print(db['name'].decode())
    age = int(db.get('age', b'18').decode())
    print(age)
    if 'age' in db:
        db['age'] = '20'
    del db['name']
tony
18
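The open flags can be tried out side by side. The sketch below writes a database with 'n' and then reopens it with 'r'. It assumes a temporary directory and a database name of 'mydb' purely for illustration. Values come back as bytes, so they are decoded before use.

```python
import dbm
import os
import tempfile

with tempfile.TemporaryDirectory() as tmpdir:
    path = os.path.join(tmpdir, 'mydb')  # hypothetical database name

    # 'n' always starts from a new, empty database.
    with dbm.open(path, 'n') as db:
        db['name'] = 'tony'  # str keys and values are stored as bytes
        db['age'] = '20'

    # 'r' reopens the existing database read-only.
    with dbm.open(path, 'r') as db:
        name = db['name'].decode()
        age = int(db['age'].decode())
        has_email = 'email' in db
        default_city = db.get('city', b'unknown').decode()

print(name, age, has_email, default_city)
```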
GUI programming with wxPython
Python GUI toolkits
Installing wxPython
wxPython basics
wxPython class hierarchy
A first wxPython program
import wx
app = wx.App()
frm = wx.Frame(None, title="第一个GUI程序!", size=(400, 300), pos=(100, 100))
frm.Show()
app.MainLoop()
0
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title="第一个GUI程序!", size=(400, 300), pos=(100, 100))
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
    def OnExit(self):
        print('应用程序退出')
        return 0
if __name__ == '__main__':
    app = App()
    app.MainLoop()
应用程序退出
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title="第一个GUI程序!", size=(400, 300))
        self.Centre()
        panel = wx.Panel(parent=self)
        statictext = wx.StaticText(parent=panel, label='Hello World!', pos=(10, 10))
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
wxPython UI construction hierarchy
Event handling
One-to-one event handling
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='一对一事件处理', size=(300, 180))
        self.Centre()
        panel = wx.Panel(parent=self)
        self.statictext = wx.StaticText(parent=panel, pos=(110, 20))
        b = wx.Button(parent=panel, label='OK', pos=(100, 50))
        # Bind one event source (button b) to one handler
        self.Bind(wx.EVT_BUTTON, self.on_click, b)
    def on_click(self, event):
        print(type(event))
        self.statictext.SetLabelText('Hello, world.')
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
<class 'wx._core.CommandEvent'>
<class 'wx._core.CommandEvent'>
<class 'wx._core.CommandEvent'>
One-to-many event handling
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='一对多事件处理', size=(300, 180))
        self.Centre()
        panel = wx.Panel(parent=self)
        self.statictext = wx.StaticText(parent=panel, pos=(110, 15))
        b1 = wx.Button(parent=panel, id=10, label='Button1', pos=(100, 45))
        b2 = wx.Button(parent=panel, id=11, label='Button2', pos=(100, 85))
        # Bind every control whose ID lies in the range 10..11 to one handler
        self.Bind(wx.EVT_BUTTON, self.on_click, id=10, id2=11)
    def on_click(self, event):
        event_id = event.GetId()
        print(event_id)
        if event_id == 10:
            self.statictext.SetLabelText('Button1单击')
        else:
            self.statictext.SetLabelText('Button2单击')
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
10
11
10
11
10
11
10
11
Example: mouse event handling
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title="鼠标事件处理", size=(400, 300))
        self.Centre()
        self.Bind(wx.EVT_LEFT_DOWN, self.on_left_down)
        self.Bind(wx.EVT_LEFT_UP, self.on_left_up)
        self.Bind(wx.EVT_MOTION, self.on_mouse_move)
    def on_left_down(self, evt):
        print('鼠标按下')
    def on_left_up(self, evt):
        print('鼠标释放')
    def on_mouse_move(self, event):
        # Print the position only while dragging with the left button held down
        if event.Dragging() and event.LeftIsDown():
            pos = event.GetPosition()
            print(pos)
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
鼠标按下
鼠标释放
鼠标按下
鼠标释放
鼠标按下
鼠标释放
鼠标按下
(129, 99)
鼠标释放
鼠标按下
(58, 114)
(60, 115)
(61, 116)
...
(79, 119)
鼠标释放
Layout management
Box sizer
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Box布局', size=(300, 120))
        self.Centre()
        panel = wx.Panel(parent=self)
        vbox = wx.BoxSizer(wx.VERTICAL)
        self.statictext = wx.StaticText(parent=panel, label='Button1单击')
        vbox.Add(self.statictext, proportion=2, flag=wx.FIXED_MINSIZE | wx.TOP | wx.CENTER, border=10)
        b1 = wx.Button(parent=panel, id=10, label='Button1')
        b2 = wx.Button(parent=panel, id=11, label='Button2')
        self.Bind(wx.EVT_BUTTON, self.on_click, id=10, id2=11)
        hbox = wx.BoxSizer(wx.HORIZONTAL)
        hbox.Add(b1, 0, wx.EXPAND | wx.BOTTOM, 5)
        hbox.Add(b2, 0, wx.EXPAND | wx.BOTTOM, 5)
        vbox.Add(hbox, proportion=1, flag=wx.CENTER)
        panel.SetSizer(vbox)
    def on_click(self, event):
        event_id = event.GetId()
        print(event_id)
        if event_id == 10:
            self.statictext.SetLabelText('Button1单击')
        else:
            self.statictext.SetLabelText('Button2单击')
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
10
11
10
11
10
11
10
11
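In the Box sizer example the static text is added with proportion=2 and the button row with proportion=1, so the vertical sizer shares its stretchable space between them in a 2:1 ratio. The sketch below is a simplified model of that arithmetic only. The function name share_space and the 90-pixel figure are assumptions for illustration, and the real wxPython layout also takes minimum sizes, borders, and flags into account.

```python
def share_space(total, proportions):
    """Split `total` pixels among items according to their proportions.

    Simplified model: minimum sizes and borders are ignored, and an item
    with proportion 0 receives none of the stretchable space.
    """
    weight = sum(proportions)
    if weight == 0:
        return [0 for _ in proportions]
    return [total * p // weight for p in proportions]


# Hypothetical 90 pixels of stretchable height shared 2:1.
heights = share_space(90, [2, 1])
print(heights)
```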
StaticBox layout
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='StaticBox布局', size=(300, 120))
        self.Centre()
        panel = wx.Panel(parent=self)
        vbox = wx.BoxSizer(wx.VERTICAL)
        self.statictext = wx.StaticText(parent=panel, label='Button1单击')
        vbox.Add(self.statictext, proportion=2, flag=wx.FIXED_MINSIZE | wx.TOP | wx.CENTER, border=10)
        b1 = wx.Button(parent=panel, id=10, label='Button1')
        b2 = wx.Button(parent=panel, id=11, label='Button2')
        self.Bind(wx.EVT_BUTTON, self.on_click, id=10, id2=11)
        # A StaticBoxSizer draws a labelled frame around the controls it manages
        sb = wx.StaticBox(panel, label="按钮框")
        hsbox = wx.StaticBoxSizer(sb, wx.HORIZONTAL)
        hsbox.Add(b1, 0, wx.EXPAND | wx.BOTTOM, 5)
        hsbox.Add(b2, 0, wx.EXPAND | wx.BOTTOM, 5)
        vbox.Add(hsbox, proportion=1, flag=wx.CENTER)
        panel.SetSizer(vbox)
    def on_click(self, event):
        event_id = event.GetId()
        print(event_id)
        if event_id == 10:
            self.statictext.SetLabelText('Button1单击')
        else:
            self.statictext.SetLabelText('Button2单击')
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
11
10
11
Grid layout
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='Grid布局', size=(300, 300))
        self.Centre()
        panel = wx.Panel(self)
        btn1 = wx.Button(panel, label='1')
        btn2 = wx.Button(panel, label='2')
        btn3 = wx.Button(panel, label='3')
        btn4 = wx.Button(panel, label='4')
        btn5 = wx.Button(panel, label='5')
        btn6 = wx.Button(panel, label='6')
        btn7 = wx.Button(panel, label='7')
        btn8 = wx.Button(panel, label='8')
        btn9 = wx.Button(panel, label='9')
        # 3 x 3 grid with no gaps; cells are filled row by row
        grid = wx.GridSizer(cols=3, rows=3, vgap=0, hgap=0)
        grid.Add(btn1, 0, wx.EXPAND)
        grid.Add(btn2, 0, wx.EXPAND)
        grid.Add(btn3, 0, wx.EXPAND)
        grid.Add(btn4, 0, wx.EXPAND)
        grid.Add(btn5, 0, wx.EXPAND)
        grid.Add(btn6, 0, wx.EXPAND)
        grid.Add(btn7, 0, wx.EXPAND)
        grid.Add(btn8, 0, wx.EXPAND)
        grid.Add(btn9, 0, wx.EXPAND)
        panel.SetSizer(grid)
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
FlexGrid layout
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='FlexGrid布局', size=(400, 200))
        self.Centre()
        panel = wx.Panel(parent=self)
        # 3 rows, 2 columns, 10-pixel vertical and horizontal gaps
        fgs = wx.FlexGridSizer(3, 2, 10, 10)
        title = wx.StaticText(panel, label="标题:")
        author = wx.StaticText(panel, label="作者名:")
        review = wx.StaticText(panel, label="内容:")
        tc1 = wx.TextCtrl(panel)
        tc2 = wx.TextCtrl(panel)
        tc3 = wx.TextCtrl(panel, style=wx.TE_MULTILINE)
        fgs.AddMany([title, (tc1, 1, wx.EXPAND),
                     author, (tc2, 1, wx.EXPAND),
                     review, (tc3, 1, wx.EXPAND)])
        fgs.AddGrowableRow(0, 1)
        fgs.AddGrowableRow(1, 1)
        fgs.AddGrowableRow(2, 3)
        fgs.AddGrowableCol(0, 1)
        fgs.AddGrowableCol(1, 2)
        hbox = wx.BoxSizer(wx.HORIZONTAL)
        hbox.Add(fgs, proportion=1, flag=wx.ALL | wx.EXPAND, border=15)
        panel.SetSizer(hbox)
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
wxPython controls
Static text and buttons
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='静态文本和按钮', size=(300, 200))
        self.Centre()
        panel = wx.Panel(parent=self)
        vbox = wx.BoxSizer(wx.VERTICAL)
        self.statictext = wx.StaticText(parent=panel, label='StaticText1', style=wx.ALIGN_CENTRE_HORIZONTAL)
        b1 = wx.Button(parent=panel, label='OK')
        self.Bind(wx.EVT_BUTTON, self.on_click, b1)
        b2 = wx.ToggleButton(panel, -1, 'ToggleButton')
        # A ToggleButton emits EVT_TOGGLEBUTTON, not EVT_BUTTON
        self.Bind(wx.EVT_TOGGLEBUTTON, self.on_click, b2)
        bmp = wx.Bitmap('icon/1.png', wx.BITMAP_TYPE_PNG)
        b3 = wx.BitmapButton(panel, -1, bmp)
        self.Bind(wx.EVT_BUTTON, self.on_click, b3)
        # Add a 100 x 10 spacer above the static text
        vbox.Add(100, 10, proportion=1, flag=wx.CENTER | wx.FIXED_MINSIZE)
        vbox.Add(self.statictext, proportion=1, flag=wx.CENTER | wx.FIXED_MINSIZE)
        vbox.Add(b1, proportion=1, flag=wx.CENTER | wx.EXPAND)
        vbox.Add(b2, proportion=1, flag=wx.CENTER | wx.EXPAND)
        vbox.Add(b3, proportion=1, flag=wx.CENTER | wx.EXPAND)
        panel.SetSizer(vbox)
    def on_click(self, event):
        self.statictext.SetLabelText('Hello, world.')
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
Text input controls
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='文本框', size=(400, 200))
        self.Centre()
        panel = wx.Panel(self)
        hbox = wx.BoxSizer(wx.HORIZONTAL)
        fgs = wx.FlexGridSizer(3, 2, 10, 10)
        userid = wx.StaticText(panel, label="用户ID:")
        pwd = wx.StaticText(panel, label="密码:")
        content = wx.StaticText(panel, label="多行文本:")
        tc1 = wx.TextCtrl(panel)
        tc2 = wx.TextCtrl(panel, style=wx.TE_PASSWORD)
        tc3 = wx.TextCtrl(panel, style=wx.TE_MULTILINE)
        # Set, then read back, the content of the first text control
        tc1.SetValue('tony')
        print('读取用户ID:{0}'.format(tc1.GetValue()))
        fgs.AddMany([userid, (tc1, 1, wx.EXPAND),
                     pwd, (tc2, 1, wx.EXPAND),
                     content, (tc3, 1, wx.EXPAND)])
        fgs.AddGrowableRow(0, 1)
        fgs.AddGrowableRow(1, 1)
        fgs.AddGrowableRow(2, 3)
        fgs.AddGrowableCol(0, 1)
        fgs.AddGrowableCol(1, 2)
        hbox.Add(fgs, proportion=1, flag=wx.ALL | wx.EXPAND, border=15)
        panel.SetSizer(hbox)
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
读取用户ID:tony
Check boxes and radio buttons
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='复选框和单选按钮', size=(400, 130))
        self.Centre()
        panel = wx.Panel(self)
        hbox1 = wx.BoxSizer(wx.HORIZONTAL)
        statictext = wx.StaticText(panel, label='选择你喜欢的编程语言:')
        cb1 = wx.CheckBox(panel, 1, 'Python')
        cb2 = wx.CheckBox(panel, 2, 'Java')
        cb2.SetValue(True)
        cb3 = wx.CheckBox(panel, 3, 'C++')
        self.Bind(wx.EVT_CHECKBOX, self.on_checkbox_click, id=1, id2=3)
        hbox1.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox1.Add(cb1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox1.Add(cb2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox1.Add(cb3, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox2 = wx.BoxSizer(wx.HORIZONTAL)
        statictext = wx.StaticText(panel, label='选择性别:')
        # wx.RB_GROUP marks the first radio button of a new group
        radio1 = wx.RadioButton(panel, 4, '男', style=wx.RB_GROUP)
        radio2 = wx.RadioButton(panel, 5, '女')
        self.Bind(wx.EVT_RADIOBUTTON, self.on_radio1_click, id=4, id2=5)
        hbox2.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox2.Add(radio1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox2.Add(radio2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox3 = wx.BoxSizer(wx.HORIZONTAL)
        statictext = wx.StaticText(panel, label='选择你最喜欢吃的水果:')
        radio3 = wx.RadioButton(panel, 6, '苹果', style=wx.RB_GROUP)
        radio4 = wx.RadioButton(panel, 7, '橘子')
        radio5 = wx.RadioButton(panel, 8, '香蕉')
        self.Bind(wx.EVT_RADIOBUTTON, self.on_radio2_click, id=6, id2=8)
        hbox3.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox3.Add(radio3, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox3.Add(radio4, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox3.Add(radio5, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(hbox1, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox3, 1, flag=wx.ALL | wx.EXPAND, border=5)
        panel.SetSizer(vbox)
    def on_checkbox_click(self, event):
        cb = event.GetEventObject()
        print('选择 {0},状态{1}'.format(cb.GetLabel(), event.IsChecked()))
    def on_radio1_click(self, event):
        rb = event.GetEventObject()
        print('第一组 {0} 被选中'.format(rb.GetLabel()))
    def on_radio2_click(self, event):
        rb = event.GetEventObject()
        print('第二组 {0} 被选中'.format(rb.GetLabel()))
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
第二组 橘子 被选中
第二组 香蕉 被选中
第一组 女 被选中
选择 C++,状态True
选择 Python,状态True
Drop-down lists
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='下拉列表', size=(400, 130))
        self.Centre()
        panel = wx.Panel(self)
        hbox1 = wx.BoxSizer(wx.HORIZONTAL)
        statictext = wx.StaticText(panel, label='选择你喜欢的编程语言:')
        list1 = ['Python', 'C++', 'Java']
        # ComboBox: an editable text field combined with a drop-down list
        ch1 = wx.ComboBox(panel, -1, value='C', choices=list1, style=wx.CB_SORT)
        self.Bind(wx.EVT_COMBOBOX, self.on_combobox, ch1)
        hbox1.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox1.Add(ch1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox2 = wx.BoxSizer(wx.HORIZONTAL)
        statictext = wx.StaticText(panel, label='选择性别:')
        list2 = ['男', '女']
        # Choice: a drop-down list without a text field
        ch2 = wx.Choice(panel, -1, choices=list2)
        self.Bind(wx.EVT_CHOICE, self.on_choice, ch2)
        hbox2.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox2.Add(ch2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(hbox1, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        panel.SetSizer(vbox)
    def on_combobox(self, event):
        print('选择 {0}'.format(event.GetString()))
    def on_choice(self, event):
        print('选择 {0}'.format(event.GetString()))
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
选择 Java
List boxes
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='列表', size=(350, 180))
        self.Centre()
        panel = wx.Panel(self)
        hbox1 = wx.BoxSizer(wx.HORIZONTAL)
        statictext = wx.StaticText(panel, label='选择你喜欢的编程语言:')
        list1 = ['Python', 'C++', 'Java']
        # LB_SINGLE: only one item can be selected at a time
        lb1 = wx.ListBox(panel, -1, choices=list1, style=wx.LB_SINGLE)
        self.Bind(wx.EVT_LISTBOX, self.on_listbox1, lb1)
        hbox1.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox1.Add(lb1, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        hbox2 = wx.BoxSizer(wx.HORIZONTAL)
        statictext = wx.StaticText(panel, label='选择你喜欢吃的水果:')
        list2 = ['苹果', '橘子', '香蕉']
        # LB_EXTENDED: several items can be selected with Shift or Ctrl
        lb2 = wx.ListBox(panel, -1, choices=list2, style=wx.LB_EXTENDED)
        self.Bind(wx.EVT_LISTBOX, self.on_listbox2, lb2)
        hbox2.Add(statictext, 1, flag=wx.LEFT | wx.RIGHT | wx.FIXED_MINSIZE, border=5)
        hbox2.Add(lb2, 1, flag=wx.ALL | wx.FIXED_MINSIZE)
        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(hbox1, 1, flag=wx.ALL | wx.EXPAND, border=5)
        vbox.Add(hbox2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        panel.SetSizer(vbox)
    def on_listbox1(self, event):
        listbox = event.GetEventObject()
        print('选择 {0}'.format(listbox.GetSelection()))
    def on_listbox2(self, event):
        listbox = event.GetEventObject()
        print('选择 {0}'.format(listbox.GetSelections()))
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
选择 1
选择 2
Static bitmap control
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='静态图片控件', size=(300, 300))
        self.bmps = [wx.Bitmap('images/bird5.gif', wx.BITMAP_TYPE_GIF),
                     wx.Bitmap('images/bird4.gif', wx.BITMAP_TYPE_GIF),
                     wx.Bitmap('images/bird3.gif', wx.BITMAP_TYPE_GIF)]
        self.Centre()
        self.panel = wx.Panel(parent=self)
        vbox = wx.BoxSizer(wx.VERTICAL)
        b1 = wx.Button(parent=self.panel, id=1, label='Button1')
        b2 = wx.Button(self.panel, id=2, label='Button2')
        self.Bind(wx.EVT_BUTTON, self.on_click, id=1, id2=2)
        self.image = wx.StaticBitmap(self.panel, -1, self.bmps[0])
        vbox.Add(b1, proportion=1, flag=wx.CENTER | wx.EXPAND)
        vbox.Add(b2, proportion=1, flag=wx.CENTER | wx.EXPAND)
        vbox.Add(self.image, proportion=3, flag=wx.CENTER)
        self.panel.SetSizer(vbox)
    def on_click(self, event):
        event_id = event.GetId()
        if event_id == 1:
            self.image.SetBitmap(self.bmps[1])
        else:
            self.image.SetBitmap(self.bmps[2])
        # Recompute the layout, since the new bitmap may differ in size
        self.panel.Layout()
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
Advanced windows
Splitter window
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='分隔窗口', size=(350, 180))
        self.Centre()
        splitter = wx.SplitterWindow(self, -1)
        leftpanel = wx.Panel(splitter)
        rightpanel = wx.Panel(splitter)
        # Left pane starts 100 pixels wide; neither pane may shrink below 80
        splitter.SplitVertically(leftpanel, rightpanel, 100)
        splitter.SetMinimumPaneSize(80)
        list2 = ['苹果', '橘子', '香蕉']
        lb2 = wx.ListBox(leftpanel, -1, choices=list2, style=wx.LB_SINGLE)
        self.Bind(wx.EVT_LISTBOX, self.on_listbox, lb2)
        vbox1 = wx.BoxSizer(wx.VERTICAL)
        vbox1.Add(lb2, 1, flag=wx.ALL | wx.EXPAND, border=5)
        leftpanel.SetSizer(vbox1)
        vbox2 = wx.BoxSizer(wx.VERTICAL)
        self.content = wx.StaticText(rightpanel, label='右侧面板')
        vbox2.Add(self.content, 1, flag=wx.ALL | wx.EXPAND, border=5)
        rightpanel.SetSizer(vbox2)
    def on_listbox(self, event):
        s = '选择 {0}'.format(event.GetString())
        self.content.SetLabel(s)
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
Using a tree control
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='树控件', size=(500, 400))
        self.Centre()
        splitter = wx.SplitterWindow(self)
        leftpanel = wx.Panel(splitter)
        rightpanel = wx.Panel(splitter)
        splitter.SplitVertically(leftpanel, rightpanel, 200)
        splitter.SetMinimumPaneSize(80)
        self.tree = self.CreateTreeCtrl(leftpanel)
        self.Bind(wx.EVT_TREE_SEL_CHANGING, self.on_click, self.tree)
        vbox1 = wx.BoxSizer(wx.VERTICAL)
        vbox1.Add(self.tree, 1, flag=wx.ALL | wx.EXPAND, border=5)
        leftpanel.SetSizer(vbox1)
        vbox2 = wx.BoxSizer(wx.VERTICAL)
        self.content = wx.StaticText(rightpanel, label='右侧面板')
        vbox2.Add(self.content, 1, flag=wx.ALL | wx.EXPAND, border=5)
        rightpanel.SetSizer(vbox2)
    def on_click(self, event):
        item = event.GetItem()
        self.content.SetLabel(self.tree.GetItemText(item))
    def CreateTreeCtrl(self, parent):
        tree = wx.TreeCtrl(parent)
        items = []
        # Image 0 is a folder icon, image 1 is a file icon
        imglist = wx.ImageList(16, 16, True, 2)
        imglist.Add(wx.ArtProvider.GetBitmap(wx.ART_FOLDER, size=wx.Size(16, 16)))
        imglist.Add(wx.ArtProvider.GetBitmap(wx.ART_NORMAL_FILE, size=wx.Size(16, 16)))
        tree.AssignImageList(imglist)
        root = tree.AddRoot("TreeRoot", image=0)
        items.append(tree.AppendItem(root, "Item 1", 0))
        items.append(tree.AppendItem(root, "Item 2", 0))
        items.append(tree.AppendItem(root, "Item 3", 0))
        items.append(tree.AppendItem(root, "Item 4", 0))
        items.append(tree.AppendItem(root, "Item 5", 0))
        # Give every top-level item five children
        for item in items:
            tree.AppendItem(item, "Subitem 1", 1)
            tree.AppendItem(item, "Subitem 2", 1)
            tree.AppendItem(item, "Subitem 3", 1)
            tree.AppendItem(item, "Subitem 4", 1)
            tree.AppendItem(item, "Subitem 5", 1)
        tree.Expand(root)
        tree.Expand(items[0])
        tree.Expand(items[3])
        tree.SelectItem(root)
        return tree
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
Using a grid
import wx
import wx.grid
data = [['0036', '高等数学', '李放', '人民邮电出版社', '20000812', '1'],
['0004', 'FLASH精选', '刘扬', '中国纺织出版社', '19990312', '2'],
['0026', '软件工程', '牛田', '经济科学出版社', '20000328', '4'],
['0015', '人工智能', '周未', '机械工业出版社', '19991223', '3'],
['0037', '南方周末', '邓光明', '南方出版社', '20000923', '3'],
['0008', '新概念3', '余智', '外语出版社', '19990723', '2'],
['0019', '通讯与网络', '欧阳杰', '机械工业出版社', '20000517', '1'],
['0014', '期货分析', '孙宝', '飞鸟出版社', '19991122', '3'],
['0023', '经济概论', '思佳', '北京大学出版社', '20000819', '3'],
['0017', '计算机理论基础', '戴家', '机械工业出版社', '20000218', '4'],
['0002', '汇编语言', '李利光', '北京大学出版社', '19980318', '2'],
['0033', '模拟电路', '邓英才', '电子工业出版社', '20000527', '2'],
['0011', '南方旅游', '王爱国', '南方出版社', '19990930', '2'],
['0039', '黑幕', '李仪', '华光出版社', '20000508', '14'],
['0001', '软件工程', '戴国强', '机械工业出版社', '19980528', '2'],
['0034', '集邮爱好者', '李云', '人民邮电出版社', '20000630', '1'],
['0031', '软件工程', '戴志名', '电子工业出版社', '20000324', '3'],
['0030', '数据库及应用', '孙家萧', '清华大学出版社', '20000619', '1'],
['0024', '经济与科学', '毛波', '经济科学出版社', '20000923', '2'],
['0009', '军事要闻', '张强', '解放军出版社', '19990722', '3'],
['0003', '计算机基础', '王飞', '经济科学出版社', '19980218', '1'],
['0020', '现代操作系统', '王小国', '机械工业出版社', '20010128', '1'],
['0025', '计算机体系结构', '方丹', '机械工业出版社', '20000328', '4'],
['0010', '大众生活', '许阳', '电子出版社', '19990819', '3'],
['0021', '网络基础', '王大尉', '北京大学出版社', '20000617', '1'],
['0006', '世界杯', '柳飞', '世界出版社', '19990412', '2'],
['0028', '高级语言程序设计', '寇国华', '清华大学出版社', '20000117', '3'],
['0038', '十大旅游胜地', '潭晓明', '南方出版社', '20000403', '2'],
['0018', '编译原理', '郑键', '机械工业出版社', '20000415', '2'],
['0007', 'JAVA程序设计', '张余', '人民邮电出版社', '19990613', '1'],
['0013', '幽灵', '钱力华', '华光出版社', '19991008', '1'],
['0022', '万紫千红', '丛丽', '北京大学出版社', '20000702', '3'],
['0027', '世界语言大观', '候丙辉', '经济科学出版社', '20000814', '2'],
['0029', '操作系统概论', '聂元名', '清华大学出版社', '20001028', '1'],
['0016', '数据库系统概念', '吴红', '机械工业出版社', '20000328', '3'],
['0005', 'java基础', '王一', '电子工业出版社', '19990528', '3'],
['0032', 'SQL使用手册', '贺民', '电子工业出版社', '19990425', '2']]
column_names = ['书籍编号', '书籍名称', '作者', '出版社', '出版日期', '库存数量']
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='网格控件', size=(550, 500))
        self.Centre()
        self.grid = self.CreateGrid(self)
        self.Bind(wx.grid.EVT_GRID_LABEL_LEFT_CLICK, self.OnLabelLeftClick)
    def OnLabelLeftClick(self, event):
        print("RowIdx:{0}".format(event.GetRow()))
        print("ColIdx:{0}".format(event.GetCol()))
        print(data[event.GetRow()])
        event.Skip()
    def CreateGrid(self, parent):
        grid = wx.grid.Grid(parent)
        grid.CreateGrid(len(data), len(data[0]))
        for row in range(len(data)):
            for col in range(len(data[row])):
                grid.SetColLabelValue(col, column_names[col])
                grid.SetCellValue(row, col, data[row][col])
        grid.AutoSize()
        return grid
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
import wx
import wx.grid
data = [['0036', '高等数学', '李放', '人民邮电出版社', '20000812', '1'],
['0004', 'FLASH精选', '刘扬', '中国纺织出版社', '19990312', '2'],
['0026', '软件工程', '牛田', '经济科学出版社', '20000328', '4'],
['0015', '人工智能', '周未', '机械工业出版社', '19991223', '3'],
['0037', '南方周末', '邓光明', '南方出版社', '20000923', '3'],
['0008', '新概念3', '余智', '外语出版社', '19990723', '2'],
['0019', '通讯与网络', '欧阳杰', '机械工业出版社', '20000517', '1'],
['0014', '期货分析', '孙宝', '飞鸟出版社', '19991122', '3'],
['0023', '经济概论', '思佳', '北京大学出版社', '20000819', '3'],
['0017', '计算机理论基础', '戴家', '机械工业出版社', '20000218', '4'],
['0002', '汇编语言', '李利光', '北京大学出版社', '19980318', '2'],
['0033', '模拟电路', '邓英才', '电子工业出版社', '20000527', '2'],
['0011', '南方旅游', '王爱国', '南方出版社', '19990930', '2'],
['0039', '黑幕', '李仪', '华光出版社', '20000508', '14'],
['0001', '软件工程', '戴国强', '机械工业出版社', '19980528', '2'],
['0034', '集邮爱好者', '李云', '人民邮电出版社', '20000630', '1'],
['0031', '软件工程', '戴志名', '电子工业出版社', '20000324', '3'],
['0030', '数据库及应用', '孙家萧', '清华大学出版社', '20000619', '1'],
['0024', '经济与科学', '毛波', '经济科学出版社', '20000923', '2'],
['0009', '军事要闻', '张强', '解放军出版社', '19990722', '3'],
['0003', '计算机基础', '王飞', '经济科学出版社', '19980218', '1'],
['0020', '现代操作系统', '王小国', '机械工业出版社', '20010128', '1'],
['0025', '计算机体系结构', '方丹', '机械工业出版社', '20000328', '4'],
['0010', '大众生活', '许阳', '电子出版社', '19990819', '3'],
['0021', '网络基础', '王大尉', '北京大学出版社', '20000617', '1'],
['0006', '世界杯', '柳飞', '世界出版社', '19990412', '2'],
['0028', '高级语言程序设计', '寇国华', '清华大学出版社', '20000117', '3'],
['0038', '十大旅游胜地', '潭晓明', '南方出版社', '20000403', '2'],
['0018', '编译原理', '郑键', '机械工业出版社', '20000415', '2'],
['0007', 'JAVA程序设计', '张余', '人民邮电出版社', '19990613', '1'],
['0013', '幽灵', '钱力华', '华光出版社', '19991008', '1'],
['0022', '万紫千红', '丛丽', '北京大学出版社', '20000702', '3'],
['0027', '世界语言大观', '候丙辉', '经济科学出版社', '20000814', '2'],
['0029', '操作系统概论', '聂元名', '清华大学出版社', '20001028', '1'],
['0016', '数据库系统概念', '吴红', '机械工业出版社', '20000328', '3'],
['0005', 'java基础', '王一', '电子工业出版社', '19990528', '3'],
['0032', 'SQL使用手册', '贺民', '电子工业出版社', '19990425', '2']]
column_names = ['书籍编号', '书籍名称', '作者', '出版社', '出版日期', '库存数量']
class MyGridTable(wx.grid.GridTableBase):
    # Table model: the grid asks this object for its data on demand
    def __init__(self):
        super().__init__()
        self.colLabels = column_names
    def GetNumberRows(self):
        return len(data)
    def GetNumberCols(self):
        return len(data[0])
    def GetValue(self, row, col):
        return data[row][col]
    def GetColLabelValue(self, col):
        return self.colLabels[col]
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='网格控件', size=(550, 500))
        self.Centre()
        self.grid = self.CreateGrid(self)
        self.Bind(wx.grid.EVT_GRID_LABEL_LEFT_CLICK, self.OnLabelLeftClick)
    def OnLabelLeftClick(self, event):
        print("RowIdx:{0}".format(event.GetRow()))
        print("ColIdx:{0}".format(event.GetCol()))
        print(data[event.GetRow()])
        event.Skip()
    def CreateGrid(self, parent):
        grid = wx.grid.Grid(parent)
        tablebase = MyGridTable()
        grid.SetTable(tablebase, True)
        grid.AutoSize()
        return grid
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
Using menus
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='使用菜单', size=(550, 500))
        self.Centre()
        # wx.EXPAND is a sizer flag, not a window style, so it is passed to vbox.Add only
        self.text = wx.TextCtrl(self, -1, style=wx.TE_MULTILINE)
        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(self.text, proportion=1, flag=wx.EXPAND | wx.ALL, border=1)
        self.SetSizer(vbox)
        menubar = wx.MenuBar()
        file_menu = wx.Menu()
        new_item = wx.MenuItem(file_menu, wx.ID_NEW, text="新建", kind=wx.ITEM_NORMAL)
        self.Bind(wx.EVT_MENU, self.on_newitem_click, id=wx.ID_NEW)
        file_menu.Append(new_item)
        file_menu.AppendSeparator()
        edit_menu = wx.Menu()
        copy_item = wx.MenuItem(edit_menu, 100, text="复制", kind=wx.ITEM_NORMAL)
        edit_menu.Append(copy_item)
        cut_item = wx.MenuItem(edit_menu, 101, text="剪切", kind=wx.ITEM_NORMAL)
        edit_menu.Append(cut_item)
        paste_item = wx.MenuItem(edit_menu, 102, text="粘贴", kind=wx.ITEM_NORMAL)
        edit_menu.Append(paste_item)
        self.Bind(wx.EVT_MENU, self.on_editmenu_click, id=100, id2=102)
        # AppendSubMenu replaces the deprecated Append(id, text, submenu) form
        file_menu.AppendSubMenu(edit_menu, "编辑")
        menubar.Append(file_menu, '文件')
        self.SetMenuBar(menubar)
    def on_newitem_click(self, event):
        self.text.SetValue('单击【新建】菜单')
    def on_editmenu_click(self, event):
        event_id = event.GetId()
        if event_id == 100:
            self.text.SetValue('单击【复制】菜单')
        elif event_id == 101:
            self.text.SetValue('单击【剪切】菜单')
        else:
            self.text.SetValue('单击【粘贴】菜单')
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
Using a toolbar
import wx
class MyFrame(wx.Frame):
    def __init__(self):
        super().__init__(parent=None, title='使用工具栏', size=(550, 500))
        self.Centre()
        self.Show(True)
        # wx.EXPAND is a sizer flag, not a window style, so it is passed to vbox.Add only
        self.text = wx.TextCtrl(self, -1, style=wx.TE_MULTILINE)
        vbox = wx.BoxSizer(wx.VERTICAL)
        vbox.Add(self.text, proportion=1, flag=wx.EXPAND | wx.ALL, border=1)
        self.SetSizer(vbox)
        menubar = wx.MenuBar()
        file_menu = wx.Menu()
        new_item = wx.MenuItem(file_menu, wx.ID_NEW, text="新建", kind=wx.ITEM_NORMAL)
        file_menu.Append(new_item)
        file_menu.AppendSeparator()
        edit_menu = wx.Menu()
        copy_item = wx.MenuItem(edit_menu, 100, text="复制", kind=wx.ITEM_NORMAL)
        edit_menu.Append(copy_item)
        cut_item = wx.MenuItem(edit_menu, 101, text="剪切", kind=wx.ITEM_NORMAL)
        edit_menu.Append(cut_item)
        paste_item = wx.MenuItem(edit_menu, 102, text="粘贴", kind=wx.ITEM_NORMAL)
        edit_menu.Append(paste_item)
        # AppendSubMenu replaces the deprecated Append(id, text, submenu) form
        file_menu.AppendSubMenu(edit_menu, "编辑")
        menubar.Append(file_menu, '文件')
        self.SetMenuBar(menubar)
        tb = wx.ToolBar(self, wx.ID_ANY)
        self.ToolBar = tb
        tsize = (24, 24)
        new_bmp = wx.ArtProvider.GetBitmap(wx.ART_NEW, wx.ART_TOOLBAR, tsize)
        open_bmp = wx.ArtProvider.GetBitmap(wx.ART_FILE_OPEN, wx.ART_TOOLBAR, tsize)
        copy_bmp = wx.ArtProvider.GetBitmap(wx.ART_COPY, wx.ART_TOOLBAR, tsize)
        paste_bmp = wx.ArtProvider.GetBitmap(wx.ART_PASTE, wx.ART_TOOLBAR, tsize)
        tb.AddTool(10, "New", new_bmp, kind=wx.ITEM_NORMAL, shortHelp="New")
        tb.AddTool(20, "Open", open_bmp, kind=wx.ITEM_NORMAL, shortHelp="Open")
        tb.AddSeparator()
        tb.AddTool(30, "Copy", copy_bmp, kind=wx.ITEM_NORMAL, shortHelp="Copy")
        tb.AddTool(40, "Paste", paste_bmp, kind=wx.ITEM_NORMAL, shortHelp="Paste")
        tb.AddSeparator()
        tb.AddTool(201, "back", wx.Bitmap("menu_icon/back.png"), kind=wx.ITEM_NORMAL, shortHelp="Back")
        tb.AddTool(202, "forward", wx.Bitmap("menu_icon/forward.png"), kind=wx.ITEM_NORMAL, shortHelp="Forward")
        # Toolbar clicks are delivered as menu events
        self.Bind(wx.EVT_MENU, self.on_click, id=201, id2=202)
        tb.AddSeparator()
        tb.Realize()
    def on_click(self, event):
        event_id = event.GetId()
        if event_id == 201:
            self.text.SetValue('单击【Back】按钮')
        else:
            self.text.SetValue('单击【Forward】按钮')
class App(wx.App):
    def OnInit(self):
        frame = MyFrame()
        frame.Show()
        return True
if __name__ == '__main__':
    app = App()
    app.MainLoop()
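Project 1: Web crawling and scraping stock data
Overview of web crawler technologies
Network communication
Multithreading
Data interchange
Web front-end technologies
Data storage
Scraping data
Static and dynamic data in web pages
Scraping data with urllib
- Fetching static data
import urllib.request
url = "file:///C:/Users/HP/nasdaq-Apple1.html"
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as response:
    data = response.read()
    # decode() assumes the page is UTF-8 encoded
    htmlstr = data.decode()
    print(htmlstr)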
<!doctype html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="Generator" content="EditPlus®">
<meta name="Author" content="">
<meta name="Keywords" content="">
<meta name="Description" content="">
<title>Document</title>
</head>
<body>
<div id="quotes_content_left_pnlAJAX">
<table class="historical-data__table">
<thead class="historical-data__table-headings">
<tr class="historical-data__row historical-data__row--headings">
<th class="historical-data__table-heading" scope="col">Date</th>
<th class="historical-data__table-heading" scope="col">Open</th>
<th class="historical-data__table-heading" scope="col">High</th>
<th class="historical-data__table-heading" scope="col">Low</th>
<th class="historical-data__table-heading" scope="col">Close/Last</th>
<th class="historical-data__table-heading" scope="col">Volume</th>
</tr>
</thead>
<tbody class="historical-data__table-body">
<tr class="historical-data__row">
<th>10/04/2019</th>
<td>225.64</td>
<td>227.49</td>
<td>223.89</td>
<td>227.01</td>
<td>34,755,550</td>
</tr>
<tr class="historical-data__row">
<th>10/03/2019</th>
<td>218.43</td>
<td>220.96</td>
<td>215.132</td>
<td>220.82</td>
<td>30,352,690</td>
</tr>
<tr class="historical-data__row">
<th>10/02/2019</th>
<td>223.06</td>
<td>223.58</td>
<td>217.93</td>
<td>218.96</td>
<td>35,767,260</td>
</tr>
<tr class="historical-data__row">
<th>10/01/2019</th>
<td>225.07</td>
<td>228.22</td>
<td>224.2</td>
<td>224.59</td>
<td>36,187,160</td>
</tr>
<tr class="historical-data__row">
<th>09/30/2019</th>
<td>220.9</td>
<td>224.58</td>
<td>220.79</td>
<td>223.97</td>
<td>26,318,580</td>
</tr>
<tr class="historical-data__row">
<th>09/27/2019</th>
<td>220.54</td>
<td>220.96</td>
<td>217.2814</td>
<td>218.82</td>
<td>25,361,290</td>
</tr>
<tr class="historical-data__row">
<th>09/26/2019</th>
<td>220</td>
<td>220.94</td>
<td>218.83</td>
<td>219.89</td>
<td>19,088,310</td>
</tr>
<tr class="historical-data__row">
<th>09/25/2019</th>
<td>218.55</td>
<td>221.5</td>
<td>217.1402</td>
<td>221.03</td>
<td>22,481,010</td>
</tr>
<tr class="historical-data__row">
<th>09/24/2019</th>
<td>221.03</td>
<td>222.49</td>
<td>217.19</td>
<td>217.68</td>
<td>31,434,370</td>
</tr>
<tr class="historical-data__row">
<th>09/23/2019</th>
<td>218.95</td>
<td>219.84</td>
<td>217.65</td>
<td>218.72</td>
<td>19,419,650</td>
</tr>
<tr class="historical-data__row">
<th>09/20/2019</th>
<td>221.38</td>
<td>222.56</td>
<td>217.473</td>
<td>217.73</td>
<td>57,977,090</td>
</tr>
<tr class="historical-data__row">
<th>09/19/2019</th>
<td>222.01</td>
<td>223.76</td>
<td>220.37</td>
<td>220.96</td>
<td>22,187,880</td>
</tr>
<tr class="historical-data__row">
<th>09/18/2019</th>
<td>221.06</td>
<td>222.85</td>
<td>219.44</td>
<td>222.77</td>
<td>25,643,090</td>
</tr>
<tr class="historical-data__row">
<th>09/17/2019</th>
<td>219.96</td>
<td>220.82</td>
<td>219.12</td>
<td>220.7</td>
<td>18,386,470</td>
</tr>
<tr class="historical-data__row">
<th>09/16/2019</th>
<td>217.73</td>
<td>220.13</td>
<td>217.56</td>
<td>219.9</td>
<td>21,158,140</td>
</tr>
<tr class="historical-data__row">
<th>09/13/2019</th>
<td>220</td>
<td>220.79</td>
<td>217.02</td>
<td>218.75</td>
<td>39,763,300</td>
</tr>
<tr class="historical-data__row">
<th>09/12/2019</th>
<td>224.8</td>
<td>226.42</td>
<td>222.86</td>
<td>223.085</td>
<td>32,226,670</td>
</tr>
<tr class="historical-data__row">
<th>09/11/2019</th>
<td>218.07</td>
<td>223.71</td>
<td>217.73</td>
<td>223.59</td>
<td>44,289,650</td>
</tr>
</tbody>
</table>
</div>
</body>
</html>
- Fetching dynamic data
import urllib.request

url = 'http://q.stock.sohu.com/hisHq?code=cn_600519&stat=1&order=D&period=d&callback=historySearchHandler&rt=jsonp&0.8115656498417958'
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode('gbk')
    print(htmlstr)
    # Strip the JSONP wrapper historySearchHandler(...) so that plain JSON remains
    htmlstr = htmlstr.replace('historySearchHandler(', '')
    htmlstr = htmlstr.replace(')', '')
    print('替换后的:', htmlstr)  # the label '替换后的' means 'after replacement'
historySearchHandler([{"status":0,"hq":[["2023-04-18","1753.00","1758.00","5.00","0.29%","1746.02","1769.00","18314","322010.75","0.15%"],["2023-04-17","1740.00","1753.00","39.58","2.31%","1728.00","1753.00","30467","530340.12","0.24%"],["2023-04-14","1726.00","1713.42","-9.58","-0.56%","1704.80","1733.00","21232","364652.69","0.17%"],["2023-04-13","1690.00","1723.00","28.90","1.71%","1684.01","1723.59","29543","504931.03","0.24%"],["2023-04-12","1747.26","1694.10","-51.40","-2.94%","1692.82","1750.00","51105","873265.75","0.41%"],["2023-04-11","1793.00","1745.50","-26.20","-1.48%","1744.00","1793.00","29209","513885.44","0.23%"],["2023-04-10","1790.88","1771.70","-19.29","-1.08%","1744.00","1790.88","29418","517115.03","0.23%"],["2023-04-07","1795.00","1790.99","-5.97","-0.33%","1788.34","1806.01","13525","242816.05","0.11%"],["2023-04-06","1805.00","1796.96","-17.63","-0.97%","1788.22","1815.90","14874","267625.19","0.12%"],["2023-04-04","1812.00","1814.59","12.52","0.69%","1787.00","1815.17","20066","361427.53","0.16%"],["2023-04-03","1825.00","1802.07","-17.93","-0.99%","1800.08","1827.77","21417","387581.16","0.17%"],["2023-03-31","1825.00","1820.00","20.00","1.11%","1819.00","1848.00","27446","502479.06","0.22%"],["2023-03-30","1793.00","1800.00","10.00","0.56%","1779.00","1805.00","19257","345357.31","0.15%"],["2023-03-29","1799.00","1790.00","8.20","0.46%","1785.07","1800.00","15393","276190.94","0.12%"],["2023-03-28","1770.00","1781.80","14.01","0.79%","1765.02","1790.00","17261","307311.31","0.14%"],["2023-03-27","1778.60","1767.79","-10.83","-0.61%","1756.00","1778.60","15296","270075.59","0.12%"],["2023-03-24","1769.08","1778.62","3.76","0.21%","1766.00","1783.60","12770","226964.92","0.10%"],["2023-03-23","1766.00","1774.86","1.51","0.09%","1765.01","1791.11","17356","308282.16","0.14%"],["2023-03-22","1780.00","1773.35","-1.65","-0.09%","1765.55","1793.00","15330","272764.88","0.12%"],["2023-03-21","1735.00","1775.00","45.40","2.62%","1723.97","1785.85
","31142","549105.19","0.25%"],["2023-03-20","1751.00","1729.60","-12.40","-0.71%","1728.00","1755.00","20491","355787.22","0.16%"],["2023-03-17","1770.00","1742.00","-9.99","-0.57%","1736.00","1775.89","27023","474424.94","0.22%"],["2023-03-16","1740.00","1751.99","1.07","0.06%","1739.01","1770.00","17646","309679.09","0.14%"],["2023-03-15","1778.37","1750.92","-15.08","-0.85%","1750.12","1784.88","19213","339269.84","0.15%"],["2023-03-14","1763.78","1766.00","4.00","0.23%","1738.50","1779.88","23705","417728.91","0.19%"],["2023-03-13","1751.00","1762.00","12.00","0.69%","1749.00","1775.00","20560","362647.62","0.16%"],["2023-03-10","1751.57","1750.00","-20.02","-1.13%","1750.00","1781.00","21161","372513.91","0.17%"],["2023-03-09","1768.00","1770.02","-0.40","-0.02%","1740.00","1785.00","27612","488144.28","0.22%"],["2023-03-08","1780.02","1770.42","-17.88","-1.00%","1761.12","1785.94","22764","403578.72","0.18%"],["2023-03-07","1805.98","1788.30","-18.84","-1.04%","1788.00","1816.60","22785","410130.25","0.18%"],["2023-03-06","1818.18","1807.14","-10.90","-0.60%","1796.77","1818.50","20646","373007.94","0.16%"],["2023-03-03","1839.77","1818.04","-9.96","-0.54%","1802.48","1841.61","16198","294684.25","0.13%"],["2023-03-02","1829.00","1828.00","-10.53","-0.57%","1821.10","1838.99","13144","240529.23","0.10%"],["2023-03-01","1813.00","1838.53","24.79","1.37%","1803.23","1848.00","24458","447559.22","0.19%"],["2023-02-28","1819.00","1813.74","3.33","0.18%","1783.30","1822.01","23952","431487.69","0.19%"],["2023-02-27","1778.50","1810.41","22.41","1.25%","1775.02","1815.00","22065","397812.88","0.18%"],["2023-02-24","1810.11","1788.00","-30.00","-1.65%","1782.18","1810.19","24635","441562.16","0.20%"],["2023-02-23","1840.00","1818.00","-18.00","-0.98%","1805.25","1848.80","21881","398399.12","0.17%"],["2023-02-22","1855.01","1836.00","-31.00","-1.66%","1831.80","1863.90","21869","403101.59","0.17%"],["2023-02-21","1874.00","1867.00","-8.00","-0.43%","1851.00","1874.0
0","18751","349163.34","0.15%"],["2023-02-20","1821.00","1875.00","54.22","2.98%","1817.20","1878.80","29669","548880.00","0.24%"],["2023-02-17","1850.16","1820.78","-41.04","-2.20%","1820.05","1873.00","26443","488032.88","0.21%"],["2023-02-16","1841.34","1861.82","20.82","1.13%","1828.00","1887.00","33246","619691.50","0.26%"],["2023-02-15","1843.78","1841.00","-2.79","-0.15%","1835.81","1855.30","18177","335142.22","0.14%"],["2023-02-14","1856.46","1843.79","-12.56","-0.68%","1835.00","1857.40","19566","360176.94","0.16%"],["2023-02-13","1810.00","1856.35","46.35","2.56%","1810.00","1874.50","38147","705838.25","0.30%"],["2023-02-10","1810.10","1810.00","-8.00","-0.44%","1801.05","1818.49","17985","325385.94","0.14%"],["2023-02-09","1778.00","1818.00","34.00","1.91%","1775.01","1829.75","29754","540139.94","0.24%"],["2023-02-08","1800.01","1784.00","-13.00","-0.72%","1775.00","1805.97","16676","298057.47","0.13%"],["2023-02-07","1808.08","1797.00","2.00","0.11%","1787.73","1808.80","24322","437367.19","0.19%"],["2023-02-06","1780.00","1795.00","-23.00","-1.27%","1760.00","1795.00","42661","759573.94","0.34%"],["2023-02-03","1820.00","1818.00","-18.11","-0.99%","1795.68","1826.00","34945","632463.50","0.28%"],["2023-02-02","1848.38","1836.11","-8.86","-0.48%","1826.00","1859.00","29759","546550.94","0.24%"],["2023-02-01","1854.98","1844.97","-0.79","-0.04%","1811.40","1859.00","33974","624467.94","0.27%"],["2023-01-31","1896.50","1845.76","-42.24","-2.24%","1833.07","1899.95","32991","612831.12","0.26%"],["2023-01-30","1909.00","1888.00","27.99","1.50%","1880.00","1909.00","35923","679975.69","0.29%"],["2023-01-20","1889.19","1860.01","-20.20","-1.07%","1858.00","1898.25","25609","480735.59","0.20%"],["2023-01-19","1892.50","1880.21","-12.79","-0.68%","1866.00","1892.52","23439","440199.44","0.19%"],["2023-01-18","1914.00","1893.00","-15.00","-0.79%","1890.00","1925.30","21063","400866.53","0.17%"],["2023-01-17","1913.16","1908.00","-4.90","-0.26%","1895.00","1923
.00","21299","406832.16","0.17%"],["2023-01-16","1886.00","1912.90","25.90","1.37%","1881.00","1935.00","36848","705998.31","0.29%"],["2023-01-13","1844.18","1887.00","53.00","2.89%","1840.00","1888.00","31940","596987.62","0.25%"],["2023-01-12","1848.00","1834.00","-10.95","-0.59%","1833.00","1856.00","17193","316263.72","0.14%"],["2023-01-11","1856.00","1844.95","-9.50","-0.51%","1836.84","1860.00","22720","420148.78","0.18%"],["2023-01-10","1839.06","1854.45","13.25","0.72%","1830.50","1864.50","22732","420478.38","0.18%"],["2023-01-09","1835.00","1841.20","37.43","2.08%","1807.82","1849.98","30977","568418.12","0.25%"],["2023-01-06","1806.12","1803.77","2.77","0.15%","1787.00","1811.90","24904","448083.88","0.20%"],["2023-01-05","1737.00","1801.00","75.99","4.41%","1733.00","1801.00","47943","854158.69","0.38%"],["2023-01-04","1730.00","1725.01","-5.00","-0.29%","1716.00","1738.70","20416","352358.22","0.16%"],["2023-01-03","1731.20","1730.01","3.01","0.17%","1706.01","1738.43","26034","448776.03","0.21%"],["2022-12-30","1736.00","1727.00","8.00","0.47%","1727.00","1752.99","25333","440954.41","0.20%"],["2022-12-29","1717.00","1719.00","-14.00","-0.81%","1701.05","1726.99","22418","384449.97","0.18%"],["2022-12-28","1745.88","1733.00","0.00","0.00%","1708.01","1747.00","21438","369994.91","0.17%"],["2022-12-27","1738.00","1733.00","12.85","0.75%","1725.50","1747.15","17905","310927.03","0.14%"],["2022-12-26","1771.00","1742.06","-28.94","-1.63%","1735.02","1771.00","21384","374912.09","0.17%"],["2022-12-23","1752.40","1771.00","3.00","0.17%","1745.00","1782.00","17319","306360.84","0.14%"],["2022-12-22","1756.70","1768.00","29.00","1.67%","1745.00","1783.00","23175","409386.16","0.18%"],["2022-12-21","1724.00","1739.00","24.00","1.40%","1717.65","1739.00","22816","394892.62","0.18%"],["2022-12-20","1765.33","1715.00","-58.00","-3.27%","1682.45","1765.33","46198","794412.06","0.37%"],["2022-12-19","1798.80","1773.00","-13.87","-0.78%","1760.17","1798.80","24987",
"444723.66","0.20%"]],"code":"cn_600519","stat":["累计:","2022-12-19至2023-04-18","-28.87","-1.62%",1682.45,1935,1961308,35261288.98,"15.59%"]}])
替换后的: [{"status":0,"hq":[["2023-04-18","1753.00","1758.00","5.00","0.29%","1746.02","1769.00","18314","322010.75","0.15%"],["2023-04-17","1740.00","1753.00","39.58","2.31%","1728.00","1753.00","30467","530340.12","0.24%"],["2023-04-14","1726.00","1713.42","-9.58","-0.56%","1704.80","1733.00","21232","364652.69","0.17%"],["2023-04-13","1690.00","1723.00","28.90","1.71%","1684.01","1723.59","29543","504931.03","0.24%"],["2023-04-12","1747.26","1694.10","-51.40","-2.94%","1692.82","1750.00","51105","873265.75","0.41%"],["2023-04-11","1793.00","1745.50","-26.20","-1.48%","1744.00","1793.00","29209","513885.44","0.23%"],["2023-04-10","1790.88","1771.70","-19.29","-1.08%","1744.00","1790.88","29418","517115.03","0.23%"],["2023-04-07","1795.00","1790.99","-5.97","-0.33%","1788.34","1806.01","13525","242816.05","0.11%"],["2023-04-06","1805.00","1796.96","-17.63","-0.97%","1788.22","1815.90","14874","267625.19","0.12%"],["2023-04-04","1812.00","1814.59","12.52","0.69%","1787.00","1815.17","20066","361427.53","0.16%"],["2023-04-03","1825.00","1802.07","-17.93","-0.99%","1800.08","1827.77","21417","387581.16","0.17%"],["2023-03-31","1825.00","1820.00","20.00","1.11%","1819.00","1848.00","27446","502479.06","0.22%"],["2023-03-30","1793.00","1800.00","10.00","0.56%","1779.00","1805.00","19257","345357.31","0.15%"],["2023-03-29","1799.00","1790.00","8.20","0.46%","1785.07","1800.00","15393","276190.94","0.12%"],["2023-03-28","1770.00","1781.80","14.01","0.79%","1765.02","1790.00","17261","307311.31","0.14%"],["2023-03-27","1778.60","1767.79","-10.83","-0.61%","1756.00","1778.60","15296","270075.59","0.12%"],["2023-03-24","1769.08","1778.62","3.76","0.21%","1766.00","1783.60","12770","226964.92","0.10%"],["2023-03-23","1766.00","1774.86","1.51","0.09%","1765.01","1791.11","17356","308282.16","0.14%"],["2023-03-22","1780.00","1773.35","-1.65","-0.09%","1765.55","1793.00","15330","272764.88","0.12%"],["2023-03-21","1735.00","1775.00","45.40","2.62%","1723.97","1785.85","31142","5491
05.19","0.25%"],["2023-03-20","1751.00","1729.60","-12.40","-0.71%","1728.00","1755.00","20491","355787.22","0.16%"],["2023-03-17","1770.00","1742.00","-9.99","-0.57%","1736.00","1775.89","27023","474424.94","0.22%"],["2023-03-16","1740.00","1751.99","1.07","0.06%","1739.01","1770.00","17646","309679.09","0.14%"],["2023-03-15","1778.37","1750.92","-15.08","-0.85%","1750.12","1784.88","19213","339269.84","0.15%"],["2023-03-14","1763.78","1766.00","4.00","0.23%","1738.50","1779.88","23705","417728.91","0.19%"],["2023-03-13","1751.00","1762.00","12.00","0.69%","1749.00","1775.00","20560","362647.62","0.16%"],["2023-03-10","1751.57","1750.00","-20.02","-1.13%","1750.00","1781.00","21161","372513.91","0.17%"],["2023-03-09","1768.00","1770.02","-0.40","-0.02%","1740.00","1785.00","27612","488144.28","0.22%"],["2023-03-08","1780.02","1770.42","-17.88","-1.00%","1761.12","1785.94","22764","403578.72","0.18%"],["2023-03-07","1805.98","1788.30","-18.84","-1.04%","1788.00","1816.60","22785","410130.25","0.18%"],["2023-03-06","1818.18","1807.14","-10.90","-0.60%","1796.77","1818.50","20646","373007.94","0.16%"],["2023-03-03","1839.77","1818.04","-9.96","-0.54%","1802.48","1841.61","16198","294684.25","0.13%"],["2023-03-02","1829.00","1828.00","-10.53","-0.57%","1821.10","1838.99","13144","240529.23","0.10%"],["2023-03-01","1813.00","1838.53","24.79","1.37%","1803.23","1848.00","24458","447559.22","0.19%"],["2023-02-28","1819.00","1813.74","3.33","0.18%","1783.30","1822.01","23952","431487.69","0.19%"],["2023-02-27","1778.50","1810.41","22.41","1.25%","1775.02","1815.00","22065","397812.88","0.18%"],["2023-02-24","1810.11","1788.00","-30.00","-1.65%","1782.18","1810.19","24635","441562.16","0.20%"],["2023-02-23","1840.00","1818.00","-18.00","-0.98%","1805.25","1848.80","21881","398399.12","0.17%"],["2023-02-22","1855.01","1836.00","-31.00","-1.66%","1831.80","1863.90","21869","403101.59","0.17%"],["2023-02-21","1874.00","1867.00","-8.00","-0.43%","1851.00","1874.00","18751","349
163.34","0.15%"],["2023-02-20","1821.00","1875.00","54.22","2.98%","1817.20","1878.80","29669","548880.00","0.24%"],["2023-02-17","1850.16","1820.78","-41.04","-2.20%","1820.05","1873.00","26443","488032.88","0.21%"],["2023-02-16","1841.34","1861.82","20.82","1.13%","1828.00","1887.00","33246","619691.50","0.26%"],["2023-02-15","1843.78","1841.00","-2.79","-0.15%","1835.81","1855.30","18177","335142.22","0.14%"],["2023-02-14","1856.46","1843.79","-12.56","-0.68%","1835.00","1857.40","19566","360176.94","0.16%"],["2023-02-13","1810.00","1856.35","46.35","2.56%","1810.00","1874.50","38147","705838.25","0.30%"],["2023-02-10","1810.10","1810.00","-8.00","-0.44%","1801.05","1818.49","17985","325385.94","0.14%"],["2023-02-09","1778.00","1818.00","34.00","1.91%","1775.01","1829.75","29754","540139.94","0.24%"],["2023-02-08","1800.01","1784.00","-13.00","-0.72%","1775.00","1805.97","16676","298057.47","0.13%"],["2023-02-07","1808.08","1797.00","2.00","0.11%","1787.73","1808.80","24322","437367.19","0.19%"],["2023-02-06","1780.00","1795.00","-23.00","-1.27%","1760.00","1795.00","42661","759573.94","0.34%"],["2023-02-03","1820.00","1818.00","-18.11","-0.99%","1795.68","1826.00","34945","632463.50","0.28%"],["2023-02-02","1848.38","1836.11","-8.86","-0.48%","1826.00","1859.00","29759","546550.94","0.24%"],["2023-02-01","1854.98","1844.97","-0.79","-0.04%","1811.40","1859.00","33974","624467.94","0.27%"],["2023-01-31","1896.50","1845.76","-42.24","-2.24%","1833.07","1899.95","32991","612831.12","0.26%"],["2023-01-30","1909.00","1888.00","27.99","1.50%","1880.00","1909.00","35923","679975.69","0.29%"],["2023-01-20","1889.19","1860.01","-20.20","-1.07%","1858.00","1898.25","25609","480735.59","0.20%"],["2023-01-19","1892.50","1880.21","-12.79","-0.68%","1866.00","1892.52","23439","440199.44","0.19%"],["2023-01-18","1914.00","1893.00","-15.00","-0.79%","1890.00","1925.30","21063","400866.53","0.17%"],["2023-01-17","1913.16","1908.00","-4.90","-0.26%","1895.00","1923.00","21299","4
06832.16","0.17%"],["2023-01-16","1886.00","1912.90","25.90","1.37%","1881.00","1935.00","36848","705998.31","0.29%"],["2023-01-13","1844.18","1887.00","53.00","2.89%","1840.00","1888.00","31940","596987.62","0.25%"],["2023-01-12","1848.00","1834.00","-10.95","-0.59%","1833.00","1856.00","17193","316263.72","0.14%"],["2023-01-11","1856.00","1844.95","-9.50","-0.51%","1836.84","1860.00","22720","420148.78","0.18%"],["2023-01-10","1839.06","1854.45","13.25","0.72%","1830.50","1864.50","22732","420478.38","0.18%"],["2023-01-09","1835.00","1841.20","37.43","2.08%","1807.82","1849.98","30977","568418.12","0.25%"],["2023-01-06","1806.12","1803.77","2.77","0.15%","1787.00","1811.90","24904","448083.88","0.20%"],["2023-01-05","1737.00","1801.00","75.99","4.41%","1733.00","1801.00","47943","854158.69","0.38%"],["2023-01-04","1730.00","1725.01","-5.00","-0.29%","1716.00","1738.70","20416","352358.22","0.16%"],["2023-01-03","1731.20","1730.01","3.01","0.17%","1706.01","1738.43","26034","448776.03","0.21%"],["2022-12-30","1736.00","1727.00","8.00","0.47%","1727.00","1752.99","25333","440954.41","0.20%"],["2022-12-29","1717.00","1719.00","-14.00","-0.81%","1701.05","1726.99","22418","384449.97","0.18%"],["2022-12-28","1745.88","1733.00","0.00","0.00%","1708.01","1747.00","21438","369994.91","0.17%"],["2022-12-27","1738.00","1733.00","12.85","0.75%","1725.50","1747.15","17905","310927.03","0.14%"],["2022-12-26","1771.00","1742.06","-28.94","-1.63%","1735.02","1771.00","21384","374912.09","0.17%"],["2022-12-23","1752.40","1771.00","3.00","0.17%","1745.00","1782.00","17319","306360.84","0.14%"],["2022-12-22","1756.70","1768.00","29.00","1.67%","1745.00","1783.00","23175","409386.16","0.18%"],["2022-12-21","1724.00","1739.00","24.00","1.40%","1717.65","1739.00","22816","394892.62","0.18%"],["2022-12-20","1765.33","1715.00","-58.00","-3.27%","1682.45","1765.33","46198","794412.06","0.37%"],["2022-12-19","1798.80","1773.00","-13.87","-0.78%","1760.17","1798.80","24987","444723.66","0.
20%"]],"code":"cn_600519","stat":["累计:","2022-12-19至2023-04-18","-28.87","-1.62%",1682.45,1935,1961308,35261288.98,"15.59%"]}]
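The string replacement above removes every ")" in the response, which would corrupt the data if a value ever contained a parenthesis. A minimal sketch of a stricter approach follows, assuming the response has the shape callback(...): it strips only the outer wrapper and decodes the payload with the json module. The helper name parse_jsonp and the shortened sample string are illustrative and not part of the original notes.

```python
import json
import re


def parse_jsonp(text):
    """Strip a JSONP wrapper of the form callback(...) and decode the JSON inside.

    Hypothetical helper for illustration; raises ValueError if the text
    does not look like a JSONP response.
    """
    match = re.fullmatch(r'\s*[\w.$]+\s*\((.*)\)\s*;?\s*', text, flags=re.DOTALL)
    if match is None:
        raise ValueError('not a JSONP response')
    return json.loads(match.group(1))


# Shortened sample modeled on the response printed above (one row, three fields)
sample = ('historySearchHandler([{"status":0,'
          '"hq":[["2023-04-18","1753.00","1758.00"]],'
          '"code":"cn_600519"}])\n')

payload = parse_jsonp(sample)
first_row = payload[0]['hq'][0]
print(payload[0]['code'], first_row[0], float(first_row[2]))
```

Once decoded, the data is an ordinary list of dictionaries, so individual rows can be indexed and converted without further string handling.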
- Masquerading as a browser
import urllib.request

url = 'http://www.ctrip.com/'
req = urllib.request.Request(url)
# Send a mobile Safari User-Agent so the server treats the request as a phone browser
req.add_header('User-Agent',
               'Mozilla/5.0 (iPhone; CPU iPhone OS 10_2_1 like Mac OS X) AppleWebKit/602.4.6 (KHTML, like Gecko) Version/10.0 Mobile/14D27 Safari/602.1')
with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode()
    if htmlstr.find('mobile') != -1:
        print('Mobile version')
Mobile version
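Headers can also be passed when the Request object is created. The sketch below only builds the request and inspects it, so it runs without network access; the helper build_request and the constant MOBILE_UA are illustrative names, not part of the original notes.

```python
import urllib.request

# Same mobile Safari User-Agent string as in the example above
MOBILE_UA = ('Mozilla/5.0 (iPhone; CPU iPhone OS 10_2_1 like Mac OS X) '
             'AppleWebKit/602.4.6 (KHTML, like Gecko) Version/10.0 '
             'Mobile/14D27 Safari/602.1')


def build_request(url, user_agent=MOBILE_UA):
    """Create a Request carrying a custom User-Agent; nothing is sent yet."""
    return urllib.request.Request(url, headers={'User-Agent': user_agent})


req = build_request('http://www.ctrip.com/')
# urllib stores header names in capitalized form, so the lookup key is 'User-agent'
print(req.get_header('User-agent'))
```

Note the lookup key: urllib normalizes header names with str.capitalize(), so get_header('User-Agent') returns None even though the header is set.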
Scraping data with Selenium
from selenium import webdriver
from selenium.webdriver.common.by import By

driver = webdriver.Chrome()
driver.get('http://q.stock.sohu.com/cn/600519/lshq.shtml')
# Selenium 4 locates elements with find_element(By.<STRATEGY>, value)
em = driver.find_element(By.ID, 'BIZ_hq_historySearch')
print(em.text)
driver.quit()
The older helper find_element_by_id() is no longer available in current Selenium 4 releases; calling it fails as follows:
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
Input In [2], in <cell line: 6>()
      3 driver = webdriver.Chrome()
      5 driver.get('http://q.stock.sohu.com/cn/600519/lshq.shtml')
----> 6 em = driver.find_element_by_id('BIZ_hq_historySearch')
      7 print(em.text)
      8 ## driver.close()
AttributeError: 'WebDriver' object has no attribute 'find_element_by_id'
Parsing data
Using regular expressions
import urllib.request
import os
import re

url = 'http://p.weather.com.cn/'


def findallimageurl(htmlstr):
    """Find all image URLs ending in .png or .jpg in the HTML source."""
    pattern = r'http://\S+(?:\.png|\.jpg)'
    return re.findall(pattern, htmlstr)


def getfilename(urlstr):
    """Extract the file name from an image URL."""
    pos = urlstr.rfind('/')
    return urlstr[pos + 1:]


url_list = []
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode()
    url_list = findallimageurl(htmlstr)

for imagesrc in url_list:
    # Download each image by its URL
    req = urllib.request.Request(imagesrc)
    with urllib.request.urlopen(req) as response:
        data = response.read()
        # Skip images smaller than 100 KB
        if len(data) < 1024 * 100:
            continue
        # Create the download folder on first use
        if not os.path.exists('download'):
            os.mkdir('download')
        filename = getfilename(imagesrc)
        filename = 'download/' + filename
        # Save the image to disk
        with open(filename, 'wb') as f:
            f.write(data)
        print('Downloaded image', filename)
Downloaded image download/20230412105733E6869CA2C51FC9659543B01BCAD594C0.jpg
Downloaded image download/2023041210583373DC4BF4E9ABC5CC8C084D45FB133E3A.jpg
Downloaded image download/20230412105932202830A62B6E006C698504271BA9D52C.jpg
Downloaded image download/20230406160425985ECFF0D26CB2A423DAECD29141F4EE.jpg
Downloaded image download/20220401091431D32C5DA957F3441693885B05E271420C.jpg
Downloaded image download/2023041812043228512B6723F81BA42BC286530A7AD859.jpg
Downloaded image download/20230416152716215BBBA7CCF443222A245DA84B742444.jpg
Downloaded image download/202304160947448C2B8A7CF30225471547902BD50AB088.jpg
Downloaded image download/20230316141537671B47C5E4F520E11EE0E489187E624F.png
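The pattern http://\S+(?:\.png|\.jpg) is greedy and only matches http:// URLs. The sketch below, run on made-up sample strings rather than the real page, shows two consequences and one possible tighter pattern. TIGHT_PATTERN is an illustrative alternative, not the pattern used in these notes.

```python
import re

ORIGINAL_PATTERN = r'http://\S+(?:\.png|\.jpg)'
# Illustrative alternative: accepts https, stops at quotes and angle brackets,
# and matches lazily so each URL ends at its own extension.
TIGHT_PATTERN = r'https?://[^\s"\'<>]+?\.(?:png|jpg)'

# Made-up HTML: one http image, one https image, one non-image link
html = ('<img src="http://example.com/a/one.jpg"> '
        '<img src="https://example.com/b/two.png"> '
        '<a href="http://example.com/page.html">')

# Made-up JSON-like text with no whitespace between two URLs
packed = '"http://x.com/a.jpg","http://x.com/b.png"'

print(re.findall(ORIGINAL_PATTERN, html))    # misses the https image
print(re.findall(TIGHT_PATTERN, html))
print(re.findall(ORIGINAL_PATTERN, packed))  # greedy \S+ merges both URLs
print(re.findall(TIGHT_PATTERN, packed))
```

Whether the tighter pattern is needed depends on the page being scraped; on pages where image URLs are always separated by whitespace and served over http, the original pattern is sufficient.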
Using the BeautifulSoup library
import os
import urllib.request

from bs4 import BeautifulSoup

url = 'http://p.weather.com.cn/'


def findallimageurl(htmlstr):
    """Return the src of every <img> tag that points to a .png or .jpg file."""
    sp = BeautifulSoup(htmlstr, 'html.parser')
    imgtaglist = sp.find_all('img')
    srclist = [tag.get('src') for tag in imgtaglist]
    # get() returns None for <img> tags without a src attribute; skip those
    return [u for u in srclist
            if u and u.lower().endswith(('.png', '.jpg'))]


def getfilename(urlstr):
    """Extract the file name from an image URL."""
    pos = urlstr.rfind('/')
    return urlstr[pos + 1:]


url_list = []
req = urllib.request.Request(url)
with urllib.request.urlopen(req) as response:
    data = response.read()
    htmlstr = data.decode()
    url_list = findallimageurl(htmlstr)

for imagesrc in url_list:
    # Download each image by its URL
    req = urllib.request.Request(imagesrc)
    with urllib.request.urlopen(req) as response:
        data = response.read()
        # Skip images smaller than 100 KB
        if len(data) < 1024 * 100:
            continue
        # Create the download1 folder on first use
        if not os.path.exists('download1'):
            os.mkdir('download1')
        filename = getfilename(imagesrc)
        filename = 'download1/' + filename
        # Save the image to disk
        with open(filename, 'wb') as f:
            f.write(data)
        print('Downloaded image', filename)
Downloaded image download1/20230412105733E6869CA2C51FC9659543B01BCAD594C0.jpg
Downloaded image download1/2023041210583373DC4BF4E9ABC5CC8C084D45FB133E3A.jpg
Downloaded image download1/20230412105932202830A62B6E006C698504271BA9D52C.jpg
Downloaded image download1/20230406160425985ECFF0D26CB2A423DAECD29141F4EE.jpg
Downloaded image download1/20220401091431D32C5DA957F3441693885B05E271420C.jpg
Downloaded image download1/2023041812043228512B6723F81BA42BC286530A7AD859.jpg
Downloaded image download1/20230416152716215BBBA7CCF443222A245DA84B742444.jpg
Downloaded image download1/202304160947448C2B8A7CF30225471547902BD50AB088.jpg
Downloaded image download1/20230316141537671B47C5E4F520E11EE0E489187E624F.png
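Unlike the regular expression, which only finds absolute http:// URLs, reading the src attribute can return relative paths that urlopen cannot fetch directly. A minimal sketch of resolving them against the page URL with the standard library follows; the src values are made up and the helper name absolutize is hypothetical.

```python
from urllib.parse import urljoin


def absolutize(page_url, srcs):
    """Resolve each src against the page URL, skipping empty or missing values."""
    return [urljoin(page_url, s) for s in srcs if s]


page = 'http://p.weather.com.cn/'
# Made-up src values: root-relative, relative, absolute, protocol-relative, missing
srcs = ['/img/a.jpg', 'b.png', 'http://i.weather.com.cn/c.jpg',
        '//cdn.example.com/d.png', None]

for full_url in absolutize(page, srcs):
    print(full_url)
```

Absolute URLs pass through unchanged, so the function can be applied to every src before downloading.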
Scraping Nasdaq stock data
import datetime
import hashlib
import logging
import os
import re
import threading
import time
import urllib.request

from bs4 import BeautifulSoup

from db.db_access import insert_hisq_data

logging.basicConfig(level=logging.INFO,
                    format='%(asctime)s - %(threadName)s - '
                           '%(name)s - %(funcName)s - %(levelname)s - %(message)s')
logger = logging.getLogger(__name__)

url = 'file:///C:/Users/HP/nasdaq-Apple1.html'


def validateUpdate(html):
    """Return True if the data has changed since the last run, False otherwise."""
    # Compare the MD5 digest of the current HTML with the digest saved in md5.txt
    md5obj = hashlib.md5()
    md5obj.update(html.encode(encoding='utf-8'))
    md5code = md5obj.hexdigest()

    old_md5code = ''
    f_name = 'md5.txt'
    if os.path.exists(f_name):
        with open(f_name, 'r', encoding='utf-8') as f:
            old_md5code = f.read()

    if md5code == old_md5code:
        logger.info('Data not updated')
        return False
    else:
        # Remember the new digest for the next comparison
        with open(f_name, 'w', encoding='utf-8') as f:
            f.write(md5code)
        logger.info('Data updated')
        return True


isrunning = True   # set to False to stop both threads
interval = 5       # crawl interval in seconds


def controlthread_body():
    """Control thread: reads commands from the console."""
    global interval, isrunning
    while isrunning:
        # input() raises EOFError when standard input is closed or unavailable
        i = input('Enter Bye to stop the crawler, or a number to change the crawl interval in seconds: ')
        logger.info('Control input: {0}'.format(i))
        try:
            interval = int(i)
        except ValueError:
            if i.lower() == 'bye':
                isrunning = False


def istradtime():
    """Return True during trading hours (21:30 to 04:00 local time on weekdays)."""
    now = datetime.datetime.now()
    df = '%H%M%S'
    strnow = now.strftime(df)
    starttime = datetime.time(hour=21, minute=30).strftime(df)
    endtime = datetime.time(hour=4, minute=0).strftime(df)
    if now.weekday() == 5 \
            or now.weekday() == 6 \
            or (endtime < strnow < starttime):
        return False
    return True


def workthread_body():
    """Worker thread: fetches, parses and stores the data."""
    global interval, isrunning
    while isrunning:
        if istradtime():
            logger.info('Trading hours, crawler sleeping for 1 hour...')
            time.sleep(60 * 60)
            continue

        logger.info('Crawler starts working...')
        req = urllib.request.Request(url)
        with urllib.request.urlopen(req) as response:
            data = response.read()
            html = data.decode()
            sp = BeautifulSoup(html, 'html.parser')
            div = sp.select('div#quotes_content_left_pnlAJAX')
            # Hash only the markup of the data <div>, not the whole page
            divstring = str(div[0])
            if validateUpdate(divstring):
                trlist = sp.select('div#quotes_content_left_pnlAJAX table tbody tr')
                data = []
                for tr in trlist:
                    trtext = tr.text.strip('\n\r ')
                    if trtext == '':
                        continue
                    rows = re.split(r'\s+', trtext)
                    fields = {}
                    try:
                        df = '%m/%d/%Y'
                        fields['Date'] = datetime.datetime.strptime(rows[0], df)
                    except ValueError:
                        # Skip rows whose first cell is not a date
                        continue
                    fields['Open'] = float(rows[1])
                    fields['High'] = float(rows[2])
                    fields['Low'] = float(rows[3])
                    fields['Close'] = float(rows[4])
                    fields['Volume'] = int(rows[5].replace(',', ''))
                    data.append(fields)

                # Store the parsed rows in the database
                for row in data:
                    row['Symbol'] = 'AAPL'
                    insert_hisq_data(row)

        logger.info('Crawler sleeping for {0} seconds...'.format(interval))
        time.sleep(interval)


def main():
    """Main function: start the worker and control threads."""
    global interval, isrunning
    workthread = threading.Thread(target=workthread_body, name='WorkThread')
    workthread.start()
    controlthread = threading.Thread(target=controlthread_body, name='ControlThread')
    controlthread.start()


if __name__ == '__main__':
    main()
2023-04-19 15:46:27,709 - WorkThread - __main__ - workthread_body - INFO - Crawler starts working...
2023-04-19 15:46:28,157 - WorkThread - __main__ - validateUpdate - INFO - Data updated
2023-04-19 15:46:28,236 - WorkThread - __main__ - workthread_body - INFO - Crawler sleeping for 5 seconds...
2023-04-19 15:46:33,247 - WorkThread - __main__ - workthread_body - INFO - Crawler starts working...
2023-04-19 15:46:33,255 - WorkThread - __main__ - validateUpdate - INFO - Data not updated
2023-04-19 15:46:33,256 - WorkThread - __main__ - workthread_body - INFO - Crawler sleeping for 5 seconds...
Enter Bye to stop the crawler, or a number to change the crawl interval in seconds: 3600
2023-04-19 15:46:36,048 - ControlThread - __main__ - controlthread_body - INFO - Control input: 3600
Enter Bye to stop the crawler, or a number to change the crawl interval in seconds:
Exception in thread ControlThread:
Traceback (most recent call last):
  File "E:\anaconda\lib\threading.py", line 973, in _bootstrap_inner
    self.run()
  File "E:\anaconda\lib\threading.py", line 910, in run
    self._target(*self._args, **self._kwargs)
  File "C:\Users\HP\AppData\Local\Temp\ipykernel_22288\985097547.py", line 66, in controlthread_body
EOFError: EOF when reading a line
2023-04-19 15:46:38,259 - WorkThread - __main__ - workthread_body - INFO - Crawler starts working...
2023-04-19 15:46:38,267 - WorkThread - __main__ - validateUpdate - INFO - Data not updated
2023-04-19 15:46:38,267 - WorkThread - __main__ - workthread_body - INFO - Crawler sleeping for 3600 seconds...
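The trading-hours check in istradtime() compares formatted time strings and covers a window that wraps past midnight (21:30 to 04:00). The sketch below expresses the same rule with datetime.time objects and takes the current time as a parameter so it can be checked without waiting for the clock. The function name in_trading_window and the constants are illustrative, not part of the original code.

```python
import datetime

WINDOW_START = datetime.time(hour=21, minute=30)
WINDOW_END = datetime.time(hour=4, minute=0)


def in_trading_window(now):
    """Return True if `now` is a weekday and falls within 21:30 to 04:00.

    The window wraps past midnight, so a time is inside it when it is
    at or after the start OR at or before the end.
    """
    if now.weekday() in (5, 6):  # Saturday or Sunday
        return False
    t = now.time()
    return t >= WINDOW_START or t <= WINDOW_END


# 2023-04-19 is a Wednesday, 2023-04-22 a Saturday
print(in_trading_window(datetime.datetime(2023, 4, 19, 15, 46)))  # afternoon
print(in_trading_window(datetime.datetime(2023, 4, 19, 22, 0)))   # evening
print(in_trading_window(datetime.datetime(2023, 4, 19, 3, 0)))    # early morning
print(in_trading_window(datetime.datetime(2023, 4, 22, 22, 0)))   # weekend
```

Passing the time in as an argument keeps the rule easy to test; the caller would supply datetime.datetime.now().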
Advanced pandas
import numpy as np
import pandas as pd
data = pd.Series(np.random.randn(9), index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'], [1, 2, 3, 1, 3, 1, 2, 2, 3]])
data
a  1   -0.018841
   2    0.291057
   3   -0.869647
b  1    0.500437
   3   -1.678710
c  1   -1.957127
   2   -0.563527
d  2    0.454833
   3   -0.343765
dtype: float64
data.index
MultiIndex([('a', 1),
('a', 2),
('a', 3),
('b', 1),
('b', 3),
('c', 1),
('c', 2),
('d', 2),
('d', 3)],
)
data['b']
1 0.500437
3 -1.678710
dtype: float64
data['b':'c']
b  1    0.500437
   3   -1.678710
c  1   -1.957127
   2   -0.563527
dtype: float64
data.loc[['b','d']]
b  1    0.500437
   3   -1.678710
d  2    0.454833
   3   -0.343765
dtype: float64
data.loc[:,2]
a 0.291057
c -0.563527
d 0.454833
dtype: float64
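Selecting on the inner level can also be written with xs(), and unstack() pivots the inner level into columns. The following is a small sketch using fixed values instead of random ones so the result is reproducible; the variable names inner and wide are illustrative.

```python
import numpy as np
import pandas as pd

# Same two-level index as above, but with fixed values 0.0 .. 8.0
data = pd.Series(np.arange(9.0),
                 index=[['a', 'a', 'a', 'b', 'b', 'c', 'c', 'd', 'd'],
                        [1, 2, 3, 1, 3, 1, 2, 2, 3]])

# Cross-section on the inner level: the rows whose second key is 2
inner = data.xs(2, level=1)
print(inner)

# Move the inner level into columns; missing combinations become NaN
wide = data.unstack()
print(wide)
```

Here data.xs(2, level=1) selects the same rows as data.loc[:, 2], and the cell for ('b', 2) in the unstacked frame is NaN because that combination does not exist in the index.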
frame = pd.DataFrame(np.arange(12).reshape((4, 3)), index=[['a', 'a', 'b', 'b'], [1, 2, 1, 2]], columns=[['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']])
frame
     Ohio     Colorado
    Green Red    Green
a 1     0   1        2
  2     3   4        5
b 1     6   7        8
  2     9  10       11
frame.index.names = ['key1', 'key2']
frame.columns.names = ['state', 'color']
frame
state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
     2        3   4        5
b    1        6   7        8
     2        9  10       11
frame['Ohio']
color      Green  Red
key1 key2
a    1         0    1
     2         3    4
b    1         6    7
     2         9   10
# A MultiIndex can also be built on its own and reused; pd is already imported above
pd.MultiIndex.from_arrays([['Ohio', 'Ohio', 'Colorado'], ['Green', 'Red', 'Green']], names=['state', 'color'])
MultiIndex([( 'Ohio', 'Green'),
( 'Ohio', 'Red'),
('Colorado', 'Green')],
names=['state', 'color'])
frame.swaplevel('key1', 'key2')
state      Ohio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
2    a        3   4        5
1    b        6   7        8
2    b        9  10       11
frame.sort_index(level=1)
state      Ohio     Colorado
color     Green Red    Green
key1 key2
a    1        0   1        2
b    1        6   7        8
a    2        3   4        5
b    2        9  10       11
frame.swaplevel(0, 1).sort_index(level=0)
state      Ohio     Colorado
color     Green Red    Green
key2 key1
1    a        0   1        2
     b        6   7        8
2    a        3   4        5
     b        9  10       11
# sum(level=...) is deprecated and was removed in pandas 2.0; group by the index level instead
frame.groupby(level='key2').sum()
state  Ohio     Colorado
color Green Red    Green
key2
1         6   8       10
2        12  14       16
# Column-level aggregation: transpose, group by the 'color' level, sum, transpose back
frame.T.groupby(level='color').sum().T
color      Green  Red
key1 key2
a    1         2    1
     2         8    4
b    1        14    7
     2        20   10
frame.describe()
state      Ohio             Colorado
color     Green        Red     Green
count  4.000000   4.000000  4.000000
mean   4.500000   5.500000  6.500000
std    3.872983   3.872983  3.872983
min    0.000000   1.000000  2.000000
25%    2.250000   3.250000  4.250000
50%    4.500000   5.500000  6.500000
75%    6.750000   7.750000  8.750000
max    9.000000  10.000000 11.000000
frame = pd.DataFrame({'a': range(7), 'b': range(7, 0, -1), 'c': ['one', 'one', 'one', 'two', 'two', 'two', 'two'], 'd': [0, 1, 2, 0, 1, 2, 3]})
frame
   a  b    c  d
0  0  7  one  0
1  1  6  one  1
2  2  5  one  2
3  3  4  two  0
4  4  3  two  1
5  5  2  two  2
6  6  1  two  3
frame2 = frame.set_index(['c', 'd'])
frame2
       a  b
c   d
one 0  0  7
    1  1  6
    2  2  5
two 0  3  4
    1  4  3
    2  5  2
    3  6  1
frame.set_index(['c', 'd'], drop=False)
       a  b    c  d
c   d
one 0  0  7  one  0
    1  1  6  one  1
    2  2  5  one  2
two 0  3  4  two  0
    1  4  3  two  1
    2  5  2  two  2
    3  6  1  two  3
frame2.reset_index()
     c  d  a  b
0  one  0  0  7
1  one  1  1  6
2  one  2  2  5
3  two  0  3  4
4  two  1  4  3
5  two  2  5  2
6  two  3  6  1
import pandas as pd
import numpy as np
df1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data2': range(3)})
df1
  key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
5   a      5
6   b      6
df2
  key  data2
0   a      0
1   b      1
2   d      2
pd.merge(df1,df2)
  key  data1  data2
0   b      0      1
1   b      1      1
2   b      6      1
3   a      2      0
4   a      4      0
5   a      5      0
pd.merge(df1,df2,on='key')
  key  data1  data2
0   b      0      1
1   b      1      1
2   b      6      1
3   a      2      0
4   a      4      0
5   a      5      0
df3 = pd.DataFrame({'lkey': ['b', 'b', 'a', 'c', 'a', 'a', 'b'], 'data1': range(7)})
df4 = pd.DataFrame({'rkey': ['a', 'b', 'd'], 'data2': range(3)})
pd.merge(df3, df4, left_on='lkey', right_on='rkey')
  lkey  data1 rkey  data2
0    b      0    b      1
1    b      1    b      1
2    b      6    b      1
3    a      2    a      0
4    a      4    a      0
5    a      5    a      0
df3
  lkey  data1
0    b      0
1    b      1
2    a      2
3    c      3
4    a      4
5    a      5
6    b      6
df4
  rkey  data2
0    a      0
1    b      1
2    d      2
pd.merge(df1,df2,how='outer')
  key  data1  data2
0   b    0.0    1.0
1   b    1.0    1.0
2   b    6.0    1.0
3   a    2.0    0.0
4   a    4.0    0.0
5   a    5.0    0.0
6   c    3.0    NaN
7   d    NaN    2.0
pd.merge(df1,df2,how='left')
  key  data1  data2
0   b      0    1.0
1   b      1    1.0
2   a      2    0.0
3   c      3    NaN
4   a      4    0.0
5   a      5    0.0
6   b      6    1.0
pd.merge(df1,df2,how='right')
  key  data1  data2
0   a    2.0      0
1   a    4.0      0
2   a    5.0      0
3   b    0.0      1
4   b    1.0      1
5   b    6.0      1
6   d    NaN      2
left = pd.DataFrame({'key1': ['foo', 'foo', 'bar'], 'key2': ['one', 'two', 'one'], 'lval': [1, 2, 3]})
right = pd.DataFrame({'key1': ['foo', 'foo', 'bar', 'bar'], 'key2': ['one', 'one', 'one', 'two'], 'rval': [4, 5, 6, 7]})
pd.merge(left, right, on=['key1', 'key2'], how='outer')
  key1 key2  lval  rval
0  foo  one   1.0   4.0
1  foo  one   1.0   5.0
2  foo  two   2.0   NaN
3  bar  one   3.0   6.0
4  bar  two   NaN   7.0
pd.merge(left, right, on='key1')
  key1 key2_x  lval key2_y  rval
0  foo    one     1    one     4
1  foo    one     1    one     5
2  foo    two     2    one     4
3  foo    two     2    one     5
4  bar    one     3    one     6
5  bar    one     3    two     7
left
  key1 key2  lval
0  foo  one     1
1  foo  two     2
2  bar  one     3
right
  key1 key2  rval
0  foo  one     4
1  foo  one     5
2  bar  one     6
3  bar  two     7
pd.merge(left, right, on='key1', suffixes=('_left', '_right'))
  key1 key2_left  lval key2_right  rval
0  foo       one     1        one     4
1  foo       one     1        one     5
2  foo       two     2        one     4
3  foo       two     2        one     5
4  bar       one     3        one     6
5  bar       one     3        two     7
import pandas as pd
left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])
left1
  key  value
0   a      0
1   b      1
2   a      2
3   a      3
4   b      4
5   c      5
right1
   group_val
a        3.5
b        7.0
pd.merge(left1, right1, left_on='key', right_index=True)
  key  value  group_val
0   a      0        3.5
2   a      2        3.5
3   a      3        3.5
1   b      1        7.0
4   b      4        7.0
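The same key-to-index lookup can also be written with `DataFrame.join` and its `on` argument. The following is a minimal sketch that rebuilds `left1` and `right1` from the cell above. Note that `join` performs a left join by default, so the unmatched key `c` is kept with `NaN` rather than dropped as in the inner merge.

```python
import pandas as pd

left1 = pd.DataFrame({'key': ['a', 'b', 'a', 'a', 'b', 'c'], 'value': range(6)})
right1 = pd.DataFrame({'group_val': [3.5, 7]}, index=['a', 'b'])

# Look up each value of left1['key'] in right1's index (left join by default).
joined = left1.join(right1, on='key')
print(joined)
```

Passing `how='inner'` to `join` should drop the `c` row as well, matching the merge output above.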
import pandas as pd
import numpy as np
lefth = pd.DataFrame({'key1': ['Ohio', 'Ohio', 'Ohio', 'Nevada', 'Nevada'], 'key2': [2000, 2001, 2002, 2001, 2002], 'data': np.arange(5.)})
righth = pd.DataFrame(np.arange(12).reshape((6, 2)), index=[['Nevada', 'Nevada', 'Ohio', 'Ohio', 'Ohio', 'Ohio'], [2001, 2000, 2000, 2000, 2001, 2002]], columns=['event1', 'event2'])
lefth
     key1  key2  data
0    Ohio  2000   0.0
1    Ohio  2001   1.0
2    Ohio  2002   2.0
3  Nevada  2001   3.0
4  Nevada  2002   4.0
righth
             event1  event2
Nevada 2001       0       1
       2000       2       3
Ohio   2000       4       5
       2000       6       7
       2001       8       9
       2002      10      11
pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True)
     key1  key2  data  event1  event2
0    Ohio  2000   0.0       4       5
0    Ohio  2000   0.0       6       7
1    Ohio  2001   1.0       8       9
2    Ohio  2002   2.0      10      11
3  Nevada  2001   3.0       0       1
pd.merge(lefth, righth, left_on=['key1', 'key2'], right_index=True, how='outer')
     key1  key2  data  event1  event2
0    Ohio  2000   0.0     4.0     5.0
0    Ohio  2000   0.0     6.0     7.0
1    Ohio  2001   1.0     8.0     9.0
2    Ohio  2002   2.0    10.0    11.0
3  Nevada  2001   3.0     0.0     1.0
4  Nevada  2002   4.0     NaN     NaN
4  Nevada  2000   NaN     2.0     3.0
left2 = pd.DataFrame([[1., 2.], [3., 4.], [5., 6.]], index=['a', 'c', 'e'], columns=['Ohio', 'Nevada'])
right2 = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [13, 14]], index=['b', 'c', 'd', 'e'], columns=['Missouri', 'Alabama'])
left2
   Ohio  Nevada
a   1.0     2.0
c   3.0     4.0
e   5.0     6.0
right2
   Missouri  Alabama
b       7.0      8.0
c       9.0     10.0
d      11.0     12.0
e      13.0     14.0
pd.merge(left2, right2, how='outer', left_index=True, right_index=True)
   Ohio  Nevada  Missouri  Alabama
a   1.0     2.0       NaN      NaN
b   NaN     NaN       7.0      8.0
c   3.0     4.0       9.0     10.0
d   NaN     NaN      11.0     12.0
e   5.0     6.0      13.0     14.0
left2.join(right2, how='outer')
   Ohio  Nevada  Missouri  Alabama
a   1.0     2.0       NaN      NaN
b   NaN     NaN       7.0      8.0
c   3.0     4.0       9.0     10.0
d   NaN     NaN      11.0     12.0
e   5.0     6.0      13.0     14.0
another = pd.DataFrame([[7., 8.], [9., 10.], [11., 12.], [16., 17.]], index=['a', 'c', 'e', 'f'], columns=['New York', 'Oregon'])
another
   New York  Oregon
a       7.0     8.0
c       9.0    10.0
e      11.0    12.0
f      16.0    17.0
left2.join(right2)
   Ohio  Nevada  Missouri  Alabama
a   1.0     2.0       NaN      NaN
c   3.0     4.0       9.0     10.0
e   5.0     6.0      13.0     14.0
left2.join([right2, another])
   Ohio  Nevada  Missouri  Alabama  New York  Oregon
a   1.0     2.0       NaN      NaN       7.0     8.0
c   3.0     4.0       9.0     10.0       9.0    10.0
e   5.0     6.0      13.0     14.0      11.0    12.0
left2.join([right2, another], how='outer')
   Ohio  Nevada  Missouri  Alabama  New York  Oregon
a   1.0     2.0       NaN      NaN       7.0     8.0
c   3.0     4.0       9.0     10.0       9.0    10.0
e   5.0     6.0      13.0     14.0      11.0    12.0
b   NaN     NaN       7.0      8.0       NaN     NaN
d   NaN     NaN      11.0     12.0       NaN     NaN
f   NaN     NaN       NaN      NaN      16.0    17.0
s1 = pd.Series([0, 1], index=['a', 'b'])
s2 = pd.Series([2, 3, 4], index=['c', 'd', 'e'])
s3 = pd.Series([5, 6], index=['f', 'g'])
pd.concat([s1, s2, s3])
a 0
b 1
c 2
d 3
e 4
f 5
g 6
dtype: int64
pd.concat([s1, s2, s3], axis=1)
     0    1    2
a  0.0  NaN  NaN
b  1.0  NaN  NaN
c  NaN  2.0  NaN
d  NaN  3.0  NaN
e  NaN  4.0  NaN
f  NaN  NaN  5.0
g  NaN  NaN  6.0
s4 = pd.concat([s1, s3])
s4
a 0
b 1
f 5
g 6
dtype: int64
pd.concat([s1, s4], axis=1)
     0  1
a  0.0  0
b  1.0  1
f  NaN  5
g  NaN  6
pd.concat([s1, s4], axis=1, join='inner')
   0  1
a  0  0
b  1  1
# The join_axes argument was removed from pd.concat (it raises TypeError here);
# reindex the concatenated result to select the row labels instead.
pd.concat([s1, s4], axis=1).reindex(['a', 'c', 'b', 'e'])
     0    1
a  0.0  0.0
c  NaN  NaN
b  1.0  1.0
e  NaN  NaN
result = pd.concat([s1, s1, s3], keys=['one', 'two', 'three'])
result
one    a    0
       b    1
two    a    0
       b    1
three  f    5
       g    6
dtype: int64
result.unstack()
         a    b    f    g
one    0.0  1.0  NaN  NaN
two    0.0  1.0  NaN  NaN
three  NaN  NaN  5.0  6.0
pd.concat([s1, s2, s3], axis=1, keys=['one', 'two', 'three'])
   one  two  three
a  0.0  NaN    NaN
b  1.0  NaN    NaN
c  NaN  2.0    NaN
d  NaN  3.0    NaN
e  NaN  4.0    NaN
f  NaN  NaN    5.0
g  NaN  NaN    6.0
df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=['a', 'b', 'c'], columns=['one', 'two'])
df2 = pd.DataFrame(5 + np.arange(4).reshape(2, 2), index=['a', 'c'], columns=['three', 'four'])
df1
   one  two
a    0    1
b    2    3
c    4    5
df2
   three  four
a      5     6
c      7     8
pd.concat([df1, df2], axis=1, keys=['level1', 'level2'])
  level1      level2
     one  two  three  four
a      0    1    5.0   6.0
b      2    3    NaN   NaN
c      4    5    7.0   8.0
pd.concat({'level1': df1, 'level2': df2}, axis=1)
  level1      level2
     one  two  three  four
a      0    1    5.0   6.0
b      2    3    NaN   NaN
c      4    5    7.0   8.0
pd.concat([df1, df2], axis=1, keys=['level1', 'level2'], names=['upper', 'lower'])
upper  level1      level2
lower     one  two  three  four
a           0    1    5.0   6.0
b           2    3    NaN   NaN
c           4    5    7.0   8.0
df1 = pd.DataFrame(np.random.randn(3, 4), columns=['a', 'b', 'c', 'd'])
df2 = pd.DataFrame(np.random.randn(2, 3), columns=['b', 'd', 'a'])
df1
          a         b         c         d
0  0.527674  2.145525  1.979097  1.702063
1 -0.350557 -0.511584 -1.061349 -0.702928
2 -1.239068 -1.240555 -0.295705  0.209181
df2
          b         d         a
0  1.718647 -2.931403  0.129779
1  1.482412 -1.022705 -1.186445
pd.concat([df1, df2], ignore_index=True)
          a         b         c         d
0  0.527674  2.145525  1.979097  1.702063
1 -0.350557 -0.511584 -1.061349 -0.702928
2 -1.239068 -1.240555 -0.295705  0.209181
3  0.129779  1.718647       NaN -2.931403
4 -1.186445  1.482412       NaN -1.022705
a = pd.Series([np.nan, 2.5, np.nan, 3.5, 4.5, np.nan], index=['f', 'e', 'd', 'c', 'b', 'a'])
b = pd.Series(np.arange(len(a), dtype=np.float64), index=['f', 'e', 'd', 'c', 'b', 'a'])
b.iloc[-1] = np.nan  # explicit positional assignment; b[-1] relies on integer fallback for a label index
a
f NaN
e 2.5
d NaN
c 3.5
b 4.5
a NaN
dtype: float64
b
f 0.0
e 1.0
d 2.0
c 3.0
b 4.0
a NaN
dtype: float64
np.where(pd.isnull(a), b, a)
array([0. , 2.5, 2. , 3.5, 4.5, nan])
b[:-2].combine_first(a[2:])
a NaN
b 4.5
c 3.0
d 2.0
e 1.0
f 0.0
dtype: float64
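When both Series share the same index, `combine_first` should give the same values as the `np.where` patching shown above, but it returns a labelled Series instead of a bare array. A small sketch under that assumption, rebuilding `a` and `b` as in the cells above:

```python
import numpy as np
import pandas as pd

labels = ['f', 'e', 'd', 'c', 'b', 'a']
a = pd.Series([np.nan, 2.5, np.nan, 3.5, 4.5, np.nan], index=labels)
b = pd.Series(np.arange(len(a), dtype=np.float64), index=labels)
b.iloc[-1] = np.nan

# Array-based patching: take b wherever a is missing (labels are lost).
manual = pd.Series(np.where(pd.isnull(a), b, a), index=a.index)

# Label-aware patching: same idea, aligned on the index.
patched = a.combine_first(b)

# Reindex to a's label order before comparing, in case the result is reordered.
print(patched.reindex(a.index).equals(manual))
```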
df1 = pd.DataFrame({'a': [1., np.nan, 5., np.nan], 'b': [np.nan, 2., np.nan, 6.], 'c': range(2, 18, 4)})
df2 = pd.DataFrame({'a': [5., 4., np.nan, 3., 7.], 'b': [np.nan, 3., 4., 6., 8.]})
df1
     a    b   c
0  1.0  NaN   2
1  NaN  2.0   6
2  5.0  NaN  10
3  NaN  6.0  14
df2
     a    b
0  5.0  NaN
1  4.0  3.0
2  NaN  4.0
3  3.0  6.0
4  7.0  8.0
df1.combine_first(df2)
     a    b     c
0  1.0  NaN   2.0
1  4.0  2.0   6.0
2  5.0  4.0  10.0
3  3.0  6.0  14.0
4  7.0  8.0   NaN
data = pd.DataFrame(np.arange(6).reshape((2, 3)), index=pd.Index(['Ohio', 'Colorado'], name='state'), columns=pd.Index(['one', 'two', 'three'], name='number'))
data
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5
result = data.stack()
result
state     number
Ohio      one       0
          two       1
          three     2
Colorado  one       3
          two       4
          three     5
dtype: int32
result.unstack()
number    one  two  three
state
Ohio        0    1      2
Colorado    3    4      5
result.unstack(0)
state   Ohio  Colorado
number
one        0         3
two        1         4
three      2         5
s1 = pd.Series([0, 1, 2, 3], index=['a', 'b', 'c', 'd'])
s2 = pd.Series([4, 5, 6], index=['c', 'd', 'e'])
data2 = pd.concat([s1, s2], keys=['one', 'two'])
data2
one  a    0
     b    1
     c    2
     d    3
two  c    4
     d    5
     e    6
dtype: int64
data2.unstack()
       a    b    c    d    e
one  0.0  1.0  2.0  3.0  NaN
two  NaN  NaN  4.0  5.0  6.0
data2.unstack().stack()
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
two  c    4.0
     d    5.0
     e    6.0
dtype: float64
data2.unstack().stack(dropna=False)
one  a    0.0
     b    1.0
     c    2.0
     d    3.0
     e    NaN
two  a    NaN
     b    NaN
     c    4.0
     d    5.0
     e    6.0
dtype: float64
df = pd.DataFrame({'left': result, 'right': result + 5}, columns=pd.Index(['left', 'right'], name='side'))
df
side             left  right
state    number
Ohio     one        0      5
         two        1      6
         three      2      7
Colorado one        3      8
         two        4      9
         three      5     10
df.unstack('state')
side    left            right
state   Ohio  Colorado   Ohio  Colorado
number
one        0         3      5         8
two        1         4      6         9
three      2         5      7        10
df.unstack('state').stack('side')
state         Colorado  Ohio
number side
one    left          3     0
       right         8     5
two    left          4     1
       right         9     6
three  left          5     2
       right        10     7
data = pd.DataFrame({'k1': ['one', 'two'] * 3 + ['two'], 'k2': [1, 1, 2, 3, 3, 4, 4]})
data
    k1  k2
0  one   1
1  two   1
2  one   2
3  two   3
4  one   3
5  two   4
6  two   4
data.duplicated()
0 False
1 False
2 False
3 False
4 False
5 False
6 True
dtype: bool
data.drop_duplicates()
    k1  k2
0  one   1
1  two   1
2  one   2
3  two   3
4  one   3
5  two   4
data['v1'] = range(7)
data.drop_duplicates(['k1'])
    k1  k2  v1
0  one   1   0
1  two   1   1
data.drop_duplicates(['k1', 'k2'], keep='last')
    k1  k2  v1
0  one   1   0
1  two   1   1
2  one   2   2
3  two   3   3
4  one   3   4
6  two   4   6
data = pd.DataFrame({'food': ['bacon', 'pulled pork', 'bacon', 'Pastrami', 'corned beef', 'Bacon',
'pastrami', 'honey ham', 'nova lox'], 'ounces': [4, 3, 12, 6, 7.5, 8, 3, 5, 6]})
data
          food  ounces
0        bacon     4.0
1  pulled pork     3.0
2        bacon    12.0
3     Pastrami     6.0
4  corned beef     7.5
5        Bacon     8.0
6     pastrami     3.0
7    honey ham     5.0
8     nova lox     6.0
meat_to_animal = {
    'bacon': 'pig',
    'pulled pork': 'pig',
    'pastrami': 'cow',
    'corned beef': 'cow',
    'honey ham': 'pig',
    'nova lox': 'salmon'
}
lowercased = data['food'].str.lower()
lowercased
0 bacon
1 pulled pork
2 bacon
3 pastrami
4 corned beef
5 bacon
6 pastrami
7 honey ham
8 nova lox
Name: food, dtype: object
data['animal'] = lowercased.map(meat_to_animal)
data
          food  ounces  animal
0        bacon     4.0     pig
1  pulled pork     3.0     pig
2        bacon    12.0     pig
3     Pastrami     6.0     cow
4  corned beef     7.5     cow
5        Bacon     8.0     pig
6     pastrami     3.0     cow
7    honey ham     5.0     pig
8     nova lox     6.0  salmon
data['food'].map(lambda x: meat_to_animal[x.lower()])
0 pig
1 pig
2 pig
3 cow
4 cow
5 pig
6 cow
7 pig
8 salmon
Name: food, dtype: object
data = pd.Series([1., -999., 2., -999., -1000., 3.])
data
0 1.0
1 -999.0
2 2.0
3 -999.0
4 -1000.0
5 3.0
dtype: float64
data.replace(-999, np.nan)
0 1.0
1 NaN
2 2.0
3 NaN
4 -1000.0
5 3.0
dtype: float64
data.replace([-999, -1000], [np.nan, 0])
0 1.0
1 NaN
2 2.0
3 NaN
4 0.0
5 3.0
dtype: float64
data.replace({-999: np.nan, -1000: 0})
0 1.0
1 NaN
2 2.0
3 NaN
4 0.0
5 3.0
dtype: float64
data = pd.DataFrame(np.arange(12).reshape((3, 4)), index=['Ohio', 'Colorado', 'New York'], columns=['one', 'two', 'three', 'four'])
data.rename(index=str.title, columns=str.upper)
          ONE  TWO  THREE  FOUR
Ohio        0    1      2     3
Colorado    4    5      6     7
New York    8    9     10    11
transform = lambda x: x[:4].upper()
data.index.map(transform)
Index(['OHIO', 'COLO', 'NEW '], dtype='object')
data.index=data.index.map(transform)
data
      one  two  three  four
OHIO    0    1      2     3
COLO    4    5      6     7
NEW     8    9     10    11
data.rename(index={'OHIO': 'INDIANA'}, columns={'three': 'peekaboo'})
         one  two  peekaboo  four
INDIANA    0    1         2     3
COLO       4    5         6     7
NEW        8    9        10    11
data.rename(index={'OHIO': 'INDIANA'}, inplace=True)
data
         one  two  three  four
INDIANA    0    1      2     3
COLO       4    5      6     7
NEW        8    9     10    11
ages = [20, 22, 25, 27, 21, 23, 37, 31, 61, 45, 41, 32]
bins = [18, 25, 35, 60, 100]
cats = pd.cut(ages, bins)
cats
[(18, 25], (18, 25], (18, 25], (25, 35], (18, 25], ..., (25, 35], (60, 100], (35, 60], (35, 60], (25, 35]]
Length: 12
Categories (4, interval[int64, right]): [(18, 25] < (25, 35] < (35, 60] < (60, 100]]
cats.codes
array([0, 0, 0, 1, 0, 0, 2, 1, 3, 2, 2, 1], dtype=int8)
cats.categories
IntervalIndex([(18, 25], (25, 35], (35, 60], (60, 100]], dtype='interval[int64, right]')
pd.value_counts(cats)
(18, 25] 5
(25, 35] 3
(35, 60] 3
(60, 100] 1
dtype: int64
pd.cut(ages, [18, 26, 36, 61, 100], right=False)
[[18, 26), [18, 26), [18, 26), [26, 36), [18, 26), ..., [26, 36), [61, 100), [36, 61), [36, 61), [26, 36)]
Length: 12
Categories (4, interval[int64, left]): [[18, 26) < [26, 36) < [36, 61) < [61, 100)]
group_names = ['Youth', 'YoungAdult', 'MiddleAged', 'Senior']
pd.cut(ages, bins, labels=group_names)
['Youth', 'Youth', 'Youth', 'YoungAdult', 'Youth', ..., 'YoungAdult', 'Senior', 'MiddleAged', 'MiddleAged', 'YoungAdult']
Length: 12
Categories (4, object): ['Youth' < 'YoungAdult' < 'MiddleAged' < 'Senior']
data = np.random.rand(20)
data
array([0.12967787, 0.87168374, 0.24167497, 0.56688941, 0.22964312,
0.30205167, 0.88297675, 0.22349301, 0.18292263, 0.81072534,
0.25054152, 0.99378214, 0.78439125, 0.3970331 , 0.89049743,
0.51677834, 0.76808437, 0.54701119, 0.79386529, 0.25451132])
temp=pd.cut(data, 4, precision=2)
temp
[(0.13, 0.35], (0.78, 0.99], (0.13, 0.35], (0.56, 0.78], (0.13, 0.35], ..., (0.35, 0.56], (0.56, 0.78], (0.35, 0.56], (0.78, 0.99], (0.13, 0.35]]
Length: 20
Categories (4, interval[float64, right]): [(0.13, 0.35] < (0.35, 0.56] < (0.56, 0.78] < (0.78, 0.99]]
pd.value_counts(temp)
(0.13, 0.35] 8
(0.78, 0.99] 7
(0.35, 0.56] 3
(0.56, 0.78] 2
dtype: int64
data = np.random.randn(1000)
cats = pd.qcut(data, 4)
cats
[(-0.726, -0.00747], (-0.00747, 0.636], (-3.057, -0.726], (-3.057, -0.726], (-0.00747, 0.636], ..., (-0.726, -0.00747], (-0.726, -0.00747], (-0.726, -0.00747], (0.636, 2.834], (-3.057, -0.726]]
Length: 1000
Categories (4, interval[float64, right]): [(-3.057, -0.726] < (-0.726, -0.00747] < (-0.00747, 0.636] < (0.636, 2.834]]
pd.value_counts(cats)
(-3.057, -0.726] 250
(-0.726, -0.00747] 250
(-0.00747, 0.636] 250
(0.636, 2.834] 250
dtype: int64
pd.qcut(data, [0, 0.1, 0.5, 0.9, 1.])
[(-1.239, -0.00747], (-0.00747, 1.338], (-3.057, -1.239], (-3.057, -1.239], (-0.00747, 1.338], ..., (-1.239, -0.00747], (-1.239, -0.00747], (-1.239, -0.00747], (1.338, 2.834], (-3.057, -1.239]]
Length: 1000
Categories (4, interval[float64, right]): [(-3.057, -1.239] < (-1.239, -0.00747] < (-0.00747, 1.338] < (1.338, 2.834]]
data = pd.DataFrame(np.random.randn(1000, 4))
data.describe()
                 0            1            2            3
count  1000.000000  1000.000000  1000.000000  1000.000000
mean     -0.068024     0.015781     0.048655    -0.019467
std       1.050557     0.963683     0.972374     1.031390
min      -3.617567    -2.550853    -3.372664    -3.196753
25%      -0.718715    -0.591289    -0.606569    -0.712316
50%      -0.066156     0.004574     0.068207     0.000122
75%       0.627520     0.662984     0.747493     0.673216
max       2.940831     2.865724     3.369795     3.364796
col=data[2]
col[np.abs(col) > 3]
340 3.196054
445 -3.159953
533 -3.156547
628 3.369795
698 -3.372664
Name: 2, dtype: float64
data[(np.abs(data) > 3).any(axis=1)]
            0         1         2         3
55  -3.157032 -0.841691  1.018759 -0.018302
340  0.456149  0.854559  3.196054  0.353166
343 -3.283047 -0.316560 -0.121576  0.584322
407 -0.089158 -0.604724  1.028259  3.364796
445  0.300672 -0.848071 -3.159953  0.870023
533 -0.048864  0.152498 -3.156547 -0.968370
628  1.119083  0.171787  3.369795 -0.550373
698 -0.517293 -1.208259 -3.372664 -0.418606
824 -3.459360 -0.702142  0.325501  0.653165
873 -3.617567 -1.302917 -0.577524  0.859530
923 -0.920904 -0.103102 -0.581829 -3.196753
981  0.672200 -0.274157 -0.883970 -3.038320
data[np.abs(data) > 3] = np.sign(data) * 3
data.describe()
                 0            1            2            3
count  1000.000000  1000.000000  1000.000000  1000.000000
mean     -0.066507     0.015781     0.048779    -0.019597
std       1.045975     0.963683     0.968296     1.029555
min      -3.000000    -2.550853    -3.000000    -3.000000
25%      -0.718715    -0.591289    -0.606569    -0.712316
50%      -0.066156     0.004574     0.068207     0.000122
75%       0.627520     0.662984     0.747493     0.673216
max       2.940831     2.865724     3.000000     3.000000
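The capping step above (`data[np.abs(data) > 3] = np.sign(data) * 3`) can also be expressed with `DataFrame.clip`. The sketch below uses its own seeded random data (an assumption, not the values shown above) and checks that both approaches give the same result.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # seeded stand-in data, not the frame shown above
data = pd.DataFrame(rng.standard_normal((1000, 4)))

# Boolean-mask approach, as in the cell above (applied to a copy).
capped_mask = data.copy()
capped_mask[np.abs(capped_mask) > 3] = np.sign(capped_mask) * 3

# Same cap expressed with clip: values are limited to the interval [-3, 3].
capped_clip = data.clip(lower=-3, upper=3)

print(capped_mask.equals(capped_clip))
```

`clip` returns a new frame and leaves `data` untouched, which can be convenient when the original values are still needed.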
df = pd.DataFrame(np.arange(5 * 4).reshape((5, 4)))
sampler = np.random.permutation(5)
sampler
array([1, 2, 0, 4, 3])
df
    0   1   2   3
0   0   1   2   3
1   4   5   6   7
2   8   9  10  11
3  12  13  14  15
4  16  17  18  19
df.take(sampler)
    0   1   2   3
1   4   5   6   7
2   8   9  10  11
0   0   1   2   3
4  16  17  18  19
3  12  13  14  15
df.sample(n=3)
    0   1   2   3
4  16  17  18  19
1   4   5   6   7
2   8   9  10  11
choices = pd.Series([5, 7, -1, 6, 4])
draws = choices.sample(n=10, replace=True)
draws
1 7
0 5
4 4
0 5
3 6
3 6
1 7
4 4
1 7
1 7
dtype: int64
df = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'b'], 'data1': range(6)})
df
  key  data1
0   b      0
1   b      1
2   a      2
3   c      3
4   a      4
5   b      5
pd.get_dummies(df['key'])
   a  b  c
0  0  1  0
1  0  1  0
2  1  0  0
3  0  0  1
4  1  0  0
5  0  1  0
dummies = pd.get_dummies(df['key'], prefix='key')
df_with_dummy = df[['data1']].join(dummies)
df_with_dummy
   data1  key_a  key_b  key_c
0      0      0      1      0
1      1      0      1      0
2      2      1      0      0
3      3      0      0      1
4      4      1      0      0
5      5      0      1      0
data = {'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com', 'Rob': 'rob@gmail.com', 'Wes': np.nan}
data = pd.Series(data)
data
Dave dave@google.com
Steve steve@gmail.com
Rob rob@gmail.com
Wes NaN
dtype: object
data.isnull()
Dave False
Steve False
Rob False
Wes True
dtype: bool
data.str.contains('gmail')
Dave False
Steve True
Rob True
Wes NaN
dtype: object
import re
pattern='([A-Z0-9._%+-]+)@([A-Z0-9._-]+)\\.([A-Z]{2,4})'
data.str.findall(pattern, flags=re.IGNORECASE)
Dave [(dave, google, com)]
Steve [(steve, gmail, com)]
Rob [(rob, gmail, com)]
Wes NaN
dtype: object
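`findall` returns a list of tuples per element. When each string holds at most one address, `Series.str.extract` may be more convenient, because it spreads the capture groups into DataFrame columns. A minimal sketch; the column names `user`, `domain` and `suffix` are illustrative choices, not part of the original.

```python
import re

import numpy as np
import pandas as pd

data = pd.Series({'Dave': 'dave@google.com', 'Steve': 'steve@gmail.com',
                  'Rob': 'rob@gmail.com', 'Wes': np.nan})
pattern = r'([A-Z0-9._%+-]+)@([A-Z0-9._-]+)\.([A-Z]{2,4})'

# One column per capture group; rows without a match (or NaN input) stay NaN.
parts = data.str.extract(pattern, flags=re.IGNORECASE)
parts.columns = ['user', 'domain', 'suffix']  # hypothetical labels for readability
print(parts)
```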
df = pd.DataFrame({'key1' : ['a', 'a', 'b', 'b', 'a'], 'key2' : ['one', 'two', 'one', 'two', 'one'],
'data1' : np.random.randn(5), 'data2' : np.random.randn(5)})
df
  key1 key2     data1     data2
0    a  one -0.083293  0.456279
1    a  two -0.442362 -0.337304
2    b  one  0.244770  0.943875
3    b  two  0.862879  0.444040
4    a  one  0.858584  0.527193
grouped = df['data1'].groupby(df['key1'])
grouped
<pandas.core.groupby.generic.SeriesGroupBy object at 0x0000024CEBF7B880>
grouped.mean()
key1
a 0.110977
b 0.553824
Name: data1, dtype: float64
means = df['data1'].groupby([df['key1'], df['key2']]).mean()
means
key1  key2
a     one     0.387646
      two    -0.442362
b     one     0.244770
      two     0.862879
Name: data1, dtype: float64
means.unstack()
key2       one       two
key1
a     0.387646 -0.442362
b     0.244770  0.862879
states = np.array(['Ohio', 'California', 'California', 'Ohio', 'Ohio'])
years = np.array([2005, 2005, 2006, 2005, 2006])
df['data1'].groupby([states, years]).mean()
California  2005   -0.442362
            2006    0.244770
Ohio        2005    0.389793
            2006    0.858584
Name: data1, dtype: float64
# numeric_only=True skips the string column key2, which newer pandas no longer drops silently.
df.groupby('key1').mean(numeric_only=True)
         data1     data2
key1
a     0.110977  0.215390
b     0.553824  0.693958
df.groupby(['key1', 'key2']).mean()
              data1     data2
key1 key2
a    one   0.387646  0.491736
     two  -0.442362 -0.337304
b    one   0.244770  0.943875
     two   0.862879  0.444040
df.groupby(['key1', 'key2']).size()
key1  key2
a     one     2
      two     1
b     one     1
      two     1
dtype: int64
for name, group in df.groupby('key1'):
    print(name)
    print(group)
a
  key1 key2     data1     data2
0    a  one -0.083293  0.456279
1    a  two -0.442362 -0.337304
4    a  one  0.858584  0.527193
b
  key1 key2     data1     data2
2    b  one  0.244770  0.943875
3    b  two  0.862879  0.444040
for (k1, k2), group in df.groupby(['key1', 'key2']):
    print((k1, k2))
    print(group)
('a', 'one')
  key1 key2     data1     data2
0    a  one -0.083293  0.456279
4    a  one  0.858584  0.527193
('a', 'two')
  key1 key2     data1     data2
1    a  two -0.442362 -0.337304
('b', 'one')
  key1 key2    data1     data2
2    b  one  0.24477  0.943875
('b', 'two')
  key1 key2     data1    data2
3    b  two  0.862879  0.44404
pieces = dict(list(df.groupby('key1')))
pieces['b']
  key1 key2     data1     data2
2    b  one  0.244770  0.943875
3    b  two  0.862879  0.444040
df.dtypes
key1 object
key2 object
data1 float64
data2 float64
dtype: object
grouped = df.groupby(df.dtypes, axis=1)
for dtype, group in grouped:
    print(dtype)
    print(group)
float64
      data1     data2
0 -0.083293  0.456279
1 -0.442362 -0.337304
2  0.244770  0.943875
3  0.862879  0.444040
4  0.858584  0.527193
object
  key1 key2
0    a  one
1    a  two
2    b  one
3    b  two
4    a  one
people = pd.DataFrame(np.random.randn(5, 5), columns=['a', 'b', 'c', 'd', 'e'], index=['Joe', 'Steve', 'Wes', 'Jim', 'Travis'])
people.iloc[2:3, [1, 2]] = np.nan
people
               a         b         c         d         e
Joe     0.231080 -0.440371  0.409642 -0.114867  0.328406
Steve  -0.775944 -1.258328 -2.723042  0.615950 -1.263696
Wes     1.965413       NaN       NaN -1.284734  0.204553
Jim    -0.097869  0.182042  0.061867 -0.648661 -0.217448
Travis -0.006042 -0.612533  0.537186  0.646037  1.339316
mapping = {'a': 'red', 'b': 'red', 'c': 'blue', 'd': 'blue', 'e': 'red', 'f' : 'orange'}
by_column = people.groupby(mapping, axis=1)
by_column.sum()
            blue       red
Joe     0.294776  0.119115
Steve  -2.107092 -3.297967
Wes    -1.284734  2.169965
Jim    -0.586794 -0.133276
Travis  1.183222  0.720741
map_series = pd.Series(mapping)
map_series
a red
b red
c blue
d blue
e red
f orange
dtype: object
people.groupby(map_series, axis=1).count()
        blue  red
Joe        2    3
Steve      2    3
Wes        1    2
Jim        2    3
Travis     2    3
people.groupby(len).sum()
          a         b         c         d         e
3  2.098624 -0.258329  0.471509 -2.048262  0.315510
5 -0.775944 -1.258328 -2.723042  0.615950 -1.263696
6 -0.006042 -0.612533  0.537186  0.646037  1.339316
key_list = ['one', 'one', 'one', 'two', 'two']
people.groupby([len, key_list]).min()
              a         b         c         d         e
3 one  0.231080 -0.440371  0.409642 -1.284734  0.204553
  two -0.097869  0.182042  0.061867 -0.648661 -0.217448
5 one -0.775944 -1.258328 -2.723042  0.615950 -1.263696
6 two -0.006042 -0.612533  0.537186  0.646037  1.339316
columns = pd.MultiIndex.from_arrays([['US', 'US', 'US', 'JP', 'JP'], [1, 3, 5, 1, 3]], names=['cty', 'tenor'])
hier_df = pd.DataFrame(np.random.randn(4, 5), columns=columns)
hier_df
cty          US                            JP
tenor         1         3         5         1         3
0      0.860698 -0.379994  0.644758 -0.231480  0.346634
1      1.237142  0.038387  0.600247  0.431467  0.137392
2     -2.211133  1.528952  0.056726 -0.629724 -0.125510
3     -1.272170 -1.088555 -1.950819 -0.253229  0.910727
hier_df.groupby(level='cty', axis=1).count()
cty  JP  US
0     2   3
1     2   3
2     2   3
3     2   3
grouped = df.groupby('key1')
grouped['data1'].quantile(0.9)
key1
a 0.670209
b 0.801068
Name: data1, dtype: float64
def peak_to_peak(arr):
    return arr.max() - arr.min()
# Select the numeric columns explicitly; the string column key2 cannot be aggregated this way.
grouped[['data1', 'data2']].agg(peak_to_peak)
         data1     data2
key1
a     1.300946  0.864497
b     0.618109  0.499836
grouped.describe()
     data1                                                                      \
     count      mean       std       min       25%       50%       75%       max
key1
a      3.0  0.110977  0.671878 -0.442362 -0.262827 -0.083293  0.387646  0.858584
b      2.0  0.553824  0.437069  0.244770  0.399297  0.553824  0.708352  0.862879

     data2
     count      mean       std       min       25%       50%       75%       max
key1
a      3.0  0.215390  0.479958 -0.337304  0.059488  0.456279  0.491736  0.527193
b      2.0  0.693958  0.353437  0.444040  0.568999  0.693958  0.818916  0.943875
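Several aggregations can be requested in one call by passing a list to `agg`, mixing built-in names with a custom function such as `peak_to_peak`. This is a sketch on seeded stand-in data with the same layout as `df` above; the random values are an assumption and will not match the outputs shown earlier.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(42)  # stand-in values, same column layout as df above
df = pd.DataFrame({'key1': ['a', 'a', 'b', 'b', 'a'],
                   'key2': ['one', 'two', 'one', 'two', 'one'],
                   'data1': rng.standard_normal(5),
                   'data2': rng.standard_normal(5)})

def peak_to_peak(arr):
    # Range of the group: largest value minus smallest value.
    return arr.max() - arr.min()

# One output column per (data column, aggregation) pair.
summary = df.groupby('key1')[['data1', 'data2']].agg(['mean', 'std', peak_to_peak])
print(summary)
```

The result has two-level columns, so a single statistic can be picked with a tuple such as `summary[('data1', 'peak_to_peak')]`.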
tips = pd.read_csv('examples/tips.csv')
tips['tip_pct'] = tips['tip'] / tips['total_bill']
tips[:4]
FileNotFoundError: [Errno 2] No such file or directory: 'examples/tips.csv'
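The traceback above ends in `FileNotFoundError` because `examples/tips.csv` is resolved relative to the notebook's working directory and no such file exists there. One way to keep the following cells runnable is to guard the load. This is only a sketch: the helper name `load_tips` is hypothetical, and the fallback rows are made-up numbers for illustration, not values from the real dataset. Only the column names are taken from the cells in this notebook.

```python
from pathlib import Path

import pandas as pd


def load_tips(path="examples/tips.csv"):
    """Return the tips data, or a tiny made-up stand-in if the file is missing."""
    csv_path = Path(path)
    if csv_path.exists():
        tips = pd.read_csv(csv_path)
    else:
        # Hypothetical rows, NOT real data; only the column names follow the notebook.
        tips = pd.DataFrame({
            "total_bill": [10.0, 20.0, 30.0, 40.0],
            "tip": [1.0, 3.0, 3.0, 8.0],
            "smoker": ["No", "Yes", "No", "Yes"],
            "day": ["Sun", "Sun", "Sat", "Sat"],
            "time": ["Dinner", "Dinner", "Lunch", "Dinner"],
        })
    # Same derived column as in the failing cell.
    tips["tip_pct"] = tips["tip"] / tips["total_bill"]
    return tips


tips = load_tips()
print(tips[:4])
```

The more direct fix is to place `tips.csv` under an `examples/` folder next to the notebook, or to pass the file's full path to `pd.read_csv`.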
frame = pd.DataFrame({'data1': np.random.randn(1000), 'data2': np.random.randn(1000)})
quartiles = pd.cut(frame.data1, 4)
quartiles[:10]
0 (-0.436, 1.211]
1 (-0.436, 1.211]
2 (1.211, 2.858]
3 (-0.436, 1.211]
4 (1.211, 2.858]
5 (-2.083, -0.436]
6 (-2.083, -0.436]
7 (1.211, 2.858]
8 (-0.436, 1.211]
9 (-2.083, -0.436]
Name: data1, dtype: category
Categories (4, interval[float64, right]): [(-3.737, -2.083] < (-2.083, -0.436] < (-0.436, 1.211] < (1.211, 2.858]]
def get_stats(group):
    return {'min': group.min(), 'max': group.max(), 'count': group.count(), 'mean': group.mean()}
grouped = frame.data2.groupby(quartiles)
grouped.apply(get_stats).unstack()
| data1 | min | max | count | mean |
| --- | --- | --- | --- | --- |
| (-3.737, -2.083] | -1.417666 | 1.053207 | 15.0 | -0.021193 |
| (-2.083, -0.436] | -2.815877 | 2.712397 | 296.0 | -0.097675 |
| (-0.436, 1.211] | -2.950480 | 3.093977 | 568.0 | 0.006707 |
| (1.211, 2.858] | -2.621023 | 2.433423 | 121.0 | -0.102975 |
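The `get_stats` / `apply` / `unstack` pattern above can probably also be written with `agg` and a list of aggregation names, which returns the per-bin statistics as columns directly. A minimal sketch on a small deterministic frame (the values are made up so the result is easy to check by hand; `observed=False` is passed explicitly to keep empty bins and to avoid depending on the pandas version's default):

```python
import pandas as pd

# Tiny made-up frame so the statistics can be verified by hand.
frame = pd.DataFrame({
    "data1": [0.0, 1.0, 2.0, 3.0],
    "data2": [10.0, 20.0, 30.0, 50.0],
})

# Two equal-width bins over data1: {0, 1} fall in the first, {2, 3} in the second.
halves = pd.cut(frame["data1"], 2)

# One column per aggregation, one row per bin.
stats = frame["data2"].groupby(halves, observed=False).agg(["min", "max", "count", "mean"])
print(stats)
```

Here `count` stays an integer column, whereas the `apply` version above shows it as a float because all four statistics pass through one Series before `unstack`.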
grouping = pd.qcut(frame.data1, 10, labels=False)
grouped = frame.data2.groupby(grouping)
grouped.apply(get_stats).unstack()
| data1 | min | max | count | mean |
| --- | --- | --- | --- | --- |
| 0 | -1.979885 | 2.546603 | 100.0 | 0.015369 |
| 1 | -2.815877 | 2.560098 | 100.0 | -0.189069 |
| 2 | -2.367227 | 2.290479 | 100.0 | -0.097540 |
| 3 | -2.057884 | 3.093977 | 100.0 | 0.000976 |
| 4 | -2.314728 | 2.157829 | 100.0 | 0.125600 |
| 5 | -2.944465 | 1.991280 | 100.0 | -0.092507 |
| 6 | -2.503720 | 2.415097 | 100.0 | -0.045530 |
| 7 | -2.950480 | 2.553021 | 100.0 | 0.057286 |
| 8 | -2.688502 | 2.356049 | 100.0 | -0.059030 |
| 9 | -2.339079 | 2.433423 | 100.0 | -0.094356 |
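The two results differ in how the bins are formed: `pd.cut` splits the value range into equal-width intervals, so the counts per bin vary (15, 296, 568, 121 above), while `pd.qcut` splits at sample quantiles, so each bucket holds the same number of rows (100 each above). A small sketch with a deliberately skewed, made-up series makes the difference visible:

```python
import pandas as pd

# Made-up values with one outlier to exaggerate the difference.
values = pd.Series([0, 1, 2, 3, 4, 5, 6, 7, 8, 100])

equal_width = pd.cut(values, 2, labels=False)   # two bins of equal width over [0, 100]
equal_count = pd.qcut(values, 2, labels=False)  # two bins split at the median

width_counts = equal_width.value_counts().sort_index().tolist()
quantile_counts = equal_count.value_counts().sort_index().tolist()
print(width_counts)     # rows per equal-width bin
print(quantile_counts)  # rows per quantile bin
```

With `labels=False` both functions return integer bin codes instead of interval labels, which is why the second table above is indexed 0 through 9.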
tips.pivot_table(index=['day', 'smoker'])
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [139], in <cell line: 1>()
----> 1 tips.pivot_table(index=['day', 'smoker'])
NameError: name 'tips' is not defined
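This `NameError` is a knock-on effect of the earlier failed `read_csv` call: `tips` was never created. To show what the call computes, here is a sketch on a made-up stand-in frame (the numbers are hypothetical, not the real dataset). The value columns are listed explicitly so the result does not depend on how a given pandas version treats non-numeric columns.

```python
import pandas as pd

# Hypothetical stand-in for tips (made-up numbers, not the real dataset).
tips = pd.DataFrame({
    "total_bill": [10.0, 20.0, 30.0, 40.0],
    "tip": [1.0, 3.0, 3.0, 8.0],
    "smoker": ["No", "Yes", "No", "No"],
    "day": ["Sat", "Sat", "Sun", "Sun"],
})

# pivot_table aggregates with the mean by default, one row per (day, smoker) pair.
table = tips.pivot_table(values=["tip", "total_bill"], index=["day", "smoker"])
print(table)
```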
from io import StringIO
data = """\
Sample Nationality Handedness
1 USA Right-handed
2 Japan Left-handed
3 USA Right-handed
4 Japan Right-handed
5 Japan Left-handed
6 Japan Right-handed
7 USA Right-handed
8 USA Left-handed
9 Japan Right-handed
10 USA Right-handed"""
data = pd.read_table(StringIO(data), sep=r'\s+')
pd.crosstab(data.Nationality, data.Handedness, margins=True)
| Nationality / Handedness | Left-handed | Right-handed | All |
| --- | --- | --- | --- |
| Japan | 2 | 3 | 5 |
| USA | 1 | 4 | 5 |
| All | 3 | 7 | 10 |
pd.crosstab([tips.time, tips.day], tips.smoker, margins=True)
---------------------------------------------------------------------------
NameError Traceback (most recent call last)
Input In [142], in <cell line: 1>()
----> 1 pd.crosstab([tips.time, tips.day], tips.smoker, margins=True)
NameError: name 'tips' is not defined
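As with `pivot_table`, this call fails only because `tips` was never loaded. Passing a list as the first argument to `pd.crosstab` builds a hierarchical row index from several keys. A sketch on a made-up frame (hypothetical rows, chosen only so the counts are easy to verify):

```python
import pandas as pd

# Hypothetical stand-in rows, not the real dataset.
tips = pd.DataFrame({
    "time": ["Dinner", "Dinner", "Lunch", "Lunch", "Dinner"],
    "day": ["Sat", "Sat", "Thur", "Thur", "Sun"],
    "smoker": ["No", "Yes", "No", "No", "Yes"],
})

# Rows are indexed by (time, day); columns by smoker; margins=True adds "All" totals.
counts = pd.crosstab([tips["time"], tips["day"]], tips["smoker"], margins=True)
print(counts)
```

Bracket access (`tips["time"]`) is used here instead of `tips.time` purely as a style choice; both select the same column.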