Python IO

File opening and closing##

Files are opened with the built-in open function and closed with the close method of the returned file object.

Signature of the open function

open(file, mode='r', buffering=-1, encoding=None, errors=None, newline=None, closefd=True, opener=None)

As mentioned earlier, the open function returns a file-like object. The concrete type of this object is not fixed; it depends on the open mode.

In text mode ('w', 'r', 'wt', 'rt', etc.), open returns a TextIOWrapper.
In binary mode, the returned type depends on how the file is used.
In binary read mode ('rb'), a BufferedReader is returned.
In binary write mode ('wb') and binary append mode ('ab'), a BufferedWriter is returned.
In binary read/write mode (for example 'r+b'), a BufferedRandom is returned.
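
The mapping from mode to returned type can be checked directly. The following is a minimal sketch that uses a temporary file (an assumption made so the example does not depend on hello.py) and records the type name returned for a few modes:

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on existing files.
fd, path = tempfile.mkstemp()
os.close(fd)

returned_types = {}
for mode in ('r', 'w', 'rb', 'wb', 'ab', 'r+b'):
    with open(path, mode) as f:
        returned_types[mode] = type(f).__name__

os.remove(path)

for mode, name in returned_types.items():
    print(mode, name)
```

All of these classes live in the io module, so they share the same basic interface even though the type differs.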

In [1]: f = open('./hello.py')  # Open directly with the open function; if the file does not exist, FileNotFoundError is raised
---------------------------------------------------------------------------
FileNotFoundError                         Traceback (most recent call last)
<ipython-input-1-b6df97277b77> in <module>()
----> 1 f = open('./hello.py')

FileNotFoundError: [Errno 2] No such file or directory: './hello.py'

In [2]: f = open('./hello.py')  # After creating the file, open succeeds and returns a file-like object

In [3]: f.read()  # Read the entire contents of the file
Out[3]: "#!/usr/bin/env python\n# coding=utf-8\nprint('hello world')\n"

In [4]: f.close()  # Close the file

File read and write##

Files are read and written mainly through the read and write methods and their variants. Which of these operations are allowed depends on the mode parameter of the open function.

The mode parameter of the open function###

The mode characters have the following meanings:

'r' open for reading (default)
'w' open for writing, truncating the file first
'x' create a new file and open it for writing
'a' open for writing, appending to the end of the file if it exists
'b' binary mode
't' text mode (default)
'+' open a disk file for updating (reading and writing)
'U' universal newline mode (deprecated; removed in Python 3.11)

Description:

When mode='x', the file must not exist yet; if it already exists, FileExistsError is raised.
When mode='w', the file is truncated as soon as it is opened, even if nothing is written.
When the mode contains +, the missing direction is added: a read-only mode also becomes writable, and a write-only mode also becomes readable. The + does not change any other behavior of the mode.
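
The three points above can be sketched as follows. The directory and the file name demo.txt are hypothetical and exist only for this illustration:

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
path = os.path.join(workdir, 'demo.txt')  # hypothetical file name

# 'x' creates the file, and fails if the file is already there.
with open(path, 'x') as f:
    f.write('first')
try:
    open(path, 'x')
    x_failed = False
except FileExistsError:
    x_failed = True

# 'w' truncates on open, even though nothing is written afterwards.
open(path, 'w').close()
size_after_w = os.path.getsize(path)

# 'w+' adds reading to a write-only mode; the file is still truncated first.
with open(path, 'w+') as f:
    f.write('second')
    f.seek(0)
    read_back = f.read()

shutil.rmtree(workdir)
print(x_failed, size_after_w, read_back)
```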

mode=t and mode=b

mode=t: operate on characters (str)
mode=b: operate on bytes

In [1]: f = open('./hello.py', mode='rt')  # mode=t: the content read is a string

In [2]: s = f.read()

In [3]: s
Out[3]: "#!/usr/bin/env python\n# coding=utf-8\nprint('hello world')\n"

In [4]: type(s)  # s is of type str
Out[4]: str

In [5]: f.close()

In [6]: f = open('./hello.py', mode='rb')  # mode=b: the content read is bytes

In [7]: s = f.read()

In [8]: s
Out[8]: b"#!/usr/bin/env python\n# coding=utf-8\nprint('hello world')\n"

In [9]: type(s)
Out[9]: bytes
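
The difference between characters and bytes only becomes visible with non-ASCII text. As a small sketch, assuming UTF-8 is passed through the encoding parameter of open and a temporary file is used:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# 'h\u00e9llo' is five characters; the accented e needs two bytes in UTF-8.
with open(path, 'w', encoding='utf-8') as f:
    f.write('h\u00e9llo')

with open(path, 'rt', encoding='utf-8') as f:
    text = f.read()   # decoded to str

with open(path, 'rb') as f:
    raw = f.read()    # raw bytes, no decoding

os.remove(path)
print(len(text), len(raw))
```

Text mode decodes the bytes on the way in, so the two lengths differ even though both reads cover the whole file.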

File pointer###

When a file is opened, the interpreter keeps a pointer to a position in the file. Reads and writes always start at the pointer and move it forward, toward the end of the file. When mode=r, the pointer starts at 0 (start of file); when mode=a, it starts at EOF (end of file).

The two methods related to the file pointer are tell and seek.

tell method

Returns the current position of the stream. For a file, this is the position of the file pointer.

seek method

Changes the position of the stream and returns the new absolute position.

seek(cookie, whence=0, /) method of _io.TextIOWrapper instance

Summary of file pointers

Seeking past the end of the file raises no exception, and tell then reports a position beyond EOF. In append mode, a write still goes to the end of the file. In the other writable modes, on most platforms the write happens at the new position and the gap is read back as zero bytes.

The file pointer is measured in bytes, in both text mode and binary mode
The tell method returns the current file pointer position
The seek method moves the file pointer
The whence parameter selects the reference point: SEEK_SET (0) measures offset from the start of the file, SEEK_CUR (1) from the current position, and SEEK_END (2) from EOF
offset is an integer
When mode is t and whence is SEEK_CUR or SEEK_END, offset can only be 0
The file pointer cannot be negative
Reading starts at the file pointer (pos) and moves toward the end of the file
Writing starts at the file pointer, except in append mode
In append mode, writing always starts from EOF, no matter where the file pointer is
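
A short sketch of tell, seek and the three whence values, followed by the append-mode rule. It runs in binary mode on a temporary file, which is an assumption made to keep the offsets easy to follow:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

with open(path, 'wb') as f:
    f.write(b'0123456789')

with open(path, 'rb') as f:
    start = f.tell()            # reading starts at position 0
    f.seek(3)                   # whence defaults to SEEK_SET
    after_set = f.read(2)       # reads positions 3 and 4, pointer is now 5
    f.seek(2, os.SEEK_CUR)      # 5 + 2 = 7
    after_cur = f.read(1)
    f.seek(-2, os.SEEK_END)     # 10 - 2 = 8
    after_end = f.read()

# In append mode the write lands at EOF even after seek(0).
with open(path, 'ab') as f:
    f.seek(0)
    f.write(b'X')

with open(path, 'rb') as f:
    final = f.read()

os.remove(path)
print(start, after_set, after_cur, after_end, final)
```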

File buffer###

The file buffer is controlled by the buffering parameter of the open function, which selects the buffering policy. Its default value is -1, which means that both text mode and binary mode use the default buffer size.

buffering=-1

Binary mode: DEFAULT_BUFFER_SIZE
Text mode: DEFAULT_BUFFER_SIZE

buffering=0

Binary mode: unbuffered
Text mode: Not allowed

buffering=1

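Binary mode: line buffering is not supported, so the default buffer size is used (recent Python versions emit a RuntimeWarning)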
Text mode: line buffering

buffering>1

Binary mode: buffering
Text mode: DEFAULT_BUFFER_SIZE

Summary

Binary mode: before each write, check whether the free space in the buffer can hold the incoming bytes. If not, flush first and then put the incoming bytes into the buffer. If the incoming bytes are larger than the whole buffer, flush and write them out directly.
Text mode: with line buffering, flush whenever a newline is written. Without line buffering, if the incoming bytes plus the bytes already in the buffer exceed the buffer size, flush both the buffer and the incoming bytes.
flush and close both force the buffer to be flushed.
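
The effect of the buffer can be observed by checking the file size on disk before and after a flush. This is a minimal sketch on a temporary file; it assumes the write is much smaller than the default buffer size:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Default buffering: a small write stays in the buffer until flush().
with open(path, 'wb') as f:
    f.write(b'abc')
    size_before_flush = os.path.getsize(path)
    f.flush()
    size_after_flush = os.path.getsize(path)

# buffering=0 (binary mode only): each write goes straight to the file.
with open(path, 'wb', buffering=0) as f:
    f.write(b'abc')
    size_unbuffered = os.path.getsize(path)

# buffering=0 is rejected in text mode.
try:
    open(path, 'w', buffering=0)
    text_unbuffered_allowed = True
except ValueError:
    text_unbuffered_allowed = False

os.remove(path)
print(size_before_flush, size_after_flush, size_unbuffered, text_unbuffered_allowed)
```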

Context management##

Context management (the with statement) closes the file automatically when the block is left, but it does not open a new scope, so the variable is still bound afterwards.

In [1]: with open('./hello.py') as f:
   ...:     pass
   ...:

In [2]: f.readable()  # After leaving the with block, the file is closed and no further I/O is possible
---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
<ipython-input-18-97a5eee249a2> in <module>()
----> 1 f.readable()

ValueError: I/O operation on closed file

In [3]: f
Out[3]: <_io.TextIOWrapper name='./hello.py' mode='r' encoding='UTF-8'>

In [4]: f.closed  # f is closed
Out[4]: True

Besides with open('./hello.py') as f:, there is another way of writing it: open the file first, then use the file object itself in the with statement.

In [21]: f = open('./hello.py')

In [22]: with f:
    ...:     pass
    ...:
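
For a file object, the with statement behaves roughly like a try/finally that calls close. The sketch below shows both forms on a temporary file; it also shows that the file is closed when the body raises and that the name stays bound after the block:

```python
import os
import tempfile

fd, path = tempfile.mkstemp()
os.close(fd)

# Roughly what the with statement does for a file object.
f = open(path)
try:
    f.read()
finally:
    f.close()
closed_by_finally = f.closed

# The file is closed even if the body raises.
try:
    with open(path) as g:
        raise RuntimeError('simulated failure')
except RuntimeError:
    pass
closed_after_error = g.closed  # g is still bound: with opens no new scope

os.remove(path)
print(closed_by_finally, closed_after_error)
```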

File-like object##

Objects that provide a read() method, like the ones returned by the open() function, are collectively called file-like objects in Python. Besides a file on disk, a file-like object can be an in-memory byte stream, a network stream, a custom stream, and so on. The common in-memory ones are StringIO and BytesIO.

StringIO

StringIO, as its name implies, reads and writes str in memory.

To write str to a StringIO, create a StringIO object first, then write to it and read from it like a file. StringIO supports almost every operation that a file supports.

In [1]: from io import StringIO

In [2]: help(StringIO)

In [3]: sio = StringIO()  # Create a StringIO object; it can also be initialized with a str

In [4]: sio.write('hello world')
Out[4]: 11

In [5]: sio.write(' !')
Out[5]: 2

In [6]: sio.getvalue()  # getvalue() returns the str written so far
Out[6]: 'hello world !'

In [7]: sio.closed
Out[7]: False

In [8]: sio.readline()  # The pointer is at the end after writing, so nothing is read
Out[8]: ''

In [9]: sio.seekable()
Out[9]: True

In [10]: sio.seek(0, 0)  # seek is supported
Out[10]: 0

In [11]: sio.readline()
Out[11]: 'hello world !'

To read StringIO, you can initialize StringIO with a str, and then read it like a file:

In [1]: from io import StringIO

In [2]: sio = StringIO('I\nlove\npython!')

In [3]: for line in sio.readlines():
   ...:     print(line.strip())
   ...:
I
love
python!

BytesIO

StringIO can only operate on str. To manipulate binary data, use BytesIO.

BytesIO reads and writes bytes in memory. Create a BytesIO, then write some bytes:

In [1]: from io import BytesIO

In [2]: bio = BytesIO()

In [3]: bio.write(b'abcd')
Out[3]: 4

In [4]: bio.seek(0)
Out[4]: 0

In [5]: bio.read()
Out[5]: b'abcd'

In [6]: bio.getvalue()  # getvalue returns the entire contents at once, no matter where the file pointer is
Out[6]: b'abcd'

Like StringIO, BytesIO can be initialized with a bytes object and then read like a file:

In [1]: from io import BytesIO

In [2]: bio = BytesIO(b'abcd')

In [3]: bio.read()
Out[3]: b'abcd'
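
Because StringIO and BytesIO share the file interface, code written against a file-like object works on them unchanged. In the sketch below, count_lines is a hypothetical helper, not part of any library:

```python
from io import BytesIO, StringIO


def count_lines(stream):
    """Count lines in any object that can be iterated line by line."""
    return sum(1 for _ in stream)


text_lines = count_lines(StringIO('I\nlove\npython!'))
byte_lines = count_lines(BytesIO(b'a\nb\n'))
print(text_lines, byte_lines)
```

The same function also accepts an object returned by open, which makes the in-memory classes convenient stand-ins for real files in tests.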

Path manipulation: pathlib##

There are two ways to manipulate paths: os.path and pathlib.

os.path manipulates paths as strings: import os
pathlib provides object-oriented file system paths: import pathlib

pathlib has been part of the standard library since Python 3.4. To use pathlib in Python 2.7, install the backport:

pip install pathlib

For the source code of the pathlib module, see: Lib/pathlib.py
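
To make the difference between the two styles concrete, here is a small side-by-side sketch. It uses PurePosixPath (an assumption, chosen so the result does not depend on the operating system) and only does path arithmetic, with no file system access:

```python
import os.path
import pathlib

# String style: functions that take and return str.
joined_str = os.path.join('workspace', 'subworkspace', 'hello.py')
name_str = os.path.basename(joined_str)
suffix_str = os.path.splitext(joined_str)[1]

# Object style: the / operator and attributes on a path object.
joined_path = pathlib.PurePosixPath('workspace') / 'subworkspace' / 'hello.py'
name_path = joined_path.name
suffix_path = joined_path.suffix

print(name_str, suffix_str)
print(joined_path, name_path, suffix_path)
```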

Directory operations###

Directory operations go through the Path class of the pathlib module.

In [1]: import pathlib  # Import the pathlib module

In [2]: cwd = pathlib.Path('.')  # Initialize a Path for the current directory; the argument is a path string

In [3]: cwd  # The result is a PosixPath; on Windows it would be a WindowsPath
Out[3]: PosixPath('.')

help(pathlib.Path) lists the methods of the Path class.

Help on class Path in module pathlib:

class Path(PurePath)
 |  PurePath represents a filesystem path and offers operations which
 |  don't imply any actual filesystem I/O.  Depending on your system,
 |  instantiating a PurePath will return either a PurePosixPath or a
 |  PureWindowsPath object.  You can also instantiate either of these classes
 |  directly, regardless of your system.
 |
 |  Method resolution order:
 |      Path
 |      PurePath
 |      builtins.object
 |
 |  Methods defined here:
 |
 |  __enter__(self)
 |
 |  __exit__(self, t, v, tb)
 |
 |  ...

Several methods for directory operations:

is_dir(self): determines whether the path is a directory
iterdir(self): a generator that yields all entries (files and folders) under the path, but never the two special paths '.' and '..'
mkdir(self, mode=511, parents=False, exist_ok=False): creates the directory; mode sets the permissions (the default 511 is 0o777)
rmdir(self): deletes the directory, which must be empty, otherwise an error is raised

Usage examples:

In [4]: cwd.is_dir()
Out[4]: True

In [5]: cwd.iterdir()  # iterdir returns a generator
Out[5]: <generator object Path.iterdir at 0x7f6727d926d0>

In [6]: for f in cwd.iterdir():  # Does not yield '.' or '..'
   ...:     print(type(f))
   ...:     print(f)
   ...:
<class 'pathlib.PosixPath'>
hello.py
<class 'pathlib.PosixPath'>
aa.py

In [7]: cwd.mkdir('abc')  # mkdir is a method of the path object; its first argument is mode, not a directory name
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-7-3b48dd61eb0f> in <module>()
----> 1 cwd.mkdir('abc')

/home/clg/.pyenv/versions/3.5.2/lib/python3.5/pathlib.py in mkdir(self, mode, parents, exist_ok)
   1212         if not parents:
   1213             try:
-> 1214                 self._accessor.mkdir(self, mode)
   1215             except FileExistsError:
   1216                 if not exist_ok or not self.is_dir():

/home/clg/.pyenv/versions/3.5.2/lib/python3.5/pathlib.py in wrapped(pathobj, *args)
    369         @functools.wraps(strfunc)
    370         def wrapped(pathobj, *args):
--> 371             return strfunc(str(pathobj), *args)
    372         return staticmethod(wrapped)
    373

TypeError: an integer is required (got type str)

In [8]: d = pathlib.Path('./abc')

In [9]: d.exists()
Out[9]: False

In [10]: d.mkdir(755)  # Creates the folder, but decimal 755 is not equal to octal 0o755

In [11]: %ls
aa.py  abc/  hello.py

In [12]: %ls -ld ./abc
d-wxrw---t. 2 clg clg 6 Feb 13 21:01 ./abc/  # The mode was given in decimal, so the permissions are wrong

In [13]: d.rmdir()

In [14]: d.exists()
Out[14]: False

In [15]: d.mkdir(0o755)  # Specify the mode in octal

In [16]: %ls -ld ./abc
drwxr-xr-x. 2 clg clg 6 Feb 13 21:03 ./abc/
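
The parents and exist_ok parameters of mkdir are not exercised in the session above. A minimal sketch, run inside a temporary directory with the hypothetical nested path ab/cd:

```python
import pathlib
import tempfile

base = pathlib.Path(tempfile.mkdtemp())
nested = base / 'ab' / 'cd'  # hypothetical nested directory

# Without parents=True, a missing intermediate directory is an error.
try:
    nested.mkdir()
    missing_parent_ok = True
except FileNotFoundError:
    missing_parent_ok = False

nested.mkdir(mode=0o755, parents=True)     # creates ab/ and ab/cd/
nested.mkdir(parents=True, exist_ok=True)  # no error when it already exists

children = sorted(p.name for p in base.iterdir())

# rmdir only removes empty directories, so remove the deepest one first.
nested.rmdir()
nested.parent.rmdir()
base.rmdir()
print(missing_parent_ok, children, base.exists())
```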

General operations###

These are general-purpose operations on paths.

In [17]: f = pathlib.Path('./ab/cd/a.txt')

In [18]: f.exists()
Out[18]: False

In [19]: f.is_file()
Out[19]: False

In [20]: f.is_absolute()
Out[20]: False

In [21]: f = pathlib.Path('./hello.py')

In [22]: f.is_file()
Out[22]: True

In [23]: f.is_absolute()
Out[23]: False

In [24]: f.absolute()  # Get the absolute path
Out[24]: PosixPath('/home/clg/workspace/subworkspace/hello.py')

In [25]: f.chmod(0o755)  # Change the permissions of the path

In [26]: %ls -ld ./hello.py
-rwxr-xr-x. 1 clg clg 58 Feb  8 13:32 ./hello.py*

In [27]: f.cwd()  # Return a new path for the current working directory
Out[27]: PosixPath('/home/clg/workspace/subworkspace')

In [28]: f.home()
Out[28]: PosixPath('/home/clg')

In [29]: pathlib.Path('~').expanduser()  # Expand ~ to the absolute path of the home directory
Out[29]: PosixPath('/home/clg')

In [30]: f.name()  # name is an attribute, not a method
---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-30-f0ea48ccc8ff> in <module>()
----> 1 f.name()

TypeError: 'str' object is not callable

In [31]: f.name  # Get the base name
Out[31]: 'hello.py'

In [32]: f.home().name
Out[32]:'clg'

In [33]: f.owner()	#Get owner
Out[33]:'clg'

In [34]: f.home().parent
Out[34]:PosixPath('/home')

In [35]: f.parts
Out[35]:('hello.py',)

In [36]: f.absolute().parts  # Split the path into its components
Out[36]: ('/', 'home', 'clg', 'workspace', 'subworkspace', 'hello.py')

In [37]: f.root  # Get the root; for the relative path './hello.py' it is an empty string
Out[37]: ''

In [38]: f.home().root  # Get the root directory
Out[38]: '/'

In [39]: f.suffix  # Get the suffix
Out[39]: '.py'

In [40]: f.stat()  # Like os.stat(), returns information about the path
Out[40]: os.stat_result(st_mode=33261, st_ino=34951327, st_dev=64768, st_nlink=1, st_uid=1000, st_gid=1000, st_size=58, st_atime=1486531928, st_mtime=1486531926, st_ctime=1486995977)

In [41]: f.stat().st_mode  # Individual fields of the stat() result are accessed as attributes, using '.'
Out[41]: 33261

In [42]: d = pathlib.Path('..')

In [43]: for x in d.glob(*.py):  # glob(self, pattern): the pattern must be passed as a string
  File "<ipython-input-43-3fdfb8e408ac>", line 1
    for x in d.glob(*.py):
                     ^
SyntaxError: invalid syntax

In [44]: for x in d.glob('*.py'):  # Return the files matching the wildcard in the given path
    ...:     print(x)
    ...:
../judge.py
../progress.py
../zipperMethod.py
../decorator.py

In [45]: for x in d.rglob('*.py'):  # Return the files matching the wildcard in the given path and its sub-paths (recursively)
    ...:     print(x)
    ...:
../judge.py
../progress.py
../zipperMethod.py
../decorator.py
../subworkspace/hello.py
../subworkspace/aa.py
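
The glob and rglob calls can be reproduced without relying on the directory layout above. The following sketch builds a small temporary tree; the file names a.py, notes.txt and sub/b.py are hypothetical:

```python
import pathlib
import shutil
import tempfile

root = pathlib.Path(tempfile.mkdtemp())
(root / 'sub').mkdir()
(root / 'a.py').write_text("print('a')\n")
(root / 'notes.txt').write_text('notes\n')
(root / 'sub' / 'b.py').write_text("print('b')\n")

# glob matches in the directory itself; rglob also descends into sub-directories.
top_level = sorted(p.name for p in root.glob('*.py'))
recursive = sorted(p.relative_to(root).as_posix() for p in root.rglob('*.py'))

target = root / 'sub' / 'b.py'
parts = (target.name, target.stem, target.suffix, target.parent.name)

shutil.rmtree(str(root))
print(top_level, recursive, parts)
```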

File Copy, Move and Delete###

Copying, moving and deleting files is done with the shutil module.

import shutil

shutil.copyfileobj # Copies between two open file objects
shutil.copyfile # Copies content only
shutil.copymode # Copies permissions only
shutil.copystat # Copies metadata only
shutil.copy # Copies file content and permissions: copyfile + copymode
shutil.copy2 # Copies file content and metadata: copyfile + copystat
shutil.copytree # Copies a directory recursively
shutil.rmtree # Deletes a directory recursively
shutil.move # Tries a rename first, which works when source and destination are on the same file system. If the rename fails, the source is copied (with copytree for a directory) and then deleted
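
A few of these functions in action, as a sketch inside a temporary directory. All file and directory names are hypothetical:

```python
import os
import shutil
import tempfile

workdir = tempfile.mkdtemp()
src = os.path.join(workdir, 'src.txt')
with open(src, 'w') as f:
    f.write('hello')

copied = shutil.copy(src, os.path.join(workdir, 'copy.txt'))     # content + permissions
copied2 = shutil.copy2(src, os.path.join(workdir, 'copy2.txt'))  # content + metadata

tree_src = os.path.join(workdir, 'tree')
os.mkdir(tree_src)
shutil.copy(src, tree_src)  # a directory target keeps the source file name
tree_dst = shutil.copytree(tree_src, os.path.join(workdir, 'tree_copy'))
tree_files = os.listdir(tree_dst)

moved = shutil.move(copied, os.path.join(workdir, 'moved.txt'))

# copy2 preserves the modification time of the source.
same_mtime = int(os.stat(src).st_mtime) == int(os.stat(copied2).st_mtime)
names = sorted(os.listdir(workdir))

shutil.rmtree(workdir)  # removes the whole tree, including non-empty directories
print(names, tree_files, same_mtime)
```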

Serialization and Deserialization##

Serialization: Convert objects into data
Deserialization: Convert data into objects

Python-specific protocol pickle###

pickle is a serialization protocol specific to Python.

See the pickle source code: lib/python3.5/pickle.py

Main functions

dumps: exports an object as data, that is, serializes it
loads: loads data as an object, that is, deserializes it. The class of the object must exist when the object is deserialized

In [1]: import pickle

In [2]: class A:  # Declare a class A
   ...:     def print(self):
   ...:         print('aaaa')
   ...:

In [3]: a = A()  # Create an object a of class A

In [4]: pickle.dumps(a)  # Export the object as data
Out[4]: b'\x80\x03c__main__\nA\nq\x00)\x81q\x01.'

In [5]: b = pickle.dumps(a)

In [6]: pickle.loads(b)  # Load the data as an object
Out[6]: <__main__.A at 0x7f5dcdc71dd8>

In [7]: a
Out[7]: <__main__.A at 0x7f5dcdd28be0>  # The addresses of the two objects differ, but their contents are the same

In [8]: aa = pickle.loads(b)

In [9]: a.print()  # The print method of the original object
aaaa

In [10]: aa.print()  # The print method of the deserialized object
aaaa
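
Besides dumps and loads, the pickle module also has dump and load, which write to and read from a binary file-like object. A minimal sketch using BytesIO in place of a real file; the record contents are made up for the illustration:

```python
import datetime
import io
import pickle

record = {
    'name': 'hello.py',
    'created': datetime.date(2017, 2, 13),
    'sizes': (58, 6),
}

buffer = io.BytesIO()
pickle.dump(record, buffer)     # serialize into the file-like object
buffer.seek(0)                  # rewind before reading
restored = pickle.load(buffer)  # deserialize

equal = restored == record
identical = restored is record
tuple_preserved = isinstance(restored['sizes'], tuple)
print(equal, identical, tuple_preserved)
```

The restored object is an equal copy, not the same object, and Python-specific types such as date and tuple survive the round trip.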

General json protocol###

The JSON format supports the following data types:

Type	Description
Number	Double-precision floating-point format, as in JavaScript; corresponds to int or float
String	Double-quoted Unicode with backslash escaping, corresponding to str
Boolean	true or false, corresponding to True or False
Array	An ordered sequence of values, corresponding to list
Value	A string, a number, a boolean (true/false), null, an array or an object
Object	Unordered collection of key-value pairs, corresponding to dict in Python
Whitespace	May be used between any pair of tokens
null	Empty, corresponding to None

Usage examples:

In [1]: import json

In [2]: d = {'a': 1, 'b': [1, 2, 3]}

In [3]: json.dumps(d)
Out[3]: '{"a": 1, "b": [1, 2, 3]}'

In [4]: json.loads('{"a": 1, "b": [1, 2, 3]}')
Out[4]: {'a': 1, 'b': [1, 2, 3]}
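
Because JSON has fewer types than Python, a round trip does not always return the original object. The sketch below, with made-up data, shows a few of the conversions and the file-based dump and load functions:

```python
import io
import json

data = {'name': 'hello.py', 'sizes': (58, 6), 'owner': None, 'executable': True}

text = json.dumps(data, sort_keys=True)
restored = json.loads(text)   # the tuple comes back as a list

int_key = json.dumps({1: 'one'})  # non-string keys are converted to strings

# Types without a JSON equivalent, such as set, are rejected.
try:
    json.dumps({1, 2})
    set_supported = True
except TypeError:
    set_supported = False

# dump and load work on text file-like objects.
buffer = io.StringIO()
json.dump(data, buffer)
buffer.seek(0)
from_file = json.load(buffer)

print(text)
print(restored['sizes'], int_key, set_supported)
```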

json reference: JSON data format