Python file read and write operations

Read file

Open a file with the built-in open() function (open() returns a file object, which can be iterated over line by line):

>>> f = open('test.txt', 'r')

'r' opens a text file for reading, and 'rb' opens a file for reading in binary mode. (The default value of the mode parameter is 'r'.)

If the file does not exist, open() raises an error (in Python 3, FileNotFoundError, a subclass of OSError, of which IOError is an alias), with an error code and a message telling you that the file does not exist:

>>> f = open('test.txt', 'r')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'test.txt'
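If you would rather handle a missing file than let the traceback stop the program, you can catch the exception. This is a minimal sketch; the helper name read_or_default and the file name missing.txt are hypothetical, and a fresh temporary directory is used so the path is guaranteed not to exist:

```python
import os
import tempfile


def read_or_default(path, default=''):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        with open(path, 'r') as f:
            return f.read()
    except FileNotFoundError:
        # FileNotFoundError is a subclass of OSError (IOError is an alias of OSError in Python 3)
        return default


# A path inside a brand-new, empty temporary directory cannot exist yet
missing_path = os.path.join(tempfile.mkdtemp(), 'missing.txt')
print(read_or_default(missing_path, default='<no file>'))  # prints <no file>
```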

The file must be closed after use, because an open file object holds operating-system resources, and the number of files the operating system can keep open at the same time is limited:

>>> f.close()

Because an IOError can be raised while reading or writing, if an error occurs, the f.close() that follows will never be called. To make sure the file is closed whether or not an error occurs, use try ... finally:

f = open('/path/to/file', 'r')
try:
    print(f.read())
finally:
    f.close()  # runs even if read() raised an error

Writing this out every time is tedious, so Python provides the with statement, which calls close() for us automatically:

with open('/path/to/file', 'r') as f:
    print(f.read())

Python file objects provide three "read" methods: read(), readline() and readlines(). Each accepts an optional argument that limits how much data is read per call.

read() reads the entire file at once and is usually used to put the file's contents into a single string. If the file may be larger than the available memory, it is safer to call read(size) repeatedly, which reads at most size characters (bytes in binary mode) per call.
readlines() also reads the entire file at once, like read(), but it automatically splits the contents into a list of lines, which can then be processed with Python's for ... in ... loop.
readline() reads only one line per call, so reading a whole file this way is usually slower than with readlines(). Use readline() (or iterate over the file object directly) when the file is too large to read into memory at once.
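A small sketch of the memory-friendly options above: reading with read(size) in a loop, and iterating over the file object line by line. The sample file and the generator name read_in_chunks are hypothetical; note that fixed-size chunks do not line up with line boundaries:

```python
import os
import tempfile

# Create a small sample file to read (hypothetical name, in a temporary directory)
path = os.path.join(tempfile.mkdtemp(), 'sample.txt')
with open(path, 'w') as f:
    f.write('111\n222\n333\n')


def read_in_chunks(path, size=5):
    """Yield the file's contents at most `size` characters at a time."""
    with open(path, 'r') as f:
        while True:
            chunk = f.read(size)
            if not chunk:  # read() returns '' at end of file
                break
            yield chunk


chunks = list(read_in_chunks(path))
print(chunks)  # ['111\n2', '22\n33', '3\n']

# Iterating over the file object reads one line at a time without loading the whole file
with open(path, 'r') as f:
    lines = [line for line in f]
print(lines)  # ['111\n', '222\n', '333\n']
```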

Note: all three methods keep the '\n' at the end of each line. It is not removed by default, so you have to strip it yourself.

In [2]: with open('test1.txt', 'r') as f1:
   ...:     list1 = f1.readlines()
In [3]: list1
Out[3]: ['111\n', '222\n', '333\n', '444\n', '555\n', '666\n']

Removing the '\n':

In [4]: with open('test1.txt', 'r') as f1:
   ...:     list1 = f1.readlines()
   ...: for i in range(0, len(list1)):
   ...:     list1[i] = list1[i].rstrip('\n')
In [5]: list1
Out[5]: ['111', '222', '333', '444', '555', '666']
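The same result can be had more compactly, either with a list comprehension over the file or with str.splitlines() on the output of read(). In this sketch io.StringIO stands in for test1.txt so it runs on its own:

```python
import io

# io.StringIO stands in for test1.txt so the example is self-contained
f1 = io.StringIO('111\n222\n333\n444\n555\n666\n')

# Option 1: strip the trailing newline from each line with a list comprehension
list1 = [line.rstrip('\n') for line in f1]

# Option 2: read everything and let splitlines() drop the line endings
f1.seek(0)
list2 = f1.read().splitlines()

print(list1)  # ['111', '222', '333', '444', '555', '666']
```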

read() and readline() also keep the '\n', but the text still displays correctly when printed, because print() renders each '\n' as a line break:

In [7]: with open('test1.txt', 'r') as f1:
   ...:     list1 = f1.read()
In [8]: list1
Out[8]: '111\n222\n333\n444\n555\n666\n'
In [9]: print(list1)
111
222
333
444
555
666

In [10]: with open('test1.txt', 'r') as f1:
    ...:     list1 = f1.readline()
In [11]: list1
Out[11]: '111\n'
In [12]: print(list1)
111

An example of a python interview question:

Given two files, each containing many lines of IP addresses, find the IP addresses that appear in both files:

# coding: utf-8
import bisect

# Read both files and strip the '\n' from the end of each line
with open('test1.txt', 'r') as f1:
    list1 = f1.readlines()
for i in range(0, len(list1)):
    list1[i] = list1[i].strip('\n')

with open('test2.txt', 'r') as f2:
    list2 = f2.readlines()
for i in range(0, len(list2)):
    list2[i] = list2[i].strip('\n')

# Sort list2 so it can be binary-searched
list2.sort()
length_2 = len(list2)
same_data = []
for i in list1:
    pos = bisect.bisect_left(list2, i)
    if pos < length_2 and list2[pos] == i:
        same_data.append(i)

# Remove duplicates
same_data = list(set(same_data))
print(same_data)

The key points are: (1) use with; (2) handle the '\n' at the end of each line; (3) use binary search (bisect) to make the lookup efficient; (4) use a set to remove duplicates quickly.
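Since a set is used for deduplication anyway, a set intersection gives the same answer more directly, in average O(n + m) time instead of sorting plus one binary search per line. This is an alternative approach, not the solution from the original question; the helper name common_lines and the sample IP lists are made up and stand in for the contents of test1.txt and test2.txt:

```python
def common_lines(lines1, lines2):
    """Return the sorted lines found in both inputs, ignoring trailing newlines."""
    set1 = {line.rstrip('\n') for line in lines1}
    set2 = {line.rstrip('\n') for line in lines2}
    return sorted(set1 & set2)  # & is set intersection; sorting is lexicographic


# Hypothetical sample data; with real files, pass the file objects opened in with blocks
ips1 = ['10.0.0.1\n', '10.0.0.2\n', '10.0.0.2\n', '192.168.1.1\n']
ips2 = ['192.168.1.1\n', '10.0.0.2\n', '172.16.0.1\n']
print(common_lines(ips1, ips2))  # ['10.0.0.2', '192.168.1.1']
```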

Write file

Writing a file works the same way as reading one. The only difference is that when calling open(), you pass the mode 'w' or 'wb' to write a text file or a binary file:

>>> f = open('test.txt', 'w')  # use 'wb' to write a binary file
>>> f.write('Hello, world!')   # returns the number of characters written
13
>>> f.close()

Note how 'w' mode behaves: if the file does not exist, it is created; if it does exist, its original contents are erased first and then the new data is written. If you want to keep the original content and add new content after it, use 'a' (append) mode instead.

You can call write() repeatedly, but you must call f.close() when you are done. When you write a file, the data is often not written out immediately; it is kept in a memory buffer and written later. Only when close() is called is all remaining buffered data guaranteed to be written to the file. If you forget to call close(), only part of the data may end up on disk and the rest may be lost. So it is safer to use the with statement:

with open('test.txt', 'w') as f:
    f.write('Hello, world!')
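To see that the with statement really closes the file (and therefore flushes it), you can check the file object's closed attribute before and after the block. A small sketch that writes to a hypothetical file in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test.txt')

with open(path, 'w') as f:
    f.write('Hello, world!')
    inside = f.closed   # False: the file is still open inside the block

after = f.closed        # True: leaving the with block called f.close()

with open(path, 'r') as f:
    content = f.read()  # the buffered data reached the file when it was closed
print(inside, after, content)  # False True Hello, world!
```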

Python file objects provide two "write" methods: write() and writelines().

write() is the counterpart of read() and readline(): it writes a string to the file.
writelines() is the counterpart of readlines(): it takes a list (in fact any iterable) of strings and writes them to the file. It does not add newline characters automatically, so you need to include them explicitly.

with open('test1.txt', 'w') as f1:
    f1.writelines(["1", "2", "3"])
# test1.txt now contains: 123

with open('test1.txt', 'w') as f1:
    f1.writelines(["1\n", "2\n", "3\n"])
# test1.txt now contains:
#    1
#    2
#    3
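Because writelines() accepts any iterable of strings, a generator expression can append the missing '\n' to each item on the fly. A sketch, using a hypothetical file in a temporary directory:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), 'test1.txt')
items = ['1', '2', '3']

with open(path, 'w') as f1:
    # The generator adds the '\n' that writelines() does not add by itself
    f1.writelines(item + '\n' for item in items)

with open(path, 'r') as f1:
    content = f1.read()
print(repr(content))  # '1\n2\n3\n'
```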

Regarding the mode parameter of open():

'r': read (the default)

'w': write; creates the file if it does not exist and truncates it if it does

'a': append; creates the file if it does not exist and writes at the end

'r+': read and write; raises an error (FileNotFoundError, an IOError) if the file does not exist

'w+': read and write; creates the file if it does not exist and truncates it if it does

'a+': read and append; creates the file if it does not exist

For binary files, just add a b:

'rb'  'wb'  'ab'  'rb+'  'wb+'  'ab+'
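The differences between the modes can be checked directly. This sketch (file names are hypothetical, in a temporary directory) shows that 'a' keeps existing content, that 'w+' truncates before writing and needs seek(0) before reading back, and that 'r+' never creates a file:

```python
import os
import tempfile

folder = tempfile.mkdtemp()
path = os.path.join(folder, 'modes.txt')

with open(path, 'w') as f:    # 'w': create (or truncate) and write
    f.write('first\n')

with open(path, 'a') as f:    # 'a': keep existing content, write at the end
    f.write('second\n')

with open(path, 'r') as f:
    after_append = f.read()   # 'first\nsecond\n'

with open(path, 'w+') as f:   # 'w+': truncates first, then allows reading and writing
    f.write('third\n')
    f.seek(0)                 # move back to the start before reading what was written
    after_w_plus = f.read()   # 'third\n'

try:
    with open(os.path.join(folder, 'nope.txt'), 'r+') as f:
        pass
    r_plus_missing = 'opened'
except FileNotFoundError:
    r_plus_missing = 'FileNotFoundError'  # 'r+' does not create a missing file

print(repr(after_append), repr(after_w_plus), r_plus_missing)
```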

JSON

JSON (JavaScript Object Notation) is a lightweight data-interchange format. A JSON object looks very much like a Python dictionary, and it can contain arrays enclosed in square brackets, which correspond to Python lists.

Python has a dedicated module for processing JSON, the json module; the pickle module provides the same style of interface for Python's own serialization format.

The json module provides four functions: dumps, dump, loads, load

The pickle module also provides four functions: dumps, dump, loads, load

dumps and dump:

dumps and dump are the serialization functions.

dumps only serializes the object to a str.

dump requires a file object and writes the serialized str to that file.

View source code:

def dumps(obj, skipkeys=False, ensure_ascii=True, check_circular=True,
          allow_nan=True, cls=None, indent=None, separators=None,
          default=None, sort_keys=False, **kw):
    """Serialize ``obj`` to a JSON formatted ``str``."""
    # Converts obj into a string in JSON format.

def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
         allow_nan=True, cls=None, indent=None, separators=None,
         default=None, sort_keys=False, **kw):
    """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object)."""
    # Two actions: convert obj into a JSON-formatted string, then write that
    # string to fp, which is why the file object fp is a required argument.

Sample code:

>>> import json
>>> json.dumps([])      # dumps can serialize all basic data types to strings
'[]'
>>> json.dumps(1)       # number
'1'
>>> json.dumps('1')     # string
'"1"'
>>> d = {"name": "Tom", "age": 23}
>>> json.dumps(d)       # dictionary
'{"name": "Tom", "age": 23}'

import json

a = {"name": "Tom", "age": 23}
with open("test.json", "w", encoding='utf-8') as f:
    # indent pretty-prints the output; the default None gives compact output,
    # and indent=0 or a negative value inserts only newlines
    f.write(json.dumps(a, indent=4))
    # json.dump(a, f, indent=4)   # same effect as the line above

The saved file looks like this:

{
    "name": "Tom",
    "age": 23
}

loads and load

loads and load are the deserialization functions.

loads only deserializes (a str).

load takes a file object, reads the file, and deserializes its contents.

View source code:

# (signatures from an older Python 3; the encoding parameter of loads was removed in Python 3.9)
def loads(s, encoding=None, cls=None, object_hook=None, parse_float=None,
          parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    """Deserialize ``s`` (a ``str`` instance containing a JSON document) to a Python object."""
    # Converts a str containing a JSON document into a Python object.

def load(fp, cls=None, object_hook=None, parse_float=None, parse_int=None,
         parse_constant=None, object_pairs_hook=None, **kw):
    """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing
    a JSON document) to a Python object."""
    # Reads a file containing JSON data and deserializes it into a Python object.

Examples:

>>> json.loads('{"name":"Tom", "age":23}')
{'name': 'Tom', 'age': 23}

import json

with open("test.json", "r", encoding='utf-8') as f:
    aa = json.loads(f.read())
    f.seek(0)
    bb = json.load(f)    # same as json.loads(f.read())
print(aa)
print(bb)

# Output:
# {'name': 'Tom', 'age': 23}
# {'name': 'Tom', 'age': 23}
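Because dump() only needs an object with a .write() method and load() only needs one with a .read() method, an in-memory io.StringIO can stand in for a real file. This sketch checks that dump/load produce the same results as dumps/loads:

```python
import io
import json

a = {"name": "Tom", "age": 23}

buf = io.StringIO()           # file-like object kept in memory
json.dump(a, buf, indent=4)   # serialize + write
text = buf.getvalue()

buf.seek(0)                   # rewind before reading
b = json.load(buf)            # read + deserialize

print(text == json.dumps(a, indent=4))  # True: dump wrote the same string dumps returns
print(b)                                # {'name': 'Tom', 'age': 23}
```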

json and pickle modules

Both the json module and the pickle module have four functions, dumps, dump, loads, and load, and they are used in the same way.

The difference is that the json module serializes to a common format that other programming languages also recognize: an ordinary string.

The output of the pickle module can only be read by Python; other programming languages cannot interpret it and see it as unreadable bytes.

On the other hand, pickle can serialize functions. It stores only a reference to the function (its module and name), not its code, so another program that wants to load the data must define a function with the same name; the function body is not stored and may differ.
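A minimal pickle sketch, assuming it is run as a script: dumps/loads work with bytes rather than str, tuples survive the round trip as tuples, and the function is restored by looking up its name in the loading program. The function greet and the file name data.pkl are hypothetical; note that pickle files must be opened in binary mode:

```python
import os
import pickle
import tempfile


def greet(name):
    return 'Hello, ' + name


data = {'numbers': [1, 2, 3], 'pair': (1, 2), 'func': greet}

blob = pickle.dumps(data)      # bytes, not str: a Python-specific format
restored = pickle.loads(blob)

# The function was stored by reference (module + name), so loading works here
# only because greet is defined in this program
print(type(blob), restored['func']('Tom'))

path = os.path.join(tempfile.mkdtemp(), 'data.pkl')
with open(path, 'wb') as f:    # pickle writes bytes, so use binary mode
    pickle.dump(data, f)
with open(path, 'rb') as f:
    from_file = pickle.load(f)
```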

The correspondence between Python objects (obj) and JSON objects:

+-------------------+---------------+
| Python            | JSON          |
+===================+===============+
| dict              | object        |
+-------------------+---------------+
| list, tuple       | array         |
+-------------------+---------------+
| str               | string        |
+-------------------+---------------+
| int, float        | number        |
+-------------------+---------------+
| True              | true          |
+-------------------+---------------+
| False             | false         |
+-------------------+---------------+
| None              | null          |
+-------------------+---------------+
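One consequence of this table is that a round trip through JSON is not always lossless: a tuple comes back as a list, and non-string dictionary keys come back as strings. A short sketch with made-up data:

```python
import json

original = {'items': (1, 2.5), 'ok': True, 'missing': None}
text = json.dumps(original)
back = json.loads(text)

print(text)  # {"items": [1, 2.5], "ok": true, "missing": null}
print(back)  # {'items': [1, 2.5], 'ok': True, 'missing': None}  (the tuple became a list)

# JSON object keys are always strings, so an int key becomes a str key
print(json.loads(json.dumps({1: 'a'})))  # {'1': 'a'}
```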

Summary

json serialization functions:

dumps: serialization only, no file operation
dump: serialization + writing to a file

json deserialization functions:

loads: deserialization only, no file operation
load: reading a file + deserialization

Data serialized by the json module is more portable: other programming languages can read it.

Data serialized by the pickle module can only be used from Python, but pickle is more powerful and can serialize functions.

For the data types that the json module can serialize and deserialize, see the table of correspondences between Python objects (obj) and JSON objects above.
Use indent=4 to format the output when writing JSON files.

os.path

split

Splits a path into its directory name and base name and returns them as a tuple. From the docstring:
Split a pathname.  Returns tuple "(head, tail)" where "tail" is
everything after the final slash.  Either part may be empty.

>>> import os
>>> os.path.split("/tmp/f1.txt")
('/tmp', 'f1.txt')
>>> os.path.split("/home/test.sh")
('/home', 'test.sh')

splitext

Splits off the file extension and returns a tuple of (root, extension). From the docstring:
Split the extension from a pathname.
Extension is everything from the last dot to the end, ignoring
leading dots.  Returns "(root, ext)"; ext may be empty.

>>> os.path.splitext("/home/test.sh")
('/home/test', '.sh')
>>> os.path.splitext("/tmp/f1.txt")
('/tmp/f1', '.txt')
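The docstrings above mention a few edge cases: either part of split() may be empty, only the last dot starts the extension, and leading dots are ignored. A short sketch with illustrative paths:

```python
import os

print(os.path.split('/tmp/'))              # ('/tmp', '')  tail is empty: path ends with a slash
print(os.path.split('f1.txt'))             # ('', 'f1.txt')  head is empty: no slash at all
print(os.path.splitext('archive.tar.gz'))  # ('archive.tar', '.gz')  only the last dot counts
print(os.path.splitext('/home/.bashrc'))   # ('/home/.bashrc', '')  a leading dot is not an extension

# dirname() and basename() return the two halves of split()
path = '/tmp/f1.txt'
head, tail = os.path.split(path)
print(os.path.dirname(path), os.path.basename(path))  # /tmp f1.txt
```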

# Rename a file:
>>> os.rename('test.txt', 'test.py')
# Delete a file:
>>> os.remove('test.py')

# Show the absolute path of the current directory:
>>> os.path.abspath('.')
'/Users/michael'
# To create a new directory, first build its full path:
>>> os.path.join('/Users/michael', 'testdir')
'/Users/michael/testdir'
# Then create the directory:
>>> os.mkdir('/Users/michael/testdir')
# Delete a directory:
>>> os.rmdir('/Users/michael/testdir')
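The calls above can be tried safely inside a throwaway directory from tempfile.mkdtemp() instead of /Users/michael. A sketch with illustrative file names; note that os.rmdir() only removes an empty directory (it raises OSError otherwise), so the file is deleted first:

```python
import os
import tempfile

base = tempfile.mkdtemp()               # scratch directory, so nothing real is touched
newdir = os.path.join(base, 'testdir')  # build the full path first
os.mkdir(newdir)                        # then create the directory

old = os.path.join(newdir, 'test.txt')
with open(old, 'w') as f:
    f.write('hello')
new = os.path.join(newdir, 'test.py')
os.rename(old, new)                     # rename test.txt -> test.py
renamed = sorted(os.listdir(newdir))    # ['test.py']

os.remove(new)                          # delete the file
os.rmdir(newdir)                        # the directory is empty now, so rmdir succeeds
os.rmdir(base)
print(renamed, os.path.exists(newdir))  # ['test.py'] False
```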