Read file
Open a file with the built-in open() function. It returns a file object, which is iterable:
>>> f = open('test.txt', 'r')
'r' opens the file for reading in text mode; 'rb' opens it for reading in binary mode. (The default value of the mode parameter is 'r'.)
If the file does not exist, open() raises FileNotFoundError (a subclass of OSError; in Python 3, IOError is an alias of OSError), with an error code and a message telling you that the file does not exist:
>>> f = open('test.txt', 'r')
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
FileNotFoundError: [Errno 2] No such file or directory: 'test.txt'
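A minimal sketch of handling the missing-file case instead of letting the traceback end the program. The helper name read_text_or_default and the path used here are hypothetical, chosen only for illustration:

```python
def read_text_or_default(path, default=""):
    """Return the file's text, or `default` if the file does not exist."""
    try:
        with open(path, "r", encoding="utf-8") as f:
            return f.read()
    except FileNotFoundError as exc:
        # exc.filename holds the path that failed; exc.strerror the OS message
        print(f"could not open {exc.filename!r}: {exc.strerror}")
        return default

# Hypothetical path that is assumed not to exist.
result = read_text_or_default("no_such_dir_for_demo/missing.txt", default="<missing>")
```

Catching FileNotFoundError specifically, rather than a bare except, lets other problems (such as permission errors) still surface.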
The file must be closed after use, because a file object occupies operating system resources, and the number of files the operating system can have open at the same time is limited:
>>> f.close()
Since an OSError (IOError) may be raised while reading or writing a file, the f.close() that follows would not be called once an error occurs. To ensure that the file is closed whether or not an error occurs, we can use try ... finally:
f = None
try:
    f = open('/path/to/file', 'r')
    print(f.read())
finally:
    if f:
        f.close()
But writing this every time is cumbersome, so Python provides the with statement, which calls close() for us automatically:
with open('/path/to/file', 'r') as f:
    print(f.read())
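To check that the with statement really closes the file, here is a small sketch that inspects the closed attribute of the file object, both after a normal exit and after an exception. The temporary file demo.txt is an assumption made so the example is self-contained:

```python
import os
import tempfile

# Create a throwaway file so the example does not depend on existing files.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "demo.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("hello\n")

# Normal exit: the file is closed when the block ends.
with open(path, "r", encoding="utf-8") as f:
    content = f.read()
closed_after_normal_exit = f.closed

# Exit through an exception: the file is closed as well.
try:
    with open(path, "r", encoding="utf-8") as g:
        raise ValueError("simulated failure while processing the file")
except ValueError:
    pass
closed_after_error = g.closed
```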
The Python file object provides three "read" methods: read(), readline() and readlines(). Each accepts an optional argument that limits the amount of data read per call.
To read a file piece by piece, call the read(size) method repeatedly; each call reads at most size characters in text mode (size bytes in binary mode).
Note: all three methods keep the '\n' at the end of each line. It is not stripped by default, so we need to remove it ourselves.
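Calling read(size) repeatedly can be sketched as a small generator. The helper name read_in_chunks and the sample file are hypothetical; the loop relies on read() returning an empty string at end of file:

```python
import os
import tempfile

def read_in_chunks(path, size=4):
    """Yield successive pieces of at most `size` characters from a text file."""
    with open(path, "r", encoding="utf-8") as f:
        while True:
            chunk = f.read(size)
            if not chunk:  # read() returns '' once the end of the file is reached
                break
            yield chunk

# Sample file created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "chunks.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("abcdefghij")

chunks = list(read_in_chunks(path, size=4))
```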
In [2]: with open('test1.txt', 'r') as f1:
   ...:     list1 = f1.readlines()
In [3]: list1
Out[3]: ['111\n', '222\n', '333\n', '444\n', '555\n', '666\n']
Remove the '\n':
In [4]: with open('test1.txt', 'r') as f1:
   ...:     list1 = f1.readlines()
   ...: for i in range(0, len(list1)):
   ...:     list1[i] = list1[i].rstrip('\n')
In [5]: list1
Out[5]: ['111', '222', '333', '444', '555', '666']
For read() and readline(), the '\n' is also read in, but print displays the text normally (print renders '\n' as a line break):
In [7]: with open('test1.txt', 'r') as f1:
   ...:     list1 = f1.read()
In [8]: list1
Out[8]: '111\n222\n333\n444\n555\n666\n'
In [9]: print(list1)
111
222
333
444
555
666

In [10]: with open('test1.txt', 'r') as f1:
   ...:     list1 = f1.readline()
In [11]: list1
Out[11]: '111\n'
In [12]: print(list1)
111

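Because the file object is iterable, the read-then-strip pattern can also be written as a single comprehension over the file. This is a sketch with a sample file created just for the example:

```python
import os
import tempfile

# Sample file with three lines, created only for this demonstration.
tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test1.txt")
with open(path, "w", encoding="utf-8") as f:
    f.write("111\n222\n333\n")

# Iterating over the file yields one line at a time, each ending in '\n'.
with open(path, "r", encoding="utf-8") as f:
    lines = [line.rstrip("\n") for line in f]
```

Iterating line by line also avoids loading the whole file into memory the way readlines() does.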
An example Python interview question:
There are two files, each containing many lines of IP addresses. Find the IP addresses that appear in both files:
# coding: utf-8
import bisect

with open('test1.txt', 'r') as f1:
    list1 = f1.readlines()
for i in range(0, len(list1)):
    list1[i] = list1[i].strip('\n')
with open('test2.txt', 'r') as f2:
    list2 = f2.readlines()
for i in range(0, len(list2)):
    list2[i] = list2[i].strip('\n')

# bisect requires a sorted list
list2.sort()
length_2 = len(list2)
same_data = []
for i in list1:
    # binary search for i in list2
    pos = bisect.bisect_left(list2, i)
    if pos < length_2 and list2[pos] == i:
        same_data.append(i)
# remove duplicates
same_data = list(set(same_data))
print(same_data)
The main points are: (1) use with; (2) strip the '\n' at the end of each line; (3) use binary search to improve the efficiency of the algorithm; (4) use set to remove duplicates quickly.
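As an alternative to the binary search above (a different technique, not the one the original answer uses), the same result can be sketched with a set intersection. The helper name common_lines and the sample addresses are made up for illustration:

```python
import os
import tempfile

def common_lines(path_a, path_b):
    """Return the sorted list of non-empty lines that appear in both files."""
    with open(path_a, "r", encoding="utf-8") as fa:
        set_a = {line.strip() for line in fa if line.strip()}
    with open(path_b, "r", encoding="utf-8") as fb:
        set_b = {line.strip() for line in fb if line.strip()}
    # Set intersection removes duplicates and finds shared items in one step.
    return sorted(set_a & set_b)

# Two sample files with invented addresses.
tmp_dir = tempfile.mkdtemp()
file_a = os.path.join(tmp_dir, "test1.txt")
file_b = os.path.join(tmp_dir, "test2.txt")
with open(file_a, "w", encoding="utf-8") as f:
    f.write("10.0.0.1\n10.0.0.2\n10.0.0.3\n10.0.0.2\n")
with open(file_b, "w", encoding="utf-8") as f:
    f.write("10.0.0.3\n10.0.0.4\n10.0.0.2\n")

shared = common_lines(file_a, file_b)
```

Set membership tests are constant time on average, so this avoids both the sort and the explicit search, at the cost of holding both sets in memory.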
Writing a file works the same way as reading one. The only difference is that when calling open(), you pass the mode 'w' or 'wb' to indicate writing a text file or a binary file:
>>> f = open('test.txt', 'w')  # use 'wb' to write a binary file
>>> f.write('Hello, world!')
>>> f.close()
Note: the 'w' mode works like this: if the file does not exist, it is created; if it does exist, its original contents are cleared before the new data is written. So if you want to append new content instead of clearing the original content, use the 'a' mode.
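The difference between 'w' and 'a' can be seen in a short sketch. The file name modes.txt is an assumption, and the file lives in a temporary directory:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "modes.txt")

with open(path, "w", encoding="utf-8") as f:
    f.write("first\n")
with open(path, "w", encoding="utf-8") as f:   # 'w' clears the old contents
    f.write("second\n")
with open(path, "r", encoding="utf-8") as f:
    after_w = f.read()

with open(path, "a", encoding="utf-8") as f:   # 'a' keeps them and appends
    f.write("third\n")
with open(path, "r", encoding="utf-8") as f:
    after_a = f.read()
```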
We can call write() repeatedly to write to the file, but we must call f.close() to close it. When we write to a file, the data is often not written to disk immediately; it is buffered in memory and written out later. Calling close() flushes whatever is still buffered. If you forget to call close(), only part of the data may reach the disk and the rest may be lost. So, to be safe, use the with statement:
with open('test.txt', 'w') as f:
    f.write('Hello, world!')
The Python file object provides two "write" methods: write() and writelines(). writelines() does not add line separators, so include the '\n' yourself if you want one item per line.
f1 = open('test1.txt', 'w')
f1.writelines(["1", "2", "3"])
f1.close()
# test1.txt now contains: 123
f1 = open('test1.txt', 'w')
f1.writelines(["1\n", "2\n", "3\n"])
f1.close()
# test1.txt now contains:
# 1
# 2
# 3
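When the items do not already end in '\n', one way to write them one per line is to pass writelines() a generator that appends the newline. A sketch, with a temporary file assumed for the output:

```python
import os
import tempfile

items = ["1", "2", "3"]

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test1.txt")

with open(path, "w", encoding="utf-8") as f:
    # writelines() accepts any iterable of strings and adds nothing itself.
    f.writelines(item + "\n" for item in items)

with open(path, "r", encoding="utf-8") as f:
    content = f.read()
```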
Regarding the mode parameter of open():
'r': read
'w': write
'a': append
'r+' == r+w (read and write; if the file does not exist, FileNotFoundError is raised)
'w+' == w+r (read and write; if the file does not exist, it is created; an existing file is cleared first)
'a+' == a+r (append and read; if the file does not exist, it is created)
Correspondingly, for a binary file, just add a b:
'rb' 'wb' 'ab' 'rb+' 'wb+' 'ab+'
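The behaviour of the three "+" modes on a file that does not exist yet can be checked with a short sketch. The file name is hypothetical and the temporary directory starts out empty:

```python
import os
import tempfile

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "plus_modes.txt")

# 'r+' refuses to create the file.
try:
    open(path, "r+")
    r_plus_raised = False
except FileNotFoundError:
    r_plus_raised = True

# 'w+' creates the file; seek(0) moves back to the start before reading.
with open(path, "w+", encoding="utf-8") as f:
    f.write("abc")
    f.seek(0)
    w_plus_read = f.read()

# 'a+' writes at the end and can also read after seeking.
with open(path, "a+", encoding="utf-8") as f:
    f.write("def")
    f.seek(0)
    a_plus_read = f.read()
```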
JSON
JSON (JavaScript Object Notation) is a lightweight data-interchange format. A JSON object closely resembles a Python dictionary, and it can contain arrays enclosed in square brackets, which correspond to lists in Python.
In Python, the json module handles the JSON format; the pickle module offers a similar interface for Python's own serialization format.
The json module provides four functions: dumps, dump, loads, load
The pickle module also provides four functions: dumps, dump, loads, load
dumps and dump: serialization
dumps only serializes to a str;
dump takes a file object and writes the serialized str to that file.
View the source code:
def dumps(obj, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        default=None, sort_keys=False, **kw):
    # Serialize ``obj`` to a JSON formatted ``str``.
    # That is, convert the data in ``obj`` to a string in JSON format.
def dump(obj, fp, skipkeys=False, ensure_ascii=True, check_circular=True,
        allow_nan=True, cls=None, indent=None, separators=None,
        default=None, sort_keys=False, **kw):
    """Serialize ``obj`` as a JSON formatted stream to ``fp`` (a
    ``.write()``-supporting file-like object).
    I understand it as two actions: one converts ``obj`` to a string in JSON format, and the other writes that string to a file, which means the file object fp is a required parameter."""
Sample code:
>>> import json
>>> json.dumps([])  # dumps can serialize all the basic data types to strings
'[]'
>>> json.dumps(1)  # number
'1'
>>> json.dumps('1')  # string
'"1"'
>>> d = {"name": "Tom", "age": 23}
>>> json.dumps(d)  # dictionary
'{"name": "Tom", "age": 23}'
a = {"name": "Tom", "age": 23}
with open("test.json", "w", encoding='utf-8') as f:
    # indent is very handy for saving a dictionary in a readable layout; the default is None (most compact), and a value of 0 or less inserts newlines only, with no indentation
    f.write(json.dumps(a, indent=4))
    # json.dump(a, f, indent=4)  # same effect as the line above
Contents of the saved file:
{
    "name": "Tom",
    "age": 23
}
loads and load: deserialization
loads only deserializes a str;
load takes a file object, reads it, and deserializes the contents.
View the source code:
def loads(s, encoding=None, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    """Deserialize ``s`` (a ``str`` instance containing a JSON document) to a Python object.
    That is, deserialize a str containing a JSON document into a Python object."""
def load(fp, cls=None, object_hook=None, parse_float=None,
        parse_int=None, parse_constant=None, object_pairs_hook=None, **kw):
    """Deserialize ``fp`` (a ``.read()``-supporting file-like object containing a JSON document) to a Python object.
    That is, read a file-like object containing JSON data and deserialize it into a Python object."""
Examples:
>>> json.loads('{"name":"Tom", "age":23}')
{'name': 'Tom', 'age': 23}
import json
with open("test.json", "r", encoding='utf-8') as f:
    aa = json.loads(f.read())
    f.seek(0)
    bb = json.load(f)  # same as json.loads(f.read())
print(aa)
print(bb)
# Output:
{'name': 'Tom', 'age': 23}
{'name': 'Tom', 'age': 23}
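The relationship between dump/dumps and load/loads can be checked with a round trip through a file. This is a sketch that writes test.json into a temporary directory rather than the current one:

```python
import json
import os
import tempfile

data = {"name": "Tom", "age": 23}

tmp_dir = tempfile.mkdtemp()
path = os.path.join(tmp_dir, "test.json")

# dump = dumps + write to the file object
with open(path, "w", encoding="utf-8") as f:
    json.dump(data, f, indent=4)

# load = read from the file object + loads
with open(path, "r", encoding="utf-8") as f:
    text = f.read()
    f.seek(0)
    loaded = json.load(f)
```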
Both the json module and the pickle module have the four functions dumps, dump, loads, and load, and they are used in the same way.
The difference is that the json module serializes to a common format that other programming languages also recognize: an ordinary string.
Data serialized by the pickle module can only be read by Python; to other programming languages it looks like garbled bytes.
However, pickle can serialize a function. It stores only a reference to the function (its module and name), so any other file that loads it must be able to find a definition under the same name; the body of that definition may differ.
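A short sketch of the contrast: pickle round-trips Python-only types and functions, while json rejects a type it has no mapping for. The sample record is invented for the example, and the built-in len stands in for "a function":

```python
import json
import pickle

record = {"tags": {"a", "b"}, "point": (1, 2)}

# pickle produces bytes and restores sets and tuples unchanged.
blob = pickle.dumps(record)
restored = pickle.loads(blob)

# A function is pickled by reference, so loading returns the same object.
restored_func = pickle.loads(pickle.dumps(len))

# json has no representation for a set and raises TypeError.
try:
    json.dumps(record)
    json_ok = True
except TypeError:
    json_ok = False
```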
+-------------------+---------------+
| Python            | JSON          |
+===================+===============+
| dict              | object        |
+-------------------+---------------+
| list, tuple       | array         |
+-------------------+---------------+
| str               | string        |
+-------------------+---------------+
| int, float        | number        |
+-------------------+---------------+
| True              | true          |
+-------------------+---------------+
| False             | false         |
+-------------------+---------------+
| None              | null          |
+-------------------+---------------+
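The table can be observed directly by serializing a value of each kind and loading it back. A small sketch with made-up sample data; note that a tuple comes back as a list, because JSON only has arrays:

```python
import json

original = {"items": (1, 2.5), "ok": True, "missing": None}

text = json.dumps(original)   # tuple -> array, True -> true, None -> null
back = json.loads(text)       # array -> list, true -> True, null -> None
```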
Summary
dumps: serialization only, no file operation; dump: serialization + write to file
loads: deserialization only, no file operation; load: read file + deserialization
Data serialized by the pickle module can only be used in Python, but pickle is powerful and can serialize functions
For the data types that the json module can serialize and deserialize, see the correspondence table of Python objects and JSON types
Use indent=4 to format the output when writing files
OS.PATH
Split a path into its directory name and base name, returned as a tuple:
Split a pathname. Returns tuple "(head, tail)" where "tail" is
everything after the final slash. Either part may be empty.
>>> import os
>>> os.path.split("/tmp/f1.txt")
('/tmp', 'f1.txt')
>>> os.path.split("/home/test.sh")
('/home', 'test.sh')
Split a path into its root and file extension, returned as a tuple:
Split the extension from a pathname.
Extension is everything from the last dot to the end, ignoring
leading dots. Returns "(root, ext)"; ext may be empty.
>>> os.path.splitext("/home/test.sh")
('/home/test', '.sh')
>>> os.path.splitext("/tmp/f1.txt")
('/tmp/f1', '.txt')
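The two functions combine naturally. The sketch below uses a hypothetical helper, split_all, to break a path into directory, file stem, and extension, and includes two edge cases from the docstrings above: only the last dot counts, and a leading dot is not an extension:

```python
import os

def split_all(path):
    """Return (directory, stem, extension) for a path."""
    head, tail = os.path.split(path)      # directory and base name
    root, ext = os.path.splitext(tail)    # stem and extension of the base name
    return head, root, ext

simple = split_all("/home/test.sh")
double_ext = split_all("/tmp/archive.tar.gz")   # only the last dot is split off
dotfile = split_all("/home/user/.bashrc")       # a leading dot is ignored
```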
# Rename a file:
>>> os.rename('test.txt', 'test.py')
# Delete a file:
>>> os.remove('test.py')
# View the absolute path of the current directory:
>>> os.path.abspath('.')
'/Users/michael'
# To create a new directory inside a directory, first build the full path of the new directory:
>>> os.path.join('/Users/michael', 'testdir')
'/Users/michael/testdir'
# Then create the directory:
>>> os.mkdir('/Users/michael/testdir')
# Delete a directory:
>>> os.rmdir('/Users/michael/testdir')
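The same operations can be tried safely inside a temporary directory. This is a sketch; the directory and file names mirror the ones above but are created under a path chosen by tempfile:

```python
import os
import tempfile

base = tempfile.mkdtemp()

# Build the path first, then create the directory.
new_dir = os.path.join(base, "testdir")
os.mkdir(new_dir)

# Create a file, rename it, and check the result.
old_name = os.path.join(new_dir, "test.txt")
new_name = os.path.join(new_dir, "test.py")
with open(old_name, "w", encoding="utf-8") as f:
    f.write("print('hi')\n")
os.rename(old_name, new_name)
renamed = os.path.exists(new_name) and not os.path.exists(old_name)

# rmdir() only removes empty directories, so delete the file first.
os.remove(new_name)
os.rmdir(new_dir)
dir_removed = not os.path.exists(new_dir)
os.rmdir(base)
```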