Function Signature:
unified_diff(a, b, fromfile='', tofile='', fromfiledate='', tofiledate='', n=3, lineterm='\n')
Parameters:
a – a list of strings corresponding to version one of the text.
b – a list of strings corresponding to version two of the text.
fromfile – the file name to show for the first list of strings, “a”.
tofile – the file name to show for the second list of strings, “b”.
Note: The parameters fromfile and tofile are strings that label version one and version two of the text. They need not name existing files, and unified_diff() does not open any file. They are placeholders the developer can fill in for clarity, and they play the same role as the file names in the header lines produced by the diff command.
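fromfiledate – the timestamp for version one of the text, given through parameter "a". It is a string, normally in ISO 8601 format.
tofiledate – the timestamp for version two of the text, given through parameter "b".
n – the number of context lines to show around each change. The default value is 3.
lineterm – the line terminator appended to the control lines ("---", "+++" and "@@"). The default value is "\n". For input lines that do not end with a newline, set lineterm to "" so that the whole output is uniformly newline-free.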
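The sketch below shows fromfile, tofile and lineterm together. The text is split with str.splitlines(), which removes the newline characters, so lineterm="" keeps the control lines consistent with the content lines. The names old.txt and new.txt are hypothetical labels, and no file is opened.

```python
import difflib

old_text = "alpha\nbeta\ngamma"
new_text = "alpha\nBETA\ngamma"

# splitlines() drops the "\n" terminators, so lineterm="" keeps the
# "---", "+++" and "@@" lines newline-free as well.
diff_lines = list(
    difflib.unified_diff(
        old_text.splitlines(),
        new_text.splitlines(),
        fromfile="old.txt",  # hypothetical label only; nothing is opened
        tofile="new.txt",    # hypothetical label only
        lineterm="",
    )
)

# Join with "\n" ourselves because no line carries a terminator.
print("\n".join(diff_lines))
```

With the default lineterm="\n" and the same input, the three control lines would end in "\n" while the content lines would not. Passing lineterm="" avoids that mismatch.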
Return value:
Returns a generator that yields the lines of the differences between the two versions of text given by the parameters a and b. The lines are produced lazily, as the generator is iterated.
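Because the return value is a generator, its lines only appear when it is iterated, and it can be consumed only once. The short sketch below uses made-up sample lists to show this. It also shows that identical inputs produce no lines at all.

```python
import difflib
import types

a = ["red\n", "green\n"]
b = ["red\n", "blue\n"]

diff = difflib.unified_diff(a, b)
print(isinstance(diff, types.GeneratorType))  # True: no lines computed yet

first_pass = list(diff)   # consuming the generator produces the lines
second_pass = list(diff)  # an exhausted generator yields nothing more

# Identical inputs yield nothing, not even the "---"/"+++" header lines.
no_changes = list(difflib.unified_diff(a, a))

print(len(first_pass), len(second_pass), len(no_changes))  # 6 0 0
```

If the lines are needed more than once, store them with list(), as done for first_pass.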
Function overview:
- Given two versions of text as lists of strings (for example, the lines read from two files), the function unified_diff() finds the differences between them and returns them in unified diff format.
- The differences are returned as a generator object.
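unified_diff() works on lists of strings, so comparing two files means reading their lines first. A minimal sketch, assuming two small hypothetical files named config_v1.txt and config_v2.txt. They are written to a temporary directory so the example is self-contained.

```python
import difflib
import os
import tempfile

with tempfile.TemporaryDirectory() as tmp:
    # Hypothetical file names and contents, created only for this sketch.
    old_path = os.path.join(tmp, "config_v1.txt")
    new_path = os.path.join(tmp, "config_v2.txt")
    with open(old_path, "w", encoding="utf-8") as f:
        f.write("debug = false\nport = 8080\n")
    with open(new_path, "w", encoding="utf-8") as f:
        f.write("debug = true\nport = 8080\n")

    # unified_diff() does not open files itself: read the lines first.
    with open(old_path, encoding="utf-8") as f:
        old_lines = f.readlines()
    with open(new_path, encoding="utf-8") as f:
        new_lines = f.readlines()

    diff_lines = list(difflib.unified_diff(
        old_lines, new_lines,
        fromfile="config_v1.txt", tofile="config_v2.txt"))

# readlines() keeps each "\n", so print without adding another one.
print("".join(diff_lines), end="")
```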
The unified diff format:
The diff command reports the differences between two versions of a text. Two common output formats are the unified diff format and the context diff format.
The unified diff format describes the changes as hunks, or blocks of differences.
The context diff format lists changes as pairs of blocks. The first block of each pair shows the lines from fromfile, and the second block shows the lines from tofile, so the context lines appear in both blocks of a pair. The unified diff format is more compact because it shows each context line only once.
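difflib also provides context_diff() with the same parameters. This makes it easy to compare the two formats on one small, made-up input. The sketch counts how many times the unchanged line "one" is shown in each output.

```python
import difflib

a = ["one\n", "two\n", "three\n"]
b = ["one\n", "TWO\n", "three\n"]

unified = list(difflib.unified_diff(a, b))
context = list(difflib.context_diff(a, b))

print("".join(unified), end="")
print("".join(context), end="")

# The context line "one" is shown once in unified format,
# but once per block (twice) in context format.
print(sum(line.strip() == "one" for line in unified))  # 1
print(sum(line.strip() == "one" for line in context))  # 2
```

For this input the unified output has 7 lines and the context output has 11, because the context lines are repeated in the second block.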
The unified diff format has three parts:
- The file names, with their last-modified timestamps
- The hunk headers
- The blocks of differences, called hunks
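The three parts can be separated programmatically. The sketch below uses a hypothetical helper, split_parts(), and invented sample data. It relies on difflib emitting the two file-header lines first and on every hunk header starting with "@@", while every hunk body line starts with a space, "-" or "+". The two changes are placed far apart so that they land in separate hunks.

```python
import difflib

def split_parts(diff_lines):
    """Split unified_diff() output into file-header lines and hunks."""
    file_headers = diff_lines[:2]  # the "---" and "+++" lines come first
    hunks = []                     # list of (hunk_header, hunk_body_lines)
    for line in diff_lines[2:]:
        if line.startswith("@@"):
            hunks.append((line, []))
        else:
            hunks[-1][1].append(line)  # body lines start with " ", "-" or "+"
    return file_headers, hunks

a = [f"line {i}\n" for i in range(1, 21)]
b = list(a)
b[1] = "LINE 2\n"    # change near the top
b[17] = "LINE 18\n"  # change near the bottom, far enough away for a 2nd hunk

headers, hunks = split_parts(list(difflib.unified_diff(a, b, "v1", "v2")))
print(headers)
for hunk_header, body in hunks:
    print(hunk_header.strip(), len(body), "body lines")
```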
The file names:
- In the unified diff format, the file names and their timestamps come before the differences between the two texts. The line for the first file begins with the characters “---”, and the line for the second file begins with “+++”.
- If no file names are passed to the Python function unified_diff(), these two lines still appear, but they contain only the “---” and “+++” markers.
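The sketch below compares the two header lines with and without names. The file name notes.txt and both timestamps are hypothetical values. When a date is given, difflib separates it from the name with a tab character.

```python
import difflib

a = ["x\n"]
b = ["y\n"]

# No names: the marker lines are still emitted, with nothing after the marker.
unnamed = list(difflib.unified_diff(a, b))[:2]

# Hypothetical names and timestamps: a tab separates each name from its date.
named = list(difflib.unified_diff(
    a, b,
    fromfile="notes.txt", tofile="notes.txt",
    fromfiledate="2024-01-01 09:00:00", tofiledate="2024-01-02 17:30:00",
))[:2]

print(unnamed)  # ['--- \n', '+++ \n']
print(named)
```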
The hunk header:
Each hunk begins with a hunk header, which starts and ends with “@@”.
The hunk header has the format "@@ -FromFileLine,fromcount +ToFileLine,tocount @@". It states that fromcount lines of fromfile, starting at line FromFileLine, correspond to tocount lines of tofile, starting at line ToFileLine. When a range is exactly one line long, the count is omitted, as in "@@ -1 +1 @@".
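A hunk header can be checked by parsing it back into numbers. In the minimal sketch below, parse_hunk_header() and first_hunk_header() are hypothetical helpers. The regular expression is an assumption written from the format described above. It treats a missing count as 1 and also accepts the "0,0" range that difflib emits when one side of a hunk is empty.

```python
import difflib
import re

# The ",count" part is optional: it is omitted for one-line ranges.
HUNK_RE = re.compile(r"^@@ -(\d+)(?:,(\d+))? \+(\d+)(?:,(\d+))? @@")

def parse_hunk_header(header: str):
    """Return (from_start, from_count, to_start, to_count) for an '@@' line."""
    m = HUNK_RE.match(header)
    if m is None:
        raise ValueError(f"not a hunk header: {header!r}")
    from_start, from_count, to_start, to_count = m.groups()
    # A missing count (None) means a one-line range.
    return (int(from_start), int(from_count or 1),
            int(to_start), int(to_count or 1))

def first_hunk_header(a, b):
    # Skip the "---" and "+++" lines; the third line is the first "@@" header.
    return list(difflib.unified_diff(a, b))[2]

examples = [
    (["a\n", "b\n", "c\n"], ["a\n", "B\n", "c\n", "d\n"]),  # multi-line ranges
    (["x\n"], ["y\n"]),                                     # one-line ranges
    ([], ["new\n"]),                                        # empty "from" side
]
for old, new in examples:
    header = first_hunk_header(old, new)
    print(header.strip(), parse_hunk_header(header))
```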
The differences or the contents of each hunk:
The lines that follow the hunk header are of three kinds:
- Lines of fromfile that were removed in tofile are marked with the prefix “-”.
- Lines added in tofile are marked with the prefix “+”.
- Lines that are unchanged between fromfile and tofile are marked with a single space. These are called context lines, and they appear in their original order among the changed lines. The number of context lines shown around each change is controlled by the "n" parameter of the unified_diff() function.
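The effect of the n parameter can be measured by counting the lines that start with a space. The sketch uses a made-up ten-line input in which only line 5 changes. context_line_count() is a hypothetical helper.

```python
import difflib

a = [f"{i}\n" for i in range(1, 11)]  # lines "1" .. "10"
b = list(a)
b[4] = "five\n"                        # change line 5 only

def context_line_count(n):
    """Count unchanged (context) lines in the diff produced with this n."""
    lines = list(difflib.unified_diff(a, b, n=n))[2:]  # drop file headers
    return sum(1 for line in lines if line.startswith(" "))

for n in (0, 1, 3):
    print(n, context_line_count(n))  # 0 -> 0, 1 -> 2, 3 -> 6

# With n=1, one unchanged line is kept on each side of the change.
print("".join(difflib.unified_diff(a, b, n=1)), end="")
```

With n=0 the hunk contains only the "-5" and "+five" lines, and its header shrinks to "@@ -5 +5 @@".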
Example:
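```python
# Example Python program that finds the differences between two lists of strings
import difflib

# Lists of strings (illustrative values)
a = ["Denmark\n", "Finland\n", "Sweden\n"]
b = ["Denmark\n", "Finland\n", "Iceland\n", "Norway\n", "Sweden\n"]

# Find the differences between two lists of strings
diff = difflib.unified_diff(a, b)

# Print the differences
print(diff)  # prints the generator object itself; the address varies
for line in diff:
    print(line, end="")  # each line already ends with "\n"
```
Output:
```
<generator object unified_diff at 0x104d253c0>
--- 
+++ 
@@ -1,3 +1,5 @@
 Denmark
 Finland
+Iceland
+Norway
 Sweden
```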