How to find corrupted PDF files in C# easily

sureshmohan

Suresh Mohan

Posted on October 3, 2019


You might have a lot of PDF files on your disk or in a database. Before you automate or process them, you need to find any corrupted files and take the necessary action. But opening every single file in a PDF reader to check whether it is corrupted is tedious.

To save time and effort, Syncfusion PDF Library, starting with the 2019 Volume 3 beta release, lets you identify corrupted PDF files in C# or VB.NET by checking the PDF format syntax.

Let’s dive into the details of how to find corrupted PDF files:

  • The PdfDocumentAnalyzer class finds corrupted PDF files by analyzing the PDF document structure and syntax.
  • The AnalyzeSyntax() method of the PdfDocumentAnalyzer class analyzes the PDF document structure and syntax and returns the result as an instance of SyntaxAnalyzerResult.
  • The IsCorrupted property of SyntaxAnalyzerResult indicates whether the processed PDF file is corrupted.
  • The Errors property of SyntaxAnalyzerResult lists the errors found, as PdfException instances.

Using these APIs, you can make sure a PDF document is not corrupted before you start processing it.

For example, you can use them:

  • To avoid uploading corrupted PDF reports or resumes to your web applications.
  • To avoid unexpected behavior or application hangs when printing PDF files programmatically.

The following C# code example will check whether the given PDF file is corrupted or not.

using System;
using System.IO;
using System.Text;
using Syncfusion.Pdf;
using Syncfusion.Pdf.Parsing;

class Program
{
    static void Main(string[] args)
    {
        //Load the PDF file as stream.
        using (FileStream pdfStream = new FileStream("inputFile.pdf", FileMode.Open, FileAccess.Read))
        {
            //Create a new instance of PDF document syntax analyzer.
            PdfDocumentAnalyzer analyzer = new PdfDocumentAnalyzer(pdfStream);
            //Analyze the syntax and return the results.
            SyntaxAnalyzerResult analyzerResult = analyzer.AnalyzeSyntax();

            //Check whether the document is corrupted or not.
            if (analyzerResult.IsCorrupted)
            {
                StringBuilder strBuilder = new StringBuilder();
                strBuilder.AppendLine("The PDF document is corrupted.");
                int count = 1;
                //List every error reported by the analyzer.
                foreach (PdfException exception in analyzerResult.Errors)
                {
                    strBuilder.AppendLine(count++.ToString() + ": " + exception.Message);
                }
                Console.WriteLine(strBuilder);
            }
            else
            {
                Console.WriteLine("No syntax error found in the provided PDF document");
            }
            //Release the analyzer.
            analyzer.Close();
        }
    }
}
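If you want an inexpensive first-pass filter before running the full syntax analysis (for example, to reject obviously broken uploads early), a check of the file header and end-of-file marker using only the .NET base library can help. The sketch below is a hypothetical helper: PdfQuickCheck and LooksLikePdf are illustrative names, not part of Syncfusion PDF Library, and the 1,024-byte tail window is an assumed value. It is a heuristic only; a file that passes it can still be corrupted, so it complements AnalyzeSyntax() rather than replacing it.

```csharp
using System;
using System.IO;
using System.Text;

// Hypothetical helper, NOT part of Syncfusion PDF Library.
// Cheap heuristic: checks only the "%PDF-" header and a trailing "%%EOF" marker.
// Passing this check does not mean the file is valid; use PdfDocumentAnalyzer for that.
public static class PdfQuickCheck
{
    // Assumed size of the trailing window searched for "%%EOF"
    // (writers may append a little whitespace or padding after the marker).
    private const int TailSize = 1024;

    public static bool LooksLikePdf(Stream stream)
    {
        // We need random access and at least a few bytes to inspect.
        if (!stream.CanSeek || stream.Length < 8)
            return false;

        // The file should start with the PDF header "%PDF-".
        stream.Seek(0, SeekOrigin.Begin);
        byte[] head = new byte[5];
        if (ReadFully(stream, head) != head.Length ||
            Encoding.ASCII.GetString(head) != "%PDF-")
            return false;

        // Search the last TailSize bytes (or the whole file, if smaller) for "%%EOF".
        int tailLength = (int)Math.Min(TailSize, stream.Length);
        stream.Seek(-tailLength, SeekOrigin.End);
        byte[] tail = new byte[tailLength];
        if (ReadFully(stream, tail) != tailLength)
            return false;
        return Encoding.ASCII.GetString(tail).Contains("%%EOF");
    }

    // Stream.Read may return fewer bytes than requested, so keep reading until
    // the buffer is full or the stream ends.
    private static int ReadFully(Stream stream, byte[] buffer)
    {
        int total = 0;
        while (total < buffer.Length)
        {
            int read = stream.Read(buffer, total, buffer.Length - total);
            if (read == 0)
                break;
            total += read;
        }
        return total;
    }
}
```

Some PDF readers tolerate extra bytes before the header, so this check can be stricter than they are. A file it rejects can still be handed to PdfDocumentAnalyzer for a full syntax analysis and a list of errors.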

Open and repair PDF file

Syncfusion PDF Library can repair basic cross-reference offset issues in PDF files and open them for further processing. To do this, use the PdfLoadedDocument constructor overloads that take an openAndRepair parameter.

The following code example will repair the basic cross-reference offset issues and open the PDF document.

using System.IO;
using Syncfusion.Pdf.Parsing;

class Program
{
    static void Main(string[] args)
    {
        using (FileStream pdfStream = new FileStream(@"input.pdf", FileMode.Open, FileAccess.Read))
        {
            //Load the corrupted document with the openAndRepair flag set to true to repair it.
            PdfLoadedDocument loadedPdfDocument = new PdfLoadedDocument(pdfStream, true);

            //Do PDF processing.

            //Save the document.
            using (FileStream outputStream = new FileStream(@"result.pdf", FileMode.Create))
            {
                loadedPdfDocument.Save(outputStream);
            }
            //Close the document.
            loadedPdfDocument.Close(true);
        }
    }
}

Note: The repair option handles only basic cross-reference offset issues; it cannot repair complex document corruption.
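To make "cross-reference offset issues" more concrete: a PDF file ends with the keyword startxref followed by the byte offset of its cross-reference section, and that section records the byte offset of each object. If bytes are inserted or removed after the file is written (for example, by a line-ending conversion during transfer), these offsets no longer point at the right places. The following sketch checks only the first link in that chain, whether the startxref offset lands on a cross-reference section. It is a hypothetical illustration (XrefOffsetCheck and StartXrefOffsetLooksValid are made-up names), not how Syncfusion PDF Library detects or repairs the problem, and it does not validate the individual object offsets inside the table.

```csharp
using System;
using System.Text;
using System.Text.RegularExpressions;

// Hypothetical illustration, NOT Syncfusion's repair logic: checks whether the
// "startxref" offset near the end of a PDF points at a cross-reference section.
public static class XrefOffsetCheck
{
    public static bool StartXrefOffsetLooksValid(byte[] pdf)
    {
        // ISO-8859-1 maps each byte to exactly one char, so string indices
        // equal byte offsets in the file.
        string text = Encoding.GetEncoding("iso-8859-1").GetString(pdf);

        // The last "startxref" keyword is followed by the byte offset of the
        // most recent cross-reference section.
        int keyword = text.LastIndexOf("startxref", StringComparison.Ordinal);
        if (keyword < 0)
            return false;

        Match number = Regex.Match(text.Substring(keyword + "startxref".Length), @"^\s*(\d+)");
        if (!number.Success || !long.TryParse(number.Groups[1].Value, out long offset))
            return false;
        if (offset >= text.Length)
            return false;

        // A classic table starts with "xref"; a cross-reference stream (PDF 1.5+)
        // starts with an object header such as "12 0 obj".
        int start = (int)offset;
        string atOffset = text.Substring(start, Math.Min(32, text.Length - start));
        return atOffset.StartsWith("xref", StringComparison.Ordinal)
            || Regex.IsMatch(atOffset, @"^\d+\s+\d+\s+obj");
    }
}
```

A stale offset like this is the simplest kind of damage: the objects themselves may be intact, but a reader following startxref cannot find the table that locates them.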

You can use these PDF corruption validation and repair APIs in .NET Framework, .NET Core, UWP, and Xamarin applications.

GitHub Sample

You can download the samples that check for corrupted PDF files and repair them from this location.

Conclusion

As you can see, Syncfusion PDF Library provides APIs to find out whether a PDF file is corrupt or not by analyzing its structure and syntax. It also provides APIs to repair basic cross-reference offset-level corruption in PDF files. You can use these to avoid unexpected behavior while processing the PDF files in your .NET applications.

If you are new to our PDF Library, we highly recommend that you follow our Getting Started guide.

If you’re already a Syncfusion user, you can download the product setup here. Otherwise, you can download a free, 30-day trial here.

Have any questions or require clarification about these features? Please let us know in the comments below. You can also contact us through our support forum, Direct-Trac or feedback portal. We are happy to assist you!


The post How to detect corrupted PDF files in C# easily appeared first on Syncfusion Blogs.
