The zakupki.gov.ru website provides a vast amount of data on government procurement, updated almost daily. The data is presented as pages, each containing links to FTP servers. These servers store hundreds of archives with the necessary information. For our project at Agentom.ru, we needed to automate the process of downloading and processing this data to ensure timely updates and availability for users. Manually performing this task would have been incredibly complex and time-consuming, so we developed a script that automates the entire process.
Development Stages
Collecting FTP Links
The first step was creating a script to collect all FTP links from the zakupki.gov.ru website. The site contains approximately 86 pages, each relating to a different region of Russia. We wrote a Python script that iterates through each page, extracts all links to FTP servers, and saves them to a list for further processing.
Automatic Connection and File Download
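A minimal sketch of this link-collection step might look like the following. The page URL pattern and page count here are illustrative assumptions, not the real zakupki.gov.ru paths, and the regex-based extraction is a simplification of what an HTML parser would do:

```python
import re
from urllib.request import urlopen

# Hypothetical URL pattern -- the real zakupki.gov.ru listing uses
# different paths and query parameters.
PAGE_URL = "https://zakupki.gov.ru/regions?page={}"
PAGE_COUNT = 86  # approximate number of region pages

def extract_ftp_links(html: str) -> list[str]:
    """Return every ftp:// URL found in a page's HTML."""
    return re.findall(r"ftp://[^\s\"'<>]+", html)

def collect_links() -> list[str]:
    """Walk the paginated listing and gather all FTP links."""
    links: list[str] = []
    for page in range(1, PAGE_COUNT + 1):
        html = urlopen(PAGE_URL.format(page)).read().decode("utf-8")
        links.extend(extract_ftp_links(html))
    return links

if __name__ == "__main__":
    print(len(collect_links()))
```

Keeping `extract_ftp_links` as a pure function makes it easy to test against saved page snapshots without hitting the network.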
After collecting all FTP links, our script automatically connects to each server and downloads all files. We used the ftplib library in Python, which provides functionality for working with FTP. The script connects to the server, finds all available files, and downloads them to local storage.
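The download step can be sketched with the standard-library ftplib as follows. The anonymous login and the flat remote directory layout are assumptions; real servers may require credentials and contain nested directories:

```python
import os
from ftplib import FTP

def download_all(host: str, remote_dir: str, local_dir: str) -> list[str]:
    """Connect to an FTP server and download every file in remote_dir."""
    os.makedirs(local_dir, exist_ok=True)
    downloaded = []
    with FTP(host, timeout=60) as ftp:
        ftp.login()  # anonymous login -- an assumption; some servers need credentials
        ftp.cwd(remote_dir)
        for name in ftp.nlst():
            local_path = os.path.join(local_dir, name)
            with open(local_path, "wb") as f:
                # retrbinary streams the file in chunks via the callback
                ftp.retrbinary(f"RETR {name}", f.write)
            downloaded.append(local_path)
    return downloaded
```

In practice, a wrapper loop would call `download_all` for each collected link, with retries and logging around transient connection failures.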
File Unarchiving
The downloaded files are mainly in archive format. We used the zipfile library to unarchive them. A special script iterates through each downloaded archive and unpacks its contents into a separate directory. This process was also automated to save time and reduce the risk of errors.
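The unarchiving step above can be sketched like this, assuming all archives are ZIP files sitting in a single directory:

```python
import zipfile
from pathlib import Path

def unpack_archives(archive_dir: str, out_root: str) -> list[Path]:
    """Extract every .zip in archive_dir into its own subdirectory of out_root."""
    extracted = []
    for archive in Path(archive_dir).glob("*.zip"):
        # Each archive gets a directory named after it, e.g. notices.zip -> notices/
        target = Path(out_root) / archive.stem
        target.mkdir(parents=True, exist_ok=True)
        with zipfile.ZipFile(archive) as zf:
            zf.extractall(target)
        extracted.append(target)
    return extracted
```

Unpacking each archive into its own directory keeps same-named XML files from different archives from overwriting each other.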
Data Processing and Structuring
After unarchiving the files, the script processes the XML files. These files contain structured data on government procurement. We wrote a Python parser that extracts the necessary data from the XML files and loads it into a MySQL database. We used the mysql-connector-python library to work with the database.
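A simplified sketch of the parse-and-load step is shown below. The element names (`purchaseNumber`, `purchaseObjectInfo`, `maxPrice`) and the `purchases` table are illustrative assumptions; the real zakupki.gov.ru schemas use namespaced elements that vary by document type:

```python
import xml.etree.ElementTree as ET

def parse_notice(xml_text: str) -> dict:
    """Extract a few fields from one procurement XML document.

    Tag names here are hypothetical placeholders for the real schema.
    """
    root = ET.fromstring(xml_text)
    return {
        "number": root.findtext("purchaseNumber"),
        "name": root.findtext("purchaseObjectInfo"),
        "price": root.findtext("maxPrice"),
    }

# Parameterized statement for mysql-connector-python; the table and
# column names are assumptions.
INSERT_SQL = (
    "INSERT INTO purchases (number, name, price) "
    "VALUES (%(number)s, %(name)s, %(price)s)"
)

def load(records: list[dict], conn) -> None:
    """Bulk-insert parsed records through a DB-API connection."""
    cur = conn.cursor()
    cur.executemany(INSERT_SQL, records)
    conn.commit()
    cur.close()
```

Separating parsing from loading means the parser can be unit-tested on sample XML files without a running database.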
Data Update and Display on Agentom.ru
The final step was to integrate the updated data on the Agentom.ru website. We set up a system that automatically updates the website's data based on the information loaded into the database. This ensures that Agentom.ru users always have access to current government procurement information.
Benefits of Automation
Automating the process of downloading and processing government procurement data has brought many advantages:
- Time Savings: The automated script performs the task much faster than a person could.
- Accuracy: The risk of human error is reduced when collecting and processing data.
- Timeliness: Website data is updated promptly, providing users with fresh information.
- Efficiency: Reduced labor costs allow us to focus on other important tasks.
Conclusion
The development of a script for the automatic download and processing of government procurement data from zakupki.gov.ru for Agentom.ru has been an important step in improving the functionality and efficiency of our service. This project demonstrated the possibilities of automating complex and time-consuming tasks, allowing us to provide our users with high-quality and up-to-date information in a convenient format.
Frequently Asked Questions
What is zakupki.gov.ru?
zakupki.gov.ru is the official website where data on government procurement in Russia is published.
How do you automate downloading data from zakupki.gov.ru?
To automate data download, we developed a Python script that collects FTP links, connects to servers, downloads and unarchives files, and then processes and loads the data into a MySQL database.
What is the benefit of automating government procurement data download?
Automation saves time, reduces the risk of errors, ensures data timeliness, and increases overall efficiency.
How is government procurement data used on Agentom.ru?
Government procurement data loaded into the database is automatically updated on the Agentom.ru website, providing users with current information on procurements.