Software Engineer, Data Acquisition - Paris/London
Mistral AI
This job is no longer accepting applications
Software Engineering
France
Posted 6+ months ago
About Mistral 
  - At Mistral AI, we are a tight-knit, nimble team dedicated to bringing our cutting-edge AI technology to the world.  
  - Our mission is to make AI ubiquitous and open.  
  - We are creative, low-ego, team-spirited, and have been passionate about AI for years.  
  - We hire people who thrive in competitive environments, because they find them more fun to work in.
  - We hire passionate women and men from all over the world. 
  - Our teams are distributed across France, the UK, and the USA.
  Role Summary 
  - We are seeking a skilled and motivated Web Crawling and Data Indexing Engineer to join our dynamic engineering team.  
  - The ideal candidate will have a strong background in web scraping, data extraction and indexing, with a focus on leveraging advanced tools and technologies to gather and process large-scale data from various web sources. 
  - The role is based in Paris or London  
  Key Responsibilities 
  - Develop and maintain web crawlers using Python libraries such as Beautiful Soup to extract data from target websites. 
  - Utilize headless browsing techniques, such as Chrome DevTools, to automate and optimize data collection processes. 
  - Collaborate with cross-functional teams to identify, scrape, and integrate data from APIs to support business objectives. 
  - Create and implement efficient parsing patterns using regular expressions, XPaths, and CSS selectors to ensure accurate data extraction. 
  - Design and manage distributed job queues using technologies such as Redis, Kubernetes, and Postgres to handle large-scale data processing tasks. 
  - Develop strategies to monitor and ensure data quality, accuracy, and integrity throughout the crawling and indexing process. 
  - Continuously improve and optimize existing web crawling infrastructure to maximize efficiency and adapt to new challenges. 
  Qualifications & profile 
  - Bachelor’s or master’s degree in computer science, information systems, or information technology 
  - Strong understanding of web technologies, data structures, and algorithms.  
  - Knowledge of database management systems and data warehousing.
  - Programming Languages: Proficiency in programming languages such as Python, Java, or C++ is essential.  
  - Mastery of web technologies: understanding of HTML, CSS, and JavaScript is crucial for navigating and scraping data from websites.
  - Knowledge of HTTP and HTTPS protocols 
  - A good understanding of data structures (like queues, stacks, and hash maps) and algorithms is necessary  
  - Knowledge of databases (SQL or NoSQL) is important to store and manage the crawled data. 
  - Understanding of distributed systems and technologies like Hadoop or Spark.
  - Experience using web scraping libraries and frameworks like Scrapy, Beautiful Soup, Selenium, or MechanicalSoup.
  - Understanding of how search engines work and how to optimize web crawling.
  - Experience in machine learning to improve the efficiency and accuracy of web crawling.
  - Familiarity with tools such as Pandas, NumPy, and Matplotlib to analyze and visualize data.
  Benefits 
  - Daily lunch vouchers  
  - Contribution to a Gympass subscription  
  - Monthly contribution to a mobility pass  
  - Full health insurance for you and your family  
  - Generous parental leave policy  