Apify Actor

Overview

Apify Actors are cloud programs for web scraping, crawling, and data extraction. They automate data gathering from the web, for example scraping e-commerce sites for product details, monitoring price changes, or collecting search engine results. Actors store their output in Apify Datasets, where the structured data can be managed and exported as JSON, CSV, or Excel for further analysis.

Setup

This integration lives in the langchain-apify package, which can be installed with pip.

%pip install langchain-apify

Prerequisites

  • Apify account: Register your free Apify account here.
  • Apify API token: Learn how to get your API token in the Apify documentation.

import os

os.environ["APIFY_API_TOKEN"] = "your-apify-api-token"
os.environ["OPENAI_API_KEY"] = "your-openai-api-key"
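Hard-coding tokens in a notebook makes them easy to leak. As a minimal sketch, the credentials could instead be requested interactively, and only when they are not already set. The helper name `require_env` is hypothetical and not part of langchain-apify; it relies only on the standard library.

```python
import os
from getpass import getpass
from typing import Callable


def require_env(name: str, prompt: Callable[[str], str] = getpass) -> str:
    """Return os.environ[name], prompting for it once if it is unset or empty."""
    value = os.environ.get(name, "").strip()
    if not value:
        # Ask without echoing the secret to the screen (getpass by default).
        value = prompt(f"Enter {name}: ").strip()
        if not value:
            raise ValueError(f"{name} must not be empty")
        os.environ[name] = value
    return value
```

Calling `require_env("APIFY_API_TOKEN")` and `require_env("OPENAI_API_KEY")` before instantiating the tool would then replace the two assignments above.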

Instantiation

Here we instantiate ApifyActorsTool so that it can call the RAG Web Browser Apify Actor. This Actor provides web browsing functionality for AI and LLM applications, similar to the web browsing feature in ChatGPT. Any Actor from the Apify Store can be used in this way.

from langchain_apify import ApifyActorsTool

tool = ApifyActorsTool("apify/rag-web-browser")

Invocation

ApifyActorsTool takes a single argument, run_input: a dictionary that is passed to the Actor as its run input. The run input schema is documented in the input section of the Actor's details page; see the RAG Web Browser input schema.

tool.invoke({"run_input": {"query": "what is apify?", "maxResults": 2}})
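Because the Actor fields must be nested under a `run_input` key, it can help to build the payload in one place. The following is a sketch only: `build_run_input` is a hypothetical helper, and the field names `query`, `maxResults`, and `outputFormats` are taken from the examples on this page rather than from the full input schema.

```python
from typing import Any


def build_run_input(query: str, max_results: int = 2, **extra: Any) -> dict:
    """Wrap Actor input fields in the {"run_input": {...}} shape used above."""
    if not query.strip():
        raise ValueError("query must not be empty")
    if max_results < 1:
        raise ValueError("max_results must be at least 1")
    # Extra keyword arguments are passed through unchanged as Actor input fields.
    actor_input = {"query": query, "maxResults": max_results, **extra}
    return {"run_input": actor_input}


payload = build_run_input("what is apify?", max_results=2)
print(payload)  # {'run_input': {'query': 'what is apify?', 'maxResults': 2}}
```

The resulting dictionary can be passed directly as `tool.invoke(payload)`.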

Chaining

We can pass the tool to an agent. When asked to search for information, the agent calls the Apify Actor, which searches the web and returns the results.

%pip install langgraph langchain-openai

from langchain_core.messages import ToolMessage
from langchain_openai import ChatOpenAI
from langgraph.prebuilt import create_react_agent

model = ChatOpenAI(model="gpt-4o")
tools = [tool]
graph = create_react_agent(model, tools=tools)

inputs = {"messages": [("user", "search for what is Apify")]}
for s in graph.stream(inputs, stream_mode="values"):
    message = s["messages"][-1]
    # skip tool messages
    if isinstance(message, ToolMessage):
        continue
    message.pretty_print()

================================ Human Message =================================

search for what is Apify
================================== Ai Message ==================================
Tool Calls:
apify_actor_apify_rag-web-browser (call_27mjHLzDzwa5ZaHWCMH510lm)
Call ID: call_27mjHLzDzwa5ZaHWCMH510lm
Args:
run_input: {"run_input":{"query":"Apify","maxResults":3,"outputFormats":["markdown"]}}
================================== Ai Message ==================================

Apify is a comprehensive platform for web scraping, browser automation, and data extraction. It offers a wide array of tools and services that cater to developers and businesses looking to extract data from websites efficiently and effectively. Here's an overview of Apify:

1. **Ecosystem and Tools**:
- Apify provides an ecosystem where developers can build, deploy, and publish data extraction and web automation tools called Actors.
- The platform supports various use cases such as extracting data from social media platforms, conducting automated browser-based tasks, and more.

2. **Offerings**:
- Apify offers over 3,000 ready-made scraping tools and code templates.
- Users can also build custom solutions or hire Apify's professional services for more tailored data extraction needs.

3. **Technology and Integration**:
- The platform supports integration with popular tools and services like Zapier, GitHub, Google Sheets, Pinecone, and more.
- Apify supports open-source tools and technologies such as JavaScript, Python, Puppeteer, Playwright, Selenium, and its own Crawlee library for web crawling and browser automation.

4. **Community and Learning**:
- Apify hosts a community on Discord where developers can get help and share expertise.
- It offers educational resources through the Web Scraping Academy to help users become proficient in data scraping and automation.

5. **Enterprise Solutions**:
- Apify provides enterprise-grade web data extraction solutions with high reliability, 99.95% uptime, and compliance with SOC2, GDPR, and CCPA standards.

For more information, you can visit [Apify's official website](https://apify.com/) or their [GitHub page](https://github.com/apify) which contains their code repositories and further details about their projects.
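When only the final answer is needed rather than the full printed trace, the message list can be scanned from the end. This is a rough sketch, assuming each message exposes a `type` attribute ("human", "ai", "tool") and string `content`, as LangChain message objects generally do; `last_ai_text` is a hypothetical helper, and the demo uses stand-in objects instead of a live agent run.

```python
from types import SimpleNamespace
from typing import Any, Iterable, Optional


def last_ai_text(messages: Iterable[Any]) -> Optional[str]:
    """Return the content of the most recent non-empty AI message, or None."""
    for message in reversed(list(messages)):
        # Skip human and tool messages, and AI turns that only contain tool calls.
        if getattr(message, "type", None) == "ai" and message.content:
            return message.content
    return None


# Stand-in objects that mimic the message shape; no network call is made here.
demo = [
    SimpleNamespace(type="human", content="search for what is Apify"),
    SimpleNamespace(type="ai", content=""),  # tool-call turn with no text
    SimpleNamespace(type="tool", content="...search results..."),
    SimpleNamespace(type="ai", content="Apify is a platform for web scraping."),
]
print(last_ai_text(demo))  # Apify is a platform for web scraping.
```

With the agent defined above, the analogous call would be `last_ai_text(graph.invoke(inputs)["messages"])`, which requires network access and valid credentials.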

API reference

For more information on how to use this integration, see the git repository or the Apify integration documentation.
