About SearchLinkGraph #915

Open
salihai opened this issue Feb 3, 2025 · 6 comments
Labels
bug Something isn't working

Comments


salihai commented Feb 3, 2025

Even though I set the filter_config parameter, SearchLinkGraph still returns links with extensions such as .css and .js. What should I do?

graph_config = {
    "llm": {
        "model": "ollama/llama3.2",
        "temperature": 0,
        "format": "json",
        "base_url": "http://localhost:11434",
    },
    "verbose": True,
    "headless": False,
    "filter_config": {
        "diff_domain_filter": True,
        "img_exts": [".jpg", ".jpeg", ".png", ".gif", ".bmp", ".svg", ".webp", ".ico"],
        "lang_indicators": ["lang=", "/tr", "/en", "/fr", "/pt", "/es", "/de", "/jp", "/it"],
        "irrelevant_keywords": [
            "/login",
            "/signup",
            "/register",
            "/contact",
            "facebook.net",
            "twitter.com",
            "linkedin.com",
            "instagram.com",
            ".js",
            ".css",
        ],
    },
}


dosubot bot commented Feb 3, 2025

Hey there, @salihai! 👋 I'm here to help you with any bugs, questions, or contributions while waiting for a human maintainer.

To ensure that the SearchLinkGraph filters out links with extensions such as .css and .js, you should verify that the filter_links setting is enabled. The filtering for irrelevant keywords, including file extensions, is controlled by the filter_links flag. Make sure it is set to True in your configuration.

Here's how you can adjust your configuration:

graph_config = {
    "llm": {
        "model": "ollama/llama3.2",  
        "temperature": 0,  
        "format": "json",  
        "base_url": "http://localhost:11434",
    },
    "verbose": True, 
    "headless": False, 
    "filter_links": True,  # Ensure this is set to True
    "filter_config": {  
        "diff_domain_filter": True,
        "img_exts": [".jpg", ".jpeg", ".png", ".gif", ".bmp", ".svg", ".webp", ".ico"],
        "lang_indicators": ["lang=", "/tr", "/en", "/fr", "/pt", "/es", "/de", "/jp", "/it"],
        "irrelevant_keywords": [
            "/login",
            "/signup",
            "/register",
            "/contact",
            "facebook.net",
            "twitter.com",
            "linkedin.com",
            "instagram.com",
            ".js",
            ".css",
        ],
    },
}

By setting filter_links to True, the filtering mechanism will be activated, and links containing .js and .css should be filtered out as specified in the irrelevant_keywords list [1].
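If the built-in filter still lets asset links through, one workaround is to post-filter the returned links yourself. The sketch below is a hypothetical helper, not part of ScrapeGraphAI: the name drop_asset_links and the sample URLs are assumptions made for illustration. It matches on the extension at the end of the URL path rather than on a substring, so a page such as /node.js-guide is kept while app.js?v=3 is dropped.

```python
from urllib.parse import urlparse

# Hypothetical helper (not a ScrapeGraphAI API): removes links whose URL path
# ends with one of the blocked extensions. Query strings are ignored because
# only the parsed path is inspected.
BLOCKED_EXTS = (".js", ".css")


def drop_asset_links(links, blocked_exts=BLOCKED_EXTS):
    kept = []
    for link in links:
        path = urlparse(link).path.lower()
        # str.endswith accepts a tuple of suffixes
        if not path.endswith(tuple(blocked_exts)):
            kept.append(link)
    return kept


# Sample links, made up for this example
links = [
    "https://example.com/docs/intro",
    "https://example.com/static/app.js?v=3",
    "https://example.com/static/site.css",
    "https://example.com/node.js-guide",
]
print(drop_asset_links(links))
```

You would apply this to whatever list of links the graph returns, assuming the result can be read as a list of URL strings.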


@dosubot dosubot bot added the bug Something isn't working label Feb 3, 2025

salihai commented Feb 3, 2025

Adding this didn't work.


salihai commented Feb 5, 2025

from smart_scraper_multi_graph import SmartScraperMultiGraph

prompt = "What is Chioggia famous for?"
source = ["https://en.wikipedia.org/wiki/Chioggia", "https://example.com"]
config = {
    "llm": {"model": "gpt-3.5-turbo"}
}

smart_scraper_multi = SmartScraperMultiGraph(prompt, source, config)
result = smart_scraper_multi.run()
print(result)

When I ran this code, I got this error message:

[WinError 10061] No connection could be made because the target machine actively refused it

What should I do to solve this?


dosubot bot commented Feb 5, 2025

The error message "[WinError 10061] No connection could be made because the target machine actively refused it" typically indicates that the server you are trying to connect to is not accepting connections. Here are some steps you can take to resolve this issue:

  1. Check Server Availability: Ensure that the server you are trying to connect to is running and accessible. If you are running a local server, make sure it is started and listening on the correct port.

  2. Verify URL and Port: Double-check the URL and port you are using in your code. Ensure that they are correct and that the server is configured to accept connections on that port.

  3. Network and Firewall Settings: Check your network connection and firewall settings. Ensure that your firewall or any security software is not blocking the connection to the server.

  4. Localhost Configuration: If you are using a local server (e.g., localhost), ensure that it is configured to accept connections from your application. Sometimes, local servers are configured to only accept connections from certain IP addresses.

  5. Service Status: If you are using a third-party service or API, check their status page to see if there are any ongoing issues or maintenance that might be affecting connectivity.

By addressing these areas, you should be able to resolve the connection issue.
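A quick way to check the first two points is to test whether anything is listening on the host and port your configuration points at. The following is a minimal sketch using only the Python standard library; the function name port_is_open is made up for this example, and http://localhost:11434 is taken from the base_url in the earlier configuration in this thread, so substitute whichever endpoint your code actually connects to.

```python
import socket
from urllib.parse import urlparse


def port_is_open(base_url, timeout=2.0):
    """Return True if a TCP connection to the URL's host and port succeeds."""
    parsed = urlparse(base_url)
    host = parsed.hostname
    # Fall back to the scheme's default port when none is given in the URL
    port = parsed.port or (443 if parsed.scheme == "https" else 80)
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers a refused connection (such as WinError 10061), timeouts,
        # and name resolution failures
        return False


# base_url from the configuration posted earlier in this thread
print(port_is_open("http://localhost:11434"))
```

If this prints False, the server is not running or is listening on a different port, and the scraper will fail with the same connection error regardless of its other settings.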


VinciGit00 (Collaborator) commented

Hi @salihai, I will update

VinciGit00 (Collaborator) commented

Hi @salihai, please update to the new beta.

2 participants