I am trying to scrape only the content from the article, but its div includes many other things in scrapy s

Time:10-23

I am trying to scrape only the content of the article, but its div includes many other things.

When I run the code I get Article_Content = ''. Can anyone fix this code so that it returns only the article's content?

Here is the URL
https://www.fodors.com/world/asia/india/experiences/news/things-you-need-to-know-before-you-visit-india

This is my code

    # Content = {}
    # header, paragraphs = "", []
    # for element in response.xpath('//*[@]/*'):
    #     tag = element.re(r"<(\w+)\s")  # get the tag name
    #     # if it's a paragraph, add its text to the paragraph list
    #     if tag[0] == "p":
    #         paragraphs += element.xpath(".//text()").getall()
    #     # if it's a heading, store the previous heading and its paragraphs
    #     # in the dictionary and start a new heading with the current text.
    #     elif tag[0] == "h3":
    #         Content[header] = ''.join(paragraphs).strip()
    #         header = ' '.join(element.xpath(".//text()").getall()).strip()
    #         paragraphs = []

    
    Article_Content = response.xpath('//*[@]/text()')
    Content = '\n'.join(Article_Content.getall()).strip()

    yield {
        'Category': Category,
        'Headlines': Headlines,
        'Author': Author,
        'Source': Source,
        'Publication Date': Published_Date,
        'Feature_Image': Feature_Image,
        'Article Content': Content
    }

CodePudding user response:

You need to use a more precise locator.
Instead of the locator for the parent block element, try the following one:

'//*[@]//p | //*[@]//h2'

This locator matches every <p> and <h2> element inside that block.
It returns a list of selectors, so you then have to iterate over that list and extract the text content of each one separately.

CodePudding user response:

You can try the following example:

import scrapy


class TestSpider(scrapy.Spider):
    name = 'test'

    def start_requests(self):
        yield scrapy.Request(
            'https://www.fodors.com/world/asia/india/experiences/news/things-you-need-to-know-before-you-visit-india',
            callback=self.parse,
        )

    def parse(self, response):
        # Select the matched block together with the <p> siblings that follow it.
        content = response.xpath('//*[@] | //*[@]/following-sibling::p')
        # Join every descendant text node into one string.
        yield {'content': ''.join(content.xpath('.//text()').getall())}

Output:

{'content': 'Saying that India is vast is an understatement. Its population of 1.38 billion ascribes to many different religions and beliefs. People are spread across 28 states and eight union territories; they communicate in hundreds of languages and local dialects and practice diverse cultural norms.From bustling modern cities and iconic architectural wonders like the Taj Mahal and Hawa Mahal to natural attractions, such as Araku Valley’s Borra Caves and the massive salt marshes of the Great Rann of Kutch, this South Asian nation is a wondrous place to visit. However, it 
can also be daunting for a first-time traveler, especially one whose culture and belief systems drastically differ from the average Indian.The myriad cultural and religious differences, and the contrast between the urban and rural areas, mean there are a lot of dos and don’ts in order to be respectful and considerate of the local people and their customs. This etiquette could be as simple as removing your shoes when entering sacred places or as nuanced as the 
way you dress and express public displays of affection in certain circumstances. To make things easier, we have put 
together ten things to remember when traveling to India.\xa0When visiting a temple, mosque, or Sikh temple (known as gurudwara), you may witness many shoes lined up outside the doors. This is because footwear is not allowed in sacred places, which also applies to most homes. According to Shoba Mohan, Founder of RARE India, “The shoe adorns the lowest part of your body—where everything collects and settles— you take that off in reverence and leave the dust of the world outside.” RARE India offers travelers boutique hotel and destination recommendations that promote conservation and community-based tourism.“In many of the religious spaces like Hindu temples and Sikh gurudwaras, you have to wash your feet or walk through a constantly replenished shallow tank with water,” she adds. Upon entering the religious space, Mohan recommends bowing one’s head to pay respect. “Apex is your head, the resting place of all your ego and thought, and you bow your head to pay obeisance, to set your ego aside and acknowledge a force greater than oneself.”It’s also recommended not to talk loudly or to take photographs of the sanctum, as they are considered to be the energy center and highly powerful. Also, refrain from photographing others in the space out of respect for privacy. If you are visiting temples that are no longer active, where there is no deity or rituals conducted, then these rules don’t apply.Many Indian households leave shoes and sandals outside to keep their homes clean from the dirt from the streets. This cultural norm may not be as common in bigger cities, but it’s good to either ask or observe where footwear is left before entering someone’s home.In all places of worship, wearing conservative clothing is considered a sign of respect. Entering a mosque or a gurudwara requires a head covering. Even in churches, which generally do not have this rule, women can be seen doing so.If your itinerary brings you to different locations throughout the 
day, Mohan recommends wearing conservative clothing to be on the safe side. “The rule book is how much skin are you 
showing and how exposed are you–the less, the better,” she shares. “Even as an Indian, carrying a shrug or a scarf when there is a multi-interest activity has always helped me.”While women can generally be seen wearing clothes as they please in hotels, bars, and restaurants in urban centers, it is advisable to dress modestly in public places, trains, historical places, and other areas where you may encounter others. Often, smaller towns and villages are not used to seeing foreigners, and an inappropriate dress code could bring unwanted attention.Lisa Alam Shah, Executive Director of Micato India, based in Delhi, says it is common to see short skirts and shorts in her cosmopolitan city. However, there are still conservative homes in the city where women are expected to dress conservatively. “Some would say no shorts/short skirts, arms covered; the very conservative homes may expect women to wear Indian clothes.” Shah states this is also done out of respect for the elders in the home. “You may dress as you like when out with friends, but be a little more conservative with your grandparents.”Another reason to cover up, according to Mohan, is the annoyance of mosquitoes; depending on the season, your exposed skin is a target for attack.The Hindu caste system, 
which originated around 1000-1500 B.C., dictates a person’s standing, their earning capacity, and whom they can marry. This hierarchical system kept the population segregated and resulted in prejudice towards those in the lower castes (such as the Dalits).In modern India, these caste divisions are still relevant in people’s day-to-day lives, and 
caste discrimination remains a systemic problem. Tensions run high when the topic comes up, says Ruksana Hussain, a 
Communication Consultant based in California. “It is very possible your host is of one caste and the maid working in their house, or their driver is of a lesser caste, and they may all have strong, opposing views about this,” she says. Given the topic’s sensitivity, it is best to refrain from asking about someone’s caste or even discussing it when visiting.Another sensitive topic to avoid is politics, whether you are at a party or visiting someone. Indian politics isn’t just relegated to the ruling government, it includes state governments at the local level, whereby multiple parties, each with their own agenda, vie for votes.“Should you find yourself in a room where people invested in several opposing political parties are present, and choose to discuss neighborhood poverty or lack of infrastructure 
or rising crime, then it could lead to a tense situation very quickly,” warns Hussain. As a guest, she adds, “you will visit and leave, but that [discussion] can continue to linger as a sour spot for your hosts for a long time after.”A majority of Indians are still not comfortable with public displays of affection between couples, and, to that effect, PDA is still rare. India is also quite contradictory, with some kinds of intimate touch culturally normalized, says Dr. Anu Taranath, professor and racial equity consultant and author of the award-winning book Beyond Guilt Trips: Mindful Travel in an Unequal World. During your travels, you may witness close intimacy between male friends, who casually put their arms around each other’s shoulders or hold hands while walking or hanging out in a public park.“While this kind of touch isn’t necessarily seen as sexual, public displays of partner intimacy [e.g., kissing] between couples–both same-sex and heterosexual–are discouraged and will elicit immediate unwelcome attention,” says Dr. 
Taranath.Any kind of PDA will get noticed, sometimes commented upon, or gawked at. This is “less so perhaps in a nightclub in Delhi or Mumbai, but, in general, PDA is not common even in cities,” says Shah. The extent of PDA you will likely see is a couple holding hands or in a loose embrace, but it stops there. Be extra careful in villages and small towns when expressing affection for your loved ones.In 2018, the Supreme Court of India repealed a British colonial-era penal code known as Section 377 that criminalized homosexuality. While this was a win for the LGBTQ community, long-standing prejudices and cultural attitudes remain, says Dr. Taranath. LGBTQ couples are welcomed by the well-established hospitality industry and will feel safe traveling in big cities. However, that may not be the case in small rural areas. “LGBTQ couples may experience curiosity, continuous staring, or active prejudice,” she says.One caveat is that foreigners are considered a novelty and, thus, are allowed and even expected to do things that local people would not. This gives these visitors–LGBTQ and otherwise–more leeway, says Dr. Taranath. However, it is recommended to abide by the cultural norms and tone down the PDA.Similar to not entering a home or sacred place with shoes 
on because feet are perceived as unclean, many homes observe another custom whereby they don’t touch any objects or 
people with their feet. “One wouldn’t touch anything in a home temple with a foot. Or a book which is associated with knowledge,” says Shah.In some households, even touching someone with your feet is considered inappropriate and disrespectful, according to Mohan. “It is not only a Hindu custom but prevalent among all cultures in India. The sentiment appears as metaphors in our myths, folklore, and proverbs. To be under one’s feet is the lowest position for anyone,” she adds. If you inadvertently touch someone or a book with your feet, apologize to show that you didn’t mean 
disrespect.Hindus make up close to 80% of India’s population, and Muslims account for 14.2%, according to a Pew research study. Typically, Hindus refrain from eating beef as the cow is considered holy, and Muslims do not consume pork as it’s forbidden in Islam. When dining out, it’s important to be cognizant and respectful of these religious ideologies.In major Indian cities, pork is available, but beef is still a matter of much debate. If it is offered on the menu, it’s acceptable to order, but keep in mind that it could be buffalo meat or of not good quality. This may not be an issue if you are at a popular, high-end hotel or restaurant. In smaller restaurants and small towns, if beef 
or pork is not on the menu, it could be offensive to order it.“Indians are usually inquisitive, even among themselves. The elderly will freely ask you about your married status and anything else they want to know,” shares Mohan. For people not used to sharing personal information, this may feel like an invasion of privacy. However, it’s important to know that this comes from a place of curiosity and never ill will.“In a social setting, one can smile and ignore or simply laugh it off and say one doesn’t normally get asked a question like that. Or that you’re not comfortable 
to answer–with a smile,” recommends Shah.In another scenario, white tourists may encounter villagers who have never 
seen someone like them before in their life, and ask to be photographed with them. While this may be an ego boost, this friendly interaction, Dr. Taranath explains, “is actually rooted in the problematic colonial ‘white is better’ dynamics.” She recommends simply smiling and shaking your head as you decline. “Learn how to say ‘no’ in the local dialect of wherever you are. Say, ‘Thank you, no’ [in the Indian language] and move on.”If they insist or want to chat, it’s perfectly acceptable to refuse or point to your watch and say you have to go. Traveling with a tour operator 
like Micato can ensure that there is an Indian guide with you who can help divert or block inappropriate questioning and requests.The phrase “Thank you” is not commonly used in Indian rhetoric. Indians generally express their gratitude using body language, with a smile and a nod, or sometimes with a namaste. On a typical traveler’s journey, especially in hotels and tourist areas, saying thank you is well accepted. This is not the case when you visit an extremely remote place where people don’t understand English and/or are shy.“Using it [the term thank you] with humility, showing gratitude through body language and facial expressions, is very effective and could be used with a thank you,” recommends Shah.Similarly, shaking hands or hugging is not widely performed when greeting someone, especially in a world still experiencing a pandemic. “Even hotel staff have been trained not to shake hands, therefore, in this new normal, I’d say simply folding your hands in a namaste is best,” suggests Shah.Overall, a basic rule of thumb when 
visiting India is to be kind, understanding, and humble. Indian people are extremely welcoming of visitors from different cultures and are happy to counsel without judgment. The rest of it, says Mohan, “is your cultural intelligence and acceptance of a new culture and how respectful you wish to be while you are there.”'}