Remove all punctuation from string

Time:11-27

I am working on a pandas DataFrame and trying to extract values from a column that contains strings within a list, but I am stuck on how to keep only the text I want.

This is what one of the lists looks like:

["{'BusinessAcceptsCreditCards': 'True'",
 "'RestaurantsPriceRange2': '2'",
 "'ByAppointmentOnly': 'False'",
 "'BikeParking': 'False'",
 '\'BusinessParking\': "{\'garage\': False',
 "'street': True",
 "'validated': False",
 "'lot': False",
 '\'valet\': False}"}']

The attribute is on the left of the colon and the corresponding value is on the right. Is there a way to go over this list, remove all the punctuation from each string, and keep only the text of both the attribute and its value?

My idea is to first split on the colon, using the following code:

txt = df_business['attributes'][2]
y = txt.split(", ")
y
y1 = y[0].split(":")
y1
y1[1].strip()

But with the code above, I only get the following result:

Attribute = "{'BusinessAcceptsCreditCards'"
Value = "'True'"

The result I want is:

Attribute = "BusinessAcceptsCreditCards"
Value = "True"
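One possible way to get from the current result to the desired one is to strip punctuation from both ends of each half after splitting on the colon. The following is only a minimal sketch: the helper name clean_pair is hypothetical, and it assumes each item holds exactly one flat 'attribute': 'value' pair. Nested items such as the BusinessParking entry contain two colons and would not come out cleanly this way.

```python
import string

# Characters to trim from both ends: all ASCII punctuation plus whitespace.
JUNK = string.punctuation + string.whitespace

def clean_pair(item):
    """Split one "'attribute': 'value'" string and strip surrounding punctuation.

    Hypothetical helper; only the ends are stripped, so inner characters
    such as the underscores in 'beer_and_wine' are preserved.
    """
    key, _, value = item.partition(":")
    return key.strip(JUNK), value.strip(JUNK)

# The first four (flat) items of the list shown in the question.
items = [
    "{'BusinessAcceptsCreditCards': 'True'",
    "'RestaurantsPriceRange2': '2'",
    "'ByAppointmentOnly': 'False'",
    "'BikeParking': 'False'",
]

pairs = [clean_pair(item) for item in items]
print(pairs[0])  # ('BusinessAcceptsCreditCards', 'True')
```

Stripping only the ends, rather than deleting every punctuation character, keeps values that legitimately contain punctuation intact.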

Example of the dataframe:

{'business_id': {0: '6iYb2HFDywm3zjuRg0shjw',
  1: 'tCbdrRPZA0oiIYSmHG3J0w',
  2: 'bvN78flM8NLprQ1a1y5dRg',
  3: 'oaepsyvc0J17qwi8cfrOWg',
  4: 'PE9uqAjdw0E4-8mjGl3wVA',
  5: 'D4JtQNTI4X3KcbzacDJsMw',
  6: 't35jsh9YnMtttm69UCp7gw',
  7: 'jFYIsSb7r1QeESVUnXPHBw',
  8: 'N3_Gs3DnX4k9SgpwJxdEfw'},
 'name': {0: 'Oskar Blues Taproom',
  1: 'Flying Elephants at PDX',
  2: 'The Reclaimory',
  3: 'Great Clips',
  4: 'Crossfit Terminus',
  5: 'Bob Likes Thai Food',
  6: 'Escott Orthodontics',
  7: 'Boxwood Biscuit',
  8: 'Lane Wells Jewelry Repair'},
 'address': {0: '921 Pearl St',
  1: '7000 NE Airport Way',
  2: '4720 Hawthorne Ave',
  3: '2566 Enterprise Rd',
  4: '1046 Memorial Dr SE',
  5: '3755 Main St',
  6: '2511 Edgewater Dr',
  7: '740 S High St',
  8: '7801 N Lamar Blvd, Ste A140'},
 'city': {0: 'Boulder',
  1: 'Portland',
  2: 'Portland',
  3: 'Orange City',
  4: 'Atlanta',
  5: 'Vancouver',
  6: 'Orlando',
  7: 'Columbus',
  8: 'Austin'},
 'state': {0: 'CO',
  1: 'OR',
  2: 'OR',
  3: 'FL',
  4: 'GA',
  5: 'BC',
  6: 'FL',
  7: 'OH',
  8: 'TX'},
 'postal_code': {0: '80302',
  1: '97218',
  2: '97214',
  3: '32763',
  4: '30316',
  5: 'V5V',
  6: '32804',
  7: '43206',
  8: '78752'},
 'latitude': {0: 40.0175444,
  1: 45.5889058992,
  2: 45.5119069956,
  3: 28.9144823,
  4: 33.7470274,
  5: 49.2513423,
  6: 28.573998,
  7: 39.947006523,
  8: 30.346169},
 'longitude': {0: -105.2833481,
  1: -122.5933307507,
  2: -122.6136928797,
  3: -81.2959787,
  4: -84.3534244,
  5: -123.101333,
  6: -81.3892841,
  7: -82.997471,
  8: -97.711458},
 'stars': {0: 4.0,
  1: 4.0,
  2: 4.5,
  3: 3.0,
  4: 4.0,
  5: 3.5,
  6: 4.5,
  7: 4.5,
  8: 5.0},
 'review_count': {0: 86,
  1: 126,
  2: 13,
  3: 8,
  4: 14,
  5: 169,
  6: 7,
  7: 11,
  8: 30},
 'is_open': {0: 1, 1: 1, 2: 1, 3: 1, 4: 1, 5: 1, 6: 1, 7: 1, 8: 1},
 'attributes': {0: '{\'RestaurantsTableService\': \'True\', \'WiFi\': "u\'free\'", \'BikeParking\': \'True\', \'BusinessParking\': "{\'garage\': False, \'street\': True, \'validated\': False, \'lot\': False, \'valet\': False}", \'BusinessAcceptsCreditCards\': \'True\', \'RestaurantsReservations\': \'False\', \'WheelchairAccessible\': \'True\', \'Caters\': \'True\', \'OutdoorSeating\': \'True\', \'RestaurantsGoodForGroups\': \'True\', \'HappyHour\': \'True\', \'BusinessAcceptsBitcoin\': \'False\', \'RestaurantsPriceRange2\': \'2\', \'Ambience\': "{\'touristy\': False, \'hipster\': False, \'romantic\': False, \'divey\': False, \'intimate\': False, \'trendy\': False, \'upscale\': False, \'classy\': False, \'casual\': True}", \'HasTV\': \'True\', \'Alcohol\': "\'beer_and_wine\'", \'GoodForMeal\': "{\'dessert\': False, \'latenight\': False, \'lunch\': False, \'dinner\': False, \'brunch\': False, \'breakfast\': False}", \'DogsAllowed\': \'False\', \'RestaurantsTakeOut\': \'True\', \'NoiseLevel\': "u\'average\'", \'RestaurantsAttire\': "\'casual\'", \'RestaurantsDelivery\': \'None\'}',
  1: '{\'RestaurantsTakeOut\': \'True\', \'RestaurantsAttire\': "u\'casual\'", \'GoodForKids\': \'True\', \'BikeParking\': \'False\', \'OutdoorSeating\': \'False\', \'Ambience\': "{\'romantic\': False, \'intimate\': False, \'touristy\': False, \'hipster\': False, \'divey\': False, \'classy\': False, \'trendy\': False, \'upscale\': False, \'casual\': True}", \'Caters\': \'True\', \'RestaurantsReservations\': \'False\', \'RestaurantsDelivery\': \'False\', \'HasTV\': \'False\', \'RestaurantsGoodForGroups\': \'False\', \'BusinessAcceptsCreditCards\': \'True\', \'NoiseLevel\': "u\'average\'", \'ByAppointmentOnly\': \'False\', \'RestaurantsPriceRange2\': \'2\', \'WiFi\': "u\'free\'", \'BusinessParking\': "{\'garage\': True, \'street\': False, \'validated\': False, \'lot\': False, \'valet\': False}", \'Alcohol\': "u\'beer_and_wine\'", \'GoodForMeal\': "{\'dessert\': False, \'latenight\': False, \'lunch\': True, \'dinner\': False, \'brunch\': False, \'breakfast\': True}"}',
  2: '{\'BusinessAcceptsCreditCards\': \'True\', \'RestaurantsPriceRange2\': \'2\', \'ByAppointmentOnly\': \'False\', \'BikeParking\': \'False\', \'BusinessParking\': "{\'garage\': False, \'street\': True, \'validated\': False, \'lot\': False, \'valet\': False}"}',
  3: "{'RestaurantsPriceRange2': '1', 'BusinessAcceptsCreditCards': 'True', 'GoodForKids': 'True', 'ByAppointmentOnly': 'False'}",
  4: '{\'GoodForKids\': \'False\', \'BusinessParking\': "{\'garage\': False, \'street\': False, \'validated\': False, \'lot\': False, \'valet\': False}", \'BusinessAcceptsCreditCards\': \'True\'}',
  5: '{\'GoodForKids\': \'True\', \'Alcohol\': "u\'none\'", \'RestaurantsGoodForGroups\': \'True\', \'RestaurantsReservations\': \'True\', \'BusinessParking\': "{\'garage\': False, \'street\': True, \'validated\': False, \'lot\': False, \'valet\': False}", \'RestaurantsAttire\': "u\'casual\'", \'BikeParking\': \'True\', \'RestaurantsPriceRange2\': \'2\', \'HasTV\': \'False\', \'NoiseLevel\': "u\'average\'", \'WiFi\': "u\'no\'", \'RestaurantsTakeOut\': \'True\', \'Caters\': \'False\', \'OutdoorSeating\': \'False\', \'Ambience\': "{\'romantic\': False, \'intimate\': False, \'classy\': False, \'hipster\': False, \'divey\': False, \'touristy\': False, \'trendy\': False, \'upscale\': False, \'casual\': True}", \'GoodForMeal\': "{\'dessert\': False, \'latenight\': False, \'lunch\': True, \'dinner\': True, \'brunch\': False, \'breakfast\': False}", \'DogsAllowed\': \'False\', \'RestaurantsDelivery\': \'True\'}',
  6: "{'AcceptsInsurance': 'True', 'BusinessAcceptsCreditCards': 'True', 'ByAppointmentOnly': 'True'}",
  7: nan,
  8: '{\'RestaurantsPriceRange2\': \'1\', \'ByAppointmentOnly\': \'False\', \'BusinessParking\': "{\'garage\': False, \'street\': False, \'validated\': False, \'lot\': True, \'valet\': False}", \'BusinessAcceptsCreditCards\': \'True\', \'DogsAllowed\': \'True\', \'RestaurantsDelivery\': \'None\', \'BusinessAcceptsBitcoin\': \'False\', \'BikeParking\': \'True\', \'RestaurantsTakeOut\': \'None\', \'WheelchairAccessible\': \'True\'}'},
 'categories': {0: 'Gastropubs, Food, Beer Gardens, Restaurants, Bars, American (Traditional), Beer Bar, Nightlife, Breweries',
  1: 'Salad, Soup, Sandwiches, Delis, Restaurants, Cafes, Vegetarian',
  2: 'Antiques, Fashion, Used, Vintage & Consignment, Shopping, Furniture Stores, Home & Garden',
  3: 'Beauty & Spas, Hair Salons',
  4: 'Gyms, Active Life, Interval Training Gyms, Fitness & Instruction',
  5: 'Restaurants, Thai',
  6: 'Dentists, Health & Medical, Orthodontists',
  7: 'Breakfast & Brunch, Restaurants',
  8: 'Shopping, Jewelry Repair, Appraisal Services, Local Services, Jewelry, Engraving, Gold Buyers'},
 'hours': {0: "{'Monday': '11:0-23:0', 'Tuesday': '11:0-23:0', 'Wednesday': '11:0-23:0', 'Thursday': '11:0-23:0', 'Friday': '11:0-23:0', 'Saturday': '11:0-23:0', 'Sunday': '11:0-23:0'}",
  1: "{'Monday': '5:0-18:0', 'Tuesday': '5:0-17:0', 'Wednesday': '5:0-18:0', 'Thursday': '5:0-18:0', 'Friday': '5:0-18:0', 'Saturday': '5:0-18:0', 'Sunday': '5:0-18:0'}",
  2: "{'Thursday': '11:0-18:0', 'Friday': '11:0-18:0', 'Saturday': '11:0-18:0', 'Sunday': '11:0-18:0'}",
  3: nan,
  4: "{'Monday': '16:0-19:0', 'Tuesday': '16:0-19:0', 'Wednesday': '16:0-19:0', 'Thursday': '16:0-19:0', 'Friday': '16:0-19:0', 'Saturday': '9:0-11:0'}",
  5: "{'Monday': '17:0-21:0', 'Tuesday': '17:0-21:0', 'Wednesday': '17:0-21:0', 'Thursday': '17:0-21:0', 'Friday': '17:0-21:0', 'Saturday': '17:0-21:0', 'Sunday': '17:0-21:0'}",
  6: "{'Monday': '0:0-0:0', 'Tuesday': '8:0-17:30', 'Wednesday': '8:0-17:30', 'Thursday': '8:0-17:30', 'Friday': '8:0-17:30'}",
  7: "{'Saturday': '8:0-14:0', 'Sunday': '8:0-14:0'}",
  8: "{'Monday': '12:15-17:0', 'Tuesday': '12:15-17:0', 'Wednesday': '12:15-17:0', 'Thursday': '12:15-17:0', 'Friday': '12:15-17:0'}"}}

CodePudding user response:

I want to count the number of times that True and False show up in each restaurant's attributes

You can concatenate all elements of your list and count the matches of the '\bTrue\b' / '\bFalse\b' patterns (\b denotes a word boundary):

s = df['attributes'].fillna('').apply(''.join)
df['nb_True'] = s.str.count(r'\bTrue\b')
df['nb_False'] = s.str.count(r'\bFalse\b')

output:

>>> df[['nb_True', 'nb_False']]
   nb_True  nb_False
0       12        21
1        8        23
2        2         6
3        2         1
4        1         6
5       10        20
6        3         0
7        0         0
8        5         6
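If the attribute names and values themselves are needed rather than just the counts, an alternative is to parse each cell instead of splitting it, since the strings in the example dataframe look like Python dict literals. The sketch below uses ast.literal_eval from the standard library; the function name parse_attributes and the dotted naming of nested keys are assumptions made for illustration, and the sample string is row 2 of the example dataframe.

```python
import ast

def parse_attributes(raw):
    """Parse one 'attributes' cell into a flat dict (hypothetical helper).

    Missing cells (NaN) give an empty dict. Nested dicts such as
    BusinessParking are flattened into 'BusinessParking.garage', etc.
    """
    if not isinstance(raw, str):
        return {}
    flat = {}
    for key, value in ast.literal_eval(raw).items():
        try:
            # Values are themselves quoted literals: 'True', "u'free'", "{...}".
            value = ast.literal_eval(value)
        except (ValueError, SyntaxError, TypeError):
            pass  # keep the value unchanged if it is not a Python literal
        if isinstance(value, dict):
            for sub_key, sub_value in value.items():
                flat[f"{key}.{sub_key}"] = sub_value
        else:
            flat[key] = value
    return flat

# Row 2 of the example dataframe.
raw = (
    "{'BusinessAcceptsCreditCards': 'True', 'RestaurantsPriceRange2': '2', "
    "'ByAppointmentOnly': 'False', 'BikeParking': 'False', "
    "'BusinessParking': \"{'garage': False, 'street': True, "
    "'validated': False, 'lot': False, 'valet': False}\"}"
)

flat = parse_attributes(raw)
n_true = sum(v is True for v in flat.values())
n_false = sum(v is False for v in flat.values())
print(n_true, n_false)  # 2 6
```

For this row the counts agree with the nb_True and nb_False values in the output above. On the dataframe, the same function could be applied per cell with df['attributes'].apply(parse_attributes).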