Possibility for integrating with automated conda build and deployments as well #633
Hi @sgbaird, thank you very much for proposing this discussion. I currently know just the very basics about packaging with conda.

Therefore, at that moment in time, our conclusion was to leave conda packaging out of PyScaffold's scope. Maybe something has changed in the ecosystem since then, so it might be worth revisiting this decision. I tried to have a look at the linked examples/discussions, but I don't know if I understood them correctly. Maybe you can help with that by summarizing the approach you are taking? Is the idea here to not use conda-forge but instead a custom channel, and therefore to have the recipe files inside the same repository as the main code?
IMO, many people are nowadays using conda as their primary package manager for Data Science work. If you use conda environments, installing additional packages via pip can quickly break them, since the two tools do not coordinate their dependency resolution. This is why having packages available as conda packages matters a lot for adoption in that community.
Hi @mfhepp, thank you very much. The needs of the community and the use cases are very clear now. However, I don't understand what the original suggestion is for how we can start to tackle this challenge. In my limited understanding, I have the impression that what the community uses the most is conda-forge, but conda-forge seems to require two separate repositories... I still have the same doubts I posted in my previous comment. It would be nice to have some clarification, so we can start to better understand what a possible implementation would look like.
Hi @abravalheri: Thanks for taking this up so swiftly! I have not yet packaged any of my own work for conda. But as far as I understand from the YAML files linked by @sgbaird, one would basically have to run grayskull against the released PyPI package to generate a conda recipe.

https://github.com/sparks-baird/chem_wasserstein seems to be available via PyPI and as a conda package.
In this particular case, @sgbaird seems to use a personal conda channel rather than conda-forge. As for the "two repositories" issue: as far as I understand, the build files for a conda-forge package have to live in a separate feedstock repository, whereas with a personal channel the recipe files can stay inside the main repository.

This looks like a straightforward and rewarding approach to me, but hopefully @sgbaird or someone from the group can help.
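To make the workflow concrete, here is a minimal sketch of the steps involved. It assumes `grayskull`, `conda-build`, and the `anaconda` client CLI are available; the commands are only assembled and printed here, not executed, and the package name, version, and channel are illustrative placeholders.

```python
"""Assemble (but do not run) the grayskull -> conda build -> upload steps."""

def build_commands(package: str, version: str, channel: str) -> list[str]:
    return [
        # 1. Generate a conda recipe (meta.yaml) from the PyPI metadata:
        f"grayskull pypi {package}=={version}",
        # 2. Build the conda package from the generated recipe directory:
        f"conda build {package}/",
        # 3. Upload the built artifact to a personal anaconda.org channel:
        f"anaconda upload --user {channel} <path-to-built-package>.tar.bz2",
    ]

for cmd in build_commands("chem_wasserstein", "1.0.8", "sgbaird"):
    print(cmd)
```

The `<path-to-built-package>` placeholder stands for the artifact path that `conda build` reports at the end of a successful build.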
@abravalheri thanks for the patience with me rejoining the discussion. @mfhepp great summary of both the needs and the basic workflow involved in my suggestion. Uploading to a personal channel via the anaconda client has been working well for me.

For reference, here is what one of my recipe touch-up scripts looks like:

```python
"""Touch up the conda recipe from grayskull using conda-souschef."""
import os
from os.path import join

from souschef.recipe import Recipe

import chem_wasserstein

# Generate a conda recipe (meta.yaml) from the package's PyPI metadata
os.system(
    "grayskull pypi {0}=={1}".format(
        chem_wasserstein.__name__, chem_wasserstein.__version__
    )
)

fpath = join("chem_wasserstein", "meta.yaml")
fpath2 = join("scratch", "meta.yaml")
my_recipe = Recipe(load_file=fpath)
# grayskull misses the flit build requirement, so append it manually
my_recipe["requirements"]["host"].append("flit")
my_recipe.save(fpath)
my_recipe.save(fpath2)
```

This produces e.g. the following YAML file based on the corresponding PyPI metadata (in this case for v1.0.8):

```yaml
{% set name = "chem_wasserstein" %}
{% set version = "1.0.8" %}

package:
  name: {{ name|lower }}
  version: {{ version }}

source:
  url: https://pypi.io/packages/source/{{ name[0] }}/{{ name }}/chem_wasserstein-{{ version }}.tar.gz
  sha256: 61dfd2069eb2579870f31d16ca156553757cc84824e2356e038cf1ae0bbf2fcf

build:
  number: 0
  noarch: python
  script: {{ PYTHON }} -m pip install . -vv

requirements:
  host:
    - pip
    - flit
    - python >=3.6,<3.10
  run:
    - colorama
    - dist_matrix >=1.0.2
    - elmd ==0.4.8
    - numba >=0.53.1
    - pandas
    - plotly
    - pqdm ==0.1.0
    - python >=3.6,<3.10
    - scikit-learn
    - scipy
    - tqdm
    - umap-learn

test:
  imports:
    - chem_wasserstein
  commands:
    - pip check
  requires:
    - pip

about:
  home: https://pypi.org/project/chem_wasserstein/
  summary: A high performance mapping class to construct ElM2D plots from large datasets of inorganic compositions.
  license: GPL-3.0
  license_file: LICENSE

extra:
  recipe-maintainers:
    - sgbaird
```

(Note that the `flit` entry under `host:` is the line appended by the script above.) One possibility for incorporating this into PyScaffold would be to make a dedicated extension for it.
Hi @sgbaird, thank you very much for the clarifications.
In that case, running grayskull via CLI should be enough.
See pypa/flit#461. Probably not super important here, since I doubt many people using PyScaffold would also be using flit. It seemed to be something fairly specific to flit.
Thank you very much @sgbaird. There are some open issues in grayskull that, when/if solved, would also simplify this process a lot.
Figured these were worth xref-ing: #422 and https://pyscaffold.org/en/stable/dependencies.html#creating-a-conda-package |
@abravalheri I got around to trying out packaging for one of my repositories. There's also a conflict between grayskull's generated `meta.yaml` and the `check-yaml` pre-commit hook: the first two lines of the generated file are Jinja2 `{% set ... %}` statements, which are not valid plain YAML.

So, the hook seems to be choking on the first instance of the `{%` markup. Might just ignore the errors.

For now, I can work around this by excluding `meta.yaml` from the hook.
Hi @sgbaird, regarding the `check-yaml` hook, you can exclude the conda recipe in `.pre-commit-config.yaml`:

```yaml
...
  - id: check-yaml
    exclude: 'meta.yaml$'
...
```

As for the grayskull side of things, I personally never handled this scenario directly, but I believe the exclusion above should be enough.
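For completeness, the underlying reason `check-yaml` fails is that conda recipes are Jinja2 templates rather than plain YAML. A small, hypothetical pre-processing sketch (not part of any of the tools discussed here) that strips the Jinja2 markup so the remainder parses as plain YAML:

```python
import re

def strip_jinja(text: str) -> str:
    """Drop {% ... %} statements and replace {{ ... }} expressions with a
    placeholder so the remaining text is plain YAML."""
    text = re.sub(r"\{%.*?%\}", "", text)
    text = re.sub(r"\{\{.*?\}\}", "PLACEHOLDER", text)
    # Remove lines that became empty after stripping statements
    return "\n".join(line for line in text.splitlines() if line.strip())

recipe = (
    '{% set name = "chem_wasserstein" %}\n'
    "package:\n"
    "  name: {{ name|lower }}\n"
)
print(strip_jinja(recipe))
```

In practice, excluding `meta.yaml` from the hook (as above) is the simpler fix; this just illustrates why the file trips up a plain-YAML linter.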
@abravalheri thanks for the clarifications and suggestions! These are very helpful.
Mention pyscaffold#633 in the documentation regarding deployment to `conda-forge`, as it contains useful information on current tooling.
FYI: I personally think that the syncing of PyPI and conda packages will become a much lesser issue in the future, because of a fundamental shift in the ecosystems for Data Science. In the past, having conda packages available was mainly motivated by the need to keep your local system/machine clean, because pip and conda do not work well in parallel; over time, you will clutter your local environments. In the past months, however, the risk of supply-chain attacks on the open Python ecosystem has grown so tremendously (the risk has actually always been there, but is now more likely to be exploited) that nobody should install external Python packages lightly on their local machine anymore.

Without jumping into this too deeply: IMO the only viable consequence is to run all Python Data Science workflows (this applies to other languages like R, too, btw) in constrained Docker containers (no access to the full file-system, no outbound/inbound network access, ...) or on virtual or actual external machines. Otherwise, a single compromised dependency can take over your machine.

Now, with this having been said, mixing pip and conda and Homebrew and GitHub libraries is no big deal, because you can always create a fresh Docker image from your Dockerfiles. You can even just work without a virtual environment.

This might be a bit off-topic, but I think it is important to share this information. If you want a concrete example, read about the PyMafka incident on PyPI.
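As a concrete illustration of the constrained-container advice above, here is a sketch that assembles (but does not execute) a locked-down `docker run` invocation. The image name, script name, and mount path are illustrative placeholders, and Docker itself is assumed to be installed if you actually run the resulting command.

```python
def sandboxed_run(image: str, script: str, workdir: str = "/work") -> str:
    """Build a `docker run` command line with no network access and a
    read-only filesystem, mounting the current project read-only."""
    args = [
        "docker", "run", "--rm",
        "--network", "none",        # no inbound/outbound network access
        "--read-only",              # read-only container filesystem
        "-v", f"$PWD:{workdir}:ro", # mount the project read-only ($PWD expands in the shell)
        "-w", workdir,
        image, "python", script,
    ]
    return " ".join(args)

print(sandboxed_run("python:3.11-slim", "analysis.py"))
```

A compromised package inside such a container can neither phone home nor persist changes to the host filesystem, which is the point of the advice above.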
Very nice repo! I'll be promoting this tool to some of the researchers in my group.
See marcelotrevisani/souschef#32 for a breakdown of the methods I've been using with conda. Figured it was worth mentioning. I have this methodology implemented for a few different repositories, and it's worked pretty nicely.